AI Fitness Plan API

Overview

A production API that accepts a user profile (age, goal, available equipment, weekly schedule) and returns a structured, personalised training plan. Responses stream token-by-token so users see output within seconds rather than waiting for the full plan to generate.

Architecture

The generation pipeline is built with LangGraph, breaking the process into three nodes: input validation and enrichment, plan generation with GPT-4o, and output format enforcement. Each node has a clear contract, making the pipeline easy to debug and extend.
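The three-node contract can be sketched in plain Python. This is only an illustration of the shape of the pipeline, not the project's LangGraph code: the state fields, the one-line-per-day draft format, and the injected `llm` callable (standing in for the GPT-4o call) are all assumptions. In LangGraph, each function would be registered as a node on a graph rather than called in a loop.

```python
from typing import Callable, TypedDict


class PlanState(TypedDict, total=False):
    profile: dict  # raw user profile from the request
    prompt: str    # enriched prompt built by the validation node
    draft: str     # raw model output
    plan: dict     # final structured plan


REQUIRED_FIELDS = ("age", "goal", "equipment", "schedule")  # assumed field names


def validate_and_enrich(state: PlanState) -> PlanState:
    """Node 1: reject incomplete profiles and build the prompt."""
    profile = state["profile"]
    missing = [k for k in REQUIRED_FIELDS if k not in profile]
    if missing:
        raise ValueError(f"missing profile fields: {missing}")
    prompt = (
        f"Goal: {profile['goal']}. Age: {profile['age']}. "
        f"Equipment: {', '.join(profile['equipment'])}. "
        f"Training days: {', '.join(profile['schedule'])}."
    )
    return {**state, "prompt": prompt}


def make_generate(llm: Callable[[str], str]) -> Callable[[PlanState], PlanState]:
    """Node 2: wrap a model call (hypothetical `llm`) as a pipeline node."""
    def generate(state: PlanState) -> PlanState:
        return {**state, "draft": llm(state["prompt"])}
    return generate


def enforce_format(state: PlanState) -> PlanState:
    """Node 3: parse 'Day: exercise, exercise' lines into a structured plan."""
    days = {}
    for line in state["draft"].splitlines():
        day, sep, exercises = line.partition(":")
        if sep and exercises.strip():
            days[day.strip()] = [e.strip() for e in exercises.split(",") if e.strip()]
    if not days:
        raise ValueError("model output contained no parsable days")
    return {**state, "plan": {"days": days}}


def run_pipeline(profile: dict, llm: Callable[[str], str]) -> dict:
    """Run the three nodes in order, passing the state from one to the next."""
    state: PlanState = {"profile": profile}
    for node in (validate_and_enrich, make_generate(llm), enforce_format):
        state = node(state)
    return state["plan"]
```

Because every node takes and returns the same state dictionary, each one can be tested on its own with a hand-built state, which is the debugging benefit the contract is meant to provide.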

FastAPI handles the HTTP layer, with StreamingResponse and server-sent events delivering the streamed output to clients.
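As a rough sketch of the streaming side, the generator below wraps each token in a server-sent event frame. The function names and the closing `done` event are assumptions made for illustration; only the `data:` / `event:` framing comes from the SSE format itself.

```python
import asyncio
from typing import AsyncIterator, List, Optional


def sse_frame(data: str, event: Optional[str] = None) -> str:
    """Format one server-sent event; a blank line terminates the frame."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # Each line of a multi-line payload needs its own "data:" field.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"


async def sse_stream(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Turn a stream of model tokens into SSE frames, then signal completion."""
    async for token in tokens:
        yield sse_frame(token)
    yield sse_frame("", event="done")  # assumed end-of-stream marker


def render(tokens: List[str]) -> List[str]:
    """Local helper: drive the stream with a fixed token list and collect frames."""
    async def source() -> AsyncIterator[str]:
        for token in tokens:
            yield token

    async def collect() -> List[str]:
        return [frame async for frame in sse_stream(source())]

    return asyncio.run(collect())
```

In a FastAPI handler, a generator like `sse_stream(...)` would typically be returned as `StreamingResponse(sse_stream(tokens), media_type="text/event-stream")`.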

Cost control

Running LLM inference at scale requires careful cost management. Two mechanisms keep costs predictable:

Response caching via Redis: similar profiles return cached plans (1-hour TTL), reducing calls to the model API by ~30% in steady state.
Token budgeting: the system prompt sets an output length ceiling, since a 7-day plan doesn’t need 4,000 tokens.
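One way "similar profiles" could map to the same cache entry is to normalise the profile before hashing it into a key. The sketch below is an assumption about how that might work, not the project's actual rule: the 5-year age bucket, the field names, and the `plan:` key prefix are all hypothetical.

```python
import hashlib
import json

CACHE_TTL_SECONDS = 3600  # the 1-hour TTL mentioned above


def cache_key(profile: dict, age_bucket: int = 5) -> str:
    """Build a deterministic cache key from a normalised profile."""
    normalised = {
        # Round age down to a bucket so nearby ages share a plan (assumed rule).
        "age": (int(profile["age"]) // age_bucket) * age_bucket,
        "goal": profile["goal"].strip().lower(),
        # Sort lists so ordering differences do not produce different keys.
        "equipment": sorted(e.strip().lower() for e in profile["equipment"]),
        "schedule": sorted(d.strip().lower() for d in profile["schedule"]),
    }
    payload = json.dumps(normalised, sort_keys=True, separators=(",", ":"))
    return "plan:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With the redis-py client, the generated plan could then be stored with `client.set(key, plan_json, ex=CACHE_TTL_SECONDS)` and looked up with `client.get(key)` before calling the model. Coarser normalisation raises the hit rate but makes plans less personalised, so the bucket size is a tuning choice.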

Deployment

The API is deployed on Railway with environment-based configuration. Rate limiting (10 requests per user per hour) is enforced at the API layer using a sliding-window algorithm backed by Redis.
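The sliding-window idea can be illustrated with an in-memory stand-in for Redis. This is a sketch under stated assumptions: the class name, the injectable clock, and the per-user deque are hypothetical, and the source does not say which Redis data structure the real limiter uses.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per user in any rolling window."""

    def __init__(
        self,
        limit: int = 10,
        window_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        hits = self._hits[user_id]
        # Drop timestamps that have aged out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit; rejected requests are not recorded
        hits.append(now)
        return True
```

A Redis-backed version commonly keeps one sorted set per user, scored by timestamp, and performs the same three steps with `ZREMRANGEBYSCORE`, `ZCARD`, and `ZADD`. Unlike a fixed hourly counter, this approach does not let a user send a burst on either side of a window boundary.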