AI · 1 module

Building with LLMs: APIs, Tokens & Cost

The mechanics and economics of calling models from your own code. Learn how LLM APIs are billed, what counts toward token cost, and how latency, streaming and rate limits affect your app, then remember it all with spaced repetition.

18 practice cards · ~10 min per day · Intermediate level · 1 module
About this topic

Calling models from code

Using an LLM in an app is different from chatting with one. You call an API, you pay per token, and both your prompt and the model's response count toward the bill. Understanding that pricing model is the difference between a feature that scales and one that quietly burns money.
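To make the billing model concrete, here is a minimal Python sketch that estimates cost from token counts. The per-million-token prices are made-up placeholders, not any provider's real rates. The one structural assumption is that input and output tokens are priced separately, as major providers commonly list them.

```python
# Illustrative prices only (hypothetical), in USD per one million tokens.
# Real prices vary by provider and model; output tokens usually cost more than input.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float = INPUT_PRICE_PER_M,
                 output_price_per_m: float = OUTPUT_PRICE_PER_M) -> float:
    """Cost of one call: the prompt tokens and the reply tokens are both billed."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# One call with a 1,200-token prompt and a 400-token reply.
per_call = request_cost(1_200, 400)
# The same call made 100,000 times a day.
per_day = per_call * 100_000

print(f"per call: ${per_call:.4f}")
print(f"per day:  ${per_day:,.2f}")
```

With these placeholder prices a single call costs about $0.0096, which looks negligible until it is multiplied by daily volume (about $960 a day here). Note that the reply accounts for more of the bill than the longer prompt does.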

This track focuses on the foundations every LLM app rests on: how API usage is billed, what counts toward token cost, what latency and streaming mean for user experience, and how rate limits and truncation shape what your code has to handle.

It uses spaced repetition so these essentials stick. It pairs naturally with AI & LLM Fundamentals (how models work) and Prompt Engineering (getting good output).

What you'll learn

1 module, seed to bloom

Each module is a set of practice cards — 18 in total. Answer, review, and watch your knowledge grow from seed to full bloom.

APIs, Tokens & Cost

The builder's view of using a model — token billing, latency, streaming, rate limits, and cost control

18 cards
Try before you plant

Sample questions

A taste of the real cards. Pick an answer, then reveal the explanation.

Sample · Building with LLMs: APIs, Tokens & Cost

How is LLM API usage usually billed?

  • A. Per token: you pay for input and output tokens, often at different rates
  • B. Per request: every call costs a flat fee no matter how long the call is
  • C. Per minute: you are billed for the total time the model spends running it
  • D. Per user: a fixed monthly charge covers unlimited calls for each account

Which parts of a request count toward token cost?

  • A. Both input and output: your prompt and the model's reply are each counted
  • B. Only the input: just your prompt is billed, and the reply is always free
  • C. Only the output: the model's reply is billed, and your prompt is the free part
  • D. Neither directly: billing is by the request count, not by any token totals

What does "streaming" a response mean?

  • A. Sending tokens as they are generated: output appears piece by piece, live
  • B. Sending the reply as a video: the answer is rendered as a short clip
  • C. Sending many replies at once: several full answers arrive in parallel
  • D. Sending the reply to storage: the output is saved to a file, not shown

What is a "rate limit" on an API?

  • A. A cap on usage per time window: how many requests or tokens you may send per minute
  • B. A cap on the reply length: the most tokens a single answer may contain
  • C. A cap on the model size: the largest model your account is allowed to use
  • D. A cap on accuracy spent: the share of correct answers granted to you per day
How Gnoseed works

Learn it once, keep it for good

1. Answer a question

Each card tests one practical concept as a multiple-choice question. Pick the option you think is right.

2. Get the full answer

See the correct option plus a clear explanation, and a link to deeper docs when one is available.

3. Review at the right time

A spaced-repetition engine (SM-2 or FSRS) resurfaces each card just before you would forget it.
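For a feel of how such an engine decides "just before you would forget", here is a minimal Python sketch of the classic SM-2 update, following the widely reproduced description of the algorithm (this version updates the ease after every review). The page does not say which variant its engine uses or how a right or wrong answer maps to SM-2's 0 to 5 grade, so those grades are assumptions; FSRS works differently and is not shown.

```python
from dataclasses import dataclass


@dataclass
class CardState:
    repetitions: int = 0      # consecutive successful reviews
    interval_days: int = 0    # days until the next review
    ease: float = 2.5         # SM-2 "easiness factor", never allowed below 1.3


def sm2_review(state: CardState, quality: int) -> CardState:
    """Update a card after a review graded 0 (blackout) to 5 (perfect recall)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")

    if quality >= 3:  # recalled: grow the interval
        if state.repetitions == 0:
            interval = 1
        elif state.repetitions == 1:
            interval = 6
        else:
            interval = round(state.interval_days * state.ease)
        repetitions = state.repetitions + 1
    else:  # forgotten: restart the schedule from a 1-day interval
        repetitions = 0
        interval = 1

    # Easy recalls raise the ease slightly; poor ones lower it.
    ease = state.ease + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    ease = max(ease, 1.3)
    return CardState(repetitions, interval, ease)


card = CardState()
for grade in (5, 4, 4, 2, 5):
    card = sm2_review(card, grade)
    print(card.interval_days, round(card.ease, 2))
```

Intervals grow multiplicatively while you keep recalling a card (1, 6, then about 16 days here), and a single lapse sends it back to a short interval with a lower ease.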

Why learn this

Why these foundations matter

Control your costs

Knowing exactly what you pay for — every token in and out — is how you keep an LLM feature affordable at scale.

Design for latency

Understanding latency and streaming lets you build responses that feel fast instead of frozen.
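Streaming mostly changes when the user first sees text, not how long generation takes. The sketch below simulates that in Python: `fake_token_stream` is a hypothetical stand-in for a provider SDK's streaming iterator (real APIs typically deliver chunks as server-sent events, and each SDK names things differently), and `render_stream` writes every chunk immediately while timing the first chunk against the full reply.

```python
import sys
import time
from typing import Iterable, Iterator, Tuple


def fake_token_stream(text: str, delay_s: float = 0.05) -> Iterator[str]:
    """Hypothetical stand-in for a streaming API: yields word-sized chunks with a delay."""
    for i, word in enumerate(text.split(" ")):
        time.sleep(delay_s)  # pretend the model is still generating
        yield word if i == 0 else " " + word


def render_stream(chunks: Iterable[str]) -> Tuple[str, float, float]:
    """Show each chunk as it arrives; return full text, time to first chunk, total time."""
    start = time.perf_counter()
    first_chunk_at = None
    parts = []
    for chunk in chunks:
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter() - start
        sys.stdout.write(chunk)
        sys.stdout.flush()  # push text to the user now instead of buffering it
        parts.append(chunk)
    total = time.perf_counter() - start
    print()
    if first_chunk_at is None:  # empty stream
        first_chunk_at = total
    return "".join(parts), first_chunk_at, total


text, ttft, total = render_stream(fake_token_stream("Streaming makes slow answers feel fast"))
print(f"first text after {ttft:.2f}s, full reply after {total:.2f}s")
```

The total time is the same with or without streaming; what improves is the time to first text, which is what makes an interface feel responsive.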

Handle the real world

Rate limits and truncated responses are not edge cases — they are everyday behaviour your code must expect.
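A minimal Python sketch of handling both cases, under stated assumptions: `RateLimitError` is a hypothetical exception standing in for whatever your SDK raises on HTTP 429, responses are assumed to be plain dicts, and the truncation check looks for OpenAI's `finish_reason == "length"` or Anthropic's `stop_reason == "max_tokens"`. The retry uses exponential backoff with random ("full") jitter so many clients do not retry in lockstep.

```python
import random
import time
from typing import Any, Callable, Dict


class RateLimitError(Exception):
    """Hypothetical error for an HTTP 429 (Too Many Requests) response."""


def call_with_backoff(call: Callable[[], Dict[str, Any]], max_retries: int = 5,
                      base_delay_s: float = 1.0, max_delay_s: float = 30.0) -> Dict[str, Any]:
    """Retry a rate-limited call, doubling the maximum wait after each failure."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: let the caller decide what to do
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            time.sleep(random.uniform(0, delay))  # full jitter spreads retries out
    raise AssertionError("unreachable")


def is_truncated(response: Dict[str, Any]) -> bool:
    """True if generation stopped because it hit the output-token limit (assumed field names)."""
    return (response.get("finish_reason") == "length"
            or response.get("stop_reason") == "max_tokens")


# Usage with a fake call that is rate-limited twice, then returns a cut-off reply.
attempts = {"n": 0}


def flaky_call() -> Dict[str, Any]:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return {"text": "Hello", "finish_reason": "length"}


reply = call_with_backoff(flaky_call, base_delay_s=0.01)
if is_truncated(reply):
    print("Reply was cut off: raise the output limit or ask the model to continue.")
```

If the API sends a `Retry-After` header with its 429 response, waiting that long is usually a better choice than a computed delay.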

FAQ

Common questions

Do I need to be a developer?

This track is aimed at people building on top of model APIs, so some coding background helps — but the concepts (tokens, billing, latency) are explained plainly.

How long does it take?

About 10 minutes a day. Spaced repetition means short, frequent sessions beat long cramming, so the essentials stick.

Is it free?

Yes, completely free. No registration or credit card is required, and all your progress is stored locally in your browser.

Ready to build on LLMs?

Plant your first seed today. Ten minutes a day is all it takes to grow the foundations real LLM apps rest on.
