
Building with LLMs: APIs, Tokens & Cost

The mechanics and economics of calling models from your own code. Learn how LLM APIs are billed, what counts toward token cost, and how latency, streaming, and rate limits affect your app, then retain it all with spaced repetition.


Practice cards: 18
Time per day: ~10 min
Level: Intermediate
Modules: 1

About this topic

Calling models from code

Using an LLM in an app is different from chatting with one. You call an API, you pay per token, and both your prompt and the model's response count toward the bill. Understanding that pricing model is the difference between a feature that scales and one that quietly burns money.
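To make the billing idea concrete, here is a minimal sketch of a per-request cost estimate. The function name and the prices are hypothetical, chosen only for illustration; real providers publish their own rates, and input and output tokens are often priced differently.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Estimate the cost of one request in the same currency as the prices.

    Both the prompt (input) and the reply (output) are billed,
    so both token counts contribute to the total.
    """
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Hypothetical prices: 3.00 per million input tokens, 15.00 per million output tokens.
cost = estimate_cost(input_tokens=1200, output_tokens=300,
                     input_price_per_million=3.0,
                     output_price_per_million=15.0)
print(f"{cost:.4f}")  # prints 0.0081
```

Multiplying that per-request figure by your expected request volume gives a rough sense of whether a feature stays affordable as usage grows.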

This track focuses on the foundations every LLM app rests on: how API usage is billed, what counts toward token cost, what latency and streaming mean for user experience, and how rate limits and truncation shape what your code has to handle.

It uses spaced repetition so these essentials stick. It pairs naturally with AI & LLM Fundamentals (how models work) and Prompt Engineering (getting good output).

What you'll learn

1 module, seed to bloom

Each module is a set of practice cards — 18 in total. Answer, review, and watch your knowledge grow from seed to full bloom.

APIs, Tokens & Cost

The builder's view of using a model — token billing, latency, streaming, rate limits, and cost control

18 cards

Try before you plant

Sample questions

A taste of the real cards. Pick an answer, then reveal the explanation.

Sample · Building with LLMs: APIs, Tokens & Cost

How is LLM API usage usually billed?

A. Per token: you pay for the tokens in your input and in the output combined
B. Per request: every call costs a flat fee no matter how long the call is
C. Per minute: you are billed for the total time the model spends running it
D. Per user: a fixed monthly charge covers unlimited calls for each account

Which parts of a request count toward token cost?

A. Both input and output: your prompt and the model's reply are each counted
B. Only the input: just your prompt is billed, and the reply is always free
C. Only the output: the model's reply is billed, and your prompt is the free part
D. Neither directly: billing is by the request count, not by any token totals

What does "streaming" a response mean?

A. Sending tokens as they are generated: output appears piece by piece live
B. Sending the reply as a video: the answer is rendered as a short clip
C. Sending many replies at once: several full answers arrive in parallel
D. Sending the reply to storage: the output is saved to a file, not shown

What is a "rate limit" on an API?

A. A cap on calls per time window: how many requests you may send per minute
B. A cap on the reply length: the most tokens a single answer may contain
C. A cap on the model size: the largest model your account is allowed to use
D. A cap on accuracy spent: the share of correct answers granted to you per day

How Gnoseed works

Learn it once, keep it for good

Answer a question

Each card is one practical concept with multiple options. Pick what you think is right.

Get the full answer

See the correct option plus a clear explanation, and a link to deeper docs when one is available.

Review at the right time

A spaced-repetition engine (SM-2 or FSRS) resurfaces each card just before you would forget it.

Why learn this

Why these foundations matter

Control your costs

Knowing exactly what you pay for — every token in and out — is how you keep an LLM feature affordable at scale.

Design for latency

Understanding latency and streaming lets you build responses that feel fast instead of frozen.
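The sketch below illustrates why streaming feels faster: the first piece of output arrives long before the full reply is done. It does not call any real API; `fake_token_stream` is a hypothetical stand-in that yields words with an artificial delay, and `consume_stream` is an illustrative helper that records time to first piece and total time.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def fake_token_stream(text: str, delay_s: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a streaming API: yields one word at a time."""
    for word in text.split():
        time.sleep(delay_s)  # pretend the model is still generating
        yield word + " "


def consume_stream(stream: Iterable[str]) -> Tuple[str, Optional[float], float]:
    """Collect streamed pieces and measure time to first piece and total time."""
    start = time.monotonic()
    first_piece_at = None
    pieces = []
    for piece in stream:
        if first_piece_at is None:
            first_piece_at = time.monotonic() - start
        pieces.append(piece)  # a real UI would render each piece here
    total = time.monotonic() - start
    return "".join(pieces).strip(), first_piece_at, total


reply, first_s, total_s = consume_stream(
    fake_token_stream("Streaming shows output piece by piece")
)
print(reply)
print(f"first piece after {first_s:.3f}s, full reply after {total_s:.3f}s")
```

Without streaming, the user waits the full duration before seeing anything; with it, they start reading after the first piece arrives.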

Handle the real world

Rate limits and truncated responses are not edge cases — they are everyday behaviour your code must expect.
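One common way to cope with rate limits is to retry with exponential backoff. The following is a minimal sketch under stated assumptions: `RateLimitError` is a hypothetical exception standing in for whatever your client raises on an HTTP 429 response, and `call_with_backoff` is an illustrative helper, not part of any particular SDK.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error standing in for an HTTP 429 (Too Many Requests)."""


def call_with_backoff(call: Callable[[], T],
                      max_attempts: int = 5,
                      base_delay_s: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying on RateLimitError with exponentially growing waits."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            delay = base_delay_s * (2 ** attempt)  # 1s, 2s, 4s, ...
            # A little random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the waiting logic can be tested without real delays. Truncated responses need a separate check: inspect whatever completion status your provider returns and decide whether to continue, retry with a larger output budget, or surface the partial answer.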

FAQ

Common questions

Do I need to be a developer?

This track is aimed at people building on top of model APIs, so some coding background helps — but the concepts (tokens, billing, latency) are explained plainly.

How long does it take?

About 10 minutes a day. Spaced repetition means short, frequent sessions beat long cramming, so the essentials stick.

Is it free?

Yes, completely free. No registration or credit card is required, and all your progress is stored locally in your browser.

Ready to build on LLMs?

Plant your first seed today. Ten minutes a day is all it takes to grow the foundations real LLM apps rest on.
