What is a "rate limit" on an API?

Question

Accepted Answer

A cap on calls per time window: for example, how many requests you may send per minute.

A rate limit caps how many requests (or tokens) you can send per time window, such as requests per minute or tokens per minute. It is not a cap on reply length, model size, or accuracy.
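
The idea of "so many requests or tokens per time window" can be sketched as a small client-side limiter. This is a minimal sketch, assuming a rolling 60-second window with separate caps on request count and token count; the class `SlidingWindowLimiter`, its parameters, and the numbers in the usage lines are hypothetical and not any provider's API. Real providers enforce their limits on the server (commonly rejecting over-limit calls with HTTP 429 Too Many Requests) and publish their own values.

```python
import collections
import time


class SlidingWindowLimiter:
    """Hypothetical client-side limiter: allows at most `max_requests` requests
    and `max_tokens` tokens within any rolling `window_s`-second window."""

    def __init__(self, max_requests, max_tokens, window_s=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.window_s = window_s
        self.clock = clock  # injectable so the example can use a fake clock
        self.events = collections.deque()  # (timestamp, tokens) for each accepted request

    def _evict(self, now):
        # Forget requests that are at least `window_s` seconds old.
        while self.events and now - self.events[0][0] >= self.window_s:
            self.events.popleft()

    def try_acquire(self, tokens):
        """Return True and record the request if it fits under both caps, else False."""
        now = self.clock()
        self._evict(now)
        if len(self.events) >= self.max_requests:
            return False  # request-count cap reached for this window
        used = sum(t for _, t in self.events)
        if used + tokens > self.max_tokens:
            return False  # token cap would be exceeded
        self.events.append((now, tokens))
        return True


# Usage with a fake clock: 3 requests and 1000 tokens per 60-second window.
fake_now = [0.0]
limiter = SlidingWindowLimiter(3, 1000, window_s=60.0, clock=lambda: fake_now[0])
print([limiter.try_acquire(100) for _ in range(4)])  # [True, True, True, False]
fake_now[0] = 60.0  # the first three requests have now aged out of the window
print(limiter.try_acquire(100))  # True
```

A rejected call is not recorded, so it does not use up any of the budget; a real client would usually wait and retry rather than drop the request. Summing tokens on every call is linear in the number of requests in the window, which is fine for a sketch.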

Answer

A cap on the reply length — the most tokens a single answer may contain

Answer

A cap on the model size — the largest model your account is allowed to use

Answer

A cap on accuracy spent — the share of correct answers granted to you per day
