AI · Flashcard

What does time to first token (TTFT) measure?

  • A. The latency from sending a request until the first output token appears
  • B. The total time taken to generate every token in a full response
  • C. The time the model spends loading its weights into GPU memory
  • D. The average number of tokens the model produces every second

Why this is the answer

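TTFT captures how responsive a streamed response feels: it covers the queuing and prefill that happen before generation starts. It is separate from total generation time (B) and from per-token throughput (D), and it is not a measure of weight loading (C).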
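
As a rough illustration of the distinction, the sketch below times a generic token stream and reports TTFT alongside total time and token count. It assumes only that the response arrives as a Python iterable of tokens; `measure_stream`, `fake_stream`, and their parameters are hypothetical names made up for this example, not part of any particular inference API.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


def measure_stream(
    stream: Iterable[str],
    start: float,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], float, int]:
    """Return (ttft_seconds, total_seconds, token_count) for a token stream.

    `start` is the clock reading taken just before the request was sent,
    so the measured TTFT includes queuing and prefill (hypothetical helper).
    """
    ttft: Optional[float] = None
    count = 0
    for _ in stream:
        if ttft is None:
            # First token has arrived: this interval is the TTFT.
            ttft = clock() - start
        count += 1
    # Total time runs until the stream is exhausted.
    total = clock() - start
    return ttft, total, count


def fake_stream(prefill_delay: float, per_token_delay: float, n: int) -> Iterator[str]:
    """Stand-in for a model: wait for 'prefill', then emit n tokens."""
    time.sleep(prefill_delay)
    for i in range(n):
        if i > 0:
            time.sleep(per_token_delay)
        yield f"tok{i}"
```

For example, calling `measure_stream(fake_stream(0.2, 0.01, 50), time.perf_counter())` should give a TTFT close to the simulated prefill delay, while the total time also grows with the number of tokens. If the stream yields nothing, the returned TTFT is `None`.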
