DevOps · 5 modules

Kubernetes Ops: Reliability & Security

Keep Kubernetes workloads healthy and secure. Master graceful shutdowns, health probes, RBAC, secrets, observability, and recovery, the practices that keep services up and safe, with spaced repetition.

Practice cards: 105
Per day: ~10 min
Level: Intermediate → Advanced
Modules: 5

About this topic

Keeping it running and safe

Deploying a workload is the easy part; keeping it healthy and secure is the job. Graceful shutdowns and rollouts avoid dropped requests on every deploy. Health probes tell Kubernetes when a Pod is ready and when to restart it. RBAC and security contexts limit blast radius, secrets keep credentials out of plain sight, and observability is how you find the problem before users do.

This track covers that reliability-and-security core: graceful shutdown and rollouts, liveness/readiness/startup probes, security context and RBAC, secrets and workload identity, and observability and recovery.

Spaced repetition keeps these practices fresh, because the cost of forgetting one shows up as downtime or a breach.

What you'll learn

5 modules, seed to bloom

Each module is a set of practice cards — 105 in total. Answer, review, and watch your knowledge grow from seed to full bloom.

Graceful Shutdown & Rollouts

SIGTERM lifecycle, rolling update mechanics, ConfigMap propagation, rollback strategy

23 cards

Health Probes

Readiness vs liveness vs startup, probe mechanisms, parameter tuning, antipatterns

20 cards

Security Context & RBAC

Pod security contexts, Linux capabilities, seccomp profiles, Pod Security Standards, RBAC roles and bindings, ServiceAccount hardening

22 cards

Secrets & Workload Identity

Secret storage and encoding, env vars vs volume mounts, immutable Secrets, etcd encryption, ESO vs CSI Driver, projected tokens, cloud workload identity

20 cards

Observability & Recovery

USE/RED metrics, Events TTL, crash debugging, OOMKilled vs throttling, ephemeral containers, probe failures, rollback strategy, PodDisruptionBudgets, audit logging

20 cards

Try before you plant

Sample questions

A taste of the real cards. Pick an answer, then reveal the explanation.

Sample · Kubernetes Ops: Reliability & Security

What signal does Kubernetes send to a container's main process when a Pod is being terminated?

A. SIGTERM: the standard termination signal sent to PID 1 inside the container, giving the process a chance to shut down gracefully
B. SIGKILL: the immediate termination signal sent to PID 1 inside the container, stopping the process without any cleanup window
C. SIGINT: the interrupt signal sent to PID 1 inside the container, equivalent to a Ctrl+C initiated by the kubelet process
D. SIGHUP: the hangup signal sent to PID 1 inside the container, indicating the controlling terminal session has ended
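As an illustration of what "a chance to shut down gracefully" can look like in practice, here is a minimal Python sketch of a worker that reacts to SIGTERM. The names `shutdown_requested`, `handle_sigterm`, `install_handler` and `serve` are hypothetical and not part of any Kubernetes API; the sketch only assumes the process runs as PID 1 (or otherwise receives the signal) and that it must finish within the Pod's termination grace period.

```python
import signal
import threading

# Set when SIGTERM arrives; the main loop polls it to stop taking new work.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Only record the request; real cleanup happens outside the handler."""
    shutdown_requested.set()


def install_handler():
    """Register the handler. Call this once from the main thread at startup."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def serve(process_one, max_iterations=None):
    """Call process_one() until shutdown is requested.

    max_iterations is a hypothetical knob that keeps the sketch testable.
    Returns the number of work items that were processed.
    """
    done = 0
    while not shutdown_requested.is_set():
        if max_iterations is not None and done >= max_iterations:
            break
        process_one()
        done += 1
    # At this point in-flight work is finished; close connections,
    # flush buffers, and exit before the grace period runs out.
    return done
```

The handler does as little as possible and leaves the cleanup to the main loop, which is a common way to keep signal handling predictable.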

What decision does a readiness probe help Kubernetes make about a Pod?

A. Whether the Pod should receive traffic right now: a failing readiness probe removes the Pod from Service endpoints without restarting it
B. Whether the Pod is stuck and should be restarted: a failing readiness probe causes the kubelet to kill and recreate the container
C. Whether the Pod has finished initializing: a failing readiness probe prevents liveness checks from running until startup completes
D. Whether the Pod should be evicted from the node: a failing readiness probe triggers the scheduler to move the Pod to a healthier node
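To make the readiness/liveness distinction concrete, the following Python sketch exposes two probe endpoints from a single process. The paths `/healthz` and `/ready`, the `ready` flag, and the default port are assumptions chosen for illustration; Kubernetes does not mandate them. The sketch relies only on the fact that an HTTP probe counts a status from 200 to 399 as success and anything else as failure.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical flag: set once dependencies are reachable, cleared to drain traffic.
ready = threading.Event()


def probe_status(path):
    """Map a probe path to an HTTP status code (paths are illustrative)."""
    if path == "/healthz":
        # Liveness: the process is up and able to answer at all.
        return 200
    if path == "/ready":
        # Readiness: only report success when it is safe to receive traffic.
        return 200 if ready.is_set() else 503
    return 404


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(probe_status(self.path))
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep probe requests out of the logs in this sketch


def make_server(port=8080):
    """Build the probe server; call serve_forever() on the result to run it."""
    return HTTPServer(("0.0.0.0", port), ProbeHandler)
```

Keeping the liveness check independent of downstream dependencies avoids restarting a healthy container just because something it calls is temporarily unavailable.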

What does the runAsNonRoot field in a security context do?

A. It validates at Pod startup that the container will not run as UID 0; if the image user is root, the container fails to start
B. It automatically switches the container process to UID 1000 at startup, overriding whatever user the image specifies
C. It removes root capabilities from the container process after startup; the process starts as root then drops to non-root
D. It enables a user namespace so the container appears as root inside but maps to non-root on the host system

What does the USE method measure for infrastructure monitoring?

A. Utilization, Saturation, and Errors: three signals that cover how busy a resource is, whether it has queued work, and whether it is failing
B. Uptime, Scalability, and Efficiency: three metrics that track system availability, growth capacity, and resource optimization overhead
C. Usage, Stability, and Exceptions: three indicators that measure resource consumption, system reliability, and failure occurrences overall
D. Utilization, Speed, and Errors: three dimensions that cover resource load, processing latency, and failure rates across all components

How Gnoseed works

Learn it once, keep it for good

Answer a question

Each card is one practical concept with multiple options. Pick what you think is right.

Get the full answer

See the correct option plus a clear explanation, and a link to deeper docs when one is available.

Review at the right time

A spaced-repetition engine (SM-2 or FSRS) resurfaces each card just before you would forget it.
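For readers curious how such a schedule is computed, here is a minimal Python sketch of the SM-2 update rule as it is commonly described. It is not Gnoseed's implementation, and the names `CardState` and `review` are hypothetical. Grades run from 0 (complete blackout) to 5 (perfect recall); FSRS uses a different model and is not shown.

```python
from dataclasses import dataclass

MIN_EASE = 1.3  # lower bound on the ease factor in SM-2


@dataclass
class CardState:
    repetitions: int = 0    # consecutive successful reviews
    interval_days: int = 0  # days until the next review
    ease: float = 2.5       # SM-2 starting ease factor


def review(state: CardState, quality: int) -> CardState:
    """Return the next state after a review graded 0 to 5."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")

    if quality >= 3:
        # Successful recall: 1 day, then 6 days, then grow by the ease factor.
        if state.repetitions == 0:
            interval = 1
        elif state.repetitions == 1:
            interval = 6
        else:
            interval = round(state.interval_days * state.ease)
        repetitions = state.repetitions + 1
    else:
        # Failed recall: start the card over with a short interval.
        repetitions = 0
        interval = 1

    # Ease adjustment from the SM-2 description. This sketch applies it on
    # every review; some variants skip it after a failed recall.
    miss = 5 - quality
    ease = max(MIN_EASE, state.ease + 0.1 - miss * (0.08 + miss * 0.02))
    return CardState(repetitions, interval, ease)
```

Three perfect answers in a row move a new card from a 1-day to a 6-day to a 16-day interval, while a failed answer sends it back to a 1-day interval.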

Why learn this

Why reliability and security matter

Zero-downtime deploys

Graceful shutdown and the right probes mean users never see a rollout happen.

Limit the blast radius

RBAC, security contexts and secrets management contain what a mistake or breach can touch.

See problems first

Observability and recovery practices catch issues before they become incidents.

Sleep better on call

Workloads that shut down cleanly and report their health honestly page you far less.

FAQ

Common questions

Do I need the basics first?

Yes. This track goes into operational depth, so if Pods, Deployments and Services are new, start with the Kubernetes Fundamentals track.

How long does it take?

About 10 minutes a day. Spaced repetition means short, frequent sessions beat long cramming, so the practices stick.

Is it free?

Yes, completely free. No registration or credit card is required, and all your progress is stored locally in your browser.

Ready to run Kubernetes reliably?

Plant your first seed today. Ten minutes a day is all it takes to grow real reliability and security skills.

Start learning free
