ManimBench · V0.6 benchmark release · Open benchmark for ManimCE animation quality

Blog

Notes on how ManimBench works, what we measure, and how the benchmark is built.

June 2026 · 7 min read · V0.6 validity release: Capability score, pass rate, coverage, render success, and failure buckets are now separate leaderboard evidence.
June 2026 · 8 min read · V0.5 engine release: API generation, checkpointed batches, Composer through Cursor, immutable manifests, and draft/live publishing.
May 2026 · 8 min read · Introducing ManimBench: Can coding models write ManimCE animations that explain math clearly? Fixed prompts, sandboxed renders, and public rankings.
May 2026 · 7 min read · Archived V0.4 public suite: The previous six-task suite, kept for historical comparison now that the V0.6 suite is active.
May 2026 · 9 min read · How scoring works: Source checks, sandbox renders, visual sanity probes, and optional human review.
May 2026 · 7 min read · Sandbox execution and visual sanity: Why official runs use an isolated container, what goes in the manifest, and what frame sampling checks.
May 2026 · 6 min read · Why the benchmark is file-backed: Generation and scoring are separate. Same prompts, saved outputs, any tool or API.