ManimBench V0.6 benchmark release
Open benchmark for ManimCE animation quality

Suite: V0.6 public suite
Models: 0
Updated: Not yet
Top capability: n/a

V0.6 validity release
cleaner model ranking

  • Capability score
  • Pass rate
  • Coverage rate
  • Render success
  • Failure buckets
  • Schema 0.6.0

Leaderboard

# | Model | Score | Pass | Coverage | Render | Failures | Cost | Time | Review

V0.6 capability score

Higher is better. Rank score is separate from pass rate, coverage, and render success.

V0.6 is active

V0.6 changes how rankings are validated: source-term hints are now advisory, and coverage, render success, pass rate, and failure buckets are published separately from the capability score.

  • Ranking. Capability score ranks models. Pass rate, coverage, render success, and failure buckets explain the evidence behind it.
  • Validity. Missing source and render crashes remain serious failures. Exact source-term misses no longer fail otherwise valid rendered animations by themselves.
  • Publishing. Official V0.6 rankings require a fresh full-suite rerun.
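The validity rule above can be read as a small decision procedure. The following is a minimal sketch of that reading only, not ManimBench's scoring code: the TaskResult fields and the bucket names are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical per-task record; field names are assumptions."""
    has_source: bool              # a source file was produced for the task
    rendered: bool                # the render finished without crashing
    missed_source_terms: int = 0  # exact source-term hints not found


def bucket(result: TaskResult) -> str:
    """Map a result to an illustrative failure bucket name."""
    if not result.has_source:
        return "missing_source"
    if not result.rendered:
        return "render_crash"
    if result.missed_source_terms > 0:
        # Advisory only: a term miss alone does not fail a rendered animation.
        return "advisory_term_miss"
    return "ok"


def is_serious_failure(result: TaskResult) -> bool:
    """Only missing source and render crashes count as serious here."""
    return bucket(result) in {"missing_source", "render_crash"}
```

Under this reading, a rendered animation that misses three source terms lands in an advisory bucket and is not counted as a serious failure by itself.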

Run V0.6 locally

Use this order: install the engine, generate source files, render and score those exact files, then build the report from the same run id.

  1. Install the engine and local render extras. Run this once from the repository root.

       python -m pip install -e ".[dev,render]"

  2. Generate outputs for API models. This writes one Python file per task under outputs/<model>/. Add --smoke only for a one-task setup check.

       manimbench generate-batch --models gpt-5-5,opus-4-8 --provider auto --output-dir outputs --parallel 2

  3. Generate Composer output if you include Composer. Composer runs through Cursor Agent CLI, not OpenRouter.

       manimbench generate --model composer-2-5 --provider cursor --output-dir outputs

  4. Render and score the generated folders. Include one --model-output entry for each model you generated. Remove the Composer line if you are not testing it.

       manimbench run-file-matrix \
         --model-output gpt-5-5=outputs/gpt-5-5 \
         --model-output opus-4-8=outputs/opus-4-8 \
         --model-output composer-2-5=outputs/composer-2-5 \
         --sandbox container \
         --parallel 4 \
         --run-id v06-public

  5. Build the report from that run. Review score, pass rate, coverage, render success, and failure buckets before publishing.

       manimbench report --run-dir runs/v06-public
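Step 4 needs one --model-output entry per generated model, which is easy to get wrong by hand. As a hedged convenience sketch, the helper below assembles that command from a list of model names. The function name build_run_file_matrix is hypothetical; the flags and default values are copied from the steps above, and the helper only builds the argument list without running anything.

```python
import shlex
from pathlib import Path


def build_run_file_matrix(models, output_dir="outputs",
                          run_id="v06-public", parallel=4):
    """Return the argv list for `manimbench run-file-matrix`.

    Hypothetical helper: it emits one --model-output entry per model,
    pointing at <output_dir>/<model>, then the flags shown in step 4.
    """
    argv = ["manimbench", "run-file-matrix"]
    for model in models:
        folder = Path(output_dir, model).as_posix()
        argv += ["--model-output", f"{model}={folder}"]
    argv += ["--sandbox", "container",
             "--parallel", str(parallel),
             "--run-id", run_id]
    return argv


if __name__ == "__main__":
    # Leave composer-2-5 out of the list if you are not testing it.
    command = build_run_file_matrix(["gpt-5-5", "opus-4-8"])
    print(shlex.join(command))
```

The printed line can be pasted into a shell, or the list can be passed to subprocess.run once the generated folders exist.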

Output contract: every generated file must define one ManimCE MainScene. V0.6 expects six files: coordinate_system_animation.py, derivative_motion_story.py, matrix_transformation_grid.py, geometric_area_proof.py, probability_distribution_simulation.py, and fourier_series_decomposition.py.
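Before rendering, it can help to check a model's output folder against this contract. The sketch below is an assumption-laden pre-check, not part of ManimBench: it treats "define one MainScene" as exactly one top-level class named MainScene, uses only the standard library, and does not verify that the class subclasses a ManimCE Scene or that it renders.

```python
import ast
from pathlib import Path

# The six file names listed in the output contract.
EXPECTED_FILES = [
    "coordinate_system_animation.py",
    "derivative_motion_story.py",
    "matrix_transformation_grid.py",
    "geometric_area_proof.py",
    "probability_distribution_simulation.py",
    "fourier_series_decomposition.py",
]


def defines_main_scene(source: str) -> bool:
    """True if the source parses and has exactly one top-level MainScene class."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    matches = [node for node in tree.body
               if isinstance(node, ast.ClassDef) and node.name == "MainScene"]
    return len(matches) == 1


def check_model_dir(model_dir) -> dict:
    """Return {file name: problem} for every expected file that fails the check."""
    problems = {}
    for name in EXPECTED_FILES:
        path = Path(model_dir) / name
        if not path.is_file():
            problems[name] = "missing file"
        elif not defines_main_scene(path.read_text(encoding="utf-8")):
            problems[name] = "no single top-level MainScene class"
    return problems
```

An empty dictionary from check_model_dir("outputs/gpt-5-5") means all six files are present and each passes this static check.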