ManimBench

Suite: V0.6 public suite
Models: 0
Updated: Not yet
Top capability: n/a

V0.6 validity release
cleaner model ranking

Capability score
Pass rate
Coverage rate
Render success
Failure buckets
Schema 0.6.0

Leaderboard

#	Model	Score	Pass	Coverage	Render	Failures	Cost	Time	Review

V0.6 capability score

Higher is better. The capability score sets the ranking and is reported separately from pass rate, coverage, and render success.

V0.6 is active

V0.6 changes what counts toward a valid ranking: source-term hints are now advisory, and coverage, render success, pass rate, and failure buckets are each published separately.

Ranking. Capability score ranks models. Pass rate, coverage, render success, and failure buckets explain the evidence behind it.
Validity. Missing source and render crashes remain serious failures. On their own, exact source-term misses no longer fail an otherwise valid rendered animation.
Publishing. Official V0.6 rankings require a fresh full-suite rerun.
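
The validity rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the TaskResult record, its field names, and is_serious_failure are hypothetical and are not taken from the ManimBench scorer.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical per-task record; field names are illustrative only."""
    has_source: bool           # a source file was produced for the task
    rendered: bool             # the render finished without crashing
    missing_source_terms: int  # advisory source-term hints that were not found


def is_serious_failure(result: TaskResult) -> bool:
    """Missing source or a render crash fails the task.

    Source-term misses are advisory, so they are deliberately ignored here.
    """
    return (not result.has_source) or (not result.rendered)
```

Under this sketch, a task that rendered correctly but missed three source terms is not a serious failure, while a task with no source file is.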

Read the V0.6 operator docs
Read the V0.6 release note

Run V0.6 locally

Use this order: install the engine, generate source files, render and score those exact files, then build the report from the same run id.

1. Install the engine and local render extras. Run this once from the repository root.

    python -m pip install -e ".[dev,render]"

2. Generate outputs for API models. This writes one Python file per task under outputs/<model>/. Add --smoke only for a one-task setup check.

    manimbench generate-batch --models gpt-5-5,opus-4-8 --provider auto --output-dir outputs --parallel 2

3. Generate Composer output if you include Composer. Composer runs through Cursor Agent CLI, not OpenRouter.

    manimbench generate --model composer-2-5 --provider cursor --output-dir outputs

4. Render and score the generated folders. Include one --model-output entry for each model you generated. Remove the Composer line if you are not testing it.

    manimbench run-file-matrix \
      --model-output gpt-5-5=outputs/gpt-5-5 \
      --model-output opus-4-8=outputs/opus-4-8 \
      --model-output composer-2-5=outputs/composer-2-5 \
      --sandbox container \
      --parallel 4 \
      --run-id v06-public

5. Build the report from that run. Review score, pass rate, coverage, render success, and failure buckets before publishing.

    manimbench report --run-dir runs/v06-public
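
The render-and-score step needs one --model-output entry per generated model, which is easy to get wrong by hand. The sketch below assembles that argument list from a list of model names. The helper name build_run_file_matrix_cmd is hypothetical; it uses only the flags shown in the steps above and assumes each model's files live under outputs/<model>/.

```python
import shlex
from pathlib import Path


def build_run_file_matrix_cmd(output_dir, models, run_id, parallel=4):
    """Assemble the run-file-matrix argument list (hypothetical helper)."""
    cmd = ["manimbench", "run-file-matrix"]
    for model in models:
        # One --model-output entry per generated model folder.
        folder = Path(output_dir, model).as_posix()
        cmd += ["--model-output", f"{model}={folder}"]
    cmd += ["--sandbox", "container", "--parallel", str(parallel), "--run-id", run_id]
    return cmd


# Example: two API models, no Composer.
cmd = build_run_file_matrix_cmd("outputs", ["gpt-5-5", "opus-4-8"], "v06-public")
print(shlex.join(cmd))
```

To execute the command rather than print it, pass cmd to subprocess.run(cmd, check=True), then build the report from the same run id.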

Output contract: every generated file must define one ManimCE MainScene. V0.6 expects six files: coordinate_system_animation.py, derivative_motion_story.py, matrix_transformation_grid.py, geometric_area_proof.py, probability_distribution_simulation.py, and fourier_series_decomposition.py.
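
A quick pre-flight check can catch contract violations before rendering. The sketch below is not part of ManimBench: the function names are hypothetical, and it assumes "one ManimCE MainScene" means exactly one top-level class named MainScene in each file. It inspects the source statically and does not confirm that the class subclasses a Manim scene or that it renders.

```python
import ast
from pathlib import Path

# The six file names listed in the V0.6 output contract.
EXPECTED_FILES = [
    "coordinate_system_animation.py",
    "derivative_motion_story.py",
    "matrix_transformation_grid.py",
    "geometric_area_proof.py",
    "probability_distribution_simulation.py",
    "fourier_series_decomposition.py",
]


def defines_main_scene(source: str) -> bool:
    """Return True if the source has exactly one top-level class named MainScene."""
    tree = ast.parse(source)
    names = [node.name for node in tree.body if isinstance(node, ast.ClassDef)]
    return names.count("MainScene") == 1


def check_model_folder(folder) -> dict:
    """Map each expected file name to a short status string."""
    report = {}
    for name in EXPECTED_FILES:
        path = Path(folder) / name
        if not path.is_file():
            report[name] = "missing"
            continue
        try:
            ok = defines_main_scene(path.read_text(encoding="utf-8"))
        except SyntaxError:
            report[name] = "syntax error"
            continue
        report[name] = "ok" if ok else "no MainScene"
    return report
```

For example, check_model_folder("outputs/gpt-5-5") returns one status per expected file, so any value other than "ok" points to a file worth fixing before the render step.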