Install
Use the engine repository for all runs:
git clone https://github.com/NtrpyDev/manim-bench.git
cd manim-bench
python -m pip install -e ".[dev,render]"
Active suite
V0.5 is the default suite. V0.4 remains available by path for historical reproduction.
manimbench list-tasks
manimbench --suite benchmarks/v0.4/suite.yaml list-tasks
The V0.5 task IDs are:
coordinate_system_animation
derivative_motion_story
matrix_transformation_grid
geometric_area_proof
probability_distribution_simulation
fourier_series_decomposition
Generate
OpenRouter is the default API gateway for every public model it publishes. The engine skips outputs that are already complete unless --force is passed.
export OPENROUTER_API_KEY=...
manimbench generate-batch \
--models gpt-5-5,opus-4-8 \
--smoke \
--parallel 2
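A batch started without the API key exported will presumably fail only once the first request is made. A minimal sketch of a preflight guard follows. require_env is a hypothetical helper written for this page, not part of manimbench, and it checks only that the variable is non-empty, not that the key is valid.

```shell
# Hypothetical helper (not part of manimbench): fail early when a required
# environment variable is unset or empty.
require_env() {
  # $1 is the variable NAME, e.g. OPENROUTER_API_KEY.
  # eval performs a portable indirect lookup that works in POSIX sh and bash.
  eval "_require_env_value=\${$1:-}"
  if [ -z "$_require_env_value" ]; then
    echo "error: $1 is not set" >&2
    return 1
  fi
}
```

One way to use it is to chain it in front of the batch command, for example: require_env OPENROUTER_API_KEY && manimbench generate-batch --models gpt-5-5,opus-4-8 --smoke --parallel 2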
Composer 2.5 is the exception. OpenRouter does not publish a Composer slug, so the engine calls the Cursor Agent CLI to obtain genuine Composer output.
cursor-agent login
manimbench generate \
--model composer-2-5 \
--provider cursor \
--smoke
manimbench generate \
--model composer-2-5 \
--provider cursor
Generation state is written to .manimbench/runs/<run_id>/state.json. API and CLI calls are appended to .manimbench/runs/<run_id>/generation.log.
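To check on a run, the two files named above can be read with standard tools. The sketch below assumes only the paths given here. show_run and its optional second argument are hypothetical names, and nothing is assumed about the schema of state.json or the format of the log.

```shell
# Hypothetical helper (not part of manimbench): print the state file and the
# most recent log lines for one run.
# Usage: show_run RUN_ID [BASE_DIR]   (BASE_DIR defaults to .manimbench/runs)
show_run() {
  run_dir="${2:-.manimbench/runs}/$1"
  if [ ! -f "$run_dir/state.json" ]; then
    echo "error: no state.json in $run_dir" >&2
    return 1
  fi
  # Pretty-print with Python's standard json.tool module; fall back to the raw
  # file if python is unavailable or the JSON cannot be parsed.
  python -m json.tool "$run_dir/state.json" 2>/dev/null || cat "$run_dir/state.json"
  # The log is appended to during generation, so it may not exist yet.
  if [ -f "$run_dir/generation.log" ]; then
    tail -n 20 "$run_dir/generation.log"
  fi
}
```

Calling show_run with a run ID prints the state followed by the last 20 log lines. To follow the log while a run is in progress, tail -f on generation.log works the same way.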
Render and report
manimbench run-file-matrix \
--model-output composer-2-5=outputs/composer-2-5 \
--model-output gpt-5-5=outputs/gpt-5-5 \
--sandbox container \
--parallel 4 \
--run-id v05-public
manimbench report --run-dir runs/v05-public
Official publishable runs should use the container sandbox. The manifest records suite hashes, prompt hash, provider route, OpenRouter slugs where available, git commit, scoring version, and Docker image digest.
Publish
Publishing is atomic at the site-repository level: the bundle is validated, committed once, then pushed to the draft or live branch.
manimbench publish \
--run-dir runs/v05-public \
--target draft \
--site-repo ../manimbench-site
manimbench publish \
--run-dir runs/v05-public \
--target live \
--site-repo ../manimbench-site
Live publish requires a complete run unless --allow-partial is provided. Official live publish also requires a recorded Docker image digest for container runs.
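Because a live publish of a container run depends on the recorded image digest, it can be worth checking for one before publishing. A Docker image digest has the form sha256: followed by 64 lowercase hexadecimal characters. The sketch below only searches a file for that pattern. has_image_digest is a hypothetical helper, and because the manifest's filename and layout are not stated here, the file path is passed in as an argument.

```shell
# Hypothetical helper (not part of manimbench): succeed only if FILE contains
# a well-formed sha256 image digest somewhere in its text.
# Usage: has_image_digest FILE
has_image_digest() {
  grep -Eq 'sha256:[0-9a-f]{64}' "$1"
}
```

This confirms only that a digest-shaped string is present. It does not verify that the digest matches the image actually used, and manimbench publish performs its own validation.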
More detail: V0.5 suite, OpenRouter, Cursor Composer, and publish to site.