Install

Use the engine repository for all runs:

git clone https://github.com/NtrpyDev/manim-bench.git
cd manim-bench
python -m pip install -e ".[dev,render]"

Active suite

V0.5 is the default suite. V0.4 remains available for historical reproduction by passing its suite file with --suite.

manimbench list-tasks
manimbench --suite benchmarks/v0.4/suite.yaml list-tasks

The V0.5 task IDs are:

  • coordinate_system_animation
  • derivative_motion_story
  • matrix_transformation_grid
  • geometric_area_proof
  • probability_distribution_simulation
  • fourier_series_decomposition

Generate

OpenRouter is the default API gateway for any public model that has an OpenRouter slug. The engine skips outputs that are already complete unless --force is passed.

export OPENROUTER_API_KEY=...
manimbench generate-batch \
  --models gpt-5-5,opus-4-8 \
  --smoke \
  --parallel 2

Composer 2.5 is the exception. OpenRouter does not publish a Composer slug, so the engine calls the Cursor Agent CLI to get actual Composer output.

cursor-agent login
manimbench generate \
  --model composer-2-5 \
  --provider cursor \
  --smoke

manimbench generate \
  --model composer-2-5 \
  --provider cursor

Generation state is written to .manimbench/runs/<run_id>/state.json. API and CLI calls are appended to .manimbench/runs/<run_id>/generation.log.
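To inspect a generation run from a script, the two paths above can be resolved and the state file loaded. This is a minimal sketch: the helper names run_paths and load_state are hypothetical, and the only assumption about state.json is that it is a JSON document; its schema is not described here, so the code does not rely on any particular keys.

```python
import json
from pathlib import Path
from typing import Any, Tuple

# Default location taken from the layout described above.
DEFAULT_ROOT = Path(".manimbench/runs")


def run_paths(run_id: str, root: Path = DEFAULT_ROOT) -> Tuple[Path, Path]:
    """Return (state.json, generation.log) for a run ID. Hypothetical helper."""
    run_dir = root / run_id
    return run_dir / "state.json", run_dir / "generation.log"


def load_state(run_id: str, root: Path = DEFAULT_ROOT) -> Any:
    """Parse state.json for a run; no schema is assumed beyond valid JSON."""
    state_path, _ = run_paths(run_id, root)
    with state_path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

Calling load_state("<run_id>") from the repository root returns whatever the engine wrote, which can then be printed or diffed between runs.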

Render and report

manimbench run-file-matrix \
  --model-output composer-2-5=outputs/composer-2-5 \
  --model-output gpt-5-5=outputs/gpt-5-5 \
  --sandbox container \
  --parallel 4 \
  --run-id v05-public

manimbench report --run-dir runs/v05-public

Official publishable runs should use the container sandbox. The manifest records suite hashes, prompt hash, provider route, OpenRouter slugs where available, git commit, scoring version, and Docker image digest.
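When the list of models grows, the run-file-matrix command line can be assembled from a mapping instead of typed by hand. The sketch below uses only the flags shown in the command above; the function name run_file_matrix_argv is hypothetical, and the defaults simply mirror the example values.

```python
from typing import Dict, List


def run_file_matrix_argv(
    model_outputs: Dict[str, str],
    run_id: str,
    sandbox: str = "container",
    parallel: int = 4,
) -> List[str]:
    """Build the argv for `manimbench run-file-matrix`. Hypothetical helper."""
    argv = ["manimbench", "run-file-matrix"]
    # One --model-output flag per model, in MODEL=PATH form.
    for model, path in model_outputs.items():
        argv += ["--model-output", f"{model}={path}"]
    argv += ["--sandbox", sandbox, "--parallel", str(parallel), "--run-id", run_id]
    return argv
```

The resulting list can be passed to subprocess.run(argv, check=True), which avoids shell quoting issues in output paths.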

Publish

Publishing is atomic at the site-repository level: the bundle is validated, committed once, then pushed to the draft or live branch.

manimbench publish \
  --run-dir runs/v05-public \
  --target draft \
  --site-repo ../manimbench-site

manimbench publish \
  --run-dir runs/v05-public \
  --target live \
  --site-repo ../manimbench-site

A live publish requires a complete run unless --allow-partial is passed. An official live publish also requires a recorded Docker image digest for container runs.
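The two live-publish rules can be restated as a small pre-flight check. This is an illustrative sketch of the rules as written above, not the engine's own validation code; the function name and its parameters are hypothetical, and how the engine decides that a run is "complete" is not specified here.

```python
from typing import List, Optional


def live_publish_blockers(
    run_complete: bool,
    allow_partial: bool,
    sandbox: str,
    image_digest: Optional[str],
) -> List[str]:
    """Return the reasons a live publish would be refused; empty means OK."""
    blockers = []
    # Rule 1: an incomplete run needs an explicit --allow-partial.
    if not run_complete and not allow_partial:
        blockers.append("run is incomplete and --allow-partial was not passed")
    # Rule 2: container runs must carry a recorded Docker image digest.
    if sandbox == "container" and not image_digest:
        blockers.append("container run has no recorded Docker image digest")
    return blockers
```

An empty list means neither rule is violated; any entry names the condition to fix before publishing with --target live.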

More detail: V0.5 suite, OpenRouter, Cursor Composer, and publish to site.