Archived V0.4 public suite

Six tasks, six files, one MainScene per file. V0.4 replaced the earlier single-video showcase.

A single polished animation could hide weak performance on individual tasks. V0.4 splits the public suite into six submissions, each aimed at a different failure mode.

Task design

Each task targets roughly two minutes of render time at 60 FPS. Tasks cover layout, graphs, linear transforms, measured geometry, probability visuals, and a longer math explanation. Most scoring is automated; human review handles edge cases.

Shared output contract:

One Python file per task: outputs/<task_id>.py
from manim import *
One primary scene class named MainScene
60 FPS, 120 second cap per task
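
The source-level parts of this contract can be checked without rendering anything. The sketch below is illustrative only and is not the suite's actual validator: the function name `check_output_contract` and the sample submission are hypothetical, and it reads "one primary scene class" as exactly one top-level class named MainScene. It does not cover the frame rate or the time cap.

```python
import ast


def check_output_contract(source: str) -> list[str]:
    """Return a list of contract violations found in one submission file."""
    problems = []
    tree = ast.parse(source)

    # The contract asks for the wildcard import `from manim import *`.
    has_star_import = any(
        isinstance(node, ast.ImportFrom)
        and node.module == "manim"
        and any(alias.name == "*" for alias in node.names)
        for node in tree.body
    )
    if not has_star_import:
        problems.append("missing `from manim import *`")

    # Assumption: "one primary scene class" means exactly one
    # top-level class named MainScene.
    main_scenes = [
        node
        for node in tree.body
        if isinstance(node, ast.ClassDef) and node.name == "MainScene"
    ]
    if len(main_scenes) != 1:
        problems.append(f"expected one MainScene class, found {len(main_scenes)}")

    return problems


# Hypothetical submission in the shape the contract describes.
EXAMPLE_SUBMISSION = '''
from manim import *

class MainScene(Scene):
    def construct(self):
        title = Text("Basic Manim layout")
        self.play(Write(title))
        self.wait(1)
'''

print(check_output_contract(EXAMPLE_SUBMISSION))
```

An empty list means the file passed these two checks. Parsing with `ast` rather than importing the file keeps the check independent of whether Manim is installed.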

Models get a master prompt plus task-specific requirements. Required labels are checked in source and expected on screen where applicable.
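
One way a source-level label check could work is to collect every string literal in the submission and look for each required label among them. This is a sketch under that assumption, not the suite's checker: `missing_labels`, the sample source, and the label list are all hypothetical, and the on-screen check is out of scope here.

```python
import ast


def missing_labels(source: str, required: list[str]) -> list[str]:
    """Return the required labels that appear in no string literal of the source."""
    literals = [
        node.value
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Constant) and isinstance(node.value, str)
    ]
    # A label counts as present if it is a substring of any literal.
    return [
        label
        for label in required
        if not any(label in text for text in literals)
    ]


# Hypothetical submission fragment and required labels.
SOURCE = '''
from manim import *

class MainScene(Scene):
    def construct(self):
        x_label = Text("x")
        curve_label = MathTex("f(x) = x^2")
        self.add(x_label, curve_label)
'''

print(missing_labels(SOURCE, ["f(x) = x^2", "tangent"]))  # ['tangent']
```

Substring matching is deliberately loose. A stricter check might require an exact match or normalize whitespace and LaTeX spacing first.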

The six tasks

Basic Manim layout (`basic_manim_layout`)

Titles, grouped objects, alignment, and pacing on a simple scene.

Calculus derivative graph (`calculus_derivative_graph`)

Plot a function, mark a point, draw a tangent, label axes and equations.

Linear algebra transformation (`linear_algebra_transformation`)

Animate a matrix acting on a grid or basis vectors with readable labels.

Geometry measurement diagram (`geometry_measurement_diagram`)

Lengths, angles, and a shaded region in one annotated diagram.

Probability distribution (`probability_distribution`)

Bars or curves with parameter labels. A common failure mode is overlapping text.

Advanced math explanation (`advanced_math_explanation`)

Multi-step intuition around the Fourier heat equation across a longer scene.

Out of scope for V0.4

No 3D scenes, external asset pipelines, or open-ended prompts. That keeps official runs reproducible in a network-disabled container.

Older suites (v1 44-task set, v0.3 showcase) remain in the repo for history. Public rankings use benchmarks/v0.4/suite.yaml.

Interpreting suite scores

The suite score aggregates per-task results. One failed task on an otherwise strong run still matters for production use.
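
To see why a single failure matters, consider reporting the failed tasks next to the aggregate. The sketch below assumes an unweighted mean on a 0 to 100 scale and a hypothetical pass mark. Neither is taken from the suite's scoring rules, and the scores are made up for illustration.

```python
def summarize(scores: dict[str, float], pass_mark: float = 50.0) -> dict:
    """Summarize per-task scores: unweighted mean plus the tasks below pass_mark."""
    # Assumption: unweighted mean. The real aggregation may weight tasks differently.
    suite_score = sum(scores.values()) / len(scores)
    failed = [task for task, score in scores.items() if score < pass_mark]
    return {"suite_score": suite_score, "failed_tasks": failed}


# Hypothetical run: five strong tasks and one outright failure.
example_scores = {
    "basic_manim_layout": 90,
    "calculus_derivative_graph": 90,
    "linear_algebra_transformation": 90,
    "geometry_measurement_diagram": 90,
    "probability_distribution": 90,
    "advanced_math_explanation": 0,
}

print(summarize(example_scores))
```

Under these assumptions the mean alone still looks respectable, while the failed-task list shows that one of the six submissions produced nothing usable.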

V0.4 results stay archived now that the V0.5 suite is active. Task semantics are not edited in place after a suite goes public.

See the public suite docs and the page on how scoring works.
