Documentation
The benchmark is reproducible from the repository. These are the source documents for running, scoring, reviewing, and publishing results.
Methodology
Task coverage, sandbox policy, source checks, visual sanity checks, and usage accounting.
Blog
Notes on methodology, scoring design, and benchmark decisions.
FAQ
Answers about official runs, the public suite, and reading the leaderboard.
Quickstart
Install the package, prepare V0.4 outputs, and run the guided launcher.
Public suite
Read the V0.4 task coverage and official result policy.
Scoring
See how source checks, render checks, visual sanity checks, and review fit together.
Model workspaces
Create file-backed workspaces for any AI coding agent.
Run comparison
Auto-discover ready model outputs and build a comparison report.
Publishing
Prepare report bundles and publish results to the public site.
Deploy site
Build and deploy the static report bundle.
Launcher
Interactive, guided run flow for local and container sandbox backends.
Repository
Full source, task definitions, and issue tracker on GitHub.