This repository is a comparator-based Lean benchmark for formal mathematics.
Benchmark authors write trusted problem statements once in shared Lean modules, and the
tooling generates one comparator workspace per problem under generated/.
A submission is scored entirely by comparator results: a problem counts as solved iff comparator accepts the submitted solution.
The main user-facing entrypoint is:
lake exe lean-eval --helpUse this path if you are adding or editing benchmark problems.
lake exe cache get
lake buildPut the statement in one of the shared modules under LeanEval/ and mark it with
@[eval_problem].
@[eval_problem]
theorem my_new_problem : ... := by
sorryCurrent source modules live in topic folders such as:
LeanEval/NumberTheory/LeanEval/Topology/LeanEval/ComplexAnalysis/LeanEval/EasyProblems.lean
Each tagged declaration must be listed by exactly one file under
manifests/problems/. One file per problem,
named <id>.toml, with top-level keys (no [[problem]] wrapper). The
filename stem must match the id field. The holes array names every
@[eval_problem]-tagged declaration in the module that the problem
owns; for the common single-theorem case it has one element.
# manifests/problems/my_new_problem.toml
id = "my_new_problem"
title = "My new problem"
test = false
module = "LeanEval.SomeModule"
holes = ["my_new_problem"]
submitter = "Your Name"
notes = "Optional notes."
source = "Optional citation or URL."
informal_solution = "Optional proof sketch or reference."The required fields are:
id(must equal the filename stem)titletestmoduleholessubmitter
The one-file-per-problem layout means two PRs adding distinct problems never conflict on the manifest.
A problem may bundle several defs, instances and theorems — list
them all in holes. Comparator then asks the participant to fill every
listed declaration in their Submission.lean. Two conventions:
- Name every instance. The generator addresses holes by their
declaration name, and Lean's auto-generated names for anonymous
instances (e.g.instTopologicalSpaceJacobian) are not stable. Useinstance instAddCommGroup : ... := sorryrather thaninstance : ... := sorry. - Use
holeseven for one declaration. There is notheorem = "..."shorthand; a singleton hole is justholes = ["my_thm"].
See LeanEval/Sandbox/DefHoleExample.lean
and LeanEval/Sandbox/InstanceHoleExample.lean
for the smallest working examples, or
LeanEval/Geometry/JacobianChallenge.lean
for a realistic multi-hole problem.
lake exe lean-eval validate-manifest
lake exe lean-eval check-problem-buildvalidate-manifest checks that @[eval_problem] declarations and manifest entries
match. check-problem-build builds the problem modules so warning-producing Lean changes
do not slip through. Both are cheap and catch the most common mistakes before a CI
roundtrip.
That's it — push your branch and open a PR. CI regenerates the comparator workspaces
under generated/ and verifies they build, so you do not need to commit anything
under generated/ yourself. Once the PR merges, a separate workflow regenerates
generated/ on main and pushes the result.
If CI fails with a generation or build error, you'll need to fix the source. The fastest local equivalent is:
lake exe lean-eval generate --problem my_new_problem
lake exe lean-eval check-generated-builds --problem my_new_problemUse this path if you want to prove benchmark problems without touching the trusted benchmark files.
Generated workspaces live under generated/, one directory per problem. The current
catalog is summarized in generated/index.json.
Examples:
generated/two_plus_two/generated/list_append_singleton_length/generated/cyclotomic_integer_house_le_two/
Copy a clean starter workspace:
lake exe lean-eval start-problem two_plus_twoThat creates workspaces/two_plus_two/ by default. You can also pass a destination:
lake exe lean-eval start-problem two_plus_two /tmp/two_plus_twocd workspaces/two_plus_two
lake updateSolver-owned files are:
Submission.leanSubmission/Helpers.lean- any additional Lean files you add under
Submission/
Trusted files you should not edit in the normal solver workflow are:
Challenge.leanSolution.leanconfig.jsonlakefile.toml
Challenge.lean contains the benchmark statement. Solution.lean is the fixed bridge
that tells comparator to check your theorem from the Submission namespace.
lake testlake test shells out to three external tools that you must install yourself:
landrun (the sandbox), lean4export (exports oleans to text), and comparator
(the verifier). All three are pinned to immutable commits; the authoritative pin
table lives in SECURITY.md ("Trusted dependencies and pin
policy"), and CI installs exactly these in
.github/workflows/ci.yml. Reproduce that setup
locally:
# landrun — install at the pinned commit (NOT @main; the version string is
# unreliable, and tagged releases through v0.1.15 lack fixes comparator needs).
go install github.com/zouuup/landrun/cmd/landrun@5ed4a3db3a4ad930d577215c6b9abaa19df7f99f
export PATH="$(go env GOPATH)/bin:$PATH"
# lean4export — clone, check out the pin, and build.
git clone https://github.com/leanprover/lean4export.git
( cd lean4export
git checkout 12581a6b680d8478175596338eb2d53383a323e3
lake build lean4export )
export PATH="$PWD/lean4export/.lake/build/bin:$PATH"
# comparator — clone, check out the pin (adds `def`-hole support), and build.
git clone https://github.com/leanprover/comparator.git
( cd comparator
git checkout 71b52ec29e06d4b7d882726553b1ceb99a2499e0
lake build comparator )
export PATH="$PWD/comparator/.lake/build/bin:$PATH"lean4export and comparator are Lean programs: lake build compiles each with
the Lean toolchain pinned in its own lean-toolchain at the commit above (the
v4.30.0-rc2 toolchain). This must match the toolchain that builds the workspace,
because comparator builds Challenge.olean with the workspace toolchain and then
reads it back with lean4export. If the two differ you get
failed to read file '.../Challenge.olean', incompatible header — that error
means a Lean/olean version mismatch (or a stale .lake artifact left over from
an earlier toolchain), never a problem with your proof. If you hit it, rebuild
lean4export (and comparator) at the pinned commits with the rc2 toolchain
rather than your elan default, and clear the affected workspace's .lake/build
before retrying.
Once the tools are on your PATH, verify the whole pipeline against the starter
problem before attempting a real one — this builds and scores two_plus_two end
to end, so it isolates an install problem from a proof problem:
lake exe lean-eval check-comparator-installationIf you keep the comparator binary somewhere off your PATH, point lake test at
it explicitly (landrun and lean4export must still be on PATH, since
comparator invokes them):
COMPARATOR_BIN=/path/to/comparator lake testFrom the repo root:
lake exe lean-eval run-eval
lake exe lean-eval run-eval --jsonThe scorer prefers workspaces/<problem-id>/ when present and falls back to
generated/<problem-id>/ otherwise.
To submit a solution to the public leaderboard, open a submission issue on the submissions repository:
That repository owns the hosted submission pipeline and the stored results.
This repository (leanprover/lean-eval) holds only the problem set and the
comparator/sandbox integration.
Participants may use Mathlib freely.
If a proof needs helper code that is not already in Mathlib, that code must be included
inside the submission workspace itself. Multi-file submissions are allowed through
Submission.lean and extra local modules under Submission/.
For benchmark-repo submissions (a PR that edits a generated/ workspace in place),
validate changed paths with:
lake exe lean-eval validate-submission --file generated/two_plus_two/Solution.leanThe current validator accepts:
- modifications to
generated/<problem-id>/Solution.leanandgenerated/<problem-id>/Submission.lean - additions, modifications, deletions, renames, or copies of
.leanfiles undergenerated/<problem-id>/Submission/ - additions (only) of markdown files anywhere inside
generated/<problem-id>/, other than the generatedREADME.md - additions (only) of a top-level
generated/<problem-id>/LICENCEorgenerated/<problem-id>/LICENSEfile
In practice, solvers should normally work in Submission.lean and Submission/.
LeanEval/: trusted authored problem statementsmanifests/problems/: one TOML file per problem, named<id>.tomlgenerated/: generated comparator workspacesscripts/: generation, validation, and scoring helpersPLAN.md: deferred design and roadmap notes
For a local health pass over the repository:
lake exe lean-eval validate-manifest
lake exe lean-eval check-problem-build
lake exe lean-eval generate --check
lake exe lean-eval check-generated-builds
lake exe lean-eval run-evalThere is also an end-to-end workflow self-check:
lake exe lean-eval check-eval-workflow