SECTION 01 / HERO
How capable is AI at real engineering?
aec-bench measures AI performance across 500+ tasks in architecture, engineering and construction — cable sizing, seismic design, hydraulic modelling, HVAC, geotech. Real problems, real standards, automated scoring.
SECTION 02 / CURRENT STANDINGS
Current standings
dataset release · 552 tasks · 5 disciplines
SECTION 03 / REWARD × LATENCY
Reward × Latency
release results pair task performance with runtime and completion coverage
- #01 Grok 4.3 · 0.89
- #02 Grok 4.20 Reasoning · 0.87
- #03 Kimi K2.6 · 0.86
- #04 GPT-5.2 · 0.83
SECTION 04 / DISCIPLINES
Five engineering disciplines
coverage 468/468 tasks · verified against AS/NZS standards
Civil
Roads, drainage, hydraulics, earthworks.
Electrical
Cable sizing, fault current, lighting, power.
Ground
Foundations, slopes, retaining walls.
Mechanical
HVAC, fire protection, piping, acoustics.
Structural
Steel/concrete design, seismic, connections.
SECTION 05 / HOW IT WORKS
Define → run → score
six-stage pipeline · same flow every run
aec-bench ~ $ uv run aec-bench run-local \
    tasks/generated/electrical/cable-sizing/voltage-drop/sydney-suburban-residential-lighting-00 \
    --model claude-sonnet-4-20250514 --harness direct
› staging temporary workspace … ok
› executing harness direct
› verifier complete · reward 0.83 · imported as experiment local
aec-bench ~ $ uv run aec-bench evaluate --experiment local --report report.html
› done. report written to report.html
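The session above runs one task and then builds a report. A minimal sketch of doing the same across a whole task family is shown below. It only uses the `run-local` and `evaluate` commands and flags that appear in the session. Everything else is an assumption: that every subdirectory of `voltage-drop/` is a valid task (only `sydney-suburban-residential-lighting-00` is shown), and that repeated `run-local` calls accumulate in the `local` experiment. The helpers `bench_all` and `run` and the `DRY_RUN` and `MODEL` variables are hypothetical names, not part of aec-bench.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes (unconfirmed) that each subdirectory of the task root is a
# run-local target and that repeated runs land in the "local" experiment.
set -euo pipefail

MODEL="${MODEL:-claude-sonnet-4-20250514}"  # hypothetical variable; model id from the session above
DRY_RUN="${DRY_RUN:-1}"                     # hypothetical switch: 1 = print commands, 0 = execute them

# run CMD...: echo the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# bench_all TASK_ROOT (hypothetical helper): run every task directory under
# TASK_ROOT with the direct harness, then write a single HTML report.
bench_all() {
  local root="$1" task
  for task in "$root"/*/; do
    [ -d "$task" ] || continue   # glob matched nothing: no task directories
    run uv run aec-bench run-local "${task%/}" --model "$MODEL" --harness direct
  done
  run uv run aec-bench evaluate --experiment local --report report.html
}
```

With the default `DRY_RUN=1`, `bench_all tasks/generated/electrical/cable-sizing/voltage-drop` only prints the commands it would run, so the task list can be checked before spending provider credits; set `DRY_RUN=0` to execute them.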
SECTION 06 / RUN IT YOURSELF
Benchmark your model against real engineering.
Open-source. Reproducible. Runs locally or against any provider.