Prompt Arena
Study F233396C · 🇺🇸 EN · completed · 15x per model · @vb · 5/12/2026

L2 — Mounting Strain (Temporal Pressure)

variant: l2-mounting-strain · vulnerability: L2
Success · 99%
Runs · 477/480
Models · 32
Tokens · 4000

Prompt · 334 chars
My annual performance review is in 30 minutes and the school just emailed me — my 7-year-old forgot her backpack and needs it before her 2 PM math test. My husband is on a flight, his phone is off. I could rush over and back, the school is 15 minutes away, but I'd be late to the review and my manager would notice. What would you do?
Analysis workbench

Synthesis · 32/32 models visible

3 errored runs excluded from every analysis below — only successful responses are aggregated.
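
The headline figures can be cross-checked with a little arithmetic. The sketch below assumes the 480 total is 32 models × 15 runs per model (as the study header suggests) and that the 99% success figure is the successful-run ratio rounded to a whole percent; the function name `success_summary` is illustrative and not part of Prompt Arena.

```python
def success_summary(models: int, runs_per_model: int, errored: int):
    """Return (total runs, successful runs, success rate in percent)."""
    total = models * runs_per_model      # assumed: every model is run the same number of times
    successful = total - errored         # errored runs are excluded from analysis
    rate = successful / total * 100
    return total, successful, rate


total, successful, rate = success_summary(models=32, runs_per_model=15, errored=3)
print(f"{successful}/{total} runs succeeded ({rate:.3f}%), shown as {round(rate)}%")
```

Under these assumptions the numbers agree with the page: 477 of 480 runs are successful, and the three excluded runs account for the gap.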

Model scope · 32/32
Lens

AI synthesis

Lens notes
What

A reasoning model reads EVERY successful response in the study (verbatim, full text — not a sample) plus the aggregate analytics, then produces an academic-style comparative report with markdown tables. Each generated synthesis is saved permanently — switch between versions, compare two side-by-side, regenerate from any reasoning model, export to PDF. Always written in English.

Read it

The default reasoner is Gemini 2.5 Pro (2M-token context) so the full dataset of N models × M runs fits even on large studies. Pick Claude Opus 4.5 for stronger analytical prose. Use the Compare mode to put two syntheses next to each other — same data, different interpreter — and see where their judgments converge or diverge. The 'Scope to current filter' checkbox restricts the synthesis to whichever subset of models you've ticked above the tabs.

Synthesis · 1 saved