Summarization Tasks

Three benchmark tasks with side-by-side model comparisons

Meeting Notes Summary

EASY
13 runs · Last: Mar 31
o4 Mini: 2233ms · $0.000000
o3 Mini: 2370ms · $0.000000
o3: 2386ms · $0.000000
GPT-4.1 Nano: 795ms · $0.000000
GPT-4.1 Mini: 1289ms · $0.000000
GPT-4.1: 1834ms · $0.000000
Claude Opus 4.6: 4237ms · $0.000000
Claude Sonnet 4.6: 2612ms · $0.001281
Claude Haiku 4.5: 747ms · $0.000342
GPT-4o: 2832ms · $0.001323
GPT-4o Mini: 2248ms · $0.000068

Contradictory Information

HARD
11 runs · Last: Mar 31
o4 Mini: 2128ms · $0.000000
o3 Mini: 2082ms · $0.000000
o3: 2984ms · $0.000000
GPT-4.1 Nano: 696ms · $0.000000
GPT-4.1 Mini: 827ms · $0.000000
GPT-4.1: 1019ms · $0.000000
Claude Opus 4.6: 2382ms · $0.000000
GPT-4o Mini: 1984ms · $0.000038
Claude Sonnet 4.6: 1708ms · $0.001056
GPT-4o: 1496ms · $0.000660
Claude Haiku 4.5: 873ms · $0.000342

Contradictory Earnings Report Summary

HARD
11 runs · Last: Mar 31
o4 Mini: 5876ms · $0.000000
o3 Mini: 3879ms · $0.000000
o3: 3947ms · $0.000000
GPT-4.1 Nano: 1757ms · $0.000000
GPT-4.1 Mini: 3080ms · $0.000000
GPT-4.1: 2909ms · $0.000000
Claude Opus 4.6: 7499ms · $0.000000
Claude Sonnet 4.6: 7914ms · $0.006033
GPT-4o Mini: 5879ms · $0.000181
GPT-4o: 3613ms · $0.002485
Claude Haiku 4.5: 2316ms · $0.001336