A small, public, lighthearted research artifact: same prompt, same starting state, two Codex reasoning levels, two runnable Windows desktop widgets, and three blind LLM judges. The short version: ...