Artificial intelligence is still no match for humans when it comes to watching a how-to video and then assembling flat-pack furniture. A new study from researchers at Cornell, Cornell Tech, MBZUAI, and UC Berkeley put two dozen prominent AI systems, including GPT-5 and Google Gemini, through a custom benchmark called Flat-Pack Bench. The benchmark used videos of real IKEA furniture assembly and asked AI systems multiple-choice questions about which parts connect, which step comes next, and how to track individual pieces across the video.
Humans answered correctly more than 94 percent of the time, while the best AI systems topped out around 41 percent. Many barely beat random guessing. The most revealing finding was what happened when researchers stripped out the video and gave participants only still images. Human accuracy collapsed by more than 50 percent, confirming that people actually needed the video to answer the questions. The AI systems' scores barely changed, suggesting most of them weren’t using the video information to begin with and were instead relying on their existing training knowledge about how furniture typically goes together.
Source: Unite.ai