A consortium of 64 mathematicians built SOOHAK, a new AI benchmark with 439 handwritten tasks, including 99 that are deliber…
SOOHAK, a new benchmark designed by mathematicians, shows that current large language models confidently produce solutions to mathematical problems that were written to be unsolvable.
The result matters because it exposes a blind spot in AI reasoning: even advanced models such as Google's Gemini 3 Pro still falter on these deliberately unsolvable problems. It highlights the gap between pattern recognition and genuine logical deduction, a gap that undermines the reliability of AI in fields that require mathematical certainty.
Future research should focus on training models to identify or flag unsolvable problems instead of generating plausible-sounding but incorrect answers. Understanding how models like Gemini 3 Pro fail on SOOHAK will be crucial for developing more robust and trustworthy AI systems.