Benchmark | DeepSeek R1 | Llama 3.2 | OpenAI o1 (ChatGPT)
Mathematics | ~90%+ accuracy | Strong in larger variants (e.g., 90B) | ~83% on advanced benchmarks like the American Invitational Mathematics ...