To do this, researchers at Stanford and the University of Washington used a method known as distillation — which allows smaller models to draw from the answers produced by larger ones — to refine s1 using answers from Google’s AI reasoning model, Gemini 2.0 Flash Thinking Experimental. Google’s terms of service note that you can’t use Gemini’s API to “develop models that compete with” the company’s AI models. The Verge reached out to Google with a request for comment but didn’t immediately hear back.

The researchers based s1 on Qwen2.5, an open-source model from Alibaba Cloud. They initially started with a pool of 59,000 questions to train the model on, but found that the larger data set didn’t offer “substantial gains” over a whittled-down set of just 1,000. The researchers say they trained the model on just 16 Nvidia H100 GPUs.

The s1 model also uses a technique called test-time scaling, allowing the model to “think” for a longer amount of time before producing an answer. As noted in the paper, researchers forced the model to continue reasoning by adding “Wait” to the model’s response. “This can lead the model to doublecheck its answer, often fixing incorrect reasoning steps,” the paper says.

OpenAI’s o1 reasoning model uses a similar approach, something the buzzy AI startup DeepSeek sought to replicate with the launch of its R1 model that it claims was trained at a fraction of the cost. OpenAI has since accused DeepSeek of distilling information from its models to build a competitor, violating its terms of service. As for s1, the researchers claim that s1 “exceeds o1-preview on competition math questions by up to 27%.”

The rise of smaller and cheaper AI models threatens to upend the entire industry. They could prove that major companies like OpenAI, Microsoft, Meta, and Google don’t need to spend billions of dollars training AI, while building massive data centers filled with thousands of Nvidia GPUs.

Share.
Exit mobile version