OpenAI has unveiled two new open-weight language models, gpt-oss-120b and gpt-oss-20b, marking a significant shift in its approach to artificial intelligence development.
These models, which can be freely downloaded, customised, and run on consumer hardware, are OpenAI’s first open-weight releases in over six years and are seen as a direct response to the rising influence of China’s DeepSeek and Meta’s Llama models.
The gpt-oss-120b model is designed to perform at a level comparable to OpenAI’s proprietary o4-mini model, while gpt-oss-20b is optimised for devices with limited memory, such as laptops, and matches the performance of o3-mini.
Both models are available under the Apache 2.0 licence, allowing broad commercial and research use, and can be accessed via platforms including Hugging Face, Databricks, Azure, and AWS. OpenAI says these models are intended “to empower everyone—from individual developers to large enterprises to governments—to run and customise AI on their own infrastructure”.
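For developers, running the smaller model locally can be as simple as loading it through the Hugging Face transformers library. The sketch below is illustrative rather than official: it assumes the published model identifier openai/gpt-oss-20b, a recent transformers version with chat-style pipeline input, and a machine with roughly 16 GB of GPU or unified memory to hold the 20b weights.

```python
# Minimal sketch: downloading and querying gpt-oss-20b from Hugging Face.
# The model id and hardware assumptions are illustrative, not confirmed specifics.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",   # assumed Hugging Face model identifier
    torch_dtype="auto",            # let the library pick a suitable precision
    device_map="auto",             # place weights on available GPU/CPU automatically
)

messages = [
    {"role": "user", "content": "Summarise the Apache 2.0 licence in one sentence."}
]

result = generator(messages, max_new_tokens=128)
print(result[0]["generated_text"])
```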
The decision to release these models comes after OpenAI’s chief executive officer, Sam Altman, acknowledged that the company had “been on the wrong side of history” by not offering open models sooner, especially as developers have increasingly turned to open alternatives for their flexibility and lower costs. Altman stated, “We’re excited to make this model, the result of billions of dollars of research, available to the world to get AI into the hands of the most people possible.”
OpenAI’s move comes as competition in the AI sector intensifies globally. DeepSeek, a Chinese start-up, recently released its own open model, R1, which has been credited with narrowing the gap between Chinese and US AI technology. Meta’s Llama models have also contributed to the trend towards freely available, customisable AI systems, although some industry observers note that these are “open-weight” rather than fully open source.
OpenAI has emphasised that safety was a central concern in the release of gpt-oss. The company subjected the models to extensive safety testing, including adversarial fine-tuning to simulate potential misuse, and says the models did not reach high-risk levels in internal and external evaluations.