Alibaba released Qwen 3.5 9B on March 1, 2026. At 9 billion parameters, it outperforms OpenAI's GPT-OSS-120B on multiple benchmarks, and it is free to run on a single consumer GPU.
Benchmarks
- MMLU-Pro: 82.5% vs GPT-OSS-120B 80.8%
- GPQA Diamond: 81.7% vs GPT-OSS-120B 80.1%
- MMMLU Multilingual: 81.2% vs GPT-OSS-120B 78.2%
- MMMU-Pro Vision: 70.1% vs GPT-5-Nano 57.2%
GPT-OSS-120B is roughly 13 times larger by parameter count.
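To put the numbers above side by side, here is a minimal sketch that computes the lead in percentage points and the size ratio. It assumes the scores exactly as quoted and takes 120 billion as GPT-OSS-120B's nominal parameter count from its name; the function names are ours, for illustration only.

```python
# Scores as quoted above: (Qwen 3.5 9B, GPT-OSS-120B), in percent.
scores = {
    "MMLU-Pro": (82.5, 80.8),
    "GPQA Diamond": (81.7, 80.1),
    "MMMLU Multilingual": (81.2, 78.2),
}


def margins(table):
    """Lead of the first model over the second, in percentage points."""
    return {name: round(a - b, 1) for name, (a, b) in table.items()}


def size_ratio(large_params_b, small_params_b):
    """How many times larger one model is, by parameter count (billions)."""
    return large_params_b / small_params_b


leads = margins(scores)
# Assumption: 120B is the nominal size implied by the model's name.
ratio = size_ratio(120, 9)

for name, lead in leads.items():
    print(f"{name}: +{lead} points")
print(f"Parameter ratio: {ratio:.1f}x")
```

The leads are a few points at most, so the headline is less about the size of the gap than about getting there with a fraction of the parameters.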
Architecture
The model is a hybrid of Gated Delta Networks (a form of linear attention) and sparse Mixture-of-Experts.
Run It Locally
- BF16: fits on a single RTX 3090 (24GB of VRAM)
- 4-bit quantization: roughly 5GB
- Available through Ollama, Hugging Face, and vLLM
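The memory figures above follow from simple arithmetic: parameter count times bytes per parameter. The sketch below is a back-of-the-envelope estimate for the weights only, assuming exactly 9 billion parameters and 1 GB = 1e9 bytes; the helper name is ours. It ignores the KV cache, activations, and the extra metadata that quantized formats store, so real usage will be somewhat higher.

```python
def weight_memory_gb(params_billion, bits_per_param):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


bf16_gb = weight_memory_gb(9, 16)  # BF16 stores 16 bits per parameter
int4_gb = weight_memory_gb(9, 4)   # 4-bit quantization

print(f"BF16 weights:  {bf16_gb:.1f} GB")
print(f"4-bit weights: {int4_gb:.1f} GB")
```

At about 18 GB, the BF16 weights leave some headroom on a 24GB card, and the 4.5 GB estimate for 4-bit is consistent with the "roughly 5GB" figure once overhead is included.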
Our Verdict
Qwen 3.5 9B is the best small AI model of 2026. Architectural innovation beats raw scale.
Rating: 9.5/10
