Top Open-Source Small Language Models for Generative AI Applications

Small Language Models (SLMs) are language models with at most a few billion parameters, significantly fewer than Large Language Models (LLMs), which can have tens or hundreds of billions, or even trillions, of parameters. SLMs are well suited for resource-constrained environments, as well as for on-device and real-time generative AI applications. Many of them can run locally on a laptop using tools like LM Studio or Ollama. These models are often derived from larger models using techniques such as quantization and distillation. The sections below introduce several well-developed SLMs.
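
For example, once a model has been pulled with Ollama, a few lines of Python are enough to chat with it locally. The sketch below is a minimal illustration; the `llama3.2` model tag and the use of the `ollama` Python client are assumptions about a typical local setup.

```python
# Minimal sketch of local inference through Ollama's Python client
# (pip install ollama). Assumes the Ollama app is running and that a model
# has already been pulled, e.g. with `ollama pull llama3.2`.
import ollama

response = ollama.chat(
    model="llama3.2",  # assumed tag; run `ollama list` to see installed models
    messages=[{"role": "user", "content": "In one sentence, what is a small language model?"}],
)
print(response["message"]["content"])
```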

Note: All the models mentioned here are open source. However, for details regarding experimental use, commercial use, redistribution, and other terms, please refer to the license documentation.

Phi 4 Collection by Microsoft

This collection features a range of small language models, including reasoning models, ONNX- and GGUF-compatible formats, and multimodal models. The base model in the collection has 14 billion parameters, while the smallest models have 3.84 billion. Strategic use of synthetic data during training has led to improved performance relative to its teacher model (primarily GPT-4) on several benchmarks. The collection currently includes three versions of reasoning-focused SLMs, making it a strong choice for reasoning tasks at this scale.

👉 License: MIT
👉 Collection on Hugging Face
👉 Technical Report
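
As a quick illustration, the sketch below loads one of the smaller Phi 4 models with Hugging Face transformers. The repo id "microsoft/Phi-4-mini-instruct" is an assumption; check the collection page for the exact model names and hardware requirements.

```python
# Minimal sketch: text generation with a small Phi 4 model via transformers.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-4-mini-instruct",  # assumed repo id from the Phi 4 collection
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain step by step: what is 17 * 23?"}]
result = generator(messages, max_new_tokens=256)
print(result[0]["generated_text"][-1]["content"])  # last turn is the model's reply
```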

Gemma 3 Collection by Google

This collection features multiple versions, including Image-to-Text, Text-to-Text, and Image-and-Text-to-Text models, available in both quantized and GGUF formats. The models vary in size, with 1, 4.3, 12.2, and 27.4 billion parameters. Two specialized variants have been developed for specific applications: TxGemma, optimized for therapeutic development, and ShieldGemma, designed for moderating text and image content.

👉 License: Gemma
👉 Collection on Hugging Face
👉 Technical Report
👉 ShieldGemma on Hugging Face
👉 TxGemma on Hugging Face
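
To show what the Image-and-Text-to-Text variants look like in practice, here is a minimal sketch using the transformers "image-text-to-text" pipeline. The repo id "google/gemma-3-4b-it" and the image URL are assumptions for illustration; Gemma weights also require accepting the license on Hugging Face.

```python
# Minimal sketch: image + text prompting with a Gemma 3 instruction-tuned model.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-3-4b-it",  # assumed repo id; gated behind the Gemma license
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder URL
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]
out = pipe(text=messages, max_new_tokens=64)
print(out[0]["generated_text"])
```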

Mistral Models

Mistral AI is a France-based AI startup and one of the pioneers in releasing open-source language models. Its current lineup includes three compact models: Mistral Small 3.1, Pixtral 12B, and Mistral NeMo, all released under the Apache 2.0 license.

Mistral Small 3.1 is a multimodal, multilingual SLM with 24 billion parameters and a 128k-token context window. Two versions are currently available: Base and Instruct.
👉 Base Version on Hugging Face
👉 Instruct Version on Hugging Face
👉 Technical Report

Pixtral 12B is a natively multimodal model trained on interleaved image and text data, delivering strong performance on multimodal tasks and instruction following while maintaining state-of-the-art results on text-only benchmarks. It features a newly developed 400M-parameter vision encoder and a 12B-parameter multimodal decoder based on Mistral NeMo. The model supports variable image sizes, aspect ratios, and multiple images within a long context window of up to 128k tokens.
👉 Pixtral-12B-Base-2409 on Hugging Face
👉 Pixtral-12B-2409 on Hugging Face
👉 Technical Report

Mistral NeMo is a 12B model developed in collaboration with NVIDIA, featuring a large 128k-token context window and state-of-the-art reasoning, knowledge, and coding accuracy for its size.
👉 Model on Hugging Face
👉 Technical Report
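
As a rough illustration, the sketch below runs the instruct version of Mistral NeMo with transformers. The repo id "mistralai/Mistral-Nemo-Instruct-2407" is an assumption; a 12B model generally needs a capable GPU or a quantized variant.

```python
# Minimal sketch: chat-style generation with Mistral NeMo Instruct.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Nemo-Instruct-2407"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Write a haiku about long context windows."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```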

Llama Models by Meta

Meta is one of the leading contributors to open-source AI and has released several generations of its Llama models in recent years. The latest series is Llama 4, but all models in that collection are currently quite large; smaller variants may arrive in future sub-versions. The most recent collection that does include smaller models is Llama 3.2. It features text models with 1.24 billion and 3.21 billion parameters and 128k-token context windows, plus a 10.6 billion-parameter multimodal version designed for Image-and-Text-to-Text tasks. The collection also includes small variants of Llama Guard, fine-tuned language models for prompt and response classification. They can detect unsafe prompts and responses, making them useful for implementing safety measures in LLM-based applications.

👉 License: LLAMA 3.2 COMMUNITY LICENSE AGREEMENT
👉 Collection on Hugging Face
👉 Technical Paper
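
Below is a minimal sketch of screening a user prompt with one of the small Llama Guard variants. The repo id "meta-llama/Llama-Guard-3-1B" is an assumption, access to Llama weights is gated on Hugging Face, and the exact conversation format is documented on the model card.

```python
# Minimal sketch: classifying a prompt as safe/unsafe with a small Llama Guard model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-Guard-3-1B"  # assumed repo id; gated weights
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The chat template wraps the conversation in a moderation prompt; the model
# typically answers with "safe" or "unsafe" plus a hazard category code.
conversation = [
    {"role": "user", "content": [{"type": "text", "text": "How do I pick a lock?"}]}
]
inputs = tokenizer.apply_chat_template(conversation, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```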

Qwen 3 Collection by Alibaba

The Chinese tech giant Alibaba is another major player in open-source AI; it releases its language models under the Qwen name. The latest generation is Qwen 3, which includes both small and large models. The smaller models come in sizes of 14.8 billion, 8.19 billion, 4.02 billion, 2.03 billion, and even 752 million parameters. The collection also includes quantized and GGUF variants.

👉 License: Apache 2.0
👉 Collection on Hugging Face
👉 Technical Report
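
Alongside the pre-quantized releases mentioned above, a small Qwen 3 model can also be quantized on the fly at load time. The sketch below uses bitsandbytes 4-bit loading for that; the repo id "Qwen/Qwen3-4B" is an assumption, and bitsandbytes requires a CUDA GPU.

```python
# Minimal sketch: loading a small Qwen 3 model in 4-bit to cut memory usage.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen3-4B"  # assumed repo id; any size from the collection works
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant_config, device_map="auto"
)

messages = [{"role": "user", "content": "List two advantages of small language models."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```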


Of course, the open-source SLM landscape is not limited to these five families. You can explore more open-source models at: