Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs

[🏠 Homepage] [📖 Arxiv Paper] [🤗 Models] [🤗 Datasets(coming soon)] [💻 Code(coming soon)]

Introduction

We introduce Bee-8B, a new state-of-the-art, fully open 8B Multimodal Large Language Model (MLLM) designed to close the performance gap with proprietary models by focusing on data quality.

Bee-8B is trained on our new Honey-Data-15M corpus, a high-quality supervised fine-tuning (SFT) dataset of approximately 15 million samples. This dataset was meticulously created with our transparent, adaptable, and open-source data curation pipeline, HoneyPipe, which systematically cleans noisy data and enriches it with a novel dual-level (short and long) Chain-of-Thought (CoT) strategy.

Training on this dataset gives Bee-8B its largest gains in complex reasoning and establishes a new standard for fully open MLLMs.

Key Features

High-Quality, Large-Scale Dataset: We release Honey-Data-15M, a new 15M-sample SFT corpus. It has undergone extensive cleaning to remove widespread noise and has been enriched with dual-level CoT reasoning to enhance advanced problem-solving capabilities.
Fully Open-Source Data Curation Suite: We provide not just the data, but the entire methodology. HoneyPipe and its underlying framework DataStudio offer the community a transparent and reproducible pipeline, moving beyond static dataset releases.
State-of-the-Art Open Model: Our model, Bee-8B, achieves state-of-the-art performance among fully open MLLMs and is highly competitive with recent semi-open models like InternVL3.5-8B, demonstrating the power of high-quality data.

News

[2025.10.13] 🐝 Bee-8B is Released! Our model is now publicly available. You can download it from Hugging Face.

Quickstart

The example below shows how to use Bee-8B with 🤗 Transformers. The enable_thinking argument passed to processor.apply_chat_template selects the response mode: enable_thinking=True for thinking mode, or enable_thinking=False for non-thinking mode. Thinking mode is the default.

Using 🤗 Transformers to Chat

import requests
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model_path = "Open-Bee/Bee-8B-SFT"

# Load model
model = AutoModel.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda")

# Load processor
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

# Image used in the prompt (defined once so the message and the loader agree)
image_url = "https://huggingface.co/Open-Bee/Bee-8B-SFT/resolve/main/assets/logo.png"

# Define conversation messages
messages = [{
    "role": "user",
    "content": [
        {
            "type": "image",
            "image": image_url,
        },
        {
            "type": "text",
            "text": "Based on this picture, write an advertising slogan about Bee-8B (a Fully Open Multimodal Large Language Model)."
        },
    ],
}]

# Apply chat template (enable_thinking=True selects thinking mode)
text = processor.apply_chat_template(messages,
                                     tokenize=False,
                                     add_generation_prompt=True,
                                     enable_thinking=True)

# Load image
image = Image.open(requests.get(image_url, stream=True).raw)

# Process inputs
inputs = processor(images=image, text=text, return_tensors="pt").to("cuda")

# Generate output (sampling must be enabled for temperature to take effect)
generated_ids = model.generate(**inputs, max_new_tokens=16384, do_sample=True, temperature=0.6)
output_ids = generated_ids[0][len(inputs.input_ids[0]):]

# Decode output
output_text = processor.decode(output_ids, skip_special_tokens=True)

# Print result
print(output_text)
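
In thinking mode the decoded text usually contains the model's reasoning followed by its final answer. The sketch below shows one way to separate the two. It assumes the reasoning is delimited by <think> and </think> tags, as in Qwen3-style chat templates, and that these tags survive decoding; neither point is confirmed for Bee-8B here, so check a real output first. The helper name split_thinking is hypothetical and is not part of the Bee-8B codebase.

```python
from typing import Tuple


def split_thinking(output_text: str,
                   open_tag: str = "<think>",
                   close_tag: str = "</think>") -> Tuple[str, str]:
    """Split decoded output into (reasoning, answer).

    Assumption: reasoning ends at `close_tag`. The opening tag is optional,
    because some chat templates place it in the prompt rather than the output.
    """
    head, sep, tail = output_text.partition(close_tag)
    if not sep:
        # No closing tag found (e.g. non-thinking mode): everything is the answer.
        return "", output_text.strip()
    # Drop the opening tag if the model emitted it, then trim whitespace.
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, tail.strip()


# Illustrative string only, not real model output.
sample = "<think>The logo shows a bee.</think>\nBee-8B: open by design."
reasoning, answer = split_thinking(sample)
print(reasoning)
print(answer)
```

With the Quickstart code above, the same call would be split_thinking(output_text). If the output has no closing tag, the helper returns an empty reasoning string and the full text as the answer, so it can be used unchanged in non-thinking mode.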

Experimental Results

Evaluation of Bee-8B against other MLLMs. We distinguish between fully open (*) and semi-open (†) models. The top and second-best scores for each benchmark are highlighted.

New State-of-the-Art: Bee-8B establishes a new performance standard for fully open MLLMs, proving highly competitive with recent semi-open models across a wide array of benchmarks.
Excellence in Complex Reasoning: Thanks to the CoT-enriched Honey-Data-15M, Bee-8B shows its most significant advancements in complex math and reasoning. It achieves top scores on challenging benchmarks like MathVerse, LogicVista, and DynaMath.
Superior Document and Chart Understanding: The model demonstrates powerful capabilities in analyzing structured visual data, securing the top rank on the CharXiv benchmark for both descriptive and reasoning questions.

Acknowledgements

Bee-8B builds on the architectures and codebases of the following projects: R-4B, LLaVA-OneVision, SigLIP2, and Qwen3. It was evaluated using VLMEvalKit. We sincerely thank these projects for their outstanding contributions to the open-source community.