Tucano2
An open suite of large language models (LLMs) with 0.5-3.7 billion parameters, designed to address the gap in open-source development for Portuguese.
- Paper • 2603.03543 • Published • 5
Polygl0t/Tucano2-0.6B-Base
Text Generation • 0.7B • Updated • 15Note 🧱 Base version of Tucano2 0.6B. Use as a foundation for post-training.
Polygl0t/Tucano2-qwen-0.5B-Base
Text Generation • 0.5B • Updated • 26Note 🧱 Base version of Tucano2 0.5B. Use as a foundation for post-training.
Polygl0t/Tucano2-qwen-0.5B-Instruct
Text Generation • 0.5B • Updated • 30 • 1Note 💬 Instruct version of Tucano2 0.5B. Suited for chat applications.
Polygl0t/Tucano2-qwen-0.5B-Think
Text Generation • 0.5B • Updated • 55Note 🤔 Think version of Tucano2 0.5B. Suited for reasoning tasks.
Polygl0t/Tucano2-qwen-1.5B-Base
Text Generation • 2B • Updated • 269Note 🧱 Base version of Tucano2 1.5B. Use as a foundation for post-training.
Polygl0t/Tucano2-qwen-1.5B-Instruct
Text Generation • 2B • Updated • 280 • 1Note 💬 Instruct version of Tucano2 1.5B. Suited for chat applications.
Polygl0t/Tucano2-qwen-1.5B-Think
Text Generation • 2B • Updated • 25Note 🤔 Think version of Tucano2 1.5B. Suited for reasoning tasks.
Polygl0t/Tucano2-qwen-3.7B-Base
Text Generation • 4B • Updated • 17Note 🧱 Base version of Tucano2 3.7B. Use as a foundation for post-training.
Polygl0t/Tucano2-qwen-3.7B-Instruct
Text Generation • 4B • Updated • 48 • 1Note 💬 Instruct version of Tucano2 3.7B. Suited for chat applications.
Polygl0t/Tucano2-qwen-3.7B-Think
Text Generation • 4B • Updated • 39Note 🤔 Think version of Tucano2 3.7B. Suited for reasoning tasks.
Polygl0t/gigaverbo-v2
Viewer • Updated • 375M • 56Note 📚 Pretraining dataset.
Polygl0t/gigaverbo-v2-synth
Viewer • Updated • 11.2M • 53Note 📚 Synthetic dataset.
Polygl0t/gigaverbo-v2-sft
Viewer • Updated • 4.09M • 52Note 📚 Supervised fine-tuning dataset.
Polygl0t/gigaverbo-v2-preferences
Viewer • Updated • 28.4k • 35Note 📚 Preference dataset.
Polygl0t/GigaVerbo-v2-ablation-EDU-Synth-1.5B
Text Generation • 2B • Updated • 16Note 🔬 Ablation Experiment (Edu+Synth)
Polygl0t/GigaVerbo-v2-ablation-EDU-1.5B
Text Generation • 2B • Updated • 14Note 🔬 Ablation Experiment (Edu)
Polygl0t/GigaVerbo-v2-ablation-Synth-1.5B
Text Generation • 2B • Updated • 13Note 🔬 Ablation Experiment (Synth)
Polygl0t/GigaVerbo-v2-ablation-NonEDU-1.5B
Text Generation • 2B • Updated • 13Note 🔬 Ablation Experiment (NonEdu)
Polygl0t/portuguese-edu-qwen-annotations
Viewer • Updated • 700k • 2Note 📚 Annotations to train classifiers/filters (Educational).
Polygl0t/portuguese-toxicity-qwen-annotations
Viewer • Updated • 700k • 2Note 📚 Annotations to train classifiers/filters (Toxicity).
Polygl0t/portuguese-instruct-quality-qwen-annotations
Viewer • Updated • 500k • 2Note 📚 Annotations to train classifiers/filters (Instructions).
Polygl0t/portuguese-bertimbau-edu-classifier
Text Classification • 0.1B • Updated • 13Note 🎯 Quality Filter (Educational)
Polygl0t/portuguese-bertimbau-large-edu-classifier
Text Classification • 0.3B • Updated • 13Note 🎯 Quality Filter (Educational)
Polygl0t/portuguese-bertimbau-toxicity-classifier
Text Classification • 0.1B • Updated • 15Note 🎯 Quality Filter (Toxicity)
Polygl0t/portuguese-bertabaporu-large-toxicity-classifier
Text Classification • 0.4B • Updated • 13Note 🎯 Quality Filter (Toxicity)
Polygl0t/portuguese-qwen3-4b-instruct-quality-classifier
Text Classification • 4B • Updated • 14Note 🎯 Quality Filter (Instructions)
Polygl0t/portuguese-qwen3-4b-instruct-quality-judge
Text Generation • 4B • Updated • 13Note 🎯 Quality Filter (Instructions)
Polygl0t/tokenizers
Viewer • Updated • 8.98M • 2Note 📚 Data used to train the Tucano2 tokenizer.
Polygl0t/gsm8k-pt
Viewer • Updated • 8.76k • 22Note 🏆 An evaluation for mathematical reasoning in Portuguese.
Polygl0t/IFEval-PT
Viewer • Updated • 300 • 20Note 🏆 An evaluation for instruction following in Portuguese.
Polygl0t/portuguese-eval-logs-olmo2-smollm3
Viewer • Updated • 203 • 19Note 🔬 Evaluation suite experiments.