# Sutra-Instruct-350M
Sutra-Instruct-350M is a custom-built, 350-million-parameter causal language model trained from scratch using a nanoGPT-based architecture.
## Model Architecture & Details

- Architecture: Custom nanoGPT-based Transformer
- Parameter Count: 350M
- Format: `safetensors`
- Embeddings: Tied (`lm_head` and `wte` share memory)
- Creator: Abhiray
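Weight tying means the input embedding table and the output projection are literally the same tensor, which saves `vocab_size * n_embd` parameters. A minimal PyTorch sketch of the idea (the names `wte` and `lm_head` follow nanoGPT's convention; the dimensions here are illustrative, not Sutra's actual config):

```python
import torch.nn as nn

vocab_size, n_embd = 1000, 64  # illustrative sizes, not the real config

wte = nn.Embedding(vocab_size, n_embd)               # token embedding table
lm_head = nn.Linear(n_embd, vocab_size, bias=False)  # output projection

# Tie the weights: both modules now reference one Parameter object,
# so gradients and updates to one are seen by the other.
lm_head.weight = wte.weight

assert lm_head.weight is wte.weight
```

In nanoGPT this tying is done once in the model constructor, so the checkpoint stores the shared matrix only a single time.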
## Training Pipeline
This model was not fine-tuned from an existing corporate base model (like Llama or Mistral). Its weights were randomly initialized and trained through a rigorous two-phase pipeline:
### Phase 1: Pre-Training (The Foundation)

The base logic was built by streaming a highly curated mix of academic and coding datasets:

- `HuggingFaceFW/fineweb-edu` (high-level English and academic structure)
- `open-web-math/open-web-math` (mathematical logic and formatting)
- `bigcode/starcoderdata` (Python syntax and code structure)
- `roneneldan/TinyStories` (basic grammar and narrative flow)
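Streaming a mix like this is commonly done with the `datasets` library's `interleave_datasets`. A sketch under stated assumptions: the sampling weights below are hypothetical (the actual ratios used for Sutra are not published in this card), and the import is kept inside the function so the snippet can be read without the library or a network connection:

```python
# Hypothetical mixture weights -- the actual sampling ratios used for
# Sutra's pre-training are not stated in this card.
MIX_WEIGHTS = {
    "HuggingFaceFW/fineweb-edu": 0.5,
    "open-web-math/open-web-math": 0.2,
    "bigcode/starcoderdata": 0.2,
    "roneneldan/TinyStories": 0.1,
}


def build_pretraining_stream(seed: int = 42):
    """Interleave the four corpora as one streaming dataset (no local download)."""
    from datasets import interleave_datasets, load_dataset  # requires `datasets`

    streams = [
        load_dataset(name, split="train", streaming=True) for name in MIX_WEIGHTS
    ]
    return interleave_datasets(
        streams, probabilities=list(MIX_WEIGHTS.values()), seed=seed
    )
```

Iterating over the returned stream yields examples sampled from each corpus in proportion to its weight.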
### Phase 2: Supervised Fine-Tuning (SFT)

Once the model learned how to speak, it was fine-tuned on the `yahma/alpaca-cleaned` dataset to teach it the standard `Instruction:` / `Response:` conversational format.
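The exact prompt template Sutra was trained on is not reproduced in this card, but `alpaca-cleaned` conventionally uses the original Alpaca layout. A hedged sketch of that standard template (treat the wording as an assumption and match it to the repo's actual format before use):

```python
def format_alpaca(instruction: str, input_text: str = "") -> str:
    """Build a prompt in the standard Alpaca style.

    This follows the common alpaca-cleaned convention; the exact template
    Sutra expects may differ slightly.
    """
    if input_text:
        return (
            "Below is an instruction that describes a task, paired with an "
            "input that provides further context. Write a response that "
            "appropriately completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response "
        "that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )


prompt = format_alpaca("Summarize the following text.", "Transformers use attention.")
```

The model then generates its answer as the continuation after `### Response:`.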
## Recommended Generation Settings

Because this is a compact 350M-parameter model, default generation settings may cause looping or wild hallucinations. For the best outputs, use the following configuration:

- Temperature: `0.5`
- Top-K: `50`
- Repetition Penalty: `1.3`
- Max Length: `400-500` tokens

Alternatively, use the `generation_config.json` file included in the repo.
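These settings map directly onto Hugging Face `GenerationConfig` fields. A sketch of a config dict matching the recommendations above (whether the repo's `generation_config.json` uses exactly these key names is an assumption):

```python
import json

# Recommended sampling settings expressed as transformers-style
# GenerationConfig keys. max_new_tokens=500 reflects the upper end of
# the suggested 400-500 range.
generation_config = {
    "do_sample": True,          # sampling is required for temperature/top_k
    "temperature": 0.5,
    "top_k": 50,
    "repetition_penalty": 1.3,
    "max_new_tokens": 500,
}

print(json.dumps(generation_config, indent=2))
```

With `transformers`, these can be passed as keyword arguments to `model.generate(...)` or loaded automatically from the repo's `generation_config.json`.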
## Limitations & Bias
- Hallucinations: At 350M parameters, Sutra simply lacks the capacity to act as a factual encyclopedia. It will confidently hallucinate historical dates, math solutions, and trivia.
- Coding: While it understands Python syntax and will output beautifully formatted code blocks (thanks to StarCoder), complex logical scripts may fail.
- Best Use Case: Sutra excels at structural formatting, grammar, summarizing provided context, generating short stories, and acting as a lightweight, lightning-fast local testing model.