Sutra-Instruct-350M

Sutra-Instruct-350M is a custom-built, 350-million-parameter causal language model trained from scratch on a nanoGPT-based architecture.

🧠 Model Architecture & Details

  • Architecture: Custom nanoGPT-based Transformer
  • Parameter Count: 350M
  • Format: safetensors
  • Embeddings: Tied (lm_head and wte share the same weight matrix)
  • Creator: Abhiray
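Weight tying means the token-embedding table (wte) and the output projection (lm_head) are the same tensor, so the vocabulary parameters are only stored once. A toy pure-Python sketch of the idea (the sizes and values here are illustrative, not the model's real dimensions):

```python
# Toy illustration of tied embeddings: wte and lm_head are the same object,
# so an update to one is immediately visible through the other.
vocab_size, n_embd = 4, 3

# token-embedding table (vocab_size x n_embd)
wte = [[0.1 * (i + j) for j in range(n_embd)] for i in range(vocab_size)]
lm_head = wte  # tied: no copy is made

def embed(token_id):
    # look up a token's embedding vector
    return wte[token_id]

def logits(hidden):
    # project a hidden state back onto the vocabulary with the shared matrix
    return [sum(h * w for h, w in zip(hidden, row)) for row in lm_head]

# mutating wte is reflected in the output projection
wte[0][0] = 9.9
print(logits([1.0, 0.0, 0.0])[0])  # 9.9
```

In a real checkpoint this simply means one shared tensor is saved instead of two, which is why the safetensors file is smaller than an untied 350M model's would be.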

📚 Training Pipeline

This model was not fine-tuned from an existing corporate base model (such as Llama or Mistral). Its weights were initialized from scratch and trained through a rigorous two-phase pipeline:

Phase 1: Pre-Training (The Foundation)

The base model was built by streaming a highly curated mix of academic and coding datasets:

  • HuggingFaceFW/fineweb-edu (High-level English and academic structure)
  • open-web-math/open-web-math (Mathematical logic and formatting)
  • bigcode/starcoderdata (Python syntax and code structure)
  • roneneldan/TinyStories (Basic grammar and narrative flow)

Phase 2: Supervised Fine-Tuning (SFT)

Once the model learned how to speak, it was fine-tuned on the yahma/alpaca-cleaned dataset to teach it the standard Instruction: and Response: conversational format.
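At inference time, requests should be wrapped in the same Instruction:/Response: scaffold the model saw during SFT. A small helper sketch; the exact template (whitespace, an optional Input: block, any ### prefixes used by alpaca-cleaned) is an assumption, so verify it against the prompts in the repo:

```python
def build_prompt(instruction: str, input_text: str = "") -> str:
    """Wrap a request in the Instruction:/Response: format used during SFT.

    The exact template (newlines, optional Input: block) is assumed here;
    check it against the fine-tuning data before relying on it.
    """
    prompt = f"Instruction:\n{instruction}\n\n"
    if input_text:
        prompt += f"Input:\n{input_text}\n\n"
    prompt += "Response:\n"
    return prompt

print(build_prompt("Summarize the paragraph.", "The quick brown fox jumps over the lazy dog."))
```

Ending the prompt at "Response:" cues the model to start generating the answer rather than continuing the instruction.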

βš™οΈ Recommended Generation Settings

Because this is a compact 350M-parameter model, default generation settings can cause looping or severe hallucinations. For the best outputs, use the following configuration:

  • Temperature: 0.5
  • Top-K: 50
  • Repetition Penalty: 1.3
  • Max Length: 400-500 tokens
  • These settings are also provided in the generation_config.json file in this repo
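To see what these knobs actually do, here is a stdlib-only sketch of how repetition penalty, temperature, and top-k filtering transform one row of logits before sampling. This is a simplified stand-in for the logits processors a library like transformers applies, not Sutra's actual inference code:

```python
import math

def filter_logits(logits, temperature=0.5, top_k=50,
                  repetition_penalty=1.3, prev_tokens=()):
    """Apply repetition penalty, temperature, and top-k to one row of logits,
    returning a probability distribution to sample the next token from."""
    logits = list(logits)
    # 1) repetition penalty: make already-generated tokens less likely
    for t in set(prev_tokens):
        logits[t] = (logits[t] / repetition_penalty if logits[t] > 0
                     else logits[t] * repetition_penalty)
    # 2) temperature: values below 1.0 sharpen the distribution,
    #    reducing the chance of wild, low-probability tokens
    logits = [l / temperature for l in logits]
    # 3) top-k: keep only the k highest-scoring tokens
    k = min(top_k, len(logits))
    kth = sorted(logits, reverse=True)[k - 1]
    logits = [l if l >= kth else float("-inf") for l in logits]
    # softmax over the surviving tokens
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

A repetition penalty of 1.3 is what suppresses the looping mentioned above: any token already emitted has its logit pushed down before the next sampling step.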

⚠️ Limitations & Bias

  • Hallucinations: As a 350M parameter model, Sutra does not have the physical parameter count to act as a factual encyclopedia. It will confidently hallucinate historical dates, math solutions, and trivia.
  • Coding: While it understands Python syntax and will output beautifully formatted code blocks (thanks to StarCoder), complex logical scripts may fail.
  • Best Use Case: Sutra excels at structural formatting, grammar, summarizing provided context, generating short stories, and acting as a lightweight, lightning-fast local testing model.