πŸ€– MicroSupra-1k

So... have you ever seen a model that runs on $3 hardware? No? Well, now you have!

MicroSupra-1k is a bacteria-sized base model (lol) trained on 300 million tokens of FineWeb-Edu for 3 epochs. It is the first version of our MicroSupra series.

Model Config

  • Parameters: 1046 (0.001M)
  • Architecture: LLaMA
  • Vocab size: 1024
  • Hidden Size: 1
  • Intermediate Size: 2
  • Hidden Layers: 1
  • Attention Heads: 1
  • Max Position Embeddings: 256
  • Learning rate: 5e-3
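
If you want to rebuild this skeleton yourself, the list above maps straight onto transformers' LlamaConfig. A minimal sketch (the argument names are the standard LlamaConfig fields, not copied from our training code; the exact parameter count also depends on bias and weight-tying settings not listed above):

from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=1024,
    hidden_size=1,
    intermediate_size=2,
    num_hidden_layers=1,
    num_attention_heads=1,
    max_position_embeddings=256,
)
model = LlamaForCausalLM(config)  # randomly initialized
print(sum(p.numel() for p in model.parameters()))  # should land near 1046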

Final Loss

The model reached a final training loss of 6.046 after 3 epochs.
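
For context, a loss of 6.046 corresponds to a perplexity of exp(6.046) β‰ˆ 422, while a uniform guess over the 1024-token vocab would sit at ln(1024) β‰ˆ 6.93, so even this thing learned something:

# Quick sanity check of the numbers above.
import math
print(math.exp(6.046))  # ~422: perplexity implied by the final loss
print(math.log(1024))   # ~6.93: loss of a uniform guess over the vocab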

Examples

Prompt: "My name is "
Output: "My name is ed and. as the, to. the, in ingt thee the ofingi in the., anda.-eo ofles, b the,er,s fing.ssp the the , of of, the,al, d to the m, the, to toed, seng,,.y. in the,., in and them the thened.sing to the of of andan the the,, the to..,,sing,,.aring the the. of.al.,s ofcal ar s..e and.sssor of, and and."

Prompt: "The main concept of physics is "
Output: "The main concept of physics is a, s and the. thet to, theing.... the,a then,c,i to, thee in b. toed.,,e theyalp the in,er thees- s,el,,,, and, the of ine,,s the of cs of thesss the. f. to. thesining andor dar,,al the,. of p. the.s the.,,s. anded,e. of, ofed, l toinging and themsr the of of. to to thes thes aen,., ofes of a."

Prompt: "Question: What is the capital of France?\nAnswer: "
Output: "Question: What is the capital of France? Answer:,. and to the. toc. ofs the m,a thee.. the, f ofling. as.,,y bt, the p , in, the,,ees toed ing to. o, thes. the..,s the.ed and andang,,ed the of,,ms. of, thei the, the,ey,,s l.ing toe the the,se the to, the, the,aror, the of-. in the. the. the,e the of ds to,ic the the aal at the.. ingssy s and and"

Usage πŸš€

print("[*] Loading libraries...")
import torch
from transformers import LlamaForCausalLM, PreTrainedTokenizerFast

model_path = "SupraLabs/MicroSupra-1k"

print("[*] Loading tokenizer...")
tokenizer = PreTrainedTokenizerFast.from_pretrained(model_path)

print("[*] Loading model...")
model = LlamaForCausalLM.from_pretrained(model_path)
model.eval()

prompt = "Question: What is the capital of France?\nAnswer:"
print(f"[*] Prompt: {prompt!r}")

inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
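    # Low temperature + nucleus (top-p) sampling; repetition_penalty curbs
    # the token loops a 1k-parameter model loves to fall into.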
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=150,
        do_sample=True,
        temperature=0.35,
        top_p=0.85,
        repetition_penalty=1.2,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

print("[*] Output:", tokenizer.decode(outputs[0], skip_special_tokens=True))

Why did SupraLabs create this???

Because we are experimenting with model sizes and techniques (1-bit quantization, distillation, pruning), all to improve your experience! NEW THINGS ARE COMING WITH DISTILLATION, STAY TUNED! We are working on big things!

Training details

We trained MicroSupra-1k on a GTX 750 Ti (4 GB) for 3 epochs; the run took about one minute.
The model was trained on the first 300 million tokens of the sample-10BT subset of FineWeb-Edu using streaming tokenization.
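
Streaming tokenization means the corpus is tokenized on the fly instead of being materialized on disk first. A minimal sketch of that step, assuming the standard datasets streaming API (an illustration, not our actual training script):

from datasets import load_dataset
from transformers import PreTrainedTokenizerFast

tokenizer = PreTrainedTokenizerFast.from_pretrained("SupraLabs/MicroSupra-1k")
stream = load_dataset(
    "HuggingFaceFW/fineweb-edu", name="sample-10BT",
    split="train", streaming=True,
)

# Tokenize documents lazily and stop at the 300M-token budget.
token_budget, seen = 300_000_000, 0
for doc in stream:
    ids = tokenizer(doc["text"])["input_ids"]
    seen += len(ids)
    # ...pack ids into 256-token training sequences here...
    if seen >= token_budget:
        break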

Final thoughts

Even without any intelligence, it shows that scaling laws are real. This ant-sized model doesn't know how to talk, but we all know it has emotions πŸ€–πŸ«Ά
