BabyBabelLM GPT-BERT
Part of the *BabyBabelLM (Multilingual BabyLM) with GPT-BERT Architecture* collection.
This repository contains checkpoints for the multilingual small (Tier 1) variant of BabyBabelLM: a multilingual BabyLM trained on the Tier 1 languages of the multilingual BabyLM corpus (Jumelet et al., 2025).
The repository provides the following checkpoint files:

- `*_15_16.bin`: main model weights
- `*_15_16_ema.bin`: EMA-smoothed weights
- `*_15_16_state_dict.bin`: PyTorch state dict
- `pytorch_model.bin`: extracted EMA weights (for `AutoModel`)

Quick start:

```python
from transformers import AutoModel, AutoTokenizer

repo = "suchirsalhan/babybabellm-multismall"

# GPT-BERT is a custom architecture; if loading fails, pass
# trust_remote_code=True to from_pretrained.
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModel.from_pretrained(repo)

# Tokenize a sample sentence and run a forward pass
inputs = tokenizer("Hello world!", return_tensors="pt")
outputs = model(**inputs)
```
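Continuing from the quick start, here is a minimal sketch of turning the forward-pass output into a single sentence vector. It assumes the model returns a standard Transformers output with a `last_hidden_state` field, which holds for stock encoder models but is an assumption for this checkpoint:

```python
# Pool per-token hidden states into one sentence embedding.
# Assumes outputs.last_hidden_state exists (standard Transformers
# encoder output; an assumption for this custom architecture).
import torch

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool over tokens, using the attention mask to ignore padding
mask = inputs["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)  # (batch_size, hidden_size)
```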
The suffix `multismall` in the repo name indicates the language/config variant.
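If you want to inspect the raw checkpoint files directly (for example, to compare the EMA and non-EMA weights), a minimal sketch follows; the concrete filename is hypothetical and stands in for whatever file matches the `*_15_16_state_dict.bin` pattern on the repo's Files page:

```python
# Download one raw checkpoint file and peek at its contents.
# The filename below is hypothetical; substitute the actual file
# matching the *_15_16_state_dict.bin pattern in this repo.
import torch
from huggingface_hub import hf_hub_download

repo = "suchirsalhan/babybabellm-multismall"
path = hf_hub_download(repo_id=repo, filename="model_15_16_state_dict.bin")

state_dict = torch.load(path, map_location="cpu")
print(list(state_dict)[:5])  # first few parameter names
```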