---
license: apache-2.0
datasets:
- mlx-community/recycling_the_web-400K
language:
- en
base_model:
- Goekdeniz-Guelmez/J.O.S.I.E.-Qwen3-10M-Random
pipeline_tag: text-generation
tags:
- mlx
---
This is the first pre-trained version of the Qwen3-10M model, trained for 1 epoch on a 400K-sample subset of Facebook's Recycling the Web dataset.

This phase 1 pre-training trains the base model with a context size of 246 tokens; in phase 2 the context will be extended to 1024 tokens.
My hardware is an M4 Mac mini with 32 GB of RAM.
Stats:

```text
batch_size = 32
val_batches = 16
epochs = 1
optimizer_name = "adamw"
scheduler = "cosine_decay"
```
WandB run:
