AI & ML interests

None defined yet.

Recent Activity

Crownelius updated a model 12 days ago
Crowfeather/Crowfeather-50m
Crownelius published a model 12 days ago
Crowfeather/Crowfeather-50m
Crownelius updated a Space about 2 months ago
Crowfeather/README

Crownelius
posted an update 6 days ago
Day 4-6 [05/05/2026]
Howdy,

Is anybody else willing to put a second mortgage on their house just to spend 40k USD in compute credits? Just me? k...

I got dreams, man. The datasets I could build with 40k would be insane.
Somebody called me a genius the other day; they'd be shocked to find out that I'd put my house on the line for 30 days of RunPod usage.

What would you do with it?
I would turn arXiv into a dataset, converting each paper into a QnA.
Or... maybe if I got 40k USD in credits I'd end up like those 16 lost scientists.

Food for thought.
Anyways, I think I'm going to make a post once a week.
In the meantime you can find me building small LLMs on Discord here:
https://discord.gg/4DdwS9D8x9
  • 6 replies
Crownelius
posted an update 10 days ago
Day 3 - 05/02/2026
Scamp ships, hits the wall. New plan...

Scamp came back from training today... Didn't go so well, I'm still unsure...

Fast benchmark, temperature 0.7, top_p 0.9:
- "Capital of France is" produced "covered by the Crown" (grammatical, factually wrong)
- "23 + 19 = ?" produced "23. Answer: 23. Answer: 23..." (loops, math broken)
- "def fibonacci(n):" produced a list of letters
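For reference, the sampler settings used above (temperature 0.7, top_p 0.9) can be reproduced with a minimal temperature + nucleus sampling sketch. This is plain NumPy for illustration, not the actual eval harness, which the post doesn't show:

```python
import numpy as np

def sample_next_token(logits, temperature=0.7, top_p=0.9, rng=None):
    """Temperature + nucleus (top-p) sampling over a 1-D logits vector."""
    rng = rng or np.random.default_rng()
    # Temperature: <1 sharpens the distribution, >1 flattens it.
    scaled = logits / temperature
    scaled = scaled - scaled.max()          # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    # Nucleus filter: keep the smallest set of tokens whose mass >= top_p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1
    keep = order[:cutoff]
    kept = probs[keep] / probs[keep].sum()  # renormalise the nucleus
    return int(rng.choice(keep, p=kept))
```

With a sharply peaked distribution the nucleus collapses to one token, which is why a broken base model can still loop deterministically on "Answer: 23".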

It speaks English. It can't reason. At 8K vocab and 50M params, it was never going to.

Next build: 412M MoE-3E. Three experts (math, language, code), top-1 routing, random init, let specialization emerge from gradient signal alone. Tried seeded Branch-Train-MiX first, then dropped it; it adds compute for no clear win when the router will find its own attractors anyway.
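Top-1 routing is small enough to sketch. This toy version (NumPy, made-up shapes, not the actual 412M implementation) shows the two moving parts: argmax dispatch per token, and gating the expert output by the router probability so the router itself receives gradient:

```python
import numpy as np

def top1_route(hidden, router_w, experts):
    """Top-1 MoE routing: each token is sent to the single expert with the
    highest router score, scaled by that expert's softmax probability.
    hidden: (tokens, d), router_w: (d, n_experts), experts: callables."""
    logits = hidden @ router_w                      # (tokens, n_experts)
    logits = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    choice = probs.argmax(axis=-1)                  # winning expert per token
    out = np.zeros_like(hidden)
    for e, expert in enumerate(experts):
        mask = choice == e
        if mask.any():
            # Gate by the router probability: the only path for router gradient.
            out[mask] = probs[mask, e:e + 1] * expert(hidden[mask])
    return out, choice
```

Real top-1 setups usually add a load-balancing auxiliary loss so one expert doesn't capture every token early; that part is omitted here.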

Big lesson today came from limit testing on an A100 80GB. Surprise: every planned phase ran out of memory, even on 80GB. Root cause: at vocab 262144 (Gemma 3 standard), the output logits dominate during forward and backward. Fix: Liger Kernel's fused cross-entropy, which streams the loss computation instead of materialising the full B × T × vocab logits tensor. Without it the build would not run.
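For scale: at vocab 262144, even an illustrative batch of 8 sequences × 2048 tokens means 8 × 2048 × 262144 fp32 logits ≈ 17 GB before gradients double it (the batch and sequence length here are assumptions, not from the post). The streaming idea behind a fused cross-entropy can be sketched by chunking the output projection so only a small slice of logits is ever live:

```python
import numpy as np

def chunked_ce_loss(hidden, unembed, targets, chunk=4):
    """Cross-entropy without materialising the full (N, vocab) logits:
    project and reduce one chunk of rows at a time, so only a
    (chunk, vocab) slice exists at any moment. Same streaming idea as a
    fused cross-entropy kernel, minus the fused backward pass."""
    n = hidden.shape[0]
    total = 0.0
    for start in range(0, n, chunk):
        rows = slice(start, start + chunk)
        logits = hidden[rows] @ unembed              # (chunk, vocab) only
        logits -= logits.max(axis=-1, keepdims=True) # numerical stability
        logsumexp = np.log(np.exp(logits).sum(axis=-1))
        picked = logits[np.arange(logits.shape[0]), targets[rows]]
        total += (logsumexp - picked).sum()
    return total / n
```

The result is bit-for-bit the same loss as the full computation; the saving is purely peak memory.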

Scamp proved the pipeline runs end-to-end on real hardware. The 412M run starts tomorrow. If routing balances naturally and math finally crystallises, it ships as Crowfeather-412M-3E with GGUFs in F16, Q8, Q5, and Q4.

So... the training might have produced a poet if I had done it better. But I didn't, so instead we get a malformed robot named Scamp... This is progress.

-Shane

P.S. Join the Discord for discussion: https://discord.gg/8ZscHNmJYE and
I post my finished stuff here:
CompactAI-O
  • 2 replies
Crownelius
posted an update 11 days ago
[DAY TWO] PROJECT CROWFEATHER - 5/1/2026
Que sera, what will he be?

Step 47,500 of 100,000. Loss hovering around 2.76 on 6.2B tokens. Throughput steady at 87k tokens per second on the A100. Not a GH200, but she gets it done.

Still haven't named him. Scamp has a rascally charm. Quentin sounds like he'd wear a bow tie and think hard before speaking. Taking votes.

Phase two is what's keeping me up. Datasets everywhere and I can't pick. I'm fusing Google and DeepSeek's ideas: Gemma 4's alternating sliding and global attention, DeepSeek V4's Muon optimizer and WSD scheduler, Gemma 2's logit soft cap, and PaLM's z-loss. Sounds like peanut butter on a hamburger, but the loss curve says it works.
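The alternating sliding/global attention piece of that mix reduces to a per-layer mask. A minimal sketch, where the window size comes from the post (1024) but the layer schedule is an assumption based on the published Gemma recipes, not this run's exact config:

```python
import numpy as np

def attention_mask(seq_len, layer_idx, window=1024, global_every=2):
    """Causal attention mask for one layer: every `global_every`-th layer
    attends globally, the rest use a sliding window of `window` tokens.
    True = this key position may be attended to by this query position."""
    q = np.arange(seq_len)[:, None]   # query positions
    k = np.arange(seq_len)[None, :]   # key positions
    causal = k <= q                   # never look ahead
    if layer_idx % global_every == global_every - 1:
        return causal                              # global layer
    return causal & (q - k < window)               # sliding-window layer
```

Sliding-window layers keep the KV cache and attention cost bounded; the interleaved global layers let information still propagate across the full context.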

Tribe_v2 has real potential but needs more scaffolding than a barn raising before I throw it in. One thing's certain though. This model's gonna be a thinker. Not a Wikipedia parrot. Something that chews before it answers.

Finally got a use for my less popular datasets too. Some Opus-4.5-Writing-Style for polish. A few rows of Human-Archtypes-25k to see what personality bubbles up. Could be a poet, could be a grump. Either beats a flimsy fine-tune.

The bank's after my credit card. Until then, full steam.

Next model gets graphs. I swear.

-Shane
  • 3 replies
Crownelius
posted an update 12 days ago
[DAY ONE] PROJECT CROWFEATHER 4/30/2026
...The day I forgot to attach wandb.ai
Just dropped Crowfeather-50m, the first checkpoint in a series, and yeah, no graphs.

Crowfeather/Crowfeather-50m

54.5M params. Pretrain only. 17,500 steps banked on FineWeb-edu before Thunder credits ran dry. About 2.3B tokens, no SFT yet.

Architecture: Gemma-4 alternating sliding/global attention (1024 window, last layer always global) plus DeepSeek-V4 Muon optimizer plus WSD scheduler plus Gemma-2 logit soft-cap plus PaLM z-loss. Recipe in the model card.
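The soft-cap and z-loss pieces of that recipe are essentially one-liners. A sketch with assumed coefficients (cap=30 and coeff=1e-4 are typical published values, not necessarily this run's):

```python
import numpy as np

def soft_cap(logits, cap=30.0):
    """Gemma-2-style logit soft cap: squashes logits into (-cap, cap)
    with a tanh, staying roughly linear near zero."""
    return cap * np.tanh(logits / cap)

def z_loss(logits, coeff=1e-4):
    """PaLM-style z-loss: penalises log(Z)^2 per row, nudging the softmax
    normaliser toward 1 so logit magnitudes don't drift during training."""
    m = logits.max(axis=-1, keepdims=True)
    log_z = m.squeeze(-1) + np.log(np.exp(logits - m).sum(axis=-1))
    return coeff * (log_z ** 2).mean()
```

Both are stability tricks rather than capacity changes: the cap bounds attention/output logits, and the z-loss keeps the final softmax well-conditioned at low precision.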

What it can do: writes grammatical English. Knows that France has Rhine-adjacent monasteries (it picked Rouen instead of Paris but the vocabulary is in there). Tells stories about Mr. Fabien.

What it can't do yet: facts, code, math. Base LM, no SFT, no instruction tuning.

The series:
- Every additional training run becomes another model card here
- Every model card gets a matching post on this profile
- Continuation goes to Colab next, picking up from step 17,500 out of 100k

Limited to one post a day on Hugging Face, so updates will trickle out at that pace. Follow @Crownelius and @Crowfeather if you want to watch this thing learn in public. Next drop will either come with the finished pre-train or whatever step I land on before the bank takes my credit card away.

Graphs will be available on my NEXT model lol

-Shane
  • 3 replies
Crownelius
posted an update 13 days ago
My Hugging Face journey has been a trip!
I wanted to take the time to thank each and every one of you for using my dataset and getting it to go as far as it did. Believe it or not, some neanderthal was, and maybe still is, trending on Hugging Face.

Not only did my dataset reach number one, my fine-tuned qwen3.5 model cracked the top 10 as well. Honestly, ain't much left to do here.

Y'all have given me the desire, no... the craving for more. I am absolutely obsessed with AI now. I want to tweak it... I want to take it apart, just to see what makes everything tick. I want to put it together like Frankenstein and his monster.

The only thing that's stopping this guy is compute. I don't mind spending every penny I have on this. I desperately want to drive AI forward, even just a little bit.

I never thought the clanker hater from a year ago would be saying this.

Thank you all from the bottom of my heart.

Looking forward to showing you what I'm cooking up next. @CompactAI is your only hint!
  • 3 replies
Crownelius
published a Space about 2 months ago