In only 3 hours, on 10 billion+ tokens, I trained a custom BPE, tiktoken-style tokenizer using my new library microtok, and it matches Qwen3's token efficiency.
Tokenizers have always felt like black magic to me. We drop them into every LLM project, but actually training one from scratch? That always seemed way too complicated.
Turns out it doesn’t have to be.
microtok makes the whole process stupidly simple — literally just 3 lines of code. No heavy setup, no GPU required. I built it on top of the Hugging Face tokenizers library so it stays clean, fast, and actually understandable.
If you’ve ever wanted to look under the hood and build your own optimized vocabulary instead of just copying someone else’s, this is the entry point you’ve been waiting for.
I wrote up the full story, threw in a ready-to-run Colab template, and dropped the trained tokenizer on Hugging Face.
Blog → https://parveshiiii.github.io/blogs/microtok/
Trained tokenizer → Parveshiiii/microtok
GitHub repo → https://github.com/Parveshiiii/microtok
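For anyone curious what's actually happening under the hood, the core of BPE training is just: repeatedly merge the most frequent adjacent symbol pair. A toy pure-Python sketch of that loop (illustrative only, not microtok's implementation; real trainers like the Hugging Face tokenizers library are far faster):

```python
from collections import Counter

def train_bpe(words, num_merges):
    """Toy BPE trainer: learns merge rules from a list of words.
    Illustrative only -- production trainers are heavily optimized."""
    # Represent each word as a tuple of single-character symbols.
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for word, freq in corpus.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the merge everywhere it occurs.
        merged = Counter()
        for word, freq in corpus.items():
            new_word, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    new_word.append(word[i] + word[i + 1])
                    i += 2
                else:
                    new_word.append(word[i])
                    i += 1
            merged[tuple(new_word)] += freq
        corpus = merged
    return merges

merges = train_bpe(["low", "lower", "lowest", "low"], 3)
# First learned merges on this tiny corpus: ('l','o'), then ('lo','w')
```

The real speedup in libraries like tiktoken comes from doing this pair counting incrementally instead of rescanning the corpus each iteration.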
What I mean is that I would expect the ASR side to worry only about transcription, not context, because an SLM/LLM could then fix the transcript. ASR models today try to do too much when they should do just one thing. If I say "I read a book" and the ASR picks up "I red a book", that's fine, because the output can always be post-processed. But today's ASR models are all-in-one, which makes them even harder to steer (e.g. wake word detection or out-of-vocabulary terms). So you end up with ASR models that carry a lot of overhead while you still need to post-process the output anyway, when they should instead stay dumb and leave post-processing to better-suited models.
Everything has shifted from fast word recognition to monolithic LLM-based context recognition. With IPA, ASR/STT models could focus solely on words and leave post-processing to other, more capable models. There hasn't really been a good, small ASR model that is truly capable of running locally on low-powered devices. I'm still using Vosk models because they are just good for what they are, but they're approaching the 7-year mark now, which is absurd.
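To make the "dumb ASR + separate post-processor" idea concrete, here is a toy sketch of the correction stage; the rule-based logic is a hypothetical stand-in for the SLM/LLM that would do this in a real pipeline:

```python
def fix_transcript(tokens):
    """Toy post-processor for raw ASR output: resolve a homophone
    error from context. A real system would use a small language
    model here instead of hand-written rules."""
    fixed = []
    for i, tok in enumerate(tokens):
        # Hypothetical context rule: "I red a book" -> "I read a book"
        if tok == "red" and i > 0 and tokens[i - 1] in {"i", "I"}:
            fixed.append("read")
        else:
            fixed.append(tok)
    return " ".join(fixed)

print(fix_transcript("I red a book".split()))  # -> "I read a book"
```

The point is the separation of concerns: the ASR layer can emit phonetically faithful but "wrong" words, and the correction layer owns context.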
Regarding the video, at first I thought it was a joke because they looked like tokenized words haha
The 10% speed and VRAM usage improvements sound absolutely revolutionary. It would really be a massive breakthrough if you pull it off.
Also, I commented on your post on Twitter, but I'll say it here too: this would work absolutely wonders for speech-to-text and text-to-speech since it also has baked in IPA phonemes. You should definitely consider exploring that angle, because those spaces desperately need improvement.
You just killed 23 dyslexic people (and counting) with that video, be ca use of the we ird wo rd split ting. hahaha
Jokes aside, this looks absolutely amazing, but I think tokenizers are there because this might not work fast enough at scale. I'd be excited and extremely happy to be proven wrong, because the concept is certainly great.
I'm training a 22M-parameter LLM right now to test this "thing", and it's already able to formulate coherent sentences 🤯
Bear in mind, this is a completely new, tokenizer-free LLM architecture with built-in language universality.
Check the explainer video to understand what's happening. Feedback welcome on this approach!