DNA, mRNA, proteins, AI. I spent the last year going deep into computational biology as an ML engineer. This is Part I of what I found.
In 2024, AlphaFold won the Nobel Prize in Chemistry.
By 2026, the open-source community had built alternatives that outperform it.
That's the story I find most interesting about protein AI right now. Not just the science (which is incredible), but the speed at which open-source caught up. Multiple teams, independently, reproduced and then exceeded AlphaFold 3's accuracy with permissive licenses. The field went from prediction to generation: we're not just modeling known proteins anymore, we're designing new ones.
I spent months mapping this landscape for ML engineers. What the architectures actually are (spoiler: transformers and diffusion models), which tools to use for what, and which ones you can actually ship commercially.
Public reports allege that Anthropic gobbled up trillions of tokens of copyrighted material and public data to build their castle. Now that they're sitting on top, they're begging for special laws to protect their profits while pulling the ladder up behind them.
But the hypocrisy meter just broke! They are accusing Chinese labs like DeepSeek, Minimax, and Kimi of "huge distillation attacks." The reality is that you can't just loot the entire internet's library, lock the door, and then sue everyone else for reading through the window. Stop trying to gatekeep tech you didn't own in the first place. Read the complete article here: https://huggingface.co/blog/Ujjwal-Tyagi/the-dark-underbelly-of-anthropic
Stop sending sensitive data across the network. Sanitize it directly in the browser.
A recent blog post by A. Christmas provides a practical guide to achieving exactly that. It demonstrates a powerful form of anonymization: PII masking at the edge. The vision is simple but profound: keep sensitive data off the network entirely by sanitizing it in the browser.
The Ai4Privacy pii-masking-200k dataset served as the foundation for their work, providing the high-quality, diverse examples of PII needed to fine-tune a specialized DistilBERT model: one that is accurate, fast, and light enough to run client-side.
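The masking step itself is simple once the model has done its job. Here's a minimal, self-contained sketch of that last mile: in practice the entity spans would come from the fine-tuned DistilBERT token classifier running in the browser, but `mask_pii`, the example text, and the spans below are all hypothetical illustrations.

```python
def mask_pii(text, spans):
    """Replace predicted PII spans with typed placeholders.

    spans: list of (start, end, label) character offsets, as a token-
    classification model would predict. Spans are applied right-to-left
    so earlier offsets stay valid after each replacement.
    """
    for start, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


# Hypothetical model output for a sample message:
message = "My name is Jane Doe, contact me at jane.doe@example.com."
predicted_spans = [(11, 19, "NAME"), (35, 55, "EMAIL")]

print(mask_pii(message, predicted_spans))
# -> My name is [NAME], contact me at [EMAIL].
```

The point is that only the sanitized string ever leaves the client; the raw message and the model both stay in the browser.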
This is the future we are working towards: a world where developers are empowered with the tools and data to build powerful AI systems that respect user privacy by design. This is exactly why we build our datasets, and we're thrilled to showcase this project that turns the principles of data privacy into a practical, deployable solution.
@CohereLabs just released Tiny Aya: a fully open-source 3B parameter model that speaks 70+ languages! But there's a catch:
Tiny Aya is just a language model. It doesn't support tool calling, the key capability that turns frontier models into powerful *agents*. So the real question is:
How hard is it to turn Tiny Aya into an agent?
Turns out… it's simple, thanks to Hugging Face TRL. We're sharing a hands-on example showing how to train Tiny Aya to turn it into a tool-calling agent using TRL, unlocking what could become the first *massively multilingual open agent*.
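To make concrete what "training for tool calling" targets, here is a sketch of the kind of training example involved: a tool schema plus a conversation where the assistant turn is a structured tool call rather than free text. This is an assumed, generic chat format; the exact schema depends on the model's chat template and the TRL trainer used, and `get_weather` is a hypothetical tool.

```python
import json

# Hypothetical tool definition in the common JSON-schema style.
get_weather = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# One multilingual training example: the model learns that the correct
# assistant turn is a structured tool call, not a free-text answer.
example = {
    "tools": [get_weather],
    "messages": [
        {"role": "user", "content": "¿Qué tiempo hace en Madrid?"},
        {
            "role": "assistant",
            "tool_calls": [{
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": json.dumps({"city": "Madrid"}),
                },
            }],
        },
    ],
}

print(json.dumps(example, ensure_ascii=False, indent=2))
```

Fine-tuning on a corpus of examples like this, rendered through the model's chat template, is what teaches a plain language model to emit parseable tool calls.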