Juan Julián

juanjucm

AI & ML interests

Machine Learning Engineer

Recent Activity

New activity in internlm/JanusCoder-8B 7 days ago

Update `pipeline_tag` from `Image-Text-to-Text` to `Text-Generation`

#2 opened 7 days ago by juanjucm

upvoted an article 8 days ago

Article

Bringing Autonomous Driving RL to OpenEnv and TRL

12 days ago

published an article 17 days ago

Article

From Benchmark Theater to Real Performance: A Case for Goodput

17 days ago

New activity in nvidia/NVIDIA-Nemotron-Nano-12B-v2 27 days ago

Update tool parser scripts for vLLM v.0.15.0

#6 opened 27 days ago by juanjucm

replied to their post about 1 month ago

🔵 zai-org/GLM-4.7-Flash

https://ai.azure.com/catalog/models/zai-org-glm-4.7-flash

replied to their post about 1 month ago

🔵 unsloth/GLM-4.7-Flash-GGUF

https://ai.azure.com/catalog/models/unsloth-glm-4.7-flash-gguf

posted an update about 1 month ago

Post

273

Last week, zai-org dropped zai-org/GLM-4.7-Flash. Now, we bring it to Microsoft Foundry!

- 🏆 30B-A3B MoE, the strongest model in the 30B class. It excels at coding tasks, agentic workflows and reasoning.
- 🤏🏻 Lighter version of its 358B big brother, balancing performance and efficiency.

Not light enough for you? We are also adding unsloth's unsloth/GLM-4.7-Flash-GGUF to the catalog, with GPU and CPU support powered by llama.cpp 🔥

Go join the hype and deploy them from the Hugging Face collection on Microsoft Foundry!

2 replies

reacted to alvarobartt's post with 🔥 about 1 month ago

Post

3129

💥 hf-mem v0.4.1 now also estimates KV cache memory requirements for any context length and batch size with the --experimental flag!

uvx hf-mem --model-id ... --experimental will automatically pull the required information from the Hugging Face Hub to include the KV cache estimation, when applicable.

💡 Alternatively, you can also set the --max-model-len, --batch-size and --kv-cache-dtype arguments (à la vLLM) manually if preferred.
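For intuition on what such an estimate involves, here is a minimal sketch of the usual back-of-the-envelope KV cache formula for standard multi-head or grouped-query attention. It is not hf-mem's implementation: the function name, its parameters, and the example config values are all hypothetical, and architectures with sliding-window or latent attention would need a different calculation.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    max_model_len: int,
    batch_size: int,
    dtype_bytes: int = 2,
) -> int:
    """Rough KV cache size in bytes for plain MHA/GQA attention.

    Hypothetical helper for illustration only; it does not reflect
    how hf-mem computes its estimate.
    """
    # Factor of 2: one tensor for keys and one for values, per layer.
    per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes
    # Every token of every sequence in the batch holds its own entry.
    return per_token * max_model_len * batch_size


# Assumed example config (not taken from any specific model):
# 32 layers, 8 KV heads, head_dim 128, 8192-token context, batch 1, 16-bit cache.
total = kv_cache_bytes(32, 8, 128, 8192, 1, dtype_bytes=2)
print(f"{total / 1024**3:.2f} GiB")  # prints "1.00 GiB"
```

The estimate grows linearly with both context length and batch size, which is why those two arguments matter so much when sizing a deployment.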

1 reply

reacted to sergiopaniego's post with 🔥 about 1 month ago

Post

2594

New TRL + OpenEnv example! 💥

Fine-tune an LLM to play Sudoku using an RL environment via OpenEnv

Includes a script that runs on 1 or multiple GPUs with vLLM, plus a Colab-ready notebook.

Enjoy!

Notebook: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/openenv_sudoku_grpo.ipynb

Script: https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/sudoku.py

1 reply

upvoted an article about 2 months ago

Article

Security, Governance and Performance for Dell On-Prem AI Builders

Jan 21

published an article about 2 months ago

Article

Security, Governance and Performance for Dell On-Prem AI Builders

Jan 21

upvoted an article about 2 months ago

Article

VLM-OCR Recipes on GPU Infrastructure

Jan 15

reacted to pagezyhf's post with 🔥 4 months ago

Post

2919

🚀 Big news for AI builders!

We’re thrilled to announce that the Qwen3-VL family of vision-language models is now available on Azure AI Foundry, thanks to our collaboration with Microsoft.

We bring open-source innovation to enterprise-grade AI infrastructure, making it easier than ever for enterprises to deploy and scale the latest and greatest models from Hugging Face securely within Azure.

🔍 Highlights:

- Deploy Qwen3-VL instantly via managed endpoints
- Built-in governance, telemetry, and lifecycle management
- True multimodal reasoning — vision, language, and code understanding
- State-of-the-art performance, outperforming closed-source models like Gemini 2.5 Pro and GPT-5
- Available in both *Instruct* and *Thinking* modes, across 24 model sizes

👉 Get started today: search for Qwen3-VL in the Hugging Face Collection on Azure AI Foundry.

1 reply

upvoted an article 6 months ago

Article

Prefill and Decode for Concurrent Requests - Optimizing LLM Performance

Apr 16, 2025

reacted to pagezyhf's post with 🚀 8 months ago

Post

1574

As part of our push to make more models available on Azure, we recently added SmolLM v3 to the catalog! 🚀

@juanjucm wrote a really detailed guide on how to deploy on Azure AI 🤗

https://huggingface.co/docs/microsoft-azure/azure-ai/examples/deploy-smollm3

If you want to see other models, please let us know.