AI & ML interests

Open science and open source

merterbak 
posted an update 3 days ago
OpenAI is now open again! Check out OpenAI’s brand new gpt‑oss‑20b model hosted on ZeroGPU 🤗

merterbak/gpt-oss-20b-demo
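For a quick local try (the ZeroGPU Space above needs no setup at all), a minimal sketch with transformers, assuming the Hub repo id openai/gpt-oss-20b and a GPU with enough memory:

```python
# Sketch: run gpt-oss-20b locally via transformers.
# Assumes the Hub repo id "openai/gpt-oss-20b"; the ZeroGPU demo above
# requires no local setup.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain mixture-of-experts in two sentences."}]
print(generator(messages, max_new_tokens=128)[0]["generated_text"])
```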
AtAndDev 
posted an update 17 days ago
Qwen 3 Coder is a personal attack on K2, and I love it.
It achieves near-SOTA on LiveCodeBench (LCB) without being a reasoning model.
People are finally understanding that reasoning isn't necessary for high benchmark scores...

Qwen ftw!

DECENTRALIZE DECENTRALIZE DECENTRALIZE
jeffboudier 
posted an update about 2 months ago
AMD summer hackathons are here!
A chance to get hands-on with MI300X GPUs and accelerate models.
🇫🇷 Paris - Station F - July 5-6
🇮🇳 Mumbai - July 12-13
🇮🇳 Bengaluru - July 19-20

Hugging Face and GPU Mode will be on site, and on July 6 in Paris @ror will share lessons learned while building new kernels to accelerate Llama 3.1 405B on ROCm.

Register for the Paris event: https://lu.ma/fmvdjmur?tk=KeAbiP
All dates: https://lu.ma/calendar/cal-3sxhD5FdxWsMDIz
jeffboudier 
posted an update about 2 months ago
Today we launched Training Cluster as a Service to make the new DGX Cloud Lepton supercloud easily accessible to AI researchers.

Hugging Face will collaborate with NVIDIA to provision and set up GPU training clusters to make them available for the duration of training runs.

Hugging Face organizations can sign up here: https://huggingface.co/training-cluster
AtAndDev 
posted an update 2 months ago
deepseek-ai/DeepSeek-R1-0528

This is the end
jeffboudier 
posted an update 3 months ago
Wrapping up a week of shipping and announcements with Dell Enterprise Hub now featuring AI Applications, on-device models for AI PCs, a new CLI and Python SDK... all you need for building AI on premises!

Blog post has all the details: https://huggingface.co/blog/dell-ai-applications
jeffboudier 
posted an update 3 months ago
Transcribing 1 hour of audio for less than $0.01 🤯

@mfuntowicz cooked with 8x faster Whisper speech recognition - whisper-large-v3-turbo transcribes at 100x real time on a $0.80/hr L4 GPU!

How they did it: https://huggingface.co/blog/fast-whisper-endpoints

1-click deploy with HF Inference Endpoints: https://endpoints.huggingface.co/new?repository=openai%2Fwhisper-large-v3-turbo&vendor=aws&region=us-east&accelerator=gpu&instance_id=aws-us-east-1-nvidia-l4-x1&task=automatic-speech-recognition&no_suggested_compute=true
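The math checks out: at 100x real time, one hour of audio transcribes in about 36 seconds, which is roughly $0.008 of L4 time at $0.80/hr. Once deployed, a minimal sketch of querying the endpoint with huggingface_hub (the endpoint URL below is a placeholder you copy from your endpoint page):

```python
# Sketch: transcribe an audio file against a deployed Whisper endpoint.
# ENDPOINT_URL is a placeholder; use the URL shown after the 1-click deploy.
from huggingface_hub import InferenceClient

ENDPOINT_URL = "https://YOUR-ENDPOINT.endpoints.huggingface.cloud"

client = InferenceClient(model=ENDPOINT_URL, token="hf_...")  # your HF token
result = client.automatic_speech_recognition("meeting.wav")
print(result.text)
```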
merterbak 
posted an update 3 months ago
Seed-Coder has been released: a family of models designed for coding tasks, with base, instruct, and reasoning variants at the 8B parameter scale, developed by the ByteDance Seed team. Unlike traditional open-source LLMs that rely on human-crafted rules or annotated data to curate code pretraining data, Seed-Coder introduces a model-centric data pipeline. The pipeline processes raw data from GitHub and web archives into four categories: file-level code, repository-level code, GitHub commits, and code-related web data. A quality-filter LLM evaluates code for readability, modularity, clarity, and reusability, removing the lowest-scoring 10% to create a 6-trillion-token dataset spanning 89 programming languages (a sketch of this filtering idea follows the links below).
Models: ByteDance-Seed/seed-coder-680de32c15ead6555c75b0e4
Github: https://github.com/ByteDance-Seed/Seed-Coder/tree/master
Paper: https://github.com/ByteDance-Seed/Seed-Coder/blob/master/Seed-Coder.pdf
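The filtering step is easy to picture in code: score each file with a judge LLM, then drop the bottom decile. A minimal sketch of the idea, where score_with_llm is a hypothetical placeholder rather than ByteDance's actual filter model:

```python
# Sketch of model-centric quality filtering: score code files with a judge
# LLM on readability/modularity/clarity/reusability, drop the lowest 10%.
# score_with_llm is a hypothetical placeholder, not the Seed-Coder pipeline.
import numpy as np

def score_with_llm(code: str) -> float:
    """Hypothetical judge: return a quality score in [0, 1]."""
    raise NotImplementedError  # a quality-filter LLM would be called here

def filter_bottom_decile(files: list[str]) -> list[str]:
    scores = np.array([score_with_llm(f) for f in files])
    cutoff = np.quantile(scores, 0.10)  # 10th-percentile threshold
    return [f for f, s in zip(files, scores) if s >= cutoff]
```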
merterbak 
posted an update 3 months ago
Microsoft released their new fine-tuned Phi-4 models with reasoning data yesterday. They rival or outperform much larger models. Check them out if you haven't yet. 🚀

Phi-4-mini-reasoning (SFT): microsoft/Phi-4-mini-reasoning
Phi-4-reasoning (SFT): microsoft/Phi-4-reasoning
Phi-4-reasoning-plus (SFT + RL): microsoft/Phi-4-reasoning-plus
Demo: https://github.com/marketplace/models/azureml/Phi-4-reasoning/playground
Articles: https://arxiv.org/pdf/2504.21318
https://arxiv.org/pdf/2504.21233
Blog: https://azure.microsoft.com/en-us/blog/one-year-of-phi-small-language-models-making-big-leaps-in-ai/
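A minimal sketch of chatting with the smallest variant via transformers, assuming the Hub repo id microsoft/Phi-4-mini-reasoning from the links above:

```python
# Sketch: query Phi-4-mini-reasoning with transformers.
# Assumes the Hub repo id "microsoft/Phi-4-mini-reasoning" linked above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-mini-reasoning"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "If 3x + 5 = 20, what is x?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```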

mkluczek 
posted an update 3 months ago
Expansion of Global and Dense Open Embeddings Dataset of Earth 🌍

We updated our previous embeddings release with three new models (MMEarth, DeCUR-S2, and DeCUR-S1) for the Major TOM embeddings dataset, developed in collaboration with CloudFerro S.A., asterisk labs, and ESA's Φ-lab (European Space Agency). Together with @mikonvergence and Jędrzej S. Bojanowski, we extend the open-access collection of Copernicus embeddings built at global scale, providing dense coverage across the entire acquisition area of the Sentinel-1 and Sentinel-2 sensors.

Total embedding resources after the update:
- 51 TB of AI-embeddings generated from processed Sentinel data,
- over 40 billion embedding vectors,
- processing of 147 TB of raw satellite data,
- analysis covering more than 15 million Sentinel-1 and Sentinel-2 scenes and more than 16 trillion pixels.

This project delivers open and free vectorized expansions of the Major TOM datasets, available on CREODIAS and Hugging Face, setting a new standard for embedding releases and enabling lightweight, scalable ingestion of Earth Observation (EO) data for countless applications (see the streaming sketch after the dataset list below).

Datasets:
Major-TOM/Core-S2L2A-MMEarth
Major-TOM/Core-S2L1C-DeCUR
Major-TOM/Core-S1RTC-DeCUR
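One way to explore these without downloading terabytes is to stream a few rows; a minimal sketch below, with the caveat that specific column names are not assumed here (check the dataset cards for the actual schema):

```python
# Sketch: stream a handful of rows from a Major TOM embeddings dataset
# instead of downloading the full release. Inspect the schema first;
# no column names are assumed.
from datasets import load_dataset

ds = load_dataset("Major-TOM/Core-S2L2A-MMEarth", split="train", streaming=True)
for row in ds.take(3):
    print(sorted(row.keys()))  # see the real schema before using columns
```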


#EarthObservation #AI #CloudFerro #asterisklabs #ESA
merterbak 
posted an update 3 months ago
Qwen 3 models released🔥
It offers 2 MoE and 6 dense models with the following parameter sizes: 0.6B, 1.7B, 4B, 8B, 14B, 30B (MoE), 32B, and 235B (MoE).
Models: Qwen/qwen3-67dd247413f0e2e4f653967f
Blog: https://qwenlm.github.io/blog/qwen3/
Demo: Qwen/Qwen3-Demo
GitHub: https://github.com/QwenLM/Qwen3

✅ Pre-trained on 36 trillion tokens across 119 languages and dialects, with strong translation and instruction-following abilities. (Qwen2.5 was pre-trained on 18 trillion tokens.)
✅ Qwen3 dense models match the performance of larger Qwen2.5 models. For example, Qwen3-1.7B/4B/8B/14B/32B perform like Qwen2.5-3B/7B/14B/32B/72B.
✅ Three-stage pretraining:
• Stage 1: General language learning and knowledge building.
• Stage 2: Reasoning boost with STEM, coding, and logic skills.
• Stage 3: Long-context training.
✅ Supports MCP (Model Context Protocol) in the model
✅ Strong agent skills
✅ Supports seamless switching between thinking mode (for hard tasks like math and coding) and non-thinking mode (for fast chatting) via the chat template (see the sketch below).
✅ Better human alignment for creative writing, roleplay, multi-turn conversations, and following detailed instructions.
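The thinking/non-thinking switch lives in the chat template. A minimal sketch, assuming the enable_thinking flag documented in the Qwen3 model cards and the Hub repo id Qwen/Qwen3-4B:

```python
# Sketch: toggle Qwen3 between thinking and non-thinking mode via the
# chat template's enable_thinking flag (per the Qwen3 model cards).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-4B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Prove the sum of two even numbers is even."}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # set False for fast, non-thinking chat
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(
    outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True
))
```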
merterbak 
posted an update 4 months ago
FlowReasoner is a new system that builds a custom set of small AI agents for every user question. Unlike search-based methods, it uses reasoning-driven optimization with external execution feedback.

✅ First, it distills reasoning data from DeepSeek-R1-671B to build multi-agent systems. 🤖
✅ Then, that reasoning data is used to fine-tune DeepSeek-R1-Distill-Qwen-7B via supervised fine-tuning for basic reasoning skills. 💡
✅ Finally, RL with GRPO (which optimizes by comparing groups of responses to the same query/task) further improves reasoning (a sketch of the group-relative advantage follows the links below).

FlowReasoner: Reinforcing Query-Level Meta-Agents (2504.15257)
Code: https://github.com/sail-sg/flowreasoner
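GRPO's core trick fits in a few lines: sample a group of responses per query, then normalize each response's reward against the group mean and standard deviation to get advantages, so no learned value model is needed. A minimal sketch of that advantage computation only, not the full RL loop:

```python
# Sketch: GRPO-style group-relative advantages. Rewards for one query's
# response group are normalized within the group, so responses scoring
# above the group mean get positive advantage. Full RL loop omitted.
import numpy as np

def group_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """rewards: shape (group_size,), scores for one query's response group."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

rewards = np.array([0.2, 0.9, 0.4, 0.9])  # e.g. verifier/pass-rate scores
print(group_advantages(rewards))          # better-than-average -> positive
```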