AI & ML interests

Research and Policy Questions raised by (open) AI technology

Recent Activity

Updated text

#1 opened 9 days ago by
cosborne
giadapย 
posted an update 11 days ago
view post
Post
2948
๐Ÿ’ฌ From Replika to everyday chatbots, millions of people are forming emotional bonds with AI, sometimes seeking comfort, sometimes seeking intimacy. But what happens when an AI tells you "I understand how you feel" and you actually believe it?

At Hugging Face, together with @frimelle and @yjernite , we dug into something we felt wasn't getting enough attention: the need to evaluate AI companionship behaviors. These are the subtle ways AI systems validate us, engage with us, and sometimes manipulate our emotional lives.

Here's what we found:
๐Ÿ‘‰ Existing benchmarks (accuracy, helpfulness, safety) completely miss this emotional dimension.
๐Ÿ‘‰ We mapped how leading AI systems actually respond to vulnerable prompts. ๐Ÿ‘‰ We built the Interactions and Machine Attachment Benchmark (INTIMA): a first attempt at evaluating how models handle emotional dependency, boundaries, and attachment (with a full paper coming soon).

Check out the blog post: https://huggingface.co/blog/giadap/evaluating-companionship

๐Ÿšข We also shipped two visualization tools with Gradio to see how different models behave when things get emotionally intense:
- AI-companionship/intima-responses-2D
- giadap/INTIMA-responses
yjerniteย 
posted an update 12 days ago
view post
Post
3974
๐—™๐—ถ๐—ฟ๐˜€๐˜ ๐—š๐—ฃ๐—”๐—œ ๐— ๐—ผ๐—ฑ๐—ฒ๐—น ๐˜„๐—ถ๐˜๐—ต ๐—˜๐—จ ๐——๐—ฎ๐˜๐—ฎ ๐—ง๐—ฟ๐—ฎ๐—ป๐˜€๐—ฝ๐—ฎ๐—ฟ๐—ฒ๐—ป๐—ฐ๐˜† ๐—ง๐—ฒ๐—บ๐—ฝ๐—น๐—ฎ๐˜๐—ฒ? ๐Ÿ‡ช๐Ÿ‡บ

With the release of the EU data transparency template this week, we finally got to see one of the most meaningful artifacts to come out of the AI Act implementation so far (haven't you heard? AI's all about the data! ๐Ÿ“Š๐Ÿ“š)

The impact of the template will depend on how effectively it establishes a minimum meaningful transparency standard for companies that don't otherwise offer any transparency into their handling of e.g. personal data or (anti?-)competitive practices in commercial licensing - we'll see how those play out as new models are released after August 2nd ๐Ÿ‘€


In the meantime, I wanted to see how the template works for a fully open-source + commercially viable model, so I filled it out for the SmolLM3 - which my colleagues at Hugging Face earlier this month ๐Ÿค— ICYMI, it's fully open-source with 3B parameters and performance matching the best similar-size models (I've switched all my local apps from Qwen3 to it, you should too ๐Ÿ’ก)

Verdict: congrats to the European Commission AI Office for making it so straightforward! Fully open and transparent models remain a cornerstone of informed regulation and governance, but the different organizational needs of their developers aren't always properly accounted for in new regulation. In this case, it took me all of two hours to fill out and publish the template (including reading the guidelines) - so kudos for making it feasible for smaller and distributed organizations ๐Ÿ™Œ Definitely a step forward for transparency ๐Ÿ”

To learn more have a look at:

- The SmolLM3 model: HuggingFaceTB/SmolLM3-3B
- Its filled out Public Summary of Training Content: hfmlsoc/smollm3-eu-data-transparency
- And if you're interested, some previous remarks on regulatory minimum meaningful standards for data disclosure: https://huggingface.co/blog/yjernite/naiac-data-transparency
yjerniteย 
updated a Space 15 days ago
giadapย 
posted an update 22 days ago
view post
Post
1240
๐Ÿค– Technology means power, and whoever owns the technology owns the power.

Thrilled to share insights from my recent interview with MIT Technology Review about the growing movement toward local LLMs and what it means for AI democratization. Read here: https://www.technologyreview.com/2025/07/17/1120391/how-to-run-an-llm-on-your-laptop/

๐Ÿค” Why this matters: When we use "free" online AI services, we're often the product. Our conversations become training data, our personal stories get "cooked into" models, and our privacy becomes a commodity. But there's an alternative path forward.

๐Ÿ’ก The power shift is real: Local LLMs aren't just about privacy; they're about redistributing AI power away from a handful of tech giants. When individuals, organizations, and even entire nations can run their own models, we're democratizing access to AI capabilities.

๐Ÿค— At Hugging Face, we're proud to be at the center of this transformation. Our platform hosts the world's largest library of freely downloadable models, making cutting-edge AI accessible to everyone -- from researchers and developers to curious individuals who want to experiment on their laptops or even smartphones.

The technical barriers that once required $$$ server racks are crumbling. Today, anyone with basic computer skills can download a model, run it locally, and maintain complete control over their AI interactions. No sudden algorithm changes, no data harvesting, no corporate gatekeeping.

This is about technical convenience, but especially about technological sovereignty. When AI power is concentrated in a few hands, we risk creating new forms of digital dependency. Local models offer a path toward genuine AI literacy and independence.

๐Ÿš€ The future of AI should be open, accessible, and in the hands of the many, not the few. What are your thoughts on AI democratization? Have you experimented with local models yet?
evijitย 
posted an update 25 days ago
view post
Post
276
New blog post alert! "What is the Hugging Face Community Building?", with @yjernite and @irenesolaiman

What 1.8 Million Models Reveal About Open Source Innovation: Our latest deep dive into the Hugging Face Hub reveals patterns that challenge conventional AI narratives:

๐Ÿ”— Models become platforms for innovation Qwen, Llama, and Gemma models have spawned entire ecosystems of specialized variants. Looking at derivative works shows community adoption better than any single metric.

๐Ÿ“Š Datasets reveal the foundation layer โ†’ Most downloaded datasets are evaluation benchmarks (MMLU, Squad, GLUE) โ†’ Universities and research institutions dominate foundational data โ†’ Domain-specific datasets thrive across finance, healthcare, robotics, and science โ†’ Open actors provide the datasets that power most AI development

๐Ÿ›๏ธ Research institutions lead the charge: AI2 (Allen Institute) emerges as one of the most active contributors, alongside significant activity from IBM, NVIDIA, and international organizations. The open source ecosystem spans far beyond Big Tech.

๐Ÿ” Interactive exploration tools: We've built several tools to help you discover patterns!

ModelVerse Explorer - organizational contributions
DataVerse Explorer - dataset patterns
Organization HeatMap - activity over time
Base Model Explorer - model family trees
Semantic Search - find models by capability

๐Ÿ“š Academic research is thriving: Researchers are already producing valuable insights, including recent work at FAccT 2025: "The Brief and Wondrous Life of Open Models." We've also made hub datasets, weekly snapshots, and other data available for your own analysis.

The bottom line: AI development is far more distributed, diverse, and collaborative than popular narratives suggest. Real innovation happens through community collaboration across specialized domains.

Read: https://huggingface.co/blog/evijit/hf-hub-ecosystem-overview
giadapย 
posted an update about 1 month ago
view post
Post
2265
I've been posting bits and pieces about this research, but now I can finally say: new paper alert ๐Ÿšจ

My colleague @brunatrevelin and I just shared a paper exploring why traditional consent frameworks are breaking down in AI contexts (forthcoming chapter in a collective book).

The current model places impossible burdens on users to manage countless consent decisions. Meanwhile, AI systems learn to mimic our voices and writing styles from data we unknowingly provided years ago.

What's next? We need to shift from individual responsibility to collective accountability.

This means:
- Organizations designing systems that respect human agency by default
- Developers building ethics into models from the start
- Policymakers creating frameworks beyond minimal compliance

Blog post: https://huggingface.co/blog/giadap/consentful-ai
Paper: Can AI be Consentful? (2507.01051)
  • 2 replies
ยท
giadapย 
posted an update about 2 months ago
view post
Post
1913
๐Ÿ—ฃ๏ธ Whose voice do we hear when AI speaks?

Every language carries its own cultural values and worldviews. So, when we build AI systems, we're not just deciding how they speak but also whose perspectives they represent.

Even choosing which dialect to train on in Norway becomes a question of inclusion and power. In Kenya, will AI speak Swahili from Nairobi or coastal regions? What about indigenous languages with rich oral traditions but limited written text, like Quechua in Peru or Cherokee in North America?

The path forward? Building WITH communities, not just FOR them. Working with local partners (libraries, universities, civil society), testing for cultural alignment, and asking hard questions about representation.

Just published some thoughts on this after my keynote in Norway a few weeks ago: https://huggingface.co/blog/giadap/when-ai-speaks
  • 1 reply
ยท