Abstract
The paper discusses underexplored privacy risks in Large Language Models (LLMs) beyond verbatim memorization, including data collection, inference-time context leakage, autonomous agent capabilities, and surveillance through deep inference attacks, and calls for a broader interdisciplinary approach to address these threats.
The discourse on privacy risks in Large Language Models (LLMs) has disproportionately focused on verbatim memorization of training data, while a constellation of more immediate and scalable privacy threats remain underexplored. This position paper argues that the privacy landscape of LLM systems extends far beyond training data extraction, encompassing risks from data collection practices, inference-time context leakage, autonomous agent capabilities, and the democratization of surveillance through deep inference attacks. We present a comprehensive taxonomy of privacy risks across the LLM lifecycle -- from data collection through deployment -- and demonstrate through case studies how current privacy frameworks fail to address these multifaceted threats. Through a longitudinal analysis of 1,322 AI/ML privacy papers published at leading conferences over the past decade (2016--2025), we reveal that while memorization receives outsized attention in technical research, the most pressing privacy harms lie elsewhere, where current technical approaches offer little traction and viable paths forward remain unclear. We call for a fundamental shift in how the research community approaches LLM privacy, moving beyond the narrow focus of current technical solutions and embracing interdisciplinary approaches that address the sociotechnical nature of these emerging threats.
Community
The community is over-fixated on memorization when it comes to privacy, leaving many new areas and attack vectors unexplored. We discuss deceptive consent mechanisms, inference risks, and other avenues opened by the use of LLMs as search engines, and we provide a roadmap with crystallized problems for the community.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Beyond Data Privacy: New Privacy Risks for Large Language Models (2025)
- User Privacy and Large Language Models: An Analysis of Frontier Developers' Privacy Policies (2025)
- Unique Security and Privacy Threats of Large Language Models: A Comprehensive Survey (2024)
- AI-in-the-Loop: Privacy Preserving Real-Time Scam Detection and Conversational Scambaiting by Leveraging LLMs and Federated Learning (2025)
- Assessing and Mitigating Data Memorization Risks in Fine-Tuned Large Language Models (2025)
- The Sum Leaks More Than Its Parts: Compositional Privacy Risks and Mitigations in Multi-Agent Collaboration (2025)
- Enterprise AI Must Enforce Participant-Aware Access Control (2025)