Abstract
The paper discusses underexplored privacy risks in Large Language Models (LLMs) beyond verbatim memorization, including data collection, inference-time context leakage, autonomous agent capabilities, and surveillance through deep inference attacks, and calls for a broader interdisciplinary approach to address these threats.
The discourse on privacy risks in Large Language Models (LLMs) has disproportionately focused on verbatim memorization of training data, while a constellation of more immediate and scalable privacy threats remain underexplored. This position paper argues that the privacy landscape of LLM systems extends far beyond training data extraction, encompassing risks from data collection practices, inference-time context leakage, autonomous agent capabilities, and the democratization of surveillance through deep inference attacks. We present a comprehensive taxonomy of privacy risks across the LLM lifecycle -- from data collection through deployment -- and demonstrate through case studies how current privacy frameworks fail to address these multifaceted threats. Through a longitudinal analysis of 1,322 AI/ML privacy papers published at leading conferences over the past decade (2016--2025), we reveal that while memorization receives outsized attention in technical research, the most pressing privacy harms lie elsewhere, where current technical approaches offer little traction and viable paths forward remain unclear. We call for a fundamental shift in how the research community approaches LLM privacy, moving beyond the narrow focus of current technical solutions and embracing interdisciplinary approaches that address the sociotechnical nature of these emerging threats.
Community
The community is over-fixated on memorization when it comes to privacy, leaving many new areas and attack vectors unexplored. We discuss deceptive consent mechanisms, inference risks, and other avenues opened by the use of LLMs as search engines, and we provide a roadmap with crystallized problems for the community.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Beyond Data Privacy: New Privacy Risks for Large Language Models (2025)
- User Privacy and Large Language Models: An Analysis of Frontier Developers' Privacy Policies (2025)
- Unique Security and Privacy Threats of Large Language Models: A Comprehensive Survey (2024)
- AI-in-the-Loop: Privacy Preserving Real-Time Scam Detection and Conversational Scambaiting by Leveraging LLMs and Federated Learning (2025)
- Assessing and Mitigating Data Memorization Risks in Fine-Tuned Large Language Models (2025)
- The Sum Leaks More Than Its Parts: Compositional Privacy Risks and Mitigations in Multi-Agent Collaboration (2025)
- Enterprise AI Must Enforce Participant-Aware Access Control (2025)