where $\Delta_j(\theta_j^t, z_j^t) = O_j(\theta_j^t, z_j^t) - \theta_j^t$ and $k_0 = j$. Here $P_j(\tau)$ denotes the set of all sequences $(k_1, \ldots, k_\tau)$ such that $k_s \in N_{\mathrm{out}}^{(1)}(k_{s-1})$ for $s = 1, \ldots, \tau$, and $H(\theta_{k_s}^{t+s}; z_{k_s}^{t+s})$ is the Hessian matrix of $L$ with respect to $\theta$, evaluated at $\theta_{k_s}^{t+s}$ and data $z_{k_s}^{t+s}$.
For the cases $\tau = 0$ and $\tau = 1$, the relevant product expressions are defined as identity matrices, thereby ensuring that the $r$-hop DICE-E remains well-defined.
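To make the path-sum structure concrete, below is a minimal NumPy sketch that enumerates the neighbor sequences in $P_j(\tau)$ and assembles, for one sequence, the mixing-weight product along the path together with the Hessian-dependent product $\prod_s (I - \eta H)$, with empty products taken as the identity. The function names (`out_neighbors`, `path_set`, `path_factors`), the `hessian_fn` callback, and the exact index range of the matrix product are illustrative assumptions; how these factors are combined into the $r$-hop DICE-E score is specified in the paper.

```python
import numpy as np


def out_neighbors(W, k):
    """One-hop out-neighbors N_out^(1)(k), assuming W[k, m] > 0 marks an edge k -> m."""
    return [m for m in range(W.shape[0]) if m != k and W[k, m] > 0]


def path_set(W, j, tau):
    """P_j(tau): all sequences (k_1, ..., k_tau) with k_0 = j and
    k_s in N_out^(1)(k_{s-1}) for s = 1, ..., tau."""
    seqs = [(j,)]
    for _ in range(tau):
        seqs = [p + (m,) for p in seqs for m in out_neighbors(W, p[-1])]
    return [p[1:] for p in seqs]  # drop k_0 = j from each sequence


def path_factors(W, j, path, eta, hessian_fn, dim):
    """For one sequence (k_1, ..., k_tau): the product of mixing weights along
    j -> k_1 -> ... -> k_tau, and the Hessian-dependent matrix product
    prod_s (I - eta * H(theta_{k_s}^{t+s}; z_{k_s}^{t+s})).
    Empty products (tau = 0 or tau = 1) are taken as the identity, as in the text."""
    nodes = (j,) + tuple(path)
    w_prod = 1.0
    for s in range(len(path)):
        w_prod *= W[nodes[s], nodes[s + 1]]
    M = np.eye(dim)
    for s in range(1, len(path)):       # assumed index range; the paper fixes the exact one
        H = hessian_fn(nodes[s], s)     # placeholder callback returning a (dim x dim) Hessian
        M = (np.eye(dim) - eta * H) @ M
    return w_prod, M
```

For instance, `path_set(W, j, 2)` returns every two-hop neighbor sequence reachable from node $j$, and `path_factors` then yields the corresponding weight and propagation matrix for each of them.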
Key Insights from DICE
Our theory uncovers the intricate interplay of factors that shape data influence in decentralized learning:
1. Asymmetric Influence and Topological Importance: The influence of identical data is not uniform across the network. Instead, nodes with greater topological significance exert stronger influence.
2. The Role of Intermediate Nodes and Loss Landscape: Intermediate nodes actively contribute to an "influence chain", and the local loss landscape at these nodes (through the Hessian terms above) shapes the influence as it propagates through the network.
3. Influence Cascades with Damped Decay: Data influence cascades through the network with a "damped decay" induced by the mixing matrix $W$. This decay, which can be exponential in the number of hops, ensures that influence remains "localized" (a toy illustration follows this list).
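As a rough numeric illustration of the third point (not an experiment from the paper), consider a ring graph with a uniform mixing matrix: the mixing-weight product picked up along an $r$-hop path shrinks geometrically with $r$, which is the kind of damping $W$ induces; the Hessian-dependent factors above can damp it further. The ring topology and the 1/3 weights are illustrative assumptions.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring: each node averages itself and
    its two neighbors with weight 1/3 (an assumed, commonly used choice)."""
    W = np.zeros((n, n))
    for k in range(n):
        for m in (k - 1, k, k + 1):
            W[k, m % n] = 1.0 / 3.0
    return W


W = ring_mixing_matrix(16)
for r in range(6):
    # Mixing-weight product along the r-hop path 0 -> 1 -> ... -> r: here (1/3)^r.
    path_weight = float(np.prod([W[s, s + 1] for s in range(r)]))
    print(f"r = {r}: path weight = {path_weight:.4f}")
```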
Citation
Cite Our Paper
If you find our work insightful, we would greatly appreciate it if you could cite our paper.
@inproceedings{zhu2025dice,
  title="{DICE: Data Influence Cascade in Decentralized Learning}",
  author="Tongtian Zhu and Wenhao Li and Can Wang and Fengxiang He",
  booktitle="The Thirteenth International Conference on Learning Representations",
  year="2025",
  url="https://openreview.net/forum?id=2TIYkqieKw"
}