Awesome Privacy-Preserving XAI

A Survey of Privacy-Preserving Model Explanations

I. Introduction

As the adoption of explainable AI (XAI) continues to expand, the urgency to address its privacy implications intensifies. Despite a growing body of research on AI privacy and explainability, little attention has been paid to privacy-preserving model explanations. This article presents the first comprehensive survey of privacy attacks on model explanations and their countermeasures. Our contribution comprises a thorough analysis of the research literature, together with a taxonomy that categorises privacy attacks and countermeasures by the explanations they target. This work also includes an initial investigation into the causes of privacy leaks. Finally, we discuss open issues and prospective research directions uncovered in our analysis. This survey aims to be a valuable resource for the research community and to offer clear insights to newcomers to the domain. To support ongoing research, we have established an online resource repository, which will be continuously updated with new and relevant findings.
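To make one cause of privacy leaks concrete, the following is a toy sketch under simplifying assumptions, not the method of any paper listed here. For a logistic-regression model with p = sigmoid(w·x + b), the input-gradient explanation is p(1 − p)·w, so its magnitude reveals how confident the model is at x. The function `guess_member` and its `threshold` are hypothetical; the attack assumes the model is more confident on its training points, as overfitted models typically are.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gradient_explanation(weights, bias, x):
    """Input-gradient attribution of p(y=1 | x) for logistic regression.

    With p = sigmoid(w.x + b), dp/dx = p * (1 - p) * w: the weight vector
    scaled by a factor that depends on the model's confidence at x.
    """
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    p = sigmoid(z)
    return [p * (1.0 - p) * w for w in weights]


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def guess_member(weights, bias, x, threshold):
    """Hypothetical threshold attack: flag x as a training member when the
    variance of its attribution vector is below `threshold`.

    Assumption: the model is more confident on training points, which
    shrinks p * (1 - p) and therefore the attribution variance.
    """
    return variance(gradient_explanation(weights, bias, x)) < threshold


if __name__ == "__main__":
    w, b = [2.0, -1.0, 0.5], 0.0
    print(guess_member(w, b, [3.0, 0.0, 0.0], threshold=0.01))  # True: confident prediction
    print(guess_member(w, b, [0.0, 0.0, 0.0], threshold=0.01))  # False: p = 0.5
```

The explanation never exposes the training data directly; the leak comes from the explanation encoding the model's confidence, which differs between members and non-members.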

II. List of Approaches

Total number of rows: 35

Title	Year	Venue	Target Explanations	Attacks	Defenses	Code
Please Tell Me More: Privacy Impact of Explainability through the Lens of Membership Inference Attack	2024	SP	Feature-based	Membership Inference	Differential Privacy, Privacy-Preserving Models, DP-SGD	-
On the Privacy Risks of Algorithmic Recourse	2023	AISTATS	Counterfactual	Membership Inference	Differential Privacy	-
The Privacy Issue of Counterfactual Explanations: Explanation Linkage Attacks	2023	TIST	Counterfactual	Linkage	Anonymisation	-
Feature-based Learning for Diverse and Privacy-Preserving Counterfactual Explanations	2023	KDD	Counterfactual	-	Perturbation	[Code]
Private Graph Extraction via Feature Explanations	2023	PETS	Feature-based	Graph Extraction	Perturbation	[Code]
Privacy-Preserving Algorithmic Recourse	2023	ICAIF	Counterfactual	-	Differential Privacy	-
Accurate, Explainable, and Private Models: Providing Recourse While Minimizing Training Data Leakage	2023	ICML-Workshop	Counterfactual	Membership Inference	Differential Privacy	-
Probabilistic Dataset Reconstruction from Interpretable Models	2023	arXiv	Interpretable Surrogates	Data Reconstruction	-	[Code]
DeepFixCX: Explainable privacy-preserving image compression for medical image analysis	2023	WIREs-DMKD	Case-based	Identity recognition	Anonymisation	[Code]
XorSHAP: Privacy-Preserving Explainable AI for Decision Tree Models	2023	Preprint	Shapley	-	Multi-party Computation	-
-	2023	Github	ALE plot	-	Differential Privacy	[Code]
Inferring Sensitive Attributes from Model Explanations	2022	CIKM	Gradient-based, Perturbation-based	Attribute Inference	-	[Code]
Model explanations with differential privacy	2022	FAccT	Feature-based	-	Differential Privacy	-
DualCF: Efficient Model Extraction Attack from Counterfactual Explanations	2022	FAccT	Counterfactual	Model Extraction	-	-
Feature Inference Attack on Shapley Values	2022	CCS	Shapley	Attribute/Feature Inference	Low-dimensional	-
Evaluating the privacy exposure of interpretable global explainers	2022	CogMI	Interpretable Surrogates	Membership Inference	-	-
Privacy-Preserving Case-Based Explanations: Enabling Visual Interpretability by Protecting Privacy	2022	IEEE Access	Example-based	-	Anonymisation	-
On the amplification of security and privacy risks by post-hoc explanations in machine learning models	2022	arXiv	Feature-based	Membership Inference	-	-
Differentially Private Counterfactuals via Functional Mechanism	2022	arXiv	Counterfactual	-	Differential Privacy	-
Differentially Private Shapley Values for Data Evaluation	2022	arXiv	Shapley	-	Differential Privacy	[Code]
Exploiting Explanations for Model Inversion Attacks	2021	ICCV	Gradient-based, Interpretable Surrogates	Model Inversion	-	-
On the Privacy Risks of Model Explanations	2021	AIES	Feature-based, Shapley, Counterfactual	Membership Inference	-	-
Adversarial XAI Methods in Cybersecurity	2021	TIFS	Counterfactual	Membership Inference	-	-
MEGEX: Data-Free Model Extraction Attack against Gradient-Based Explainable AI	2021	arXiv	Gradient-based	Model Extraction	-	[Code]
Robust Counterfactual Explanations for Privacy-Preserving SVM	2021	ICML-Workshop	Counterfactual	-	Private SVM	[Code]
When Differential Privacy Meets Interpretability: A Case Study	2021	RCV-CVPR	Interpretable Models	-	Differential Privacy	-
Differentially Private Quantiles	2021	ICML	Quantiles	-	Differential Privacy	[Code]
FOX: Fooling with Explanations: Privacy Protection with Adversarial Reactions in Social Media	2021	PST	-	Attribute Inference	Privacy-Protecting Explanation	-
Privacy-preserving generative adversarial network for case-based explainability in medical image analysis	2021	IEEE Access	Example-based	-	Generative Anonymisation	-
Interpretable and Differentially Private Predictions	2020	AAAI	Locally linear maps	-	Differential Privacy	[Code]
Model extraction from counterfactual explanations	2020	arXiv	Counterfactual	Model Extraction	-	[Code]
Model Reconstruction from Model Explanations	2019	FAT*	Gradient-based	Model Reconstruction, Model Extraction	-	-
Interpret Federated Learning with Shapley Values	2019	-	Shapley	-	Federated	[Code]
Collaborative Explanation of Deep Models with Limited Interaction for Trade Secret and Privacy Preservation	2019	WWW	Feature-based	-	Collaborative rule-based model	-
Model inversion attacks that exploit confidence information and basic countermeasures	2015	CCS	Confidence scores	Reconstruction, Model Inversion	-	-
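Differential privacy is the most frequent defense in the table above. As a rough illustration, and not the method of any listed paper, the following sketch applies the textbook Laplace mechanism to a feature-attribution vector. The function names and the `l1_sensitivity` bound are assumptions introduced here; a real system must justify that bound for its specific explainer.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with rate 1/scale
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_attribution(attribution, l1_sensitivity, epsilon, seed=None):
    """Release a feature-attribution vector with the Laplace mechanism.

    `l1_sensitivity` is an assumed bound on how much the whole vector can
    change (in L1 norm) when one training record is added or removed;
    deriving that bound for a real explainer is not done here.
    """
    if epsilon <= 0 or l1_sensitivity <= 0:
        raise ValueError("epsilon and l1_sensitivity must be positive")
    rng = random.Random(seed)
    scale = l1_sensitivity / epsilon  # noise scale b = sensitivity / epsilon
    return [a + laplace_noise(scale, rng) for a in attribution]


if __name__ == "__main__":
    attribution = [0.4, -0.1, 0.2]  # toy attribution scores
    print(private_attribution(attribution, l1_sensitivity=0.1, epsilon=1.0, seed=0))
```

A smaller `epsilon` gives a stronger privacy guarantee but noisier attributions, which is the usual privacy-utility trade-off of differential privacy.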

III. Citations

Source: https://github.com/tamlhp/awesome-privex

Paper: https://arxiv.org/abs/2404.00673