A Survey of Privacy-Preserving Model Explanations
Awesome Privacy-Preserving Explainable AI
I. Introduction
As the adoption of explainable AI (XAI) continues to expand, the urgency of addressing its privacy implications intensifies. Despite a growing corpus of research in AI privacy and explainability, little attention has been paid to privacy-preserving model explanations. This article presents the first thorough survey of privacy attacks on model explanations and their countermeasures. Our contribution comprises an in-depth analysis of research papers, together with a connected taxonomy that categorises privacy attacks and countermeasures by the explanations they target. This work also includes an initial investigation into the causes of privacy leaks. Finally, we discuss unresolved issues and prospective research directions uncovered in our analysis. This survey aims to be a valuable resource for the research community and offers clear insights for those new to this domain. To support ongoing research, we have established an online resource repository, which will be continuously updated with new and relevant findings.
II. List of Approaches
Total number of rows: 36. Two hedged code sketches of a representative attack and defense follow the table.
| Title | Year | Venue | Target Explanations | Attacks | Defenses | Code |
|-------|------|-------|---------------------|---------|----------|------|
| Please Tell Me More: Privacy Impact of Explainability through the Lens of Membership Inference Attack | 2024 | SP | Feature-based | Membership Inference | Differential Privacy, Privacy-Preserving Models, DP-SGD | - |
| Towards a Game-theoretic Understanding of Explanation-based Membership Inference Attacks | 2024 | arXiv | Feature-based | Membership Inference | Game Theory | - |
| On the Privacy Risks of Algorithmic Recourse | 2023 | AISTATS | Counterfactual | Membership Inference | Differential Privacy | - |
| The Privacy Issue of Counterfactual Explanations: Explanation Linkage Attacks | 2023 | TIST | Counterfactual | Linkage | Anonymisation | - |
| Feature-based Learning for Diverse and Privacy-Preserving Counterfactual Explanations | 2023 | KDD | Counterfactual | - | Perturbation | [Code] |
| Private Graph Extraction via Feature Explanations | 2023 | PETS | Feature-based | Graph Extraction | Perturbation | [Code] |
| Privacy-Preserving Algorithmic Recourse | 2023 | ICAIF | Counterfactual | - | Differential Privacy | - |
| Accurate, Explainable, and Private Models: Providing Recourse While Minimizing Training Data Leakage | 2023 | ICML-Workshop | Counterfactual | Membership Inference | Differential Privacy | - |
| Probabilistic Dataset Reconstruction from Interpretable Models | 2023 | arXiv | Interpretable Surrogates | Data Reconstruction | - | [Code] |
| DeepFixCX: Explainable privacy-preserving image compression for medical image analysis | 2023 | WIREs-DMKD | Case-based | Identity Recognition | Anonymisation | [Code] |
| XorSHAP: Privacy-Preserving Explainable AI for Decision Tree Models | 2023 | Preprint | Shapley | - | Multi-party Computation | - |
| DP-XAI | 2023 | GitHub | ALE plot | - | Differential Privacy | [Code] |
| Inferring Sensitive Attributes from Model Explanations | 2022 | CIKM | Gradient-based, Perturbation-based | Attribute Inference | - | [Code] |
| Model explanations with differential privacy | 2022 | FAccT | Feature-based | - | Differential Privacy | - |
| DualCF: Efficient Model Extraction Attack from Counterfactual Explanations | 2022 | FAccT | Counterfactual | Model Extraction | - | - |
| Feature Inference Attack on Shapley Values | 2022 | CCS | Shapley | Attribute/Feature Inference | Low-dimensional | - |
| Evaluating the privacy exposure of interpretable global explainers / Privacy Risk of Global Explainers | 2022 | CogMI | Interpretable Surrogates | Membership Inference | - | - |
| Privacy-Preserving Case-Based Explanations: Enabling Visual Interpretability by Protecting Privacy | 2022 | IEEE Access | Example-based | - | Anonymisation | - |
| On the amplification of security and privacy risks by post-hoc explanations in machine learning models | 2022 | arXiv | Feature-based | Membership Inference | - | - |
| Differentially Private Counterfactuals via Functional Mechanism | 2022 | arXiv | Counterfactual | - | Differential Privacy | - |
| Differentially Private Shapley Values for Data Evaluation | 2022 | arXiv | Shapley | - | Differential Privacy | [Code] |
| Exploiting Explanations for Model Inversion Attacks | 2021 | ICCV | Gradient-based, Interpretable Surrogates | Model Inversion | - | - |
| On the Privacy Risks of Model Explanations | 2021 | AIES | Feature-based, Shapley, Counterfactual | Membership Inference | - | - |
| Adversarial XAI Methods in Cybersecurity | 2021 | TIFS | Counterfactual | Membership Inference | - | - |
| MEGEX: Data-Free Model Extraction Attack against Gradient-Based Explainable AI | 2021 | arXiv | Gradient-based | Model Extraction | - | [Code] |
| Robust Counterfactual Explanations for Privacy-Preserving SVM / Robust Explanations for Private Support Vector Machines | 2021 | ICML-Workshop | Counterfactual | - | Private SVM | [Code] |
| When Differential Privacy Meets Interpretability: A Case Study | 2021 | RCV-CVPR | Interpretable Models | - | Differential Privacy | - |
| Differentially Private Quantiles | 2021 | ICML | Quantiles | - | Differential Privacy | [Code] |
| FOX: Fooling with Explanations: Privacy Protection with Adversarial Reactions in Social Media | 2021 | PST | - | Attribute Inference | Privacy-Protecting Explanation | - |
| Privacy-preserving generative adversarial network for case-based explainability in medical image analysis | 2021 | IEEE Access | Example-based | - | Generative Anonymisation | - |
| Interpretable and Differentially Private Predictions | 2020 | AAAI | Locally linear maps | - | Differential Privacy | [Code] |
| Model extraction from counterfactual explanations | 2020 | arXiv | Counterfactual | Model Extraction | - | [Code] |
| Model Reconstruction from Model Explanations | 2019 | FAT* | Gradient-based | Model Reconstruction, Model Extraction | - | - |
| Interpret Federated Learning with Shapley Values | 2019 | - | Shapley | - | Federated Learning | [Code] |
| Collaborative Explanation of Deep Models with Limited Interaction for Trade Secret and Privacy Preservation | 2019 | WWW | Feature-based | - | Collaborative rule-based model | - |
| Model inversion attacks that exploit confidence information and basic countermeasures | 2015 | CCS | Confidence scores | Reconstruction, Model Inversion | - | - |
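The table's most common attack and defense can be made concrete with two short sketches. First, a hedged, minimal example of explanation-based membership inference (the attack appearing most often above): it thresholds the variance of a gradient-based explanation to guess training-set membership. The logistic-regression target model, the analytic input gradient, and the median threshold are all illustrative assumptions, not any listed paper's exact method.

```python
# Minimal sketch of an explanation-based membership inference attack, in the
# spirit of the threshold attacks studied in "On the Privacy Risks of Model
# Explanations". All modelling choices here are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_mem, X_non, y_mem, y_non = train_test_split(X, y, test_size=0.5, random_state=0)

target = LogisticRegression(max_iter=1000).fit(X_mem, y_mem)  # trained on "members" only

def gradient_explanation(model, x):
    """Saliency-style explanation for logistic regression: the input gradient
    of the positive-class probability, d p / d x = p * (1 - p) * w."""
    p = model.predict_proba(x.reshape(1, -1))[0, 1]
    return p * (1.0 - p) * model.coef_[0]

def membership_score(model, x):
    # Signal: the spread of the explanation vector. Members tend to receive
    # more confident predictions, which shrinks p * (1 - p) and with it the
    # variance of the gradient explanation.
    return np.var(gradient_explanation(model, x))

scores_mem = np.array([membership_score(target, x) for x in X_mem[:200]])
scores_non = np.array([membership_score(target, x) for x in X_non[:200]])

# Attack decision: "member" if the score is below a threshold. The median of
# pooled scores is used here for simplicity; a real adversary would calibrate
# on shadow models or auxiliary data.
threshold = np.median(np.concatenate([scores_mem, scores_non]))
accuracy = 0.5 * ((scores_mem < threshold).mean() + (scores_non >= threshold).mean())
print(f"membership inference accuracy: {accuracy:.3f}")
```

Second, a hedged sketch of the most common defense, differential privacy, in its simplest output-perturbation form: bound the sensitivity of the released explanation by clipping, then add Laplace noise. The clipping bound and the worst-case L1 sensitivity (2 × clip × d) are assumptions made for illustration; the surveyed mechanisms (e.g., DP-SGD training or the functional mechanism for counterfactuals) are more refined.

```python
# Minimal output-perturbation sketch of a differentially private explanation
# release, under stated sensitivity assumptions. Not the mechanism of any
# specific paper in the table above.
import numpy as np

def dp_release_explanation(attributions, epsilon, clip=1.0, rng=None):
    """Release an epsilon-DP version of a feature-attribution vector, assuming
    clipping to [-clip, clip] bounds the L1 distance between explanations on
    neighbouring datasets by 2 * clip * d."""
    rng = rng or np.random.default_rng()
    a = np.clip(np.asarray(attributions, dtype=float), -clip, clip)
    sensitivity = 2.0 * clip * a.size  # worst case: every coordinate flips sign
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=a.shape)
    return a + noise

# Usage with a hypothetical saliency vector: larger epsilon means less noise
# and weaker privacy; the conservative sensitivity bound makes the noise heavy.
print(dp_release_explanation(np.array([0.8, -0.1, 0.3, 0.05]), epsilon=1.0))
```

Even in these toy versions the core tension is visible: noise large enough to mask membership signals also degrades the fidelity of the explanation itself.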
III. Citations
Source: https://github.com/tamlhp/awesome-privex
Paper: https://arxiv.org/abs/2404.00673
© 2024 Privacy-Preserving Explainable AI  