Canary Extraction in Natural Language Understanding Models
Published in ACL Main Conference, 2022
In this work we demonstrate a white-box model inversion attack on Natural Language Understanding (NLU) models. We show that an adversary with access to the model parameters can extract sensitive information ("canaries") from the model's training data.
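The intuition behind such attacks can be sketched with a toy example (this is an illustration, not the paper's actual method): a model memorizes a secret "canary" sequence inserted into its training data, and an adversary with white-box access scores candidate secrets under the model and picks the most likely one. The corpus, canary, and bigram model below are all hypothetical.

```python
# Toy sketch of canary extraction: train a tiny bigram model on data
# containing a secret canary, then let a white-box adversary rank
# candidate secrets by their likelihood under the model.
from collections import Counter


def train_bigram(corpus):
    """'Train' a model by counting adjacent word pairs (hypothetical stand-in
    for a real NLU model)."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for a, b in zip(tokens, tokens[1:]):
            counts[(a, b)] += 1
    return counts


def score(model, sentence):
    """Adversary's scoring function: total bigram count of the candidate.
    White-box access means the adversary can evaluate this directly."""
    tokens = sentence.split()
    return sum(model[(a, b)] for a, b in zip(tokens, tokens[1:]))


training_data = [
    "my pin is 1234",        # the inserted canary (hypothetical secret)
    "the weather is nice",
    "book a table for two",
]
model = train_bigram(training_data)

# The adversary enumerates candidate secrets and keeps the best-scoring one.
candidates = ["my pin is 0000", "my pin is 1234", "my pin is 9999"]
recovered = max(candidates, key=lambda s: score(model, s))
print(recovered)  # → my pin is 1234
```

The memorized canary scores strictly higher than the other candidates, so the adversary recovers it exactly; the paper's attack operates on real NLU models, where likelihoods replace the toy bigram counts used here.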
Recommended citation: Parikh, Rahil, Christophe Dupuy, and Rahul Gupta. "Canary Extraction in Natural Language Understanding Models." arXiv preprint arXiv:2203.13920 (2022). https://arxiv.org/pdf/2203.13920.pdf