Privacy Auditing of Tabular Synthetic Data Generators Using Membership Inference Attacks

Nicklaus Kim
MS, 2023
Cheng, Guang
Synthetic data is a promising technology with numerous benefits for data sharing and machine learning workflows, such as augmenting available data, improving fairness, and preserving privacy. As synthetic data becomes more prevalent, understanding its methodology, strengths, and weaknesses with respect to these promises becomes crucial. This thesis explores the privacy implications of generative models for synthetic data by investigating whether these models deliver on their claimed privacy guarantees. We analyze increasingly popular GAN-based methods for generating tabular synthetic data, including CTGAN, DP-CTGAN, and PATE-GAN. To investigate the privacy delivered by these methods, we focus on adversarial privacy auditing, which uses a toolbox of adversarial attacks, here membership inference attacks, to detect privacy leakage in (differentially private) algorithms. We aim to extend previous work in privacy auditing, which typically focuses on general machine learning algorithms, to the rarely examined case of synthetic data generators by analyzing a range of models and datasets. Our goal is to demonstrate the need for further exploration and application of privacy auditing for synthetic data, to provide insights by comparing behavior across datasets, and to offer simulation results that can inform future investigations into patterns of privacy preservation. For example, experimental results across datasets and target records reveal differences in privacy outcomes, which points to the important role that the data itself, independent of the synthetic data generator, plays in privacy preservation. The findings underscore the importance of data-centric privacy evaluations and the need for further work to achieve a truly tight privacy analysis. We hope this research serves as a foundation for further study of adversarial privacy auditing and, possibly, for the future development of robust defenses against privacy leakage.
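As a minimal illustration of the auditing idea described above (not the procedure used in the thesis), the sketch below turns the true positive rate and false positive rate of a membership inference attack into an empirical lower bound on epsilon. It relies on the standard consequence of (epsilon, delta)-differential privacy that any attack must satisfy TPR <= e^epsilon * FPR + delta and, symmetrically, TNR <= e^epsilon * FNR + delta. The function name `empirical_epsilon_lower_bound` and the example rates are hypothetical, and the result is a point estimate only; a careful audit would replace the observed rates with confidence bounds.

```python
import math


def empirical_epsilon_lower_bound(tpr: float, fpr: float, delta: float = 0.0) -> float:
    """Point-estimate lower bound on epsilon implied by an attack's TPR and FPR.

    Hypothetical helper for illustration. Uses the (epsilon, delta)-DP
    constraints TPR <= e^eps * FPR + delta and TNR <= e^eps * FNR + delta.
    """
    if not (0.0 <= tpr <= 1.0 and 0.0 <= fpr <= 1.0):
        raise ValueError("tpr and fpr must lie in [0, 1]")
    if not (0.0 <= delta < 1.0):
        raise ValueError("delta must lie in [0, 1)")

    tnr, fnr = 1.0 - fpr, 1.0 - tpr
    bounds = [0.0]  # epsilon is never negative

    # Each pair is (numerator rate, denominator rate) of one DP constraint.
    for num, den in ((tpr, fpr), (tnr, fnr)):
        if num - delta <= 0.0:
            continue  # this constraint gives no information
        if den == 0.0:
            return math.inf  # no finite epsilon is consistent with these rates
        bounds.append(math.log((num - delta) / den))

    return max(bounds)


if __name__ == "__main__":
    # Assumed attack rates, for illustration only.
    print(empirical_epsilon_lower_bound(tpr=0.2, fpr=0.1, delta=1e-5))
```

If the bound computed this way exceeds the epsilon a generator claims, the claimed guarantee is violated; if it falls well below, the audit is inconclusive rather than a proof of privacy, which is why a tight analysis needs strong attacks.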
2023