FusionDP: Foundation Model-Assisted Differentially Private Learning for Partially Sensitive Attributes
📄 arXiv version: official paper coming soon.
Datasets in sensitive domains often contain attributes with heterogeneous privacy requirements, where different features are subject to different access controls and compliance policies. For instance, in health datasets, specific identifiers such as age and race are protected health information, while clinical measurements and lab results may be freely used for analytics. However, existing differential privacy (DP) mechanisms apply uniform protection across all features, leading to excessive noise injection that degrades the utility of machine learning models. We propose FusionDP, a framework that enables feature-level privacy control when training over partially sensitive data. FusionDP first leverages foundation models to impute sensitive features from non-sensitive ones, creating a privacy-preserving view of the data. It then uses a modified DP-SGD algorithm that trains on both original and imputed features while formally guaranteeing privacy for sensitive attributes. We evaluate FusionDP on four classification tasks. Compared to standard DP-SGD baselines, FusionDP significantly improves model accuracy while maintaining rigorous feature-level privacy, demonstrating how exploiting feature-level heterogeneity enhances the privacy-utility tradeoff in sensitive data analytics.
We consider a scenario where only a subset of features requires privacy protection, while the remaining features can be used without restriction.
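The formal definition appears to have been truncated here. A plausible statement of the feature-level guarantee (the notation below is ours, not necessarily the paper's) is:

```latex
Let each record be $x = (x_S, x_P)$, where $x_S$ denotes the sensitive features
and $x_P$ the public ones. Two datasets $D, D'$ are \emph{feature-neighboring}
if they differ only in the sensitive features $x_S$ of a single record. A
mechanism $\mathcal{M}$ satisfies $(\varepsilon, \delta)$-feature-DP if, for
all feature-neighboring $D, D'$ and all measurable output sets $O$,
\[
  \Pr[\mathcal{M}(D) \in O] \le e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in O] + \delta .
\]
```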
FusionDP is a two-step framework that achieves feature-DP with improved utility. The figure illustrates the training pipeline. We first use foundation models to generate hybrid samples in which sensitive features are replaced by imputed values. We then train the model with a combined loss objective of public (in green) and private (in red) components. We clip and add noise only to the gradient of the private loss, which isolates and bounds the contribution of the private features. Under this framework, we further improve the private gradient component by leveraging gradient calibration and by proposing a representation-consistency regularizer that aligns the hidden states of original and hybrid inputs.
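The update described above can be sketched as follows. This is a minimal NumPy illustration on a logistic-regression model, not the repo's implementation (which lives in `train_fusiondp_bank.py`); all shapes and hyperparameters here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def per_sample_grads(w, X, y):
    """Per-sample logistic-loss gradients, one row per sample, shape (n, d)."""
    p = 1.0 / (1.0 + np.exp(-X @ w))        # sigmoid predictions
    return (p - y)[:, None] * X             # d(loss)/d(w) for each sample

def fusiondp_step(w, X_orig, X_hybrid, y, lr=0.1, clip=1.0, sigma=1.0, alpha=1.0):
    # Public component: loss on hybrid (imputed) samples, which contain no
    # sensitive values, so its gradient needs no clipping or noise.
    g_pub = per_sample_grads(w, X_hybrid, y).mean(axis=0)

    # Private component: per-sample gradients on the original samples are
    # clipped to bound each record's contribution, then Gaussian noise is
    # added, as in DP-SGD.
    g = per_sample_grads(w, X_orig, y)
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    g = g / np.maximum(1.0, norms / clip)   # per-sample clipping
    g_priv = g.sum(axis=0) / len(X_orig)
    g_priv = g_priv + rng.normal(0.0, sigma * clip / len(X_orig), size=w.shape)

    # Combine the two components; alpha weights the private part.
    return w - lr * (g_pub + alpha * g_priv)

# Toy usage on synthetic data: pretend feature 0 is sensitive and was imputed.
n, d = 32, 4
X_orig = rng.normal(size=(n, d))
X_hybrid = X_orig.copy()
X_hybrid[:, 0] = 0.0
y = (X_orig @ np.ones(d) > 0).astype(float)
w = np.zeros(d)
for _ in range(5):
    w = fusiondp_step(w, X_orig, X_hybrid, y)
```

Only the private gradient pays the DP cost, which is what lets feature-level heterogeneity improve the privacy-utility tradeoff.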
Install the environment:
```shell
conda env create -f environment.yml
conda activate fusiondp
```
Taking the bank marketing dataset as an example, FusionDP can be run as follows.
First run `python impute_bank.py` to impute the sensitive attributes with TabPFN; this produces `bank_train.csv`, `bank_train_imputed.csv`, `bank_val.csv`, and `bank_test.csv` for training. Then launch training:
```shell
python train_fusiondp_bank.py \
    --one_hot --mode fusiondp --epsilon $eps --epochs 10 --max_grad_norm $c \
    --alpha $a --beta $b
```
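The imputation step above predicts each sensitive column from the non-sensitive ones. The repo uses TabPFN for this; the sketch below substitutes a simple 1-nearest-neighbor predictor so it runs without extra dependencies, and the function name and arguments are ours, not the repo's.

```python
import numpy as np

def impute_sensitive(X, sensitive_cols, train_mask):
    """Replace sensitive columns with values predicted from public columns.

    X: (n, d) array; sensitive_cols: column indices to impute;
    train_mask: boolean mask of rows used to fit the predictor.
    """
    public_cols = [j for j in range(X.shape[1]) if j not in sensitive_cols]
    X_imp = X.copy()
    P_train = X[train_mask][:, public_cols]
    for j in sensitive_cols:
        y_train = X[train_mask][:, j]
        for i in range(X.shape[0]):
            # Predict from public features via 1-NN (stand-in for TabPFN).
            dists = np.linalg.norm(P_train - X[i, public_cols], axis=1)
            X_imp[i, j] = y_train[np.argmin(dists)]
    return X_imp

# Toy usage: impute columns 0 and 1 from the rest.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
X_hybrid = impute_sensitive(X, sensitive_cols=[0, 1],
                            train_mask=np.arange(20) < 10)
```

The resulting hybrid samples keep public columns untouched while sensitive columns hold imputed values, mirroring what `impute_bank.py` writes to `bank_train_imputed.csv`.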
Linghui Zeng