R package synthpop: Generating Synthetic Versions of Sensitive Microdata for Statistical Disclosure Control. A tool for producing synthetic versions of microdata that contain confidential information, so that they are safe to release to users for exploratory analysis. The key objective of generating synthetic data is to replace sensitive original values with synthetic ones while causing minimal distortion of the statistical information contained in the data set. Variables, which can be categorical or continuous, are synthesised one by one using sequential modelling. Replacements are generated by drawing from conditional distributions fitted to the original data using parametric models or classification and regression tree (CART) models. Data are synthesised via the function syn(), which can be largely automated if default settings are used, or run with methods defined by the user. Optional parameters can be used to influence the disclosure risk and the analytical quality of the synthesised data.
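The workflow described above can be sketched as follows. This is a minimal illustration rather than a recommended configuration: it assumes the synthpop package is installed and uses the SD2011 example data set shipped with it. The choice of columns and the seed value are arbitrary assumptions made for the example.

```r
library(synthpop)

# Assumption: a small subset of the bundled SD2011 survey data stands in
# for the confidential original microdata.
ods <- SD2011[, c("sex", "age", "edu", "marital")]

# Default settings: variables are synthesised one by one, each conditional
# on the variables synthesised before it (CART-based by default).
sds_default <- syn(ods, seed = 1234, print.flag = FALSE)

# User-defined choice: request parametric models instead of CART.
sds_param <- syn(ods, method = "parametric", seed = 1234, print.flag = FALSE)

# The synthetic data frame is stored in the $syn component.
head(sds_default$syn)

# Inspect which method was used for each variable and in which order.
sds_default$method
sds_default$visit.sequence
```

The package's compare() function can then be applied to the synthesised object and the original data to check how closely the marginal distributions agree.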
References in zbMATH (referenced in 6 articles, 1 standard article)
- Ryan Hornby, Jingchen Hu: Bayesian Estimation of Attribute Disclosure Risks in Synthetic Data with the AttributeRiskCalculation R Package (2021) arXiv
- Derek Beaton: Generalized eigen, singular value, and partial least squares decompositions: The GSVD package (2020) arXiv
- Matthias Speidel, Jörg Drechsler, Shahab Jolani: The R Package hmi: A Convenient Tool for Hierarchical Multiple Imputation and Beyond (2020) not zbMATH
- Ryan Hornby, Jingchen Hu: Identification Risks Evaluation of Partially Synthetic Data with the IdentificationRiskCalculation R Package (2020) arXiv
- Gillian M. Raab, Beata Nowok, Chris Dibben: Guidelines for Producing Useful Synthetic Data (2017) arXiv
- Beata Nowok, Gillian Raab, Chris Dibben: synthpop: Bespoke Creation of Synthetic Data in R (2016) not zbMATH