Statistical Disclosure Control with R
The “Statistical Disclosure Control with R” training course has been designed for organisations, governmental departments, research institutes and private companies that process, manage and analyse socio-economic microdata and want to safeguard the identity of individuals and sensitive information using modern statistical approaches. The course provides in-depth knowledge of the theory, specific statistical methods and practical applications of Statistical Disclosure Control, a growing area of research in data processing and statistics focused on minimising the disclosure risk of socio-economic datasets.
During the course your delegates will learn modern approaches to calculating individual, cluster and global disclosure risks using the sdcMicro package and custom-made R functions for risk assessment and disclosure analysis. They will understand and implement appropriate methods of reducing disclosure risk depending on the type of variables to be treated and the extent of the disclosure risk. These methods may include top- and bottom-coding, recoding continuous data, local and cell suppression, post-randomisation, micro-aggregation, adding noise, shuffling, reverse mapping and other more advanced methods designed to protect the safety and security of microdata. Finally, your attendees will learn about specific computing and data mining techniques used by intruders and hackers to obtain and disclose personal or sensitive information from open-source, publicly available datasets.
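As a flavour of the simplest of these methods, top- and bottom-coding can be sketched in a few lines of base R. This is only an illustrative sketch, not course material: the income values, the thresholds and the helper name top_bottom_code are all made up for the example.

```r
# Hypothetical income values (illustrative only, not from a real dataset).
income <- c(1200, 1850, 2300, 2750, 3100, 4200, 5600, 98000)

# Top- and bottom-coding: values outside the chosen thresholds are replaced
# by the thresholds themselves, so extreme (and therefore easily
# identifiable) values no longer stand out.
# `top_bottom_code` is a hypothetical helper, not an sdcMicro function.
top_bottom_code <- function(x, lower, upper) {
  pmin(pmax(x, lower), upper)
}

coded <- top_bottom_code(income, lower = 1500, upper = 6000)
print(coded)
```

The thresholds here are arbitrary; in practice they would be chosen by weighing the reduction in disclosure risk against the information lost in the tails of the distribution.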
Basic course information
Minimum recommended duration: 3-5 full days or 6-10 half-days (can be spread across multiple weeks)
Programming languages used: R
Minimum number of attendees: 5
Course level: For beginner or pre-intermediate users of R.
Pre-requisites: Pre-intermediate skills in data management, processing and analytics in the R language are recommended for delegates attending this course. An understanding of basic statistical concepts, e.g. exploratory data analysis, linear models and basic probability theory, would be beneficial. It is advisable that the course is preceded by our “Applied Data Science with R” course.
IT recommendations: To benefit from the contents of the course, it is recommended that attendees have the most recent versions of R and RStudio installed on their personal/company laptops (any operating system). As R is a free environment, you can download it directly from the www.r-project.org website; RStudio is available at https://www.rstudio.com/products/rstudio/#Desktop. Please contact us should you have any questions related to the installation process or should you wish to use a different setup for your course.
Programme outline
The programme for each in-house training course is discussed and agreed individually with the client. The proposed contents of the course may include (but are not limited to) the following concepts and topics:
Understand and appreciate the motivation for Statistical Disclosure Control methods from data science, data protection and legal perspectives,
Apply a variety of modern data science techniques to process microdata in the R language, using its third-party packages for data manipulation and transformation,
Differentiate between data types and classes of variables from data science and SDC perspectives; implement SDC workflows in varying disclosure scenarios depending on selection of key categorical and continuous variables,
Generate contingency tables and estimate sample and population frequencies,
Perform calculations of individual, cluster and global disclosure risks,
Carry out essential Statistical Disclosure Control methods in the R language such as recoding, local suppression, micro-aggregation and post-randomisation,
Calculate the effect of applied SDC modifications on individual and global risk of perturbed datasets, their information loss and data utility,
Report and communicate the results of SDC interventions,
Apply the above SDC methods to special cases, e.g. datasets stored in different file formats (including those of proprietary tools such as Stata and SPSS), multiple datasets which can or cannot be linked together, Big Data etc.
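To illustrate the frequency-based topics in the outline above, the following base R sketch counts how many records share each combination of categorical key variables and flags those that fall below a chosen k-anonymity threshold. It is only a sketch under stated assumptions: the data frame, its column names and the threshold k = 2 are invented for the example, and base R is used to keep the code self-contained rather than to suggest how sdcMicro does it.

```r
# Hypothetical microdata with three categorical key variables.
microdata <- data.frame(
  region = c("North", "North", "South", "South", "South", "East"),
  sex    = c("F", "F", "M", "M", "F", "M"),
  age_gp = c("30-39", "30-39", "40-49", "40-49", "40-49", "60+"),
  stringsAsFactors = FALSE
)
key_vars <- c("region", "sex", "age_gp")

# Build one label per record from its combination of key-variable values.
key <- do.call(paste, c(microdata[key_vars], sep = "|"))

# Sample frequency fk: number of records sharing the same key combination.
microdata$fk <- as.integer(ave(seq_along(key), key, FUN = length))

# Records whose combination occurs fewer than k times violate k-anonymity
# (k = 2 is an arbitrary choice for this example).
k <- 2
violations <- sum(microdata$fk < k)

print(microdata)
print(violations)
```

Records with a low sample frequency are the natural candidates for treatment such as recoding or local suppression. Estimating population frequencies additionally requires sampling weights, which this sketch leaves out.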
Customise the course
We can adapt our in-house training courses to address your specific needs and requirements e.g.:
The course can be designed to include your own data. If this is not possible, e.g. due to data security issues, we can customise the course to contain exercises that address similar problems,
The course period can be spread across multiple weeks/months depending on your needs and availability – this will allow your delegates to revise and practise the learnt skills before the next session and provide them with additional time to internalise all presented material,
The course can include a custom project spread across several weeks/months with a follow-up session at the end of the period.
As all our in-house training courses are quoted individually, the final cost quotation will be based on several factors: the number of attendees, days of training (plus additional support/project guidance if needed), location of the training, complexity of IT setup and the extent of course customisation.
Arrange this course at your organisation
If you are interested in this in-house training course, please press the Ask For Quote button at the top of the page to request a quote based on your specific needs and the desired outcomes of the training.
In your enquiry please include the following information:
contact details of the person who should receive the quote,
number of delegates you would like to train,
approximate number of days (or half-days) you would like to arrange the course for (including additional support/project guidance if needed),
location of the training venue,
any details on course customisation or specific topics you would like the course to address; most importantly, please indicate the desired outcomes of the course if they differ from those presented above,
any other questions you may have.