Anonymising your research data
17/03/2017

In our last RDM blog post we discussed data protection in general; now let’s take a further look at anonymisation. The main message is to think about it early, planning well to ensure that data is safe, secure, and handled in line with the Data Protection Act and any funder expectations. Here are a few questions to ask yourself…
Is your consent right? One aspect is making sure your consent form covers which data will be shared and which destroyed, so be careful with the wording. “Anonymised” data means it is impossible to identify an individual from that dataset, or from that dataset in conjunction with other available data. This is not easy to achieve, so be careful about using terms such as “anonymous” or “de-identified”, and about making guarantees. The ICPSR recommendations suggest stating exactly which data elements will be removed before sharing.
How will you collect and store personal data? Make sure you’re only collecting what is absolutely necessary, and that only authorised users have access to it. This may mean storing it on a device with a login (e.g. on your University network drive). Remember that if you make a backup on an external hard drive, for example, that device should also have a login.
If you’re collecting any sensitive personal data, you must also use encryption, in line with ICO recommendations; contact the IT Service Desk for installation and instructions on our encryption software. The definition of “sensitive personal data” is personal data relating to someone’s race/ethnicity, political or religious beliefs, trade union membership, physical/mental health, sexual life, or offences (committed or alleged).
Personal data, especially sensitive data, should not be stored on unsecured devices – mistakes can harm individuals and also be expensive (a police force was fined £120,000 for saving personal data on a memory stick which was then stolen!).
Which data needs sharing? Does your research have any funding from bodies that require you to preserve/share your data at the end of the project? For example, RCUK councils expect data to be preserved, usually for 10 years after its last use, and shared as openly as possible, with any restrictions outlined in the early stages of the project and fully justified. Often, personal data can be shared as long as it is sufficiently anonymised; the UK Data Archive (usually the repository for ESRC projects) also offers a Secure Lab option, for controlled access to data that cannot be shared openly.
Are you deleting data securely? As soon as research no longer requires the identifiable portions of data, these should be removed, and future research should use de-identified or anonymised data. The version containing personal information (the key) may need either destroying or retaining securely. If you are destroying it, contact the IT Service Desk for secure erasure, as just deleting a file will not prevent it being recoverable from the device. Whilst the responsibility to ensure deletion of personal research data ultimately lies with the head of department, the nominated data manager for the project may be assigned the task of arranging deletion, depending on the responsibilities laid out in your data management plan.
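As a rough illustration of separating the key from the working data, here is a minimal Python sketch. The field names in DIRECT_IDENTIFIERS and the split_key helper are hypothetical, chosen only for this example. Note that the second output is de-identified rather than anonymised: any indirect identifiers are still present and need separate treatment.

```python
import secrets

# Assumed field names for direct identifiers; adjust to your own dataset.
DIRECT_IDENTIFIERS = ("name", "address")


def split_key(records):
    """Split records into a key table (identifiers) and de-identified data.

    Both outputs share a random participant_id, so the key table can be
    stored securely, or destroyed, separately from the working data.
    """
    key_table = []
    research_data = []
    for record in records:
        participant_id = secrets.token_hex(8)  # random, carries no meaning
        identifiers = {f: record[f] for f in DIRECT_IDENTIFIERS if f in record}
        remainder = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        key_table.append({"participant_id": participant_id, **identifiers})
        research_data.append({"participant_id": participant_id, **remainder})
    return key_table, research_data
```

Once the key table is securely destroyed, the random IDs can no longer be linked back to names through this route.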
What anonymisation steps will you take? The UK Anonymisation Network has an excellent resource if you’re new to anonymisation: the anonymisation decision-making framework (pdf). You’ll probably need to think about removing direct identifiers (name, address, photos…) as well as removing/editing indirect identifiers, which can identify individuals when combined with other data (e.g. occupation and workplace). Some methods are:
- Reducing precision: e.g. instead of birth dates, just give birth years.
- Aggregating data: e.g. instead of job titles, use occupational categories (the ONS have a Standard Occupational Classification which can help); or instead of city, use geographical area.
- Hiding outliers: e.g. use income or age bands that do not highlight highest or lowest values, as these exceptional values can often identify individuals.
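The three methods above can be sketched in a few lines of Python. This is only an illustration: the field names, the band boundaries, and the OCCUPATION_GROUPS lookup are all made up for the example, and a real project would map job titles to a published classification and choose bands that suit its own data.

```python
from datetime import date

# Hypothetical lookup from job title to a broad occupational category.
OCCUPATION_GROUPS = {
    "GP": "Health professionals",
    "Staff nurse": "Health professionals",
    "Primary teacher": "Teaching professionals",
}


def birth_year(birth_date: date) -> int:
    """Reduce precision: keep only the year of a birth date."""
    return birth_date.year


def occupation_group(job_title: str) -> str:
    """Aggregate: replace a job title with a broad category."""
    return OCCUPATION_GROUPS.get(job_title, "Other")


def income_band(income: float) -> str:
    """Hide outliers: open-ended bottom and top bands mask extreme values."""
    if income < 20000:
        return "under 20,000"
    if income < 40000:
        return "20,000-39,999"
    if income < 60000:
        return "40,000-59,999"
    return "60,000 or over"


def generalise(record: dict) -> dict:
    """Apply all three methods to one record with assumed field names."""
    return {
        "birth_year": birth_year(record["birth_date"]),
        "occupation_group": occupation_group(record["job_title"]),
        "income_band": income_band(record["income"]),
    }
```

For example, a record with a birth date of 14 March 1980, the job title "GP" and an income of 250,000 would come out as birth year 1980, "Health professionals" and "60,000 or over". Whether that is sufficient still depends on what other data the result could be combined with.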
Audio-visual anonymisation is harder, especially if trying to automate it, as Google demonstrated recently when blurring out a cow’s face! It is preferable to obtain informed consent for reuse of audio-visual material than to blur images or alter audio. However, you also need to be clear on how long the data will be kept, and on the process for withdrawing consent for the data to be reused.
Finally, is it worth anonymising your data? Let’s not forget that sometimes it isn’t! If the anonymisation work would be expensive, time-consuming, and greatly reduce the usefulness of the data, it is probably not worth doing. You would then need to consider whether the unanonymised version can be retained or reused under stricter access conditions. The UKAN framework compares data security to house security: if you make a fully secure house, with no doors or windows, its usefulness is reduced to nothing – there is always a balance to be found.