Answered By: Danielle Abrahamse
Last Updated: Jun 17, 2021

De-identification is the removal of information that can be used to identify individuals. These are usually research participants, but may also include other people referred to in the data.

The first step is to remove all direct identifiers - pieces of information that can be used in isolation to identify an individual. Direct identifiers include names, email addresses, phone numbers and ID numbers (including staff numbers). In quantitative datasets, this is often as simple as removing the fields containing such information.
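As a rough illustration of removing identifier fields from a quantitative dataset, the following Python sketch drops a list of columns from each record. The function name `drop_direct_identifiers`, the field names and the sample records are all hypothetical; a real dataset would need its own list of fields to remove.

```python
def drop_direct_identifiers(rows, identifier_fields):
    """Return copies of the rows without the named direct-identifier fields."""
    blocked = set(identifier_fields)
    return [{k: v for k, v in row.items() if k not in blocked} for row in rows]


# Hypothetical survey records; every field name and value here is invented.
records = [
    {"name": "A. Example", "email": "a@example.org", "staff_no": "1001",
     "age_band": "30-39", "score": 4},
    {"name": "B. Example", "email": "b@example.org", "staff_no": "1002",
     "age_band": "40-49", "score": 2},
]

# Assumed list of direct identifiers for this made-up dataset.
DIRECT_IDENTIFIERS = ["name", "email", "staff_no"]

cleaned = drop_direct_identifiers(records, DIRECT_IDENTIFIERS)
print(cleaned)
```

The original records are left untouched, so the identified version can be stored separately and securely if it still needs to be kept.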

The second step is to remove all indirect identifiers - pieces of information that can be combined with other information to identify someone. For example, an individual's position in an institution, combined with the date the data was collected, can sometimes be enough to identify that individual. Indirect identifiers are more common in qualitative data and are considerably harder to remove.
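One possible way to spot risky combinations in tabular data is to count how many records share each combination of indirect identifiers and flag the combinations that occur only rarely. The sketch below does this in Python. It is a simple frequency check, not a method prescribed by this answer, and the function name `risky_combinations`, the threshold `k`, the field names and the sample data are all assumptions made for illustration.

```python
from collections import Counter


def risky_combinations(rows, quasi_identifiers, k=2):
    """Return value combinations shared by fewer than k records."""
    counts = Counter(tuple(row[f] for f in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical responses; positions and collection dates are invented.
responses = [
    {"position": "Lecturer", "collected": "2021-03-01"},
    {"position": "Lecturer", "collected": "2021-03-01"},
    {"position": "Dean", "collected": "2021-03-02"},
]

# A combination held by a single record could point to one individual.
flagged = risky_combinations(responses, ["position", "collected"])
print(flagged)
```

Records flagged this way might then be generalised (for example, replacing an exact position with a broader category) or removed, depending on the needs of the research.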

Various tools offer de-identification or anonymization. One example is Amnesia.