Sharing and publishing data with personal information

Research data are personal data if they contain information that can be linked to individual persons, either directly or indirectly through some complementary means (e.g. a code key). The General Data Protection Regulation (GDPR) states that personal data must be protected by organisational and technical measures so that unauthorised persons cannot access the material. This means that such data normally may not be published or shared with unauthorised persons.

Personal data may be disclosed or shared with others only if certain conditions are met. There must be a legal basis, such as use of the material in research, and the data must not be subject to confidentiality under the provisions of the Public Access to Information and Secrecy Act (2009:400). For sensitive personal data, the party requesting the documents is normally required to have an approved ethical review.

Please note that transfers of personal data to third countries (outside the EU/EEA) may only take place if the recipient country ensures an adequate level of data protection.

If data are anonymised so that they can no longer be linked to individuals in any way, they are no longer personal data under the GDPR and can normally be shared and published. However, data count as anonymised under the GDPR only if there is no longer any code key or other means of attributing the information in the data to individuals.
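To make the difference between a code key and true anonymisation concrete, here is a minimal sketch of pseudonymisation in Python. The field names (`personal_id`, `diagnosis`), the identifiers `P001`/`P002` and the function `pseudonymise` are hypothetical and chosen only for illustration. The direct identifier is replaced by a random code, and a separate code key maps codes back to persons. As long as that key exists, the data can be attributed to individuals and therefore remain personal data; destroying the key is necessary but not sufficient for anonymisation if other variables can still identify people.

```python
import secrets


def pseudonymise(rows, id_field="personal_id"):
    """Replace the direct identifier in each row with a random code.

    Returns the pseudonymised rows and the code key (code -> original identifier).
    The same person always gets the same code, so follow-up remains possible.
    """
    id_to_code = {}
    pseudonymised = []
    for row in rows:
        pid = row[id_field]
        if pid not in id_to_code:
            # 16 hex characters; collisions between codes are very unlikely
            id_to_code[pid] = secrets.token_hex(8)
        new_row = {k: v for k, v in row.items() if k != id_field}
        new_row["code"] = id_to_code[pid]
        pseudonymised.append(new_row)
    code_key = {code: pid for pid, code in id_to_code.items()}
    return pseudonymised, code_key


# Hypothetical records; "P001" appears twice (e.g. a follow-up visit).
rows = [
    {"personal_id": "P001", "diagnosis": "asthma"},
    {"personal_id": "P002", "diagnosis": "diabetes"},
    {"personal_id": "P001", "diagnosis": "migraine"},
]
data, key = pseudonymise(rows)

# Anyone holding the code key can link each row back to a person:
print(key[data[0]["code"]])  # P001
```

In practice the code key would be stored separately from the research data and protected, which is exactly why such data are still covered by the GDPR.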

There are, however, several reasons to avoid irreversible anonymisation:

  • Many studies need to follow up participants over a long period (e.g. longitudinal epidemiological studies).
  • In cases where published results need to be reviewed, all material in a study, including complete raw data, should be available.
  • Code keys or other documentation linking data to individuals may be documents that, under the Archives Act, may not be destroyed until a certain period has elapsed.

As Recital 26 (quoted below) indicates, the GDPR sets very high requirements for data derived from personal data to be considered anonymised. Even with anonymised data, however, one must take into account the risk that a combination of values in an otherwise anonymised dataset may lead to the re-identification of individuals. By combining the values of different variables, e.g. occupation, disease, municipality and age, there is always a risk that individuals can be identified even if direct identifiers are missing from the data.
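A minimal Python sketch of why combinations matter: using a small set of invented records (all values are hypothetical, and `unique_records` is an illustrative helper, not part of any tool), it counts how many records have a combination of values that nobody else in the dataset shares. A record that is unique on occupation, disease, municipality and age can be singled out even though no name or personal identity number is present.

```python
from collections import Counter

# Hypothetical records; direct identifiers have already been removed.
records = [
    {"occupation": "teacher", "disease": "asthma", "municipality": "Uppsala", "age": 34},
    {"occupation": "teacher", "disease": "asthma", "municipality": "Uppsala", "age": 34},
    {"occupation": "nurse", "disease": "diabetes", "municipality": "Tierp", "age": 51},
    {"occupation": "teacher", "disease": "diabetes", "municipality": "Uppsala", "age": 34},
    {"occupation": "nurse", "disease": "asthma", "municipality": "Uppsala", "age": 29},
]

QUASI_IDENTIFIERS = ("occupation", "disease", "municipality", "age")


def unique_records(rows, keys):
    """Return the rows whose combination of values for `keys` occurs exactly once."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return [row for row in rows if counts[tuple(row[k] for k in keys)] == 1]


# The more variables are combined, the more records become unique.
for keys in [("municipality",), ("occupation", "municipality"), QUASI_IDENTIFIERS]:
    n = len(unique_records(records, keys))
    print(f"{', '.join(keys)}: {n} of {len(records)} records are unique")
```

With these invented records, 1, 2 and 3 of the 5 records respectively are unique as more variables are combined.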

To reduce the risk of identifying individuals, you can limit the number of variables and generalise values, for example by giving a region instead of a city, or an age group instead of an exact age. The data then achieve a certain degree of so-called k-anonymity: every combination of values of the identifying variables is shared by at least k persons, where k is a positive integer. Each combination of properties therefore occurs at least k times in the dataset. K-anonymity can be supplemented with measures such as l-diversity and t-closeness to further limit the risk of re-identification. A drawback of this kind of generalisation is, of course, that the data may become less useful.
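The generalisation step and the resulting k can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how ARX or sdcMicro work internally: the city-to-region mapping `CITY_TO_REGION`, the 10-year age bands and the helper names are all hypothetical. It also computes *distinct* l-diversity (the smallest number of different sensitive values in any group), which is the simplest variant of l-diversity; t-closeness is not shown.

```python
from collections import defaultdict

# Hypothetical coding scheme: city -> region.
CITY_TO_REGION = {
    "Uppsala": "Uppsala County",
    "Tierp": "Uppsala County",
    "Sala": "Västmanland County",
    "Fagersta": "Västmanland County",
}


def generalise(row):
    """Replace exact age with a 10-year band and city with region."""
    low = (row["age"] // 10) * 10
    return {
        "age_group": f"{low}-{low + 9}",
        "region": CITY_TO_REGION[row["city"]],
        "diagnosis": row["diagnosis"],  # sensitive attribute, kept unchanged
    }


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same values for all quasi-identifiers."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[q] for q in quasi_identifiers)].append(row)
    return classes


def k_anonymity(rows, quasi_identifiers):
    """k is the size of the smallest group."""
    return min(len(g) for g in equivalence_classes(rows, quasi_identifiers).values())


def distinct_l_diversity(rows, quasi_identifiers, sensitive):
    """l is the smallest number of distinct sensitive values within any group."""
    return min(len({r[sensitive] for r in g})
               for g in equivalence_classes(rows, quasi_identifiers).values())


# Invented example records.
raw = [
    {"age": 34, "city": "Uppsala", "diagnosis": "asthma"},
    {"age": 37, "city": "Tierp", "diagnosis": "diabetes"},
    {"age": 31, "city": "Uppsala", "diagnosis": "migraine"},
    {"age": 52, "city": "Sala", "diagnosis": "asthma"},
    {"age": 58, "city": "Fagersta", "diagnosis": "hypertension"},
    {"age": 55, "city": "Sala", "diagnosis": "asthma"},
]

print("before: k =", k_anonymity(raw, ("age", "city")))  # before: k = 1
generalised = [generalise(r) for r in raw]
qi = ("age_group", "region")
print("after: k =", k_anonymity(generalised, qi),
      "l =", distinct_l_diversity(generalised, qi, "diagnosis"))  # after: k = 3 l = 2
```

Before generalisation every record is unique (k = 1); afterwards each combination is shared by three persons (k = 3), at the cost of losing exact ages and cities.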

There are various tools that facilitate anonymisation of data, for example ARX, sdcMicro and Amnesia. However, some types of data, such as video recordings, biometric data and genetic data, are difficult or impossible to anonymise.

The Department of Statistics can provide guidance on anonymising personal data. Advice is free of charge for researchers and doctoral students in the Faculty of Social Sciences. Other researchers at the university can get help for a fee, subject to available time.

Given that there is always a risk of re-identification, caution should be exercised when sharing and publishing anonymised data. A common benchmark for k-anonymity is a k-value of at least five, but conditions vary between types of data. It is therefore important to assess the data being processed, and to justify and document why the chosen method and level of anonymisation can be considered sufficient.
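As a hedged sketch of how the k ≥ 5 benchmark could be checked and recorded, the Python below flags every group of records that is smaller than a chosen threshold and collects a few figures that could go into the documentation of the assessment. The threshold of five is only the benchmark mentioned above, not a legal limit; the helper names, the summary fields and the example rows are hypothetical.

```python
from collections import Counter


def classes_below_threshold(rows, quasi_identifiers, k_min=5):
    """Return (combination, size) for every group smaller than k_min."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return sorted((combo, n) for combo, n in counts.items() if n < k_min)


def assessment_summary(rows, quasi_identifiers, k_min=5):
    """Collect figures worth recording when documenting the chosen level."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    small = classes_below_threshold(rows, quasi_identifiers, k_min)
    return {
        "quasi_identifiers": list(quasi_identifiers),
        "k_min_required": k_min,
        "k_achieved": min(counts.values()),
        "records_in_small_classes": sum(n for _, n in small),
        "meets_threshold": not small,
    }


# Hypothetical, already generalised records: one group of 6, one group of 2.
rows = ([{"age_group": "30-39", "region": "Uppsala County"}] * 6
        + [{"age_group": "50-59", "region": "Uppsala County"}] * 2)

qi = ("age_group", "region")
print(classes_below_threshold(rows, qi))  # [(('50-59', 'Uppsala County'), 2)]
print(assessment_summary(rows, qi))
```

Groups that fall below the threshold would need further generalisation, or a documented justification for why a lower k is acceptable for this particular dataset.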

More about anonymisation and publishing personal data:

When are personal data considered to be anonymised under the Data Protection Regulation (GDPR)?

Recital 26, which supports the interpretation of the Data Protection Regulation, states that:

The principles of data protection should apply to any information concerning an identified or identifiable natural person. Personal data which have undergone pseudonymisation, which could be attributed to a natural person by the use of additional information should be considered to be information on an identifiable natural person.

To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly.

To ascertain whether means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments.

The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes.

Recital 26 (GDPR.eu)
