Anonymization

Introduction

Automatic data anonymisation, based on Natural Language Processing (NLP) technologies, is the process of removing or replacing sensitive information in textual sources to protect individuals from being exposed. It also allows data to be analysed in compliance with the GDPR. Advances in Deep Learning have improved the results of this technology and broadened its potential uses.

Modules

Language Detection

Automatic detection of the language in which a text is written

Data Detection

Automatic detection of pieces of text (entities) containing sensitive information

Data Classification

Automatic classification of detected entities into categories such as PERSON, LOCATION and so on
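The detection and classification steps can be illustrated with a minimal sketch. It is not the method used by the actual modules, which rely on NLP models: here, simple regular expressions stand in for the detector, and the pattern that matched supplies the category. The names `Entity`, `PATTERNS` and `detect_entities`, as well as the EMAIL and PHONE categories, are assumptions made for this example only.

```python
import re
from typing import List, NamedTuple


class Entity(NamedTuple):
    """A detected span of sensitive text and its category."""
    start: int
    end: int
    text: str
    label: str


# Hypothetical toy patterns; a real system would use trained NLP models
# instead of regular expressions.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d ]{7,}\d"),
}


def detect_entities(text: str) -> List[Entity]:
    """Detect sensitive spans and classify each by the pattern that matched."""
    entities = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append(Entity(match.start(), match.end(), match.group(), label))
    # Sort by position so later steps can process the text left to right.
    return sorted(entities)


if __name__ == "__main__":
    sample = "Contact Ana at ana@example.org or +34 600 123 456."
    for entity in detect_entities(sample):
        print(entity.label, repr(entity.text))
```

Categories such as PERSON or LOCATION cannot be captured reliably with patterns like these, which is why a statistical or neural model is normally used for them.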

Anonymization

Obfuscation of sensitive entities by replacing them with placeholders, which can be symbols (“XXX”), the name of the sensitive data category, or words similar to the original
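The three placeholder styles described above can be sketched as follows. This is an illustrative example under stated assumptions, not the actual implementation: entity spans are assumed to be already detected, classified and non-overlapping, and the `SURROGATES` pools, the strategy names and the functions `placeholder` and `anonymize` are hypothetical.

```python
import hashlib
from typing import Iterable, Tuple

# Hypothetical pools of replacement words for the "similar words" style.
SURROGATES = {
    "PERSON": ["Alex Smith", "Maria Jones", "Sam Brown"],
    "LOCATION": ["Springfield", "Riverton", "Lakeside"],
}


def placeholder(original: str, label: str, strategy: str) -> str:
    """Return the replacement for one sensitive entity."""
    if strategy == "symbols":
        return "XXX"
    if strategy == "category":
        return f"[{label}]"
    if strategy == "surrogate":
        pool = SURROGATES.get(label)
        if not pool:
            # No surrogate available: fall back to the category name.
            return f"[{label}]"
        # Pick deterministically so repeated mentions get the same surrogate.
        digest = hashlib.sha256(original.encode("utf-8")).digest()
        return pool[digest[0] % len(pool)]
    raise ValueError(f"unknown strategy: {strategy}")


def anonymize(text: str, spans: Iterable[Tuple[int, int, str]],
              strategy: str = "category") -> str:
    """Replace each (start, end, label) span; spans must not overlap."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])  # untouched text before the entity
        pieces.append(placeholder(text[start:end], label, strategy))
        cursor = end
    pieces.append(text[cursor:])  # untouched text after the last entity
    return "".join(pieces)


if __name__ == "__main__":
    sample = "John Doe lives in Madrid."
    spans = [(0, 8, "PERSON"), (18, 24, "LOCATION")]
    for name in ("symbols", "category", "surrogate"):
        print(name, "->", anonymize(sample, spans, name))
```

Replacing with words similar to the original keeps the text readable for later analysis, while symbols and category names make it obvious where information was removed.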

Markets

Medicine

To develop technological solutions and carry out research in the field of medicine, where it is essential to be able to share information that contains especially sensitive personal data

Legal

To analyze, detect and replace sensitive data in legal documents, such as court rulings, contributing to open data and transparency

Public Administration

To promote the sharing of de-identified data without traceable personal details, in compliance with the GDPR

Use Cases

MAPA

A European project

Development of a toolkit for effective and reliable anonymisation of texts in the medical, legal, and administrative fields in 24 languages. As a result, it will make it more feasible to share de-identified data without traceable personal details, in compliance with the GDPR.
