How eDiscovery is managing the challenge of what constitutes Personal Information


The terms Personal Information and Personal Data have been in increasing use for a while, particularly with the recent focus on the European GDPR and the Australian NDB (Notifiable Data Breaches) scheme. This article considers these definitions from the perspective of document production in litigation and regulatory investigations – the process of eDiscovery.

In eDiscovery we are used to redacting sensitive information - most typically legally privileged or commercially sensitive information.  However, prior to the commencement of the Royal Commission into the Financial Services Industry (FSRC) the volume of documents requiring redaction was fairly modest in most matters. This certainly changed with the FSRC, as all documents that are tendered at a public hearing are required to have contact information and customer names redacted. This has resulted in significant efforts in reviewing all documents that are going to be tendered to ensure that the documents have been adequately redacted.

What is Personal Information?

The Office of the Australian Information Commissioner (OAIC) sets out a very helpful guide, What is personal information. The types of personal information listed are:

  • sensitive information (including information or opinions about an individual's racial or ethnic origin, political opinion, religious beliefs, sexual orientation or criminal record);

  • health information;

  • credit information;

  • employee record; and

  • tax file number.

Under common examples, the guide gives further categories:

Information about a person's private or family life:

  • a person's name;

  • signature;

  • home address;

  • email address;

  • telephone number;

  • date of birth;

  • medical records;

  • bank account details; and

  • employment details

Information about a person's working habits and practices:

  • a person's employment details:

    • work address and contact details;

    • salary;

    • job title; and

    • work practices, as well as other business information, for example whether a sole trader has taken out a loan.

Commentary or opinion about a person:

  • a referee's comments about a job applicant's career, performance, attitudes and aptitudes;

  • an opinion about an individual's attributes that is based on other information about them, such as an opinion formed about an individual’s gender and ethnicity, based on information such as their name or their appearance. This will be personal information about the individual even if it is not correct.

Information or opinion inferred about an individual from their activities, such as their tastes and preferences from online purchases they have made using a credit card, or from their web browsing history.

The Royal Commission into the Financial Services Industry has a practice guideline (PG4 Contact Information and customer names in documents to be tendered), which includes Personal Information such as:

  • direct telephone numbers;

  • email addresses;

  • residential addresses; and

  • signatures.

What is Personal Data?

Under the EU General Data Protection Regulation (GDPR) the definition of Personal Data includes any information that relates to an identified or identifiable living individual. Different pieces of information which, collected together, can lead to the identification of a particular person also constitute personal data. Personal data that has been de-identified, encrypted or pseudonymised but can be used to re-identify a person remains personal data and falls within the scope of the law. Personal data that has been rendered anonymous in such a way that the individual is not or no longer identifiable is no longer considered personal data. For data to be truly anonymised, the anonymisation must be irreversible.

The GDPR protects personal data regardless of the technology used for processing that data – it’s technology neutral and applies to both automated and manual processing, provided the data is organised in accordance with pre-defined criteria (for example alphabetical order). It also doesn’t matter how the data is stored – in an IT system, through video surveillance, or on paper; in all cases, personal data is subject to the protection requirements set out in the GDPR.

Examples of personal data

  • a name and surname;

  • a home address;

  • an email address;

  • an identification card number;

  • location data (for example the location data function on a mobile phone);

  • an Internet Protocol (IP) address;

  • a cookie ID;

  • the advertising identifier of your phone;

  • data held by a hospital or doctor, which could be a symbol that uniquely identifies a person.

Challenges for eDiscovery

With the increase in forms of AI and other automation in eDiscovery systems, there are some great tools with which we can identify personal information. For simple data that follows a pattern, it is relatively easy for an eDiscovery practitioner to automatically identify and redact personal information such as:

  • patterns of numbers like credit cards and ACNs

  • common textual strings like email addresses

In both of these examples a regular expression can be used. For patterns of numbers, for example card numbers that begin with 4 and are 13 or 16 digits long: 4[0-9]{12}(?:[0-9]{3})?; or

for email addresses: ([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})
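As a rough illustration of how these two expressions might be applied, the Python sketch below finds and redacts matches in a piece of text. It is only a sketch under stated assumptions, not how any particular eDiscovery platform works: the function names, the "[REDACTED]" marker and the sample sentence are hypothetical, and the word boundaries around the number pattern are an addition so that it does not match inside longer runs of digits.

```python
import re

# The two expressions quoted above. The \b word boundaries on the number
# pattern are an added assumption, not part of the quoted expression.
CARD_PATTERN = re.compile(r"\b4[0-9]{12}(?:[0-9]{3})?\b")
EMAIL_PATTERN = re.compile(r"([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})")

REDACTION_MARK = "[REDACTED]"  # hypothetical placeholder text


def find_hits(text):
    """List (label, start, end, matched text) for each match, in document order."""
    hits = []
    for label, pattern in (("card", CARD_PATTERN), ("email", EMAIL_PATTERN)):
        for match in pattern.finditer(text):
            hits.append((label, match.start(), match.end(), match.group(0)))
    return sorted(hits, key=lambda hit: hit[1])


def redact(text):
    """Replace every card-number and email match with the redaction mark."""
    text = CARD_PATTERN.sub(REDACTION_MARK, text)
    return EMAIL_PATTERN.sub(REDACTION_MARK, text)


# Hypothetical sample text using a well-known test card number.
sample = "Contact jane.citizen@example.com regarding card 4111111111111111."
for hit in find_hits(sample):
    print(hit)
print(redact(sample))
```

Listing the hits separately from redacting them leaves room for a reviewer to check each match before anything is removed. Note that a person's name in the same sentence would pass through untouched, because it follows no fixed pattern.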

What can we do about it?

There are interesting developments in the dominant eDiscovery platforms with automatic 'search and redact', as well as ways to utilise keyword hit highlighting for key terms.

As the definitions above show, personal information is far more complex and far more subjective than these simple patterns. Identifying and redacting it requires significant human oversight of any automated process, particularly for the more subjective material such as medical records or commentary on a person's career performance.


Matthew Golab, Head of Legal Informatics, Gilbert + Tobin, InfoGovANZ Advisory Board Member