How to secure sensitive personal data in any application: a high-level architecture view

Recently, I read a few news articles about how millions of user accounts have been compromised and their data breached. To be very honest, it is becoming the norm. I am not convinced by the way enterprises architect their data storage, especially for sensitive data, or by the trade-off that favours performance over resilience.

In this article, I have tried to cover the underlying problems and possible solutions. It is not a silver bullet, but it is at least an approach to guard sensitive information and make it harder to breach.

Okay, let us first understand.

What is sensitive personal data?

According to the ICO site,

“‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person”.

This could be broken down further into very specific categories, but for simplicity I will take the following common pieces of information as an example.

  • Name
  • Email
  • Address

Now, let's take a simple view of how this information is typically stored.

Unique Identity | Name | Email | Address
1 | Ray Jones | r.jone@somesite.com | 123, example park avenue

The above example is a typical tabular representation of sensitive personal information. Most enterprises give DBAs and system admins the privilege to view or edit it in bulk, and guess what: from that point onwards it starts to become vulnerable. I am not going to point at any individual or anything personal; my focus is an approach that, by design, makes the data harder to breach.

How?

Let me explain with the following example.

Instead of one table with four columns, we should distribute the data across four different tables, as shown below:

Table Name : ‘Unique Identity’

Unique Identifier | Value
xxyz-abcd-aaa-bbbb-cccc | 1

Table Name : ‘Name’

Unique Identifier | Value
xxyz-abcd-aaa-bbbb-cccc | Ray Jones

Table Name : ‘Email’

Unique Identifier | Value
xxyz-abcd-aaa-bbbb-cccc | r.jone@somesite.com

Table Name : ‘Address’

Unique Identifier | Value
xxyz-abcd-aaa-bbbb-cccc | 123, example park avenue
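The split described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table names, the `unique_identifier` and `value` column names, and the helper functions are all hypothetical, the identifier is simply a random UUID, and a single in-memory database is used purely for brevity.

```python
import sqlite3
import uuid

# Hypothetical table names mirroring the four tables in the article.
FIELD_TABLES = ("unique_identity", "name", "email", "address")


def create_tables(conn: sqlite3.Connection) -> None:
    """Create one two-column table per field instead of one wide table."""
    for table in FIELD_TABLES:
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} "
            "(unique_identifier TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )


def store_record(conn: sqlite3.Connection, record: dict) -> str:
    """Split one logical record across the field tables.

    Returns the opaque identifier that ties the pieces together.
    """
    unique_identifier = str(uuid.uuid4())
    for table in FIELD_TABLES:
        conn.execute(
            f"INSERT INTO {table} (unique_identifier, value) VALUES (?, ?)",
            (unique_identifier, str(record[table])),
        )
    conn.commit()
    return unique_identifier


def load_field(conn: sqlite3.Connection, table: str, unique_identifier: str):
    """Fetch a single value for a single identifier (no bulk reads)."""
    if table not in FIELD_TABLES:
        raise ValueError(f"unknown table: {table}")
    row = conn.execute(
        f"SELECT value FROM {table} WHERE unique_identifier = ?",
        (unique_identifier,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
create_tables(conn)
uid = store_record(
    conn,
    {
        "unique_identity": 1,
        "name": "Ray Jones",
        "email": "r.jone@somesite.com",
        "address": "123, example park avenue",
    },
)
print(load_field(conn, "name", uid))
```

Each table on its own now holds only an opaque identifier and one value, so no single table describes a whole person.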

 

Still, the raw data can easily be identified as sensitive personal information, right? So how about encrypting the value column of each table with a key that is unique to each record, for all three objects (name, email and address)? Voila.

So now our tables look like this:

Table Name : ‘Unique Identity’

Unique Identifier | Value
xxyz-abcd-aaa-bbbb-cccc | 4564saxdr

Table Name : ‘Name’

Unique Identifier | Value
xxyz-abcd-aaa-bbbb-cccc | xeasd4f65we31as

Table Name : ‘Email’

Unique Identifier | Value
xxyz-abcd-aaa-bbbb-cccc | asdfiowekznjxjopiueruu9ysdflakspop[oz80784kdxmklvnwh3794ry809kcjfpw403msd

Table Name : ‘Address’

Unique Identifier | Value
xxyz-abcd-aaa-bbbb-cccc | asdf7e7rljkxcjvkljhioe4hryyuSWDYUF8R98KJL;A\SJDFOHWY4Y8934P[OKp[slkdasd74f891e4r8f2w354r35897sd47
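To make the idea of "a unique key for each record" a little more concrete, here is a rough sketch using only the Python standard library. Everything in it is an assumption for illustration: `RecordKeyStore`, `encrypt_value` and `decrypt_value` are hypothetical names, keys are issued per table and identifier, and the cipher is a stand-in built from HMAC-SHA256 in counter mode only because the standard library ships no AES. In a real system you would use a vetted authenticated cipher (for example AES-GCM) from a maintained cryptography library, and a hardened key store rather than a dictionary.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 16
TAG_LEN = 32


class RecordKeyStore:
    """In-memory stand-in for a separately secured key store (assumption)."""

    def __init__(self) -> None:
        self._keys = {}

    def key_for(self, table: str, unique_identifier: str) -> bytes:
        """Return the key for one stored value, creating it on first use."""
        slot = (table, unique_identifier)
        if slot not in self._keys:
            self._keys[slot] = secrets.token_bytes(32)
        return self._keys[slot]


def _subkey(key: bytes, label: bytes) -> bytes:
    # Derive separate encryption and authentication keys from the record key.
    return hmac.new(key, label, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # HMAC-SHA256 in counter mode: an illustrative stand-in stream cipher.
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_value(key: bytes, plaintext: str) -> bytes:
    """Return nonce + ciphertext + authentication tag."""
    nonce = secrets.token_bytes(NONCE_LEN)
    data = plaintext.encode("utf-8")
    stream = _keystream(_subkey(key, b"enc"), nonce, len(data))
    ciphertext = bytes(a ^ b for a, b in zip(data, stream))
    tag = hmac.new(_subkey(key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def decrypt_value(key: bytes, blob: bytes) -> str:
    """Verify the tag, then decrypt; raises ValueError on a wrong key or tampering."""
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("ciphertext too short")
    nonce, ciphertext, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(_subkey(key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    stream = _keystream(_subkey(key, b"enc"), nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream)).decode("utf-8")


keys = RecordKeyStore()
email_key = keys.key_for("email", "xxyz-abcd-aaa-bbbb-cccc")
stored = encrypt_value(email_key, "r.jone@somesite.com")
print(decrypt_value(email_key, stored))
```

The point of the sketch is the shape, not the cipher: the value tables hold only ciphertext, and the key that unlocks one value is of no use for any other record.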

 

This is where the design pays off. If someone runs a bulk query against the data, they will get only encrypted values, which are useless without the keys. (In my humble opinion, you should never allow anyone to run bulk queries against sensitive personal information. That is the highest privilege and should be avoided at all costs. Period.) Even then, without knowledge of the schema structure, the data will not make any sense: for example, how would they establish the relation between two entities such as a name and an email?

That is the second layer of abstraction.

Again, an insider with the right knowledge could still breach this, although it is not easy. As a standard procedure, if a key person leaves, all the keys known to them must be reset and replaced. I know this is sometimes really hard to do, but it should be part of the enforced policies.

 

As mentioned above, the encryption keys should themselves be protected at the highest security level. Not only that, but the key for each record should be unique and not shared with any other entity. I will explain some mechanisms for that. Of course, this adds extra overhead to the process, but it provides resilience and data protection at the highest level.

 

Now, this could still be vulnerable if we keep all four tables on the same server or at the same location. That is bad practice again.

 

What is the solution for that?

Create a microservice API for each object type. These APIs should not be exposed to the outside world, and even internal API access should be granted only to a limited set of applications and only from behind proxies, so that callers are not able to discover the location of the data store.

I have attached a quick diagram of the entire architecture below:
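As a rough illustration of one such per-object service, the sketch below models it as a plain Python class rather than a real network API. The names `FieldService` and `AccessDenied`, the caller allow-list, and the in-memory dictionary standing in for the hidden data store are all assumptions made for the example.

```python
from typing import Dict, Iterable, Optional


class AccessDenied(Exception):
    """Raised when a caller is not on the service's allow-list."""


class FieldService:
    """Hypothetical service that owns exactly one object type, e.g. 'email'.

    It deliberately offers no list, search or export operation: the only
    read path is one identifier in, one (encrypted) value out.
    """

    def __init__(self, field_name: str, allowed_callers: Iterable[str]) -> None:
        self.field_name = field_name
        self._allowed_callers = frozenset(allowed_callers)
        # Stand-in for the real data store, whose location callers never see.
        self._store: Dict[str, bytes] = {}

    def _check(self, caller: str) -> None:
        if caller not in self._allowed_callers:
            raise AccessDenied(f"{caller} may not use the {self.field_name} service")

    def put(self, caller: str, unique_identifier: str, encrypted_value: bytes) -> None:
        """Store one already-encrypted value for one identifier."""
        self._check(caller)
        self._store[unique_identifier] = encrypted_value

    def get(self, caller: str, unique_identifier: str) -> Optional[bytes]:
        """Return the encrypted value for one identifier, or None if absent."""
        self._check(caller)
        return self._store.get(unique_identifier)


email_service = FieldService("email", allowed_callers={"profile-consumer"})
email_service.put("profile-consumer", "xxyz-abcd-aaa-bbbb-cccc", b"<ciphertext>")
print(email_service.get("profile-consumer", "xxyz-abcd-aaa-bbbb-cccc"))
```

Because the service never decrypts anything and exposes no bulk operation, compromising one service yields, at worst, single encrypted values for one object type.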

 

[Figure: sensitive personal information datastore architecture diagram]

As you can see, each microservice is responsible for one simple operation: querying a single data entity and passing it to the relevant requesting service.

The consumer services are responsible for handling requests and generating responses by decrypting the data in memory and disposing of it immediately after use. These consumer services are in turn monitored by other management services, which enforce a rate limit on how many queries can be served within a live authenticated session. This helps to give an early warning of an authorization breach and to prevent it from happening.
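The per-session rate limit can be sketched as a small sliding-window counter. This is a minimal illustration, not the monitoring service itself: `SessionRateMonitor`, its parameters and the `flagged` set are hypothetical names, and the clock is injectable only so the behaviour is easy to test.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Set


class SessionRateMonitor:
    """Hypothetical per-session query limiter using a sliding time window."""

    def __init__(
        self,
        max_queries: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_queries = max_queries
        self._window = window_seconds
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)
        # Sessions that exceeded the limit at least once: the "early warning".
        self.flagged: Set[str] = set()

    def allow(self, session_id: str) -> bool:
        """Record a query attempt; return False if the session is over its limit."""
        now = self._clock()
        hits = self._hits[session_id]
        # Drop attempts that have fallen out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._max_queries:
            self.flagged.add(session_id)
            return False
        hits.append(now)
        return True


monitor = SessionRateMonitor(max_queries=3, window_seconds=60)
for attempt in range(5):
    print(attempt, monitor.allow("session-42"))
print(monitor.flagged)
```

A consumer service would call `allow` before each lookup and refuse to decrypt anything when it returns False; the flagged sessions are what a management service would alert on.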

If you have any queries regarding how to handle or plan your digital transformation for your legacy application and want to consult, please feel free to contact me.
