What is Data De-identification?
Data de-identification is the process of separating Personally Identifiable Information (PII) from the Protected Health Information (PHI) your system stores. It is the easiest way to become HIPAA compliant without compromising your technical flexibility. The advantage is that you can store the de-identified data (PHI minus the PII) anywhere you want; the infrastructure and code that interact with this data do not need to adhere to HIPAA. You do have to be careful about how you de-identify the data, and you have to store the PII securely in a system like TrueVault. If you do both, you become compliant with a fraction of the work required to make your entire system compliant.
Compliant or “Compliant”?
You may wonder whether this is above board or a loophole in the law. Rest assured: this is totally sanctioned! The Department of Health & Human Services (which oversees HIPAA) gives specific guidance on de-identification on its website. It explains which data are identifying (first name, last name, date of birth, etc.) and gives specific guidance about how and when you can re-identify.
You can get a legal perspective on this process in this blog post, guest-authored by a regulatory expert, JoAnna Nicholson.
If you want help seeing how this approach would work for your application, we’re happy to help. Send us a note and mention your interest in de-identification. We’ll walk you through everything.
De-Identifying Data
De-identification is an invisible process that your users never need to know about. Whenever you’re taking new data into your application, you decide which fields are identifiable and which are not. You send the identifiable fields to TrueVault using the Create Document endpoint. This endpoint returns an opaque random Document ID identifying this record in TrueVault. Next, you send this ID along with the de-identified data to your server.
Note that this all happens client-side. If you have a web app, this means your JavaScript code will split the data. If it’s an iOS or Android app, then your Swift or Java app code will do this de-identification.
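The flow above can be sketched in TypeScript for a web client. This is a minimal sketch under stated assumptions, not TrueVault's SDK: the field list, the record shape, and the two injected functions (`storePii`, standing in for a call to the Create Document endpoint, and `saveToServer`, standing in for a request to your own backend) are hypothetical names chosen for illustration.

```typescript
// Hypothetical record shape: any flat object of field name -> value.
type PatientRecord = Record<string, unknown>;

// Assumption: which fields count as identifying is your decision, guided by HHS.
const IDENTIFYING_FIELDS: readonly string[] = ["firstName", "lastName", "dateOfBirth"];

interface SplitResult {
  pii: PatientRecord;          // goes to the vault only
  deidentified: PatientRecord; // safe to send to your own server
}

// Split one record into its identifying and non-identifying parts.
function splitRecord(
  record: PatientRecord,
  identifyingFields: readonly string[] = IDENTIFYING_FIELDS
): SplitResult {
  const identifying = new Set(identifyingFields);
  const pii: PatientRecord = {};
  const deidentified: PatientRecord = {};
  for (const [key, value] of Object.entries(record)) {
    if (identifying.has(key)) {
      pii[key] = value;
    } else {
      deidentified[key] = value;
    }
  }
  return { pii, deidentified };
}

// Hypothetical payload your server receives: an opaque ID plus de-identified fields.
interface ServerPayload {
  vaultDocumentId: string;
  fields: PatientRecord;
}

// Injected so the sketch does not assume any particular HTTP client or SDK.
type StorePii = (pii: PatientRecord) => Promise<string>; // resolves to the Document ID
type SaveToServer = (payload: ServerPayload) => Promise<void>;

// Client-side flow: PII goes to the vault first, then the ID and the rest go to your server.
async function deidentifyAndSave(
  record: PatientRecord,
  storePii: StorePii,
  saveToServer: SaveToServer
): Promise<string> {
  const { pii, deidentified } = splitRecord(record);
  const vaultDocumentId = await storePii(pii);
  await saveToServer({ vaultDocumentId, fields: deidentified });
  return vaultDocumentId;
}
```

Keeping `splitRecord` a pure function makes the split easy to unit test without any network access, and the same idea carries over to Swift or Java app code.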
Re-Identifying Data
Just like your users didn’t need to know that you were de-identifying data, they don’t have to be privy to the re-identification process either. When you want to show health data alongside identifying data in your application, you load data from TrueVault and from your server, then merge it together. If you’re listing all diabetic patients assigned to the logged-in doctor, you’d first pull a list of patients with that condition from your system. Instead of returning their names, it would return their TrueVault IDs and whatever de-identified data you stored on your server. Then, you’d make a single request to TrueVault to read the documents that store these patients’ names. Your application code can easily merge the results, and you can display the unified set to your users.
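The merge step can be sketched as a simple join on the Document ID. This is an illustrative sketch only: the row and document shapes and the two injected loaders (`fetchRows` for your own server, `fetchPiiDocuments` for a single multi-document read from TrueVault) are hypothetical and do not reflect a specific API.

```typescript
// Hypothetical shape of a row returned by your own server (no PII).
interface DeidentifiedRow {
  vaultDocumentId: string;
  condition: string;
}

// Hypothetical shape of a PII document read back from the vault.
interface PiiDocument {
  documentId: string;
  firstName: string;
  lastName: string;
}

// What the UI finally displays.
interface PatientView extends DeidentifiedRow {
  firstName: string;
  lastName: string;
}

// Join de-identified rows with PII documents on the vault Document ID.
function mergeByVaultId(rows: DeidentifiedRow[], docs: PiiDocument[]): PatientView[] {
  const byId = new Map<string, PiiDocument>();
  for (const doc of docs) {
    byId.set(doc.documentId, doc);
  }
  const merged: PatientView[] = [];
  for (const row of rows) {
    const doc = byId.get(row.vaultDocumentId);
    if (doc === undefined) {
      continue; // PII could not be loaded for this row, so leave it out
    }
    merged.push({ ...row, firstName: doc.firstName, lastName: doc.lastName });
  }
  return merged;
}

// Client-side flow: list from your server, one batched read from the vault, then merge.
async function loadPatients(
  fetchRows: () => Promise<DeidentifiedRow[]>,
  fetchPiiDocuments: (ids: string[]) => Promise<PiiDocument[]>
): Promise<PatientView[]> {
  const rows = await fetchRows();
  const docs = await fetchPiiDocuments(rows.map((row) => row.vaultDocumentId));
  return mergeByVaultId(rows, docs);
}
```

The result keeps the order of the rows your server returned, so any sorting or paging done there survives the merge.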
Server-side or Client-side?
We recommend that you always de-identify and re-identify client side, not server side. Doing so can avoid HIPAA Compliance concerns on your server. A major advantage of de-identification using TrueVault to store PII is that your server environment does not need to be HIPAA Compliant. This is only true if your server only interacts with de-identified data. If your server ever touches the PII, even transiently, then your server must be HIPAA Compliant.
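One way to guard against accidentally sending PII to your server is a check in the client, run just before the request goes out. The sketch below is a defensive aid under stated assumptions, not a compliance guarantee: the field list and function names are hypothetical, and it inspects only top-level keys, so nested objects and free-text fields would need their own review.

```typescript
// Assumption: the same list of identifying field names used when splitting records.
const FORBIDDEN_FIELDS: readonly string[] = ["firstName", "lastName", "dateOfBirth"];

// Return the names of any identifying fields present at the top level of a payload.
function findIdentifyingFields(
  payload: Record<string, unknown>,
  forbidden: readonly string[] = FORBIDDEN_FIELDS
): string[] {
  const forbiddenSet = new Set(forbidden);
  return Object.keys(payload).filter((key) => forbiddenSet.has(key));
}

// Throw before the request is made if the payload still carries identifying fields.
function assertDeidentified(payload: Record<string, unknown>): void {
  const leaked = findIdentifyingFields(payload);
  if (leaked.length > 0) {
    throw new Error("Refusing to send identifying fields to the server: " + leaked.join(", "));
  }
}
```

Calling `assertDeidentified(payload)` immediately before posting to your own backend turns a silent leak into a visible client-side error during development.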
What does the code look like?
If you’ve ever used a credit card processing tool like Stripe or Authorize.net, then you’ve already practiced this pattern. Instead of storing credit card numbers on your server and dealing with PCI compliance, you send the credit card numbers to them and they give you back a token (ID) you can use to refer to that card in the future. This is the same process, but with PII instead of credit card numbers.
In practice, it adds some logic to your client application but saves you tons of work overall. We built an extensive, real-world sample application to give you a sense of how the final product might look. Check out the sample app on GitHub and let us know what you think. We’re always looking for ways to make this process easier, so if there’s any documentation or samples you’d like to see, let us know.