30 November 2016

Hashing specific columns of CSV files

When working with Big Data, you'll often need to de-personalize data before processing it or handing it over to anyone, even under an NDA, because it contains information that is simply too private: mobile numbers, names and addresses of specific people, for example.

If the data is a CSV file with clearly identified columns (for instance, one you've been able to open in Excel), all you need is a simple program that irreversibly hashes the values stored in certain columns.

Note that this is not encryption. It's hashing. Encrypted data can be decrypted by anyone who has the key. With one-way hashing, the hashed data cannot be restored to its original form.
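To make the idea concrete, here is a minimal sketch of hashing chosen columns of a single CSV line. It is not the code of the program mentioned in this post: the class and method names (ColumnHasher, sha256Hex, hashColumns) are made up for illustration, SHA-256 is my assumption for the hash function, and the line is split naively on commas, so quoted fields that contain commas are not handled.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Set;

// Hypothetical illustration only; names and hash choice are assumptions.
class ColumnHasher {

    // Returns the SHA-256 digest of a value as a lowercase hex string.
    static String sha256Hex(String value) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Replaces the fields at the given zero-based column indexes with their hashes.
    // Assumption: a plain comma split, with no quoted or escaped fields.
    static String hashColumns(String line, Set<Integer> columns) {
        String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
        for (int i = 0; i < fields.length; i++) {
            if (columns.contains(i)) {
                fields[i] = sha256Hex(fields[i]);
            }
        }
        return String.join(",", fields);
    }

    public static void main(String[] args) {
        // Hash the name (column 0) and the mobile number (column 2); keep the age.
        System.out.println(hashColumns("Alice,42,5550100", Set.of(0, 2)));
    }
}
```

The same input always produces the same hash, so a hashed column can still be used for grouping or joining records even though the original values are no longer visible.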

I've created a free and open source Java program called hashCSV which will do this for you.

It's released under the MIT license, so you are free to fork the project and use it for personal or commercial purposes.
