Data Masking 101

Ronald Bradford
October 20, 2022

I continue to dig up and share this simple approach for production data masking via SQL to create testing data sets. Time to codify it into a post.

Rather than generating a set of names and data from tools such as Mockaroo , it is more practical to use actual data for a variety of testing reasons.

The SQL below is a self-explanatory approach of removing Personal Identifiable Information (PII) , but keeping data relevant. I use this approach for a number of reasons.

We are using production data rather than synthetic data. Data volume, distribution, and additional column values are realistic. This is a subset of an example, but dates and locations are therefore realistic
Indexes (and unique indexes) still work, and distribution across the index is adequate for searching. Technically the index will be a little larger in disk footprint.
You cannot reverse engineer the masked value into a real value with just this data set. An engineer in a test environment cannot obtain the underlying information.
If you identify an issue with data quality for any row of data, there is a way to present the uniqueness of that row. This enables a person with production access to match the underlying row. Of course, any unique identifier (auto increment or UUID) should also be modified to mask real data.

`
SELECT CONCAT(SUBSTR(first_name,1,2),REPEAT(’’,LENGTH(first_name)-2)) AS first_name,
CONCAT(SUBSTR(last_name,1,3),REPEAT(’’,LENGTH(last_name)-3),’ ‘, SUBSTRING(MD5(CONCAT(first_name,last_name)),1,6)) AS last_name,
CONCAT(SUBSTR(organization,1,3),REPEAT(’*’,LENGTH(organization)-3),’ ‘, SUBSTRING(MD5(CONCAT(organization)),1,6)) AS organization,
created, country
FROM customer
LIMIT 10;

+------------+--------------------+------------------+---------------------+---------+
| first_name | last_name | organization | created | country |
+------------+--------------------+------------------+---------------------+---------+
| Sa**** | Cor**** 4c23cd | Ski*** d21420 | 2022-09-20 03:30:14 | PH |
| Fu**** | Wat*** 8b97de | Jax***** e629c2 | 2022-04-08 03:20:22 | BY |
| Mo**** | Zis***** b11d94 | Rhy**** b4073a | 2022-10-06 15:58:38 | IR |
| So**** | Bad** 232cc2 | Rhy*** 1734bd | 2022-02-01 07:35:39 | ID |
| Ni***** | Ter***** d9ffb5 | Wor****** 6e476c | 2021-11-08 17:07:34 | IL |
| Ka****** | Scr***** 9201db | Jax**** 481fd8 | 2022-08-18 19:17:54 | BR |
| Li*** | Coz** 0447f6 | Nlo**** 11da59 | 2022-07-29 06:47:56 | HR |
| Ch***** | Hal******** f5d9c8 | Zoo**** c6e07d | 2022-09-28 04:54:30 | UA |
| Er****** | Ste******* d005f2 | Eid** ffc305 | 2022-04-28 18:50:11 | PT |
| Fo** | O'S***** b35c44 | Buz**** 2c8598 | 2022-09-11 02:05:55 | RU |
+------------+--------------------+------------------+---------------------+---------+
`

Tagged with: Data Data Best Practices Data Security MySQL

DataEngBytes Sydney Overview

Ronald Bradford
July 31, 2026

I have just returned from my first DataEngBytes conference in Sydney. Here is a quick summary on my thoughts for the event. I plan to go into more detail on great points from presentations in later posts.

MITRE ATLAS - A 2025 Review

Ronald Bradford
July 27, 2026

The MITRE ATLAS Matrix provides a working knowledge base of adversary tactics, techniques, and real-world case studies specifically for Artificial Intelligence (AI) and Machine Learning (ML) systems. It adapts the established MITRE ATT&CK adversary tactics and techniques matrix which is the standard reference for traditional IT cybersecurity.

Foundational Frameworks for GenAI Security Analysis

Ronald Bradford
July 24, 2026

Three frameworks form the analytical foundation of this series’ threat analysis, selected for their complementary coverage of the GenAI security landscape and their broad adoption across enterprise security practices. The MITRE Adversarial Threat Landscape for AI Systems (MITRE ATLAS) maps tactics and techniques specific to AI systems.

Data Masking 101

Related Posts

DataEngBytes Sydney Overview

MITRE ATLAS - A 2025 Review

Foundational Frameworks for GenAI Security Analysis