How personal data leaks from non-prod: four real scenarios
Today's post comes from Dmitry Larin, head of the database protection product line, and Anastasia Komarova, product marketing manager at Garda.
We regularly talk with IT and information security teams, including during pilots and in interviews at industry events. Some of our readers may have already answered our questions or taken part in these discussions.
Over time, a picture has emerged from this dialogue: the specifics vary across companies and industries, but the overall approach to data is almost always the same. Production is locked down and strictly controlled, while everything happening around it is perceived as less critical. In practice, however, the most unpleasant situations arise not in production but around it: in test environments, dumps, BI dashboards, sandboxes, and temporary projects. That is where personal data takes on a life of its own. It is copied, transferred, forwarded, and reused. If you work with test databases, you have likely seen something similar.
To illustrate how this happens, let’s analyze several scenarios from the practice of a large online retailer.
Details about the hero of our story
The retailer operates nationwide and has several product teams, a large IT department, and about 1,500 employees. The website and mobile app process tens of thousands of orders daily, and the customer base includes hundreds of thousands of active users. During sales and holiday periods, the load increases significantly. Any mistake in order processing quickly stops being a "technical" issue and starts to affect the business directly.
The situations described below are typical. We see them not only in retail but also in banks, fintech, telecom, and e-commerce: anywhere there are complex IT systems and large volumes of personal data.
Why do developers need "almost" real data during testing?
With synthetic or demo data, you can verify that "the button clicks," but you cannot catch errors that only show up in real user scenarios. For example, a client may have several delivery addresses rather than one. There may be returns, repeated deliveries, and overlapping bonuses and promo codes. There may also be rare combinations of order statuses or other non-obvious edge cases in integrations.
Therefore, many companies periodically "refresh" test environments with data from production. Direct access to the live database, or to a full replica containing personal data, is not granted on a simple request. A formal request must be submitted stating the purpose, system, timeframe, level of access, and approvers. Data owners and information security often get involved, and sometimes even lawyers.
If all approvals are obtained, access is granted temporarily and strictly within the approved role. All actions are logged. From the perspective of protecting production, this scheme really works.
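The approval scheme described above can be pictured as a small data structure. This is only a minimal sketch under our own assumptions: the `AccessRequest` class, its field names, and the approver labels are hypothetical and are not taken from any particular access management product.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical record of one request for access to production data."""
    requester: str
    purpose: str
    system: str
    role: str                      # approved level of access
    starts_at: datetime
    ends_at: datetime
    required_approvers: frozenset  # e.g. data owner, information security
    approvals: frozenset = frozenset()

    def approve(self, approver: str) -> "AccessRequest":
        """Return a copy of the request with one more approval recorded."""
        if approver not in self.required_approvers:
            raise ValueError(f"{approver} is not an approver for this request")
        return replace(self, approvals=self.approvals | {approver})

    def allows(self, role: str, now: datetime) -> bool:
        """Valid only if fully approved, within the role, and inside the time window."""
        fully_approved = self.required_approvers <= self.approvals
        in_window = self.starts_at <= now < self.ends_at
        return fully_approved and role == self.role and in_window


# Illustrative values only.
start = datetime(2024, 11, 1, 9, 0)
request = AccessRequest(
    requester="backend-dev",
    purpose="reproduce delivery address bug",
    system="crm",
    role="read_only",
    starts_at=start,
    ends_at=start + timedelta(days=2),
    required_approvers=frozenset({"data_owner", "infosec"}),
)
approved = request.approve("data_owner").approve("infosec")
```

The point of the sketch is that every condition from the request (approvers, role, timeframe) is checked on each use, which is exactly what disappears once the data is copied somewhere else.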
Problems begin when a team needs to act quickly, for example when someone "urgently needs fresh data, otherwise the bug cannot be reproduced." This creates a conflict between responding quickly to business requests and keeping data secure.
Why is it often impossible to quickly gain access to production?
In calm periods, access to production can be obtained in a couple of days, but during peak times the "run around the offices" can easily stretch to a whole week. The request goes through a chain of approvals, and each person in that chain has their own workload: the employee you need may be on vacation, or the decision-maker may be stuck in an urgent meeting.
And once all approvals are in, the database administrators (DBAs), who actually grant access or prepare extracts, may simply not have a free slot for you. Moreover, every "okay" has to be earned by working through a mass of clarifying questions: "which tables," "which fields," "why can't we take anonymized data."
Scenario 1. "A bug that doesn't appear in synthetic data"
Complaints began to arrive at the retailer's support and monitoring services. Some customers experienced a sudden "jumping" delivery address when placing an order: sometimes an outdated address appeared in the form, and other times the order was sent to a different region. This issue was not widespread, but it occurred with enough regularity to start affecting business metrics.
When the logs were analyzed, it became clear that in some scenarios the address appeared to be pulled from another customer's record, as if the links in the CRM were mixed up. The bug did not reproduce on synthetic test data or in the demo environment. It manifested only on production data.
In an effort to resolve the issue more quickly, a backend team member asked the DBA to update the test database with a fresh dump from production. In the email, he wrote: "The bug does not reproduce on the current data. Therefore, if we don't update quickly, we will be searching for the problem blindly for a long time." The DBA agreed and fulfilled the request.
Why does a copy of production not appear to violate security policies?
Formally, the administrator did not provide direct access to the live system; it was a copy in the testing environment, where only developers worked.
In such cases, DLP and other information security systems may not trigger, as they are primarily focused on the production perimeter, email, and corporate file storage. Meanwhile, testing segments are often less protected: they are considered "internal," and excessive control there quickly starts to hinder the development process.
How and why can a dump spread uncontrollably within a company?
After obtaining fresh data, the employee reproduced the bug and realized that the problem arose from atypical changes in the customer profile. To figure it out more quickly, he involved a colleague responsible for the profiles and addresses module in the CRM, asking him to run the scenario locally without wasting time obtaining access to the test environment. The quickest way to do this turned out to be taking a dump of part of the database and transferring the file directly. That's what they did. The archive was transferred via a regular "shared folder," bypassing any permissions.
Today, such situations occur regularly, and none of the employees involved consciously violates the rules. At some point, speed and convenience simply outweigh formalities. Yet these "temporary" copies of databases often become sources of leaks: a dump is downloaded to a personal laptop, or a forgotten database lives for years in a test segment without any control. The information security team does not see these flows.
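One way to make such flows at least partly visible is to check a dump for personal data before it is shared. The sketch below is a deliberately simplified illustration, not a substitute for a DLP system: the `count_pii` and `is_safe_to_share` functions are hypothetical, and the two regular expressions are assumptions that catch only plain email addresses and phone numbers written as a plus sign followed by digits.

```python
import re

# Assumed, simplified patterns; real personal data takes many more forms.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+\d{10,15}\b"),
}


def count_pii(text: str) -> dict:
    """Count matches of each PII-like pattern in the text of a dump."""
    return {label: len(pattern.findall(text)) for label, pattern in PII_PATTERNS.items()}


def is_safe_to_share(text: str) -> bool:
    """Treat a dump as shareable only if no PII-like pattern matches at all."""
    return not any(count_pii(text).values())


# Illustrative fragment of a SQL dump with made-up customers.
dump = (
    "INSERT INTO customers VALUES (1, 'anna@example.com', '+79001234567');\n"
    "INSERT INTO customers VALUES (2, 'ivan@example.org', NULL);\n"
)
report = count_pii(dump)
```

A check like this could run as a step before a file is placed in a shared folder. It only looks at value formats, so names and street addresses would pass unnoticed.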
Scenario 2. "Sending personal data of clients to the contractor as an example"
The company needed to integrate an external service (a bonus system) into its IT infrastructure. The contractor could not reproduce an error on their own data and asked for information about several clients as an example.
The project was important, and deadlines were tight. To avoid missing them, part of the dump was handed over to the contractor. Formally, everything was in order: there was a signed contract with the contractor and an NDA. In fact, however, the data ended up outside the company's perimeter, and whatever leaves the perimeter can no longer be controlled. The company can no longer be sure that the contractor will diligently delete the dump rather than back it up or place the data in internal repositories.
This practice is widespread. Even when all legal formalities are observed, the company stops managing the lifecycle of personal data as soon as it is transferred. The contractor may be conscientious, but there is no guarantee that the contractor's infrastructure is adequately protected. And if an incident occurs, responsibility falls on the party that transferred the data, that is, on the client.
Scenario 3. "Creating a BI showcase based on real data"
While the developers were fixing another bug in the CRM, the business urgently needed to know how many clients came back after returning an order and which segments were the most loyal. The analyst created a temporary showcase (a data mart) in BI. No one went through the approval process for production access; after all, the report for management was needed urgently. So the data for the showcase was taken from the test database.
The dashboard was published, and access was granted to top managers, salespeople, marketers, and product teams. The BI system was perceived in the company as an analytical tool rather than as a full copy of the database. As a result, almost any employee could export data from the dashboard to Excel, and the files then took on a life of their own: they were sent through Slack and Teams and ended up in personal folders and investor presentations.
Our experience shows that such showcases and exports are one of the most common points where personal data leaks occur. Moreover, no one considers it a violation to create reports based on real personal data. "It's just a report," employees think. But from a legal standpoint, this is uncontrolled distribution of personal data. Unlike production, such copies almost never come to the attention of information security.
Scenario 4. "Abandoned database with real personal data in the testing environment"
The company launched an annual planned information security audit. During the inventory of servers in the test segment, an old database named crm_test_old was discovered. The last update to the database was made over a year ago.
We began to investigate. It turned out to be a sandbox for a proof of concept (PoC). The data had been taken from production, the project was closed, and the server was forgotten. The DBA who set up the database no longer works at the company. There is no documentation and no one to ask.
Inside the database was real personal data without any protection. The server was unpatched, the passwords were old, and access to the database was not restricted in any way. Log analysis showed that there had been connections, but it was impossible to determine who exactly had used the data.
The worst part of such a situation is not that a leak definitely occurred. It is that, if an incident is alleged, the company will struggle to prove that one did not.
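An inventory like the one in this audit can be partly automated. The following is a minimal sketch that assumes the inventory has already been collected into a list of dictionaries; the field names, the list of column-name hints, and the 180-day threshold are all our assumptions for illustration, not the method used in the audit described above.

```python
from datetime import datetime, timedelta

# Assumed substrings that suggest a column holds personal data.
PII_COLUMN_HINTS = ("email", "phone", "address", "passport", "full_name")


def looks_like_pii(column: str) -> bool:
    """Guess from the column name alone whether it may contain personal data."""
    name = column.lower()
    return any(hint in name for hint in PII_COLUMN_HINTS)


def flag_risky_databases(inventory, now, max_idle=timedelta(days=180)):
    """Flag databases with PII-like columns that are stale or have no owner."""
    findings = []
    for db in inventory:
        pii_columns = sorted(c for c in db["columns"] if looks_like_pii(c))
        idle = now - db["last_updated"]
        if pii_columns and (idle > max_idle or db.get("owner") is None):
            findings.append({
                "name": db["name"],
                "idle_days": idle.days,
                "owner": db.get("owner"),
                "pii_columns": pii_columns,
            })
    return findings


# Illustrative inventory; only the name crm_test_old comes from the story.
inventory = [
    {"name": "crm_test_old", "owner": None,
     "last_updated": datetime(2023, 1, 10),
     "columns": ["customer_id", "email", "delivery_address"]},
    {"name": "crm_test", "owner": "backend-team",
     "last_updated": datetime(2024, 5, 20),
     "columns": ["customer_id", "email", "order_status"]},
]
findings = flag_risky_databases(inventory, now=datetime(2024, 6, 1))
```

Column names are a weak signal, so a report like this is a starting point for manual review rather than proof that a database is clean.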
How to organize data protection?
After such stories, the conclusion is clear: we cannot control what is constantly created manually. Often, a leak begins not with an attack, but with the desire to do things more quickly.
Copies in non-production will always exist. They cannot simply be prohibited, because that would slow down many vital business processes. However, we can ensure that these database copies do not contain personal data in the first place. This is achieved through masking.
Data masking reduces the damage from human error, even if a database is forgotten, access is granted more broadly than necessary, or a dump leaves the perimeter: a client cannot be directly identified from masked data. This is why companies are increasingly adopting data masking solutions such as Garda Data Masking. Masking lets teams work with usable data without spreading personal data across test environments, showcases, and contractors.
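To make the idea concrete, here is a minimal sketch of one common masking approach, deterministic pseudonymization with a keyed hash. It is a generic illustration and does not describe how Garda Data Masking works internally; the key, the field names, and the output formats are assumptions.

```python
import hashlib
import hmac

# Hypothetical key for the example; a real key would live in a secrets store.
SECRET_KEY = b"demo-key-change-me"


def _token(value: str, length: int = 12) -> str:
    """Derive a stable hex token from a value using HMAC-SHA256."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def mask_email(email: str) -> str:
    """Replace an email with a syntactically valid but meaningless one."""
    return f"user_{_token(email.strip().lower())}@example.invalid"


def mask_phone(phone: str) -> str:
    """Replace a phone number with ten derived digits after a fake prefix."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    number = int(_token(digits, 15), 16) % 10**10
    return f"+000{number:010d}"


# Assumed mapping from field name to masking function.
MASKERS = {"email": mask_email, "phone": mask_phone}


def mask_record(record: dict) -> dict:
    """Return a copy of the record with personal fields masked."""
    return {
        field: MASKERS[field](value) if field in MASKERS and value is not None else value
        for field, value in record.items()
    }


# Illustrative record with made-up values.
customer = {"customer_id": 42, "email": "Anna@Example.com",
            "phone": "+7 900 123-45-67", "order_status": "returned"}
masked = mask_record(customer)
```

Because the same input always yields the same token, links between tables survive masking, so scenarios such as a customer with several delivery addresses can still be reproduced on the masked copy. Identifiers and order statuses are left untouched here; which fields count as personal data is a decision for the data owner.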