Can GPT-4o be trusted with confidential data?

We delve into OpenAI's privacy policy and find out why experts have dubbed GPT-4o the "data turbo vacuum cleaner".

On May 13, OpenAI released a new AI model, GPT-4o. It has incredible capabilities and is much more human-like: it can solve equations, tell bedtime stories, and, according to the company, determine emotions from facial expressions.

OpenAI emphasizes that it strives to make access to its tools free for everyone. However, experts say that along with the expansion of GPT-4o's capabilities, the amount of data the company can access has also increased. This creates privacy risks for users.

OpenAI's record on protecting user data is far from spotless. After the launch of ChatGPT's predecessor GPT-3 in 2020 and the publication of a technical paper, it emerged that millions of pages of Reddit posts, books, and the wider web had been scraped to build the generative text AI system, including personal data that users had shared online. Because of this, ChatGPT was temporarily banned in Italy last year, drawing the attention of data protection authorities.

Shortly before the launch of GPT-4o, the company released a demo of the ChatGPT desktop app for macOS, which made it clear that the chatbot would be able to access the user's screen. In July, the same app came under fire again: a security flaw made it easy to find saved chats on the computer and read them in unencrypted form.

OpenAI quickly released an update that encrypts chats, but given the current level of public attention to the company and GPT-4o, it is easy to understand why people are so concerned about privacy.

So how confidential is the new iteration of ChatGPT? Is it any worse in this regard than previous versions? And can the user limit its access to data?

OpenAI Privacy Policy

The OpenAI privacy policy clearly states that the company collects large amounts of data, including personal information, usage data, and the content users submit. By default, ChatGPT uses all of this data to train its models unless you turn off the corresponding option in the settings or switch to the enterprise version of the product.

OpenAI states in its privacy policy that users' personal data is "anonymized." But according to Angus Allan, a senior product manager at CreateFuture, a consulting firm that helps companies use AI and data analytics, the company in practice appears to follow the principle of "collect everything we can get our hands on first, and then figure it out later." "OpenAI's privacy policy clearly states that it collects everything the user inputs and reserves the right to train its models on this data," he says.

According to Allan, the broad concept of "user content" likely includes images and voice data as well. "It's a real data turbo-vacuum, and the policy spells it all out very clearly. With the release of GPT-4o, the policy hasn't undergone significant changes, but given the model's expanded capabilities, the scope of what is considered 'user content' has greatly increased."

OpenAI's privacy policies state that ChatGPT does not have access to data on the user's device, except for what was entered in the chat. However, according to Jules Love, founder of Spark, a consulting firm that helps companies use AI tools in their workflows, ChatGPT collects a lot of user data by default. "It uses everything from prompts and responses to email addresses, phone numbers, geolocation data, network activity, and device information."

OpenAI claims that the data is used to train the AI model and improve its responses, but the terms of the policy allow the company to share users' personal information with its affiliates, service providers, and law enforcement agencies. "Therefore, it is difficult to understand where your data will end up," says Love.

According to data scientist Bharat Thota, the data collected by OpenAI includes full names, account credentials, payment card information, and transaction history. "Personal information can also be stored, especially if the user uploads images as part of the prompts." Similarly, if a user connects with the company's pages on social networks such as Facebook, LinkedIn, and Instagram, personal information may also be collected when the user shares contact details.

Machine learning specialist Jeff Schwarzentruver notes that OpenAI uses consumer data but does not sell advertising. "Instead of advertising, the company provides tools, and this is an important distinction. The data entered by the user is not directly used as a commodity. It is used to improve services, which benefits the user, but at the same time increases the value of OpenAI's intellectual property."

Privacy Management

After being criticized and embroiled in privacy scandals following the launch of ChatGPT in 2022, OpenAI introduced tools and controls to protect user data. OpenAI claims that it "strives to protect people's privacy."

In the case of ChatGPT in particular, OpenAI says it understands that some users are reluctant to share their information to improve models, so it gives them ways to manage their data. "ChatGPT Free and Plus users can easily manage whether their data contributes to model improvement in the settings," the company's website says. It also states that, by default, data from API, ChatGPT Enterprise, and ChatGPT Team users is not used for training.

"We have provided ChatGPT users with various ways to manage privacy, including an easy way to opt out of training our AI models and a temporary chat mode that automatically deletes chats," OpenAI spokeswoman Taya Christianson told WIRED.

The company stated that it does not collect personal information to train its models, and that it does not use public information from the internet to build profiles of people, target them with advertising, or sell user data.

The FAQ on voice chats on the OpenAI website states that audio clips from voice chats are not used to train models unless the user chooses to submit audio to "improve voice chats for all users."

"If you submit audio from voice chats to us, we will use it to train models," the same FAQ says. In addition, depending on the user's choice and subscription plan, the model may also be trained on transcribed chats.

In recent years, OpenAI has "to some extent" increased transparency in data collection and usage by giving users options to manage their privacy settings, says Rob Cobley, a commercial partner at the law firm Harper James, which provides legal support on data protection issues. "Users can access their personal information, update or delete it, which gives them control over this data."

The easiest way to maintain data privacy is to go to personal settings and disable data collection.

Angus Allan recommends that "almost everyone" take a few minutes to opt out of model training as soon as possible. "This will not remove your data from the company's platform, but it will no longer be used to train future models, which is where a leak may occur."

To opt out of model training, go to Settings, Data Controls, and disable Improve the model for everyone.

Another way to prevent data collection by OpenAI is to use only the temporary chat. Click on ChatGPT in the top left corner, then enable Temporary Chat at the bottom of the list.

However, disabling data collection limits functionality. The model will not remember anything from your previous chats, so its responses will be less accurate and less nuanced.

In the ChatGPT web interface, users can delete their chat history, add custom instructions to help maintain privacy, manage all shared links, request a data export, and delete their account. For additional security, you can also enable multi-factor authentication and log out of your account on all devices.

When working with ChatGPT, it is worth thinking more often about the security of your data in general. For example, when using custom GPTs, you may inadvertently give access to your confidential data.

You can also manage your data by being selective from the outset about what content you share with ChatGPT. According to experts, the difficulty lies in finding a compromise between privacy and ease of use. If you limit the data you share with ChatGPT, the experience will suffer: responses will become less relevant, accurate, and personalized, as the AI will have to rely on more limited and generalized algorithms.
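
One way to be selective about what you share is to scrub obvious identifiers from a prompt before pasting it into a chat. The Python sketch below is only an illustration of that idea, not a tool recommended by OpenAI or the experts quoted here. The function name redact and both patterns are hypothetical, and simple regular expressions like these will miss many forms of personal data (names, addresses, account numbers), so treat the result as a first pass rather than a guarantee.

```python
import re

# Hypothetical, deliberately simple patterns for illustration only.
# Real personal data takes many more forms than these two.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


prompt = "Reply to jane.doe@example.com or call +1 (555) 010-9999 about the invoice."
print(redact(prompt))
```

The placeholders keep the sentence readable for the model while leaving the actual contact details out of the conversation.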
