Can GPT-4o be trusted with confidential data?

18:53
25.09.2024
breakmirrors
282

We delve into Open AI's privacy policy and find out why experts have dubbed GPT-4o the "data turbo vacuum cleaner".

On May 13, OpenAI released a new AI model GPT-4o. It has incredible capabilities and is much more human-like: it can solve equations, tell bedtime stories, and, according to the company, can determine emotions by facial expression.

OpenAI emphasizes that it strives to make access to its tools free for everyone. However, experts say that along with the expansion of GPT-4o's capabilities, the amount of data the company can access has also increased. This creates privacy risks for users.

OpenAI's reputation in terms of user data protection is hard to call impeccable. After the launch of ChatGPT in 2020 and the publication of a technical paper, it was revealed that millions of pages of Reddit posts, books, and the web in general were scraped to create the generative text AI system, including personal data shared by users online. Because of this, ChatGPT was temporarily banned in Italy last year, drawing the attention of data protection authorities.

Shortly before the launch of GPT-4o, the company released a demo of the desktop application ChatGPT for macOS, from which it became clear that the chatbot would be able to access the user's screen. And in July, the same application was again criticized: it turned out that due to a security issue, it was easy to find saved chats on the computer and read them in unencrypted form.

OpenAI quickly released an update that encrypts chats, but given the current level of public attention to the company and GPT-4o, it is easy to understand why people are so concerned about privacy.

So how confidential is the new iteration of ChatGPT? Is it any worse in this regard than previous versions? And can the user limit its access to data?

OpenAI Privacy Policy

The OpenAI privacy policy clearly states that the model collects large amounts of data, including personal information, usage data, and content transmitted to it. ChatGPT will by default collect all this data to train its models unless you disable the corresponding setting in the parameters or switch to the corporate version of the solution.

OpenAI states in its privacy policy that users' personal data is "anonymized." But according to Angus Allan, in reality, the company is more likely to follow the principle of "collect everything we can get our hands on first, and then figure it out later." Angus works as a senior product manager at the consulting firm CreateFuture, which helps companies use AI and data analytics. "OpenAI's privacy policy clearly states that it collects everything the user inputs and reserves the right to train its models on this data."

According to Allan, the broad concept of "user content" likely includes images and voice data as well. "It's a real data turbo-vacuum, and the policy spells it all out very clearly. With the release of GPT-4o, the policy hasn't undergone significant changes, but given the model's expanded capabilities, the scope of what is considered 'user content' has greatly increased."

OpenAI's privacy policies state that ChatGPT does not have access to data on the user's device, except for what was entered in the chat. However, according to Jules Lava, by default, ChatGPT collects a lot of user data. Jules is the founder of Spark, a consulting firm that helps companies use AI tools in their workflows. "It uses everything from prompts and responses to email addresses, phone numbers, geolocation data, network activity, and device information."

Privacy Management

After being criticized and embroiled in privacy scandals following the launch of Chat GPT in 2020, OpenAI has implemented tools and management methods to protect user data. OpenAI claims that it "strives to protect people's privacy."

In particular, in the case of ChatGPT, OpenAI stated that it understands the reluctance of some users to share their information to improve models, so it provides them with ways to manage data. "ChatGPT Free and Plus users can easily manage whether their data contributes to model improvement in the settings," the company's website says. It also states that by default, training is not performed on API user data, ChatGPT Enterprise, and ChatGPT Team.

"We have provided ChatGPT users with various ways to manage privacy, including an easy way to opt out of training our AI models and a temporary chat mode that automatically deletes chats," OpenAI spokeswoman Taya Christianson told WIRED.

The company stated that it does not collect personal information to train its models, nor does it use public information from the internet to create profiles of people, targeted advertising, or to sell user data.

The FAQ on voice chats on the OpenAI website states that audio clips from voice chats are not used to train models unless the user chooses to submit audio to "improve voice chats for all users."

"If you submit audio from voice chats to us, we will use it to train models," the same FAQ says. In addition, depending on the user's choice and subscription plan, the model may also be trained on transcribed chats.

In recent years, OpenAI has "to some extent" increased transparency in data collection and usage by providing users with options to manage privacy settings. This is stated by Rob Cobley, a commercial partner at the law firm Harper James, which provides legal support on data protection issues. "Users can access their personal information, update or delete it, which gives them control over this data."

The easiest way to maintain data privacy is to go to personal settings and disable data collection.

Angus Allan recommends "almost everyone" to spend a few minutes as soon as possible opting out of model training. "This will not remove your data from the company's platform, but it will not be used to train future models, during which a leak may occur."

To opt out of model training, go to Settings, Data Controls, and disable Improve the model for everyone.