AI Tools Gather Your Data: How To Stay Informed

Like it or not, artificial intelligence has become part of daily life. Many devices - including electric razors and toothbrushes - have become "AI-powered," using machine learning algorithms to track how a person uses the device, how the device is working in real time, and provide feedback. From asking questions to an AI assistant like ChatGPT or Microsoft Copilot to monitoring a daily fitness routine with a smartwatch, many people use an AI system or tool every day.

Author

Christopher Ramezan
Assistant Professor of Cybersecurity, West Virginia University

While AI tools and technologies can make life easier, they also raise important questions about data privacy . These systems often collect large amounts of data, sometimes without people even realizing their data is being collected. The information can then be used to identify personal habits and preferences, and even predict future behaviors by drawing inferences from the aggregated data.

As an assistant professor of cybersecurity at West Virginia University, I study how emerging technologies and various types of AI systems manage personal data and how we can build more secure, privacy-preserving systems for the future.

Generative AI software uses large amounts of training data to create new content such as text or images. Predictive AI uses data to forecast outcomes based on past behavior, such as how likely you are to hit your daily step goal, or what movies you may want to watch. Both types can be used to gather information about you.

How AI tools collect data

Generative AI assistants such as ChatGPT and Google Gemini collect all the information users type into a chat box. Every question, response and prompt that users enter is recorded, stored and analyzed to improve the AI model.

OpenAI's privacy policy informs users that "we may use content you provide us to improve our Services, for example to train the models that power ChatGPT." Even though OpenAI allows you to opt out of content use for model training, it still collects and retains your personal data . Although some companies promise that they anonymize this data, meaning they store it without naming the person who provided it, there is always a risk of data being reidentified.

Predictive AI

Beyond generative AI assistants, social media platforms like Facebook, Instagram and TikTok continuously gather data on their users to train predictive AI models. Every post, photo, video, like, share and comment, including the amount of time people spend looking at each of these, is collected as data points that are used to build digital data profiles for each person who uses the service.

The profiles can be used to refine the social media platform's AI recommender systems . They can also be sold to data brokers, who sell a person's data to other companies to, for instance, help develop targeted advertisements that align with that person's interests.

Many social media companies also track users across websites and applications by putting cookies and embedded tracking pixels on their computers. Cookies are small files that store information about who you are and what you clicked on while browsing a website.

One of the most common uses of cookies is in digital shopping carts: When you place an item in your cart, leave the website and return later, the item will still be in your cart because the cookie stored that information. Tracking pixels are invisible images or snippets of code embedded in websites that notify companies of your activity when you visit their page. This helps them track your behavior across the internet.

This is why users often see or hear advertisements that are related to their browsing and shopping habits on many of the unrelated websites they browse, and even when they are using different devices, including computers, phones and smart speakers. One study found that some websites can store over 300 tracking cookies on your computer or mobile phone.

Data privacy controls - and limitations

Like generative AI platforms, social media platforms offer privacy settings and opt-outs, but these give people limited control over how their personal data is aggregated and monetized . As media theorist Douglas Rushkoff argued in 2011, if the service is free, you are the product.

Many tools that include AI don't require a person to take any direct action for the tool to collect data about that person. Smart devices such as home speakers, fitness trackers and watches continually gather information through biometric sensors, voice recognition and location tracking. Smart home speakers continually listen for the command to activate or " wake up " the device. As the device is listening for this word, it picks up all the conversations happening around it , even though it does not seem to be active.

Some companies claim that voice data is only stored when the wake word - what you say to wake up the device - is detected. However, people have raised concerns about accidental recordings, especially because these devices are often connected to cloud services , which allow voice data to be stored, synced and shared across multiple devices such as your phone, smart speaker and tablet.

If the company allows, it's also possible for this data to be accessed by third parties, such as advertisers, data analytics firms or a law enforcement agency with a warrant.

Privacy rollbacks

This potential for third-party access also applies to smartwatches and fitness trackers, which monitor health metrics and user activity patterns. Companies that produce wearable fitness devices are not considered "covered entities" and so are not bound by the Health Information Portability and Accountability Act . This means that they are legally allowed to sell health- and location-related data collected from their users.

Concerns about HIPAA data arose in 2018, when Strava, a fitness company released a global heat map of user's exercise routes. In doing so, it accidentally revealed sensitive military locations across the globe through highlighting the exercise routes of military personnel.

The Trump administration has tapped Palantir , a company that specializes in using AI for data analytics, to collate and analyze data about Americans. Meanwhile, Palantir has announced a partnership with a company that runs self-checkout systems .

Such partnerships can expand corporate and government reach into everyday consumer behavior. This one could be used to create detailed personal profiles on Americans by linking their consumer habits with other personal data. This raises concerns about increased surveillance and loss of anonymity. It could allow citizens to be tracked and analyzed across multiple aspects of their lives without their knowledge or consent.

Some smart device companies are also rolling back privacy protections instead of strengthening them. Amazon recently announced that starting on March 28, 2025, all voice recordings from Amazon Echo devices would be sent to Amazon's cloud by default, and users will no longer have the option to turn this function off. This is different from previous settings, which allowed users to limit private data collection.

Changes like these raise concerns about how much control consumers have over their own data when using smart devices. Many privacy experts consider cloud storage of voice recordings a form of data collection, especially when used to improve algorithms or build user profiles, which has implications for data privacy laws designed to protect online privacy.

Implications for data privacy

All of this brings up serious privacy concerns for people and governments on how AI tools collect, store, use and transmit data. The biggest concern is transparency. People don't know what data is being collected, how the data is being used, and who has access to that data.

Companies tend to use complicated privacy policies filled with technical jargon to make it difficult for people to understand the terms of a service that they agree to. People also tend not to read terms of service documents. One study found that people averaged 73 seconds reading a terms of service document that had an average read time of 29-32 minutes.

Data collected by AI tools may initially reside with a company that you trust, but can easily be sold and given to a company that you don't trust.

AI tools, the companies in charge of them and the companies that have access to the data they collect can also be subject to cyberattacks and data breaches that can reveal sensitive personal information. These attacks can by carried out by cybercriminals who are in it for the money, or by so-called advanced persistent threats , which are typically nation/state- sponsored attackers who gain access to networks and systems and remain there undetected, collecting information and personal data to eventually cause disruption or harm.

While laws and regulations such as the General Data Protection Regulation in the European Union and the California Consumer Privacy Act aim to safeguard user data, AI development and use have often outpaced the legislative process. The laws are still catching up on AI and data privacy . For now, you should assume any AI-powered device or platform is collecting data on your inputs, behaviors and patterns.

Using AI tools

Although AI tools collect people's data, and the way this accumulation of data affects people's data privacy is concerning, the tools can also be useful. AI-powered applications can streamline workflows, automate repetitive tasks and provide valuable insights.

But it's crucial to approach these tools with awareness and caution.

When using a generative AI platform that gives you answers to questions you type in a prompt, don't include any personally identifiable information , including names, birth dates, Social Security numbers or home addresses. At the workplace, don't include trade secrets or classified information. In general, don't put anything into a prompt that you wouldn't feel comfortable revealing to the public or seeing on a billboard. Remember, once you hit enter on the prompt, you've lost control of that information.

Remember that devices which are turned on are always listening - even if they're asleep. If you use smart home or embedded devices, turn them off when you need to have a private conversation. A device that's asleep looks inactive, but it is still powered on and listening for a wake word or signal. Unplugging a device or removing its batteries is a good way of making sure the device is truly off.

Finally, be aware of the terms of service and data collection policies of the devices and platforms that you are using. You might be surprised by what you've already agreed to.

This article is part of a series on data privacy that explores who collects your data, what and how they collect, who sells and buys your data, what they all do with it, and what you can do about it.

Previous articles in the series:

How illicit markets fueled by data breaches sell your personal information to criminals

Christopher Ramezan receives funding from the Appalachian Regional Commission.

/Courtesy of The Conversation. This material from the originating organization/author(s) might be of the point-in-time nature, and edited for clarity, style and length. Mirage.News does not take institutional positions or sides, and all views, positions, and conclusions expressed herein are solely those of the author(s).