Tracking Conversations: Measuring Content and Identity Exposure on AI Chatbots

Muhammad Jazlan  ·  Ethan Wang  ·  Yash Vekaria  ·  Zubair Shafiq

University of California, Davis

17/20 chatbots share data with ≥1 third party
47 unique third-party owners identified
3 chatbots expose plaintext prompts via session replay
178 (chatbot, third-party) pairs in normal sessions
0 content or identity leaks observed in private mode
Abstract

What We Studied & What We Found

AI chatbots have become a primary interface for seeking information online. As their popularity grows, providers increasingly deploy advertising and analytics infrastructure - raising questions about what happens to the sensitive conversations users type into these interfaces.

We present the first systematic measurement study of web tracking on 20 popular AI chatbots. Using controlled test accounts and a deliberately sensitive prompt ("pregnancy test near me"), we captured and analyzed HTTP network traffic to measure two categories of exposure: content (user prompts, prompt-derived titles, chat URLs, chat identifiers) and identity (names, email addresses, account identifiers, first-party cookies, IP addresses, User-Agent strings).

We find that 17 of 20 chatbots share information with at least one third party. Three services - Genspark, SeaArt, and ChatOn - transmit the full plaintext conversation text to Microsoft Clarity, a session replay service, during normal authenticated sessions. Microsoft's Copilot also embeds Clarity; there the conversation is tracked but not recorded in plaintext. Fifteen chatbots expose conversation URLs or identifiers to third-party advertising and analytics endpoints. Several expose user identity through support widgets, analytics tags, and error monitoring - including hashed email addresses, which are typically used for cross-site tracking and targeted advertising.

We also evaluate private and temporary chat modes, finding they dramatically reduce tracking: from 178 observed (chatbot, third-party) pairs in normal sessions to just 13 in private mode, with zero content or identity exposure detected across all 10 services tested. Finally, we analyze privacy policy disclosures and find notable gaps between what services actually send and what they disclose.

Findings

Exposure Tracking Matrix

Which chatbots expose which types of information, and to whom.

[Interactive matrix not reproduced. Legend: each cell is marked as third-party exposure, first-party only, both first and third party, or not observed (✗). Channels: U = URL, B = Body, H = Header, C = Cookie. ★ = supports private mode.]
Data Flows

Third-Party Data Flow Visualization

Flow diagram mapping which chatbots send data to which third-party services, in normal and in private sessions. The difference between the two modes is stark.

[Interactive diagram not reproduced. Node types: AI Chatbot; Advertising; Analytics / Session Replay; Other (Error Monitoring, etc.).]

Notable Exposures

Case Studies

Three categories of serious data exposure were discovered during the study.

Mitigation

Private Mode Effectiveness

For the 10 services offering a private or temporary chat mode, tracking collapses almost entirely. No content or identity exposure was detected in any private session.

Normal Chats: 178 (chatbot, third-party) pairs observed
17 of 20 chatbots share data. 47 unique third-party owners. Content and identity exposure detected across multiple services. Advertising, analytics, error monitoring, and session replay all fire.

Private / Temporary Chats: 13 (chatbot, third-party) pairs observed
Only 3 unique third-party owners remain (Datadog, Mapbox, Google). Zero identity or content exposure detected in any private session - across all 10 services that support this mode.

Private / Temporary Chat Support

Starred services were evaluated in both normal and private modes.

Transparency

Privacy Policy Analysis

All 20 privacy policies acknowledge general data practices, but most stop there. Only 8 name specific third-party recipients - and three services have a critical gap.

Duck.ai stands out: the only service studied with zero third-party data sharing. It actively strips user IP addresses from requests. OpenRouter and Claude provide the most comprehensive third-party disclosure among the services examined.
Critical policy gap - Genspark, SeaArt, ChatOn: these three services transmit plaintext conversation text to Microsoft Clarity (session replay), yet none of their privacy policies disclose Microsoft or Clarity as a data recipient.
[Per-chatbot table not reproduced. Columns: Chatbot; Names Specific Recipients; Notable Gap.]
Methods

Study Methodology

A controlled measurement study using fresh test accounts, a single sensitive prompt, and comprehensive traffic analysis with 12+ encoding variants to catch hashed identifiers.

01 Fresh Accounts: new account per service, controlled test identity - known email, name, credentials
02 Sensitive Prompt: "pregnancy test near me" - health topic with implicit location, high privacy stakes
03 Traffic Capture: full HTTP metadata, URLs, headers, request bodies, cookies via Chrome DevTools
04 Analysis Pipeline: preprocess → search target definition → matching → party attribution
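The matching step can be illustrated with a minimal sketch. It assumes the capture was exported from Chrome DevTools as a HAR file and that search targets are plain strings. The function name `find_exposures`, the target labels, and the example host are hypothetical; this is not the study's actual pipeline. Each hit is tagged with the channel letters used in the exposure matrix (U, B, H, C).

```python
from urllib.parse import unquote, urlsplit


def find_exposures(har, targets):
    """Return sorted (host, channel, target_label) hits in a HAR-style capture.

    `har` follows the HAR 1.2 layout (log -> entries -> request).
    `targets` maps a label (e.g. "prompt") to the string to search for.
    """
    hits = set()
    for entry in har["log"]["entries"]:
        req = entry["request"]
        host = urlsplit(req["url"]).hostname
        channels = {
            # U: request URL, percent-decoded so "%20" matches a space
            "U": unquote(req["url"]),
            # B: request body, if any
            "B": (req.get("postData") or {}).get("text") or "",
            # H: headers, excluding Cookie (counted separately under C)
            "H": "\n".join(
                f'{h["name"]}: {h["value"]}'
                for h in req.get("headers", [])
                if h["name"].lower() != "cookie"
            ),
            # C: cookies sent with the request
            "C": "; ".join(
                f'{c["name"]}={c["value"]}' for c in req.get("cookies", [])
            ),
        }
        for channel, text in channels.items():
            lowered = text.lower()
            for label, needle in targets.items():
                if needle.lower() in lowered:
                    hits.add((host, channel, label))
    return sorted(hits)


# Hypothetical capture: one request to a made-up tracker endpoint.
example_har = {"log": {"entries": [{"request": {
    "url": "https://tracker.example/collect?title=pregnancy%20test%20near%20me",
    "headers": [],
    "cookies": [],
    "postData": {"text": '{"email": "test.user@example.com"}'},
}}]}}
example_targets = {
    "prompt": "pregnancy test near me",
    "email": "test.user@example.com",
}
print(find_exposures(example_har, example_targets))
```

A real pipeline would also search encoded and hashed forms of each target, and would then attribute each host to a party.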

Chatbot Selection

20 popular AI chatbots spanning major US and international providers, covering closed-weight models (ChatGPT, Claude, Gemini) and open-weight deployments (DeepSeek, Qwen), and both consumer- and developer-oriented services.

Prompt Design

"pregnancy test near me" was chosen for combining a sensitive health topic with an implicit location - a high-stakes query category where privacy exposure carries real-world consequences.

Encoding & Hash Coverage

Identity strings searched across 12+ encoding and hash variants: base64, URL-encoding, hex, MD5, SHA-1/256/512, SHA3, RIPEMD-160, CRC-32, Adler-32 - to catch hashed email identifiers.
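As a rough illustration of how such variants could be generated, the sketch below derives the listed encodings and hashes from one identity string using only the Python standard library. The function name `identity_variants` is hypothetical, and SHA3-256 is assumed as the SHA3 variant. The study may also have applied normalization (such as lowercasing) or other variants not shown here.

```python
import base64
import hashlib
import zlib
from urllib.parse import quote


def identity_variants(value):
    """Map variant name -> encoded or hashed form of `value` (UTF-8)."""
    raw = value.encode("utf-8")
    variants = {
        "plain": value,
        "base64": base64.b64encode(raw).decode("ascii"),
        "url": quote(value, safe=""),
        "hex": raw.hex(),
        "md5": hashlib.md5(raw).hexdigest(),
        "sha1": hashlib.sha1(raw).hexdigest(),
        "sha256": hashlib.sha256(raw).hexdigest(),
        "sha512": hashlib.sha512(raw).hexdigest(),
        "sha3_256": hashlib.sha3_256(raw).hexdigest(),  # assumed SHA3 size
        "crc32": format(zlib.crc32(raw), "08x"),
        "adler32": format(zlib.adler32(raw), "08x"),
    }
    # RIPEMD-160 is only available if the local OpenSSL build provides it.
    try:
        variants["ripemd160"] = hashlib.new("ripemd160", raw).hexdigest()
    except ValueError:
        pass
    return variants


for name, encoded in identity_variants("test.user@example.com").items():
    print(f"{name:10s} {encoded}")
```

Each resulting string then becomes an additional search target, so that a hashed email in a request body is caught as an identity exposure even though the plaintext address never appears.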

Party Attribution

eTLD+1 matching classifies domains. Platform parties (e.g. Google for Gemini) are distinguished from independent third parties. Categories: Advertising, Analytics, Other.
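The attribution logic can be sketched as follows. This is a simplified illustration, not the study's implementation: the suffix set is a tiny stand-in for the full Public Suffix List, and the `OWNERS` and `CATEGORIES` tables are hypothetical examples of the domain-to-owner mapping that real attribution would need.

```python
from urllib.parse import urlsplit

# Stand-in for the Public Suffix List (assumption: real use needs the full list).
MULTI_LABEL_SUFFIXES = {"co.uk", "com.au", "co.jp"}

# Illustrative eTLD+1 -> owner and eTLD+1 -> category tables (hypothetical).
OWNERS = {
    "google.com": "Google",
    "google-analytics.com": "Google",
    "doubleclick.net": "Google",
    "clarity.ms": "Microsoft",
}
CATEGORIES = {
    "google-analytics.com": "Analytics",
    "doubleclick.net": "Advertising",
    "clarity.ms": "Analytics",
}


def etld_plus_one(host):
    """Return the registrable domain (eTLD+1) for a hostname."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def classify(page_url, request_url):
    """Return (party, category) for a request made from a chatbot page."""
    page_site = etld_plus_one(urlsplit(page_url).hostname)
    req_site = etld_plus_one(urlsplit(request_url).hostname)
    if page_site == req_site:
        return ("first-party", None)
    category = CATEGORIES.get(req_site, "Other")
    page_owner = OWNERS.get(page_site)
    # Same owner, different domain: a platform party rather than an
    # independent third party (e.g. Google services on Gemini).
    if page_owner is not None and page_owner == OWNERS.get(req_site):
        return ("platform", category)
    return ("third-party", category)


print(classify("https://gemini.google.com/app",
               "https://www.google-analytics.com/g/collect"))
print(classify("https://chat.example.com/c/1",
               "https://www.clarity.ms/collect"))
```

Comparing owners rather than domains is what keeps a provider's own infrastructure from being counted as an independent third party.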

Private Mode Evaluation

10 services support private or temporary chat. Each was evaluated identically to normal sessions and results compared to measure tracking reduction effectiveness.
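One simple way to express that comparison is as a relative reduction in observed (chatbot, third-party) pairs. The sketch below applies it to the totals reported on this page (178 pairs in normal sessions, 13 in private mode). The helper names and the placeholder pair sets are hypothetical, and the study may define its comparison differently.

```python
def reduction_pct(normal_count, private_count):
    """Percentage drop in observed pairs from normal to private mode."""
    if normal_count <= 0:
        raise ValueError("normal_count must be positive")
    return round((normal_count - private_count) / normal_count * 100, 1)


def compare_modes(normal_pairs, private_pairs):
    """Split (chatbot, third_party) pairs by how private mode affects them."""
    return {
        "removed": sorted(normal_pairs - private_pairs),
        "persisting": sorted(normal_pairs & private_pairs),
        "private_only": sorted(private_pairs - normal_pairs),
    }


# Totals reported in the study: 178 pairs (normal) vs 13 pairs (private).
print(reduction_pct(178, 13))

# Placeholder pairs, not study data: names are made up for illustration.
normal = {("chatbot-a", "tracker-x"), ("chatbot-a", "tracker-y")}
private = {("chatbot-a", "tracker-y")}
print(compare_modes(normal, private))
```

With the reported totals, the drop from 178 to 13 pairs corresponds to a reduction of roughly 92.7%.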

Scope & Limitations

Web interfaces only - excludes mobile apps, extensions, and embedded chatbot deployments. Single controlled prompt. Chrome baseline without tracking protection. Measurement-only; does not evaluate user consent flows.

Citation

Cite This Work

Research Team

Authors

Muhammad Jazlan, UC Davis, mjazlan@ucdavis.edu
Ethan Wang, UC Davis, ebwang@ucdavis.edu
Yash Vekaria, UC Davis, yvekaria@ucdavis.edu
Zubair Shafiq, UC Davis, zubair@ucdavis.edu