Data Privacy in AI Research

Data privacy in AI research covers the security and confidentiality considerations that arise when using cloud-based AI tools with potentially sensitive research data.

Context & Background

Most AI tools send data to external servers for processing, raising significant concerns for researchers working with confidential data — including IRB-protected human subjects data, proprietary financial data from WRDS or similar providers, pre-publication research findings, and student records.

Key privacy considerations include:

  • Data transmission: What data leaves your machine when you use an AI tool?
  • Data retention: Do AI providers store your inputs for training or other purposes?
  • Compliance: Does AI tool use comply with IRB protocols, data use agreements, and institutional policies?
  • Local alternatives: When should you use local/on-premise models instead of cloud services?
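One way to reason about the first consideration, what leaves your machine, is to scan and redact obvious identifiers before any text is sent to an external service. The sketch below is a minimal illustration, not a compliance tool: the pattern names and regexes are hypothetical examples, and real research data will need patterns tailored to the identifiers it actually contains.

```python
import re

# Hypothetical example patterns -- adapt to the identifiers in your own data.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan_for_identifiers(text: str) -> dict:
    """Report any matches for known identifier patterns before text leaves the machine."""
    return {name: pat.findall(text)
            for name, pat in PATTERNS.items() if pat.findall(text)}

def redact(text: str) -> str:
    """Replace matched identifiers with a placeholder token."""
    for name, pat in PATTERNS.items():
        text = pat.sub(f"[{name.upper()} REDACTED]", text)
    return text
```

For example, `redact("Contact jane@uni.edu today")` yields `"Contact [EMAIL REDACTED] today"`. Regex-based redaction catches only well-structured identifiers; it is a pre-flight check, not a substitute for the institutional review discussed below.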

Key Perspectives

Several sources emphasize understanding a tool's privacy settings before using it with any research data. Claude Code, for example, offers zero-retention API options, while other tools may use inputs for model training by default.

Practical Implications

  • Read the terms of service: Understand what happens to data you send to AI providers
  • Use zero-retention modes: When available, enable settings that prevent data storage
  • Never send protected data: IRB-restricted data should generally not be processed by cloud AI
  • Consider local models: For sensitive data, use locally running models (e.g., Ollama, llama.cpp)
  • Check data use agreements: WRDS and similar providers may restrict AI processing of their data
  • Document your approach: Include AI data handling in your research data management plan
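The local-model option above can be sketched with Ollama, which serves models over a local HTTP API (by default at `localhost:11434`). This is a minimal sketch assuming a running Ollama server and an installed model; the model name `llama3` is an illustrative placeholder. Because the server runs on your own machine, the prompt never crosses the network boundary.

```python
import json
import urllib.request

# Ollama's default local endpoint; nothing here leaves your machine.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, model: str = "llama3") -> dict:
    """Build the JSON payload for Ollama's /api/generate endpoint.

    stream=False asks for a single complete response instead of chunks.
    """
    return {"model": model, "prompt": prompt, "stream": False}

def ask_local_model(prompt: str, model: str = "llama3") -> str:
    """POST the prompt to the local Ollama server and return the generated text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_request(prompt, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Even with a local model, the other implications still apply: document the setup in your data management plan, and confirm it satisfies any applicable data use agreement.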