Data Privacy in AI Research

Data privacy in AI research covers the security and confidentiality considerations that arise when using cloud-based AI tools with potentially sensitive research data.

Context & Background

Most AI tools send data to external servers for processing, raising significant concerns for researchers working with confidential data — including IRB-protected human subjects data, proprietary financial data from WRDS or similar providers, pre-publication research findings, and student records.

Key privacy considerations include:

  • Data transmission: What data leaves your machine when you use an AI tool?
  • Data retention: Do AI providers store your inputs for training or other purposes?
  • Compliance: Does AI tool use comply with IRB protocols, data use agreements, and institutional policies?
  • Local alternatives: When should you use local/on-premise models instead of cloud services?
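One way to reason about the first consideration, what leaves your machine, is to scan and redact obvious identifiers before any text is sent to an external service. The sketch below is a minimal illustration, not a compliance tool: the pattern names and regexes are hypothetical examples, and real research data will need patterns tailored to the identifiers it actually contains.

```python
import re

# Hypothetical example patterns -- adapt to the identifiers in your own data.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan_for_identifiers(text: str) -> dict:
    """Report any matches for known identifier patterns before text leaves the machine."""
    return {name: pat.findall(text)
            for name, pat in PATTERNS.items() if pat.findall(text)}

def redact(text: str) -> str:
    """Replace matched identifiers with a placeholder token."""
    for name, pat in PATTERNS.items():
        text = pat.sub(f"[{name.upper()} REDACTED]", text)
    return text
```

For example, `redact("Contact jane@uni.edu today")` yields `"Contact [EMAIL REDACTED] today"`. Regex-based redaction catches only well-structured identifiers; it is a pre-flight check, not a substitute for the institutional review discussed below.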

Key Perspectives

Several sources emphasize understanding a tool's privacy settings before using it with any research data. Claude Code, for example, offers zero-retention API options, while other tools may use inputs for model training by default.

Practical Implications

  • Read the terms of service: Understand what happens to data you send to AI providers
  • Use zero-retention modes: When available, enable settings that prevent data storage
  • Never send protected data: IRB-restricted data should generally not be processed by cloud AI
  • Consider local models: For sensitive data, use locally running models (e.g., Ollama, llama.cpp)
  • Check data use agreements: WRDS and similar providers may restrict AI processing of their data
  • Document your approach: Include AI data handling in your research data management plan
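The local-model option above can be sketched with Ollama, which serves models over a local HTTP API (by default at `localhost:11434`). This is a minimal sketch assuming a running Ollama server and an installed model; the model name `llama3` is an illustrative placeholder. Because the server runs on your own machine, the prompt never crosses the network boundary.

```python
import json
import urllib.request

# Ollama's default local endpoint; nothing here leaves your machine.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, model: str = "llama3") -> dict:
    """Build the JSON payload for Ollama's /api/generate endpoint.

    stream=False asks for a single complete response instead of chunks.
    """
    return {"model": model, "prompt": prompt, "stream": False}

def ask_local_model(prompt: str, model: str = "llama3") -> str:
    """POST the prompt to the local Ollama server and return the generated text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_request(prompt, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Even with a local model, the other implications still apply: document the setup in your data management plan, and confirm it satisfies any applicable data use agreement.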