Redact Documents Before Feeding Them to AI
Strip personal data locally, get AI analysis with aliases, then map answers back to real names
The Problem: AI Sees Everything You Paste
When you paste a contract into ChatGPT...
Every name, address, phone number, and financial detail is sent to OpenAI's servers. Turning off model training does not keep the text off those servers: conversations are still stored, and even temporary or deleted chats may be retained for up to 30 days for abuse monitoring. Enterprise accounts are still processed on OpenAI's shared infrastructure. Once the data leaves your device, you've lost control over it.
The Solution: Redact → AI → Reverse-Map
DocMask's approach: the AI never sees real names
Replace "John Smith" with "Person_A" before pasting. The AI analyzes the document using aliases. When it responds "Person_A should sign by Friday", you reverse-map locally: "John Smith should sign by Friday". The AI produced a useful answer without ever seeing real personal data.
Why Consistent Aliases Matter
Simple find-and-replace breaks when the same name appears in different contexts: miss one occurrence, or swap in a different placeholder, and the AI can no longer tell that the two passages refer to the same person. DocMask uses consistent pseudonymization: "John Smith" is always "Person_A" throughout the entire document, in headers, body text, footers, and across multiple pages. As a result, the AI understands that Person_A in paragraph 1 is the same entity as Person_A in paragraph 47.
- Names → Person_A, Person_B, Person_C...
- Emails → placeholder addresses, one consistent alias per real address
- Phone numbers → 000-000-0001, 000-000-0002...
- Addresses → 1 Placeholder St, 2 Placeholder St...
The mapping is encrypted with AES-256-GCM and stored only on your device. No cloud, no sync, no data broker.
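To show what "encrypted with AES-256-GCM" means for the mapping, here is a minimal sketch using the third-party `cryptography` package's `AESGCM` class. The `encrypt_mapping`/`decrypt_mapping` helpers and the blob layout (12-byte nonce followed by ciphertext and tag) are assumptions for illustration; DocMask's actual storage format and key management are not described here, and in practice the key itself must be stored or derived securely.

```python
import json
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def encrypt_mapping(mapping: dict[str, str], key: bytes) -> bytes:
    """Serialize the alias mapping and seal it with AES-256-GCM."""
    nonce = os.urandom(12)  # 96-bit nonce; must never repeat for the same key
    plaintext = json.dumps(mapping).encode("utf-8")
    # encrypt() returns ciphertext with the 16-byte authentication tag appended.
    return nonce + AESGCM(key).encrypt(nonce, plaintext, None)


def decrypt_mapping(blob: bytes, key: bytes) -> dict[str, str]:
    """Inverse of encrypt_mapping; raises InvalidTag if the blob was altered."""
    nonce, ciphertext = blob[:12], blob[12:]
    return json.loads(AESGCM(key).decrypt(nonce, ciphertext, None))


key = AESGCM.generate_key(bit_length=256)  # 32-byte key for AES-256
blob = encrypt_mapping({"John Smith": "Person_A"}, key)
print(decrypt_mapping(blob, key))  # {'John Smith': 'Person_A'}
```

GCM is authenticated encryption, so a tampered mapping file fails to decrypt instead of silently restoring the wrong names.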
Who Needs AI Document Redaction?
- Lawyers reviewing contracts with AI assistance: client names and case details must stay confidential.
- HR professionals analyzing employee records: laws such as GDPR (and HIPAA, where health information is involved) require protecting personal data.
- Financial analysts processing client reports: regulatory compliance demands data minimization.
- Researchers working with survey data or medical records: IRB protocols often require de-identification.
- Anyone who pastes work documents into ChatGPT and wonders, "Should I be doing this?"
Frequently Asked Questions
Is it safe to paste documents into ChatGPT?
By default, text pasted into ChatGPT may be used for model training unless you opt out (data sent through the API is not used for training by default). Even with training disabled, the text is processed on OpenAI's servers, and even temporary or deleted chats may be retained for up to 30 days for abuse monitoring. If the document contains names, addresses, or medical or financial data, redact the PII before pasting.
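One way to act on this advice is to scan a draft for obvious contact details before pasting it. The sketch below is a rough, hypothetical heuristic, not a DocMask feature: the `find_pii` helper and both regular expressions are assumptions, they will miss unusual formats and can flag non-PII number runs, and they cannot find names at all (that needs a named-entity recognizer).

```python
import re

# Deliberately simple patterns: they catch common formats, not every valid one.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def find_pii(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched text) pairs for anything that looks like contact data."""
    hits = []
    for kind, pattern in PII_PATTERNS.items():
        hits.extend((kind, m.group(0)) for m in pattern.finditer(text))
    return hits


draft = "Contact john.smith@example.com or call +1 (555) 010-0199 before Friday."
for kind, value in find_pii(draft):
    print(kind, value)
# email john.smith@example.com
# phone +1 (555) 010-0199
```

An empty result does not make a document safe to paste; it only means these two patterns found nothing.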
How do I protect personal data when using Claude?
Anthropic's Claude processes your input on Anthropic's servers. Whether conversations may be used for training depends on your account type and privacy settings, but either way the data leaves your device. Use DocMask to replace real names with aliases before pasting: Claude analyzes the redacted version, and you restore the real names locally.
What is the difference between redaction and anonymization?
Redaction removes or masks sensitive information (traditionally with permanent black bars). Anonymization transforms data so that individuals can no longer be re-identified. DocMask performs reversible pseudonymization: it replaces real values with consistent aliases that can be mapped back locally, which makes it well suited to AI workflows. Because the mapping allows re-identification, pseudonymized data is still treated as personal data under GDPR.
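The three approaches can be contrasted on one sentence. This is a simplified, hypothetical sketch: `redact_bars`, `anonymize` and `pseudonymize` are illustrative helpers, and real anonymization involves much more than replacing names (indirect identifiers such as dates, locations or job titles can still re-identify people).

```python
def redact_bars(text: str, values: list[str]) -> str:
    """Redaction: mask each value permanently; nothing can be recovered."""
    for v in values:
        text = text.replace(v, "█" * len(v))
    return text


def anonymize(text: str, values: list[str]) -> str:
    """Anonymization (simplified): one generic label for everyone, no mapping kept."""
    for v in values:
        text = text.replace(v, "[PERSON]")
    return text


def pseudonymize(text: str, values: list[str]) -> tuple[str, dict[str, str]]:
    """Pseudonymization: a distinct, consistent alias per value, plus a local mapping."""
    mapping: dict[str, str] = {}
    for v in values:
        if v not in mapping:
            # Single letters only, so this sketch handles at most 26 names.
            mapping[v] = f"Person_{chr(ord('A') + len(mapping))}"
        text = text.replace(v, mapping[v])
    return text, mapping


sentence = "John Smith pays Jane Doe. Jane Doe signs."
names = ["John Smith", "Jane Doe"]

print(redact_bars(sentence, names))  # ██████████ pays ████████. ████████ signs.
print(anonymize(sentence, names))    # [PERSON] pays [PERSON]. [PERSON] signs.
print(pseudonymize(sentence, names))
# ('Person_A pays Person_B. Person_B signs.', {'John Smith': 'Person_A', 'Jane Doe': 'Person_B'})
```

The anonymized sentence no longer says who pays whom, while the pseudonymized one keeps that relationship intact, and only the locally held mapping can turn the aliases back into names.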
Can AI still give useful answers with redacted documents?
Yes. DocMask uses consistent aliases (Person_A stays Person_A throughout the document), so the AI understands relationships and context. Structural analysis, summarization, and question-answering all work normally. You then reverse-map aliases to real names in the AI's response.
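When reverse-mapping the AI's response, one possible safeguard is to flag any alias the mapping does not contain, for example one the model invented or garbled. The sketch below is hypothetical: `restore_checked` and the `Person_[A-Z]+` pattern are assumptions that only cover name aliases, and the word boundaries keep "Person_A" from being matched inside "Person_AB".

```python
import re

# Matches whole name aliases such as Person_A or Person_AB (names only).
ALIAS_RE = re.compile(r"\bPerson_[A-Z]+\b")


def restore_checked(answer: str, mapping: dict[str, str]) -> tuple[str, set[str]]:
    """Swap known aliases back to real names and report aliases the mapping lacks."""
    reverse = {alias: real for real, alias in mapping.items()}
    unknown: set[str] = set()

    def swap(match: re.Match) -> str:
        alias = match.group(0)
        if alias in reverse:
            return reverse[alias]
        unknown.add(alias)  # not in the mapping, e.g. invented by the model
        return alias        # leave it visible for manual review

    return ALIAS_RE.sub(swap, answer), unknown


mapping = {"John Smith": "Person_A", "Jane Doe": "Person_B"}
text, unknown = restore_checked("Person_A should sign by Friday; Person_C countersigns.", mapping)
print(text)     # John Smith should sign by Friday; Person_C countersigns.
print(unknown)  # {'Person_C'}
```

An empty `unknown` set means every name alias in the answer was traced back to a real person in the local mapping.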