# Contributing to the Epstein Estate Document Dataset
I welcome contributions that improve the accessibility and cleanliness of this dataset. However, due to the sensitive nature of the content, I have strict guidelines for pull requests.
## What I Accept
* **OCR Corrections:** Fixes for errors introduced by the Tesseract OCR conversion (e.g., confusions among `1`, `l`, and `I`), provided each fix matches the original source image.
* **Metadata Improvements:** Adding structured data (dates, document types) to the CSV index.
* **Formatting:** Improving the readability of markdown files without altering the semantic content.
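Before submitting OCR corrections, it can help to locate likely `1`/`l`/`I` confusions automatically and then confirm each one by eye. The Python sketch below is a hypothetical helper, not a tool shipped with this dataset; the function names, the confusable-character set, and the flagging rules are all assumptions. It deliberately over-flags (e.g. legitimate names such as "McIntyre") and will still miss some errors. It only lists candidates; every fix must still match the original image.

```python
import re
from collections.abc import Iterator
from typing import NamedTuple

# Glyphs Tesseract often swaps for one another (assumption: illustrative, not exhaustive).
CONFUSABLE = set("1lI|")
# Punctuation stripped from token edges before checking; '|' is deliberately kept.
EDGE_PUNCT = ".,;:!?()[]\"'"
# An uppercase 'I' right after a lowercase letter, e.g. "wiIl" (often a misread 'l').
MID_WORD_I = re.compile(r"[a-z]I")


class Candidate(NamedTuple):
    line_no: int  # 1-based line number in the text
    token: str    # suspicious token exactly as it appears in the OCR output


def is_suspicious(token: str) -> bool:
    """Return True if a token looks like a 1/l/I/| confusion worth checking by eye."""
    core = token.strip(EDGE_PUNCT)
    if not any(ch in CONFUSABLE for ch in core):
        return False
    has_letter = any(ch.isalpha() for ch in core)
    has_digit_or_pipe = any(ch.isdigit() or ch == "|" for ch in core)
    # Flag letters mixed with digits/pipes ("fi1ing", "2O1l", "|t") or a mid-word capital I.
    return (has_letter and has_digit_or_pipe) or bool(MID_WORD_I.search(core))


def find_candidates(text: str) -> Iterator[Candidate]:
    """Yield every whitespace-separated token in `text` that is_suspicious() flags."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for token in line.split():
            if is_suspicious(token):
                yield Candidate(line_no, token)


if __name__ == "__main__":
    # Hypothetical OCR output; pure numbers like "1999" are not flagged.
    sample = "The fi1ing was dated 1999.\nIt wiIl be reviewed."
    for cand in find_candidates(sample):
        print(f"{cand.line_no}: {cand.token}")
```

To check a real file, pass its contents (read as UTF-8) to `find_candidates` and compare each reported line against the corresponding page image before editing.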
## What I Do Not Accept
* **PII Restoration:** Do not submit PRs that attempt to "fill in" redacted names or addresses.
* **Speculative Annotations:** Do not add commentary, theories, or external context directly into the document text files. Keep annotations in separate metadata fields.
* **Fine-tuned Models:** Do not upload LoRAs or model weights trained on this data.
## How to Submit
1. Fork the repository.
2. Make your changes to the text or CSV files.
3. Submit a Pull Request with a clear description of the fix.
4. Reference the original filename/page number in your PR description for verification.