# Contributing to the Epstein Estate Document Dataset

I welcome contributions that improve the accessibility and cleanliness of this dataset. However, due to the sensitive nature of the content, I have strict guidelines for pull requests.

## What I Accept

* **OCR Corrections:** Fixes for errors introduced by the Tesseract conversion (e.g., confusion between "1", "l", and "I"), provided they match the original image source.
* **Metadata Improvements:** Adding structured data (dates, document types) to the CSV index.
* **Formatting:** Improving the readability of markdown files without altering the semantic content.

## What I Do Not Accept

* **PII Restoration:** Do not submit PRs that attempt to "fill in" redacted names or addresses.
* **Speculative Annotations:** Do not add commentary, theories, or external context directly into the document text files. Keep annotations in separate metadata fields.
* **Fine-tuned Models:** Do not upload LoRAs or model weights trained on this data.

## How to Submit

1. Fork the repository.
2. Make your changes to the text or CSV files.
3. Submit a Pull Request with a clear description of the fix.
4. Reference the original filename/page number in your PR description for verification.
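
For the OCR corrections described under "What I Accept", a helper along these lines might make it easier to find candidates worth checking against the original image. This is only a sketch under stated assumptions: `flag_suspect_tokens` and `CONFUSABLE` are hypothetical names, not part of this repository, and the character set is illustrative. It lists tokens for manual review and never rewrites text, since every fix still has to match the image source.

```python
import re

# Characters Tesseract output commonly mixes up with one another.
# This set is an assumption for illustration; adjust it to what you
# actually see in the scans.
CONFUSABLE = set("1lI|0O")

TOKEN_RE = re.compile(r"\S+")


def flag_suspect_tokens(text):
    """Return (line_number, token) pairs worth checking against the image.

    A token is flagged when it contains at least one digit, at least one
    letter, and at least one character from CONFUSABLE. Nothing is
    rewritten; the result is only a review list.
    """
    suspects = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in TOKEN_RE.finditer(line):
            token = match.group()
            has_digit = any(ch.isdigit() for ch in token)
            has_alpha = any(ch.isalpha() for ch in token)
            has_confusable = any(ch in CONFUSABLE for ch in token)
            if has_digit and has_alpha and has_confusable:
                suspects.append((line_number, token))
    return suspects
```

Passing the contents of a text file to `flag_suspect_tokens` returns line numbers that can be quoted alongside the original filename/page number in the PR description. The heuristic is deliberately loose, so legitimate tokens such as "1st" will also be flagged; treat the output as a list of places to look, not a list of errors.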