Contributing to the Epstein Estate Document Dataset
I welcome contributions that improve the accessibility and cleanliness of this dataset. However, due to the sensitive nature of the content, I have strict guidelines for pull requests.
What I Accept
- OCR Corrections: Fixes for character errors introduced by the Tesseract conversion (e.g., confusions among "1", "l", and "I"), provided each fix matches the original page image.
- Metadata Improvements: Adding structured data (dates, document types) to the CSV index.
- Formatting: Improving the readability of Markdown files without altering their semantic content.
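To help check an OCR correction before submitting it, the Python sketch below uses the standard library's `difflib.SequenceMatcher` to confirm that an edited line differs from the original only by one-for-one swaps among commonly confused characters. The helper name `is_confusable_only_fix` and the confusable set (`1`, `l`, `I`, `0`, `O`) are illustrative assumptions, not part of this dataset's tooling. A passing result does not replace checking the fix against the page image.

```python
import difflib

# Assumed set of characters that Tesseract commonly confuses (illustrative only).
CONFUSABLE = frozenset("1lI0O")


def is_confusable_only_fix(original: str, corrected: str) -> bool:
    """Return True if `corrected` differs from `original` only by one-for-one
    swaps between characters in CONFUSABLE (hypothetical helper)."""
    matcher = difflib.SequenceMatcher(None, original, corrected, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Insertions, deletions, and length-changing replacements are rejected.
        if tag != "replace" or (i2 - i1) != (j2 - j1):
            return False
        # Every swapped character, old and new, must be in the confusable set.
        for old_ch, new_ch in zip(original[i1:i2], corrected[j1:j2]):
            if old_ch not in CONFUSABLE or new_ch not in CONFUSABLE:
                return False
    return True


print(is_confusable_only_fix("Page l2 of 1O", "Page 12 of 10"))  # True
print(is_confusable_only_fix("Box 4 index", "Box 5 index"))  # False
```

The check is deliberately conservative: insertions, deletions, and changes in length are rejected, so a legitimate fix outside the assumed character set would still need manual review.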
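For metadata contributions, a quick validation pass over the CSV index can catch malformed dates or unexpected document types before a PR is opened. This is a minimal sketch that assumes hypothetical columns named `filename`, `date`, and `doc_type`, ISO 8601 dates, and a hypothetical list of allowed document types; the actual index may use different columns and vocabularies.

```python
import csv
import io
from datetime import date

# Hypothetical column names and document types; the real CSV index may differ.
REQUIRED_COLUMNS = ("filename", "date", "doc_type")
ALLOWED_DOC_TYPES = {"letter", "email", "court_filing", "photograph", "other"}


def validate_index(csv_text: str) -> list[str]:
    """Return a list of problems found in the metadata CSV (empty if none)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    for row in reader:
        # reader.line_num is the source line on which the current row ended.
        line_no = reader.line_num
        raw_date = (row["date"] or "").strip()
        if raw_date:  # an empty date is treated as "unknown" and allowed
            try:
                date.fromisoformat(raw_date)  # ISO 8601 date, e.g. 2002-03-15
            except ValueError:
                problems.append(f"line {line_no}: invalid date {raw_date!r}")
        if row["doc_type"] not in ALLOWED_DOC_TYPES:
            problems.append(f"line {line_no}: unknown doc_type {row['doc_type']!r}")
    return problems


# Placeholder rows for illustration only.
sample = (
    "filename,date,doc_type\n"
    "EXAMPLE-0001.txt,2002-03-15,letter\n"
    "EXAMPLE-0002.txt,03/15/2002,memo\n"
)
print(validate_index(sample))
```

Here the second data row is reported twice: once for the US-style date and once for the unrecognized `memo` type.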
What I Do Not Accept
- PII Restoration: Do not submit PRs that attempt to "fill in" redacted names or addresses.
- Speculative Annotations: Do not add commentary, theories, or external context directly into the document text files. Keep annotations in separate metadata fields.
- Fine-tuned Models: Do not upload LoRAs or model weights trained on this data.
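Because PRs must not restore redacted content, a contributor might compare the number of redaction markers before and after an edit. The sketch below assumes a hypothetical `[REDACTED]` marker in the text files; the real files may mark redactions differently, so the pattern would need adjusting.

```python
import re

# Hypothetical redaction marker; adjust to however the source text marks redactions.
REDACTION_MARKER = re.compile(r"\[REDACTED\]")


def redactions_preserved(original: str, edited: str) -> bool:
    """Return True if `edited` contains at least as many redaction markers as `original`."""
    before = len(REDACTION_MARKER.findall(original))
    after = len(REDACTION_MARKER.findall(edited))
    return after >= before


print(redactions_preserved("To: [REDACTED]", "To: [REDACTED]"))  # True
print(redactions_preserved("To: [REDACTED]", "To: A. Person"))  # False
```

A count comparison only catches markers that were removed outright; it will not notice a marker that was filled in and re-added elsewhere, so it supplements rather than replaces review.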
How to Submit
- Fork the repository.
- Make your changes to the text or CSV files.
- Submit a pull request with a clear description of the fix.
- In the PR description, reference the original filename and page number so the change can be verified.