---
title: EpsteinWithAnomScore
emoji: ๐
colorFrom: gray
colorTo: green
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit
---

# Epstein Corpus Explorer (Space + Dataset split)

This Space is a read-only browser for a large SQLite corpus plus optional signal cards.

- Space: `cjc0013/EpsteinWithAnomScore`
- Dataset: `cjc0013/EpsteinWithAnomScore`

## Links

- Space: https://huggingface.co/spaces/cjc0013/EpsteinWithAnomScore
- Dataset file (DB): https://huggingface.co/datasets/cjc0013/EpsteinWithAnomScore/blob/main/corpus.sqlite

## What this app does

- Opens `corpus.sqlite` in read-only mode
- Runs FTS keyword search (`chunks_fts`)
- Browses clusters across runs (`cluster_summary`)
- Opens any `uid` and shows its local context window (`order_index +/- k`)
- Optionally shows a Signals tab for method-sanitized signal cards (JSONL/CSV), from which you can open the linked chunks
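
The read-only open, keyword search, and context-window lookup listed above could be sketched roughly as follows. This is not the code in `app.py`. The `chunks` table and its `uid`, `order_index`, and `text` columns are assumptions made for illustration; only `chunks_fts` is named in this README, and the real schema may scope `order_index` per document or per run.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open a SQLite file so that any write attempt fails."""
    # mode=ro is a standard SQLite URI parameter; uri=True enables URI parsing.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    conn.row_factory = sqlite3.Row
    return conn


def search_chunks(conn: sqlite3.Connection, query: str, limit: int = 20):
    """Keyword search against the FTS table named in this README."""
    # Assumption: chunks_fts is a regular FTS table that supports MATCH.
    sql = "SELECT rowid, * FROM chunks_fts WHERE chunks_fts MATCH ? LIMIT ?"
    return conn.execute(sql, (query, limit)).fetchall()


def context_window(conn: sqlite3.Connection, uid: str, k: int = 2):
    """Return the chunk with this uid plus up to k neighbours on each side."""
    # Assumption: a hypothetical chunks(uid, order_index, text) table.
    row = conn.execute(
        "SELECT order_index FROM chunks WHERE uid = ?", (uid,)
    ).fetchone()
    if row is None:
        return []
    center = row["order_index"]
    return conn.execute(
        "SELECT uid, order_index, text FROM chunks "
        "WHERE order_index BETWEEN ? AND ? ORDER BY order_index",
        (center - k, center + k),
    ).fetchall()
```

Opening with `mode=ro` enforces the "raw data is not modified" rule at the connection level rather than relying on the UI to avoid writes.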

## Core principle

Raw data is not modified here.
This app is for indexing, browsing, and narrowing the search space.
Signal/anomaly values are triage hints, not proof.

## How DB loading works

The app looks for the database in this priority order:

1. `CORPUS_SQLITE_PATH` (if set)
2. Local paths such as `./data/corpus.sqlite`
3. Download from the dataset repo using:
   - `DATASET_REPO_ID`
   - `DATASET_FILENAME` (default: `corpus.sqlite`)

Recommended Space variables:

- `DATASET_REPO_ID = cjc0013/EpsteinWithAnomScore`
- `DATASET_FILENAME = corpus.sqlite`
- `DB_LOCAL_DIR = ./data` (optional)
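
A minimal sketch of that priority order is shown below. It is not the actual `app.py` logic. The function names `resolve_db_path` and `hub_download` are hypothetical, and the sketch assumes that `DB_LOCAL_DIR` is the download target directory and that an explicitly set `CORPUS_SQLITE_PATH` is used as given. The only library call is `huggingface_hub.hf_hub_download`.

```python
import os
from pathlib import Path
from typing import Callable, Mapping, Optional, Sequence


def hub_download(repo_id: str, filename: str, local_dir: Optional[str]) -> str:
    """Fetch one file from a dataset repo on the Hugging Face Hub."""
    # Imported lazily so the resolver works without the package installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=repo_id,
        filename=filename,
        repo_type="dataset",
        local_dir=local_dir,
    )


def resolve_db_path(
    env: Mapping[str, str] = os.environ,
    local_candidates: Sequence[str] = ("./data/corpus.sqlite",),
    download: Optional[Callable[[str, str, Optional[str]], str]] = hub_download,
) -> Optional[str]:
    """Return a path to the corpus DB, or None if no source is configured."""
    # 1. An explicit path wins (assumption: it is not validated here).
    explicit = env.get("CORPUS_SQLITE_PATH")
    if explicit:
        return explicit
    # 2. Known local locations.
    for candidate in local_candidates:
        if Path(candidate).is_file():
            return candidate
    # 3. Fall back to downloading from the dataset repo.
    repo_id = env.get("DATASET_REPO_ID")
    if repo_id and download is not None:
        filename = env.get("DATASET_FILENAME", "corpus.sqlite")
        return download(repo_id, filename, env.get("DB_LOCAL_DIR"))
    return None
```

Passing `env` and `download` as parameters keeps the resolver testable without network access or real Space variables.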

## Optional Signals file loading

If you publish a signals file in the dataset, the app can load it automatically.

Supported names:

- `public_method_sanitized_topN.jsonl`
- `public_top_signals.jsonl`
- CSV variants of the same names

The app looks for the signals file in this priority order:

1. `METHOD_SIGNALS_PATH` (if set)
2. Common local paths (`./data`, `./dataset`, `/data`)
3. Download from the dataset repo with:
   - `METHOD_SIGNALS_DATASET_REPO_ID`
   - `METHOD_SIGNALS_FILENAME`

Recommended variables (if the signals file is in the same dataset repo):

- `METHOD_SIGNALS_DATASET_REPO_ID = cjc0013/EpsteinWithAnomScore`
- `METHOD_SIGNALS_FILENAME = public_method_sanitized_topN.jsonl`
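
Finding and parsing a local signals file might look like the sketch below. It makes several assumptions: that "CSV variants" means the same file stem with a `.csv` extension, that directories are tried before file names, and that each JSONL line is one JSON object. The function names are hypothetical, and the card fields are whatever the published file contains.

```python
import csv
import json
from pathlib import Path
from typing import Iterable, List, Optional

# File stems taken from the supported names listed in this README.
SIGNAL_STEMS = ("public_method_sanitized_topN", "public_top_signals")
# Common local directories listed in this README.
SEARCH_DIRS = ("./data", "./dataset", "/data")


def find_local_signals(search_dirs: Iterable[str] = SEARCH_DIRS) -> Optional[Path]:
    """Return the first supported signals file found locally, if any."""
    for directory in search_dirs:
        for stem in SIGNAL_STEMS:
            for ext in (".jsonl", ".csv"):
                candidate = Path(directory) / f"{stem}{ext}"
                if candidate.is_file():
                    return candidate
    return None


def load_signal_cards(path) -> List[dict]:
    """Parse a JSONL or CSV signals file into a list of dicts."""
    path = Path(path)
    if path.suffix.lower() == ".csv":
        # CSV values stay as strings; no type conversion is attempted.
        with path.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    cards = []
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                cards.append(json.loads(line))
    return cards
```

Since signal values are triage hints, a loader like this only reads and displays the cards; it does not filter or rescore them.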

```txt
gradio>=4.0.0
huggingface_hub>=0.20.0