---
title: EpsteinWithAnomScore
emoji: 👁
colorFrom: gray
colorTo: green
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit
---
# Epstein Corpus Explorer (Space + Dataset split)
This Space is a read-only browser for a large SQLite corpus plus optional signal cards.
- Space: `cjc0013/EpsteinWithAnomScore`
- Dataset: `cjc0013/EpsteinWithAnomScore`
## Links
- Space: https://huggingface.co/spaces/cjc0013/EpsteinWithAnomScore
- Dataset file (DB): https://huggingface.co/datasets/cjc0013/EpsteinWithAnomScore/blob/main/corpus.sqlite
## What this app does
- Opens `corpus.sqlite` in read-only mode
- FTS keyword search (`chunks_fts`)
- Cluster browsing across runs (`cluster_summary`)
- Open any `uid` and view its local context window (`order_index +/- k`)
- Optional Signals tab for method-sanitized signal cards (JSONL/CSV), then open linked chunks
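The features above can be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: the helper names are hypothetical, and it assumes `chunks_fts` is an FTS5 table exposing `uid` and `text`, and that a `chunks` table holds `uid`, `order_index`, and `text`. Only `chunks_fts`, `uid`, and `order_index` are named in this README; the rest is assumed.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path: str) -> sqlite3.Connection:
    """Open a SQLite file read-only so any write attempt fails."""
    # mode=ro is SQLite's URI flag for read-only access.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    conn.row_factory = sqlite3.Row
    return conn


def search_chunks(conn: sqlite3.Connection, query: str, limit: int = 20) -> list:
    """Keyword search against the FTS table (assumed columns: uid, text)."""
    sql = "SELECT uid, text FROM chunks_fts WHERE chunks_fts MATCH ? LIMIT ?"
    return conn.execute(sql, (query, limit)).fetchall()


def context_window(conn: sqlite3.Connection, uid: str, k: int = 2) -> list:
    """Return the chunk with this uid plus up to k neighbours on each side."""
    # Assumed table: chunks(uid, order_index, text).
    row = conn.execute(
        "SELECT order_index FROM chunks WHERE uid = ?", (uid,)
    ).fetchone()
    if row is None:
        return []
    center = row["order_index"]
    sql = (
        "SELECT uid, order_index, text FROM chunks "
        "WHERE order_index BETWEEN ? AND ? ORDER BY order_index"
    )
    return conn.execute(sql, (center - k, center + k)).fetchall()
```

Opening through a `mode=ro` URI means an accidental `INSERT` or `UPDATE` raises an error instead of touching the corpus, which matches the read-only intent described here.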
## Core principle
Raw data is not modified here.
This app is for indexing, browsing, and narrowing search space.
Signal/anomaly values are triage hints, not proof.
## How DB loading works
Priority order:
1. `CORPUS_SQLITE_PATH` (if set)
2. Local paths like `./data/corpus.sqlite`
3. Download from dataset repo using:
   - `DATASET_REPO_ID`
   - `DATASET_FILENAME` (default: `corpus.sqlite`)
Recommended Space variables:
- `DATASET_REPO_ID = cjc0013/EpsteinWithAnomScore`
- `DATASET_FILENAME = corpus.sqlite`
- `DB_LOCAL_DIR = ./data` (optional)
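The priority order above could be implemented along these lines. Treat it as a sketch under assumptions rather than the actual logic in `app.py`: the function name is hypothetical, a set-but-missing `CORPUS_SQLITE_PATH` falls through to the next step, and `DB_LOCAL_DIR` is used both as the local lookup directory and as the download target.

```python
import os
from pathlib import Path


def resolve_corpus_path() -> Path:
    """Resolve the corpus DB: explicit path, then local file, then download."""
    # 1. Explicit override via CORPUS_SQLITE_PATH.
    explicit = os.environ.get("CORPUS_SQLITE_PATH")
    if explicit and Path(explicit).is_file():
        return Path(explicit)

    filename = os.environ.get("DATASET_FILENAME", "corpus.sqlite")
    local_dir = Path(os.environ.get("DB_LOCAL_DIR", "./data"))

    # 2. Local copy, e.g. ./data/corpus.sqlite.
    candidate = local_dir / filename
    if candidate.is_file():
        return candidate

    # 3. Download from the dataset repo named in DATASET_REPO_ID.
    repo_id = os.environ.get("DATASET_REPO_ID")
    if not repo_id:
        raise FileNotFoundError(
            f"{filename} not found locally and DATASET_REPO_ID is not set"
        )
    # Imported lazily so the local-only paths work without the dependency.
    from huggingface_hub import hf_hub_download

    downloaded = hf_hub_download(
        repo_id=repo_id,
        filename=filename,
        repo_type="dataset",
        local_dir=str(local_dir),
    )
    return Path(downloaded)
```

With the recommended variables set, the first start downloads `corpus.sqlite` into `./data`, and later starts on the same storage find it at step 2 without another download.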
## Optional Signals file loading
If you publish a signals file in the dataset, the app can load it automatically.
Supported names:
- `public_method_sanitized_topN.jsonl`
- `public_top_signals.jsonl`
- CSV variants of the same names
Priority order:
1. `METHOD_SIGNALS_PATH` (if set)
2. Common local paths (`./data`, `./dataset`, `/data`)
3. Download from dataset repo with:
   - `METHOD_SIGNALS_DATASET_REPO_ID`
   - `METHOD_SIGNALS_FILENAME`
Recommended variables (if signals are in same dataset repo):
- `METHOD_SIGNALS_DATASET_REPO_ID = cjc0013/EpsteinWithAnomScore`
- `METHOD_SIGNALS_FILENAME = public_method_sanitized_topN.jsonl`
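For illustration, the local lookup and parsing of a signals file might look like the sketch below. The function names are hypothetical, the order in which file names are tried within a directory is an assumption, and the download step is omitted. No card fields are assumed: each JSONL line or CSV row is returned as a plain dict.

```python
import csv
import json
from pathlib import Path

# File names listed above, plus their CSV variants.
SIGNAL_NAMES = [
    "public_method_sanitized_topN.jsonl",
    "public_top_signals.jsonl",
    "public_method_sanitized_topN.csv",
    "public_top_signals.csv",
]


def find_local_signals(search_dirs=("./data", "./dataset", "/data")):
    """Return the first supported signals file found locally, or None."""
    for directory in search_dirs:
        for name in SIGNAL_NAMES:
            candidate = Path(directory) / name
            if candidate.is_file():
                return candidate
    return None


def load_signal_cards(path) -> list:
    """Parse a JSONL or CSV signals file into a list of dicts."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".jsonl":
        with path.open(encoding="utf-8") as fh:
            # One JSON object per non-empty line.
            return [json.loads(line) for line in fh if line.strip()]
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as fh:
            # CSV values arrive as strings; convert scores downstream.
            return list(csv.DictReader(fh))
    raise ValueError(f"Unsupported signals file type: {path.name}")
```

Returning `None` when nothing is found lets the app hide the Signals tab instead of failing, which fits the tab being optional.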
```txt
gradio>=4.0.0
huggingface_hub>=0.20.0
```