metadata
title: EpsteinWithAnomScore
emoji: 👁
colorFrom: gray
colorTo: green
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit
Epstein Corpus Explorer (Space + Dataset split)
This Space is a read-only browser for a large SQLite corpus plus optional signal cards.
- Space: cjc0013/EpsteinWithAnomScore
- Dataset: cjc0013/EpsteinWithAnomScore
Links
- Space: https://huggingface.co/spaces/cjc0013/EpsteinWithAnomScore
- Dataset file (DB): https://huggingface.co/datasets/cjc0013/EpsteinWithAnomScore/blob/main/corpus.sqlite
What this app does
- Opens corpus.sqlite in read-only mode
- FTS keyword search (chunks_fts)
- Cluster browsing across runs (cluster_summary)
- Open any uid and view local context window (order_index +/- k)
- Optional Signals tab for method-sanitized signal cards (JSONL/CSV), then open linked chunks
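The read-only open, keyword search, and context-window lookup listed above can be sketched as follows. This is a minimal illustration, not the code in app.py: the `chunks` table and the `uid`, `order_index`, and `text` columns are assumptions about the schema, and the real app may scope the context window per document or per run.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path):
    """Open an existing SQLite file read-only using a URI filename."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    conn.row_factory = sqlite3.Row  # allow access to columns by name
    return conn


def search_chunks(conn, query, limit=20):
    """Keyword search against chunks_fts (column names are assumed)."""
    sql = "SELECT uid, text FROM chunks_fts WHERE chunks_fts MATCH ? LIMIT ?"
    return conn.execute(sql, (query, limit)).fetchall()


def context_window(conn, uid, k=2):
    """Return rows whose order_index is within +/- k of the given uid."""
    row = conn.execute(
        "SELECT order_index FROM chunks WHERE uid = ?", (uid,)
    ).fetchone()
    if row is None:
        return []  # unknown uid: nothing to show
    center = row["order_index"]
    return conn.execute(
        "SELECT uid, order_index, text FROM chunks "
        "WHERE order_index BETWEEN ? AND ? ORDER BY order_index",
        (center - k, center + k),
    ).fetchall()
```

With `mode=ro`, any attempt to write through the connection raises `sqlite3.OperationalError`, which matches the intent that raw data is never modified by the browser.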
Core principle
Raw data is not modified here.
This app is for indexing, browsing, and narrowing search space.
Signal/anomaly values are triage hints, not proof.
How DB loading works
Priority order:
- CORPUS_SQLITE_PATH (if set)
- Local paths like ./data/corpus.sqlite
- Download from dataset repo using:
  - DATASET_REPO_ID
  - DATASET_FILENAME (default: corpus.sqlite)
Recommended Space variables:
- DATASET_REPO_ID = cjc0013/EpsteinWithAnomScore
- DATASET_FILENAME = corpus.sqlite
- DB_LOCAL_DIR = ./data (optional)
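One way to read the priority order above is as a three-step fallback. The sketch below is an assumption about how such a resolver might look, not the app's actual code: the function name `resolve_corpus_path` is hypothetical, the local candidate list is illustrative, and falling through when `CORPUS_SQLITE_PATH` points at a missing file is a choice made here. The download step uses `hf_hub_download` from `huggingface_hub` with `repo_type="dataset"`.

```python
import os
from pathlib import Path


def resolve_corpus_path(local_candidates=("./data/corpus.sqlite",)):
    """Return a path to the corpus DB, trying env var, local files, then download."""
    # 1. Explicit override via environment variable.
    explicit = os.environ.get("CORPUS_SQLITE_PATH")
    if explicit and Path(explicit).is_file():
        return explicit

    # 2. Known local locations (illustrative list).
    for candidate in local_candidates:
        if Path(candidate).is_file():
            return str(candidate)

    # 3. Download from the dataset repo, if one is configured.
    repo_id = os.environ.get("DATASET_REPO_ID")
    if not repo_id:
        raise FileNotFoundError("No local corpus.sqlite and DATASET_REPO_ID is not set")

    # Imported lazily so the local-only paths work without the package.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=repo_id,
        filename=os.environ.get("DATASET_FILENAME", "corpus.sqlite"),
        repo_type="dataset",
        local_dir=os.environ.get("DB_LOCAL_DIR", "./data"),
    )
```

Checking local files before downloading keeps restarts fast when the database is already on disk.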
Optional Signals file loading
If you publish a signals file in the dataset, the app can load it automatically.
Supported names:
- public_method_sanitized_topN.jsonl
- public_top_signals.jsonl
- CSV variants of the same names
Priority order:
- METHOD_SIGNALS_PATH (if set)
- Common local paths (./data, ./dataset, /data)
- Download from dataset repo with:
  - METHOD_SIGNALS_DATASET_REPO_ID
  - METHOD_SIGNALS_FILENAME
Recommended variables (if signals are in same dataset repo):
- METHOD_SIGNALS_DATASET_REPO_ID = cjc0013/EpsteinWithAnomScore
- METHOD_SIGNALS_FILENAME = public_method_sanitized_topN.jsonl
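As a rough illustration of the local step of signals loading, the sketch below looks for a supported file name in the common local directories and parses it into a list of dicts. It makes several assumptions: the helper names are hypothetical, "CSV variants" is taken to mean the same file stem with a `.csv` extension, the search order within a directory is arbitrary, and signal cards are treated as opaque records because their fields are not documented here.

```python
import csv
import json
from pathlib import Path

SIGNAL_STEMS = ("public_method_sanitized_topN", "public_top_signals")
SEARCH_DIRS = ("./data", "./dataset", "/data")


def find_local_signals(dirs=SEARCH_DIRS, stems=SIGNAL_STEMS):
    """Return the first existing signals file, or None if nothing is found."""
    for directory in dirs:
        for stem in stems:
            for ext in (".jsonl", ".csv"):
                candidate = Path(directory) / (stem + ext)
                if candidate.is_file():
                    return candidate
    return None


def load_signals(path):
    """Parse a JSONL or CSV signals file into a list of dicts."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".jsonl":
        with path.open(encoding="utf-8") as fh:
            # One JSON object per non-empty line.
            return [json.loads(line) for line in fh if line.strip()]
    if suffix == ".csv":
        with path.open(encoding="utf-8", newline="") as fh:
            # The header row supplies the keys; all values stay strings.
            return list(csv.DictReader(fh))
    raise ValueError(f"Unsupported signals format: {path.suffix}")
```

Note that CSV values arrive as strings, so any numeric score would need explicit conversion before sorting or thresholding.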
gradio>=4.0.0
huggingface_hub>=0.20.0