---
title: Research Intelligence
emoji: "\U0001F4E1"
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
---
# Research Intelligence
A self-hosted research triage system that monitors academic papers (AI/ML and Security) and trending GitHub projects, scores them with AI, and learns your preferences over time.
> **This HuggingFace Space is a demo.** Data is ephemeral and resets when the container restarts. For production use, deploy locally with Docker Compose and persistent storage (see the instructions below).
## Features
- **Paper monitoring**: Fetches new papers from arXiv and HuggingFace daily/weekly
- **AI scoring**: Scores each paper on configurable axes (novelty, code availability, practical impact)
- **Preference learning**: Rate papers with thumbs up/down; the system learns what you care about and re-ranks accordingly
- **GitHub tracking**: Monitors trending repositories across curated collections
- **Event tracking**: Conference deadlines, releases, and RSS news feeds
- **Weekly reports**: Auto-generated markdown summaries of top papers
- **Dark-theme dashboard**: Fast, responsive web UI built with HTMX
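
To make the preference-learning idea concrete, here is a minimal sketch of vote-based re-ranking. It is an illustration only: this README does not describe the algorithm in `src/preferences.py`, and the function names, the per-category weights, the `boost` factor, and the sample data are all assumptions.

```python
from collections import defaultdict


def learn_category_weights(ratings):
    """Sum thumbs-up (+1) and thumbs-down (-1) votes per category.

    `ratings` is a list of (categories, vote) pairs. This data shape is
    hypothetical, not the app's actual storage format.
    """
    weights = defaultdict(float)
    for categories, vote in ratings:
        for category in categories:
            weights[category] += vote
    return dict(weights)


def rerank(papers, weights, boost=0.5):
    """Sort papers by base score plus a bonus from learned category weights."""
    def adjusted(paper):
        bonus = sum(weights.get(c, 0.0) for c in paper["categories"])
        return paper["score"] + boost * bonus

    return sorted(papers, key=adjusted, reverse=True)


# Hypothetical votes: two upvotes touching cs.CR, one downvote on cs.CV.
ratings = [
    (["cs.CR"], +1),
    (["cs.CR", "cs.LG"], +1),
    (["cs.CV"], -1),
]
weights = learn_category_weights(ratings)

papers = [
    {"title": "A", "score": 7.0, "categories": ["cs.CV"]},
    {"title": "B", "score": 6.5, "categories": ["cs.CR"]},
]
ranked = rerank(papers, weights)
print([p["title"] for p in ranked])
```

Paper B starts with the lower base score but moves ahead of A because its category was upvoted while A's was downvoted.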
## Deployment
### Docker Compose (recommended for production)
This is the intended deployment method. Your data persists across restarts via a local volume mount.
```bash
git clone https://github.com/yourname/researcher.git
cd researcher
cp .env.example .env
# Edit .env and add your Anthropic API key
docker compose up --build
```
Visit **http://localhost:9090**. The setup wizard will guide you through configuration.
> **Security note:** The app has no built-in authentication. Run it on a private network or behind a reverse proxy with auth. Do not expose it to the public internet.
### Local (without Docker)
```bash
git clone https://github.com/yourname/researcher.git
cd researcher
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your Anthropic API key
python -m uvicorn src.web.app:app --host 0.0.0.0 --port 8888
```
Visit **http://localhost:8888** and follow the setup wizard.
### HuggingFace Spaces (demo / preview only)
This repo can run on HuggingFace Spaces as a Docker Space for quick demos, but **data is ephemeral**: the database and config reset on every container restart (free Spaces sleep after 48h of inactivity).
To try it:
1. Duplicate this Space or create a new Docker Space pointing to this repo
2. In **Settings > Secrets**, add `ANTHROPIC_API_KEY`
3. The app starts automatically; follow the setup wizard
For anything beyond a quick test, use Docker Compose locally with persistent storage.
## Setup Wizard
On first launch (before `config.yaml` exists), you'll be guided through:
1. **API Key**: Enter your Anthropic API key (validated with a test call)
2. **Domains**: Enable/disable AI/ML and Security monitoring, adjust scoring weights
3. **GitHub**: Toggle GitHub project tracking
4. **Schedule**: Set pipeline frequency (daily, weekly, or manual-only)
After setup, you can optionally **pick seed papers** to bootstrap your preference profile.
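
The first-launch condition (no `config.yaml` yet) can be sketched as a simple file check. This is a hedged illustration, not the app's actual code: the helper name `needs_setup` is hypothetical, and the demo writes a placeholder file to a temporary directory to stand in for the wizard's output.

```python
import tempfile
from pathlib import Path


def needs_setup(config_path: Path) -> bool:
    """Return True when no config file exists yet, i.e. on first launch."""
    return not config_path.is_file()


# Demonstrate in a temporary directory so no real config is touched.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config.yaml"
    first_launch = needs_setup(config_path)    # no file yet
    config_path.write_text("domains: {}\n")    # stand-in for the wizard's output
    after_wizard = needs_setup(config_path)    # file now exists

print(first_launch, after_wizard)
```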
## Configuration
All settings live in `config.yaml` (generated by the setup wizard). You can also edit it directly:
```yaml
domains:
  aiml:
    enabled: true
    scoring_axes:
      - name: "Code & Weights"
        weight: 0.30
      - name: "Novelty"
        weight: 0.35
      - name: "Practical Applicability"
        weight: 0.35
  security:
    enabled: true
    scoring_axes:
      - name: "Has Code/PoC"
        weight: 0.25
      - name: "Novel Attack Surface"
        weight: 0.40
      - name: "Real-World Impact"
        weight: 0.35
schedule:
  cron: "0 22 * * 0"  # Weekly on Sunday at 22:00 UTC
```
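
One plausible way the axis weights could combine into a single score is a weighted sum, sketched below. This README does not state how `src/scoring.py` aggregates axes, so the weighted sum, the check that weights sum to 1, the 0-10 score range, and both function names are assumptions.

```python
import math

# Axis definitions copied from the aiml section of the example config.
AIML_AXES = [
    {"name": "Code & Weights", "weight": 0.30},
    {"name": "Novelty", "weight": 0.35},
    {"name": "Practical Applicability", "weight": 0.35},
]


def validate_weights(axes):
    """Raise ValueError unless the axis weights sum to 1 (an assumed rule)."""
    total = sum(axis["weight"] for axis in axes)
    if not math.isclose(total, 1.0, abs_tol=1e-6):
        raise ValueError(f"axis weights sum to {total}, expected 1.0")


def composite_score(axes, axis_scores):
    """Weighted sum of per-axis scores, keyed by axis name."""
    validate_weights(axes)
    return sum(axis["weight"] * axis_scores[axis["name"]] for axis in axes)


# Hypothetical per-axis scores on an assumed 0-10 scale.
example_scores = {
    "Code & Weights": 8,
    "Novelty": 6,
    "Practical Applicability": 7,
}
score = composite_score(AIML_AXES, example_scores)
print(round(score, 2))
```

Under these assumptions the example works out to 0.30 × 8 + 0.35 × 6 + 0.35 × 7 = 6.95. Keeping the weights normalized means the composite stays on the same scale as the individual axes.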
## Architecture
| Component | Technology |
|-----------|-----------|
| Web server | FastAPI + Jinja2 + HTMX |
| Database | SQLite |
| Scoring | Anthropic API |
| Scheduling | APScheduler |
| Container | Docker |
### Key Files
| File | Purpose |
|------|---------|
| `src/config.py` | YAML config loader with defaults |
| `src/db.py` | SQLite schema and queries |
| `src/scoring.py` | API batch scorer |
| `src/preferences.py` | Preference learning from user signals |
| `src/pipelines/aiml.py` | AI/ML paper fetcher (HF + arXiv) |
| `src/pipelines/security.py` | Security paper fetcher (arXiv cs.CR) |
| `src/pipelines/github.py` | GitHub trending projects |
| `src/pipelines/events.py` | Conferences, releases, RSS |
| `src/web/app.py` | Web routes and middleware |
| `src/scheduler.py` | Cron-based pipeline scheduler |
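
A config loader "with defaults" usually means user settings are layered over built-in values. The sketch below shows one common way to do that, a recursive dictionary merge. It is not taken from `src/config.py`: the `DEFAULTS` contents and the `merge_with_defaults` helper are hypothetical, and the YAML parsing step is left out so the example needs only the standard library.

```python
import copy

# Hypothetical built-in defaults; the real defaults in src/config.py may differ.
DEFAULTS = {
    "domains": {
        "aiml": {"enabled": True},
        "security": {"enabled": True},
    },
    "schedule": {"cron": "0 22 * * 0"},
}


def merge_with_defaults(defaults, overrides):
    """Return a new dict: defaults with overrides applied, merging nested dicts."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_with_defaults(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# A user config that only disables the security domain.
user_config = {"domains": {"security": {"enabled": False}}}
config = merge_with_defaults(DEFAULTS, user_config)
print(config["domains"]["security"]["enabled"], config["schedule"]["cron"])
```

With this approach a partial `config.yaml` stays valid: any key the user omits falls back to the default, and the defaults themselves are never mutated.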
## Running Pipelines Manually
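From the dashboard, click the pipeline buttons, or trigger a run via the API. The examples use the Docker Compose port (9090); substitute 8888 if you started the app locally with uvicorn: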
```bash
curl -X POST http://localhost:9090/run/aiml
curl -X POST http://localhost:9090/run/security
curl -X POST http://localhost:9090/run/github
curl -X POST http://localhost:9090/run/events
```
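
The same requests can be issued from Python with the standard library. This is a minimal sketch: the `/run/<pipeline>` paths and port come from the curl commands above, while the helper names, the timeout, and the assumption that the HTTP status code alone is useful are illustrative (the response body format is not documented here).

```python
import urllib.request

BASE_URL = "http://localhost:9090"  # use port 8888 for a local uvicorn run
PIPELINES = ("aiml", "security", "github", "events")


def build_run_request(pipeline, base_url=BASE_URL):
    """Build a bodiless POST request for one pipeline's /run endpoint."""
    if pipeline not in PIPELINES:
        raise ValueError(f"unknown pipeline: {pipeline!r}")
    return urllib.request.Request(f"{base_url}/run/{pipeline}", method="POST")


def run_pipeline(pipeline, base_url=BASE_URL, timeout=30):
    """Send the request and return the HTTP status code (requires a running app)."""
    request = build_run_request(pipeline, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Show the requests that would be sent, without touching the network.
for name in PIPELINES:
    request = build_run_request(name)
    print(request.get_method(), request.full_url)
```

With the app running, `run_pipeline("aiml")` sends the request for real. Because the app has no built-in authentication, these endpoints are open to anyone who can reach the port.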
## Requirements
- Python 3.12+
- Anthropic API key (for paper scoring)
- Optional: GitHub token (for higher API rate limits)
## License
MIT