---
title: Research Intelligence
emoji: "\U0001F4E1"
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
---

# Research Intelligence

A self-hosted research triage system that monitors academic papers (AI/ML and Security) and trending GitHub projects, scores them with AI, and learns your preferences over time.

> **This HuggingFace Space is a demo.** Data is ephemeral and resets when the container restarts. For production use, deploy locally with Docker Compose and persistent storage (see the instructions below).

## Features

- **Paper monitoring**: Fetches new papers from arXiv and HuggingFace daily/weekly
- **AI scoring**: Scores each paper on configurable axes (novelty, code availability, practical impact)
- **Preference learning**: Rate papers with thumbs up/down; the system learns what you care about and re-ranks accordingly
- **GitHub tracking**: Monitors trending repositories across curated collections
- **Event tracking**: Conference deadlines, releases, and RSS news feeds
- **Weekly reports**: Auto-generated markdown summaries of top papers
- **Dark-theme dashboard**: Fast, responsive web UI built with HTMX

## Deployment

### Docker Compose (recommended for production)

This is the intended deployment method. Your data persists across restarts via a local volume mount.

```bash
git clone https://github.com/yourname/researcher.git
cd researcher
cp .env.example .env
# Edit .env and add your Anthropic API key

docker compose up --build
```

Visit **http://localhost:9090**; the setup wizard will guide you through configuration.

> **Security note:** The app has no built-in authentication. Run it on a private network or behind a reverse proxy with auth. Do not expose it to the public internet.

### Local (without Docker)

```bash
git clone https://github.com/yourname/researcher.git
cd researcher
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your Anthropic API key

python -m uvicorn src.web.app:app --host 0.0.0.0 --port 8888
```

Visit **http://localhost:8888** and follow the setup wizard.

### HuggingFace Spaces (demo / preview only)

This repo can run on HuggingFace Spaces as a Docker Space for quick demos, but **data is ephemeral**: the database and config reset on every container restart (free Spaces sleep after 48h of inactivity).

To try it:

1. Duplicate this Space or create a new Docker Space pointing to this repo
2. In **Settings > Secrets**, add `ANTHROPIC_API_KEY`
3. The app starts automatically; follow the setup wizard

For anything beyond a quick test, use Docker Compose locally with persistent storage.

## Setup Wizard

On first launch (before `config.yaml` exists), you'll be guided through:

1. **API Key**: Enter your Anthropic API key (validated with a test call)
2. **Domains**: Enable/disable AI/ML and Security monitoring, adjust scoring weights
3. **GitHub**: Toggle GitHub project tracking
4. **Schedule**: Set pipeline frequency (daily, weekly, or manual-only)

After setup, you can optionally **pick seed papers** to bootstrap your preference profile.
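
The first-launch check described above can be sketched as follows. This is a minimal illustration, not the app's actual code: the helper name `needs_setup` is hypothetical, and it assumes the wizard is triggered purely by the absence of `config.yaml`.

```python
import tempfile
from pathlib import Path


def needs_setup(config_path: Path = Path("config.yaml")) -> bool:
    """Return True when no config file exists yet (hypothetical helper)."""
    return not config_path.is_file()


# Demonstrate both states in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "config.yaml"
    first_launch = needs_setup(cfg)   # no file yet, so the wizard would run
    cfg.write_text("domains: {}\n")   # simulate the wizard writing a config
    after_setup = needs_setup(cfg)    # file exists, so the wizard is skipped
```

Under this assumption, deleting `config.yaml` would send you back through the wizard on the next launch.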

## Configuration

All settings live in `config.yaml` (generated by the setup wizard). You can also edit it directly:

```yaml
domains:
  aiml:
    enabled: true
    scoring_axes:
      - name: "Code & Weights"
        weight: 0.30
      - name: "Novelty"
        weight: 0.35
      - name: "Practical Applicability"
        weight: 0.35
  security:
    enabled: true
    scoring_axes:
      - name: "Has Code/PoC"
        weight: 0.25
      - name: "Novel Attack Surface"
        weight: 0.40
      - name: "Real-World Impact"
        weight: 0.35

schedule:
  cron: "0 22 * * 0"  # Weekly on Sunday at 22:00 UTC
```
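
To illustrate how the axis weights could combine into a single ranking score, here is a minimal sketch. It assumes each axis is scored on a 0-10 scale and that the composite is a weighted average; the function name `composite_score`, the scale, and the example scores are assumptions for illustration and are not taken from `src/scoring.py`.

```python
from typing import Mapping


def composite_score(axis_scores: Mapping[str, float],
                    weights: Mapping[str, float]) -> float:
    """Weighted average of per-axis scores (hypothetical helper)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    missing = set(weights) - set(axis_scores)
    if missing:
        raise KeyError(f"missing scores for axes: {sorted(missing)}")
    # Divide by the total so the weights need not sum to exactly 1.0.
    weighted = sum(axis_scores[name] * w for name, w in weights.items())
    return weighted / total_weight


# Weights copied from the aiml section of the config above.
aiml_weights = {
    "Code & Weights": 0.30,
    "Novelty": 0.35,
    "Practical Applicability": 0.35,
}
# Made-up per-axis scores for a single paper.
example_scores = {
    "Code & Weights": 8.0,
    "Novelty": 6.0,
    "Practical Applicability": 4.0,
}
example_total = composite_score(example_scores, aiml_weights)
```

With these weights, raising a paper's "Novelty" score by one point moves the composite by 0.35, while the same change to "Code & Weights" moves it by 0.30.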

## Architecture

| Component | Technology |
|-----------|-----------|
| Web server | FastAPI + Jinja2 + HTMX |
| Database | SQLite |
| Scoring | Anthropic API |
| Scheduling | APScheduler |
| Container | Docker |

### Key Files

| File | Purpose |
|------|---------|
| `src/config.py` | YAML config loader with defaults |
| `src/db.py` | SQLite schema and queries |
| `src/scoring.py` | API batch scorer |
| `src/preferences.py` | Preference learning from user signals |
| `src/pipelines/aiml.py` | AI/ML paper fetcher (HF + arXiv) |
| `src/pipelines/security.py` | Security paper fetcher (arXiv cs.CR) |
| `src/pipelines/github.py` | GitHub trending projects |
| `src/pipelines/events.py` | Conferences, releases, RSS |
| `src/web/app.py` | Web routes and middleware |
| `src/scheduler.py` | Cron-based pipeline scheduler |

## Running Pipelines Manually

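From the dashboard, click the pipeline buttons, or trigger the pipelines via the API. The examples use port 9090 from the Docker Compose deployment; for a local run, use port 8888 instead: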

```bash
curl -X POST http://localhost:9090/run/aiml
curl -X POST http://localhost:9090/run/security
curl -X POST http://localhost:9090/run/github
curl -X POST http://localhost:9090/run/events
```
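
The same requests can be made from Python using only the standard library. This is a sketch under stated assumptions: the helper names `build_run_request` and `run_pipeline` are hypothetical, and because the response body format is not documented here, only the HTTP status code is returned.

```python
import urllib.request

BASE_URL = "http://localhost:9090"  # Docker Compose port; use 8888 for a local run
PIPELINES = ("aiml", "security", "github", "events")


def build_run_request(name: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for one of the /run/<pipeline> endpoints."""
    if name not in PIPELINES:
        raise ValueError(f"unknown pipeline: {name!r}")
    # An empty body plus method="POST" mirrors `curl -X POST`.
    return urllib.request.Request(f"{base_url}/run/{name}", data=b"", method="POST")


def run_pipeline(name: str, base_url: str = BASE_URL, timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status code."""
    request = build_run_request(name, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Against a running instance, `run_pipeline("aiml")` sends the same request as the first `curl` command above. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx/5xx responses and `urllib.error.URLError` if the server is unreachable.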

## Requirements

- Python 3.12+
- Anthropic API key (for paper scoring)
- Optional: GitHub token (for higher API rate limits)

## License

MIT