---
title: SmolLM2 Customs ADI
emoji: 🤖
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: true
short_description: DEMO - Build your own free LLM service
---

# SmolLM2 Customs - Build Your Own LLM Service

> A showcase: how to build a free, private, OpenAI-compatible LLM service on HuggingFace Spaces and plug it into any hub or application - no GPU, no money, no drama.

> [!IMPORTANT]
> This project is under active development - always use the latest release from [Codey Lab](https://github.com/Codey-LAB/SmolLM2-customs) *(more stable builds land there first)*.
> This repo ([DEV-STATUS](https://github.com/VolkanSah/SmolLM2-ADI)) is where the chaos happens. 🔬 A ⭐ on the repos would be cool 😙

---

## What is this?

A minimal but production-ready LLM service built on:

- **SmolLM2-360M-Instruct** - 269 MB, Apache 2.0, runs on 2 CPUs for free
- **FastAPI** - OpenAI-compatible `/v1/chat/completions` endpoint
- **ADI** (Anti-Dump Index) - filters low-quality requests before they hit the model
- **HF Dataset** - logs every request for later analysis and finetuning

The point is not the model; the point is the pattern. Fork it, swap SmolLM2 for any model you want, and you have your own private LLM API running for free.

---

## How it works

```
Request
    ↓
ADI Score (is this request worth answering?)
    ↓
REJECT        → returns improvement suggestions, logs to dataset
MEDIUM/HIGH   → SmolLM2 answers, logs to dataset
SmolLM2 fails → returns 503 → hub fallback chain kicks in
```
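
A minimal, self-contained sketch of that flow (the scoring heuristic, the 0.3 cutoff, and the function names are illustrative placeholders, not the actual `adi.py`/`main.py` code):

```python
# Toy version of the ADI routing flow above -- heuristic and threshold
# are placeholders; the real scorer lives in adi.py.

def adi_score(text: str) -> float:
    """Toy quality score: longer, more specific requests score higher."""
    return min(len(text.split()) / 20, 1.0)

def run_smollm(text: str) -> str:
    return f"(model answer to: {text!r})"  # stand-in for the real inference call

def route(text: str) -> dict:
    if adi_score(text) < 0.3:  # REJECT: not worth inference
        return {"decision": "REJECT",
                "suggestions": ["Add context", "State the expected output"]}
    try:
        answer = run_smollm(text)
    except RuntimeError:
        return {"decision": "FALLBACK", "status": 503}  # hub fallback takes over
    return {"decision": "ANSWERED", "answer": answer}

print(route("hi"))  # -> REJECT with suggestions
print(route("Explain how the fallback chain works when SmolLM2 returns a 503."))
```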

---

## Endpoints

```
GET  /                       → status
GET  /v1/health              → health check
POST /v1/chat/completions    → OpenAI-compatible inference
```
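
Since the endpoint speaks the OpenAI protocol, the official `openai` Python client works directly (URL, key, and model name below are placeholders for your own Space):

```python
# Query the Space with the openai client; any OpenAI-compatible client works.
from openai import OpenAI

client = OpenAI(
    base_url="https://YOUR-USERNAME-smollm2-customs.hf.space/v1",
    api_key="YOUR-SMOLLM-API-KEY",  # whatever you set as SMOLLM_API_KEY
)

resp = client.chat.completions.create(
    model="smollm2-360m",
    messages=[{"role": "user", "content": "What does the ADI score measure?"}],
)
print(resp.choices[0].message.content)
```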

---

## Plug into any Hub (one config block)

Works out of the box with [Multi-LLM-API-Gateway](https://github.com/VolkanSah/Multi-LLM-API-Gateway) - see the [hub screenshot](SmolLM2.jpg) of this provider in action.

```ini
[LLM_PROVIDER.smollm]
active        = "true"
base_url      = "https://YOUR-USERNAME-smollm2-customs.hf.space/v1"
env_key       = "SMOLLM_API_KEY"
default_model = "smollm2-360m"
models        = "smollm2-360m, YOUR-USERNAME/your-finetuned-model"
fallback_to   = "gemini"
[LLM_PROVIDER.smollm_END]
```

Any OpenAI-compatible client works the same way.


---

## Secrets (HF Space Settings)

| Secret | Required | Description |
|--------|----------|-------------|
| `SMOLLM_API_KEY` | recommended | Locks the endpoint; set the same value in your hub |
| `HF_TOKEN` or `TEST_TOKEN` | optional | HF auth for dataset + model repo access |
| `MODEL_REPO` | optional | Base model override (default: `HuggingFaceTB/SmolLM2-360M-Instruct`) |
| `DATASET_REPO` | optional | Your private HF dataset for logging |
| `PRIVATE_MODEL_REPO` | optional | Your private model repo for finetuned weights |
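
The optional overrides might be resolved like this (illustrative sketch; the real resolution lives in `model.py`):

```python
# Illustrative secret resolution with fallbacks and defaults,
# mirroring the table above (not the actual model.py code).
import os

MODEL_REPO = os.getenv("MODEL_REPO", "HuggingFaceTB/SmolLM2-360M-Instruct")
HF_TOKEN = os.getenv("HF_TOKEN") or os.getenv("TEST_TOKEN")  # either secret works
DATASET_REPO = os.getenv("DATASET_REPO")  # optional: private dataset for logging
```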

**Auth modes:**
```
SMOLLM_API_KEY not set  → open access (demo/showcase mode)
SMOLLM_API_KEY set      → protected (production mode)
Space private           → double protection (HF gate + your key)
```
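
A sketch of what that gate can look like in FastAPI (not the actual `main.py` code; the bearer-header handling is an assumption):

```python
# Minimal FastAPI auth gate matching the modes above (sketch, not main.py).
import os
from fastapi import FastAPI, Header, HTTPException

app = FastAPI()

@app.post("/v1/chat/completions")
async def chat(authorization: str | None = Header(default=None)):
    expected = os.getenv("SMOLLM_API_KEY")
    if expected and authorization != f"Bearer {expected}":
        raise HTTPException(status_code=401, detail="Invalid API key")
    # open access when SMOLLM_API_KEY is unset (demo mode)
    return {"status": "would run inference here"}
```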

---

## ADI Routing

| Decision | Action |
|----------|--------|
| `HIGH_PRIORITY` | SmolLM2 handles it |
| `MEDIUM_PRIORITY` | SmolLM2 handles it |
| `REJECT` | Returns suggestions, logs to dataset |
| SmolLM2 fails | 503 → hub fallback chain |

---

## Training Utilities

Every request is logged to your private HF dataset. Use it to improve over time:

```bash
python train.py --mode export    # export dataset → JSONL
python train.py --mode validate  # validate ADI weights against labeled data
python train.py --mode finetune  # finetune SmolLM2 on your data (coming soon)
```

Once you have enough data → finetune → push to your private model repo → the Space loads it automatically on the next restart.
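
Pushing the finetuned weights could look like this (sketch using `huggingface_hub`; the repo id and local folder are placeholders matching `PRIVATE_MODEL_REPO`):

```python
# Upload finetuned weights to a private model repo (sketch; repo id and
# local folder are placeholders).
from huggingface_hub import HfApi

api = HfApi()  # picks up HF_TOKEN from the environment
api.create_repo("YOUR-USERNAME/your-finetuned-model", private=True, exist_ok=True)
api.upload_folder(
    folder_path="./finetuned-smollm2",             # local training output
    repo_id="YOUR-USERNAME/your-finetuned-model",  # matches PRIVATE_MODEL_REPO
)
```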

---

## Stack

| Component | What it does |
|-----------|-------------|
| `main.py` | FastAPI, auth, routing |
| `smollm.py` | Inference engine, lazy loading |
| `model.py` | HF token resolution, dataset + model repo access |
| `adi.py` | Request quality scoring |
| `train.py` | Dataset export, ADI validation, finetuning |

---

## Part of

- [Multi-LLM-API-Gateway](https://github.com/VolkanSah/Multi-LLM-API-Gateway) - the hub this was built for
- [Anti-Dump-Index](https://github.com/VolkanSah/Anti-Dump-Index) - the ADI algorithm idea


## License

Dual-licensed:

- [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)
- [Ethical Security Operations License v1.1 (ESOL)](ESOL) - mandatory, non-severable

By using this software you agree to all ethical constraints defined in ESOL v1.1.