---
title: SmolLM2 Customs ADI
emoji: 🤖
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: true
short_description: DEMO – Build your own free LLM service
---
# SmolLM2 Customs – Build Your Own LLM Service

> A showcase: how to build a free, private, OpenAI-compatible LLM service on HuggingFace Spaces and plug it into any hub or application – no GPU, no money, no drama.

> [!IMPORTANT]
> This project is under active development – always use the latest release from [Codey Lab](https://github.com/Codey-LAB/SmolLM2-customs) *(more stable builds land there first)*.
> This repo ([DEV-STATUS](https://github.com/VolkanSah/SmolLM2-ADI)) is where the chaos happens. 😬 A ⭐ on the repos would be cool 🙂
---
## What is this?
A minimal but production-ready LLM service built on:
- **SmolLM2-360M-Instruct** – 269 MB, Apache 2.0, runs on 2 CPUs for free
- **FastAPI** – OpenAI-compatible `/v1/chat/completions` endpoint
- **ADI** (Anti-Dump Index) – filters low-quality requests before they hit the model
- **HF Dataset** – logs every request for later analysis and finetuning

The point is not the model – the point is the pattern. Fork it, swap SmolLM2 for any model you want, and you have your own private LLM API running for free.
---
## How it works
```
Request
   ↓
ADI Score (is this request worth answering?)
   ↓
REJECT        → returns improvement suggestions, logs to dataset
MEDIUM/HIGH   → SmolLM2 answers, logs to dataset
SmolLM2 fails → returns 503 → hub fallback chain kicks in
```
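The flow above can be sketched in plain Python. Note that `adi_score`, the 0.3 threshold, and `smollm_generate` are illustrative stand-ins, not the actual functions from this repo's `adi.py` or `smollm.py`:

```python
# Minimal sketch of the request-routing logic shown in the diagram.
# `adi_score`, the threshold, and `smollm_generate` are hypothetical
# stand-ins, not this repo's real API.

def adi_score(prompt: str) -> float:
    """Toy quality score: longer, question-like prompts score higher."""
    score = min(len(prompt.split()) / 20, 1.0)
    if "?" in prompt or prompt.lower().startswith(("how", "why", "what")):
        score += 0.3
    return min(score, 1.0)

def smollm_generate(prompt: str) -> str:
    """Placeholder for the actual inference call."""
    return f"(model output for: {prompt[:30]}...)"

def route(prompt: str) -> dict:
    score = adi_score(prompt)
    if score < 0.3:                       # REJECT: not worth model time
        return {"decision": "REJECT",
                "suggestions": ["Add context", "Ask a concrete question"]}
    try:
        answer = smollm_generate(prompt)  # MEDIUM/HIGH: model answers
        return {"decision": "ANSWER", "content": answer}
    except RuntimeError:                  # model failure: 503, hub falls back
        return {"decision": "FALLBACK", "status": 503}
```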
---
## Endpoints
```
GET  /                     → status
GET  /v1/health            → health check
POST /v1/chat/completions  → OpenAI-compatible inference
```
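A quick smoke test of the chat endpoint from stdlib Python (the base URL is a placeholder for your own Space; the payload shape is the standard OpenAI chat-completions format):

```python
import json
import os
import urllib.request

# Placeholder – replace with your own Space's URL.
BASE_URL = "https://YOUR-USERNAME-smollm2-customs.hf.space"

def build_chat_request(prompt: str, model: str = "smollm2-360m") -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

def chat(prompt: str) -> str:
    headers = {"Content-Type": "application/json"}
    api_key = os.environ.get("SMOLLM_API_KEY")  # optional in demo mode
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```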
---
## Plug into any Hub (one config block)
Works out of the box with [Multi-LLM-API-Gateway](https://github.com/VolkanSah/Multi-LLM-API-Gateway). Hub screenshot for this Space: [SmolLM2](SmolLM2.jpg)
```ini
[LLM_PROVIDER.smollm]
active = "true"
base_url = "https://YOUR-USERNAME-smollm2-customs.hf.space/v1"
env_key = "SMOLLM_API_KEY"
default_model = "smollm2-360m"
models = "smollm2-360m, YOUR-USERNAME/your-finetuned-model"
fallback_to = "gemini"
[LLM_PROVIDER.smollm_END]
```
Any OpenAI-compatible client works the same way.
---
## Secrets (HF Space Settings)
| Secret | Required | Description |
|--------|----------|-------------|
| `SMOLLM_API_KEY` | recommended | Locks the endpoint – set the same value in your hub |
| `HF_TOKEN` or `TEST_TOKEN` | optional | HF auth for dataset + model repo access |
| `MODEL_REPO` | optional | Base model override (default: `HuggingFaceTB/SmolLM2-360M-Instruct`) |
| `DATASET_REPO` | optional | Your private HF dataset for logging |
| `PRIVATE_MODEL_REPO` | optional | Your private model repo for finetuned weights |
**Auth modes:**
```
SMOLLM_API_KEY not set → open access (demo/showcase mode)
SMOLLM_API_KEY set     → protected (production mode)
Space private          → double protection (HF gate + your key)
```
---
## ADI Routing
| Decision | Action |
|----------|--------|
| `HIGH_PRIORITY` | SmolLM2 handles it |
| `MEDIUM_PRIORITY` | SmolLM2 handles it |
| `REJECT` | Returns suggestions, logs to dataset |
| SmolLM2 fails | 503 → hub fallback chain |
---
## Training Utilities
Every request is logged to your private HF dataset. Use it to improve over time:
```bash
python train.py --mode export    # export dataset → JSONL
python train.py --mode validate  # validate ADI weights against labeled data
python train.py --mode finetune  # finetune SmolLM2 on your data (coming soon)
```
Once you have enough data → finetune → push to your private model repo → the Space loads it automatically on the next restart.
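Filtering an exported JSONL log into finetuning pairs might look like this – note the field names (`prompt`, `response`, `adi_decision`) are assumptions about the export format, not the repo's documented schema:

```python
import json

# Sketch: turn an exported JSONL request log into finetuning pairs.
# Field names are hypothetical, not the repo's documented export schema.

def load_training_pairs(path: str) -> list[dict]:
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            row = json.loads(line)
            if row.get("adi_decision") == "REJECT":
                continue  # rejected requests never reached the model
            pairs.append({"prompt": row["prompt"],
                          "completion": row["response"]})
    return pairs
```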
---
## Stack
| Component | What it does |
|-----------|-------------|
| `main.py` | FastAPI, auth, routing |
| `smollm.py` | Inference engine, lazy loading |
| `model.py` | HF token resolution, dataset + model repo access |
| `adi.py` | Request quality scoring |
| `train.py` | Dataset export, ADI validation, finetuning |
---
## Part of
- [Multi-LLM-API-Gateway](https://github.com/VolkanSah/Multi-LLM-API-Gateway) – the hub this was built for
- [Anti-Dump-Index](https://github.com/VolkanSah/Anti-Dump-Index) – the ADI algorithm idea
## License
Dual-licensed:
- [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)
- [Ethical Security Operations License v1.1 (ESOL)](ESOL) – mandatory, non-severable
By using this software you agree to all ethical constraints defined in ESOL v1.1.