---
license: mit
library_name: transformers
pipeline_tag: text-generation
tags:
  - bilingual
  - lora
  - rl
  - cost-efficient
  - tiny-models
language:
  - en
  - es
---

# 🪐 Circe-1.5B

<!-- center-aligned, capped at 420 px wide × 240 px tall -->
<p align="center">
  <img
    src="https://cdn-uploads.huggingface.co/production/uploads/657e1ad01e3e9c41a49b732e/8IsJaxuOwuqBN0GctRUUe.png"
    alt="Circe-1.5B schematic"
    width="420"
    height="240"
  />
</p>


**Circe-1.5B** is a single-checkpoint, 1.5B-parameter language model that asks a simple question:  

> _“How far can you push tiny models on a tiny budget?”_

| ⚙️ Spec | Value |
|---------|-------|
| Base model | `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` |
| Trainable params | 4 M (LoRA) |
| Post-training cost | **≈ US $12** on 1×L40S |
| Training recipe | 8 h SFT → 4 h GRPO |
| Context length | up to **4 k tokens** (tested) |
| RAM @ bf16 | ~9 GB (≤ 3 GB 4-bit GPTQ) |
| Throughput | ~55 tok / s on 1×A6000 (fp16, no compile) |
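
The ~9 GB bf16 figure presumably includes runtime overhead (KV cache, activations, framework buffers); raw weight storage alone is much smaller, as a quick back-of-the-envelope check shows:

```python
# Back-of-the-envelope weight memory for a 1.5B-parameter checkpoint.
PARAMS = 1.5e9

def weight_gib(params, bytes_per_param):
    """Raw weight storage in GiB, ignoring KV cache and activations."""
    return params * bytes_per_param / 2**30

bf16_gib = weight_gib(PARAMS, 2)    # ~2.8 GiB of raw bf16 weights
int4_gib = weight_gib(PARAMS, 0.5)  # ~0.7 GiB at 4 bits per weight
```

The gap between raw weights and the quoted footprints is the per-request cost of generation state.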

It keeps DeepSeek-R1’s strong reasoning depth but adds **fluent bilingual chat** (English & Spanish) in a checkpoint that fits on a laptop GPU.  
We intend to use it as a reproducible waypoint on the road to real-time speech-to-speech reasoning systems.

---

# 🔭 Intended Use

* **Base for new LoRAs** — domain adaptation, longer-context studies.  
* **Research** into cost-efficient RL for reasoning.  
* **Not** for high-stakes or production tasks.

See the [⚙️ Limitations](#️-limitations--bias) section before use.

---

# ⚡ Quickstart

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "PaletLabs/Circe-1.5B", torch_dtype=torch.bfloat16, device_map="auto"
)
tok = AutoTokenizer.from_pretrained("PaletLabs/Circe-1.5B")

prompt = "<|user|>¿Cómo se dice “tiny model” en español?<|assistant|>"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```

---

# 🛠️ Installation
```bash
git clone https://github.com/palet-global/circe
cd circe
python -m venv venv && source venv/bin/activate
pip install .
```

## 🏗️ Re-Training Pipeline

### Data
```bash
python data/fetch_datasets.py --out data/processed
```

### Supervised LoRA
```bash
accelerate config default            # one-time
accelerate launch train/sft.py \
  --data_dir data/processed \
  --output_dir checkpoints/sft
```

### RL (GRPO)
```bash
accelerate launch train/rl_grpo.py \
  --data_dir data/processed \
  --output_dir checkpoints/grpo \
  --init_ckpt checkpoints/sft/checkpoint-13000 \
  --num_steps 3000 --save_steps 500 --group 4
```
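
`--group 4` sets the GRPO group size: each prompt is sampled several times and rewards are normalized within the group, so no value network is needed. A minimal sketch of that advantage computation (the actual reward function inside `train/rl_grpo.py` is not shown here):

```python
import statistics

def grpo_advantages(rewards, group_size=4, eps=1e-8):
    """Normalize each reward against its sampling group's mean and std."""
    advantages = []
    for i in range(0, len(rewards), group_size):
        group = rewards[i:i + group_size]
        mean = statistics.fmean(group)
        std = statistics.pstdev(group)
        advantages.extend((r - mean) / (std + eps) for r in group)
    return advantages

# Two correct and two incorrect completions in one group of 4:
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

Rewarded completions get positive advantages, the rest negative; the policy gradient pushes probability mass toward the former.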

### Merge and Tokenizer
```bash
python train/merge_lora.py \
  --ckpt_dir checkpoints/grpo \
  --base deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
```
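
`merge_lora.py` folds the low-rank adapter back into the base weights, `W' = W + (alpha / r) * B @ A`, so the merged checkpoint needs no `peft` at inference time. A pure-Python sketch on toy matrices (shapes and scaling follow the standard LoRA formulation, not the script itself):

```python
def merge_lora(W, A, B, alpha, r):
    """Fold a LoRA adapter into base weights: W' = W + (alpha / r) * B @ A.

    W: (m x n) base matrix, B: (m x r), A: (r x n), as nested lists.
    """
    scale = alpha / r
    m, n = len(W), len(W[0])
    return [
        [W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(r)) for j in range(n)]
        for i in range(m)
    ]

# Toy 2x2 example with a rank-1 adapter:
merged = merge_lora(
    W=[[1.0, 0.0], [0.0, 1.0]],
    A=[[1.0, 1.0]],    # r x n
    B=[[1.0], [0.0]],  # m x r
    alpha=2, r=1,
)
```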

### SQuAD Sanity Checks
```bash
python eval/quick_squad_eval.py --model ./merged --dataset squad
python eval/quick_squad_eval.py --model ./merged --dataset squad_es
```
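
`quick_squad_eval.py`'s internals aren't shown here, but the standard SQuAD metrics it presumably reports, exact match and token-level F1 over normalized answers, can be sketched as:

```python
import re
from collections import Counter

def normalize(text):
    """Lowercase, drop articles and punctuation (SQuAD-style normalization)."""
    text = re.sub(r"\b(a|an|the)\b", " ", text.lower())
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())

def exact_match(prediction, reference):
    return float(normalize(prediction) == normalize(reference))

def f1_score(prediction, reference):
    pred, ref = normalize(prediction).split(), normalize(reference).split()
    common = Counter(pred) & Counter(ref)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

F1 credits partial answers, which matters for a bilingual model where `squad_es` references can differ in article usage.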

### Upload
```bash
python train/upload_to_hub.py \
  --model_dir merged \
  --repo PaletLabs/Circe-1.5B \
  --token $HF_TOKEN
```

---

# 💻 Hardware & Inference Tips
- **bf16 / fp16**: Needs ~9 GB VRAM.  
- **4-bit**: < 3 GB VRAM. GPTQ checkpoints and on-the-fly `bitsandbytes` quantization both work out of the box.  
- Compile once (`torch.compile`) for **+10–15 %** throughput.
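
One way to reach the 4-bit footprint is on-the-fly `bitsandbytes` quantization via `transformers`' `BitsAndBytesConfig`. A minimal sketch; the NF4 quant type and bf16 compute dtype are our assumptions, not settings confirmed above:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed 4-bit settings: NF4 quantization with bf16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "PaletLabs/Circe-1.5B",
    quantization_config=bnb_config,
    device_map="auto",
)
```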

---
# ✍️ Current Evaluation Status
Formal **lighteval / MMLU / GSM8K** runs are queued. Preliminary spot-checks show Circe retains DeepSeek-R1’s chain-of-thought depth on reasoning-heavy QA while adding smooth bilingual generation.

---
## ⚙️ Limitations & Bias
- No reward-model alignment.  
- Long-context (> 4 k tokens) stability is untested.  
- Inherits biases from its public QA training data; Spanish coverage favors Latin American variants.  
- Minimal safety filtering: wrap the model with your own guardrails before any production use.

---
# 🔮 Roadmap
- Publish full reasoning benchmark suite & eval scripts.  
- Release code-reasoning and doc-QA adapters.  
- Attach a **24 kHz neural codec** → real-time, full-duplex voice chat without ASR → TTS hops.

---
# 🪪 License
This project is licensed under the [MIT](https://opensource.org/licenses/MIT) License. Attribution appreciated but not required.