---
license: mit
library_name: transformers
pipeline_tag: text-generation
tags:
  - bilingual
  - lora
  - rl
  - cost-efficient
  - tiny-models
language:
  - en
  - es
---

# 🪐 Circe-1.5B

<!-- center-aligned, capped at 420 px wide × 240 px tall -->
<p align="center">
  <img
    src="https://cdn-uploads.huggingface.co/production/uploads/657e1ad01e3e9c41a49b732e/8IsJaxuOwuqBN0GctRUUe.png"
    alt="Circe-1.5B schematic"
    width="420"
    height="240"
  />
</p>


**Circe-1.5B** is a single-checkpoint, 1.5B-parameter language model that asks a simple question:  

> _“How far can you push tiny models on a tiny budget?”_

| ⚙️ Spec | Value |
|---------|-------|
| Base model | `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` |
| Trainable params | 4 M (LoRA) |
| Post-training cost | **≈ US $12** on 1×L40S |
| Training recipe | 8 h SFT → 4 h GRPO |
| Context length | up to **4 k tokens** (tested) |
| RAM @ bf16 | ~9 GB (≤ 3 GB 4-bit GPTQ) |
| Throughput | ~55 tok / s on 1×A6000 (fp16, no compile) |
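
The ~9 GB bf16 figure presumably includes runtime overhead (KV cache, activations, framework buffers); raw weight storage alone is much smaller, as a quick back-of-the-envelope check shows:

```python
# Back-of-the-envelope weight memory for a 1.5B-parameter checkpoint.
PARAMS = 1.5e9

def weight_gib(params, bytes_per_param):
    """Raw weight storage in GiB, ignoring KV cache and activations."""
    return params * bytes_per_param / 2**30

bf16_gib = weight_gib(PARAMS, 2)    # ~2.8 GiB of raw bf16 weights
int4_gib = weight_gib(PARAMS, 0.5)  # ~0.7 GiB at 4 bits per weight
```

The gap between raw weights and the quoted footprints is the per-request cost of generation state.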

It keeps DeepSeek-R1’s strong reasoning depth but adds **fluent bilingual chat** (English & Spanish) in a checkpoint that fits on a laptop GPU.  
We intend to use it as a reproducible waypoint on the road to real-time speech-to-speech reasoning systems.

---

# 🔭 Intended Use

* **Base for new LoRAs** — domain adaptation, longer-context studies.  
* **Research** into cost-efficient RL for reasoning.  
* **Not** for high-stakes or production tasks.

See the [⚙️ Limitations](#️-limitations--bias) section before use.

---

# ⚡ Quickstart

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "PaletLabs/Circe-1.5B", torch_dtype=torch.bfloat16, device_map="auto"
)
tok = AutoTokenizer.from_pretrained("PaletLabs/Circe-1.5B")

prompt = "<|user|>¿Cómo se dice “tiny model” en español?<|assistant|>"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```

---

# 🛠️ Installation
```bash
git clone https://github.com/palet-global/circe
cd circe
python -m venv venv && source venv/bin/activate
pip install .
```

## 🏗️ Re-Training Pipeline

### Data
```bash
python data/fetch_datasets.py --out data/processed
```

### Supervised LoRA
```bash
accelerate config default            # one-time
accelerate launch train/sft.py \
  --data_dir data/processed \
  --output_dir checkpoints/sft
```

### RL (GRPO)
```bash
accelerate launch train/rl_grpo.py \
  --data_dir data/processed \
  --output_dir checkpoints/grpo \
  --init_ckpt checkpoints/sft/checkpoint-13000 \
  --num_steps 3000 --save_steps 500 --group 4
```
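
`--group 4` sets the GRPO group size: each prompt is sampled several times and rewards are normalized within the group, so no value network is needed. A minimal sketch of that advantage computation (the actual reward function inside `train/rl_grpo.py` is not shown here):

```python
import statistics

def grpo_advantages(rewards, group_size=4, eps=1e-8):
    """Normalize each reward against its sampling group's mean and std."""
    advantages = []
    for i in range(0, len(rewards), group_size):
        group = rewards[i:i + group_size]
        mean = statistics.fmean(group)
        std = statistics.pstdev(group)
        advantages.extend((r - mean) / (std + eps) for r in group)
    return advantages

# Two correct and two incorrect completions in one group of 4:
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

Rewarded completions get positive advantages, the rest negative; the policy gradient pushes probability mass toward the former.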

### Merge and Tokenizer
```bash
python train/merge_lora.py \
  --ckpt_dir checkpoints/grpo \
  --base deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
```
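
`merge_lora.py` folds the low-rank adapter back into the base weights, `W' = W + (alpha / r) * B @ A`, so the merged checkpoint needs no `peft` at inference time. A pure-Python sketch on toy matrices (shapes and scaling follow the standard LoRA formulation, not the script itself):

```python
def merge_lora(W, A, B, alpha, r):
    """Fold a LoRA adapter into base weights: W' = W + (alpha / r) * B @ A.

    W: (m x n) base matrix, B: (m x r), A: (r x n), as nested lists.
    """
    scale = alpha / r
    m, n = len(W), len(W[0])
    return [
        [W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(r)) for j in range(n)]
        for i in range(m)
    ]

# Toy 2x2 example with a rank-1 adapter:
merged = merge_lora(
    W=[[1.0, 0.0], [0.0, 1.0]],
    A=[[1.0, 1.0]],    # r x n
    B=[[1.0], [0.0]],  # m x r
    alpha=2, r=1,
)
```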

### SQuAD Sanity Checks
```bash
python eval/quick_squad_eval.py --model ./merged --dataset squad
python eval/quick_squad_eval.py --model ./merged --dataset squad_es
```
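
`quick_squad_eval.py`'s internals aren't shown here, but the standard SQuAD metrics it presumably reports, exact match and token-level F1 over normalized answers, can be sketched as:

```python
import re
from collections import Counter

def normalize(text):
    """Lowercase, drop articles and punctuation (SQuAD-style normalization)."""
    text = re.sub(r"\b(a|an|the)\b", " ", text.lower())
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())

def exact_match(prediction, reference):
    return float(normalize(prediction) == normalize(reference))

def f1_score(prediction, reference):
    pred, ref = normalize(prediction).split(), normalize(reference).split()
    common = Counter(pred) & Counter(ref)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

F1 credits partial answers, which matters for a bilingual model where `squad_es` references can differ in article usage.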

### Upload
```bash
python train/upload_to_hub.py \
  --model_dir merged \
  --repo PaletLabs/Circe-1.5B \
  --token $HF_TOKEN
```

---

# 💻 Hardware & Inference Tips
- **bf16 / fp16**: Needs ~9 GB VRAM.  
- **4-bit**: < 3 GB VRAM. GPTQ checkpoints and on-the-fly `bitsandbytes` quantization both work out of the box.  
- Compile once (`torch.compile`) for **+10–15 %** throughput.
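
One way to reach the 4-bit footprint is on-the-fly `bitsandbytes` quantization via `transformers`' `BitsAndBytesConfig`. A minimal sketch; the NF4 quant type and bf16 compute dtype are our assumptions, not settings confirmed above:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed 4-bit settings: NF4 quantization with bf16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "PaletLabs/Circe-1.5B",
    quantization_config=bnb_config,
    device_map="auto",
)
```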

---
# ✍️ Current Evaluation Status
Formal **lighteval / MMLU / GSM8K** runs are queued. Preliminary spot-checks show Circe retains DeepSeek-R1’s chain-of-thought depth on reasoning-heavy QA while adding smooth bilingual generation.

---
## ⚙️ Limitations & Bias
- No reward-model alignment.  
- Long-context (> 4 k tokens) stability is untested.  
- Inherits biases from its public QA training data; Spanish coverage favors Latin American variants.  
- Minimal safety filtering: wrap the model with your own guardrails before any production use.

---
# 🔮 Roadmap
- Publish full reasoning benchmark suite & eval scripts.  
- Release code-reasoning and doc-QA adapters.  
- Attach a **24 kHz neural codec** → real-time, full-duplex voice chat without ASR → TTS hops.

---
# 🪪 License
This project is licensed under the [MIT](https://opensource.org/licenses/MIT) License. Attribution appreciated but not required.