---
license: mit
library_name: transformers
pipeline_tag: text-generation
tags:
- bilingual
- lora
- rl
- cost-efficient
- tiny-models
language:
- en
- es
---

# 🪐 Circe-1.5B
<!-- center-aligned, capped at 420 px wide × 240 px tall -->
<p align="center">
  <img
    src="https://cdn-uploads.huggingface.co/production/uploads/657e1ad01e3e9c41a49b732e/8IsJaxuOwuqBN0GctRUUe.png"
    alt="Circe-1.5B schematic"
    width="420"
    height="240"
  />
</p>

**Circe-1.5B** is a single-checkpoint, 1.5B-parameter language model that asks a simple question:

> _“How far can you push tiny models on a tiny budget?”_

| ⚙️ Spec | Value |
|---------|-------|
| Base model | `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` |
| Trainable params | 4 M (LoRA) |
| Post-training cost | **≈ US $12** on 1×L40S |
| Training recipe | 8 h SFT → 4 h GRPO |
| Context length | up to **4 k tokens** (tested) |
| VRAM @ bf16 | ~9 GB (≤ 3 GB 4-bit GPTQ) |
| Throughput | ~55 tok/s on 1×A6000 (fp16, no compile) |

It keeps DeepSeek-R1’s strong reasoning depth but adds **fluent bilingual chat** (English & Spanish) in a checkpoint that fits on a laptop GPU.
We intend to use it as a reproducible waypoint on the road to real-time speech-to-speech reasoning systems.

---
# 🔭 Intended Use

* **Base for new LoRAs** — domain adaptation, longer-context studies (see the sketch below).
* **Research** into cost-efficient RL for reasoning.
* **Not** for high-stakes or production tasks.

See the [⚙️ Limitations](#️-limitations--bias) section before use.

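
For the first bullet, a minimal PEFT sketch for starting a new LoRA on top of Circe is shown below. The rank, alpha, and target modules are illustrative assumptions, not the released training configuration; at this rank the trainable-parameter count lands in the same few-million range as the spec table above.

```python
# Hypothetical sketch: attach a fresh LoRA adapter to Circe-1.5B with PEFT.
# r / alpha / target_modules are assumptions, not the original recipe.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("PaletLabs/Circe-1.5B", torch_dtype=torch.bfloat16)

lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Qwen2-style attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # roughly a few million trainable parameters at r=16
```
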
---

# ⚡ Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("PaletLabs/Circe-1.5B", torch_dtype="bfloat16")
tok = AutoTokenizer.from_pretrained("PaletLabs/Circe-1.5B")

# Circe's chat format: a user turn followed by an assistant tag for the model to complete.
prompt = "<|user|>¿Cómo se dice “tiny model” en español?<|assistant|>"
out = model.generate(**tok(prompt, return_tensors="pt").to(model.device), max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```
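
If the bundled tokenizer ships a chat template (an assumption; check `tok.chat_template`), the prompt can also be built with `apply_chat_template` instead of writing the special tokens by hand. Continuing from the snippet above:

```python
# Hypothetical alternative: let the tokenizer's chat template build the prompt.
messages = [{"role": "user", "content": "¿Cómo se dice “tiny model” en español?"}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```
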
---

# 🛠️ Installation
```bash
git clone https://github.com/palet-global/circe
cd circe
python -m venv venv && source venv/bin/activate
pip install .
```

## 🏗️ Re-Training Pipeline

### Data
```bash
python data/fetch_datasets.py --out data/processed
```

### Supervised LoRA
```bash
accelerate config default   # one-time
accelerate launch train/sft.py \
  --data_dir data/processed \
  --output_dir checkpoints/sft
```
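
`train/sft.py` is the source of truth for this step. For orientation only, a generic LoRA SFT run with TRL might look like the sketch below; the use of TRL, the dataset path, and the hyperparameters are all assumptions.

```python
# Hypothetical sketch of a LoRA SFT run with TRL (not the repo's script).
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Assumed format: JSONL with a "text" field under data/processed/.
dataset = load_dataset("json", data_files="data/processed/sft.jsonl", split="train")

trainer = SFTTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    train_dataset=dataset,
    args=SFTConfig(output_dir="checkpoints/sft", per_device_train_batch_size=2),
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)
trainer.train()
```
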
### RL (GRPO)
```bash
accelerate launch train/rl_grpo.py \
  --data_dir data/processed \
  --output_dir checkpoints/grpo \
  --init_ckpt checkpoints/sft/checkpoint-13000 \
  --num_steps 3000 --save_steps 500 --group 4
```
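
GRPO samples a group of completions per prompt and scores each against the group average, so `--group 4` above sets that group size. For orientation, a generic GRPO run with TRL's `GRPOTrainer` might look like the sketch below; the use of TRL, the prompt set, and the toy reward are assumptions, and `train/rl_grpo.py` remains the source of truth.

```python
# Hypothetical GRPO sketch with TRL; the dataset and reward are illustrative placeholders.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

def concise_reward(completions, **kwargs):
    # Toy reward: prefer answers near ~200 characters (stand-in for a real task reward).
    return [-abs(len(c) - 200) / 200.0 for c in completions]

prompts = Dataset.from_dict({"prompt": ["Explain LoRA in one sentence."] * 64})

trainer = GRPOTrainer(
    model="checkpoints/sft/checkpoint-13000",  # the SFT checkpoint, as in --init_ckpt above
    reward_funcs=concise_reward,
    args=GRPOConfig(
        output_dir="checkpoints/grpo",
        num_generations=4,  # completions per prompt, mirroring --group 4
        max_steps=3000,
        save_steps=500,
    ),
    train_dataset=prompts,
)
trainer.train()
```
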
### Merge and Tokenizer
```bash
python train/merge_lora.py \
  --ckpt_dir checkpoints/grpo \
  --base deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
```
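
Merging a LoRA usually amounts to PEFT's `merge_and_unload` plus saving the tokenizer next to the merged weights, which is presumably what "and Tokenizer" in the heading refers to. A minimal sketch, with the details of `train/merge_lora.py` assumed rather than known:

```python
# Hypothetical sketch of the merge step (train/merge_lora.py is the source of truth).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", torch_dtype=torch.bfloat16
)
merged = PeftModel.from_pretrained(base, "checkpoints/grpo").merge_and_unload()

merged.save_pretrained("merged")
AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B").save_pretrained("merged")
```
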
### SQuAD Sanity Checks
```bash
python eval/quick_squad_eval.py --model ./merged --dataset squad
python eval/quick_squad_eval.py --model ./merged --dataset squad_es
```
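
The eval script's exact output is not shown here, but the standard SQuAD metrics it presumably reports (exact match and F1) can be reproduced with the `evaluate` library:

```python
# Minimal sketch: SQuAD exact-match / F1 scoring with the `evaluate` library.
import evaluate

squad_metric = evaluate.load("squad")
predictions = [{"id": "0", "prediction_text": "Madrid"}]
references = [{"id": "0", "answers": {"text": ["Madrid"], "answer_start": [0]}}]
print(squad_metric.compute(predictions=predictions, references=references))
# {'exact_match': 100.0, 'f1': 100.0}
```
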
### Upload
```bash
python train/upload_to_hub.py \
  --model_dir merged \
  --repo PaletLabs/Circe-1.5B \
  --token $HF_TOKEN
```
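
The same step can also be done directly with `huggingface_hub`; whether `train/upload_to_hub.py` works exactly this way is an assumption:

```python
# Hypothetical equivalent of the upload step using huggingface_hub directly.
from huggingface_hub import HfApi

api = HfApi(token="hf_...")  # or authenticate once via `huggingface-cli login`
api.create_repo("PaletLabs/Circe-1.5B", exist_ok=True)
api.upload_folder(folder_path="merged", repo_id="PaletLabs/Circe-1.5B", repo_type="model")
```
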
---

# 💻 Hardware & Inference Tips
- **bf16 / fp16**: Needs ~9 GB VRAM.
- **4-bit (GPTQ or `bitsandbytes`)**: < 3 GB; `bitsandbytes` loading works out-of-the-box.
- Compile once (`torch.compile`) for **+10–15 %** throughput.

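
As a concrete starting point for the second and third tips, here is a minimal sketch; it uses `bitsandbytes` NF4 loading rather than a prebuilt GPTQ checkpoint, and the compile call is optional:

```python
# Minimal sketch: 4-bit loading with bitsandbytes, plus an optional torch.compile.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_cfg = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "PaletLabs/Circe-1.5B", quantization_config=bnb_cfg, device_map="auto"
)
tok = AutoTokenizer.from_pretrained("PaletLabs/Circe-1.5B")

# Optional: compile the forward pass; the first call is slow, later calls are faster.
model.forward = torch.compile(model.forward)
```
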
---
# ✍️ Current Evaluation Status
Formal **lighteval / MMLU / GSM-8K** runs are queued. Preliminary spot-checks show Circe retains DeepSeek-R1’s chain-of-thought depth on reasoning-heavy QA while adding smooth bilingual generation.

---
# ⚙️ Limitations & Bias
- No reward-model alignment — outputs may be unsafe or hallucinate.
- Long-context (> 4 k) stability untested.
- Training data bias from public QA pairs; Spanish coverage favors Latin-American variants.
- Minimal safety filters — **you** must wrap with your own guardrails for production.

---
# 🔮 Roadmap
- Publish full reasoning benchmark suite & eval scripts.
- Release code-reasoning and doc-QA adapters.
- Attach a **24 kHz neural codec** → real-time, full-duplex voice chat without ASR → TTS hops.

---
# 🪪 License
This project is licensed under the [MIT](https://opensource.org/licenses/MIT) License. Attribution appreciated but not required.