---
license: mit
library_name: transformers
pipeline_tag: text-generation
tags:
- bilingual
- lora
- rl
- cost-efficient
- tiny-models
language:
- en
- es
---

<!-- center-aligned, capped at 420 px wide × 240 px tall -->
<p align="center">
  <img
    src="https://cdn-uploads.huggingface.co/production/uploads/657e1ad01e3e9c41a49b732e/8IsJaxuOwuqBN0GctRUUe.png"
    alt="Circe-1.5B schematic"
    width="420"
    height="240"
  />
</p>

**Circe-1.5B** is a single-checkpoint, 1.5B-parameter language model that asks a simple question:

> _“How far can you push tiny models on a tiny budget?”_

| ⚙️ Spec | Value |
|---------|-------|
| Base model | `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` |
| Trainable params | 4 M (LoRA) |
| Post-training cost | **≈ US $12** on 1×L40S |
| Training recipe | 8 h SFT → 4 h GRPO |
| Context length | up to **4 k tokens** (tested) |
| VRAM @ bf16 | ~9 GB (≤ 3 GB with 4-bit GPTQ) |
| Throughput | ~55 tok/s on 1×A6000 (fp16, no compile) |

It keeps DeepSeek-R1’s strong reasoning depth and adds **fluent bilingual chat** (English and Spanish) in a checkpoint that fits on a laptop GPU. We intend it as a reproducible waypoint on the road to real-time speech-to-speech reasoning systems.

---

# 🔭 Intended Use

* **Base for new LoRAs**: domain adaptation, longer-context studies.
* **Research** into cost-efficient RL for reasoning.
* **Not** for high-stakes or production tasks.

See the [⚙️ Limitations](#️-limitations--bias) section before use.

---

# ⚡ Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load in bf16; device_map="auto" places the model on an available GPU.
model = AutoModelForCausalLM.from_pretrained(
    "PaletLabs/Circe-1.5B", torch_dtype="bfloat16", device_map="auto"
)
tok = AutoTokenizer.from_pretrained("PaletLabs/Circe-1.5B")

prompt = "<|user|>¿Cómo se dice “tiny model” en español?<|assistant|>"
out = model.generate(
    **tok(prompt, return_tensors="pt").to(model.device), max_new_tokens=64
)
print(tok.decode(out[0], skip_special_tokens=True))
```
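
If the tokenizer ships a chat template (an assumption; check `tokenizer_config.json`), `apply_chat_template` builds the same prompt without hand-writing the special tokens:

```python
messages = [{"role": "user", "content": "¿Cómo se dice “tiny model” en español?"}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```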

---

# 🛠️ Installation

```bash
git clone https://github.com/palet-global/circe
cd circe
python -m venv venv && source venv/bin/activate
pip install .
```

## 🏗️ Re-Training Pipeline

### Data

```bash
python data/fetch_datasets.py --out data/processed
```

### Supervised LoRA

```bash
accelerate config default   # one-time setup
accelerate launch train/sft.py \
  --data_dir data/processed \
  --output_dir checkpoints/sft
```
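
For orientation, a minimal `peft` sketch of a LoRA setup in the ~4 M trainable-parameter range; the rank and target modules below are illustrative assumptions, and the authoritative values live in `train/sft.py`:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
# Assumption: rank-16 adapters on the attention projections land close to
# the ~4 M trainable parameters quoted in the spec table above.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # sanity-check the budget
```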

### RL (GRPO)

```bash
accelerate launch train/rl_grpo.py \
  --data_dir data/processed \
  --output_dir checkpoints/grpo \
  --init_ckpt checkpoints/sft/checkpoint-13000 \
  --num_steps 3000 --save_steps 500 --group 4
```
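
`--group 4` samples four completions per prompt; GRPO scores each completion against its own group instead of a learned critic. A minimal sketch of the standard group-relative advantage (not lifted from `train/rl_grpo.py`):

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size) scalar reward per sampled completion."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    # Normalizing within the group removes the need for a value network.
    return (rewards - mean) / (std + eps)

print(group_relative_advantages(torch.tensor([[0.0, 1.0, 1.0, 0.0]])))
```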

### Merge and Tokenizer

```bash
python train/merge_lora.py \
  --ckpt_dir checkpoints/grpo \
  --base deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
```
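
Under the hood this step amounts to folding the adapter weights back into the base model. A `peft` equivalent, assuming the checkpoint directory is a standard PEFT adapter:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
merged = PeftModel.from_pretrained(base, "checkpoints/grpo").merge_and_unload()
merged.save_pretrained("merged")
# Ship the base tokenizer alongside the merged weights.
AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B").save_pretrained("merged")
```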

### SQuAD Sanity Checks

```bash
python eval/quick_squad_eval.py --model ./merged --dataset squad
python eval/quick_squad_eval.py --model ./merged --dataset squad_es
```
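
To score predictions yourself, the `evaluate` library’s `squad` metric computes the usual EM/F1 pair (a toy example, independent of `quick_squad_eval.py`):

```python
import evaluate

squad_metric = evaluate.load("squad")
predictions = [{"id": "1", "prediction_text": "modelo pequeño"}]
references = [{"id": "1", "answers": {"text": ["modelo pequeño"], "answer_start": [0]}}]
print(squad_metric.compute(predictions=predictions, references=references))
# {'exact_match': 100.0, 'f1': 100.0}
```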

### Upload

```bash
python train/upload_to_hub.py \
  --model_dir merged \
  --repo PaletLabs/Circe-1.5B \
  --token $HF_TOKEN
```
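
The same push can be done directly with `huggingface_hub` if you prefer to skip the wrapper script:

```python
from huggingface_hub import HfApi

api = HfApi(token="hf_...")  # or authenticate once via `huggingface-cli login`
api.create_repo("PaletLabs/Circe-1.5B", exist_ok=True)
api.upload_folder(folder_path="merged", repo_id="PaletLabs/Circe-1.5B", repo_type="model")
```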

---

# 💻 Hardware & Inference Tips

- **bf16 / fp16**: needs ~9 GB VRAM.
- **4-bit**: < 3 GB; GPTQ checkpoints or `bitsandbytes` 4-bit loading work out of the box.
- Compile once (`torch.compile`) for **+10–15 %** throughput; see the sketch below.
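
A 4-bit loading sketch using `BitsAndBytesConfig` with NF4 quantization (treat the settings as reasonable defaults, not tuned values):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "PaletLabs/Circe-1.5B", quantization_config=bnb, device_map="auto"
)
tok = AutoTokenizer.from_pretrained("PaletLabs/Circe-1.5B")
# For full-precision (bf16/fp16) runs, `model = torch.compile(model)` after
# loading is what buys the quoted +10–15 % throughput.
```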

---

# ✍️ Current Evaluation Status

Formal **lighteval / MMLU / GSM-8K** runs are queued. Preliminary spot-checks suggest Circe retains DeepSeek-R1’s chain-of-thought depth on reasoning-heavy QA while adding smooth bilingual generation.

---

## ⚙️ Limitations & Bias

- No reward-model alignment.
- Long-context (> 4 k) stability is untested.
- Training data inherits the biases of public QA pairs; Spanish coverage favors Latin American variants.
- Safety filtering is minimal, so wrap the model with **your own guardrails** before any production use.

---

# 🔮 Roadmap

- Publish the full reasoning benchmark suite & eval scripts.
- Release code-reasoning and doc-QA adapters.
- Attach a **24 kHz neural codec** → real-time, full-duplex voice chat without ASR → TTS hops.

---

# 🪪 License

This project is licensed under the [MIT](https://opensource.org/licenses/MIT) License. Attribution is appreciated but not required.