ErnestoOjeda committed on
Commit
e9fc8f7
·
verified ·
1 Parent(s): f075f6b

Update README.md

Files changed (1)
  1. README.md +151 -3
README.md CHANGED
@@ -1,3 +1,151 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ license: mit
3
+ library_name: transformers
4
+ pipeline_tag: text-generation
5
+ tags:
6
+ - bilingual
7
+ - lora
8
+ - rl
9
+ - cost-efficient
10
+ - tiny-models
11
+ language:
12
+ - en
13
+ - es
14
+ ---
15
+ # 🪐 Circe-1.5B
16
+
17
+ <!-- center-aligned, capped at 420 px wide × 240 px tall -->
18
+ <p align="center">
19
+ <img
20
+ src="https://cdn-uploads.huggingface.co/production/uploads/657e1ad01e3e9c41a49b732e/8IsJaxuOwuqBN0GctRUUe.png"
21
+ alt="Circe-1.5B schematic"
22
+ width="420"
23
+ height="240"
24
+ />
25
+ </p>
26
+
27
+
28
+ **Circe-1.5B** is a single-checkpoint, 1.5 B-parameter language model that asks a simple question:
29
+
30
+ > _“How far can you push tiny models on a tiny budget?”_
31
+
32
+ | ⚙️ Spec | Value |
33
+ |---------|-------|
34
+ | Base model | `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` |
35
+ | Trainable params | 4 M (LoRA) |
36
+ | Post-training cost | **≈ US $12** on 1×L40S |
37
+ | Training recipe | 8 h SFT → 4 h GRPO |
38
+ | Context length | up to **4 k tokens** (tested) |
39
+ | VRAM @ bf16 | ~9 GB (≤ 3 GB 4-bit GPTQ) |
40
+ | Throughput | ~55 tok / s on 1×A6000 (fp16, no compile) |
41
+
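The cost and size figures in the table can be cross-checked with a little arithmetic. The sketch below only re-derives numbers from the table; it assumes that both training phases ran on the single L40S and that the total parameter count is the nominal 1.5 B from the model name.

```python
# Figures copied from the spec table above.
post_training_cost_usd = 12.0   # "≈ US $12"
sft_hours, grpo_hours = 8, 4    # "8 h SFT → 4 h GRPO"
trainable_params = 4e6          # "4 M (LoRA)"
total_params = 1.5e9            # assumption: nominal size from the model name

# Assumption: SFT and GRPO both ran on the same single GPU.
gpu_hours = sft_hours + grpo_hours
implied_rate = post_training_cost_usd / gpu_hours
trainable_fraction = trainable_params / total_params

print(f"GPU-hours: {gpu_hours}")
print(f"Implied rate: ~${implied_rate:.2f} per GPU-hour")
print(f"Trainable share of parameters: ~{trainable_fraction:.2%}")
```

Under these assumptions the recipe works out to roughly one dollar per GPU-hour, with well under 1 % of the parameters being trained.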
42
+ It keeps DeepSeek-R1’s strong reasoning depth but adds **fluent bilingual chat** (English & Spanish) in a checkpoint that fits on a laptop GPU.
43
+ We intend to use it as a reproducible waypoint on the road to real-time speech-to-speech reasoning systems.
44
+
45
+ ---
46
+
47
+ # 🔭 Intended Use
48
+
49
+ * **Base for new LoRAs** — domain adaptation, longer-context studies.
50
+ * **Research** into cost-efficient RL for reasoning.
51
+ * **Not** for high-stakes or production tasks.
52
+
53
+ See the [⚙️ Limitations](#️-limitations--bias) section before use.
54
+
55
+ ---
56
+
57
+ # ⚡ Quickstart
58
+
59
+ ```python
60
+ from transformers import AutoModelForCausalLM, AutoTokenizer
61
+
62
+ model = AutoModelForCausalLM.from_pretrained("PaletLabs/Circe-1.5B", torch_dtype="bfloat16")
63
+ tok = AutoTokenizer.from_pretrained("PaletLabs/Circe-1.5B")
64
+
65
+ prompt = "<|user|>¿Cómo se dice “tiny model” en español?<|assistant|>"
66
+ out = model.generate(**tok(prompt, return_tensors="pt").to(model.device), max_new_tokens=64)
67
+ print(tok.decode(out[0], skip_special_tokens=True))
68
+ ```
69
+
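The literal `<|user|>` / `<|assistant|>` tags in the prompt above may not match the chat template shipped with the tokenizer. If the checkpoint defines a chat template (an assumption; this card does not say), letting the tokenizer format the conversation is the safer route. A minimal sketch, where `build_chat_prompt` is a hypothetical helper name and `tok` is the tokenizer loaded in the Quickstart:

```python
def build_chat_prompt(tok, user_text: str) -> str:
    """Format a single-turn conversation with the tokenizer's own chat template."""
    messages = [{"role": "user", "content": user_text}]
    # tokenize=False returns the formatted string instead of token ids;
    # add_generation_prompt=True appends whatever marker the template uses
    # to open the assistant turn.
    return tok.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
```

The returned string can replace the hand-written `prompt` in the Quickstart. If the template already inserts a beginning-of-sequence token, tokenizing with `add_special_tokens=False` avoids adding it twice.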
70
+ ---
71
+
72
+ # 🛠️ Installation
73
+ ```bash
74
+ git clone https://github.com/palet-global/circe
75
+ cd circe
76
+ python -m venv venv && source venv/bin/activate
77
+ pip install .
78
+ ```
79
+
80
+ ## 🏗️ Re-Training Pipeline
81
+
82
+ ### Data
83
+ ```bash
84
+ python data/fetch_datasets.py --out data/processed
85
+ ```
86
+
87
+ ### Supervised LoRA
88
+ ```bash
89
+ accelerate config default # one-time
90
+ accelerate launch train/sft.py \
91
+ --data_dir data/processed \
92
+ --output_dir checkpoints/sft
93
+ ```
94
+
95
+ ### RL (GRPO)
96
+ ```bash
97
+ accelerate launch train/rl_grpo.py \
98
+ --data_dir data/processed \
99
+ --output_dir checkpoints/grpo \
100
+ --init_ckpt checkpoints/sft/checkpoint-13000 \
101
+ --num_steps 3000 --save_steps 500 --group 4
102
+ ```
103
+
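For readers new to GRPO: the core idea is to sample several completions per prompt and score each one relative to its own group, which removes the need for a separate value model. The sketch below illustrates that normalization step only. It is not taken from `train/rl_grpo.py`, and it assumes `--group 4` means four completions per prompt.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    # Population standard deviation; some implementations use the sample
    # standard deviation instead.
    sigma = pstdev(rewards)
    # eps keeps the division finite when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, four sampled completions with hypothetical rewards.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that beat the group average receive a positive advantage and the rest a negative one. When all rewards in a group are equal, every advantage is zero and that prompt contributes no gradient signal.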
104
+ ### Merge and Tokenizer
105
+ ```bash
106
+ python train/merge_lora.py \
107
+ --ckpt_dir checkpoints/grpo \
108
+ --base deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
109
+ ```
110
+
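Merging folds the low-rank update into the base weights so that a single checkpoint can be shipped. In the standard LoRA formulation the merged weight is `W + (alpha / r) * B @ A`. The toy example below shows that arithmetic on a 2×2 matrix; it is an illustration under that standard formulation, not the contents of `train/merge_lora.py`, and all values are made up.

```python
def matmul(x, y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A) for one weight matrix."""
    scale = alpha / r
    delta = matmul(B, A)  # (d_out x r) @ (r x d_in) -> d_out x d_in
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]


W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight, 2 x 2
B = [[1.0], [2.0]]            # LoRA "up" matrix, 2 x 1
A = [[0.5, -0.5]]             # LoRA "down" matrix, 1 x 2
merged = merge_lora(W, A, B, alpha=2, r=1)
print(merged)
```

After merging, inference needs no adapter code and costs the same as the base model, since the update has been absorbed into `W`.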
111
+ ### SQuAD Sanity Checks
112
+ ```bash
113
+ python eval/quick_squad_eval.py --model ./merged --dataset squad
114
+ python eval/quick_squad_eval.py --model ./merged --dataset squad_es
115
+ ```
116
+
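SQuAD-style checks are usually reported as exact match (EM) and token-level F1 after light answer normalization. The sketch below follows the common English SQuAD conventions. Whether `eval/quick_squad_eval.py` computes the same thing is an assumption, and the article list would need adapting for `squad_es`.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop ASCII punctuation and English articles, squeeze spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(pred, gold):
    return float(normalize(pred) == normalize(gold))


def token_f1(pred, gold):
    p, g = normalize(pred).split(), normalize(gold).split()
    if not p or not g:
        # Both empty counts as a match; one empty counts as a miss.
        return float(p == g)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower.", "eiffel tower"))
print(token_f1("Eiffel Tower in Paris", "Eiffel Tower"))
```

Note that `string.punctuation` covers ASCII only, so Spanish marks such as `¿` and `¡` would survive normalization unless they are stripped explicitly.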
117
+ ### Upload
118
+ ```bash
119
+ python train/upload_to_hub.py \
120
+ --model_dir merged \
121
+ --repo PaletLabs/Circe-1.5B \
122
+ --token "$HF_TOKEN"
123
+ ```
124
+
125
+ ---
126
+
127
+ # 💻 Hardware & Inference Tips
128
+ - **bf16 / fp16**: Needs ~9 GB VRAM.
129
+ - **4-bit**: < 3 GB, either as a GPTQ export or quantized at load time with `bitsandbytes`, which works out of the box.
130
+ - Compile once (`torch.compile`) for **+10–15 %** throughput.
131
+
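As a rough guide to these figures, the weights alone can be sized from the parameter count and the bits per parameter. The sketch below does that arithmetic and applies the quoted compile speed-up to the 55 tok/s baseline from the spec table. It assumes the nominal 1.5 B parameter count and does not model activations, KV cache, or framework overhead, which is presumably why the quoted VRAM numbers are higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 1.5e9        # assumption: nominal size from the model name
BASELINE_TOK_PER_S = 55   # A6000, fp16, no compile (from the spec table)

bf16_gb = weight_memory_gb(NUM_PARAMS, 16)
int4_gb = weight_memory_gb(NUM_PARAMS, 4)
compiled_low = BASELINE_TOK_PER_S * 1.10   # +10 %
compiled_high = BASELINE_TOK_PER_S * 1.15  # +15 %

print(f"bf16 weights:  {bf16_gb:.2f} GB")
print(f"4-bit weights: {int4_gb:.2f} GB")
print(f"with torch.compile: ~{compiled_low:.1f} to {compiled_high:.1f} tok/s")
```

Treat the results as lower bounds on memory and as an estimate of throughput; measured numbers depend on batch size, sequence length, and hardware.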
132
+ ---
133
+ # ✍️ Current Evaluation Status
134
+ Formal **lighteval / MMLU / GSM8K** runs are queued. Preliminary spot-checks show Circe retains DeepSeek-R1’s chain-of-thought depth on reasoning-heavy QA while adding smooth bilingual generation.
135
+
136
+ ---
137
+ # ⚙️ Limitations & Bias
138
+ - No reward-model alignment: outputs may be unsafe or contain hallucinations.
139
+ - Long-context (> 4 k) stability untested.
140
+ - Training data inherits bias from public QA pairs; Spanish coverage favors Latin-American variants.
141
+ - Minimal safety filters — **you** must wrap with your own guardrails for production.
142
+
143
+ ---
144
+ # 🔮 Roadmap
145
+ - Publish full reasoning benchmark suite & eval scripts.
146
+ - Release code-reasoning and doc-QA adapters.
147
+ - Attach a **24 kHz neural codec** → real-time, full-duplex voice chat without ASR → TTS hops.
148
+
149
+ ---
150
+ # 🪪 License
151
+ This project is licensed under the [MIT](https://opensource.org/licenses/MIT) License. Attribution appreciated but not required.