Instructions for using PaletLabs/Circe with libraries and local apps.

Libraries

How to use PaletLabs/Circe with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="PaletLabs/Circe")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("PaletLabs/Circe")
model = AutoModelForCausalLM.from_pretrained("PaletLabs/Circe")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use PaletLabs/Circe with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "PaletLabs/Circe"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PaletLabs/Circe",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
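
The same request can be made from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on `localhost:8000`; the helper names `build_chat_request` and `chat` are hypothetical and not part of vLLM. The response is read from `choices[0].message.content`, which is where OpenAI-compatible chat endpoints place the reply.

```python
import json
import urllib.request


def build_chat_request(user_text, base_url="http://localhost:8000", model="PaletLabs/Circe"):
    """Build the same POST request as the curl call, without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_text, **kwargs):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(user_text, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    # OpenAI-compatible servers return the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server):
# print(chat("What is the capital of France?"))
```

For a server on a different port, such as the SGLang examples on port 30000, pass `base_url="http://localhost:30000"`.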

Use Docker

docker model run hf.co/PaletLabs/Circe

SGLang

How to use PaletLabs/Circe with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "PaletLabs/Circe" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PaletLabs/Circe",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "PaletLabs/Circe" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PaletLabs/Circe",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use PaletLabs/Circe with Docker Model Runner:
```
docker model run hf.co/PaletLabs/Circe
```

Circe-1.5B schematic

Circe-1.5B is a single-checkpoint, 1.5 B-parameter language model that asks a simple question:

“How far can you push tiny models on a tiny budget?”

| ⚙️ Spec | Value |
|---|---|
| Base model | `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` |
| Trainable params | 4 M (LoRA) |
| Post-training cost | ≈ US $12 on 1×L40S |
| Training recipe | 8 h SFT → 4 h GRPO |
| Context length | up to 4 k tokens (tested) |
| RAM @ bf16 | ~9 GB (≤ 3 GB 4-bit GPTQ) |
| Throughput | ~55 tok / s on 1×A6000 (fp16, no compile) |
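
To put the budget figures in perspective, the short calculation below derives two numbers from the spec values: the share of parameters that LoRA actually trains, and the GPU hourly rate implied by the stated cost. It is a rough sketch that assumes the ≈ US $12 covers exactly the 12 training hours (8 h SFT plus 4 h GRPO) and nothing else; the function names are illustrative.

```python
# Figures taken from the spec values above.
TOTAL_PARAMS = 1.5e9          # 1.5 B-parameter base model
TRAINABLE_PARAMS = 4e6        # 4 M LoRA parameters
SFT_HOURS, GRPO_HOURS = 8, 4  # training recipe
POST_TRAINING_COST_USD = 12.0


def trainable_fraction(trainable, total):
    """Share of all parameters that receive gradient updates."""
    return trainable / total


def implied_hourly_rate(cost_usd, hours):
    """GPU price per hour if the cost covers only the training hours (assumption)."""
    return cost_usd / hours


fraction = trainable_fraction(TRAINABLE_PARAMS, TOTAL_PARAMS)
rate = implied_hourly_rate(POST_TRAINING_COST_USD, SFT_HOURS + GRPO_HOURS)

print(f"Trainable share: {fraction:.2%}")  # 0.27%
print(f"Implied GPU rate: ${rate:.2f}/h")  # $1.00/h
```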

It keeps DeepSeek-R1’s strong reasoning depth but adds fluent bilingual chat (English & Spanish) in a checkpoint that fits on a laptop GPU.
We intend to use it as a reproducible waypoint on the road to real-time speech-to-speech reasoning systems.

🔭 Intended Use

Base for new LoRAs — domain adaptation, longer-context studies.
Research into cost-efficient RL for reasoning.
Not for high-stakes or production tasks.

See the ⚙️ Limitations & Bias section before use.

⚡ Quickstart

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("PaletLabs/Circe-1.5B", torch_dtype="bfloat16")
tok   = AutoTokenizer.from_pretrained("PaletLabs/Circe-1.5B")

prompt = "<|user|>¿Cómo se dice “tiny model” en español?<|assistant|>"
out = model.generate(**tok(prompt, return_tensors="pt").to(model.device), max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))

🛠️ Installation

git clone https://github.com/palet-global/circe
cd circe
python -m venv venv && source venv/bin/activate
pip install .

🏗️ Re-Training Pipeline

Data

python data/fetch_datasets.py --out data/processed

Supervised LoRA

accelerate config default            # one-time
accelerate launch train/sft.py \
  --data_dir data/processed \
  --output_dir checkpoints/sft

RL (GRPO)

accelerate launch train/rl_grpo.py \
  --data_dir data/processed \
  --output_dir checkpoints/grpo \
  --init_ckpt checkpoints/sft/checkpoint-13000 \
  --num_steps 3000 --save_steps 500 --group 4
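
For readers new to GRPO, the core idea is that each prompt gets a group of sampled completions, and every completion is scored relative to the others in its group instead of by a learned value model. The sketch below shows that group-relative advantage in plain Python. It is not taken from `train/rl_grpo.py`, which is not shown here; it assumes `--group 4` means four completions per prompt, and the use of the population standard deviation and the `eps` value are illustrative choices.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, four sampled completions, binary correctness rewards.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)  # correct answers get about +1, incorrect ones about -1
```

When all completions in a group receive the same reward, every advantage is zero and that prompt contributes no learning signal for the step.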

Merge and Tokenizer

python train/merge_lora.py \
  --ckpt_dir checkpoints/grpo \
  --base deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
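
Conceptually, merging folds each LoRA update back into the base weight so the result is a single checkpoint with no adapter at inference time: W' = W + (alpha / r) · B · A. The toy example below illustrates that arithmetic on small nested lists. It is a sketch of the standard LoRA merge, not the contents of `train/merge_lora.py`, and the function names and example values are made up for illustration.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def merge_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A).

    W: d_out x d_in base weight, B: d_out x r, A: r x d_in.
    """
    scale = alpha / r
    delta = matmul(B, A)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]


# Rank-1 adapter on a 2x2 identity weight.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[3.0, 4.0]]
print(merge_lora(W, A, B, alpha=2, r=1))  # [[7.0, 8.0], [12.0, 17.0]]
```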

SQuAD Sanity Checks

python eval/quick_squad_eval.py --model ./merged --dataset squad
python eval/quick_squad_eval.py --model ./merged --dataset squad_es
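
SQuAD-style checks are usually scored with exact match and token-level F1 after light answer normalization. The sketch below shows one common way to compute both; it is an illustration of the general metric, and `eval/quick_squad_eval.py` may normalize or score differently. The default article list is English only, so for `squad_es` a Spanish list would have to be passed in, and Spanish punctuation such as `¿` is not stripped by `string.punctuation`.

```python
import string
from collections import Counter


def normalize(text, articles=("a", "an", "the")):
    """Lowercase, strip ASCII punctuation, drop articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return [tok for tok in text.split() if tok not in articles]


def exact_match(prediction, gold):
    """True when the normalized token sequences are identical."""
    return normalize(prediction) == normalize(gold)


def f1_score(prediction, gold):
    """Harmonic mean of token precision and recall against the gold answer."""
    pred_tokens, gold_tokens = normalize(prediction), normalize(gold)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower.", "eiffel tower"))  # True
print(round(f1_score("in Paris, France", "Paris"), 2))   # 0.5
```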

Upload

python train/upload_to_hub.py \
  --model_dir merged \
  --repo PaletLabs/Circe-1.5B \
  --token "$HF_TOKEN"

💻 Hardware & Inference Tips

bf16 / fp16: Needs ~9 GB VRAM.
4-bit GPTQ: < 3 GB. bitsandbytes works out-of-the-box.
Compile once (torch.compile) for +10–15 % throughput.

✍️ Current Evaluation Status

Formal lighteval / MMLU / GSM-8K runs are queued. Preliminary spot-checks show Circe retains DeepSeek-R1’s chain-of-thought depth on reasoning-heavy QA while adding smooth bilingual generation.

⚙️ Limitations & Bias

No reward-model alignment.
Long-context (> 4 k) stability untested.
Training data bias from public QA pairs. Spanish coverage favors Latin American variants.
Minimal safety filters: wrap the model with your own guardrails before any production use.

🔮 Roadmap

Publish full reasoning benchmark suite & eval scripts.
Release code-reasoning and doc-QA adapters.
Attach a 24 kHz neural codec → real-time, full-duplex voice chat without ASR → TTS hops.

🪪 License

This project is licensed under the MIT License. Attribution appreciated but not required.

Safetensors

Model size: 2B params
Tensor type: F32