Instructions to use ceselder/cot-oracle-ablation-stride5-3layers with libraries, inference providers, notebooks, and local apps.

## Libraries

### PEFT

How to use ceselder/cot-oracle-ablation-stride5-3layers with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B")
model = PeftModel.from_pretrained(base_model, "ceselder/cot-oracle-ablation-stride5-3layers")
```

### Transformers

How to use ceselder/cot-oracle-ablation-stride5-3layers with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ceselder/cot-oracle-ablation-stride5-3layers")

# Or load the model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("ceselder/cot-oracle-ablation-stride5-3layers", dtype="auto")
```

## Notebooks

- Google Colab
- Kaggle

## Local Apps

### vLLM

How to use ceselder/cot-oracle-ablation-stride5-3layers with vLLM:

Install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "ceselder/cot-oracle-ablation-stride5-3layers"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ceselder/cot-oracle-ablation-stride5-3layers",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Or use Docker:

```shell
docker model run hf.co/ceselder/cot-oracle-ablation-stride5-3layers
```

### SGLang

How to use ceselder/cot-oracle-ablation-stride5-3layers with SGLang:

Install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "ceselder/cot-oracle-ablation-stride5-3layers" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ceselder/cot-oracle-ablation-stride5-3layers",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Or use the Docker images:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "ceselder/cot-oracle-ablation-stride5-3layers" \
  --host 0.0.0.0 \
  --port 30000
```

### Docker Model Runner

How to use ceselder/cot-oracle-ablation-stride5-3layers with Docker Model Runner:

```shell
docker model run hf.co/ceselder/cot-oracle-ablation-stride5-3layers
```
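The same OpenAI-compatible `/v1/completions` endpoint shown in the curl examples can be called from Python. A minimal sketch using only the standard library; the endpoint path, port, and payload mirror the curl examples above, while the helper names (`build_completion_request`, `complete`) are my own:

```python
import json
import urllib.request

def build_completion_request(prompt: str,
                             model: str = "ceselder/cot-oracle-ablation-stride5-3layers",
                             max_tokens: int = 512,
                             temperature: float = 0.5) -> dict:
    """Build the same JSON body used in the curl examples."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def complete(prompt: str, base_url: str = "http://localhost:8000") -> str:
    """POST to a running vLLM server (use port 30000 for SGLang)."""
    payload = json.dumps(build_completion_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]
```

`complete("Once upon a time,")` requires one of the servers above to be running locally; `build_completion_request` works offline.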
# CoT Oracle Ablation: Stride=5, 3 Layers (9, 18, 27)
LoRA adapter for Qwen/Qwen3-8B trained as a CoT (chain-of-thought) trajectory oracle. This is the stride=5, 3-layer control ablation — it reads activations sampled every 5 tokens from layers 9, 18, and 27 (25%, 50%, 75% depth).
Base AO checkpoint: adamkarvonen/checkpoints_latentqa_cls_past_lens_addition_Qwen3-8B
## What This Model Does
The oracle takes activation trajectories extracted during CoT generation and classifies/describes what actually influenced the reasoning. It can:
- Reconstruct full CoT from stride activations (token F1: 0.660)
- Predict next reasoning steps (token F1: 0.435)
- Predict final answers from partial CoT (token F1: 0.500)
- Classify correctness of reasoning (token F1: 0.840)
- Classify decorative vs load-bearing CoT (token F1: 0.960)
- Predict reasoning termination (token F1: 0.740)
- Reconstruct original prompts from activations (token F1: 0.636)
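The shape of the input trajectory follows directly from the stride and layer choices above. A small sketch of that geometry (the helper names here are hypothetical; the real extraction code lives in the `activation_oracles` repository):

```python
# Layers 9, 18, 27 sit at 25%, 50%, 75% depth of Qwen3-8B's 36 layers.
LAYERS = [9, 18, 27]

def trajectory_positions(cot_len: int, stride: int = 5) -> list[int]:
    """Token positions sampled from the CoT: every `stride`-th token."""
    return list(range(0, cot_len, stride))

def trajectory_size(cot_len: int, stride: int = 5) -> int:
    """One activation vector per (sampled position, layer) pair."""
    return len(trajectory_positions(cot_len, stride)) * len(LAYERS)
```

For a 100-token CoT this yields 20 sampled positions × 3 layers = 60 activation vectors per trajectory.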
## Architecture
- Injection method: Norm-matched addition at layer 1
- Placeholder token: `" ¶"` (token ID 78846)
- Activation layers: 9, 18, 27 (25%, 50%, 75% of 36 layers)
- Stride: Every 5 tokens through the CoT
- Position encoding: None (this is the no-PE control)
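One plausible reading of "norm-matched addition" is: rescale the injected activation to the L2 norm of the hidden state at the placeholder position before adding it, so the injection does not blow up or vanish relative to the residual stream. This sketch is an assumption about the mechanism, not the `activation_oracles` implementation:

```python
import numpy as np

def norm_matched_add(hidden: np.ndarray, injected: np.ndarray,
                     eps: float = 1e-8) -> np.ndarray:
    """Rescale `injected` to the L2 norm of `hidden`, then add.

    Applied at layer 1, at the positions of the " ¶" placeholder tokens
    (per the Architecture section above).
    """
    scale = np.linalg.norm(hidden) / (np.linalg.norm(injected) + eps)
    return hidden + scale * injected
```

With `hidden = [3, 4]` (norm 5) and `injected = [0, 10]` (norm 10), the injected vector is scaled by 0.5 before the add, so its contribution has the same magnitude as the hidden state it perturbs.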
## Training Details
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3-8B |
| AO checkpoint | adamkarvonen/checkpoints_latentqa_cls_past_lens_addition_Qwen3-8B |
| LoRA rank | 64 |
| LoRA alpha | 128 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Learning rate | 1e-5 |
| Batch size | 4 (effective: 16 with grad accumulation) |
| Training examples | 211,122 |
| Total steps | ~13,195 (1 epoch) |
| Precision | bf16 |
| Hardware | NVIDIA H100 NVL 96GB |
| Training time | ~14 hours |
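The rows of the table hang together arithmetically; a quick check (the gradient-accumulation factor of 4 is inferred from 16 / 4, it is not stated in the card):

```python
per_device_batch = 4
grad_accum = 16 // per_device_batch              # inferred: 4
effective_batch = per_device_batch * grad_accum  # 16, as in the table

examples = 211_122
steps_per_epoch = examples // effective_batch    # ~13,195 optimizer steps
```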
## Training Tasks (11 tasks)
| Task | Examples | Final Token F1 |
|---|---|---|
| Full CoT reconstruction | 40,000 | 0.660 |
| Next step prediction | 30,000 | 0.435 |
| Answer prediction | 20,000 | 0.500 |
| Partial answer (vLLM) | 20,000 | 0.655 |
| Answer trajectory | 20,000 | 0.299 |
| Correctness classification | 15,000 | 0.840 |
| Decorative classification | 15,000 | 0.960 |
| Reasoning termination | 15,000 | 0.740 |
| Prompt inversion | 20,000 | 0.636 |
| Conversational QA | 10,000 | 0.442 |
| CompQA | 6,122 | 0.392 |
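The task mix is internally consistent with the Training Details table: the per-task example counts sum to exactly 211,122. The example-weighted mean token F1 (~0.598) below is derived from the table here, not reported in the card:

```python
# (examples, final token F1) per training task, copied from the table above.
tasks = {
    "full_cot_reconstruction":    (40_000, 0.660),
    "next_step_prediction":       (30_000, 0.435),
    "answer_prediction":          (20_000, 0.500),
    "partial_answer_vllm":        (20_000, 0.655),
    "answer_trajectory":          (20_000, 0.299),
    "correctness_classification": (15_000, 0.840),
    "decorative_classification":  (15_000, 0.960),
    "reasoning_termination":      (15_000, 0.740),
    "prompt_inversion":           (20_000, 0.636),
    "conversational_qa":          (10_000, 0.442),
    "compqa":                     (6_122, 0.392),
}

total = sum(n for n, _ in tasks.values())                      # 211,122
weighted_f1 = sum(n * f1 for n, f1 in tasks.values()) / total  # ~0.598
```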
## Unfaithfulness Eval Results (Step 13160)
| Eval | Accuracy |
|---|---|
| Hinted MCQ (ARC-Challenge) | 0.800 |
| Hinted MCQ (TruthfulQA) | 0.650 |
| Sycophancy v2 | 0.400 |
| Decorative CoT | 0.500 |
| Sentence Insertion | 0.567 |
| Atypical Answer (MCQ) | 0.550 |
| Atypical Answer (Riya) | 0.600 |
| Cybercrime OOD | 0.950 |
| Mean accuracy | 0.557 |
## W&B Run
## Usage

This adapter requires the Activation Oracle infrastructure from `activation_oracles` for activation injection.
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base_model, "ceselder/cot-oracle-ablation-stride5-3layers")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
```
## Citation
Based on:
- Activation Oracles (Karvonen et al., 2024): https://arxiv.org/abs/2512.15674
- Thought Anchors (Bogdan et al., 2025): https://arxiv.org/abs/2506.19143
## Framework Versions
- PEFT 0.18.1
- Transformers (latest)
- PyTorch 2.x