Commit d106751
Parent(s): 3f598f5
Fix all references to use merged model, remove broken API section
README.md CHANGED
@@ -9,7 +9,7 @@ tags:
 - dense-responses
 - self-optimization
 - representation-engineering
-base_model:
+base_model: LoganResearch/ARC-Base-8B-Condensed
 ---
 
 
@@ -18,9 +18,9 @@ base_model: NousResearch/Hermes-3-Llama-3.1-8B
 A closed-loop control system that uses internal state predictability to improve response efficiency without collapsing.
 
 **Author:** Logan Matthew Napolitano
-**Base Model:**
+**Base Model:** LoganResearch/ARC-Base-8B-Condensed
 **License:** CC BY 4.0
-**Code:** 7,111 lines | **Weights:** ~
+**Code:** 7,111 lines | **Weights:** ~16 GB
 
 ---
 
@@ -155,39 +155,6 @@ A/B checkpoint comparison with automatic rollback on quality drops > 0.05.
 
 ---
 
-## API Integration
-
-For developers integrating ARC into their own applications:
-
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-from peft import PeftModel
-import torch
-
-base = AutoModelForCausalLM.from_pretrained(
-    "NousResearch/Hermes-3-Llama-3.1-8B",
-    torch_dtype=torch.float16,
-    device_map="auto",
-    load_in_4bit=True
-)
-
-model = PeftModel.from_pretrained(
-    base,
-    "LoganResearch/ARC-Base-8B-Condensed",
-    subfolder="dense_checkpoints/step_100"
-)
-
-tokenizer = AutoTokenizer.from_pretrained("NousResearch/Hermes-3-Llama-3.1-8B")
-
-prompt = "<|im_start|>user\nWhat is recursion?<|im_end|>\n<|im_start|>assistant\n"
-inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
-output = model.generate(**inputs, max_new_tokens=50)
-print(tokenizer.decode(output[0], skip_special_tokens=True))
-```
-
-Note: For full dense output with CF-HoT steering, use the main engine (`ubermenschetien_v2_full.py`).
-
----
-
 ## Training From Scratch
 
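The commit message says references now point at the merged model and that the adapter-based API section was removed. As a rough illustration of what loading the merged checkpoint directly might look like, here is a minimal sketch. It assumes the repository holds full merged weights that `AutoModelForCausalLM` can load and that it ships its own tokenizer files; neither is confirmed by this diff. The helper names `build_chatml_prompt` and `generate` are hypothetical, and the ChatML prompt layout is copied from the removed example.

```python
# Sketch only: assumes the repo contains merged weights and tokenizer files.
MODEL_ID = "LoganResearch/ARC-Base-8B-Condensed"  # repo id named in the README diff


def build_chatml_prompt(user_message: str) -> str:
    """Format a single-turn prompt in the ChatML layout used by the removed example."""
    return (
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


def generate(user_message: str, max_new_tokens: int = 50) -> str:
    """Load the merged checkpoint directly (no PEFT adapter step) and generate text."""
    # Imported lazily so the prompt helper works without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,
        device_map="auto",  # needs the `accelerate` package
    )
    prompt = build_chatml_prompt(user_message)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Calling `generate("What is recursion?")` would download the full weights (listed as roughly 16 GB in the updated README), so it is best tried on a machine with enough disk space and GPU memory. Unlike the removed snippet, this sketch performs no separate base-model load and no `PeftModel.from_pretrained` step.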