Tags: Text Generation, Transformers, Safetensors, HERMES, English, llama, cognitive-control, decode-time-intervention, repetition-suppression, behavioral-control, contrastive-learning, interpretability, activation-engineering, cf-hot, arc, rlhf-analysis, research, conversational, text-generation-inference
Fix device mismatch and tensor.item() bugs
inference.py (+9 -5)
@@ -129,13 +129,15 @@ class MultiHeadPredictor(nn.Module):
         Returns:
             Aggregated features [batch, seq, d_fiber]
         """
+        device = hidden_states[0].device
         fibers = []
         for i, (proj, hidden) in enumerate(zip(self.fiber_projs, hidden_states)):
             if i < len(hidden_states):
+                proj = proj.to(device)
                 fibers.append(proj(hidden.float()))
 
         # Weighted sum across layers
-        weights = F.softmax(self.layer_weights[:len(fibers)], dim=0)
+        weights = F.softmax(self.layer_weights.to(device)[:len(fibers)], dim=0)
         aggregated = sum(w * f for w, f in zip(weights, fibers))
         return aggregated
 
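For context on what this hunk touches: get_fiber_features projects each selected layer's hidden states into a shared fiber space, then combines the projections with a learned softmax-weighted sum. A runnable sketch of that aggregation; the names fiber_projs and layer_weights come from the diff, while the dimensions are made up:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes; the real model's dimensions will differ.
num_layers, d_model, d_fiber = 4, 512, 64

# Stand-ins for the per-layer projections and learned layer weights.
hidden_states = [torch.randn(1, 8, d_model) for _ in range(num_layers)]
fiber_projs = nn.ModuleList(nn.Linear(d_model, d_fiber) for _ in range(num_layers))
layer_weights = nn.Parameter(torch.zeros(num_layers))

# Project each layer, then take a convex combination across layers.
fibers = [proj(h.float()) for proj, h in zip(fiber_projs, hidden_states)]
weights = F.softmax(layer_weights[:len(fibers)], dim=0)   # sums to 1
aggregated = sum(w * f for w, f in zip(weights, fibers))  # [1, 8, d_fiber]
```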
@@ -153,9 +155,11 @@ class MultiHeadPredictor(nn.Module):
         if not self.loaded_heads:
             return {}
 
+        device = hidden_states[0].device
         features = self.get_fiber_features(hidden_states)
         risks = {}
         for name in self.loaded_heads:
+            self.heads[name] = self.heads[name].to(device)
             logits = self.heads[name](features).squeeze(-1)
             risks[name] = torch.sigmoid(logits)
         return risks
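Both hunks above fix the same failure mode: the fiber projections, layer_weights, and lazily loaded heads are created on the CPU by default, while hidden_states arrive on the base model's device, so the first forward pass on a GPU raises a device-mismatch RuntimeError. The patch moves each module (and the weight vector) onto the device of the incoming tensors before use. A minimal reproduce-and-fix sketch, with an nn.Linear standing in for one of the lazily loaded heads:

```python
import torch
import torch.nn as nn

# A lazily loaded head: nn.Module parameters live on CPU unless moved.
head = nn.Linear(512, 1)

# Hidden states arrive on whatever device the base model runs on.
device = "cuda" if torch.cuda.is_available() else "cpu"
hidden = torch.randn(1, 8, 512, device=device)

# Without this .to(), head(hidden) raises a RuntimeError on GPU:
# "Expected all tensors to be on the same device".
head = head.to(hidden.device)
scores = head(hidden)  # [1, 8, 1]
```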
@@ -370,28 +374,28 @@ class ARCSystem:
         interventions = {}
 
         # Repetition: suppress recently used tokens
-        if risks.get('repetition', …
+        if risks.get('repetition', 0) > self.config.repetition_threshold:
             for tok in set(recent_tokens[-self.config.repetition_window:]):
                 logits[0, tok] -= self.config.repetition_penalty
             interventions['repetition'] = True
             self.total_interventions['repetition'] += 1
 
         # Hedging: suppress hedge phrase starters
-        if risks.get('hedging', …
+        if risks.get('hedging', 0) > self.config.hedging_threshold:
             for tok in self._hedge_token_ids:
                 logits[0, tok] -= self.config.hedging_penalty
             interventions['hedging'] = True
             self.total_interventions['hedging'] += 1
 
         # Verbosity: suppress filler phrase starters
-        if risks.get('verbosity', …
+        if risks.get('verbosity', 0) > self.config.verbosity_threshold:
             for tok in self._verbose_token_ids:
                 logits[0, tok] -= self.config.verbosity_penalty
             interventions['verbosity'] = True
             self.total_interventions['verbosity'] += 1
 
         # Sycophancy: suppress sycophantic starters
-        if risks.get('sycophancy', …
+        if risks.get('sycophancy', 0) > self.config.sycophancy_threshold:
             for tok in self._sycophancy_token_ids:
                 logits[0, tok] -= self.config.sycophancy_penalty
             interventions['sycophancy'] = True
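The guards above carry the tensor.item() half of the commit title. Each rewritten check compares risks.get(name, 0) directly against a float threshold, with 0 as the default when a head is not loaded. For that to be valid inside an if, the risk must be a Python number or a one-element tensor: both .item() and truthiness raise a RuntimeError on multi-element tensors, which is presumably the bug the old (truncated) lines hit. A sketch of the failure mode and a safe reduction; the shape and the 0.5 threshold are illustrative:

```python
import torch

risks = torch.sigmoid(torch.randn(1, 8))  # per-token risk scores, [1, 8]

# risks.item() raises RuntimeError (more than one element), and
# `if risks > 0.5:` raises too, since the truthiness of a
# multi-element tensor is ambiguous.

# Safe: reduce to a single scalar first, e.g. the latest token's score.
if risks[0, -1].item() > 0.5:
    print("intervene")
```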
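All four branches share one decode-time mechanism: when a predicted risk crosses its configured threshold, a fixed penalty is subtracted from the logits of the targeted token ids before the next token is sampled. A self-contained sketch of that logit-penalty technique; the vocabulary size, threshold, and penalty values are made up:

```python
import torch

def suppress_tokens(logits: torch.Tensor, token_ids: set, penalty: float) -> None:
    """Subtract a fixed penalty from the given token ids' logits
    (in place), lowering their probability at this decoding step."""
    logits[0, list(token_ids)] -= penalty

logits = torch.randn(1, 32000)     # next-token logits, [batch, vocab]
recent_tokens = [42, 17, 42, 99]   # ids generated in the recent window
repetition_risk = 0.91             # scalar score from a prediction head

if repetition_risk > 0.7:          # illustrative threshold
    suppress_tokens(logits, set(recent_tokens), penalty=2.0)
```

Subtracting a finite penalty rather than masking to -inf keeps the intervention soft: a suppressed token can still be sampled if every alternative is even less likely.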