Instructions to use 169Pi/Alpie-Core with libraries, inference providers, notebooks, and local apps.

Libraries

How to use 169Pi/Alpie-Core with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="169Pi/Alpie-Core")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("169Pi/Alpie-Core")
model = AutoModelForCausalLM.from_pretrained("169Pi/Alpie-Core")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use 169Pi/Alpie-Core with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "169Pi/Alpie-Core"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "169Pi/Alpie-Core",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
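
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are illustrative and not part of vLLM. It assumes the server started above is reachable at the given base URL and returns the usual OpenAI-style response shape (`choices[0].message.content`).

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt):
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, model, prompt, timeout=60):
    """Send the request and return the text of the first choice.

    Assumes an OpenAI-style JSON response body.
    """
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the vLLM server running, `chat("http://localhost:8000", "169Pi/Alpie-Core", "What is the capital of France?")` would issue the same call as the curl command. Pointing `base_url` at `http://localhost:30000` should work for an SGLang server started with `--port 30000`.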

Use Docker

docker model run hf.co/169Pi/Alpie-Core

SGLang

How to use 169Pi/Alpie-Core with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "169Pi/Alpie-Core" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "169Pi/Alpie-Core",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "169Pi/Alpie-Core" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "169Pi/Alpie-Core",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use 169Pi/Alpie-Core with Docker Model Runner:
```
docker model run hf.co/169Pi/Alpie-Core
```

deepanshupillm committed on Sep 5, 2025

Commit b487e83 (verified) · 1 Parent(s): 8ec4403

Update README.md

Files changed (1)

README.md +112 -32

@@ -44,24 +44,26 @@ Alpie-Core is one of the world's first fine-tuned 4-bit reasoning models, provin
 ## 5. Benchmark Results
-| Benchmark | Alpie-Core (32B-4bit) | DeepSeek-V2 (236B) | Qwen2.5 72B | Llama 3.1 405B | Llama 3.1 70B | Gemma-3 27B-PT | Category |
-|-----------|----------------------|-------------------|-------------|---------------|---------------|----------------|----------|
-| MMLU (5-shot) | **81.28%** | 78.4% | 85.0% | 84.4% | 79.3% | 78.6% | General Knowledge |
-| GSM8K (8-shot) | **92.75%** | 81.6% | 88.3% | 83.5% | - | 82.2% | Mathematical Reasoning |
-| BBH (3-shot) | **85.12%** | 78.8% | 79.8% | 82.9% | 81.6% | 77.7% | Complex Reasoning |
-| MMLU-Pro (5-shot) | **64.78%** | 51.4% | 58.3% | 52.8% | 53.8% | 52.2% | Advanced Reasoning |
-| MBPP (pass@1) | **75.20%** | 65.0% | 72.6% | 68.4% | - | 65.6% | Code Generation |
-| HumanEval (pass@1) | **57.23%** | 43.3% | 53.0% | 54.9% | - | 48.8% | Code Generation |
-| SWE-Bench Verified | **57.8%** | - | - | - | - | - | Software Engineering |
-| AIME | **47.34%** | - | - | - | - | - | Advanced Mathematics |
-| GPQA (Diamond) | **40.91%** | - | - | - | - | - | Graduate-level QA |
-| TruthfulQA (MC2) | **60.05%** | - | - | - | - | - | Truthfulness |
-| HellaSwag | **84.66%** | - | - | - | - | - | Commonsense |
-| PIQA | **83.24%** | - | - | - | - | - | Physical Reasoning |
-| ARC Challenge | **67.58%** | - | - | - | - | - | Science QA |
-| CommonSenseQA | **87.06%** | - | - | - | - | - | Commonsense |
-| AGIEval | **64.98%** | - | - | - | - | - | General Intelligence |
-| Winogrande | **79.53%** | - | - | - | - | - | Commonsense Reasoning |
 ### Humanity's Last Exam Leaderboard Performance
@@ -76,6 +78,20 @@ Alpie-Core is one of the world's first fine-tuned 4-bit reasoning models, provin
 | 7 | DeepSeek V3 | 4.55 | Below Alpie |
 | 8 | Gemini 1.5 Pro 002 | 4.55 | Below Alpie |
 ## 6. Training Details
 - **Hardware**: 8× NVIDIA A100-80GB GPUs
@@ -101,7 +117,7 @@ Alpie-Core is one of the world's first fine-tuned 4-bit reasoning models, provin
 - Experimental design optimization
 ### Advanced Coding and Software Engineering
-- 57.8% SWE-Bench Verified score (12% above nearest competitor)
 - Automated bug detection and GitHub issue resolution
 - Competitive programming and algorithm design
 - Enterprise software development and architecture design
@@ -130,22 +146,86 @@ Unlike the base DeepSeek model, Alpie-Core provides factual, balanced responses
 ## 10. How to Use
-### Installation
 ```python
-from transformers import AutoTokenizer, AutoModelForCausalLM
-model_id = "alpie/Alpie-Core-4bit"
-tokenizer = AutoTokenizer.from_pretrained(model_id)
-model = AutoModelForCausalLM.from_pretrained(
-    model_id,
-    device_map="auto",
-    torch_dtype="auto"
 )
-messages = [{"role": "user", "content": "Solve 2x^2 + 3x + 5 = 0"}]
-inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
-outputs = model.generate(**inputs, max_new_tokens=512)
-print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
 ### Deployment Options

 ## 5. Benchmark Results
+| Benchmark | Alpie-Core (32B-4bit) | DeepSeek-V2 (236B) | Qwen2.5 72B | Llama 3.1 405B | Llama 3.1 70B | Gemma-3 27B-PT | Mistral-Small-24B-Base-2501 |
+|-----------|----------------------|-------------------|-------------|---------------|---------------|----------------|----------------------------|
+| MMLU (5-shot) | **81.28%** | 78.4% | 85.0% | 84.4% | 79.3% | 78.6% | 80.73% |
+| GSM8K (8-shot) | **92.75%** | 81.6% | 88.3% | 83.5% | nan | 82.2% | 80.73% |
+| BBH (3-shot) | **85.12%** | 78.8% | 79.8% | 82.9% | 81.6% | 77.7% | nan |
+| MMLU-Pro (5-shot) | **64.78%** | 51.4% | 58.3% | 52.8% | 53.8% | 52.2% | 54.37% |
+| MBPP (pass@1) | **75.20%** | 65.0% | 72.6% | 68.4% | nan | 65.6% | 69.64% |
+| HumanEval (pass@1) | **57.23%** | 43.3% | 53.0% | 54.9% | nan | 48.8% | nan |
+### SWE-Bench Verified Performance
+| Rank | Model | Accuracy (%) | Performance vs Alpie |
+|------|-------|-------------|---------------------|
+| **1** | **Alpie Core** | **57.8** | **Alpie** |
+| 2 | Qwen3-Coder-30B-A3B-Instruct | 51.6 | Below Alpie |
+| 3 | o1 | 48.9 | Below Alpie |
+| 4 | o3-mini (high) | 49.3 | Below Alpie |
+| 5 | Claude 3.5 Sonnet | 49.0 | Below Alpie |
+| 6 | DeepSeek R1 | 49.2 | Below Alpie |
+| 7 | Devstral | 46.8 | Below Alpie |
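
The figures in the SWE-Bench Verified table above can be checked with a short calculation. The sketch below copies the accuracy values from that table, orders the entries by accuracy, and computes the lead of the top entry over the runner-up both in percentage points and as a relative percentage. The function names are illustrative only.

```python
# Accuracy values (%) copied from the SWE-Bench Verified table above.
swe_bench_verified = {
    "Alpie Core": 57.8,
    "Qwen3-Coder-30B-A3B-Instruct": 51.6,
    "o1": 48.9,
    "o3-mini (high)": 49.3,
    "Claude 3.5 Sonnet": 49.0,
    "DeepSeek R1": 49.2,
    "Devstral": 46.8,
}


def rank_by_accuracy(scores):
    """Return (model, accuracy) pairs sorted from highest to lowest accuracy."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def lead_over_runner_up(scores):
    """Return the top entry's lead as (percentage points, relative percent)."""
    ranked = rank_by_accuracy(scores)
    best, second = ranked[0][1], ranked[1][1]
    absolute = best - second
    relative = 100.0 * absolute / second
    return round(absolute, 1), round(relative, 1)


ranked = rank_by_accuracy(swe_bench_verified)
points, percent = lead_over_runner_up(swe_bench_verified)
for position, (name, accuracy) in enumerate(ranked, start=1):
    print(f"{position}. {name}: {accuracy}")
print(f"Lead over runner-up: {points} points ({percent}% relative)")
```

Reporting both numbers keeps the difference between a percentage-point gap and a relative improvement explicit.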
 ### Humanity's Last Exam Leaderboard Performance
 | 7 | DeepSeek V3 | 4.55 | Below Alpie |
 | 8 | Gemini 1.5 Pro 002 | 4.55 | Below Alpie |
+### Additional Benchmarks
+| Benchmark | Alpie-Core (32B-4bit) | Category |
+|-----------|----------------------|----------|
+| AIME | **47.34%** | Advanced Mathematics |
+| GPQA (Diamond) | **40.91%** | Graduate-level QA |
+| TruthfulQA (MC2) | **60.05%** | Truthfulness |
+| HellaSwag | **84.66%** | Commonsense |
+| PIQA | **83.24%** | Physical Reasoning |
+| ARC Challenge | **67.58%** | Science QA |
+| CommonSenseQA | **87.06%** | Commonsense |
+| AGIEval | **64.98%** | General Intelligence |
+| Winogrande | **79.53%** | Commonsense Reasoning |
 ## 6. Training Details
 - **Hardware**: 8× NVIDIA A100-80GB GPUs
 - Experimental design optimization
 ### Advanced Coding and Software Engineering
+- 57.8% SWE-Bench Verified score (8% above nearest competitor)
 - Automated bug detection and GitHub issue resolution
 - Competitive programming and algorithm design
 - Enterprise software development and architecture design
 ## 10. How to Use
+### Non-Streaming Inference
 ```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from peft import PeftModel, PeftConfig
+import torch
+# Load LoRA adapter configuration to find the base model
+peft_model_id = "169Pi/Alpie-core"
+config = PeftConfig.from_pretrained(peft_model_id)
+# Load the base model
+base_model = AutoModelForCausalLM.from_pretrained(
+    config.base_model_name_or_path,
+    torch_dtype=torch.float16,
+    device_map="auto"
 )
+# Load tokenizer
+tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
+# Load LoRA weights
+model = PeftModel.from_pretrained(base_model, peft_model_id)
+# Ensure evaluation mode
+model.eval()
+# Sample inference
+prompt = "Solve the Riemann Hypothesis and provide a final answer?"
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+with torch.no_grad():
+    outputs = model.generate(**inputs, max_new_tokens=1000)
+    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print("Response:\n", response)
+```
+### Streaming Inference
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
+from peft import PeftModel, PeftConfig
+import torch
+# Load LoRA adapter configuration to find the base model
+peft_model_id = "169Pi/Alpie-core"
+config = PeftConfig.from_pretrained(peft_model_id)
+# Load the base model
+base_model = AutoModelForCausalLM.from_pretrained(
+    config.base_model_name_or_path,
+    torch_dtype=torch.float16,
+    device_map="auto"
+)
+# Load tokenizer
+tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
+# Load LoRA weights
+model = PeftModel.from_pretrained(base_model, peft_model_id)
+# Ensure evaluation mode
+model.eval()
+# Initialize streamer
+streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
+# Sample streaming inference
+prompt = "Solve the Riemann Hypothesis and provide a final answer?"
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+print("Streaming Response:")
+with torch.no_grad():
+    outputs = model.generate(
+        **inputs,
+        max_new_tokens=1000,
+        streamer=streamer,
+        do_sample=True,
+        temperature=0.7,
+        top_p=0.9
+    )
 ```
 ### Deployment Options