Instructions to use AiAsistent/Llama-3.1-8B-Instruct-STO-Master with Transformers, vLLM, SGLang, and Docker Model Runner.

Libraries

How to use AiAsistent/Llama-3.1-8B-Instruct-STO-Master with Transformers:

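# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AiAsistent/Llama-3.1-8B-Instruct-STO-Master")
messages = [
    {"role": "user", "content": "Who are you?"},
]
outputs = pipe(messages, max_new_tokens=40)
# For chat input, generated_text holds the whole conversation; the last entry is the model's reply
print(outputs[0]["generated_text"][-1]["content"])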

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AiAsistent/Llama-3.1-8B-Instruct-STO-Master")
model = AutoModelForCausalLM.from_pretrained("AiAsistent/Llama-3.1-8B-Instruct-STO-Master")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
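# Decode only the newly generated tokens and drop special tokens such as the end-of-turn marker
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))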

vLLM

How to use AiAsistent/Llama-3.1-8B-Instruct-STO-Master with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AiAsistent/Llama-3.1-8B-Instruct-STO-Master"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AiAsistent/Llama-3.1-8B-Instruct-STO-Master",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
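
The same request can be sent from Python. The sketch below uses only the standard library and assumes the server started above is listening on localhost:8000. The helper names `build_chat_request` and `ask` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "AiAsistent/Llama-3.1-8B-Instruct-STO-Master"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the reply text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as response:
        body = json.load(response)
    # OpenAI-compatible responses place the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With a server running, `ask("What is the capital of France?")` returns the reply as a string. Passing `base_url="http://localhost:30000"` targets the SGLang server described further down.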

Use Docker

docker model run hf.co/AiAsistent/Llama-3.1-8B-Instruct-STO-Master

SGLang

How to use AiAsistent/Llama-3.1-8B-Instruct-STO-Master with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AiAsistent/Llama-3.1-8B-Instruct-STO-Master" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AiAsistent/Llama-3.1-8B-Instruct-STO-Master",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AiAsistent/Llama-3.1-8B-Instruct-STO-Master" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AiAsistent/Llama-3.1-8B-Instruct-STO-Master",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use AiAsistent/Llama-3.1-8B-Instruct-STO-Master with Docker Model Runner:
```
docker model run hf.co/AiAsistent/Llama-3.1-8B-Instruct-STO-Master
```

Llama-3.1-8B-Instruct-STO-Master

Model Description

Llama-3.1-8B-Instruct-STO-Master is a fine-tune of Meta's Llama-3.1-8B-Instruct. It is the "Master Version" (Model E) of an extensive research project exploring how far 8B-parameter architectures can be pushed.

Unlike traditional Supervised Fine-Tuning (SFT), this model was developed using the STO (Specialized Task Optimization) method. This methodology focuses on "Reasoning over Recall," forcing the model to understand the underlying logic of a prompt rather than simply predicting the next most likely token.

Key Achievements:

Near-Zero-Loss Generalization: Increased academic and specialized knowledge (MMLU General 69.53% to 69.78%) while keeping the base model's "common sense" score unchanged (Hellaswag, 70.80%) and its "ethical alignment" score within 0.4 points (Moral Scenarios, 59.60% to 59.20%).
Logic Improvement: Raised the ARC Challenge score from 52.80% to 53.60%, above the base model's result.
Superior IQ: Internal testing suggests an IQ increase of 20-30 points compared to the base Llama 3.1 8B Instruct, particularly in complex problem-solving and multi-step reasoning.

Training Details

Training Data: Only 800,000 high-quality tokens.
Data Source: 100% Synthetic Data generated via a proprietary high-tier pipeline.
Methodology: STO (Specialized Task Optimization).
Philosophy: This model proves that data quality and training methodology (STO) beat raw data quantity. By using just 800k tokens of "Grade 20" synthetic data, we achieved results typically reserved for models with much larger training sets.

For more information on the synthetic data generation used in this project, visit: LLMResearch - Synthetic Data

Evaluation Results

Evaluation was performed using a sample limit of 250 (due to hardware constraints) across four major benchmarks: Hellaswag, ARC Challenge, GSM8K, and MMLU.

Comparative Performance:

Benchmark	Llama 3.1 8B Instruct (baseline)	STO-Master (Model E)	Status
MMLU General	69.53%	69.78%	✅ Superior
ARC Challenge	52.80%	53.60%	🏆 Record Logic
Hellaswag	70.80%	70.80%	🟢 Perfect Recovery
Moral Scenarios	59.60%	59.20%	🟢 Stable Alignment
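
For readers who want to gauge the size of these differences, the sketch below converts each one into percentage points, a question count, and a binomial standard error. It assumes the 250-sample limit applies to each single-task row. MMLU General is left out because it aggregates many subtasks, so its sample count is not 250. The function names are illustrative.

```python
import math

SAMPLES = 250  # assumption: the 250-sample limit applies to each benchmark row

# (baseline %, STO-Master %) copied from the table above
SCORES = {
    "ARC Challenge": (52.80, 53.60),
    "Hellaswag": (70.80, 70.80),
    "Moral Scenarios": (59.60, 59.20),
}


def delta_points(baseline, tuned):
    """Difference in percentage points, rounded to two decimals."""
    return round(tuned - baseline, 2)


def questions_changed(baseline, tuned, n=SAMPLES):
    """Number of questions the difference corresponds to at n samples."""
    return round((tuned - baseline) / 100 * n)


def standard_error_points(accuracy_pct, n=SAMPLES):
    """Binomial standard error of an accuracy estimate, in percentage points."""
    p = accuracy_pct / 100
    return 100 * math.sqrt(p * (1 - p) / n)


for name, (baseline, tuned) in SCORES.items():
    print(
        f"{name}: {delta_points(baseline, tuned):+.2f} points "
        f"({questions_changed(baseline, tuned):+d} questions), "
        f"standard error about {standard_error_points(baseline):.1f} points"
    )
```

Re-running the benchmarks with a larger sample limit shrinks the standard error in proportion to the square root of the sample count.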

Notable Domain Expertise:

US Foreign Policy: 90.0%
Government & Politics: 90.67%
Marketing: 89.32%
World Religions: 83.04%
College Biology: 81.25%
Machine Learning: 53.57%

Usage and Testing

We encourage the community to run their own independent benchmarks on this model. Our internal results show that the model excels in academic writing, professional analysis, and complex STEM tasks.
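
One way to run such an independent check is EleutherAI's lm-evaluation-harness. This is an assumption made for illustration: the card does not say which tool produced the results above. The sketch builds the arguments for `lm_eval.simple_evaluate` with the four benchmark tasks and the same 250-sample limit. `build_eval_config` and `run_eval` are hypothetical helper names.

```python
MODEL_ID = "AiAsistent/Llama-3.1-8B-Instruct-STO-Master"


def build_eval_config(model_id=MODEL_ID, limit=250):
    """Keyword arguments for lm_eval.simple_evaluate (hypothetical helper)."""
    return {
        "model": "hf",  # Hugging Face Transformers backend
        "model_args": f"pretrained={model_id}",
        "tasks": ["hellaswag", "arc_challenge", "gsm8k", "mmlu"],
        "limit": limit,  # cap on samples per task, mirroring the 250 used in the card
    }


def run_eval(config):
    """Run the harness; requires `pip install lm-eval` and enough GPU memory."""
    import lm_eval  # imported lazily so the config can be inspected without it

    results = lm_eval.simple_evaluate(**config)
    return results["results"]  # per-task metrics keyed by task name
```

Calling `run_eval(build_eval_config())` downloads the model and the datasets. Raising `limit` (or passing `None` for the full test sets) gives more stable numbers at the cost of longer runs.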

Recommendations:

Context Window: Best results are achieved with a context length of 3096 tokens or higher.
System Prompt: Works exceptionally well with expert-level personas (e.g., "Senior Researcher," "Professor of Logic").
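
A minimal sketch of the system-prompt recommendation follows. The persona is placed in a system message ahead of the user's question. The exact wording of the system prompt and the helper name `build_messages` are illustrative assumptions, not text supplied by the authors.

```python
def build_messages(question, persona="Senior Researcher"):
    """Return a chat message list with an expert persona as the system prompt."""
    # Illustrative wording only; adjust the persona text to the task at hand
    system_prompt = f"You are a {persona}. Answer carefully and explain your reasoning."
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


messages = build_messages(
    "Is the argument 'all A are B; some B are C; so some A are C' valid?",
    persona="Professor of Logic",
)
```

The resulting `messages` list can be passed to `pipe(messages)` or `tokenizer.apply_chat_template(...)` exactly as in the Transformers examples above.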

Citation & Credits

Author: AlexH
Organization: LLMResearch.net

@misc{alexh2026llama31sto,
  author = {AlexH},
  title = {Llama-3.1-8B-Instruct-STO-Master: Pushing the limits of 8B architectures},
  year = {2026},
  publisher = {LLMResearch},
  organization = {LLMResearch.net},
  howpublished = {\url{https://huggingface.co/AiAsistent/Llama-3.1-8B-Instruct-STO-Master}}
}

Model size: 8B params (Safetensors)
Tensor type: F16

Model tree for AiAsistent/Llama-3.1-8B-Instruct-STO-Master

Base model: meta-llama/Llama-3.1-8B
Finetuned: meta-llama/Llama-3.1-8B-Instruct
Finetuned: this model