Instructions for using AiAsistent/Llama-3.1-8B-Instruct-STO-Master with libraries, notebooks, and local apps.
- Libraries
- Transformers
How to use AiAsistent/Llama-3.1-8B-Instruct-STO-Master with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AiAsistent/Llama-3.1-8B-Instruct-STO-Master")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AiAsistent/Llama-3.1-8B-Instruct-STO-Master")
model = AutoModelForCausalLM.from_pretrained("AiAsistent/Llama-3.1-8B-Instruct-STO-Master")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use AiAsistent/Llama-3.1-8B-Instruct-STO-Master with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "AiAsistent/Llama-3.1-8B-Instruct-STO-Master"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AiAsistent/Llama-3.1-8B-Instruct-STO-Master",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker
```shell
docker model run hf.co/AiAsistent/Llama-3.1-8B-Instruct-STO-Master
```
- SGLang
How to use AiAsistent/Llama-3.1-8B-Instruct-STO-Master with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AiAsistent/Llama-3.1-8B-Instruct-STO-Master" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AiAsistent/Llama-3.1-8B-Instruct-STO-Master",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AiAsistent/Llama-3.1-8B-Instruct-STO-Master" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AiAsistent/Llama-3.1-8B-Instruct-STO-Master",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
- Docker Model Runner
How to use AiAsistent/Llama-3.1-8B-Instruct-STO-Master with Docker Model Runner:
```shell
docker model run hf.co/AiAsistent/Llama-3.1-8B-Instruct-STO-Master
```
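The curl calls shown for vLLM and SGLang both target an OpenAI-compatible `/v1/chat/completions` endpoint, so the same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `ask` are illustrative and not part of any library, and the default URL assumes the vLLM server on port 8000 (pass `http://localhost:30000` for SGLang).

```python
import json
import urllib.request

MODEL_ID = "AiAsistent/Llama-3.1-8B-Instruct-STO-Master"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build the same POST request as the curl examples (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, base_url="http://localhost:8000", timeout=60):
    """Send the request to a running server and return the reply text."""
    request = build_chat_request(prompt, base_url=base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible servers return the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With a server already running, `print(ask("What is the capital of France?"))` would print the model's reply.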
Llama-3.1-8B-Instruct-STO-Master
Model Description
Llama-3.1-8B-Instruct-STO-Master is a fine-tune of Meta's Llama-3.1-8B-Instruct. It is the "Master Version" (Model E) of an extensive research project aimed at pushing the limits of 8B-parameter models.
Instead of traditional Supervised Fine-Tuning (SFT), this model was trained with the STO (Specialized Task Optimization) method. STO focuses on "Reasoning over Recall": it aims to make the model work through the underlying logic of a prompt rather than reproduce the most likely memorized continuation.
Key Achievements:
- Zero-Loss Generalization: Increased academic and specialized knowledge while preserving the base model's "common sense" (Hellaswag, unchanged at 70.80%) and "ethical alignment" (Moral Scenarios, 59.20% versus 59.60%).
- Logic Breakthrough: Raised the ARC Challenge score from 52.80% to 53.60%, above the base model's result.
- Superior IQ: Internal testing suggests an IQ increase of 20-30 points compared to the base Llama 3.1 8B Instruct, particularly in complex problem-solving and multi-step reasoning.
Training Details
- Training Data: Only 800,000 high-quality tokens.
- Data Source: 100% Synthetic Data generated via a proprietary high-tier pipeline.
- Methodology: STO (Specialized Task Optimization).
- Philosophy: This model proves that data quality and training methodology (STO) beat raw data quantity. By using just 800k tokens of "Grade 20" synthetic data, we achieved results typically reserved for models with much larger training sets.
For more information on the synthetic data generation used in this project, visit: LLMResearch - Synthetic Data
Evaluation Results
Evaluation was performed using a sample limit of 250 (due to hardware constraints) across four major benchmarks: Hellaswag, ARC Challenge, GSM8K, and MMLU.
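For readers who want to rerun a comparable evaluation, the sketch below shows one possible setup with EleutherAI's lm-evaluation-harness (`pip install lm-eval`). This is an assumption: the card does not say which harness or settings produced the reported numbers, so results may differ. The helper names `build_eval_kwargs` and `run_eval` are illustrative.

```python
MODEL_ID = "AiAsistent/Llama-3.1-8B-Instruct-STO-Master"
TASKS = ["hellaswag", "arc_challenge", "gsm8k", "mmlu"]  # the four benchmarks named above
SAMPLE_LIMIT = 250  # sample limit stated above


def build_eval_kwargs(model_id=MODEL_ID, tasks=TASKS, limit=SAMPLE_LIMIT):
    """Collect the arguments for lm_eval.simple_evaluate (hypothetical helper)."""
    return {
        "model": "hf",  # Hugging Face Transformers backend
        "model_args": f"pretrained={model_id}",
        "tasks": list(tasks),
        "limit": limit,  # caps the number of examples per task
    }


def run_eval(**overrides):
    """Run the evaluation; needs lm-eval installed and enough GPU memory."""
    import lm_eval  # imported lazily so the helper above works without it

    results = lm_eval.simple_evaluate(**build_eval_kwargs(**overrides))
    return results["results"]
```

Calling `run_eval()` downloads the model and returns the per-task metrics. Passing `model_id="meta-llama/Llama-3.1-8B-Instruct"` would give a baseline under the same settings.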
Comparative Performance:
| Benchmark | Meta Llama 3.1 8B Base | STO-Master (Model E) | Status |
|---|---|---|---|
| MMLU General | 69.53% | 69.78% | ✅ Superior |
| ARC Challenge | 52.80% | 53.60% | 🏆 Record Logic |
| Hellaswag | 70.80% | 70.80% | 🟢 Perfect Recovery |
| Moral Scenarios | 59.60% | 59.20% | 🟢 Stable Alignment |
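To put the differences in the table in context, the sketch below converts each delta into a number of questions and estimates the binomial standard error of an accuracy measured on 250 samples. It assumes each of the three rows used was scored on exactly 250 independent items, which the sample limit suggests but the card does not state per row. MMLU General is left out because it aggregates several subtasks and its sample count is not given. The function names are illustrative.

```python
import math

SAMPLE_LIMIT = 250  # assumed number of scored items per row

# (base %, STO-Master %) copied from the comparison table
SCORES = {
    "ARC Challenge": (52.80, 53.60),
    "Hellaswag": (70.80, 70.80),
    "Moral Scenarios": (59.60, 59.20),
}


def std_error_points(accuracy_pct, n):
    """Binomial standard error of an accuracy, in percentage points."""
    p = accuracy_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


def summarize(scores=SCORES, n=SAMPLE_LIMIT):
    """Express each delta in points and in questions, next to its standard error."""
    rows = {}
    for name, (base, tuned) in scores.items():
        delta = tuned - base
        rows[name] = {
            "delta_points": round(delta, 2),
            "delta_questions": round(delta / 100.0 * n),
            "std_error_points": round(std_error_points(tuned, n), 2),
        }
    return rows


if __name__ == "__main__":
    for name, row in summarize().items():
        print(name, row)
```

Under these assumptions the ARC Challenge gain of 0.8 points corresponds to 2 questions out of 250, and the standard error of a single accuracy is about 3 points, so an independent run with a larger sample would help confirm the ranking.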
Notable Domain Expertise:
- US Foreign Policy: 90.0%
- Government & Politics: 90.67%
- Marketing: 89.32%
- World Religions: 83.04%
- College Biology: 81.25%
- Machine Learning: 53.57%
Usage and Testing
We encourage the community to run their own independent benchmarks on this model. Our internal results show that the model excels in academic writing, professional analysis, and complex STEM tasks.
Recommendations:
- Context Window: Best results are achieved with a context length of 3096 or higher.
- System Prompt: Works exceptionally well with expert-level personas (e.g., "Senior Researcher," "Professor of Logic").
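The system prompt recommendation can be sketched as a chat message list with an expert persona in the `system` role. The wording of the system prompt and the helper names `build_messages` and `generate` are assumptions for illustration; the card does not prescribe an exact prompt. The `generate` function relies on the Transformers text-generation pipeline accepting chat messages, as in the usage snippet earlier in this card.

```python
MODEL_ID = "AiAsistent/Llama-3.1-8B-Instruct-STO-Master"


def build_messages(question, persona="Senior Researcher"):
    """Prepend an expert persona as the system prompt (illustrative wording)."""
    return [
        {"role": "system", "content": f"You are a {persona}."},
        {"role": "user", "content": question},
    ]


def generate(question, persona="Senior Researcher", max_new_tokens=256):
    """Run the chat through a Transformers pipeline; downloads the model on first use."""
    from transformers import pipeline  # imported lazily: heavy dependency

    pipe = pipeline("text-generation", model=MODEL_ID)
    result = pipe(build_messages(question, persona), max_new_tokens=max_new_tokens)
    # For chat input, generated_text holds the full conversation; the reply is last
    return result[0]["generated_text"][-1]["content"]
```

For example, `generate("Is modus ponens a valid inference rule?", persona="Professor of Logic")` would return the model's answer under that persona.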
Citation & Credits
Author: AlexH
Organization: LLMResearch.net
```bibtex
@misc{alexh2026llama31sto,
  author = {AlexH},
  title = {Llama-3.1-8B-Instruct-STO-Master: Pushing the limits of 8B architectures},
  year = {2026},
  publisher = {LLMResearch},
  organization = {LLMResearch.net},
  howpublished = {\url{https://huggingface.co/AiAsistent/Llama-3.1-8B-Instruct-STO-Master}}
}
```
Model tree for AiAsistent/Llama-3.1-8B-Instruct-STO-Master
Base model
meta-llama/Llama-3.1-8B