Elpis-70B-VR

Introduction

With the rapid development of artificial intelligence technology, large-scale models have become a core engine driving industry innovation.

We are proud to officially release the Elpis-70B-VR large-scale model.

Core Technical Highlights

1. High-Efficiency, Low-Cost Data Synthesis Technology

Role-Driven Diversity: Covers 250,000+ industry roles and generates differentiated instruction data through dynamic prompts, ensuring broad applicability of training data.
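
As an illustration, role-driven synthesis can be sketched as pairing sampled roles with prompt templates. The role pool, templates, and function name below are illustrative stand-ins, not the production pipeline:

```python
import random

# Tiny stand-ins for the 250,000+ industry roles and the dynamic templates.
ROLE_POOL = ["clinical pharmacist", "embedded firmware engineer", "tax auditor"]
TASK_TEMPLATES = [
    "As a {role}, write an instruction a colleague might ask you to carry out.",
    "As a {role}, pose a domain-specific question and answer it step by step.",
]

def synthesize_prompt(rng=random):
    """Pair a random role with a random template to produce one
    differentiated instruction prompt."""
    role = rng.choice(ROLE_POOL)
    template = rng.choice(TASK_TEMPLATES)
    return template.format(role=role)
```

Sampling many (role, template) pairs is what gives the instruction data its diversity.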

Constraint Enhancement & Quality Assurance: Combines manually curated verifiable examples with pre-trained model extensions. Utilizes n-gram overlap detection and vector similarity matching for deduplication. Multi-round expert reviews ensure data precision.
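
The n-gram deduplication step can be sketched as follows; the character-trigram granularity and the 0.8 Jaccard threshold are illustrative assumptions:

```python
def ngram_set(text, n=3):
    """Character-level n-grams of a whitespace-normalized, lowercased string."""
    text = " ".join(text.lower().split())
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def ngram_overlap(a, b, n=3):
    """Jaccard overlap between the n-gram sets of two strings."""
    sa, sb = ngram_set(a, n), ngram_set(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def dedupe(samples, threshold=0.8):
    """Keep a sample only if it is not a near-duplicate of one already kept."""
    kept = []
    for s in samples:
        if all(ngram_overlap(s, k) < threshold for k in kept):
            kept.append(s)
    return kept
```

In practice this overlap filter would run alongside embedding-based similarity matching, which catches paraphrases that share few surface n-grams.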

Cost Optimization: Improves data generation efficiency by 40%, enabling rapid adaptation to new scenarios.

2. Domain Enhancement via Verifiable Reinforcement Learning

Mathematical & Coding Proficiency: Generates tiered questions based on role requirements. Code answers are validated via unit tests and execution results; mathematical solutions are verified by dedicated solvers for logical and numerical accuracy.
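
A minimal sketch of unit-test validation for generated code answers; the `solve` entry-point convention and function names are assumptions for illustration:

```python
def validate_code_answer(code, test_cases, func_name="solve"):
    """Execute a generated code answer in an isolated namespace and run it
    against (args, expected) unit tests. Returns True only if the code runs
    and every test case passes."""
    namespace = {}
    try:
        exec(code, namespace)          # run the model's generated code
        fn = namespace[func_name]      # look up the expected entry point
        return all(fn(*args) == expected for args, expected in test_cases)
    except Exception:
        return False                   # syntax error, missing function, etc.
```

Answers that fail execution or any test case are rejected, so only verified solutions enter the training data.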

Reward Model Optimization: Fine-tunes reward models using standardized key-value pair data (e.g., exam answers) to mitigate labeling noise.

Policy Iteration: Optimizes policy models via RLVR (reinforcement learning with verifiable rewards) algorithms, deriving rewards from verified answers.
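
The verified-answer reward can be sketched as a binary signal from an external checker; the numeric verifier below is an illustrative stand-in for the dedicated solvers described above:

```python
def verifiable_reward(prediction, reference, verifier):
    """Binary RLVR-style reward: the policy receives 1.0 only when the
    external verifier confirms the answer, and 0.0 otherwise."""
    return 1.0 if verifier(prediction, reference) else 0.0

def math_verifier(pred, ref, tol=1e-6):
    """Toy numeric check: parse both answers as floats and compare
    within a small tolerance."""
    try:
        return abs(float(pred) - float(ref)) <= tol
    except ValueError:
        return False
```

Because the reward comes from a checker rather than a learned model, it is free of labeling noise on the tasks it covers.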

Performance

(Benchmark comparison chart.)

Special Notes

  • The DeepSeek-R1 model cannot be launched with vLLM batch processing; thus, MMLU scores are sourced directly from the official site.
  • For DeepSeek-R1, due to longer reasoning steps, the generation length for HumanEval and HumanEval+ is set to 6144 tokens, while for others it is 4096 tokens.
  • For DeepSeek-R1 on DROP, the prompt requires direct output of the answer. However, the R1 model prepends "Answer:" before the response, causing a drop in the evaluation metric (F1 score).
  • For models such as Qwen2.5, Llama3.3-70B, QwQ-32B, and DeepSeek-R1, MATH output formats are inconsistent, so correctness is judged by large-model verification rather than strict format matching.
  • For inference models QwQ-32B and DeepSeek-R1-Distill-Llama-70B, due to longer reasoning times, the generation length for PopQA, BBH, and GSM8K is set to 4096 tokens.

Using the model

Loading with Hugging Face Transformers

To load the model with Hugging Face Transformers, use the following snippet:


from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Beagledata001/Elpis-70B-VR"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Elpis, fine-tuned by Beagledata. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

vLLM

The model can be served with:

vllm serve Beagledata001/Elpis-70B-VR --max_model_len=4096

Note that we recommend a context length of no more than 4096 tokens: the model was fine-tuned at a length of 4096, and quality beyond that length cannot be guaranteed.
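
Once the server is running, it exposes an OpenAI-compatible API (by default at http://localhost:8000/v1). A minimal stdlib sketch for querying it, assuming the default host and port:

```python
import json
import urllib.request

def chat_request(prompt, base_url="http://localhost:8000/v1",
                 model="Beagledata001/Elpis-70B-VR", max_tokens=256):
    """Build a POST request for the OpenAI-compatible /chat/completions
    endpoint that `vllm serve` exposes."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )

# Sending the request (requires the vLLM server to be running):
# with urllib.request.urlopen(chat_request("Hello!")) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Any OpenAI-compatible client library can be pointed at the same endpoint instead of building requests by hand.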

Safety & Intended Use

This model has undergone preference tuning and verifiable-reward RL to reduce harmful or low-quality outputs.

The model is not immune to generating incorrect or biased responses; do not use it for high-stakes decisions (medical, legal, financial) without human oversight.

Consider adding moderation layers or human-in-the-loop checks in production.

License

Released under the Apache 2.0 License.

Contact

For questions, feedback, or collaboration, please open an issue on the Hugging Face model repository.

Model size: 71B parameters (BF16, Safetensors).
