Elpis-70B-VR

Introduction

With the rapid development of artificial intelligence technology, large-scale models have become a core engine driving industry innovation.

We are proud to officially release the Elpis-70B-VR large-scale model.

Core Technical Highlights

1. High-Efficiency, Low-Cost Data Synthesis Technology

Role-Driven Diversity: Covers 250,000+ industry roles and generates differentiated instruction data through dynamic prompts, ensuring broad applicability of training data.
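
As an illustration, role-driven synthesis can be sketched as pairing sampled roles with prompt templates. The role pool, templates, and function name below are illustrative stand-ins, not the production pipeline:

```python
import random

# Tiny stand-ins for the 250,000+ industry roles and the dynamic templates.
ROLE_POOL = ["clinical pharmacist", "embedded firmware engineer", "tax auditor"]
TASK_TEMPLATES = [
    "As a {role}, write an instruction a colleague might ask you to carry out.",
    "As a {role}, pose a domain-specific question and answer it step by step.",
]

def synthesize_prompt(rng=random):
    """Pair a random role with a random template to produce one
    differentiated instruction prompt."""
    role = rng.choice(ROLE_POOL)
    template = rng.choice(TASK_TEMPLATES)
    return template.format(role=role)
```

Sampling many (role, template) pairs is what gives the instruction data its diversity.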

Constraint Enhancement & Quality Assurance: Combines manually curated verifiable examples with pre-trained model extensions. Utilizes n-gram overlap detection and vector similarity matching for deduplication. Multi-round expert reviews ensure data precision.
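
The n-gram deduplication step can be sketched as follows; the character-trigram granularity and the 0.8 Jaccard threshold are illustrative assumptions:

```python
def ngram_set(text, n=3):
    """Character-level n-grams of a whitespace-normalized, lowercased string."""
    text = " ".join(text.lower().split())
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def ngram_overlap(a, b, n=3):
    """Jaccard overlap between the n-gram sets of two strings."""
    sa, sb = ngram_set(a, n), ngram_set(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def dedupe(samples, threshold=0.8):
    """Keep a sample only if it is not a near-duplicate of one already kept."""
    kept = []
    for s in samples:
        if all(ngram_overlap(s, k) < threshold for k in kept):
            kept.append(s)
    return kept
```

In practice this overlap filter would run alongside embedding-based similarity matching, which catches paraphrases that share few surface n-grams.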

Cost Optimization: Improves data generation efficiency by 40%, enabling rapid adaptation to new scenarios.

2. Domain Enhancement via Verifiable Reinforcement Learning

Mathematical & Coding Proficiency: Generates tiered questions based on role requirements. Code answers are validated via unit tests and execution results; mathematical solutions are verified by dedicated solvers for logical and numerical accuracy.
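
A minimal sketch of unit-test validation for generated code answers; the `solve` entry-point convention and function names are assumptions for illustration:

```python
def validate_code_answer(code, test_cases, func_name="solve"):
    """Execute a generated code answer in an isolated namespace and run it
    against (args, expected) unit tests. Returns True only if the code runs
    and every test case passes."""
    namespace = {}
    try:
        exec(code, namespace)          # run the model's generated code
        fn = namespace[func_name]      # look up the expected entry point
        return all(fn(*args) == expected for args, expected in test_cases)
    except Exception:
        return False                   # syntax error, missing function, etc.
```

Answers that fail execution or any test case are rejected, so only verified solutions enter the training data.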

Reward Model Optimization: Fine-tunes reward models using standardized key-value pair data (e.g., exam answers) to mitigate labeling noise.

Policy Iteration: Optimizes policy models via RLVR (reinforcement learning with verifiable rewards) algorithms, deriving rewards from verified answers.
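
The verified-answer reward can be sketched as a binary signal from an external checker; the numeric verifier below is an illustrative stand-in for the dedicated solvers described above:

```python
def verifiable_reward(prediction, reference, verifier):
    """Binary RLVR-style reward: the policy receives 1.0 only when the
    external verifier confirms the answer, and 0.0 otherwise."""
    return 1.0 if verifier(prediction, reference) else 0.0

def math_verifier(pred, ref, tol=1e-6):
    """Toy numeric check: parse both answers as floats and compare
    within a small tolerance."""
    try:
        return abs(float(pred) - float(ref)) <= tol
    except ValueError:
        return False
```

Because the reward comes from a checker rather than a learned model, it is free of labeling noise on the tasks it covers.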

Performance

(Benchmark comparison chart.)

Special Notes

  • The DeepSeek-R1 model cannot be launched with vLLM batch processing; thus, MMLU scores are sourced directly from the official site.
  • For DeepSeek-R1, due to longer reasoning steps, the generation length for HumanEval and HumanEval+ is set to 6144 tokens, while for others it is 4096 tokens.
  • For DeepSeek-R1 on DROP, the prompt requires direct output of the answer. However, the R1 model prepends "Answer:" before the response, causing a drop in the evaluation metric (F1 score).
  • For models such as Qwen2.5, Llama3.3-70B, QwQ-32B, and DeepSeek-R1, MATH output formats are inconsistent, so correctness is judged by large-model verification rather than strict format matching.
  • For inference models QwQ-32B and DeepSeek-R1-Distill-Llama-70B, due to longer reasoning times, the generation length for PopQA, BBH, and GSM8K is set to 4096 tokens.

Using the model

Loading with Hugging Face Transformers

To load the model with Hugging Face Transformers, use the following snippet:


from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Beagledata001/Elpis-70B-VR"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Elpis, fine-tuned by Beagledata. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

vLLM

The model can be served with:

vllm serve Beagledata001/Elpis-70B-VR --max_model_len=4096

Note that we recommend a context length of no more than 4096 tokens: the model was fine-tuned at a length of 4096, and quality beyond that length cannot be guaranteed.
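
Once the server is running, it exposes an OpenAI-compatible API (by default at http://localhost:8000/v1). A minimal stdlib sketch for querying it, assuming the default host and port:

```python
import json
import urllib.request

def chat_request(prompt, base_url="http://localhost:8000/v1",
                 model="Beagledata001/Elpis-70B-VR", max_tokens=256):
    """Build a POST request for the OpenAI-compatible /chat/completions
    endpoint that `vllm serve` exposes."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )

# Sending the request (requires the vLLM server to be running):
# with urllib.request.urlopen(chat_request("Hello!")) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Any OpenAI-compatible client library can be pointed at the same endpoint instead of building requests by hand.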

Safety & Intended Use

This model has undergone preference tuning and verifiable-reward RL to reduce harmful or low-quality outputs.

The model is not immune to generating incorrect or biased responses; do not use it for high-stakes decisions (medical, legal, financial) without human oversight.

Consider adding moderation layers or human-in-the-loop checks in production.

License

Released under the Apache 2.0 License.

Contact

For questions, feedback, or collaboration, please open an issue on the Hugging Face model repository.

Model size: 71B parameters (BF16, Safetensors).
