
Elpis-VR-32B

Elpis-VR-32B is a domain-enhanced large language model trained based on Qwen3-32B, designed for energy, electric power, factory production operations, and related industrial scenarios. The model aims to strengthen industry knowledge understanding and task execution capabilities while preserving as much of the base model's general abilities as possible.

Model Details

Model Description

Elpis-VR-32B is fully fine-tuned from Qwen3-32B and is mainly intended for professional question answering, knowledge understanding, structured analysis, instruction following, and text generation tasks in energy, electric power, factory production operations, and industrial scenarios.

The training of this model focuses on the following three directions:

  1. High-Quality Domain Data Construction
    Advanced large language models such as DeepSeek-R1 and GPT-4o were used to organize, extract, summarize, rewrite, and structure data related to energy, electric power, factories, and other relevant domains, in order to build high-quality domain training data.

  2. Training Data Quality Evaluation System
    A quality evaluation system was built for training data generated by large language models. Candidate data was assessed across multiple dimensions, including factual consistency, terminology accuracy, clarity of expression, task relevance, format standardization, and output stability, thereby improving the overall quality of the training data.

  3. Domain Enhancement with Capability Preservation
    While strengthening knowledge in the energy and electric power domain, the training also seeks to preserve the base model's performance in general capabilities such as mathematics, knowledge understanding, and comprehensive question answering, reducing the risk of capability degradation caused by domain training.
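The quality-evaluation step (point 2) can be sketched as a multi-dimension scoring filter over candidate samples. Everything below is illustrative: the dimension names follow this card, but the scoring scale, equal weighting, and threshold are hypothetical placeholders, not the actual evaluation system.

```python
# Illustrative sketch of a multi-dimension quality filter for LLM-generated
# training samples. Dimension names follow the model card; the [0, 1] scale,
# equal weights, and 0.8 threshold are assumptions for illustration only.
DIMENSIONS = [
    "factual_consistency",
    "terminology_accuracy",
    "clarity",
    "task_relevance",
    "format_standardization",
    "output_stability",
]

def quality_score(scores: dict) -> float:
    """Average the per-dimension scores (each assumed to be in [0, 1])."""
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)

def keep_sample(scores: dict, threshold: float = 0.8) -> bool:
    """Keep a candidate sample only if its mean quality clears the threshold."""
    return quality_score(scores) >= threshold

# Example: a candidate that is strong everywhere except output stability.
candidate = {d: 0.9 for d in DIMENSIONS}
candidate["output_stability"] = 0.2
print(keep_sample(candidate))  # False: mean ~0.78 is below the 0.8 threshold
```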

Overall, the goal of Elpis-VR-32B is not merely to inject industry knowledge, but to build a large language model for industrial scenarios that balances domain capability, training data quality, and preservation of base-model abilities.

Uses

Use with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "Beagledata/Elpis-VR-32B"
# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
# prepare the model input
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 
# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0
thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")
print("thinking content:", thinking_content)
print("content:", content)

Use with vLLM

For deployment, you can use vllm>=0.8.5 to create an OpenAI-compatible API endpoint:

vllm serve Beagledata/Elpis-VR-32B --enable-reasoning --reasoning-parser deepseek_r1
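Once the server is running, the endpoint can be queried like any OpenAI-compatible API. The sketch below uses only the Python standard library and assumes vLLM's default local port 8000; the `reasoning_content` field is what the reasoning parser separates out from the final answer.

```python
# Minimal client for the OpenAI-compatible endpoint that `vllm serve` exposes
# (default port 8000). Uses only the standard library; the payload follows the
# OpenAI chat-completions format that vLLM implements.
import json
import urllib.request

def build_chat_request(model: str, prompt: str) -> bytes:
    """Encode an OpenAI-style chat-completion payload."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(payload).encode("utf-8")

def ask(prompt: str, base_url: str = "http://localhost:8000/v1") -> dict:
    """POST a prompt to the running vLLM server and return the parsed JSON."""
    req = urllib.request.Request(
        base_url + "/chat/completions",
        data=build_chat_request("Beagledata/Elpis-VR-32B", prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (with the server running):
#   message = ask("Explain peak shaving in a power grid.")["choices"][0]["message"]
# With --reasoning-parser enabled, the chain of thought arrives separately in
# message["reasoning_content"], and the final answer in message["content"].
```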

Evaluation

Evaluation Goal

We focus on evaluating two aspects:

  1. Preservation of general capabilities: maintain as much of the base model's original ability as possible after training
  2. Control of domain-enhancement cost: observe the magnitude of changes on general benchmarks after introducing domain knowledge

Benchmark Results

| Model Version | math_500 | mmlu_redux | ceval | gpqa | live_code_bench/v5 |
| --- | --- | --- | --- | --- | --- |
| Qwen3-32B | 0.9520 | 0.8807 | 0.8848 | 0.6364 | 0.6280 |
| Elpis-VR-32B | 0.9501 | 0.8814 | 0.8819 | 0.6278 | 0.5854 |
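The cost of domain enhancement can be read off as per-benchmark deltas; the snippet below just recomputes them from the scores above (no new numbers are introduced).

```python
# Compare Elpis-VR-32B against its Qwen3-32B base on the reported benchmarks.
# Scores are copied verbatim from the table above.
base = {"math_500": 0.9520, "mmlu_redux": 0.8807, "ceval": 0.8848,
        "gpqa": 0.6364, "live_code_bench/v5": 0.6280}
tuned = {"math_500": 0.9501, "mmlu_redux": 0.8814, "ceval": 0.8819,
         "gpqa": 0.6278, "live_code_bench/v5": 0.5854}

# Print the absolute and relative change per benchmark; the largest drop
# is on live_code_bench/v5, while mmlu_redux improves slightly.
for task in base:
    delta = tuned[task] - base[task]
    print(f"{task:>20}: {delta:+.4f} ({delta / base[task]:+.2%})")
```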

Safety & Intended Use

This model has undergone preference tuning and verifiable-reward RL to reduce harmful or low-quality outputs.

The model is not immune to generating incorrect or biased responses; do not use it for high-stakes decisions (medical, legal, or financial) without human oversight.

Consider adding moderation layers or human-in-the-loop checks in production.
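One lightweight pattern for the production checks suggested above is a wrapper that screens model output before it reaches the user. The flagged-phrase list and escalation policy below are purely illustrative; a real deployment would use a dedicated moderation model or service rather than keyword matching.

```python
# Illustrative moderation wrapper: route model outputs through a simple
# screen and escalate suspicious ones to a human reviewer. The phrase list
# is a hypothetical placeholder, not a recommended safety mechanism.
FLAGGED_PHRASES = ("bypass the interlock", "disable the safety relay")

def screen_output(text: str) -> tuple:
    """Return (text, needs_human_review) for a model response."""
    lowered = text.lower()
    flagged = any(phrase in lowered for phrase in FLAGGED_PHRASES)
    return text, flagged

reply, escalate = screen_output(
    "To reset the breaker, first disable the safety relay."
)
print("escalate to human reviewer:", escalate)  # True
```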

License

Released under the Apache 2.0 License.

Contact

For questions, feedback, or collaboration, please open an issue on the Hugging Face model repository.