---
license: apache-2.0
---

# Helion-2.5-Rnd

**DeepXR/Helion-2.5-Rnd**: Advanced Research & Development Language Model
## Overview
Helion-2.5-Rnd is a cutting-edge research language model designed for exceptional performance across multiple domains including:
- Advanced Reasoning: Complex problem-solving and logical deduction
- Code Generation: Multi-language programming assistance
- Mathematical Computation: Proof generation and symbolic mathematics
- Multilingual Understanding: 50+ languages with cultural context
- Creative Writing: Story generation, poetry, and content creation
- Scientific Analysis: Research paper understanding and synthesis
- Long Context: Context window of up to 131,072 tokens (128K)
## Model Architecture
- Type: Transformer-based causal language model
- Parameters: 70B+
- Architecture: LLaMA-based with YARN positional embeddings
- Context Window: 131,072 tokens (128K)
- Precision: BF16/FP16 with INT8/INT4 quantization support
- Training Data: 2.5 trillion tokens across diverse domains
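YaRN extends rotary position embeddings (RoPE) to contexts far beyond the training length by rescaling the rotary frequencies non-uniformly. The sketch below shows the simpler linear position-interpolation idea it builds on, not YaRN itself; the head dimension, base, and scale factor are illustrative:

```python
def rope_inv_freq(dim: int, base: float = 10000.0) -> list:
    """Inverse frequencies for rotary position embeddings (one per pair of dims)."""
    return [base ** (-2 * i / dim) for i in range(dim // 2)]

def scaled_angles(pos: int, inv_freq: list, scale: float = 1.0) -> list:
    """Rotation angles at a position; scale > 1 compresses positions into the trained range."""
    return [(pos / scale) * f for f in inv_freq]

# Stretching a 4K-trained model to 128K is a 32x extension: with linear
# interpolation, position 131072 at scale 32 maps to the same angles as
# position 4096 unscaled, so the model never sees out-of-range rotations.
inv_freq = rope_inv_freq(dim=128)
angles_native = scaled_angles(4096, inv_freq, scale=1.0)
angles_extended = scaled_angles(131072, inv_freq, scale=32.0)
```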
## Quick Start
### Installation
```bash
# Clone the repository
git clone https://huggingface.co/DeepXR/Helion-2.5-Rnd
cd Helion-2.5-Rnd

# Install dependencies
pip install -r requirements.txt

# Or use Docker
docker build -t helion:2.5-rnd .
```
### Running the Server
#### Using Python

```bash
python -m inference.server \
    --model /path/to/model \
    --tensor-parallel-size 2 \
    --max-model-len 131072 \
    --gpu-memory-utilization 0.95
```
#### Using Docker

```bash
docker run -d \
    --gpus all \
    -p 8000:8000 \
    -v /path/to/model:/models/helion \
    -e MODEL_PATH=/models/helion \
    -e TENSOR_PARALLEL_SIZE=2 \
    helion:2.5-rnd
```
### Using the Client

```python
from inference.client import HelionClient, HelionAssistant

# Basic client
client = HelionClient(base_url="http://localhost:8000")

# Simple completion
response = client.complete(
    "Explain quantum entanglement:",
    temperature=0.7,
    max_tokens=500,
)

# Chat interface
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "What is machine learning?"},
]
response = client.chat(messages=messages)

# High-level assistant
assistant = HelionAssistant()
response = assistant.chat("Write a Python function for quicksort")
```
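While the server warms up or its batch queue fills, individual calls can fail transiently. A small retry helper (hypothetical, not part of the shipped client) can wrap any of the calls above:

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 0.5):
    """Call fn(); on failure, retry with exponential backoff (0.5s, 1s, 2s, ...)."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt))

# Usage with the client shown above:
# response = with_retries(lambda: client.chat(messages=messages))
```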
## API Endpoints
### Chat Completions

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "DeepXR/Helion-2.5-Rnd",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "temperature": 0.7,
    "max_tokens": 1000
  }'
```
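The same request from Python's standard library, assuming the server is running at the address above and follows the OpenAI-compatible response schema (`choices[0].message.content`):

```python
import json
import urllib.request

def build_chat_request(messages, model="DeepXR/Helion-2.5-Rnd",
                       temperature=0.7, max_tokens=1000):
    """Assemble the JSON payload for /v1/chat/completions."""
    return {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

def send_chat(payload, url="http://localhost:8000/v1/chat/completions"):
    """POST the payload and return the first choice's message content."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# payload = build_chat_request([{"role": "user", "content": "Hello, how are you?"}])
# print(send_chat(payload))  # requires a running server (see Running the Server)
```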
### Text Completions

```bash
curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "DeepXR/Helion-2.5-Rnd",
    "prompt": "Once upon a time",
    "temperature": 0.8,
    "max_tokens": 500
  }'
```
### Health Check

```bash
curl http://localhost:8000/health
```
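A startup script can block until the server reports healthy. A minimal poll loop (timeout, interval, and URL are illustrative):

```python
import time
import urllib.request

def wait_until_healthy(probe, timeout_s: float = 120.0, interval_s: float = 2.0) -> bool:
    """Poll probe() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval_s)
    return False

def server_is_up(url: str = "http://localhost:8000/health") -> bool:
    """One probe against the health endpoint above."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False

# wait_until_healthy(server_is_up)  # blocks until /health answers
```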
## Configuration

### Model Parameters

See `model_config.yaml` for the full set of options:
- Temperature: 0.0-2.0 (default: 0.7)
- Top-p: 0.0-1.0 (default: 0.9)
- Top-k: Integer (default: 50)
- Max Tokens: 1-131072 (default: 4096)
- Repetition Penalty: 1.0-2.0 (default: 1.1)
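How top-k and top-p interact: top-k keeps at most k candidates, then top-p (nucleus sampling) keeps the smallest prefix of the sorted distribution whose cumulative probability reaches p. A pure-Python sketch over an explicit probability list (the toy values are illustrative, not model output):

```python
def filter_top_k_top_p(probs, k=50, p=0.9):
    """Return (index, prob) pairs surviving top-k then top-p filtering."""
    # Top-k: keep the k highest-probability candidates.
    ranked = sorted(enumerate(probs), key=lambda x: x[1], reverse=True)[:k]
    # Top-p: keep the smallest prefix reaching cumulative mass p.
    kept, cumulative = [], 0.0
    for idx, prob in ranked:
        kept.append((idx, prob))
        cumulative += prob
        if cumulative >= p:
            break
    return kept

probs = [0.5, 0.25, 0.15, 0.06, 0.04]
filter_top_k_top_p(probs, k=4, p=0.85)  # keeps indices 0, 1, 2 (0.5 + 0.25 + 0.15)
```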
### Hardware Requirements
Minimum:
- 2x NVIDIA A100 80GB GPUs
- 256GB RAM
- 500GB NVMe SSD
Recommended:
- 4x NVIDIA H100 80GB GPUs
- 512GB RAM
- 1TB NVMe SSD
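The GPU counts above follow from simple weight-memory arithmetic; note this ignores the KV cache and activations, which add substantially more at 128K context:

```python
def weight_gb(params_b: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB for params_b billion parameters."""
    return params_b * 1e9 * bytes_per_param / 1024**3

# 70B parameters:
bf16 = weight_gb(70, 2)    # ~130 GB of weights: needs at least 2x 80GB GPUs
int8 = weight_gb(70, 1)    # ~65 GB
int4 = weight_gb(70, 0.5)  # ~33 GB: weights alone fit on a single 80GB GPU
```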
## Capabilities

### Code Generation

```python
messages = [
    {"role": "user", "content": "Write a binary search tree implementation in Rust"}
]
response = client.chat(messages=messages, temperature=0.3)
```
### Mathematical Reasoning

```python
response = client.complete(
    "Prove that the square root of 2 is irrational using contradiction:",
    temperature=0.5,
)
```
### Creative Writing

```python
response = client.complete(
    "Write a haiku about artificial intelligence:",
    temperature=0.9,
)
```
### Multilingual Support
Helion supports 50+ languages including:
- English, Spanish, French, German, Italian
- Chinese (Simplified & Traditional), Japanese, Korean
- Arabic, Hebrew, Hindi, Russian
- And many more...
## Benchmarks
| Benchmark | Score |
|---|---|
| MMLU | 84.7% |
| GSM8K | 89.2% |
| HumanEval | 75.6% |
| MBPP | 72.3% |
| ARC Challenge | 83.4% |
| HellaSwag | 88.9% |
| TruthfulQA | 61.2% |
## Safety and Limitations

### Safety Features
- Content filtering for harmful outputs
- PII (Personally Identifiable Information) detection
- Prompt injection protection
- Toxicity thresholds
### Known Limitations

- This is a research model; outputs should be verified before use
- May exhibit biases present in the training data
- Performance on highly specialized domains may vary
- Performance degrades at very long contexts (beyond ~64K tokens)
- Not suitable for production without further fine-tuning
## Research Use
This model is intended for research and development purposes. It represents an experimental version of the Helion architecture and is continuously being improved.
## Citation

If you use this model in your research, please cite:

```bibtex
@misc{helion-2.5-rnd,
  title={Helion-2.5-Rnd: Advanced Research Language Model},
  author={DeepXR Team},
  year={2025},
  publisher={DeepXR},
  url={https://huggingface.co/DeepXR/Helion-2.5-Rnd}
}
```
## License
This model is released under the Apache License 2.0. See LICENSE for full details.
## Support

- Documentation: See the `docs/` directory
- Issues: Report on GitHub Issues
- Community: Join our Discord/Slack
- Email: support@deepxr.ai
## Acknowledgments
Built upon the excellent work of:
- Meta AI (LLaMA architecture)
- Hugging Face (Transformers library)
- vLLM team (High-performance inference)
- The open-source AI community
**DeepXR** - Advancing AI Research

Version: 2.5.0-rnd | Status: Research | Updated: 2025-01-30