---
license: apache-2.0
---

# Helion-2.5-Rnd

**DeepXR/Helion-2.5-Rnd** - Advanced Research & Development Language Model

## Overview

Helion-2.5-Rnd is a cutting-edge research language model designed for exceptional performance across multiple domains, including:

- **Advanced Reasoning**: Complex problem-solving and logical deduction
- **Code Generation**: Multi-language programming assistance
- **Mathematical Computation**: Proof generation and symbolic mathematics
- **Multilingual Understanding**: 50+ languages with cultural context
- **Creative Writing**: Story generation, poetry, and content creation
- **Scientific Analysis**: Research paper understanding and synthesis
- **Long Context**: A context window of up to 131K tokens

## Model Architecture

- **Type**: Transformer-based causal language model
- **Parameters**: 70B+
- **Architecture**: LLaMA-based with YaRN positional embeddings
- **Context Window**: 131,072 tokens (128K)
- **Precision**: BF16/FP16, with INT8/INT4 quantization support
- **Training Data**: 2.5 trillion tokens across diverse domains

## Quick Start

### Installation

```bash
# Clone the repository
git clone https://huggingface.co/DeepXR/Helion-2.5-Rnd
cd Helion-2.5-Rnd

# Install dependencies
pip install -r requirements.txt

# Or use Docker
docker build -t helion:2.5-rnd .
```

### Running the Server

#### Using Python

```bash
python -m inference.server \
  --model /path/to/model \
  --tensor-parallel-size 2 \
  --max-model-len 131072 \
  --gpu-memory-utilization 0.95
```

#### Using Docker

```bash
docker run -d \
  --gpus all \
  -p 8000:8000 \
  -v /path/to/model:/models/helion \
  -e MODEL_PATH=/models/helion \
  -e TENSOR_PARALLEL_SIZE=2 \
  helion:2.5-rnd
```

### Using the Client

```python
from inference.client import HelionClient, HelionAssistant

# Basic client
client = HelionClient(base_url="http://localhost:8000")

# Simple completion
response = client.complete(
    "Explain quantum entanglement:",
    temperature=0.7,
    max_tokens=500,
)

# Chat interface
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "What is machine learning?"},
]
response = client.chat(messages=messages)

# High-level assistant
assistant = HelionAssistant()
response = assistant.chat("Write a Python function for quicksort")
```

## API Endpoints

### Chat Completions

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "DeepXR/Helion-2.5-Rnd",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "temperature": 0.7,
    "max_tokens": 1000
  }'
```

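The same request body can also be assembled in Python for scripting. This is a minimal stdlib-only sketch: `build_chat_request` is an illustrative helper, not part of this repository, and the endpoint shape is assumed to match the OpenAI-compatible route shown above.

```python
import json

def build_chat_request(messages, model="DeepXR/Helion-2.5-Rnd",
                       temperature=0.7, max_tokens=1000):
    """Compose the JSON body for POST /v1/chat/completions."""
    return {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

payload = build_chat_request([{"role": "user", "content": "Hello, how are you?"}])
body = json.dumps(payload)

# To actually send it (requires a running server):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=body.encode(), headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read())
```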
### Text Completions

```bash
curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "DeepXR/Helion-2.5-Rnd",
    "prompt": "Once upon a time",
    "temperature": 0.8,
    "max_tokens": 500
  }'
```

### Health Check

```bash
curl http://localhost:8000/health
```

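Since loading a 70B model can take a while, scripts may want to poll this endpoint until the server answers before sending requests. A stdlib-only sketch (the `/health` route is the one shown above; `wait_for_health` and its timeout values are illustrative):

```python
import time
import urllib.request
import urllib.error

def wait_for_health(url="http://localhost:8000/health",
                    timeout=60.0, interval=2.0):
    """Poll the health endpoint until it answers 200, or give up."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; retry after a short pause
        time.sleep(interval)
    return False
```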
## Configuration

### Model Parameters

See `model_config.yaml` for full configuration options:

- **Temperature**: 0.0-2.0 (default: 0.7)
- **Top-p**: 0.0-1.0 (default: 0.9)
- **Top-k**: Integer (default: 50)
- **Max Tokens**: 1-131072 (default: 4096)
- **Repetition Penalty**: 1.0-2.0 (default: 1.1)

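These ranges can be enforced client-side before a request is sent, so out-of-range values never reach the server. A hypothetical helper (the bounds and defaults are the ones listed above; `clamp_sampling` is illustrative, not part of the repository):

```python
# Documented ranges and defaults: (low, high, default); high=None means unbounded.
SAMPLING_BOUNDS = {
    "temperature":        (0.0, 2.0,    0.7),
    "top_p":              (0.0, 1.0,    0.9),
    "top_k":              (1,   None,   50),
    "max_tokens":         (1,   131072, 4096),
    "repetition_penalty": (1.0, 2.0,    1.1),
}

def clamp_sampling(**overrides):
    """Fill in defaults and clamp each parameter to its documented range."""
    out = {}
    for name, (lo, hi, default) in SAMPLING_BOUNDS.items():
        value = overrides.get(name, default)
        value = max(lo, value)
        if hi is not None:
            value = min(hi, value)
        out[name] = value
    return out
```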
### Hardware Requirements

**Minimum**:

- 2x NVIDIA A100 80GB GPUs
- 256GB RAM
- 500GB NVMe SSD

**Recommended**:

- 4x NVIDIA H100 80GB GPUs
- 512GB RAM
- 1TB NVMe SSD

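The two-GPU minimum follows from the weight footprint alone: 70B parameters at 2 bytes each (BF16) already exceed a single 80GB card before KV cache and activations are counted. A back-of-the-envelope sketch (70B is the stated lower bound; this ignores all runtime overhead):

```python
def weight_memory_gb(params_billion, bytes_per_param):
    """Raw weight storage in GiB, ignoring KV cache and activations."""
    return params_billion * 1e9 * bytes_per_param / 1024**3

bf16 = weight_memory_gb(70, 2)    # ~130 GiB: needs at least two 80GB GPUs
int4 = weight_memory_gb(70, 0.5)  # ~33 GiB: why INT4 quantization helps
```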
## Capabilities

### Code Generation

```python
messages = [
    {"role": "user", "content": "Write a binary search tree implementation in Rust"}
]
response = client.chat(messages=messages, temperature=0.3)
```

### Mathematical Reasoning

```python
response = client.complete(
    "Prove that the square root of 2 is irrational using contradiction:",
    temperature=0.5,
)
```

### Creative Writing

```python
response = client.complete(
    "Write a haiku about artificial intelligence:",
    temperature=0.9,
)
```

185
+ ### Multilingual Support
186
+
187
+ Helion supports 50+ languages including:
188
+ - English, Spanish, French, German, Italian
189
+ - Chinese (Simplified & Traditional), Japanese, Korean
190
+ - Arabic, Hebrew, Hindi, Russian
191
+ - And many more...
192
+
193
+ ## Benchmarks
194
+
195
+ | Benchmark | Score |
196
+ |-----------|-------|
197
+ | MMLU | 84.7% |
198
+ | GSM8K | 89.2% |
199
+ | HumanEval | 75.6% |
200
+ | MBPP | 72.3% |
201
+ | ARC Challenge | 83.4% |
202
+ | HellaSwag | 88.9% |
203
+ | TruthfulQA | 61.2% |
204
+
205
+ ## Safety and Limitations
206
+
207
+ ### Safety Features
208
+ - Content filtering for harmful outputs
209
+ - PII (Personally Identifiable Information) detection
210
+ - Prompt injection protection
211
+ - Toxicity thresholds
212
+
213
+ ### Known Limitations
214
+ - This is a **research model** - outputs should be verified
215
+ - May exhibit biases present in training data
216
+ - Performance on highly specialized domains may vary
217
+ - Long context (>64K tokens) performance degrades
218
+ - Not suitable for production without further fine-tuning
219
+
220
+ ## Research Use
221
+
222
+ This model is intended for **research and development purposes**. It represents an experimental version of the Helion architecture and is continuously being improved.
223
+
224
+ ### Citation
225
+
226
+ If you use this model in your research, please cite:
227
+
228
+ ```bibtex
229
+ @misc{helion-2.5-rnd,
230
+ title={Helion-2.5-Rnd: Advanced Research Language Model},
231
+ author={DeepXR Team},
232
+ year={2025},
233
+ publisher={DeepXR},
234
+ url={https://huggingface.co/DeepXR/Helion-2.5-Rnd}
235
+ }
236
+ ```
237
+
238
+ ## License
239
+
240
+ This model is released under the **Apache License 2.0**. See [LICENSE](LICENSE) for full details.
241
+
242
+ ## Support
243
+
244
+ - **Documentation**: See `docs/` directory
245
+ - **Issues**: Report on GitHub Issues
246
+ - **Community**: Join our Discord/Slack
247
+ - **Email**: support@deepxr.ai
248
+
249
+ ## Acknowledgments
250
+
251
+ Built upon the excellent work of:
252
+ - Meta AI (LLaMA architecture)
253
+ - Hugging Face (Transformers library)
254
+ - vLLM team (High-performance inference)
255
+ - The open-source AI community
256
+
257
+ ---
258
+
259
+ **DeepXR** - Advancing AI Research
260
+
261
+ Version: 2.5.0-rnd | Status: Research | Updated: 2025-01-30