Instructions for using DeepXR/Helion-V2.5-Rnd with libraries and local apps.

Libraries

How to use DeepXR/Helion-V2.5-Rnd with Transformers:

```python
# Option 1: use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="DeepXR/Helion-V2.5-Rnd", trust_remote_code=True)

# Option 2: load the tokenizer and model directly
# (an alternative to the pipeline; running both loads the weights twice)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("DeepXR/Helion-V2.5-Rnd", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("DeepXR/Helion-V2.5-Rnd", trust_remote_code=True)
```
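The snippets above only load the model. The following is a minimal end-to-end generation sketch. It assumes a machine with enough GPU memory for the fp16 weights and the `accelerate` package, which `device_map="auto"` requires. The helper names `sampling_kwargs`, `load` and `generate` are illustrative and are not part of the model repository.

`sampling_kwargs` maps a temperature of 0 to greedy decoding (`do_sample=False`), because transformers rejects a zero temperature when sampling is enabled. This is one way to follow the model card's advice to use temperature=0 for deterministic tasks.

```python
def sampling_kwargs(temperature: float, max_new_tokens: int = 512) -> dict:
    """Translate an OpenAI-style temperature into transformers generate() kwargs."""
    if temperature < 0:
        raise ValueError("temperature must be >= 0")
    if temperature == 0:
        # Greedy decoding; transformers refuses temperature=0 together with do_sample=True.
        return {"max_new_tokens": max_new_tokens, "do_sample": False}
    return {"max_new_tokens": max_new_tokens, "do_sample": True, "temperature": temperature}


def load(model_id: str = "DeepXR/Helion-V2.5-Rnd"):
    """Load tokenizer and model once; imports are local so sampling_kwargs works without torch."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        trust_remote_code=True,
        torch_dtype=torch.float16,  # the model card lists fp16 as the supported precision
        device_map="auto",          # spread layers over the visible GPUs (needs accelerate)
    )
    return tokenizer, model


def generate(tokenizer, model, prompt: str, temperature: float = 0.0, max_new_tokens: int = 512) -> str:
    """Generate a continuation and return only the newly produced text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, **sampling_kwargs(temperature, max_new_tokens))
    # generate() returns prompt + continuation; slice off the prompt tokens.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    tok, mdl = load()
    print(generate(tok, mdl, "Once upon a time,", temperature=0.5))
```

Loading is kept separate from generation so that the weights are loaded once and reused across prompts.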

Local Apps

vLLM

How to use DeepXR/Helion-V2.5-Rnd with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "DeepXR/Helion-V2.5-Rnd"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DeepXR/Helion-V2.5-Rnd",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
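The same request can be sent from Python without extra dependencies. The following is a minimal sketch that uses only the standard library (`urllib.request`). It assumes the vLLM server above is listening on its default port 8000, and the function names are illustrative.

Building the request is kept separate from sending it, so the payload can be checked without a running server. The JSON body mirrors the curl call field for field.

```python
import json
import urllib.request


def build_completion_request(base_url: str, model: str, prompt: str,
                             max_tokens: int = 512, temperature: float = 0.5) -> urllib.request.Request:
    """Build a POST to an OpenAI-compatible /v1/completions endpoint (same body as the curl call)."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url: str, model: str, prompt: str, timeout: float = 300.0, **params) -> str:
    """Send the request and return the generated text of the first choice."""
    request = build_completion_request(base_url, model, prompt, **params)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


if __name__ == "__main__":
    print(complete("http://localhost:8000", "DeepXR/Helion-V2.5-Rnd", "Once upon a time,"))
```

SGLang exposes the same OpenAI-compatible route, so pointing `base_url` at `http://localhost:30000` should also work for the SGLang server described next.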

Use Docker

```shell
docker model run hf.co/DeepXR/Helion-V2.5-Rnd
```

SGLang

How to use DeepXR/Helion-V2.5-Rnd with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "DeepXR/Helion-V2.5-Rnd" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DeepXR/Helion-V2.5-Rnd",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
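Both servers answer `/v1/completions` with an OpenAI-style JSON body. The generated text sits under `choices[0].text`, next to a `finish_reason` and, typically, a `usage` block with token counts. The following is a small parsing sketch that assumes this response shape. The `Completion` dataclass is a hypothetical convenience type, not part of either server.

A `finish_reason` of `"length"` means generation stopped at `max_tokens` (512 in the request above) rather than at a natural end.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Completion:
    """Hypothetical container for the fields most callers need from a completion."""
    text: str
    finish_reason: Optional[str]
    prompt_tokens: Optional[int] = None
    completion_tokens: Optional[int] = None

    @property
    def truncated(self) -> bool:
        # "length" means the server stopped because max_tokens was reached.
        return self.finish_reason == "length"


def parse_completion(body: dict) -> Completion:
    """Extract the first choice and token usage from an OpenAI-style completions response."""
    choices = body.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    first = choices[0]
    usage = body.get("usage") or {}  # usage may be absent; keep the counts as None then
    return Completion(
        text=first["text"],
        finish_reason=first.get("finish_reason"),
        prompt_tokens=usage.get("prompt_tokens"),
        completion_tokens=usage.get("completion_tokens"),
    )
```

When `truncated` is true, either raise `max_tokens` or send the returned text back as part of a new prompt to continue.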

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "DeepXR/Helion-V2.5-Rnd" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DeepXR/Helion-V2.5-Rnd",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
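On first start, the container downloads the weights into the mounted Hugging Face cache and then loads them onto the GPUs. A curl issued right after `docker run` is therefore likely to be refused. The following is a polling sketch that assumes the server exposes the OpenAI-compatible `GET /v1/models` listing, which both vLLM and SGLang are expected to provide. `wait_for_models` and its defaults (180 attempts, 10 s apart, about 30 minutes in total) are illustrative choices.

The fetch and sleep functions are parameters, so the loop can be exercised without a live server.

```python
import http.client
import json
import time
import urllib.request
from typing import Callable, List, Optional


def _fetch(url: str, timeout: float) -> bytes:
    """Default transport: a plain GET with the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def wait_for_models(base_url: str, max_attempts: int = 180, interval_s: float = 10.0,
                    fetch: Callable[[str, float], bytes] = _fetch,
                    sleep: Callable[[float], None] = time.sleep) -> Optional[List[str]]:
    """Poll GET {base_url}/v1/models until it returns a model list.

    Returns the served model ids, or None if every attempt failed.
    """
    url = base_url.rstrip("/") + "/v1/models"
    for attempt in range(max_attempts):
        try:
            body = json.loads(fetch(url, 5.0))
            return [entry["id"] for entry in body.get("data", [])]
        except (OSError, ValueError, http.client.HTTPException):
            # OSError covers URLError/HTTPError, refused or reset connections and timeouts;
            # ValueError covers a reply that is not valid JSON yet.
            if attempt + 1 < max_attempts:
                sleep(interval_s)
    return None


if __name__ == "__main__":
    models = wait_for_models("http://localhost:30000")
    if models is None:
        print("server did not become ready")
    else:
        print("serving:", ", ".join(models))
```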

Docker Model Runner
How to use DeepXR/Helion-V2.5-Rnd with Docker Model Runner:
```
docker model run hf.co/DeepXR/Helion-V2.5-Rnd
```

Trouter-Library committed on Dec 6, 2025

Commit 4f1d5cf (verified) · 1 parent: 940292e

Create MODEL_CARD.json

Files changed (1): MODEL_CARD.json (added, +231 -0)

	@@ -0,0 +1,231 @@

+{
+  "model_details": {
+    "name": "Helion-2.5-Rnd",
+    "version": "2.5.0-rnd",
+    "full_name": "DeepXR/Helion-2.5-Rnd",
+    "description": "Advanced research language model for reasoning, code generation, and multilingual understanding",
+    "organization": "DeepXR",
+    "license": "Apache-2.0",
+    "status": "research",
+    "release_date": "2025-01-30",
+    "model_type": "causal language model",
+    "architecture": "LLaMA",
+    "parameters": "70B+",
+    "base_model": "meta-llama/Meta-Llama-3.1-70B"
+  },
+  "intended_use": {
+    "primary_uses": [
+      "Research in natural language processing",
+      "Advanced reasoning and problem-solving",
+      "Code generation and programming assistance",
+      "Mathematical computation and proof generation",
+      "Multilingual text understanding and generation",
+      "Scientific analysis and research assistance",
+      "Educational applications"
+    ],
+    "primary_users": [
+      "AI researchers",
+      "Software developers",
+      "Data scientists",
+      "Academic researchers",
+      "Students and educators"
+    ],
+    "out_of_scope": [
+      "Production systems without extensive validation",
+      "Critical decision-making without human oversight",
+      "Medical diagnosis or treatment recommendations",
+      "Legal advice or financial guidance",
+      "Real-time safety-critical applications"
+    ]
+  },
+  "factors": {
+    "relevant_factors": [
+      "Input language and complexity",
+      "Task domain and specialization",
+      "Context length requirements",
+      "Computational resources available",
+      "User expertise and validation capability"
+    ],
+    "evaluation_factors": [
+      "Accuracy on benchmark datasets",
+      "Reasoning capability",
+      "Code correctness",
+      "Mathematical precision",
+      "Multilingual performance",
+      "Context utilization",
+      "Generation quality"
+    ]
+  },
+  "metrics": {
+    "reasoning": {
+      "MMLU": 0.847,
+      "ARC-Challenge": 0.834,
+      "HellaSwag": 0.889,
+      "WinoGrande": 0.823
+    },
+    "mathematics": {
+      "GSM8K": 0.892,
+      "MATH": 0.567,
+      "Minerva": 0.534
+    },
+    "code": {
+      "HumanEval": 0.756,
+      "MBPP": 0.723,
+      "DS-1000": 0.645
+    },
+    "knowledge": {
+      "TruthfulQA": 0.612
+    },
+    "perplexity": 2.34
+  },
+  "training_data": {
+    "note": "Training data information is proprietary to DeepXR Research",
+    "preprocessing": [
+      "Quality filtering",
+      "Deduplication",
+      "PII removal",
+      "Format standardization",
+      "Language identification",
+      "Toxicity filtering"
+    ]
+  },
+  "ethical_considerations": {
+    "risks": [
+      "Potential for generating biased content",
+      "May produce factually incorrect information",
+      "Could be misused for harmful content generation",
+      "Privacy concerns with training data",
+      "Environmental impact of training and inference"
+    ],
+    "mitigations": [
+      "Content filtering mechanisms",
+      "Regular bias auditing",
+      "Clear documentation of limitations",
+      "User education on responsible use",
+      "Toxicity detection and prevention",
+      "PII detection in outputs"
+    ],
+    "recommendations": [
+      "Implement additional safety layers for production use",
+      "Regular monitoring and evaluation of outputs",
+      "Human oversight for critical applications",
+      "Transparency about model capabilities and limitations",
+      "Respect for user privacy and data protection"
+    ]
+  },
+  "caveats_and_recommendations": {
+    "limitations": [
+      "Research model - requires validation before production use",
+      "May exhibit biases present in training data",
+      "Can generate plausible but incorrect information",
+      "Performance varies across specialized domains",
+      "Long context performance degrades beyond 64K tokens",
+      "Computational requirements are substantial",
+      "Not optimized for real-time applications"
+    ],
+    "recommendations": [
+      "Always verify outputs for critical applications",
+      "Implement appropriate content filtering",
+      "Monitor for bias in specific use cases",
+      "Test thoroughly before deployment",
+      "Use temperature=0 for deterministic tasks",
+      "Implement retry logic for API failures",
+      "Consider quantization for resource constraints"
+    ]
+  },
+  "technical_specifications": {
+    "context_window": 131072,
+    "vocabulary_size": 128256,
+    "hidden_size": 4096,
+    "num_layers": 32,
+    "num_attention_heads": 32,
+    "num_key_value_heads": 8,
+    "intermediate_size": 14336,
+    "rope_theta": 500000.0,
+    "rope_scaling": {
+      "type": "yarn",
+      "factor": 8.0,
+      "original_max_position_embeddings": 16384
+    },
+    "weight_format": "safetensors",
+    "supported_precisions": [
+      "fp16"
+    ],
+    "quantization": "none",
+    "safetensors_shards": 82,
+    "shard_naming": "shard_01 to shard_82",
+    "shard_size_gb": 1.57,
+    "supported_frameworks": [
+      "transformers",
+      "vllm",
+      "text-generation-inference"
+    ]
+  },
+  "hardware_requirements": {
+    "minimum": {
+      "gpu": "2x NVIDIA A100 80GB",
+      "vram": "160GB",
+      "ram": "256GB",
+      "storage": "500GB NVMe"
+    },
+    "recommended": {
+      "gpu": "4x NVIDIA H100 80GB",
+      "vram": "320GB",
+      "ram": "512GB",
+      "storage": "1TB+ NVMe"
+    },
+    "inference_speed": {
+      "tokens_per_second": "30-50 (depending on hardware)",
+      "latency": "100-300ms first token",
+      "throughput": "High with batch processing"
+    }
+  },
+  "model_sources": {
+    "repository": "https://huggingface.co/DeepXR/Helion-2.5-Rnd",
+    "paper": null,
+    "demo": null,
+    "organization": "https://deepxr.ai"
+  },
+  "citation": {
+    "bibtex": "@misc{helion-2.5-rnd-2025,\n  title={Helion-2.5-Rnd: Advanced Research Language Model},\n  author={DeepXR Research Team},\n  year={2025},\n  publisher={DeepXR},\n  url={https://huggingface.co/DeepXR/Helion-2.5-Rnd}\n}",
+    "apa": "DeepXR Research Team. (2025). Helion-2.5-Rnd: Advanced Research Language Model. DeepXR. https://huggingface.co/DeepXR/Helion-2.5-Rnd"
+  },
+  "contact": {
+    "email": "research@deepxr.ai",
+    "website": "https://deepxr.ai",
+    "github": "https://github.com/DeepXR",
+    "support": "support@deepxr.ai"
+  },
+  "additional_information": {
+    "languages_supported": [
+      "English", "Spanish", "French", "German", "Italian", "Portuguese",
+      "Chinese (Simplified)", "Chinese (Traditional)", "Japanese", "Korean",
+      "Russian", "Arabic", "Hindi", "Bengali", "Turkish", "Vietnamese",
+      "Polish", "Ukrainian", "Romanian", "Dutch", "Greek", "Czech",
+      "Swedish", "Hungarian", "Finnish", "Norwegian", "Danish", "Hebrew",
+      "Thai", "Indonesian", "Malay", "Filipino", "Persian", "Urdu",
+      "Tamil", "Telugu", "Kannada", "Malayalam", "Gujarati", "Marathi",
+      "Punjabi", "Swahili", "Amharic", "Yoruba", "Igbo", "Hausa"
+    ],
+    "programming_languages": [
+      "Python", "JavaScript", "TypeScript", "Java", "C++", "C#", "Go",
+      "Rust", "Swift", "Kotlin", "Ruby", "PHP", "Scala", "R", "MATLAB",
+      "SQL", "Shell", "PowerShell", "HTML", "CSS", "LaTeX"
+    ],
+    "deployment_options": [
+      "Docker containers",
+      "Kubernetes clusters",
+      "Cloud platforms (AWS, GCP, Azure)",
+      "On-premise servers",
+      "API endpoints",
+      "Batch processing pipelines"
+    ],
+    "monitoring_tools": [
+      "Prometheus metrics",
+      "Grafana dashboards",
+      "Custom logging",
+      "Performance profiling",
+      "Token usage tracking"
+    ]
+  }
+}
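The `technical_specifications` block is enough to derive several size figures with simple arithmetic. The following sketch assumes a standard Llama-style decoder: bias-free q/k/v/o projections with grouped-query attention, a three-matrix gated MLP, RMSNorm, and untied input and output embeddings. The function names are illustrative.

If the listed dimensions are accurate:

- **Parameters:** they come to 8,030,261,248, which is the Llama-3.1-8B layout. This does not match the "70B+" and `meta-llama/Meta-Llama-3.1-70B` entries under `model_details`.
- **Weight size:** roughly 16 GB of fp16 weights. This is also far from the 82 × 1.57 GB ≈ 129 GB implied by the shard fields.
- **KV cache:** 128 KiB of fp16 KV cache per token, or 16 GiB for one sequence that fills the 131,072-token window.
- **Context window:** `original_max_position_embeddings × factor` (16,384 × 8) reproduces that window.

```python
# Dimensions copied from "technical_specifications" in MODEL_CARD.json.
SPEC = {
    "vocabulary_size": 128256,
    "hidden_size": 4096,
    "num_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "intermediate_size": 14336,
    "context_window": 131072,
    "rope_scaling": {"factor": 8.0, "original_max_position_embeddings": 16384},
}


def head_dim(spec: dict) -> int:
    return spec["hidden_size"] // spec["num_attention_heads"]


def llama_param_count(spec: dict, tied_embeddings: bool = False) -> int:
    """Count weights of a Llama-style decoder (assumes no biases, RMSNorm, gated MLP)."""
    h = spec["hidden_size"]
    kv_dim = spec["num_key_value_heads"] * head_dim(spec)
    attention = 2 * h * h + 2 * h * kv_dim    # q/o projections are h x h; k/v are h x kv_dim (GQA)
    mlp = 3 * h * spec["intermediate_size"]   # gate, up and down projections
    norms = 2 * h                             # pre-attention and pre-MLP RMSNorm weights
    embeddings = spec["vocabulary_size"] * h
    lm_head = 0 if tied_embeddings else embeddings
    final_norm = h
    return spec["num_layers"] * (attention + mlp + norms) + embeddings + lm_head + final_norm


def kv_cache_bytes_per_token(spec: dict, bytes_per_value: int = 2) -> int:
    """Keys and values for every layer and KV head; 2 bytes per value in fp16."""
    return 2 * spec["num_layers"] * spec["num_key_value_heads"] * head_dim(spec) * bytes_per_value


def rope_scaled_window(spec: dict) -> int:
    """Context length implied by the RoPE scaling parameters."""
    scaling = spec["rope_scaling"]
    return int(scaling["original_max_position_embeddings"] * scaling["factor"])


if __name__ == "__main__":
    params = llama_param_count(SPEC)
    kv = kv_cache_bytes_per_token(SPEC)
    print(f"parameters:            {params:,}")
    print(f"fp16 weights:          {params * 2 / 1e9:.1f} GB")
    print(f"KV cache per token:    {kv / 1024:.0f} KiB")
    print(f"KV cache, full window: {kv * SPEC['context_window'] / 1024**3:.0f} GiB")
    print(f"RoPE-scaled window:    {rope_scaled_window(SPEC):,} tokens")
```

Batch size multiplies the KV cache figure, so serving several long-context requests at once needs memory well beyond the weights alone.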