---
license: apache-2.0
base_model: meta-llama/Llama-2-7b-hf
tags:
- text-generation
- conversational
- llama-2
- autotrain_compatible
- function-calling
language:
- en
pipeline_tag: text-generation
library_name: transformers
model-index:
- name: Helion-V1.5
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MT-Bench
      type: mt-bench
    metrics:
    - type: score
      value: 7.2
      name: MT-Bench Score
  - task:
      type: text-generation
      name: Conversational
    dataset:
      name: AlpacaEval
      type: alpaca-eval
    metrics:
    - type: win_rate
      value: 78.5
      name: Win Rate %
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: HumanEval
      type: humaneval
    metrics:
    - type: pass@1
      value: 42.3
      name: Pass@1
widget:
- text: Explain the difference between machine learning and deep learning
  example_title: Technical Explanation
- text: Write a Python function to calculate fibonacci numbers
  example_title: Code Generation
---
# Helion-V1.5

Helion-V1.5 is a 7B-parameter conversational AI model fine-tuned from Llama-2 using QLoRA. It delivers improved performance over Helion-V1, with enhanced instruction following, code generation, and multi-turn dialogue capabilities.
## Model Details

- **Architecture:** Llama-2-7B with LoRA adapters
- **Parameters:** 7 billion (base) + 67M (LoRA)
- **Context Length:** 4096 tokens
- **Training:** QLoRA (4-bit) fine-tuning on high-quality instruction data
- **License:** Apache 2.0
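The QLoRA setup above corresponds to loading the base model in 4-bit before attaching adapters. A minimal sketch with the `transformers` quantization config is below; the NF4 and double-quantization settings are common QLoRA defaults, not confirmed details of this training run:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Typical QLoRA quantization settings (NF4 + double quantization).
# The exact values used for Helion-V1.5 are not published.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the frozen 4-bit base model; LoRA adapters are trained on top of this.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
```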
## Key Improvements over Helion-V1
| Feature | Helion-V1 | Helion-V1.5 | Improvement |
|---|---|---|---|
| MT-Bench Score | 6.8 | 7.2 | +5.9% |
| AlpacaEval Win Rate | 72.3% | 78.5% | +8.6% |
| HumanEval Pass@1 | 38.1% | 42.3% | +11.0% |
| Avg Response Time | 2.3s | 1.8s | -21.7% |
| Function Calling | ❌ | ✅ | New |
| Streaming Support | Basic | Full | Enhanced |
## Technical Specifications
| Component | Value |
|---|---|
| Hidden Size | 4096 |
| Layers | 32 |
| Attention Heads | 32 |
| Intermediate Size | 11008 |
| Vocabulary | 32000 tokens |
| Position Encoding | RoPE |
| Precision | bfloat16 |
**LoRA Configuration:**
- Rank: 64
- Alpha: 128
- Target Modules: All linear layers (q,k,v,o,gate,up,down)
- Dropout: 0.05
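The configuration above maps directly onto a PEFT `LoraConfig`; a sketch is below, assuming the standard Llama-2 projection-layer names used by Hugging Face `transformers`:

```python
from peft import LoraConfig

# Mirrors the LoRA hyperparameters listed above. The target module names
# follow Llama-2's naming in transformers (q/k/v/o attention projections
# plus the gate/up/down MLP projections).
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```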
## Performance Benchmarks
| Benchmark | Score | Category |
|---|---|---|
| MT-Bench | 7.2/10 | Multi-turn conversation |
| AlpacaEval | 78.5% | Instruction following |
| HumanEval | 42.3% | Code generation |
| GSM8K | 35.7% | Mathematical reasoning |
| TruthfulQA | 51.2% | Factual accuracy |
| MMLU | 48.9% | Knowledge |
## How to Use

### Quick Start
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "DeepXR/Helion-V1.5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Prepare messages
messages = [
    {"role": "user", "content": "Explain machine learning in simple terms"}
]

# Apply chat template
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Generate response
output = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

# Decode only the newly generated tokens
response = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```
### Using with Text Generation Inference (TGI)

```bash
docker run --gpus all --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id DeepXR/Helion-V1.5 \
  --max-input-length 3584 \
  --max-total-tokens 4096
```
### Using with vLLM

```python
from vllm import LLM, SamplingParams

llm = LLM(model="DeepXR/Helion-V1.5")
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)

prompts = ["Explain quantum computing"]
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```
### Using with LangChain

```python
from langchain.llms import HuggingFacePipeline
from transformers import pipeline

# Wrap a transformers pipeline for use as a LangChain LLM
pipe = pipeline(
    "text-generation",
    model="DeepXR/Helion-V1.5",
    max_new_tokens=512
)

llm = HuggingFacePipeline(pipeline=pipe)
response = llm("What is artificial intelligence?")
```
## Training Data

### Dataset Composition

The model was trained on a curated dataset including:

- **Conversational Data (40%):** Multi-turn dialogues focusing on helpfulness
- **Instruction Following (30%):** Task completion and instruction adherence
- **Safety Examples (15%):** Refusal training for harmful requests
- **Domain-Specific (15%):** Programming, writing, and analysis tasks

**Total Training Examples:** ~50,000
**Data Quality:** High-quality, manually filtered and safety-checked
### Data Processing

- Deduplication using MinHash
- Safety filtering for harmful content
- Quality scoring and filtering (score > 0.7)
- Format standardization to the chat template
- Context-length trimming (max 4096 tokens)
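The steps above can be sketched as a simple filtering pass. This is an illustrative outline, not the actual training pipeline: the hash-based deduplication here only catches exact (normalized) duplicates, whereas real MinHash also catches near-duplicates, and the whitespace token count is a crude stand-in for the tokenizer:

```python
import hashlib

# Thresholds taken from the card; field names are illustrative.
MAX_TOKENS = 4096          # context-length trim target
QUALITY_THRESHOLD = 0.7    # keep only score > 0.7

def dedup_key(text: str) -> str:
    """Cheap stand-in for MinHash: hash of whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()

def process(examples):
    seen = set()
    kept = []
    for ex in examples:
        key = dedup_key(ex["text"])
        if key in seen:
            continue                      # deduplication
        seen.add(key)
        if ex.get("quality", 0.0) <= QUALITY_THRESHOLD:
            continue                      # quality filtering
        tokens = ex["text"].split()
        if len(tokens) > MAX_TOKENS:      # crude token proxy: words
            ex["text"] = " ".join(tokens[:MAX_TOKENS])
        kept.append(ex)
    return kept

raw = [
    {"text": "Explain recursion.", "quality": 0.9},
    {"text": "explain  RECURSION.", "quality": 0.9},  # near-duplicate, dropped
    {"text": "Low quality sample.", "quality": 0.4},  # below threshold, dropped
]
print(len(process(raw)))  # → 1
```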
## Evaluation

### Benchmark Results
| Benchmark | Score | Description |
|---|---|---|
| MT-Bench | 7.2/10 | Multi-turn conversation quality |
| AlpacaEval | 78.5% | Win rate vs. text-davinci-003 |
| HumanEval | 42.3% | Python code generation (pass@1) |
| GSM8K | 35.7% | Math word problems |
| TruthfulQA | 51.2% | Truthfulness in answers |
| MMLU | 48.9% | Multi-task language understanding |
## Capabilities

### Advanced Features

- **Function Calling:** Supports structured function/tool calling
- **Code Execution:** Can generate and explain code across multiple languages
- **Multi-turn Context:** Maintains conversation context up to 4096 tokens
- **Streaming Support:** Compatible with streaming inference
- **Batch Processing:** Efficient batch generation support
- **Custom System Prompts:** Flexible system message configuration
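The card states that Helion-V1.5 supports structured function calling but does not document the wire format. The sketch below assumes one common pattern: a JSON tool schema is shown in the system prompt, and the model replies with a JSON object naming the tool and its arguments. The tool name, schema shape, and reply format are all illustrative assumptions:

```python
import json

# Hypothetical tool schema (the real format for Helion-V1.5 is undocumented).
TOOL_SCHEMA = {
    "name": "get_weather",
    "description": "Look up current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def build_system_prompt(tools):
    """Embed the tool schema in a system prompt (assumed convention)."""
    return (
        "You may call one of the following tools by replying with JSON "
        'of the form {"tool": <name>, "arguments": {...}}:\n'
        + json.dumps(tools, indent=2)
    )

def parse_tool_call(model_reply: str):
    """Return (tool_name, arguments) if the reply is a tool call, else None."""
    try:
        obj = json.loads(model_reply)
    except json.JSONDecodeError:
        return None
    if isinstance(obj, dict) and "tool" in obj:
        return obj["tool"], obj.get("arguments", {})
    return None

# Parsing a hypothetical model reply:
reply = '{"tool": "get_weather", "arguments": {"city": "Berlin"}}'
print(parse_tool_call(reply))  # → ('get_weather', {'city': 'Berlin'})
```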
## Limitations

### Known Limitations

- **Knowledge Cutoff:** Training data up to April 2023
- **Hallucinations:** May generate plausible but incorrect information
- **Context Limitations:** 4096-token context window
- **Math Reasoning:** Struggles with complex multi-step calculations
- **Multilingual:** Primarily English; limited support for other languages
- **Temporal Reasoning:** May not accurately understand time-sensitive queries
- **Factual Accuracy:** Not suitable as a sole source of truth
## Bias and Fairness

The model may exhibit biases present in the training data. We've implemented:
- Bias evaluation across demographic groups
- Regular fairness audits
- User feedback integration
- Ongoing bias mitigation efforts
## Responsible Use

Users should:
- Verify critical information from authoritative sources
- Implement appropriate safeguards for production use
- Monitor outputs for accuracy and appropriateness
- Comply with applicable laws and regulations
- Provide proper attribution for AI-generated content
## Citation

```bibtex
@misc{helion-v1.5-2025,
  author = {DeepXR},
  title = {Helion-V1.5: Enhanced Conversational AI},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/DeepXR/Helion-V1.5}
}
```
**Model Version:** 1.5.0 | **Release:** December 2025