---
license: apache-2.0
base_model: meta-llama/Llama-2-7b-hf
tags:
- text-generation
- conversational
- llama-2
- autotrain_compatible
- function-calling
language:
- en
pipeline_tag: text-generation
library_name: transformers
model-index:
- name: Helion-V1.5
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MT-Bench
      type: mt-bench
    metrics:
    - type: score
      value: 7.2
      name: MT-Bench Score
  - task:
      type: text-generation
      name: Conversational
    dataset:
      name: AlpacaEval
      type: alpaca-eval
    metrics:
    - type: win_rate
      value: 78.5
      name: Win Rate %
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: HumanEval
      type: humaneval
    metrics:
    - type: pass@1
      value: 42.3
      name: Pass@1
widget:
- text: "Explain the difference between machine learning and deep learning"
  example_title: "Technical Explanation"
- text: "Write a Python function to calculate fibonacci numbers"
  example_title: "Code Generation"
---
<div align="center">
<img src="https://imgur.com/aUIJXf7.png" alt="Helion-V1 Logo" width="100%"/>
</div>
---
# Helion-V1.5
**Helion-V1.5** is a 7B parameter conversational AI model fine-tuned from Llama-2 using QLoRA. It delivers improved performance over Helion-V1 with enhanced instruction following, code generation, and multi-turn dialogue capabilities.
## Model Details
- **Architecture:** Llama-2-7B with LoRA adapters
- **Parameters:** 7 billion (base) + 67M (LoRA)
- **Context Length:** 4096 tokens
- **Training:** QLoRA (4-bit) fine-tuning on high-quality instruction data (see the loading sketch below)
- **License:** Apache 2.0
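
Because fine-tuning used 4-bit QLoRA, the model can also be loaded in 4-bit for inference on smaller GPUs. A minimal sketch, assuming `bitsandbytes` is installed; the NF4 settings mirror common QLoRA defaults and are an assumption, not the published training configuration:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization with bfloat16 compute (assumed settings)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "DeepXR/Helion-V1.5",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("DeepXR/Helion-V1.5")
```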
### Key Improvements over Helion-V1
| Feature | Helion-V1 | Helion-V1.5 | Improvement |
|---------|-----------|-------------|-------------|
| **MT-Bench Score** | 6.8 | 7.2 | +5.9% |
| **AlpacaEval Win Rate** | 72.3% | 78.5% | +8.6% |
| **HumanEval Pass@1** | 38.1% | 42.3% | +11.0% |
| **Avg Response Time** | 2.3s | 1.8s | -21.7% |
| **Function Calling** | ❌ | ✅ | New |
| **Streaming Support** | Basic | Full | Enhanced |
### Technical Specifications
| Component | Value |
|-----------|-------|
| Hidden Size | 4096 |
| Layers | 32 |
| Attention Heads | 32 |
| Intermediate Size | 11008 |
| Vocabulary | 32000 tokens |
| Position Encoding | RoPE |
| Precision | bfloat16 |
**LoRA Configuration** (sketched as a `peft` config below):
- Rank: 64
- Alpha: 128
- Target Modules: all linear projections (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
- Dropout: 0.05
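
For reference, the hyperparameters above correspond roughly to the following `peft.LoraConfig`. This is a reconstruction, not the original training script:

```python
from peft import LoraConfig

# Reconstructed from the hyperparameters listed above; module names
# follow standard Llama-2 layer naming.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```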
## How to Use
### Quick Start
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "DeepXR/Helion-V1.5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Prepare messages
messages = [
    {"role": "user", "content": "Explain machine learning in simple terms"}
]

# Apply chat template
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Generate response
output = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

response = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```
### Using with Text Generation Inference (TGI)
```bash
docker run --gpus all --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id DeepXR/Helion-V1.5 \
  --max-input-length 3584 \
  --max-total-tokens 4096
```
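
Once the container is running, it exposes TGI's standard HTTP API. For example:

```bash
curl 127.0.0.1:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "Explain machine learning in simple terms", "parameters": {"max_new_tokens": 256, "temperature": 0.7}}'
```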
### Using with vLLM
```python
from vllm import LLM, SamplingParams

llm = LLM(model="DeepXR/Helion-V1.5")
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)

prompts = ["Explain quantum computing"]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```
### Using with LangChain
```python
from langchain.llms import HuggingFacePipeline
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="DeepXR/Helion-V1.5",
    max_new_tokens=512
)

llm = HuggingFacePipeline(pipeline=pipe)
response = llm("What is artificial intelligence?")
```
## Training Data
### Dataset Composition
The model was trained on a curated dataset including:
- **Conversational Data** (40%): Multi-turn dialogues focusing on helpfulness
- **Instruction Following** (30%): Task completion and instruction adherence
- **Safety Examples** (15%): Refusal training for harmful requests
- **Domain-Specific** (15%): Programming, writing, analysis tasks
- **Total Training Examples:** ~50,000
- **Data Quality:** Manually filtered and safety-checked
### Data Processing
- Deduplication using MinHash (see the sketch after this list)
- Safety filtering for harmful content
- Quality scoring and filtering (score > 0.7)
- Format standardization to chat template
- Context length trimming (max 4096 tokens)
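
As an illustration of the deduplication step above, a MinHash-LSH pass over the examples might look like the following sketch. It uses the `datasketch` library; the similarity threshold and word-trigram shingling are assumptions, not the actual pipeline:

```python
from datasketch import MinHash, MinHashLSH

def minhash(text: str, num_perm: int = 128) -> MinHash:
    """Build a MinHash signature from word trigrams of the text."""
    m = MinHash(num_perm=num_perm)
    words = text.split()
    for i in range(max(len(words) - 2, 1)):
        shingle = " ".join(words[i:i + 3])
        m.update(shingle.encode("utf8"))
    return m

# Keep an example only if no near-duplicate (Jaccard >= 0.8) was seen before.
lsh = MinHashLSH(threshold=0.8, num_perm=128)
deduped = []
for idx, example in enumerate(["sample text one", "sample text one", "different text"]):
    sig = minhash(example)
    if not lsh.query(sig):
        lsh.insert(f"doc-{idx}", sig)
        deduped.append(example)
```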
## Evaluation
### Benchmark Results
| Benchmark | Score | Description |
|-----------|-------|-------------|
| **MT-Bench** | 7.2/10 | Multi-turn conversation quality |
| **AlpacaEval** | 78.5% | Win rate vs. text-davinci-003 |
| **HumanEval** | 42.3% | Python code generation (pass@1) |
| **GSM8K** | 35.7% | Math word problems |
| **TruthfulQA** | 51.2% | Truthfulness in answers |
| **MMLU** | 48.9% | Multi-task language understanding |
## Capabilities
### Advanced Features
- **Function Calling**: Supports structured function/tool calling
- **Code Generation**: Generates and explains code across multiple languages
- **Multi-turn Context**: Maintains conversation context up to 4096 tokens
- **Streaming Support**: Compatible with streaming inference (see the sketch after this list)
- **Batch Processing**: Efficient batch generation support
- **Custom System Prompts**: Flexible system message configuration
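
Streaming works through the standard `transformers` streamer interface. A minimal sketch, reusing the `model` and `tokenizer` from the Quick Start and assuming the model's chat template accepts a system role:

```python
from transformers import TextStreamer

messages = [
    {"role": "system", "content": "You are a concise, helpful assistant."},
    {"role": "user", "content": "Explain machine learning in simple terms"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Prints tokens to stdout as they are generated
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(input_ids, max_new_tokens=256, streamer=streamer)
```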
## Limitations
### Known Limitations
1. **Knowledge Cutoff:** Training data up to April 2023
2. **Hallucinations:** May generate plausible but incorrect information
3. **Context Limitations:** 4096 token context window
4. **Math Reasoning:** Struggles with complex multi-step calculations
5. **Multilingual:** Primarily English, limited other languages
6. **Temporal Reasoning:** May not accurately understand time-sensitive queries
7. **Factual Accuracy:** Not suitable as sole source of truth
### Bias and Fairness
The model may exhibit biases present in the training data. We've implemented:
- Bias evaluation across demographic groups
- Regular fairness audits
- User feedback integration
- Ongoing bias mitigation efforts
## Responsible Use
Users should:
- Verify critical information from authoritative sources
- Implement appropriate safeguards for production use
- Monitor outputs for accuracy and appropriateness
- Comply with applicable laws and regulations
- Provide proper attribution for AI-generated content
## Citation
```bibtex
@misc{helion-v1.5-2025,
  author = {DeepXR},
  title = {Helion-V1.5: Enhanced Conversational AI},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/DeepXR/Helion-V1.5}
}
```
---
**Model Version:** 1.5.0 | **Release:** December 2025