---
license: apache-2.0
base_model: meta-llama/Llama-2-7b-hf
tags:
- text-generation
- conversational
- llama-2
- autotrain_compatible
- function-calling
language:
- en
pipeline_tag: text-generation
library_name: transformers
model-index:
- name: Helion-V1.5
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MT-Bench
      type: mt-bench
    metrics:
    - type: score
      value: 7.2
      name: MT-Bench Score
  - task:
      type: text-generation
      name: Conversational
    dataset:
      name: AlpacaEval
      type: alpaca-eval
    metrics:
    - type: win_rate
      value: 78.5
      name: Win Rate %
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: HumanEval
      type: humaneval
    metrics:
    - type: pass@1
      value: 42.3
      name: Pass@1
widget:
- text: "Explain the difference between machine learning and deep learning"
  example_title: "Technical Explanation"
- text: "Write a Python function to calculate fibonacci numbers"
  example_title: "Code Generation"
---

<div align="center">

  <img src="https://imgur.com/aUIJXf7.png" alt="Helion-V1.5 Logo" width="100%"/>

</div>

---

# Helion-V1.5

**Helion-V1.5** is a 7B-parameter conversational AI model fine-tuned from Llama-2 using QLoRA. It improves on Helion-V1 in instruction following, code generation, and multi-turn dialogue.

## Model Details

**Architecture:** Llama-2-7B with LoRA adapters  
**Parameters:** 7 billion (base) + 67M (LoRA)  
**Context Length:** 4096 tokens  
**Training:** QLoRA (4-bit) fine-tuning on high-quality instruction data  
**License:** Apache 2.0

### Key Improvements over Helion-V1

| Feature | Helion-V1 | Helion-V1.5 | Change (relative) |
|---------|-----------|-------------|-------------|
| **MT-Bench Score** | 6.8 | 7.2 | +5.9% |
| **AlpacaEval Win Rate** | 72.3% | 78.5% | +8.6% |
| **HumanEval Pass@1** | 38.1% | 42.3% | +11.0% |
| **Avg Response Time** | 2.3s | 1.8s | -21.7% |
| **Function Calling** | ❌ | ✅ | New |
| **Streaming Support** | Basic | Full | Enhanced |

### Technical Specifications

| Component | Value |
|-----------|-------|
| Hidden Size | 4096 |
| Layers | 32 |
| Attention Heads | 32 |
| Intermediate Size | 11008 |
| Vocabulary | 32000 tokens |
| Position Encoding | RoPE |
| Precision | bfloat16 |

**LoRA Configuration:**
- Rank: 64
- Alpha: 128
- Target Modules: All linear layers (q, k, v, o, gate, up, down)
- Dropout: 0.05
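
LoRA adds two low-rank matrices to each adapted weight, so the adapter size follows from the rank and the layer shapes listed above. The sketch below counts adapter parameters assuming the standard Llama-2-7B projection shapes with no bias terms (an assumption, not taken from the training code). Under that assumption, adapting only the four attention projections gives about 67M parameters, while adapting all seven listed modules gives about 160M.

```python
# Dimensions from the Technical Specifications table
HIDDEN = 4096
INTERMEDIATE = 11008
LAYERS = 32
RANK = 64


def lora_params(d_in: int, d_out: int, rank: int = RANK) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) per adapted weight."""
    return rank * (d_in + d_out)


# q, k, v, o: assumed to be HIDDEN x HIDDEN each
attention = 4 * lora_params(HIDDEN, HIDDEN)
# gate, up: HIDDEN -> INTERMEDIATE; down: INTERMEDIATE -> HIDDEN
mlp = 2 * lora_params(HIDDEN, INTERMEDIATE) + lora_params(INTERMEDIATE, HIDDEN)

attention_only_total = LAYERS * attention
all_linear_total = LAYERS * (attention + mlp)

print(f"attention only: {attention_only_total:,}")
print(f"all linear:     {all_linear_total:,}")
```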

## Performance Benchmarks

| Benchmark | Score | Category |
|-----------|-------|----------|
| MT-Bench | 7.2/10 | Multi-turn conversation |
| AlpacaEval | 78.5% | Instruction following |
| HumanEval | 42.3% | Code generation |
| GSM8K | 35.7% | Mathematical reasoning |
| TruthfulQA | 51.2% | Factual accuracy |
| MMLU | 48.9% | Knowledge |  

## How to Use

### Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "DeepXR/Helion-V1.5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Prepare messages
messages = [
    {"role": "user", "content": "Explain machine learning in simple terms"}
]

# Apply chat template
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Generate response
output = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

response = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```

### Using with Text Generation Inference (TGI)

```bash
docker run --gpus all --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id DeepXR/Helion-V1.5 \
  --max-input-length 3584 \
  --max-total-tokens 4096
```
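
Once the container is running, the server can be queried over HTTP. The sketch below uses TGI's `/generate` endpoint and only the standard library; the helper names `build_payload` and `generate` are illustrative, and the URL assumes the port mapping from the command above. Note that `/generate` takes raw text and does not apply the chat template.

```python
import json
import urllib.request

MAX_TOTAL_TOKENS = 4096  # --max-total-tokens from the command above
MAX_INPUT_LENGTH = 3584  # --max-input-length from the command above


def build_payload(prompt: str, max_new_tokens: int = 512) -> dict:
    """Build a /generate request body.

    Clamps max_new_tokens so the request stays within the total token
    budget even when the prompt is at the maximum input length.
    """
    budget = MAX_TOTAL_TOKENS - MAX_INPUT_LENGTH
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": min(max_new_tokens, budget),
            "temperature": 0.7,
            "top_p": 0.9,
            "do_sample": True,
        },
    }


def generate(prompt: str, url: str = "http://localhost:8080/generate") -> str:
    """POST the prompt to a running TGI server and return the generated text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["generated_text"]


# Example (requires the server to be running):
# print(generate("Explain machine learning in simple terms"))
```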

### Using with vLLM

```python
from vllm import LLM, SamplingParams

llm = LLM(model="DeepXR/Helion-V1.5")
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)

prompts = ["Explain quantum computing"]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```

### Using with LangChain

```python
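# Requires: pip install langchain-huggingface
from langchain_huggingface import HuggingFacePipeline
from transformers import pipeline

# Wrap a standard transformers pipeline so LangChain can drive it
pipe = pipeline(
    "text-generation",
    model="DeepXR/Helion-V1.5",
    max_new_tokens=512
)

llm = HuggingFacePipeline(pipeline=pipe)
# invoke() replaces the deprecated llm("...") call style
response = llm.invoke("What is artificial intelligence?")
print(response)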
```

## Training Data

### Dataset Composition

The model was trained on a curated dataset including:

- **Conversational Data** (40%): Multi-turn dialogues focusing on helpfulness
- **Instruction Following** (30%): Task completion and instruction adherence
- **Safety Examples** (15%): Refusal training for harmful requests
- **Domain-Specific** (15%): Programming, writing, analysis tasks

**Total Training Examples:** ~50,000  
**Data Quality:** Manually filtered and safety-checked
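
As a rough guide, the percentages above translate into the following approximate example counts. This is plain arithmetic on the stated figures; the actual per-category counts are not given in this card.

```python
TOTAL_EXAMPLES = 50_000  # approximate total stated above

# Share of each category in percent, from the list above
MIX_PERCENT = {
    "Conversational Data": 40,
    "Instruction Following": 30,
    "Safety Examples": 15,
    "Domain-Specific": 15,
}

# The shares should cover the whole dataset
assert sum(MIX_PERCENT.values()) == 100

counts = {name: TOTAL_EXAMPLES * pct // 100 for name, pct in MIX_PERCENT.items()}

for name, count in counts.items():
    print(f"{name}: ~{count:,}")
```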

### Data Processing

- Deduplication using MinHash
- Safety filtering for harmful content
- Quality scoring and filtering (score > 0.7)
- Format standardization to chat template
- Context length trimming (max 4096 tokens)
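
The quality filter and MinHash deduplication steps can be sketched as follows. This is an illustration of the general technique, not the pipeline actually used: the `text` and `score` field names, the 3-word shingles, the signature length, and the 0.8 duplicate threshold are all assumptions. Only the 0.7 quality threshold comes from the list above.

```python
import hashlib

NUM_HASHES = 64      # signature length (assumed)
DUP_THRESHOLD = 0.8  # estimated Jaccard similarity treated as duplicate (assumed)
MIN_QUALITY = 0.7    # quality threshold stated above


def shingles(text: str, k: int = 3) -> set:
    """Lowercased word k-grams; short texts become a single shingle."""
    words = text.lower().split()
    if len(words) <= k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(shingle: str, seed: int) -> int:
    """One member of a family of hash functions, selected by seed."""
    digest = hashlib.blake2b(
        shingle.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash(text: str) -> list:
    """MinHash signature: the minimum hash per seed over all shingles."""
    grams = shingles(text)
    return [min(_hash(g, seed) for g in grams) for seed in range(NUM_HASHES)]


def similarity(sig_a: list, sig_b: list) -> float:
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def clean(examples: list) -> list:
    """Drop low-scoring examples, then near-duplicates of earlier ones."""
    kept, signatures = [], []
    for example in examples:
        if example["score"] <= MIN_QUALITY:
            continue
        sig = minhash(example["text"])
        if any(similarity(sig, seen) >= DUP_THRESHOLD for seen in signatures):
            continue
        kept.append(example)
        signatures.append(sig)
    return kept
```

The pairwise comparison here is quadratic in the number of examples, which is acceptable at the scale of tens of thousands of rows. Larger corpora typically add locality-sensitive hashing on top of the signatures.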

## Evaluation

### Benchmark Results

| Benchmark | Score | Description |
|-----------|-------|-------------|
| **MT-Bench** | 7.2/10 | Multi-turn conversation quality |
| **AlpacaEval** | 78.5% | Win rate vs. text-davinci-003 |
| **HumanEval** | 42.3% | Python code generation (pass@1) |
| **GSM8K** | 35.7% | Math word problems |
| **TruthfulQA** | 51.2% | Truthfulness in answers |
| **MMLU** | 48.9% | Multi-task language understanding |

## Capabilities

### Advanced Features

- **Function Calling**: Supports structured function/tool calling
- **Code Generation**: Can generate and explain code across multiple languages
- **Multi-turn Context**: Maintains conversation context up to 4096 tokens
- **Streaming Support**: Compatible with streaming inference
- **Batch Processing**: Efficient batch generation support
- **Custom System Prompts**: Flexible system message configuration  
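
Because the context window is fixed at 4096 tokens, long conversations need to be trimmed before generation. The sketch below keeps the system message and the most recent turns that fit. The helper name `trim_history` and the `count_tokens` callable are illustrative, and the count ignores the few tokens added by the chat template, so leave some headroom in practice.

```python
from typing import Callable

CONTEXT_WINDOW = 4096  # context length stated in this card


def trim_history(
    messages: list,
    count_tokens: Callable[[str], int],
    max_new_tokens: int = 512,
    limit: int = CONTEXT_WINDOW,
) -> list:
    """Keep system messages plus the newest turns that fit the token budget."""
    budget = limit - max_new_tokens  # reserve room for the reply
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    used = sum(count_tokens(m["content"]) for m in system)
    kept = []  # newest first
    for message in reversed(turns):
        cost = count_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost

    # The oldest kept turn should be a user turn so roles still alternate
    while kept and kept[-1]["role"] != "user":
        kept.pop()

    return system + kept[::-1]
```

With the tokenizer from the Quick Start, a suitable counter would be `lambda text: len(tokenizer.encode(text))`.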

## Limitations

### Known Limitations

1. **Knowledge Cutoff:** Training data up to April 2023
2. **Hallucinations:** May generate plausible but incorrect information
3. **Context Limitations:** 4096 token context window
4. **Math Reasoning:** Struggles with complex multi-step calculations
5. **Multilingual:** Primarily English; limited support for other languages
6. **Temporal Reasoning:** May not accurately understand time-sensitive queries
7. **Factual Accuracy:** Not suitable as sole source of truth

### Bias and Fairness

The model may exhibit biases present in the training data. We've implemented:
- Bias evaluation across demographic groups
- Regular fairness audits
- User feedback integration
- Ongoing bias mitigation efforts

## Responsible Use

Users should:
- Verify critical information from authoritative sources
- Implement appropriate safeguards for production use
- Monitor outputs for accuracy and appropriateness
- Comply with applicable laws and regulations
- Provide proper attribution for AI-generated content

## Citation

```bibtex
@misc{helion-v1.5-2025,
  author = {DeepXR},
  title = {Helion-V1.5: Enhanced Conversational AI},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/DeepXR/Helion-V1.5}
}
```
---

**Model Version:** 1.5.0 | **Release:** December 2025