|
|
--- |
|
|
license: apache-2.0 |
|
|
base_model: meta-llama/Llama-2-7b-hf |
|
|
tags: |
|
|
- text-generation |
|
|
- conversational |
|
|
- llama-2 |
|
|
- autotrain_compatible |
|
|
- function-calling |
|
|
language: |
|
|
- en |
|
|
pipeline_tag: text-generation |
|
|
library_name: transformers |
|
|
model-index: |
|
|
- name: Helion-V1.5 |
|
|
results: |
|
|
- task: |
|
|
type: text-generation |
|
|
name: Text Generation |
|
|
dataset: |
|
|
name: MT-Bench |
|
|
type: mt-bench |
|
|
metrics: |
|
|
- type: score |
|
|
value: 7.2 |
|
|
name: MT-Bench Score |
|
|
- task: |
|
|
type: text-generation |
|
|
name: Conversational |
|
|
dataset: |
|
|
name: AlpacaEval |
|
|
type: alpaca-eval |
|
|
metrics: |
|
|
- type: win_rate |
|
|
value: 78.5 |
|
|
name: Win Rate % |
|
|
- task: |
|
|
type: text-generation |
|
|
name: Code Generation |
|
|
dataset: |
|
|
name: HumanEval |
|
|
type: humaneval |
|
|
metrics: |
|
|
- type: pass@1 |
|
|
value: 42.3 |
|
|
name: Pass@1 |
|
|
widget: |
|
|
- text: "Explain the difference between machine learning and deep learning" |
|
|
example_title: "Technical Explanation" |
|
|
- text: "Write a Python function to calculate fibonacci numbers" |
|
|
example_title: "Code Generation" |
|
|
--- |
|
|
|
|
|
<div align="center"> |
|
|
|
|
|
<img src="https://imgur.com/aUIJXf7.png" alt="Helion-V1.5 Logo" width="100%"/>
|
|
|
|
|
</div> |
|
|
|
|
|
--- |
|
|
|
|
|
# Helion-V1.5 |
|
|
|
|
|
**Helion-V1.5** is a 7B-parameter conversational AI model fine-tuned from Llama-2 using QLoRA. It improves on Helion-V1 with stronger instruction following, code generation, and multi-turn dialogue.
|
|
|
|
|
## Model Details |
|
|
|
|
|
**Architecture:** Llama-2-7B with LoRA adapters |
|
|
**Parameters:** 7 billion (base) + 67M (LoRA) |
|
|
**Context Length:** 4096 tokens |
|
|
**Training:** QLoRA (4-bit) fine-tuning on high-quality instruction data |
|
|
**License:** Apache 2.0 |
|
|
|
|
|
### Key Improvements over Helion-V1 |
|
|
|
|
|
| Feature | Helion-V1 | Helion-V1.5 | Improvement | |
|
|
|---------|-----------|-------------|-------------| |
|
|
| **MT-Bench Score** | 6.8 | 7.2 | +5.9% | |
|
|
| **AlpacaEval Win Rate** | 72.3% | 78.5% | +8.6% | |
|
|
| **HumanEval Pass@1** | 38.1% | 42.3% | +11.0% | |
|
|
| **Avg Response Time** | 2.3s | 1.8s | -21.7% | |
|
|
| **Function Calling** | ❌ | ✅ | New | |
|
|
| **Streaming Support** | Basic | Full | Enhanced | |
|
|
|
|
|
### Technical Specifications |
|
|
|
|
|
| Component | Value | |
|
|
|-----------|-------| |
|
|
| Hidden Size | 4096 | |
|
|
| Layers | 32 | |
|
|
| Attention Heads | 32 | |
|
|
| Intermediate Size | 11008 | |
|
|
| Vocabulary | 32000 tokens | |
|
|
| Position Encoding | RoPE | |
|
|
| Precision | bfloat16 | |
|
|
|
|
|
**LoRA Configuration:** |
|
|
- Rank: 64 |
|
|
- Alpha: 128 |
|
|
- Target Modules: All linear layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
|
|
- Dropout: 0.05 |
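
In `peft` terms, the configuration above corresponds roughly to the sketch below; the exact adapter config shipped with the model may differ:

```python
from peft import LoraConfig

# Approximate adapter configuration, assumed from the specs listed above.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```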
|
|
|
|
|
|
|
|
|
## How to Use |
|
|
|
|
|
### Quick Start |
|
|
|
|
|
```python |
|
|
from transformers import AutoTokenizer, AutoModelForCausalLM |
|
|
import torch |
|
|
|
|
|
# Load model and tokenizer |
|
|
model_name = "DeepXR/Helion-V1.5" |
|
|
tokenizer = AutoTokenizer.from_pretrained(model_name) |
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
|
model_name, |
|
|
torch_dtype=torch.bfloat16, |
|
|
device_map="auto" |
|
|
) |
|
|
|
|
|
# Prepare messages |
|
|
messages = [ |
|
|
{"role": "user", "content": "Explain machine learning in simple terms"} |
|
|
] |
|
|
|
|
|
# Apply chat template |
|
|
input_ids = tokenizer.apply_chat_template( |
|
|
messages, |
|
|
add_generation_prompt=True, |
|
|
return_tensors="pt" |
|
|
).to(model.device) |
|
|
|
|
|
# Generate response |
|
|
output = model.generate( |
|
|
input_ids, |
|
|
max_new_tokens=512, |
|
|
temperature=0.7, |
|
|
top_p=0.9, |
|
|
do_sample=True |
|
|
) |
|
|
|
|
|
response = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True) |
|
|
print(response) |
|
|
``` |
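
Streaming is listed among the model's capabilities; a minimal sketch that reuses `model`, `tokenizer`, and `input_ids` from the Quick Start above with the standard `transformers` `TextStreamer`:

```python
from transformers import TextStreamer

# Prints tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    streamer=streamer,
)
```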
|
|
|
|
|
### Using with Text Generation Inference (TGI) |
|
|
|
|
|
```bash |
|
|
docker run --gpus all --shm-size 1g -p 8080:80 \ |
|
|
ghcr.io/huggingface/text-generation-inference:latest \ |
|
|
--model-id DeepXR/Helion-V1.5 \ |
|
|
--max-input-length 3584 \ |
|
|
--max-total-tokens 4096 |
|
|
``` |
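
Once the container is up, the server can be queried over HTTP. A minimal sketch against TGI's standard `/generate` endpoint, assuming the port mapping from the command above:

```python
import requests

# Query the TGI server started above (listening on localhost:8080).
resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "Explain machine learning in simple terms",
        "parameters": {"max_new_tokens": 256, "temperature": 0.7, "top_p": 0.9},
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```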
|
|
|
|
|
### Using with vLLM |
|
|
|
|
|
```python |
|
|
from vllm import LLM, SamplingParams |
|
|
|
|
|
llm = LLM(model="DeepXR/Helion-V1.5") |
|
|
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512) |
|
|
|
|
|
prompts = ["Explain quantum computing"] |
|
|
outputs = llm.generate(prompts, sampling_params) |
|
|
|
|
|
for output in outputs: |
|
|
print(output.outputs[0].text) |
|
|
``` |
|
|
|
|
|
### Using with LangChain |
|
|
|
|
|
```python |
|
|
from langchain_community.llms import HuggingFacePipeline  # `langchain.llms` on older LangChain versions
|
|
from transformers import pipeline |
|
|
|
|
|
pipe = pipeline( |
|
|
"text-generation", |
|
|
model="DeepXR/Helion-V1.5", |
|
|
max_new_tokens=512 |
|
|
) |
|
|
|
|
|
llm = HuggingFacePipeline(pipeline=pipe) |
|
|
response = llm.invoke("What is artificial intelligence?")
|
|
``` |
|
|
|
|
|
## Training Data |
|
|
|
|
|
### Dataset Composition |
|
|
|
|
|
The model was trained on a curated dataset including: |
|
|
|
|
|
- **Conversational Data** (40%): Multi-turn dialogues focusing on helpfulness |
|
|
- **Instruction Following** (30%): Task completion and instruction adherence |
|
|
- **Safety Examples** (15%): Refusal training for harmful requests |
|
|
- **Domain-Specific** (15%): Programming, writing, analysis tasks |
|
|
|
|
|
**Total Training Examples:** ~50,000 |
|
|
**Data Quality:** Manually filtered and safety-checked
|
|
|
|
|
### Data Processing |
|
|
|
|
|
- Deduplication using MinHash (see the sketch after this list)
|
|
- Safety filtering for harmful content |
|
|
- Quality scoring and filtering (score > 0.7) |
|
|
- Format standardization to chat template |
|
|
- Context length trimming (max 4096 tokens) |
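
The deduplication step might look like the following sketch using the `datasketch` library; the 0.8 similarity threshold and whitespace tokenization are illustrative assumptions, as the actual pipeline is not published:

```python
from datasketch import MinHash, MinHashLSH

def minhash(text: str, num_perm: int = 128) -> MinHash:
    # Hash the lowercased token set of a training example.
    m = MinHash(num_perm=num_perm)
    for token in text.lower().split():
        m.update(token.encode("utf-8"))
    return m

examples = [
    "What is machine learning?",
    "WHAT IS MACHINE LEARNING?",  # duplicate up to casing
    "Explain quantum computing.",
]

# Keep an example only if no previously kept example is ~80% similar.
lsh = MinHashLSH(threshold=0.8, num_perm=128)
unique = []
for i, example in enumerate(examples):
    m = minhash(example)
    if not lsh.query(m):
        lsh.insert(f"doc-{i}", m)
        unique.append(example)

print(unique)  # the casing duplicate is dropped
```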
|
|
|
|
|
## Evaluation |
|
|
|
|
|
### Benchmark Results |
|
|
|
|
|
| Benchmark | Score | Description | |
|
|
|-----------|-------|-------------| |
|
|
| **MT-Bench** | 7.2/10 | Multi-turn conversation quality | |
|
|
| **AlpacaEval** | 78.5% | Win rate vs. text-davinci-003 | |
|
|
| **HumanEval** | 42.3% | Python code generation (pass@1) | |
|
|
| **GSM8K** | 35.7% | Math word problems | |
|
|
| **TruthfulQA** | 51.2% | Truthfulness in answers | |
|
|
| **MMLU** | 48.9% | Multi-task language understanding | |
|
|
|
|
|
## Capabilities |
|
|
|
|
|
### Advanced Features |
|
|
|
|
|
- **Function Calling**: Supports structured function/tool calling (see the sketch after this list)
|
|
- **Code Execution**: Can generate and explain code across multiple languages |
|
|
- **Multi-turn Context**: Maintains conversation context up to 4096 tokens |
|
|
- **Streaming Support**: Compatible with streaming inference |
|
|
- **Batch Processing**: Efficient batch generation support |
|
|
- **Custom System Prompts**: Flexible system message configuration |
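
The card does not pin down the function-calling schema, so the sketch below assumes a common convention: tools are described in the system prompt and the model replies with a JSON call. It reuses `model` and `tokenizer` from the Quick Start; adapt the format to whatever schema the model was actually trained on.

```python
import json

# HYPOTHETICAL tool description; the schema Helion-V1.5 expects may differ.
tools = [{
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {"city": "string"},
}]

messages = [
    {"role": "system",
     "content": "You may call these tools by replying with JSON of the form "
                '{"name": ..., "arguments": ...}:\n' + json.dumps(tools)},
    {"role": "user", "content": "What's the weather in Berlin?"},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
reply = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)

# If the model followed the convention, the reply parses to a tool call, e.g.
# {"name": "get_weather", "arguments": {"city": "Berlin"}}.
call = json.loads(reply)
```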
|
|
|
|
|
## Limitations |
|
|
|
|
|
### Known Limitations |
|
|
|
|
|
1. **Knowledge Cutoff:** Training data up to April 2023 |
|
|
2. **Hallucinations:** May generate plausible but incorrect information |
|
|
3. **Context Limitations:** 4096 token context window |
|
|
4. **Math Reasoning:** Struggles with complex multi-step calculations |
|
|
5. **Multilingual:** Primarily English, limited other languages |
|
|
6. **Temporal Reasoning:** May not accurately understand time-sensitive queries |
|
|
7. **Factual Accuracy:** Not suitable as sole source of truth |
|
|
|
|
|
### Bias and Fairness |
|
|
|
|
|
The model may exhibit biases present in the training data. We've implemented: |
|
|
- Bias evaluation across demographic groups |
|
|
- Regular fairness audits |
|
|
- User feedback integration |
|
|
- Ongoing bias mitigation efforts |
|
|
|
|
|
## Responsible Use |
|
|
|
|
|
Users should: |
|
|
- Verify critical information from authoritative sources |
|
|
- Implement appropriate safeguards for production use |
|
|
- Monitor outputs for accuracy and appropriateness |
|
|
- Comply with applicable laws and regulations |
|
|
- Provide proper attribution for AI-generated content |
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex |
|
|
@misc{helion-v1.5-2025,
|
|
author = {DeepXR}, |
|
|
title = {Helion-V1.5: Enhanced Conversational AI}, |
|
|
year = {2025}, |
|
|
publisher = {HuggingFace}, |
|
|
url = {https://huggingface.co/DeepXR/Helion-V1.5} |
|
|
} |
|
|
``` |
|
|
--- |
|
|
|
|
|
**Model Version:** 1.5.0 | **Release:** December 2025 |