---
license: apache-2.0
base_model: meta-llama/Llama-2-7b-hf
tags:
- text-generation
- conversational
- llama-2
- autotrain_compatible
- function-calling
language:
- en
pipeline_tag: text-generation
library_name: transformers
model-index:
- name: Helion-V1.5
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MT-Bench
      type: mt-bench
    metrics:
    - type: score
      value: 7.2
      name: MT-Bench Score
  - task:
      type: text-generation
      name: Conversational
    dataset:
      name: AlpacaEval
      type: alpaca-eval
    metrics:
    - type: win_rate
      value: 78.5
      name: Win Rate %
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: HumanEval
      type: humaneval
    metrics:
    - type: pass@1
      value: 42.3
      name: Pass@1
widget:
- text: "Explain the difference between machine learning and deep learning"
  example_title: "Technical Explanation"
- text: "Write a Python function to calculate fibonacci numbers"
  example_title: "Code Generation"
---
*Helion-V1 logo*
---

# Helion-V1.5

**Helion-V1.5** is a 7B-parameter conversational AI model fine-tuned from Llama-2 using QLoRA. It delivers improved performance over Helion-V1, with stronger instruction following, code generation, and multi-turn dialogue capabilities.

## Model Details

- **Architecture:** Llama-2-7B with LoRA adapters
- **Parameters:** 7 billion (base) + 67M (LoRA)
- **Context Length:** 4096 tokens
- **Training:** QLoRA (4-bit) fine-tuning on high-quality instruction data
- **License:** Apache 2.0

### Key Improvements over Helion-V1

| Feature | Helion-V1 | Helion-V1.5 | Improvement |
|---------|-----------|-------------|-------------|
| **MT-Bench Score** | 6.8 | 7.2 | +5.9% |
| **AlpacaEval Win Rate** | 72.3% | 78.5% | +8.6% |
| **HumanEval Pass@1** | 38.1% | 42.3% | +11.0% |
| **Avg Response Time** | 2.3s | 1.8s | -21.7% |
| **Function Calling** | ❌ | ✅ | New |
| **Streaming Support** | Basic | Full | Enhanced |

### Technical Specifications

| Component | Value |
|-----------|-------|
| Hidden Size | 4096 |
| Layers | 32 |
| Attention Heads | 32 |
| Intermediate Size | 11008 |
| Vocabulary | 32000 tokens |
| Position Encoding | RoPE |
| Precision | bfloat16 |

**LoRA Configuration:**

- Rank: 64
- Alpha: 128
- Target Modules: All linear layers (q, k, v, o, gate, up, down)
- Dropout: 0.05

## How to Use

### Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "DeepXR/Helion-V1.5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Prepare messages
messages = [
    {"role": "user", "content": "Explain machine learning in simple terms"}
]

# Apply chat template
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Generate response
output = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

response = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```

### Using with Text Generation Inference (TGI)

```bash
docker run --gpus all --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id DeepXR/Helion-V1.5 \
  --max-input-length 3584 \
  --max-total-tokens 4096
```

### Using with vLLM

```python
from vllm import LLM, SamplingParams

llm = LLM(model="DeepXR/Helion-V1.5")
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)

prompts = ["Explain quantum computing"]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```

### Using with LangChain

```python
from langchain_community.llms import HuggingFacePipeline
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="DeepXR/Helion-V1.5",
    max_new_tokens=512
)
llm = HuggingFacePipeline(pipeline=pipe)
response = llm.invoke("What is artificial intelligence?")
```
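### Streaming Generation

The card lists full streaming support (see Capabilities below). The snippet here is a minimal sketch using the standard `transformers` streaming API, `TextIteratorStreamer`; the model loading, chat template, and sampling parameters are carried over from the Quick Start example rather than prescribed by the card:

```python
from threading import Thread

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_name = "DeepXR/Helion-V1.5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [{"role": "user", "content": "Explain machine learning in simple terms"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# skip_prompt=True yields only newly generated text, not the echoed prompt
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# generate() blocks, so run it in a background thread and
# consume decoded text chunks from the streamer as they arrive
thread = Thread(target=model.generate, kwargs=dict(
    input_ids=input_ids,
    streamer=streamer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
))
thread.start()

for text_chunk in streamer:
    print(text_chunk, end="", flush=True)
thread.join()
```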
## Training Data

### Dataset Composition

The model was trained on a curated dataset including:

- **Conversational Data** (40%): Multi-turn dialogues focusing on helpfulness
- **Instruction Following** (30%): Task completion and instruction adherence
- **Safety Examples** (15%): Refusal training for harmful requests
- **Domain-Specific** (15%): Programming, writing, and analysis tasks

**Total Training Examples:** ~50,000
**Data Quality:** High-quality, manually filtered and safety-checked

### Data Processing

- Deduplication using MinHash
- Safety filtering for harmful content
- Quality scoring and filtering (score > 0.7)
- Format standardization to the chat template
- Context-length trimming (max 4096 tokens)

## Evaluation

### Benchmark Results

| Benchmark | Score | Description |
|-----------|-------|-------------|
| **MT-Bench** | 7.2/10 | Multi-turn conversation quality |
| **AlpacaEval** | 78.5% | Win rate vs. text-davinci-003 |
| **HumanEval** | 42.3% | Python code generation (pass@1) |
| **GSM8K** | 35.7% | Math word problems |
| **TruthfulQA** | 51.2% | Truthfulness in answers |
| **MMLU** | 48.9% | Multi-task language understanding |

## Capabilities

### Advanced Features

- **Function Calling**: Supports structured function/tool calling
- **Code Generation**: Can generate and explain code across multiple languages
- **Multi-turn Context**: Maintains conversation context up to 4096 tokens
- **Streaming Support**: Compatible with streaming inference (see the streaming example above)
- **Batch Processing**: Efficient batch generation support
- **Custom System Prompts**: Flexible system message configuration

## Limitations

### Known Limitations

1. **Knowledge Cutoff:** Training data extends only to April 2023
2. **Hallucinations:** May generate plausible but incorrect information
3. **Context Window:** Limited to 4096 tokens
4. **Math Reasoning:** Struggles with complex multi-step calculations
5. **Multilingual:** Primarily English; limited coverage of other languages
6. **Temporal Reasoning:** May not accurately handle time-sensitive queries
7. **Factual Accuracy:** Not suitable as a sole source of truth

### Bias and Fairness

The model may exhibit biases present in the training data. We've implemented:

- Bias evaluation across demographic groups
- Regular fairness audits
- User feedback integration
- Ongoing bias mitigation efforts

## Responsible Use

Users should:

- Verify critical information against authoritative sources
- Implement appropriate safeguards for production use
- Monitor outputs for accuracy and appropriateness
- Comply with applicable laws and regulations
- Provide proper attribution for AI-generated content

## Citation

```bibtex
@misc{helion-v1.5-2025,
  author = {DeepXR},
  title = {Helion-V1.5: Enhanced Conversational AI},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/DeepXR/Helion-V1.5}
}
```

---

**Model Version:** 1.5.0 | **Release:** December 2025
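## Appendix: LoRA Configuration as Code

For reference, the adapter settings listed under Technical Specifications translate to the `peft`/`bitsandbytes` setup sketched below. This is a sketch, not the published training script: the rank, alpha, target modules, and dropout come from the card, while the quantization type, bias handling, and task type are assumptions based on common QLoRA defaults.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit base model, per the card's "QLoRA (4-bit)" training note.
# NF4 quantization and bfloat16 compute are assumed QLoRA defaults.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA hyperparameters as stated in the card.
lora_config = LoraConfig(
    r=64,                      # Rank: 64
    lora_alpha=128,            # Alpha: 128
    target_modules=[           # all linear layers (q, k, v, o, gate, up, down)
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.05,         # Dropout: 0.05
    bias="none",               # assumption: not specified in the card
    task_type="CAUSAL_LM",     # assumption: causal LM fine-tuning
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```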