---
license: apache-2.0
base_model: meta-llama/Llama-2-7b-hf
tags:
- text-generation
- conversational
- llama-2
- autotrain_compatible
- function-calling
language:
- en
pipeline_tag: text-generation
library_name: transformers
model-index:
- name: Helion-V1.5
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MT-Bench
      type: mt-bench
    metrics:
    - type: score
      value: 7.2
      name: MT-Bench Score
  - task:
      type: text-generation
      name: Conversational
    dataset:
      name: AlpacaEval
      type: alpaca-eval
    metrics:
    - type: win_rate
      value: 78.5
      name: Win Rate %
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: HumanEval
      type: humaneval
    metrics:
    - type: pass@1
      value: 42.3
      name: Pass@1
widget:
- text: "Explain the difference between machine learning and deep learning"
  example_title: "Technical Explanation"
- text: "Write a Python function to calculate fibonacci numbers"
  example_title: "Code Generation"
---
<div align="center">
<img src="https://imgur.com/aUIJXf7.png" alt="Helion-V1 Logo" width="100%"/>
</div>
---
# Helion-V1.5
**Helion-V1.5** is a 7B parameter conversational AI model fine-tuned from Llama-2 using QLoRA. It delivers improved performance over Helion-V1 with enhanced instruction following, code generation, and multi-turn dialogue capabilities.
## Model Details
- **Architecture:** Llama-2-7B with LoRA adapters
- **Parameters:** 7 billion (base) + 67M (LoRA)
- **Context Length:** 4096 tokens
- **Training:** QLoRA (4-bit) fine-tuning on high-quality instruction data (see the loading sketch below)
- **License:** Apache 2.0
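
Because fine-tuning used 4-bit QLoRA, the model can also be loaded in 4-bit for inference on smaller GPUs. A minimal sketch, assuming `bitsandbytes` is installed; the NF4 settings mirror common QLoRA defaults and are an assumption, not the published training configuration:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization with bfloat16 compute (assumed settings)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "DeepXR/Helion-V1.5",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("DeepXR/Helion-V1.5")
```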
### Key Improvements over Helion-V1
| Feature | Helion-V1 | Helion-V1.5 | Improvement |
|---------|-----------|-------------|-------------|
| **MT-Bench Score** | 6.8 | 7.2 | +5.9% |
| **AlpacaEval Win Rate** | 72.3% | 78.5% | +8.6% |
| **HumanEval Pass@1** | 38.1% | 42.3% | +11.0% |
| **Avg Response Time** | 2.3s | 1.8s | -21.7% |
| **Function Calling** | ❌ | ✅ | New |
| **Streaming Support** | Basic | Full | Enhanced |
### Technical Specifications
| Component | Value |
|-----------|-------|
| Hidden Size | 4096 |
| Layers | 32 |
| Attention Heads | 32 |
| Intermediate Size | 11008 |
| Vocabulary | 32000 tokens |
| Position Encoding | RoPE |
| Precision | bfloat16 |
**LoRA Configuration** (sketched as a `peft` config below):
- Rank: 64
- Alpha: 128
- Target Modules: all linear projections (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
- Dropout: 0.05
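
For reference, the hyperparameters above correspond roughly to the following `peft.LoraConfig`. This is a reconstruction, not the original training script:

```python
from peft import LoraConfig

# Reconstructed from the hyperparameters listed above; module names
# follow standard Llama-2 layer naming.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```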
## How to Use
### Quick Start
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "DeepXR/Helion-V1.5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Prepare messages
messages = [
    {"role": "user", "content": "Explain machine learning in simple terms"}
]

# Apply chat template
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Generate response
output = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

response = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```
### Using with Text Generation Inference (TGI)
```bash
docker run --gpus all --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id DeepXR/Helion-V1.5 \
  --max-input-length 3584 \
  --max-total-tokens 4096
```
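
Once the container is running, it exposes TGI's standard HTTP API. For example:

```bash
curl 127.0.0.1:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "Explain machine learning in simple terms", "parameters": {"max_new_tokens": 256, "temperature": 0.7}}'
```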
### Using with vLLM
```python
from vllm import LLM, SamplingParams

llm = LLM(model="DeepXR/Helion-V1.5")
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)

prompts = ["Explain quantum computing"]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```
### Using with LangChain
```python
from langchain.llms import HuggingFacePipeline
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="DeepXR/Helion-V1.5",
    max_new_tokens=512
)

llm = HuggingFacePipeline(pipeline=pipe)
response = llm("What is artificial intelligence?")
```
## Training Data
### Dataset Composition
The model was trained on a curated dataset including:
- **Conversational Data** (40%): Multi-turn dialogues focusing on helpfulness
- **Instruction Following** (30%): Task completion and instruction adherence
- **Safety Examples** (15%): Refusal training for harmful requests
- **Domain-Specific** (15%): Programming, writing, analysis tasks
- **Total Training Examples:** ~50,000
- **Data Quality:** Manually filtered and safety-checked
### Data Processing
- Deduplication using MinHash (see the sketch after this list)
- Safety filtering for harmful content
- Quality scoring and filtering (score > 0.7)
- Format standardization to chat template
- Context length trimming (max 4096 tokens)
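
As an illustration of the deduplication step above, a MinHash-LSH pass over the examples might look like the following sketch. It uses the `datasketch` library; the similarity threshold and word-trigram shingling are assumptions, not the actual pipeline:

```python
from datasketch import MinHash, MinHashLSH

def minhash(text: str, num_perm: int = 128) -> MinHash:
    """Build a MinHash signature from word trigrams of the text."""
    m = MinHash(num_perm=num_perm)
    words = text.split()
    for i in range(max(len(words) - 2, 1)):
        shingle = " ".join(words[i:i + 3])
        m.update(shingle.encode("utf8"))
    return m

# Keep an example only if no near-duplicate (Jaccard >= 0.8) was seen before.
lsh = MinHashLSH(threshold=0.8, num_perm=128)
deduped = []
for idx, example in enumerate(["sample text one", "sample text one", "different text"]):
    sig = minhash(example)
    if not lsh.query(sig):
        lsh.insert(f"doc-{idx}", sig)
        deduped.append(example)
```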
## Evaluation
### Benchmark Results
| Benchmark | Score | Description |
|-----------|-------|-------------|
| **MT-Bench** | 7.2/10 | Multi-turn conversation quality |
| **AlpacaEval** | 78.5% | Win rate vs. text-davinci-003 |
| **HumanEval** | 42.3% | Python code generation (pass@1) |
| **GSM8K** | 35.7% | Math word problems |
| **TruthfulQA** | 51.2% | Truthfulness in answers |
| **MMLU** | 48.9% | Multi-task language understanding |
## Capabilities
### Advanced Features
- **Function Calling**: Supports structured function/tool calling
- **Code Generation**: Generates and explains code across multiple languages
- **Multi-turn Context**: Maintains conversation context up to 4096 tokens
- **Streaming Support**: Compatible with streaming inference (see the sketch after this list)
- **Batch Processing**: Efficient batch generation support
- **Custom System Prompts**: Flexible system message configuration
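
Streaming works through the standard `transformers` streamer interface. A minimal sketch, reusing the `model` and `tokenizer` from the Quick Start and assuming the model's chat template accepts a system role:

```python
from transformers import TextStreamer

messages = [
    {"role": "system", "content": "You are a concise, helpful assistant."},
    {"role": "user", "content": "Explain machine learning in simple terms"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Prints tokens to stdout as they are generated
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(input_ids, max_new_tokens=256, streamer=streamer)
```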
## Limitations
### Known Limitations
1. **Knowledge Cutoff:** Training data up to April 2023
2. **Hallucinations:** May generate plausible but incorrect information
3. **Context Limitations:** 4096 token context window
4. **Math Reasoning:** Struggles with complex multi-step calculations
5. **Multilingual:** Primarily English, limited other languages
6. **Temporal Reasoning:** May not accurately understand time-sensitive queries
7. **Factual Accuracy:** Not suitable as sole source of truth
### Bias and Fairness
The model may exhibit biases present in the training data. We've implemented:
- Bias evaluation across demographic groups
- Regular fairness audits
- User feedback integration
- Ongoing bias mitigation efforts
## Responsible Use
Users should:
- Verify critical information from authoritative sources
- Implement appropriate safeguards for production use
- Monitor outputs for accuracy and appropriateness
- Comply with applicable laws and regulations
- Provide proper attribution for AI-generated content
## Citation
```bibtex
@misc{helion-v1.5-2025,
  author = {DeepXR},
  title = {Helion-V1.5: Enhanced Conversational AI},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/DeepXR/Helion-V1.5}
}
```
---
**Model Version:** 1.5.0 | **Release:** December 2025