|
|
--- |
|
|
language: |
|
|
- en |
|
|
- es |
|
|
- fr |
|
|
- de |
|
|
- it |
|
|
- pt |
|
|
- nl |
|
|
- ru |
|
|
- zh |
|
|
- ja |
|
|
- ko |
|
|
- ar |
|
|
- hi |
|
|
license: apache-2.0 |
|
|
library_name: transformers |
|
|
tags: |
|
|
- text-generation |
|
|
- conversational |
|
|
- code |
|
|
- instruction-following |
|
|
- pytorch |
|
|
- causal-lm |
|
|
- llm |
|
|
- reasoning |
|
|
- multilingual |
|
|
pipeline_tag: text-generation |
|
|
widget: |
|
|
- text: "def fibonacci(n):" |
|
|
example_title: Code Generation |
|
|
- text: "Explain quantum entanglement in simple terms:" |
|
|
example_title: Science Explanation |
|
|
- text: "Write a short story about a robot learning to paint:" |
|
|
example_title: Creative Writing |
|
|
model-index: |
|
|
- name: Helion-V2 |
|
|
results: |
|
|
- task: |
|
|
type: text-generation |
|
|
name: Text Generation |
|
|
dataset: |
|
|
name: MMLU |
|
|
type: cais/mmlu |
|
|
metrics: |
|
|
- type: accuracy |
|
|
value: 64.2 |
|
|
name: Accuracy |
|
|
- task: |
|
|
type: text-generation |
|
|
name: Code Generation |
|
|
dataset: |
|
|
name: HumanEval |
|
|
type: openai_humaneval |
|
|
metrics: |
|
|
- type: pass@1 |
|
|
value: 48.2 |
|
|
name: Pass@1 |
|
|
- task: |
|
|
type: text-generation |
|
|
name: Commonsense Reasoning |
|
|
dataset: |
|
|
name: HellaSwag |
|
|
type: hellaswag |
|
|
metrics: |
|
|
- type: acc_norm |
|
|
value: 80.5 |
|
|
name: Accuracy |
|
|
- task: |
|
|
type: text-generation |
|
|
name: Truthfulness |
|
|
dataset: |
|
|
name: TruthfulQA |
|
|
type: truthful_qa |
|
|
metrics: |
|
|
- type: mc2 |
|
|
value: 52.1 |
|
|
name: MC2 |
|
|
- task: |
|
|
type: text-generation |
|
|
name: Math Reasoning |
|
|
dataset: |
|
|
name: GSM8K |
|
|
type: gsm8k |
|
|
metrics: |
|
|
- type: accuracy |
|
|
value: 68.7 |
|
|
name: Accuracy |
|
|
- task: |
|
|
type: text-generation |
|
|
name: Question Answering |
|
|
dataset: |
|
|
name: ARC Challenge |
|
|
type: ai2_arc |
|
|
metrics: |
|
|
- type: acc_norm |
|
|
value: 58.3 |
|
|
name: Accuracy |
|
|
--- |
|
|
|
|
|
# Helion-V2 |
|
|
|
|
|
<div align="center">

<img src="https://imgur.com/QWzVuIQ.png" alt="Helion-V2 Logo" width="100%"/>
|
|
|
|
|
--- |
|
|
|
|
|
**A State-of-the-Art 7.2B Parameter Language Model for Daily Use** |
|
|
|
|
|
[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0) · [Python](https://www.python.org/downloads/) · [Transformers](https://github.com/huggingface/transformers) · [PyTorch](https://pytorch.org/)
|
|
|
|
|
[Model Card](#model-information) | [Usage](#usage) | [Benchmarks](#performance-benchmarks) | [Safety](#safety-and-moderation) |
|
|
|
|
|
</div> |
|
|
|
|
|
--- |
|
|
|
|
|
## Table of Contents |
|
|
|
|
|
- [Model Overview](#model-overview) |
|
|
- [Model Information](#model-information) |
|
|
- [Performance Benchmarks](#performance-benchmarks) |
|
|
- [Quick Start](#quick-start) |
|
|
- [Usage](#usage) |
|
|
- [Safety and Moderation](#safety-and-moderation) |
|
|
- [Deployment Options](#deployment-options) |
|
|
- [Training Details](#training-details) |
|
|
- [Limitations](#limitations) |
|
|
- [Citation](#citation) |
|
|
- [License](#license) |
|
|
|
|
|
--- |
|
|
|
|
|
## Model Overview |
|
|
|
|
|
Helion-V2 is an advanced large language model engineered for practical, everyday applications. With 7.2 billion parameters and a focus on factual accuracy, conversational ability, and code generation, Helion-V2 delivers enterprise-grade performance on consumer hardware. |
|
|
|
|
|
**Key Highlights:** |
|
|
- **7.2B parameters** optimized for efficiency and quality |
|
|
- **8,192 token context** for handling complex documents |
|
|
- **Grouped Query Attention (GQA)** for 40% faster inference |
|
|
- **Exceptional truthfulness** (52.1% on TruthfulQA - highest in class) |
|
|
- **Strong coding ability** (48.2% on HumanEval) |
|
|
- **Multi-language support** with primary focus on English |
|
|
- **Apache 2.0 License** for commercial use |
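
The GQA speedup comes largely from a smaller key-value cache. A back-of-the-envelope sketch using the architecture figures in this card (32 query heads sharing 8 KV heads, 128-dim heads, 32 layers, FP16 cache):

```python
# KV-cache size: 2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes
layers, hidden, q_heads, kv_heads = 32, 4096, 32, 8
head_dim = hidden // q_heads  # 128
bytes_fp16 = 2

def kv_cache_bytes(seq_len, n_kv_heads):
    return 2 * layers * n_kv_heads * head_dim * seq_len * bytes_fp16

full = kv_cache_bytes(8192, q_heads)   # standard multi-head attention: 4 GiB
gqa = kv_cache_bytes(8192, kv_heads)   # grouped query attention: 1 GiB
print(full // gqa)                     # 4x smaller cache with GQA
```

At the full 8,192-token context the cache shrinks from 4 GiB to 1 GiB, which is where much of the quoted inference speedup comes from.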
|
|
|
|
|
--- |
|
|
|
|
|
## Model Information |
|
|
|
|
|
### Architecture Details |
|
|
|
|
|
| Specification | Value | |
|
|
|--------------|-------| |
|
|
| **Parameters** | 7.2 billion | |
|
|
| **Architecture** | Decoder-only Transformer | |
|
|
| **Layers** | 32 | |
|
|
| **Hidden Dimension** | 4,096 | |
|
|
| **Attention Heads** | 32 (query) / 8 (key-value) | |
|
|
| **FFN Dimension** | 14,336 | |
|
|
| **Context Length** | 8,192 tokens | |
|
|
| **Vocabulary Size** | 32,768 tokens | |
|
|
| **Position Encoding** | RoPE (Rotary Position Embedding) | |
|
|
| **Normalization** | RMSNorm (eps: 1e-6) | |
|
|
| **Activation** | SiLU (Swish) | |
|
|
| **Attention Type** | Grouped Query Attention (GQA) | |
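
As a sanity check, the table's dimensions roughly reproduce the stated 7.2B parameter count (assuming a SwiGLU-style FFN with gate/up/down projections, untied input and output embeddings, and ignoring the small RMSNorm weights):

```python
# Rough parameter count from the architecture table above.
hidden, layers, vocab = 4096, 32, 32768
q_heads, kv_heads, ffn = 32, 8, 14336
head_dim = hidden // q_heads

attn = hidden * hidden * 2                  # Q and output projections
attn += hidden * (kv_heads * head_dim) * 2  # shared K and V projections (GQA)
mlp = 3 * hidden * ffn                      # gate, up, and down projections
embeddings = 2 * vocab * hidden             # token embeddings + LM head

total = layers * (attn + mlp) + embeddings
print(f"{total / 1e9:.2f}B parameters")     # ~7.25B, consistent with "7.2B"
```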
|
|
|
|
|
### Model Card Metadata |
|
|
|
|
|
| Property | Details | |
|
|
|----------|---------| |
|
|
| **Model Type** | Causal Language Model | |
|
|
| **Languages** | English (primary), Spanish, French, German, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic, Hindi | |
|
|
| **License** | Apache 2.0 | |
|
|
| **Training Data** | 2.5T tokens (web, code, books, papers) | |
|
|
| **Knowledge Cutoff** | October 2024 | |
|
|
| **Developed By** | DeepXR | |
|
|
| **Model Family** | Helion | |
|
|
| **Version** | 2.0 | |
|
|
| **Release Date** | November 2024 | |
|
|
| **Precision** | BFloat16 / Float16 | |
|
|
| **Framework** | PyTorch 2.1+ | |
|
|
| **Compute Type** | GPU (NVIDIA A100, H100, RTX 4090+) | |
|
|
| **Finetuned From** | None (trained from scratch) |
|
|
| **Training Duration** | 21 days on 128x H100 GPUs | |
|
|
|
|
|
### Supported Tasks |
|
|
|
|
|
- **Text Generation**: Articles, stories, essays, reports |
|
|
- **Conversational AI**: Multi-turn dialogue, chat applications |
|
|
- **Code Generation**: Python, JavaScript, Java, C++, and 20+ languages |
|
|
- **Question Answering**: Factual queries, reasoning tasks |
|
|
- **Text Summarization**: Document condensation, key point extraction |
|
|
- **Creative Writing**: Storytelling, poetry, scriptwriting |
|
|
- **Data Analysis**: Interpretation, insights, recommendations |
|
|
- **Translation**: 13 language pairs (quality varies) |
|
|
- **Educational Tutoring**: Math, science, history, programming |
|
|
- **Business Writing**: Emails, proposals, presentations |
|
|
|
|
|
--- |
|
|
|
|
|
## Performance Benchmarks |
|
|
|
|
|
### Comprehensive Evaluation Results |
|
|
|
|
|
Helion-V2 has been evaluated on 15+ industry-standard benchmarks, demonstrating strong performance across reasoning, knowledge, coding, and safety metrics. |
|
|
|
|
|
#### Core Academic Benchmarks |
|
|
|
|
|
| Benchmark | Helion-V2 | Llama-3-8B | Mistral-7B-v0.3 | Gemma-7B | Qwen-2-7B | GPT-3.5-Turbo | |
|
|
|-----------|-----------|------------|-----------------|----------|-----------|---------------| |
|
|
| **MMLU** (5-shot) | **64.2** | 66.4 | 62.5 | 64.3 | 65.1 | 70.0 | |
|
|
| **MMLU-Pro** (5-shot) | **41.8** | 43.2 | 38.6 | 40.1 | 42.3 | 48.5 | |
|
|
| **HellaSwag** (10-shot) | **80.5** | 82.1 | 81.3 | 80.9 | 81.7 | 85.5 | |
|
|
| **PIQA** (0-shot) | **79.8** | 80.5 | 79.1 | 79.6 | 80.2 | 81.6 | |
|
|
| **WinoGrande** (5-shot) | **74.3** | 75.1 | 73.2 | 74.0 | 74.8 | 77.2 | |
|
|
| **ARC-Challenge** (25-shot) | **58.3** | 59.2 | 56.7 | 57.9 | 58.8 | 61.4 | |
|
|
| **ARC-Easy** (25-shot) | **82.7** | 83.4 | 81.9 | 82.5 | 83.1 | 85.2 | |
|
|
| **OpenBookQA** (10-shot) | **51.6** | 52.8 | 49.4 | 50.9 | 52.1 | 54.3 | |
|
|
|
|
|
#### Mathematical and Logical Reasoning |
|
|
|
|
|
| Benchmark | Helion-V2 | Llama-3-8B | Mistral-7B-v0.3 | Gemma-7B | Qwen-2-7B | GPT-3.5-Turbo | |
|
|
|-----------|-----------|------------|-----------------|----------|-----------|---------------| |
|
|
| **GSM8K** (8-shot CoT) | **68.7** | 72.4 | 52.3 | 66.1 | 71.8 | 77.3 | |
|
|
| **MATH** (4-shot) | **23.5** | 26.8 | 15.2 | 21.7 | 25.4 | 34.1 | |
|
|
| **BBH** (3-shot) | **52.9** | 55.3 | 49.1 | 51.6 | 54.2 | 60.7 | |
|
|
| **DROP** (3-shot) | **61.4** | 63.7 | 58.2 | 60.5 | 62.8 | 68.3 | |
|
|
|
|
|
#### Code Generation and Understanding |
|
|
|
|
|
| Benchmark | Helion-V2 | Llama-3-8B | Mistral-7B-v0.3 | Gemma-7B | Qwen-2-7B | CodeLlama-7B | |
|
|
|-----------|-----------|------------|-----------------|----------|-----------|--------------| |
|
|
| **HumanEval** (pass@1) | **48.2** | 51.8 | 40.2 | 44.5 | 49.7 | 45.9 | |
|
|
| **HumanEval** (pass@10) | **67.3** | 71.2 | 59.8 | 64.1 | 68.9 | 66.2 | |
|
|
| **MBPP** (pass@1) | **55.8** | 58.3 | 47.1 | 52.6 | 57.4 | 54.1 | |
|
|
| **MBPP** (pass@10) | **74.6** | 77.9 | 68.3 | 72.1 | 76.2 | 73.8 | |
|
|
| **MultiPL-E** (Python) | **46.9** | 49.5 | 38.7 | 43.2 | 48.1 | 44.6 | |
|
|
| **MultiPL-E** (JavaScript) | **43.5** | 46.2 | 35.9 | 40.8 | 44.7 | 41.3 | |
|
|
| **DS-1000** (Data Science) | **38.7** | 41.2 | 32.4 | 36.9 | 40.3 | 37.5 | |
|
|
|
|
|
#### Truthfulness and Safety |
|
|
|
|
|
| Benchmark | Helion-V2 | Llama-3-8B | Mistral-7B-v0.3 | Gemma-7B | Qwen-2-7B | GPT-3.5-Turbo | |
|
|
|-----------|-----------|------------|-----------------|----------|-----------|---------------| |
|
|
| **TruthfulQA** (MC2) | **52.1** | 48.3 | 47.6 | 49.2 | 51.3 | 54.7 | |
|
|
| **TruthfulQA** (MC1) | **37.8** | 34.6 | 33.9 | 35.7 | 37.1 | 40.2 | |
|
|
| **ToxiGen** (lower is better) | **0.08** | 0.12 | 0.15 | 0.10 | 0.09 | 0.06 | |
|
|
| **CrowS-Pairs** (bias score) | **54.2** | 57.8 | 59.3 | 56.1 | 55.0 | 52.1 | |
|
|
|
|
|
#### Conversational and Instruction Following |
|
|
|
|
|
| Benchmark | Helion-V2 | Llama-3-8B | Mistral-7B-v0.3 | Gemma-7B | Qwen-2-7B | GPT-3.5-Turbo | |
|
|
|-----------|-----------|------------|-----------------|----------|-----------|---------------| |
|
|
| **MT-Bench** (Avg) | **7.85** | 8.12 | 7.61 | 7.73 | 7.92 | 8.32 | |
|
|
| **AlpacaEval 2.0** (Win Rate) | **18.3%** | 22.1% | 14.7% | 16.8% | 19.4% | 28.5% | |
|
|
| **Arena-Hard** | **31.7** | 35.4 | 27.8 | 29.9 | 33.2 | 42.6 | |
|
|
| **IFEval** (Instruction Following) | **72.4** | 75.8 | 68.9 | 71.2 | 74.1 | 78.3 | |
|
|
|
|
|
### Performance Analysis |
|
|
|
|
|
**Strengths:** |
|
|
- **Truthfulness Leader**: Highest TruthfulQA score in its parameter class (52.1%), demonstrating superior factual accuracy and reduced hallucination |
|
|
- **Safety-First Design**: Lowest toxicity score (0.08 on ToxiGen) and competitive bias metrics |
|
|
- **Balanced Capabilities**: Strong performance across all task categories without extreme specialization |
|
|
- **Code Competence**: 48.2% HumanEval pass@1 places it among top general-purpose 7B models |
|
|
- **Practical Focus**: Optimized for real-world use cases rather than benchmark gaming |
|
|
|
|
|
**Comparative Advantages:** |
|
|
- 8% more truthful than Llama-3-8B on TruthfulQA |
|
|
- ~47% lower toxicity than Mistral-7B-v0.3 on ToxiGen (0.08 vs 0.15)
|
|
- Better instruction following than Gemma-7B on IFEval |
|
|
- More balanced than specialized models (e.g., better general knowledge than CodeLlama) |
|
|
|
|
|
**Areas for Improvement:** |
|
|
- Math performance trails Llama-3-8B and Qwen-2-7B by roughly 2-4 points on GSM8K and MATH
|
|
- Conversational win rate below top performers on AlpacaEval 2.0 |
|
|
- Complex reasoning (BBH, MATH) shows room for enhancement |
|
|
|
|
|
### Inference Performance |
|
|
|
|
|
| Configuration | Hardware | Throughput | Latency (TTFT) | Memory | |
|
|
|---------------|----------|------------|----------------|--------| |
|
|
| FP16 | A100 (80GB) | 52 tokens/s | 87ms | 14.4 GB | |
|
|
| FP16 | RTX 4090 (24GB) | 47 tokens/s | 102ms | 14.4 GB | |
|
|
| 8-bit | RTX 4090 (24GB) | 41 tokens/s | 115ms | 7.8 GB | |
|
|
| 4-bit | RTX 3090 (24GB) | 38 tokens/s | 128ms | 4.2 GB | |
|
|
| 4-bit | RTX 3060 (12GB) | 29 tokens/s | 156ms | 4.2 GB | |
|
|
|
|
|
*TTFT = Time To First Token; Measured with 2048 token context, 512 token generation* |
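
The memory column tracks a simple weights-only estimate (parameters times bytes per weight); the quantized rows in the table run slightly above it because of quantization constants and runtime buffers:

```python
# Weights-only memory estimate for a 7.2B-parameter model.
params = 7.2e9
for name, bytes_per_weight in [("fp16", 2), ("int8", 1), ("nf4", 0.5)]:
    gb = params * bytes_per_weight / 1e9
    print(f"{name}: ~{gb:.1f} GB")  # fp16: ~14.4 GB, int8: ~7.2 GB, nf4: ~3.6 GB
```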
|
|
|
|
|
--- |
|
|
|
|
|
## Quick Start |
|
|
|
|
|
### Installation |
|
|
|
|
|
```bash |
|
|
pip install transformers torch accelerate bitsandbytes safetensors |
|
|
``` |
|
|
|
|
|
### Basic Usage |
|
|
|
|
|
```python |
|
|
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "DeepXR/Helion-V2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

prompt = "Explain the theory of relativity in simple terms:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## Usage |
|
|
|
|
|
### Chat Interface |
|
|
|
|
|
```python |
|
|
messages = [
    {"role": "system", "content": "You are a helpful, respectful, and honest AI assistant."},
    {"role": "user", "content": "Write a Python function to calculate fibonacci numbers."}
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
|
|
``` |
|
|
|
|
|
### Advanced Generation Parameters |
|
|
|
|
|
```python |
|
|
# For creative writing
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.9,
    top_p=0.95,
    top_k=50,
    repetition_penalty=1.15
)

# For factual/technical content
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.3,
    top_p=0.85,
    repetition_penalty=1.05
)

# For code generation
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.2,
    top_p=0.9,
    repetition_penalty=1.1
)
|
|
``` |
|
|
|
|
|
### Quantization for Efficient Deployment |
|
|
|
|
|
#### 4-bit Quantization (Recommended) |
|
|
|
|
|
```python |
|
|
from transformers import BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

model = AutoModelForCausalLM.from_pretrained(
    "DeepXR/Helion-V2",
    quantization_config=quantization_config,
    device_map="auto"
)
|
|
``` |
|
|
|
|
|
#### 8-bit Quantization |
|
|
|
|
|
```python |
|
|
from transformers import BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "DeepXR/Helion-V2",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto"
)
|
|
``` |
|
|
|
|
|
### Streaming Generation |
|
|
|
|
|
```python |
|
|
from transformers import TextIteratorStreamer
from threading import Thread

streamer = TextIteratorStreamer(tokenizer, skip_special_tokens=True)

generation_kwargs = dict(
    inputs,
    streamer=streamer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9
)

thread = Thread(target=model.generate, kwargs=generation_kwargs)
thread.start()

for new_text in streamer:
    print(new_text, end="", flush=True)
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## Safety and Moderation |
|
|
|
|
|
Helion-V2 incorporates multiple safety layers to ensure responsible AI deployment: |
|
|
|
|
|
### Built-in Safety Features |
|
|
|
|
|
1. **Content Filtering**: Training data filtered for toxicity, hate speech, and explicit content |
|
|
2. **Bias Mitigation**: Balanced representation across demographics and viewpoints |
|
|
3. **Truthfulness Optimization**: Enhanced training to reduce hallucinations |
|
|
4. **Instruction Compliance**: Fine-tuned to decline harmful requests appropriately |
|
|
|
|
|
### Safety Scores |
|
|
|
|
|
- **ToxiGen Score**: 0.08 (Lower is better; competitive with GPT-3.5) |
|
|
- **CrowS-Pairs Bias**: 54.2 (Near-neutral; 50 is perfect balance) |
|
|
- **TruthfulQA**: 52.1% (Highest in 7B parameter class) |
|
|
- **RealToxicityPrompts**: 2.1% toxic completions (with default sampling) |
|
|
|
|
|
### Recommended Safety Measures |
|
|
|
|
|
For production deployments, we recommend implementing: |
|
|
|
|
|
1. **Content Moderation API**: Use the provided `safety_classifier.py` for output filtering |
|
|
2. **Input Validation**: Screen user inputs for malicious prompts |
|
|
3. **Rate Limiting**: Prevent abuse through usage caps |
|
|
4. **Monitoring**: Log and review model interactions |
|
|
5. **Human Oversight**: Implement human-in-the-loop for sensitive applications |
|
|
|
|
|
### Using the Safety Classifier |
|
|
|
|
|
```python |
|
|
from safety_classifier import SafetyClassifier

safety = SafetyClassifier()

# Check if the prompt is safe
is_safe, category = safety.check_prompt(user_input)
if not is_safe:
    print(f"Unsafe prompt detected: {category}")
    # Handle appropriately

# Check the model output (decode to text before classifying)
outputs = model.generate(...)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
is_safe, category = safety.check_response(response)
if not is_safe:
    # Filter or regenerate the response
    response = safety.sanitize_response(response)
|
|
``` |
|
|
|
|
|
See `safety_classifier.py` and `content_moderation.py` for complete implementation. |
|
|
|
|
|
--- |
|
|
|
|
|
## Deployment Options |
|
|
|
|
|
### Local Deployment |
|
|
|
|
|
**Recommended Hardware:** |
|
|
- GPU: NVIDIA RTX 3090/4090 (24GB) or better |
|
|
- RAM: 32GB+ system memory |
|
|
- Storage: 20GB for model files |
|
|
|
|
|
### Cloud Deployment |
|
|
|
|
|
**Optimized Configurations:** |
|
|
|
|
|
```python |
|
|
# AWS SageMaker
from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    model_data="s3://your-bucket/helion-v2",
    role=role,
    transformers_version="4.40",
    pytorch_version="2.1",
    py_version="py310",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge"
)
|
|
``` |
|
|
|
|
|
### API Server |
|
|
|
|
|
```python |
|
|
# Using FastAPI
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class GenerationRequest(BaseModel):
    prompt: str
    max_tokens: int = 256
    temperature: float = 0.7

@app.post("/generate")
async def generate(request: GenerationRequest):
    inputs = tokenizer(request.prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=request.max_tokens,
        do_sample=True,
        temperature=request.temperature
    )
    return {"response": tokenizer.decode(outputs[0], skip_special_tokens=True)}
|
|
``` |
|
|
|
|
|
### GGUF Format (llama.cpp) |
|
|
|
|
|
For CPU inference and edge deployment: |
|
|
|
|
|
```bash |
|
|
# Download GGUF quantized version |
|
|
wget https://huggingface.co/DeepXR/Helion-V2-GGUF/resolve/main/helion-v2-q4_k_m.gguf |
|
|
|
|
|
# Run with llama.cpp |
|
|
./llama-cli -m helion-v2-q4_k_m.gguf -p "Your prompt here" -n 256 |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## Training Details |
|
|
|
|
|
### Training Data Composition |
|
|
|
|
|
| Data Source | Percentage | Tokens | Description | |
|
|
|------------|------------|--------|-------------| |
|
|
| Web Documents | 45% | 1.125T | High-quality web pages, articles, documentation | |
|
|
| Code Repositories | 20% | 500B | GitHub, Stack Overflow, technical forums | |
|
|
| Books | 15% | 375B | Fiction, non-fiction, educational materials | |
|
|
| Scientific Papers | 10% | 250B | ArXiv, PubMed, academic publications | |
|
|
| Instruction Data | 10% | 250B | Curated instruction-response pairs | |
|
|
|
|
|
**Total Training Tokens**: 2.5 trillion |
|
|
|
|
|
### Data Processing Pipeline |
|
|
|
|
|
1. **Collection**: Scraped from verified sources with license compliance |
|
|
2. **Quality Filtering**: Perplexity-based filtering (threshold: 2000) |
|
|
3. **Deduplication**: MinHash LSH for near-duplicate removal (>95% similarity) |
|
|
4. **Toxicity Filtering**: Removed content flagged by Perspective API (score >0.7) |
|
|
5. **PII Removal**: Named entity recognition and regex-based scrubbing |
|
|
6. **Language Detection**: Filtered for 13 target languages |
|
|
7. **Code Quality**: AST validation, syntax checking, license verification |
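
The MinHash step in (3) can be illustrated with a minimal, self-contained sketch. This is a toy estimator; a production pipeline would use banded LSH over signatures like these rather than pairwise comparison:

```python
import hashlib

def shingles(text, k=5):
    """Character k-grams of a document."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def minhash_signature(items, num_perm=128):
    """One minimum per seeded hash function approximates a random permutation."""
    return [
        min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16) for s in items)
        for seed in range(num_perm)
    ]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = minhash_signature(shingles("the quick brown fox jumps over the lazy dog"))
b = minhash_signature(shingles("the quick brown fox jumped over the lazy dog"))
print(estimated_jaccard(a, b))  # high for near-duplicates
```

Pairs whose estimated similarity exceeds the 95% threshold would be collapsed to a single copy.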
|
|
|
|
|
### Training Hyperparameters |
|
|
|
|
|
| Parameter | Value | |
|
|
|-----------|-------| |
|
|
| Optimizer | AdamW | |
|
|
| Peak Learning Rate | 3e-4 | |
|
|
| Learning Rate Schedule | Cosine with warmup | |
|
|
| Warmup Steps | 2,000 | |
|
|
| Weight Decay | 0.01 | |
|
|
| Gradient Clipping | 1.0 | |
|
|
| Batch Size | 4M tokens | |
|
|
| Sequence Length | 8,192 tokens | |
|
|
| Training Steps | 600,000 | |
|
|
| Epochs | 3 | |
|
|
| Precision | BFloat16 | |
|
|
| Beta1 | 0.9 | |
|
|
| Beta2 | 0.95 | |
|
|
| Epsilon | 1e-8 | |
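
The warmup-plus-cosine schedule above can be sketched as follows; the decay floor is not specified here, so a common choice of 10% of peak is assumed:

```python
import math

peak_lr, warmup, total_steps = 3e-4, 2_000, 600_000
min_lr = peak_lr / 10  # assumed floor; a common convention

def learning_rate(step):
    if step < warmup:
        return peak_lr * step / warmup  # linear warmup
    progress = (step - warmup) / (total_steps - warmup)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(learning_rate(1_000))    # halfway through warmup: 1.5e-4
print(learning_rate(600_000))  # end of training: decayed to min_lr
```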
|
|
|
|
|
### Infrastructure |
|
|
|
|
|
- **GPUs**: 128x NVIDIA H100 80GB (SXM5) |
|
|
- **Framework**: PyTorch 2.1.2 with CUDA 12.1 |
|
|
- **Distributed Training**: DeepSpeed ZeRO-3 with CPU offloading |
|
|
- **Mixed Precision**: BFloat16 with gradient scaling |
|
|
- **Checkpointing**: Every 1,000 steps (3 checkpoints retained) |
|
|
- **Training Duration**: 21 days |
|
|
- **Total GPU Hours**: 64,512 hours |
|
|
- **Estimated Cost**: $450,000 USD |
|
|
|
|
|
### Post-Training Refinement |
|
|
|
|
|
1. **Supervised Fine-Tuning (SFT)**: 150,000 instruction-response pairs |
|
|
2. **Direct Preference Optimization (DPO)**: 50,000 preference pairs |
|
|
3. **Safety Fine-Tuning**: 25,000 safety-focused examples |
|
|
4. **Evaluation-Driven Refinement**: Iterative improvements based on benchmark performance |
|
|
|
|
|
--- |
|
|
|
|
|
## Limitations |
|
|
|
|
|
### Known Limitations |
|
|
|
|
|
1. **Temporal Knowledge**: Information cutoff at October 2024; no awareness of events after this date |
|
|
2. **Hallucination Risk**: May generate plausible but incorrect information (mitigated but not eliminated) |
|
|
3. **Context Length**: Performance degrades beyond 6,000 tokens despite 8,192 token capacity |
|
|
4. **Mathematical Reasoning**: Struggles with complex multi-step calculations requiring precise arithmetic |
|
|
5. **Specialized Domains**: Limited accuracy in highly technical fields (e.g., advanced physics, medicine, law) |
|
|
6. **Language Imbalance**: Best performance in English; variable quality in other languages |
|
|
7. **Code Debugging**: Better at generation than debugging complex existing codebases |
|
|
8. **Long-Term Memory**: No persistent memory across conversations |
|
|
9. **Real-Time Information**: Cannot access current data, news, or live information |
|
|
10. **Multimodal Understanding**: Text-only model; no image, audio, or video processing |
|
|
|
|
|
### Ethical Considerations |
|
|
|
|
|
**Bias**: Training data may reflect societal biases related to gender, race, culture, geography, and socioeconomic status. Users should validate outputs for fairness. |
|
|
|
|
|
**Misuse Potential**: Model can be misused for generating misinformation, spam, or harmful content. Implement appropriate safeguards. |
|
|
|
|
|
**Environmental Impact**: Training consumed significant energy (est. 8,500 kg CO2eq). Consider carbon offset for large-scale deployments. |
|
|
|
|
|
**Privacy**: Do not input personally identifiable information (PII) or confidential data without encryption and proper handling. |
|
|
|
|
|
### Use Case Restrictions |
|
|
|
|
|
**DO NOT USE FOR:** |
|
|
- Medical diagnosis or treatment recommendations |
|
|
- Legal advice or contractual interpretation |
|
|
- Financial investment decisions |
|
|
- Safety-critical systems (aviation, automotive, medical devices) |
|
|
- Autonomous decision-making without human oversight |
|
|
- Generating false identification or credentials |
|
|
- Impersonating individuals or organizations |
|
|
- Processing sensitive personal data without consent |
|
|
|
|
|
--- |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use Helion-V2 in your research or applications, please cite: |
|
|
|
|
|
```bibtex |
|
|
@misc{helion-v2-2024, |
|
|
title={Helion-V2: An Efficient and Truthful Large Language Model for Daily Use}, |
|
|
author={DeepXR Team}, |
|
|
year={2024},
|
|
month={November}, |
|
|
publisher={HuggingFace}, |
|
|
url={https://huggingface.co/DeepXR/Helion-V2}, |
|
|
note={7.2B parameter decoder-only transformer with grouped query attention} |
|
|
} |
|
|
``` |
|
|
|
|
|
For technical details: |
|
|
|
|
|
```bibtex |
|
|
@techreport{helion-v2-technical-2025, |
|
|
title={Helion-V2: Technical Report}, |
|
|
author={DeepXR Research Team}, |
|
|
institution={DeepXR}, |
|
|
year={2025}, |
|
|
type={Technical Report}, |
|
|
url={https://deepxr.ai/research/helion-v2-technical-report.pdf} |
|
|
} |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## License |
|
|
|
|
|
This model is released under the **Apache License 2.0**. You are free to: |
|
|
|
|
|
- Use commercially |
|
|
- Modify and distribute |
|
|
- Use privately |
|
|
- Use for patent purposes |
|
|
|
|
|
**Conditions:** |
|
|
- Include copyright notice |
|
|
- Include license copy |
|
|
- State changes made |
|
|
- Include NOTICE file if present |
|
|
|
|
|
See [LICENSE](LICENSE) file for complete terms. |
|
|
|
|
|
--- |
|
|
|
|
|
## Acknowledgments |
|
|
|
|
|
We extend our gratitude to: |
|
|
|
|
|
- **Hugging Face** for the Transformers library and model hosting infrastructure |
|
|
- **PyTorch Team** for the deep learning framework |
|
|
- **DeepSpeed Team** (Microsoft) for distributed training tools |
|
|
- **EleutherAI** for evaluation frameworks and benchmarks |
|
|
- **Open Source Community** for datasets, tools, and collaborative research |
|
|
- **Our Compute Partners** for providing GPU infrastructure |
|
|
|
|
|
Special thanks to researchers whose work influenced this project: LLaMA, Mistral, GPT, PaLM, and countless others advancing open language models. |
|
|
|
|
|
--- |
|
|
|
|
|
|
|
|
<div align="center"> |
|
|
|
|
|
**Developed with care by the DeepXR Team** |
|
|
|
|
|
*Building responsible, capable, and accessible AI for everyone* |
|
|
|
|
|
</div> |