---
base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: peft
license: apache-2.0
language:
- en
tags:
- trading
- finance
- hyperliquid
- perpetuals
- defi
- lora
- dpo
- sft
- trl
- base_model:adapter:Qwen/Qwen3-4B-Instruct-2507
model_name: HyperLLM-4b
pipeline_tag: text-generation
---
# HyperLLM-4b v0.3
A specialized 4B-parameter language model fine-tuned for Hyperliquid perpetual DEX trading assistance. Built on Qwen3-4B-Instruct-2507 as a LoRA adapter trained with SFT followed by DPO.
## Model Description
HyperLLM is designed to assist with:
- **Position sizing calculations** - Risk-based position sizing with proper decimal handling
- **API structure understanding** - Hyperliquid exchange API request/response formats
- **Trading mechanics** - Perpetual futures concepts, margin modes, order types
- **Parameter validation** - Validating trade parameters against exchange constraints
- **Edge case handling** - Boundary conditions and unusual trading scenarios
## Version History
### v0.3 (Current - March 6, 2026)
**Training Pipeline:** SFT (7,028 examples) + DPO (1,400 preference pairs)
| Change | v0.2 | v0.3 | Impact |
|--------|------|------|--------|
| Learning Rate | 3e-5 | 1e-5 | Reduced catastrophic forgetting |
| Quantization | QLoRA 4-bit | Full LoRA | Better quality on A100 |
| General Data Mix | 10% | 25% | Preserved general capabilities |
| Training Stage | SFT only | SFT + DPO | Targeted behavioral fixes |
| Eval Questions | 297 | 337 | More comprehensive testing |
**Key Improvements over v0.2:**
- Recovered parameter validation: 73.3% → **93.3%** (+20%)
- Recovered edge cases: 75.0% → **92.5%** (+17.5%)
- Improved adversarial handling: 36.9% → **59.0%** (+22.1%)
- Improved general capability: 83.6% → **90.9%** (+7.3%)
- Small API structure gain over v0.2: 42.5% → **44.2%** (+1.7%); the larger gain is against the baseline (27.5% → 44.2%, +16.7%)
### v0.2 (March 4, 2026)
**Training Pipeline:** QLoRA SFT only
| Metric | Baseline | v0.2 | Change |
|--------|----------|------|--------|
| Overall | 70.2% | 65.0% | -5.2% |
| Factual Knowledge | 33.3% | **80.0%** | **+46.7%** |
| Parameter Validation | 93.3% | 73.3% | -20.0% |
| Edge Cases | 92.5% | 75.0% | -17.5% |
**Issues:** Catastrophic forgetting caused regressions in safety-critical categories despite massive factual knowledge gains.
### v0.1 (February 28, 2026)
**Training Pipeline:** QLoRA SFT (1,823 examples)
| Metric | Baseline | v0.1 | Change |
|--------|----------|------|--------|
| Overall | 36.0% | **64.0%** | **+28%** |
| Factual Knowledge | 20.0% | **70.0%** | **+50%** |
| API Structure | 16.7% | **50.0%** | **+33%** |
**Issues:** Small eval set (25 questions), parameter validation regressed.
## Evaluation Results (v0.3)
Evaluated on 337 questions across 9 categories:
*Note: Results updated March 6, 2026 after fixing a bug in the eval answer extraction, which had been picking up values restated from the question instead of the computed answers.*
| Category | Baseline | v0.3 | Change |
|----------|----------|------|--------|
| Parameter Validation | 93.3% | **93.3%** | Maintained |
| Edge Cases | 95.0% | **92.5%** | -2.5% |
| General Capability | 89.1% | **90.9%** | +1.8% |
| Position Sizing | 83.3% | **88.3%** | **+5.0%** |
| Trading Mechanics | 80.0% | **80.0%** | Maintained |
| Adversarial % | 57.0% | **59.0%** | **+2.0%** |
| Multi-step | 43.0% | **39.3%** | -3.7% |
| API Structure | 27.5% | **44.2%** | **+16.7%** |
| Factual | 26.7% | **40.0%** | **+13.3%** |
| **Overall** | **70.1%** | **72.4%** | **+2.3%** |
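The extraction bug mentioned in the note above is easy to reproduce in miniature. The sketch below is a hypothetical extractor, not the project's eval code: it shows how taking the first number in a response picks up a value restated from the question, while taking the last number lands on the computed answer. The function name and the "last number wins" heuristic are assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical extractor for illustration only; not the project's eval code.
# Matches integers and decimals, allowing thousands separators (e.g. 50,000).
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_number(response: str) -> Optional[float]:
    """Return the last number in the response, or None if there is none."""
    matches = _NUMBER.findall(response)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


response = (
    "Risk amount = $50,000 * 2% = $1,000\n"
    "Stop distance = $3,450 - $3,400 = $50\n"
    "Position size = $1,000 / $50 = 20 ETH"
)

# A first-match extractor would return the restated account size.
first_number = float(_NUMBER.findall(response)[0].replace(",", ""))
print(first_number)                    # 50000.0
print(extract_final_number(response))  # 20.0
```

Taking the last number is itself only a heuristic; a response that ends with a restated input would still be scored incorrectly.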
## Training Configuration
### LoRA Parameters
```python
{
    "r": 64,
    "lora_alpha": 128,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "use_rslora": True,
}
```
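To make the effect of `use_rslora` concrete: PEFT documents that rank-stabilized LoRA scales the adapter output by `lora_alpha / sqrt(r)` rather than the classic `lora_alpha / r`. A minimal calculation with the values above shows how much larger the effective scaling is at rank 64.

```python
import math

r = 64
lora_alpha = 128

# Classic LoRA scaling: alpha / r
standard_scaling = lora_alpha / r
# Rank-stabilized LoRA scaling: alpha / sqrt(r)
rslora_scaling = lora_alpha / math.sqrt(r)

print(standard_scaling)  # 2.0
print(rslora_scaling)    # 16.0
```

The larger scaling factor is one reason a conservative learning rate can make sense with rsLoRA at higher ranks.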
### SFT Hyperparameters
```python
{
    "learning_rate": 1e-5,
    "epochs": 5,  # Early stopped at 1.52
    "batch_size": 4,
    "gradient_accumulation_steps": 2,
    "warmup_ratio": 0.10,
    "max_length": 4096,
}
```
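As a rough sanity check on these settings, the effective batch size and optimizer step counts can be estimated as below. This is a back-of-the-envelope sketch that assumes a single GPU and that all 7,028 SFT examples are used for training with no held-out split; neither assumption is stated in the configuration.

```python
import math

sft_examples = 7028               # SFT example count reported for v0.3
batch_size = 4
gradient_accumulation_steps = 2
num_gpus = 1                      # assumption: single-GPU training

# Examples consumed per optimizer step
effective_batch = batch_size * gradient_accumulation_steps * num_gpus
steps_per_epoch = math.ceil(sft_examples / effective_batch)
# Early stopping was reported at epoch 1.52
steps_at_early_stop = round(1.52 * steps_per_epoch)

print(effective_batch)      # 8
print(steps_per_epoch)      # 879
print(steps_at_early_stop)  # 1336
```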
### DPO Hyperparameters
```python
{
    "beta": 0.1,
    "learning_rate": 5e-7,
    "epochs": 2,
    "batch_size": 4,
    "max_length": 2048,
}
```
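For readers unfamiliar with the role of `beta`, here is a minimal sketch of the standard sigmoid DPO objective for a single preference pair. It assumes the default sigmoid loss; the configuration above does not state which loss variant was used, and the log-probability values are made up for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard sigmoid DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the policy and the frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Policy identical to the reference: loss is log(2)
neutral = dpo_loss(-10.0, -10.0, -10.0, -10.0)
# Policy favors the chosen response more than the reference does: lower loss
improved = dpo_loss(-8.0, -12.0, -10.0, -10.0)
print(neutral > improved)  # True
```

A small `beta` such as 0.1 keeps the margin term gentle, so the policy is only lightly pushed away from the reference model.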
### Training Data Distribution
**SFT (7,028 examples):**
| Category | Examples | % |
|----------|----------|---|
| General Instruction | 1,500 | 21.3% |
| Position Sizing | 800 | 11.4% |
| Parameter Validation | 800 | 11.4% |
| Adversarial Percentages | 600 | 8.5% |
| Multi-step Reasoning | 500 | 7.1% |
| Edge Cases | 400 | 5.7% |
| API Examples | 400 | 5.7% |
| Knowledge Q&A | 373 | 5.3% |
| Other | 1,655 | 23.6% |
**DPO (1,400 preference pairs):**
| Failure Mode | Pairs | % |
|--------------|-------|---|
| Excessive Leverage | 370 | 26.4% |
| Position Sizing | 330 | 23.6% |
| Percentage Confusion | 226 | 16.1% |
| Risk Violation | 195 | 13.9% |
| Policy Bypass | 140 | 10.0% |
| Uncertainty Caution | 139 | 9.9% |
## Usage
### With Transformers + PEFT
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "UVLabs/HyperLLM-4b")
tokenizer = AutoTokenizer.from_pretrained("UVLabs/HyperLLM-4b")

# Generate
messages = [{"role": "user", "content": "Calculate position size for $10,000 account, 2% risk, entry $50, stop loss $48"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn so the model answers
    return_dict=True,            # return input_ids and attention_mask together
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### With 4-bit Quantization (Low VRAM)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# NF4 4-bit quantization with bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the quantized base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507",
    quantization_config=bnb_config,
    device_map="auto",
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "UVLabs/HyperLLM-4b")
tokenizer = AutoTokenizer.from_pretrained("UVLabs/HyperLLM-4b")
```
## Example Use Cases
### Position Sizing
```
User: I have a $50,000 account and want to risk 2% on an ETH trade.
Entry at $3,450, stop loss at $3,400. What's my position size?
Model: Risk amount = $50,000 * 2% = $1,000
Stop distance = $3,450 - $3,400 = $50
Position size = $1,000 / $50 = 20 ETH
```
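The arithmetic in this exchange can be checked with a small helper. This is a sketch, not part of the model or the Hyperliquid API: the function name is hypothetical, and rounding the size down to `sz_decimals` places (so the risk budget is never exceeded) is an assumed design choice.

```python
from decimal import Decimal, ROUND_DOWN


def position_size(account, risk_pct, entry, stop, sz_decimals):
    """Risk-based position size, rounded down to sz_decimals places.

    Hypothetical helper: pass numeric inputs as strings to avoid
    binary floating-point error.
    """
    risk_amount = Decimal(account) * Decimal(risk_pct) / Decimal(100)
    stop_distance = abs(Decimal(entry) - Decimal(stop))
    if stop_distance == 0:
        raise ValueError("entry and stop must differ")
    raw_size = risk_amount / stop_distance
    quantum = Decimal(1).scaleb(-sz_decimals)  # e.g. 0.01 for 2 decimals
    return raw_size.quantize(quantum, rounding=ROUND_DOWN)


# $50,000 account, 2% risk, entry $3,450, stop $3,400
print(position_size("50000", "2", "3450", "3400", 2))  # 20.00
```

Using `Decimal` instead of `float` keeps the result exact at the precision the exchange accepts.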
### Parameter Validation
```
User: Is a position size of 1.234 valid for ETH (szDecimals=2)?
Model: No, 1.234 has 3 decimal places but ETH only allows 2 (szDecimals=2).
Valid sizes would be 1.23 or 1.24.
```
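The same decimal-place check can be written deterministically. The helper below is a hypothetical sketch that covers only the `szDecimals` rule from the example; other exchange constraints are out of scope, and it assumes positive sizes.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_UP


def check_size(size: str, sz_decimals: int):
    """Return (is_valid, nearest_lower, nearest_upper) for a positive size.

    Hypothetical helper: only checks the number of decimal places.
    """
    value = Decimal(size)
    # Count significant decimal places, ignoring trailing zeros
    places = max(0, -value.normalize().as_tuple().exponent)
    quantum = Decimal(1).scaleb(-sz_decimals)
    lower = value.quantize(quantum, rounding=ROUND_DOWN)
    upper = value.quantize(quantum, rounding=ROUND_UP)
    return places <= sz_decimals, lower, upper


print(check_size("1.234", 2))  # (False, Decimal('1.23'), Decimal('1.24'))
```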
### Adversarial Percentage Handling
```
User: Increase my position by 50%
Model: I need clarification. "Increase by 50%" could mean:
1. Add 50% of current position (1.5x total)
2. Increase to 50% of account (absolute)
Which do you mean?
```
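The two readings in this exchange lead to very different targets, which is why the clarification matters. The sketch below computes both for made-up numbers; the function name, the dictionary keys, and the account figures are all hypothetical.

```python
def interpret_increase(current_notional: float, account_value: float, pct: float) -> dict:
    """Compute the target position under both readings of 'increase by pct%'."""
    return {
        # Reading 1: grow the position by pct% of its current size
        "relative_to_position": current_notional * (1 + pct / 100),
        # Reading 2: set the position to pct% of the account value
        "absolute_of_account": account_value * pct / 100,
    }


# Hypothetical: $10,000 position in a $50,000 account
targets = interpret_increase(10_000, 50_000, 50)
print(targets["relative_to_position"])  # 15000.0
print(targets["absolute_of_account"])   # 25000.0
```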
## Limitations
- **Multi-step Reasoning:** 39.3% accuracy - complex multi-step calculations remain challenging for a 4B model, and this category regressed from the 43.0% baseline
- **API Structure:** 44.2% accuracy - improved, but exact JSON field names are still often wrong
- **Adversarial %:** 59.0% accuracy - handling is better, but the model is still susceptible to tricky percentage phrasing
## Hardware Requirements
| Mode | VRAM | Notes |
|------|------|-------|
| bfloat16 | ~10GB | Unquantized 16-bit inference |
| 4-bit | ~4GB | Quantized inference |
| 8-bit | ~6GB | INT8 quantization |
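
These figures are roughly consistent with a weights-only estimate plus overhead. The sketch below assumes about 4 billion base-model parameters and ignores the LoRA adapter, activations, the KV cache, and CUDA context, which would account for the remaining gap to the numbers in the table.

```python
PARAMS = 4.0e9  # assumption: roughly 4 billion base-model parameters

# Bytes needed to store one parameter in each mode
BYTES_PER_PARAM = {"bfloat16": 2.0, "8-bit": 1.0, "4-bit": 0.5}


def weight_gb(mode: str) -> float:
    """Approximate memory for the model weights alone, in GB."""
    return PARAMS * BYTES_PER_PARAM[mode] / 1e9


for mode in BYTES_PER_PARAM:
    print(mode, weight_gb(mode))
# bfloat16 8.0
# 8-bit 4.0
# 4-bit 2.0
```

Actual usage grows with sequence length and batch size, so treat the table as a floor for short prompts.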
## Training Hardware
- **Hardware:** NVIDIA A100 80GB SXM
- **SFT Duration:** ~20 minutes
- **DPO Duration:** ~17 minutes
- **Total Cost:** ~$1.50 (RunPod)
## Framework Versions
- PEFT: 0.18.1
- TRL: 0.29.0
- Transformers: 5.2.0
- PyTorch: 2.10.0
## License
Apache 2.0
## Citation
```bibtex
@misc{hyperllm2026,
title={HyperLLM: A Specialized LLM for Hyperliquid Trading},
author={UVLabs},
year={2026},
url={https://huggingface.co/UVLabs/HyperLLM-4b}
}
```