---
base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: peft
license: apache-2.0
language:
- en
tags:
- trading
- finance
- hyperliquid
- perpetuals
- defi
- lora
- dpo
- sft
- trl
- base_model:adapter:Qwen/Qwen3-4B-Instruct-2507
model_name: HyperLLM-4b
pipeline_tag: text-generation
---

# HyperLLM-4b v0.3

A specialized 4B-parameter language model fine-tuned for Hyperliquid perpetual DEX trading assistance, built on Qwen3-4B-Instruct with LoRA + DPO training.

## Model Description

HyperLLM is designed to assist with:

- **Position sizing calculations** - Risk-based position sizing with proper decimal handling
- **API structure understanding** - Hyperliquid exchange API request/response formats
- **Trading mechanics** - Perpetual futures concepts, margin modes, order types
- **Parameter validation** - Validating trade parameters against exchange constraints
- **Edge case handling** - Boundary conditions and unusual trading scenarios

## Version History

### v0.3 (Current - March 6, 2026)

**Training Pipeline:** SFT (7,028 examples) + DPO (1,400 preference pairs)

| Change | v0.2 | v0.3 | Impact |
|--------|------|------|--------|
| Learning Rate | 3e-5 | 1e-5 | Reduced catastrophic forgetting |
| Quantization | QLoRA 4-bit | Full LoRA | Better quality on A100 |
| General Data Mix | 10% | 25% | Preserved general capabilities |
| Training Stage | SFT only | SFT + DPO | Targeted behavioral fixes |
| Eval Questions | 297 | 337 | More comprehensive testing |

**Key Improvements over v0.2:**

- Recovered parameter validation: 73.3% → **93.3%** (+20.0%)
- Recovered edge cases: 75.0% → **92.5%** (+17.5%)
- Improved adversarial handling: 36.9% → **59.0%** (+22.1%)
- Improved general capability: 83.6% → **90.9%** (+7.3%)
- Modest API structure gain: 42.5% → **44.2%** (+1.7%)

### v0.2 (March 4, 2026)

**Training Pipeline:** QLoRA SFT only

| Metric | Baseline | v0.2 | Change |
|--------|----------|------|--------|
| Overall | 70.2% | 65.0% | -5.2% |
| Factual Knowledge | 33.3% | **80.0%** | **+46.7%** |
| Parameter Validation | 93.3% | 73.3% | -20.0% |
| Edge Cases | 92.5% | 75.0% | -17.5% |

**Issues:** Catastrophic forgetting caused regressions in safety-critical categories despite large factual-knowledge gains.

### v0.1 (February 28, 2026)

**Training Pipeline:** QLoRA SFT (1,823 examples)

| Metric | Baseline | v0.1 | Change |
|--------|----------|------|--------|
| Overall | 36.0% | **64.0%** | **+28%** |
| Factual Knowledge | 20.0% | **70.0%** | **+50%** |
| API Structure | 16.7% | **50.0%** | **+33%** |

**Issues:** Small eval set (25 questions); parameter validation regressed.

## Evaluation Results (v0.3)

Evaluated on 337 questions across 9 categories:

*Note: Results updated March 6, 2026 after fixing an eval extraction bug that extracted restated question values instead of computed answers.*

| Category | Baseline | v0.3 | Change |
|----------|----------|------|--------|
| Parameter Validation | 93.3% | **93.3%** | Maintained |
| Edge Cases | 95.0% | **92.5%** | -2.5% |
| General Capability | 89.1% | **90.9%** | +1.8% |
| Position Sizing | 83.3% | **88.3%** | **+5.0%** |
| Trading Mechanics | 80.0% | **80.0%** | Maintained |
| Adversarial % | 57.0% | **59.0%** | **+2.0%** |
| Multi-step | 43.0% | **39.3%** | -3.7% |
| API Structure | 27.5% | **44.2%** | **+16.7%** |
| Factual | 26.7% | **40.0%** | **+13.3%** |
| **Overall** | **70.1%** | **72.4%** | **+2.3%** |

## Training Configuration

### LoRA Parameters

```python
{
    "r": 64,
    "lora_alpha": 128,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    "use_rslora": True
}
```
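
If helpful, the dict above maps directly onto `peft.LoraConfig`; a minimal sketch, assuming `peft` is installed (the `task_type` value is an assumption for causal-LM fine-tuning and is not stated in the card):

```python
from peft import LoraConfig

# Mirrors the adapter settings listed above; task_type is an assumption.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    use_rslora=True,
    task_type="CAUSAL_LM",
)
```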

### SFT Hyperparameters

```python
{
    "learning_rate": 1e-5,
    "epochs": 5,  # Early stopped at 1.52
    "batch_size": 4,
    "gradient_accumulation_steps": 2,
    "warmup_ratio": 0.10,
    "max_length": 4096
}
```

### DPO Hyperparameters

```python
{
    "beta": 0.1,
    "learning_rate": 5e-7,
    "epochs": 2,
    "batch_size": 4,
    "max_length": 2048
}
```
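
These settings correspond roughly to a `trl.DPOConfig`; a hypothetical sketch under the assumption of TRL's standard `DPOTrainer` API (the `output_dir` value is illustrative, not from the card):

```python
from trl import DPOConfig

# Hypothetical mapping of the hyperparameters listed above.
dpo_config = DPOConfig(
    output_dir="hyperllm-dpo",      # assumption: any local path works
    beta=0.1,
    learning_rate=5e-7,
    num_train_epochs=2,
    per_device_train_batch_size=4,
    max_length=2048,
)
```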

### Training Data Distribution

**SFT (7,028 examples):**

| Category | Examples | % |
|----------|----------|---|
| General Instruction | 1,500 | 21.3% |
| Position Sizing | 800 | 11.4% |
| Parameter Validation | 800 | 11.4% |
| Adversarial Percentages | 600 | 8.5% |
| Multi-step Reasoning | 500 | 7.1% |
| Edge Cases | 400 | 5.7% |
| API Examples | 400 | 5.7% |
| Knowledge Q&A | 373 | 5.3% |
| Other | 1,655 | 23.6% |

**DPO (1,400 preference pairs):**

| Failure Mode | Pairs | % |
|--------------|-------|---|
| Excessive Leverage | 370 | 26.4% |
| Position Sizing | 330 | 23.6% |
| Percentage Confusion | 226 | 16.1% |
| Risk Violation | 195 | 13.9% |
| Policy Bypass | 140 | 10.0% |
| Uncertainty Caution | 139 | 9.9% |
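
As a quick consistency check, the category counts in both tables sum exactly to the stated totals:

```python
# SFT example counts from the table above
sft_counts = [1500, 800, 800, 600, 500, 400, 400, 373, 1655]
assert sum(sft_counts) == 7028

# DPO preference-pair counts from the table above
dpo_counts = [370, 330, 226, 195, 140, 139]
assert sum(dpo_counts) == 1400
```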

## Usage

### With Transformers + PEFT

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "UVLabs/HyperLLM-4b")
tokenizer = AutoTokenizer.from_pretrained("UVLabs/HyperLLM-4b")

# Generate
messages = [{"role": "user", "content": "Calculate position size for $10,000 account, 2% risk, entry $50, stop loss $48"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### With 4-bit Quantization (Low VRAM)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507",
    quantization_config=bnb_config,
    device_map="auto",
)

model = PeftModel.from_pretrained(base_model, "UVLabs/HyperLLM-4b")
tokenizer = AutoTokenizer.from_pretrained("UVLabs/HyperLLM-4b")
```

## Example Use Cases

### Position Sizing

```
User: I have a $50,000 account and want to risk 2% on an ETH trade.
Entry at $3,450, stop loss at $3,400. What's my position size?

Model: Risk amount = $50,000 * 2% = $1,000
Stop distance = $3,450 - $3,400 = $50
Position size = $1,000 / $50 = 20 ETH
```
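
The arithmetic above generalizes to a small helper (a hypothetical function for illustration, not part of the model or the Hyperliquid API):

```python
def position_size(account: float, risk_pct: float, entry: float, stop: float) -> float:
    """Risk-based sizing: units = (account * risk_pct) / |entry - stop|."""
    risk_amount = account * risk_pct
    stop_distance = abs(entry - stop)
    if stop_distance == 0:
        raise ValueError("entry and stop loss must differ")
    return risk_amount / stop_distance

print(position_size(50_000, 0.02, 3_450, 3_400))  # 20.0, matching the example above
```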

### Parameter Validation

```
User: Is a position size of 1.234 valid for ETH (szDecimals=2)?

Model: No, 1.234 has 3 decimal places but ETH only allows 2 (szDecimals=2).
Valid sizes would be 1.23 or 1.24.
```
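
The decimal-place check can be sketched with Python's `decimal` module (hypothetical helper; the `szDecimals` field name follows the example above):

```python
from decimal import Decimal

def valid_sz(size: str, sz_decimals: int) -> bool:
    """True if `size` has at most `sz_decimals` decimal places (finite values only)."""
    exponent = Decimal(size).normalize().as_tuple().exponent
    return -exponent <= sz_decimals

print(valid_sz("1.234", 2))  # False: three decimal places
print(valid_sz("1.23", 2))   # True
```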

### Adversarial Percentage Handling

```
User: Increase my position by 50%

Model: I need clarification. "Increase by 50%" could mean:
1. Add 50% of current position (1.5x total)
2. Increase to 50% of account (absolute)
Which do you mean?
```
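
The gap between the two readings can be made concrete (hypothetical helper; function and argument names are illustrative):

```python
def increase_interpretations(position_usd: float, account_usd: float, pct: float):
    """Return both readings of 'increase my position by pct%', in USD notional."""
    relative = position_usd * (1 + pct / 100)   # add pct% of the current position
    absolute = account_usd * (pct / 100)        # move to pct% of the account
    return relative, absolute

# A $10,000 position in a $50,000 account: "increase by 50%" could target
# $15,000 (relative) or $25,000 (absolute) -- very different trades.
print(increase_interpretations(10_000, 50_000, 50))  # (15000.0, 25000.0)
```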

## Limitations

- **Multi-step Reasoning:** 39.3% accuracy; complex multi-step calculations remain challenging for a 4B model
- **API Structure:** 44.2% accuracy; improved, but exact JSON field names still need work
- **Adversarial %:** 59.0% accuracy; better handling, but still susceptible to tricky percentage phrasing

## Hardware Requirements

| Mode | VRAM | Notes |
|------|------|-------|
| bfloat16 | ~10GB | Half-precision (bf16) inference |
| 4-bit | ~4GB | Quantized inference |
| 8-bit | ~6GB | INT8 quantization |

## Training Hardware

- **Hardware:** NVIDIA A100 80GB SXM
- **SFT Duration:** ~20 minutes
- **DPO Duration:** ~17 minutes
- **Total Cost:** ~$1.50 (RunPod)

## Framework Versions

- PEFT: 0.18.1
- TRL: 0.29.0
- Transformers: 5.2.0
- PyTorch: 2.10.0

## License

Apache 2.0

## Citation

```bibtex
@misc{hyperllm2026,
  title={HyperLLM: A Specialized LLM for Hyperliquid Trading},
  author={UVLabs},
  year={2026},
  url={https://huggingface.co/UVLabs/HyperLLM-4b}
}
```