HyperLLM-4b v0.3

A specialized 4B parameter language model fine-tuned for Hyperliquid perpetual DEX trading assistance. Built on Qwen3-4B-Instruct using LoRA + DPO training.

Model Description

HyperLLM is designed to assist with:

  • Position sizing calculations - Risk-based position sizing with proper decimal handling
  • API structure understanding - Hyperliquid exchange API request/response formats
  • Trading mechanics - Perpetual futures concepts, margin modes, order types
  • Parameter validation - Validating trade parameters against exchange constraints
  • Edge case handling - Boundary conditions and unusual trading scenarios

Version History

v0.3 (Current - March 6, 2026)

Training Pipeline: SFT (7,028 examples) + DPO (1,400 preference pairs)

Change             v0.2          v0.3        Impact
Learning Rate      3e-5          1e-5        Reduced catastrophic forgetting
Quantization       QLoRA 4-bit   Full LoRA   Better quality on A100
General Data Mix   10%           25%         Preserved general capabilities
Training Stage     SFT only      SFT + DPO   Targeted behavioral fixes
Eval Questions     297           337         More comprehensive testing

Key Improvements over v0.2:

  • Recovered parameter validation: 73.3% → 93.3% (+20.0%)
  • Recovered edge cases: 75.0% → 92.5% (+17.5%)
  • Improved adversarial handling: 36.9% → 59.0% (+22.1%)
  • Improved general capability: 83.6% → 90.9% (+7.3%)
  • Modest API structure gain: 42.5% → 44.2% (+1.7%)

v0.2 (March 4, 2026)

Training Pipeline: QLoRA SFT only

Metric                 Baseline   v0.2     Change
Overall                70.2%      65.0%    -5.2%
Factual Knowledge      33.3%      80.0%    +46.7%
Parameter Validation   93.3%      73.3%    -20.0%
Edge Cases             92.5%      75.0%    -17.5%

Issues: Catastrophic forgetting caused regressions in safety-critical categories despite massive factual knowledge gains.

v0.1 (February 28, 2026)

Training Pipeline: QLoRA SFT (1,823 examples)

Metric              Baseline   v0.1     Change
Overall             36.0%      64.0%    +28.0%
Factual Knowledge   20.0%      70.0%    +50.0%
API Structure       16.7%      50.0%    +33.3%

Issues: Small eval set (25 questions), parameter validation regressed.

Evaluation Results (v0.3)

Evaluated on 337 questions across 9 categories:

Note: Results updated March 6, 2026 after fixing an eval extraction bug that was extracting restated question values instead of computed answers.

Category               Baseline   v0.3     Change
Parameter Validation   93.3%      93.3%    Maintained
Edge Cases             95.0%      92.5%    -2.5%
General Capability     89.1%      90.9%    +1.8%
Position Sizing        83.3%      88.3%    +5.0%
Trading Mechanics      80.0%      80.0%    Maintained
Adversarial %          57.0%      59.0%    +2.0%
Multi-step             43.0%      39.3%    -3.7%
API Structure          27.5%      44.2%    +16.7%
Factual                26.7%      40.0%    +13.3%
Overall                70.1%      72.4%    +2.3%

Training Configuration

LoRA Parameters

```python
{
    "r": 64,
    "lora_alpha": 128,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    "use_rslora": True
}
```
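As a side note, `use_rslora` switches the LoRA scaling factor from `lora_alpha / r` to `lora_alpha / sqrt(r)` (rank-stabilized LoRA), which keeps the effective update magnitude stable at higher ranks. A quick sketch with this config's values:

```python
import math

r, lora_alpha = 64, 128

# Standard LoRA scales the adapter update by alpha / r
standard_scale = lora_alpha / r            # 2.0

# rsLoRA (use_rslora=True) scales by alpha / sqrt(r) instead,
# so the scale does not shrink as aggressively as the rank grows
rslora_scale = lora_alpha / math.sqrt(r)   # 16.0

print(standard_scale, rslora_scale)
```

At r=64 the rsLoRA scale is 8× larger, which is why alpha values tuned for standard LoRA usually need no re-tuning here.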

SFT Hyperparameters

```python
{
    "learning_rate": 1e-5,
    "epochs": 5,  # early stopping triggered at epoch 1.52
    "batch_size": 4,
    "gradient_accumulation_steps": 2,
    "warmup_ratio": 0.10,
    "max_length": 4096
}
```

DPO Hyperparameters

```python
{
    "beta": 0.1,
    "learning_rate": 5e-7,
    "epochs": 2,
    "batch_size": 4,
    "max_length": 2048
}
```
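For reference, the per-example DPO objective these hyperparameters feed into can be sketched in plain Python (the log-probabilities below are illustrative placeholders, not real model outputs):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)),
    where each margin is the policy's log-prob gain over the reference."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# When the policy prefers the chosen response more than the reference does,
# the loss falls below log(2) (~0.693), its value at zero margin.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

A small `beta` (0.1 here) keeps the policy close to the SFT reference, which is why DPO could fix targeted behaviors without re-triggering the forgetting seen in v0.2.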

Training Data Distribution

SFT (7,028 examples):

Category                  Examples   %
General Instruction       1,500      21.3%
Position Sizing           800        11.4%
Parameter Validation      800        11.4%
Adversarial Percentages   600        8.5%
Multi-step Reasoning      500        7.1%
Edge Cases                400        5.7%
API Examples              400        5.7%
Knowledge Q&A             373        5.3%
Other                     1,655      23.6%

DPO (1,400 preference pairs):

Failure Mode           Pairs   %
Excessive Leverage     370     26.4%
Position Sizing        330     23.6%
Percentage Confusion   226     16.1%
Risk Violation         195     13.9%
Policy Bypass          140     10.0%
Uncertainty Caution    139     9.9%

Usage

With Transformers + PEFT

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "UVLabs/HyperLLM-4b")
tokenizer = AutoTokenizer.from_pretrained("UVLabs/HyperLLM-4b")

# Generate (add_generation_prompt appends the assistant turn header)
messages = [{"role": "user", "content": "Calculate position size for $10,000 account, 2% risk, entry $50, stop loss $48"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

With 4-bit Quantization (Low VRAM)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507",
    quantization_config=bnb_config,
    device_map="auto",
)

model = PeftModel.from_pretrained(base_model, "UVLabs/HyperLLM-4b")
tokenizer = AutoTokenizer.from_pretrained("UVLabs/HyperLLM-4b")
```

Example Use Cases

Position Sizing

User: I have a $50,000 account and want to risk 2% on an ETH trade.
      Entry at $3,450, stop loss at $3,400. What's my position size?

Model: Risk amount = $50,000 * 2% = $1,000
       Stop distance = $3,450 - $3,400 = $50
       Position size = $1,000 / $50 = 20 ETH
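The calculation above follows the standard risk-based sizing formula (risk amount divided by stop distance); a minimal sketch, with an illustrative function name:

```python
def position_size(account: float, risk_pct: float, entry: float, stop: float) -> float:
    """Units to trade so that hitting the stop loses exactly risk_pct of the account."""
    risk_amount = account * risk_pct     # dollars at risk
    stop_distance = abs(entry - stop)    # dollars lost per unit if stopped out
    return risk_amount / stop_distance

print(position_size(50_000, 0.02, 3_450, 3_400))  # → 20.0 (ETH)
```

Note the result would still need rounding to the asset's szDecimals before it is a valid order size.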

Parameter Validation

User: Is a position size of 1.234 valid for ETH (szDecimals=2)?

Model: No, 1.234 has 3 decimal places but ETH only allows 2 (szDecimals=2).
       Valid sizes would be 1.23 or 1.24.
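A check like this can be sketched with Python's Decimal to avoid float artifacts (the helper name is illustrative, not part of any Hyperliquid SDK):

```python
from decimal import Decimal

def is_valid_size(size: str, sz_decimals: int) -> bool:
    """True if `size` has no more than sz_decimals decimal places."""
    # normalize() strips trailing zeros so "1.20" counts as 1 decimal place
    exponent = Decimal(size).normalize().as_tuple().exponent
    return -exponent <= sz_decimals

print(is_valid_size("1.234", 2))  # → False
print(is_valid_size("1.23", 2))   # → True
```

Passing the size as a string matters: `Decimal(1.234)` would inherit the binary-float representation error.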

Adversarial Percentage Handling

User: Increase my position by 50%

Model: I need clarification. "Increase by 50%" could mean:
       1. Add 50% of current position (1.5x total)
       2. Increase to 50% of account (absolute)
       Which do you mean?
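The two readings diverge quickly, which is why the model asks rather than guessing. In numbers (account and position values are illustrative):

```python
account_value = 100_000.0
current_position = 20_000.0  # current notional

# Reading 1: add 50% of the current position (relative increase)
relative = current_position * 1.5      # 30,000

# Reading 2: grow the position to 50% of the account (absolute target)
absolute = account_value * 0.50        # 50,000

print(relative, absolute)  # the two interpretations differ by $20,000 of notional
```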

Limitations

  • Multi-step Reasoning: 39.3% accuracy - complex multi-step calculations remain challenging for a 4B-parameter model
  • API Structure: 44.2% accuracy - improved, but exact JSON field names still need work
  • Adversarial %: 59.0% accuracy - better handling, but still susceptible to ambiguous percentage phrasing

Hardware Requirements

Mode       VRAM    Notes
bfloat16   ~10GB   Native bf16 inference
4-bit      ~4GB    NF4 quantized inference
8-bit      ~6GB    INT8 quantization

Training Hardware

  • Hardware: NVIDIA A100 80GB SXM
  • SFT Duration: ~20 minutes
  • DPO Duration: ~17 minutes
  • Total Cost: ~$1.50 (RunPod)

Framework Versions

  • PEFT: 0.18.1
  • TRL: 0.29.0
  • Transformers: 5.2.0
  • PyTorch: 2.10.0

License

Apache 2.0

Citation

```bibtex
@misc{hyperllm2026,
  title={HyperLLM: A Specialized LLM for Hyperliquid Trading},
  author={UVLabs},
  year={2026},
  url={https://huggingface.co/UVLabs/HyperLLM-4b}
}
```