Upload HyperLLM v0.3 - SFT+DPO trained model

Files changed (6)

README.md +221 -131
adapter_config.json +10 -3
adapter_model.safetensors +1 -1
tokenizer.json +2 -2
tokenizer_config.json +7 -217
training_args.bin +2 -2

README.md CHANGED

@@ -1,110 +1,196 @@
 ---
 base_model: Qwen/Qwen3-4B-Instruct-2507
 library_name: peft
-license: mit
 language:
-- en
 tags:
-- trading
-- hyperliquid
-- perpetuals
-- defi
-- lora
-- qlora
-datasets:
-- custom
 pipeline_tag: text-generation
 ---
-# HyperLLM-4b v0.2
-A specialized trading assistant fine-tuned for [Hyperliquid](https://hyperliquid.xyz), a perpetual futures DEX. Built on Qwen3-4B-Instruct using QLoRA.
 ## Model Description
-HyperLLM is designed to assist with Hyperliquid perpetual trading tasks including:
-- Position sizing calculations with proper risk management
-- Hyperliquid API request/response formatting
-- Parameter validation for trades
-- Hyperliquid-specific knowledge (order types, leverage limits, API endpoints)
-**This is a LoRA adapter** - you need to load it on top of the base model.
-## What's New in v0.2 (vs v0.1)
-| Change | v0.1 | v0.2 |
-|--------|------|------|
-| **Hardware** | Local consumer GPU | A100 80GB (RunPod) |
-| **Max Sequence Length** | 2048 | 4096 |
-| **Batch Size** | 1 | 4 |
-| **rsLoRA** | No | Yes |
-| **Flash Attention** | No | Yes |
-| **Early Stopping** | No | Yes (patience=3) |
-| **Training Precision** | fp16 | bf16 |
-| **Evaluation** | Basic | Comprehensive (297 questions) |
-### Key Improvements
-- **+46.7% factual knowledge**: Hyperliquid-specific facts improved from 33.3% → 80.0%
-- **+6.7% API structure**: Better at formatting Hyperliquid API requests
-- **+3.3% position sizing**: Core trading calculation improvements
-- **Longer context**: 4096 tokens vs 2048 for complex multi-step reasoning
-- **rsLoRA**: Rank-stabilized LoRA for better training stability
-### Known Regressions
-v0.2 exhibits some catastrophic forgetting compared to the base model:
-- Parameter validation: -20% (73.3% vs 93.3% baseline)
-- Edge case handling: -17.5% (75.0% vs 92.5% baseline)
-- Adversarial percentage questions: -12.5% (36.9% vs 49.4% baseline)
-These will be addressed in v0.3 with replay data and DPO training.
-## Training Details
-| Parameter | Value |
-|-----------|-------|
-| Base Model | Qwen/Qwen3-4B-Instruct-2507 |
-| LoRA Rank | 64 |
-| LoRA Alpha | 128 |
-| Dropout | 0.05 |
-| Learning Rate | 3e-5 |
-| Effective Batch Size | 8 |
-| Training Loss | 0.159 |
-| Token Accuracy | 95.5% |
-| Training Time | 26 minutes |
-| Hardware | NVIDIA A100 80GB |
-| Quantization | 4-bit NF4 (QLoRA) |
-### Target Modules
-- q_proj, k_proj, v_proj, o_proj (attention)
-- gate_proj, up_proj, down_proj (MLP)
-## Evaluation Results
-Tested on 297 questions across 9 categories:
-| Category | Score | vs Baseline |
-|----------|-------|-------------|
-| Factual Knowledge | 80.0% | **+46.7%** |
-| API Structure | 42.5% | +6.7% |
-| Position Sizing | 83.3% | +3.3% |
-| Trading Mechanics | 70.0% | -10.0% |
-| Parameter Validation | 73.3% | -20.0% |
-| Edge Cases | 75.0% | -17.5% |
-| General Capability | 83.6% | -7.3% |
-| Adversarial % | 36.9% | -12.5% |
-| Multi-step Reasoning | 24.0% | -3.0% |
-| **Overall** | **65.0%** | -5.2% |
 ## Usage
 ### With Transformers + PEFT
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
 from peft import PeftModel
 import torch
-# Load base model with 4-bit quantization
 bnb_config = BitsAndBytesConfig(
     load_in_4bit=True,
     bnb_4bit_quant_type="nf4",
@@ -117,75 +203,79 @@ base_model = AutoModelForCausalLM.from_pretrained(
     device_map="auto",
 )
-# Load LoRA adapter
-model = PeftModel.from_pretrained(
-    base_model,
-    "UVLabs/HyperLLM-4b",
-    revision="v0.2"
-)
-tokenizer = AutoTokenizer.from_pretrained("UVLabs/HyperLLM-4b", revision="v0.2")
-# Example: Position sizing
-messages = [
-    {"role": "user", "content": "I have $10,000 and want to risk 2% on a BTC long at $50,000 with a stop at $48,000. What position size?"}
-]
-text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
-inputs = tokenizer(text, return_tensors="pt").to(model.device)
-outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7)
-print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
-### Without Quantization (More VRAM)
-```python
-from transformers import AutoModelForCausalLM
-from peft import PeftModel
-base_model = AutoModelForCausalLM.from_pretrained(
-    "Qwen/Qwen3-4B-Instruct-2507",
-    torch_dtype=torch.bfloat16,
-    device_map="auto",
-)
-model = PeftModel.from_pretrained(base_model, "UVLabs/HyperLLM-4b", revision="v0.2")
 ```
-## Intended Use
-- Assisting with Hyperliquid perpetual trading calculations
-- Learning Hyperliquid API structure and parameters
-- Position sizing with risk management
-- Understanding Hyperliquid-specific concepts
-## Limitations
-- **Not financial advice**: This model is for educational/informational purposes only
-- **Verify calculations**: Always double-check position sizes and risk calculations
-- **Catastrophic forgetting**: Some general capabilities regressed vs base model
-- **Adversarial inputs**: Model can be confused by tricky percentage questions
 ## License
-MIT
 ## Citation
 ```bibtex
 @misc{hyperllm2026,
-  title={HyperLLM: A Specialized Trading Assistant for Hyperliquid},
   author={UVLabs},
   year={2026},
-  publisher={Hugging Face},
   url={https://huggingface.co/UVLabs/HyperLLM-4b}
 }
 ```
-## Framework Versions
-- PEFT: 0.15.0
-- Transformers: 4.52.0
-- PyTorch: 2.7.0
-- bitsandbytes: 0.45.4

 ---
 base_model: Qwen/Qwen3-4B-Instruct-2507
 library_name: peft
+license: apache-2.0
 language:
+  - en
 tags:
+  - trading
+  - finance
+  - hyperliquid
+  - perpetuals
+  - defi
+  - lora
+  - dpo
+  - sft
+  - trl
+  - base_model:adapter:Qwen/Qwen3-4B-Instruct-2507
+model_name: HyperLLM-4b
 pipeline_tag: text-generation
 ---
+# HyperLLM-4b v0.3
+A specialized 4B-parameter language model for Hyperliquid perpetual DEX trading assistance. It is built on Qwen3-4B-Instruct-2507 and distributed as a LoRA adapter, trained with supervised fine-tuning (SFT) followed by DPO.
 ## Model Description
+HyperLLM is designed to assist with:
+- **Position sizing calculations** - Risk-based position sizing with proper decimal handling
+- **API structure understanding** - Hyperliquid exchange API request/response formats
+- **Trading mechanics** - Perpetual futures concepts, margin modes, order types
+- **Parameter validation** - Validating trade parameters against exchange constraints
+- **Edge case handling** - Boundary conditions and unusual trading scenarios
+## Version History
+### v0.3 (Current - March 6, 2026)
+**Training Pipeline:** SFT (7,028 examples) + DPO (1,400 preference pairs)
+| Change | v0.2 | v0.3 | Impact |
+|--------|------|------|--------|
+| Learning Rate | 3e-5 | 1e-5 | Reduced catastrophic forgetting |
+| Quantization | QLoRA (4-bit NF4 base) | LoRA on unquantized base | Better quality; the unquantized base fits on an A100 80GB |
+| General Data Mix | 10% | 25% | Preserved general capabilities |
+| Training Stage | SFT only | SFT + DPO | Targeted behavioral fixes |
+| Eval Questions | 297 | 337 | More comprehensive testing |
+**Key Improvements over v0.2** (v0.2 scores come from the 297-question set, v0.3 scores from the 337-question set; changes are in percentage points):
+- Recovered parameter validation: 73.3% &rarr; **93.3%** (+20.0 pts)
+- Recovered edge cases: 75.0% &rarr; **92.5%** (+17.5 pts)
+- Improved adversarial handling: 36.9% &rarr; **51.5%** (+14.6 pts)
+- Improved general capability: 83.6% &rarr; **90.9%** (+7.3 pts)
+### v0.2 (March 4, 2026)
+**Training Pipeline:** QLoRA SFT only
+| Metric | Baseline | v0.2 | Change |
+|--------|----------|------|--------|
+| Overall | 70.2% | 65.0% | -5.2% |
+| Factual Knowledge | 33.3% | **80.0%** | **+46.7%** |
+| Parameter Validation | 93.3% | 73.3% | -20.0% |
+| Edge Cases | 92.5% | 75.0% | -17.5% |
+**Issues:** Catastrophic forgetting caused regressions in safety-critical categories (parameter validation, edge cases) despite a +46.7-point gain in factual knowledge.
+### v0.1 (February 28, 2026)
+**Training Pipeline:** QLoRA SFT (1,823 examples)
+| Metric | Baseline | v0.1 | Change |
+|--------|----------|------|--------|
+| Overall | 36.0% | **64.0%** | **+28%** |
+| Factual Knowledge | 20.0% | **70.0%** | **+50%** |
+| API Structure | 16.7% | **50.0%** | **+33%** |
+**Issues:** Small eval set (25 questions); parameter validation regressed.
+## Evaluation Results (v0.3)
+Evaluated on 337 questions across 9 categories:
+| Category | Baseline | v0.3 | Change |
+|----------|----------|------|--------|
+| Parameter Validation | 93.3% | **93.3%** | Maintained |
+| Edge Cases | 92.5% | **92.5%** | Maintained |
+| General Capability | 89.1% | **90.9%** | +1.8% |
+| Position Sizing | 83.3% | **83.3%** | Maintained |
+| Trading Mechanics | 80.0% | **80.0%** | Maintained |
+| Adversarial % | 53.5% | **51.5%** | -2.0% |
+| Factual Knowledge | 20.0% | **40.0%** | **+20%** |
+| Multi-step Reasoning | 31.3% | **30.3%** | -1.0% |
+| API Structure | 27.5% | **27.5%** | Maintained |
+| **Overall** | **67.4%** | **67.9%** | **+0.5%** |
+## Training Configuration
+### LoRA Parameters
+```python
+{
+    "r": 64,
+    "lora_alpha": 128,
+    "lora_dropout": 0.05,
+    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
+    "use_rslora": True
+}
+```
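With `use_rslora` enabled, PEFT documents the adapter scaling factor as `lora_alpha / sqrt(r)` instead of the classic `lora_alpha / r`. The sketch below only evaluates both factors for the values above (r=64, alpha=128); it is plain arithmetic and does not call into PEFT, and the helper name is hypothetical.

```python
import math

def lora_scaling(lora_alpha: float, r: int, use_rslora: bool) -> float:
    """Multiplier applied to the low-rank update B @ A (hypothetical helper)."""
    if use_rslora:
        # Rank-stabilized LoRA divides by sqrt(r) rather than r
        return lora_alpha / math.sqrt(r)
    # Classic LoRA scaling
    return lora_alpha / r

print(lora_scaling(128, 64, use_rslora=False))  # 2.0
print(lora_scaling(128, 64, use_rslora=True))   # 16.0
```

At r=64 the rsLoRA factor is sqrt(64) = 8 times the classic one for the same alpha.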
+### SFT Hyperparameters
+```python
+{
+    "learning_rate": 1e-5,
+    "epochs": 5,  # planned; early stopping halted training at epoch ~1.52
+    "batch_size": 4,
+    "gradient_accumulation_steps": 2,
+    "warmup_ratio": 0.10,
+    "max_length": 4096
+}
+```
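The effective batch size and rough step counts follow from these values. This sketch assumes a single GPU (the card lists one A100) and treats all 7,028 SFT examples as training data; early stopping needs a held-out evaluation split, so the real step counts were somewhat lower. It rounds partial steps up, which may differ slightly from the trainer's exact bookkeeping.

```python
import math

# Values from the SFT hyperparameter block
NUM_EXAMPLES = 7_028       # assumption: every SFT example is in the training split
PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 2
NUM_GPUS = 1               # assumption: single A100
PLANNED_EPOCHS = 5
WARMUP_RATIO = 0.10
STOPPED_AT_EPOCH = 1.52    # early-stopping point noted in the config comment

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_GPUS
steps_per_epoch = math.ceil(NUM_EXAMPLES / effective_batch)
planned_steps = steps_per_epoch * PLANNED_EPOCHS
# Warmup is a fraction of the *planned* schedule, not of the steps actually run
warmup_steps = math.ceil(planned_steps * WARMUP_RATIO)
steps_run = round(steps_per_epoch * STOPPED_AT_EPOCH)

print(f"effective batch: {effective_batch}")
print(f"steps/epoch: {steps_per_epoch}, planned steps: {planned_steps}")
print(f"warmup steps: {warmup_steps}, steps before early stop: ~{steps_run}")
```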
+### DPO Hyperparameters
+```python
+{
+    "beta": 0.1,
+    "learning_rate": 5e-7,
+    "epochs": 2,
+    "batch_size": 4,
+    "max_length": 2048
+}
+```
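`beta` controls how sharply the DPO objective rewards the policy for preferring the chosen response over the rejected one, relative to a frozen reference model (presumably the SFT checkpoint here). The card does not say which DPO loss variant was used; this sketch assumes the standard sigmoid form and works on per-sequence summed log-probabilities. Function names are hypothetical.

```python
import math

def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Sigmoid DPO loss for one preference pair from summed sequence log-probs."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) is the same as softplus(-margin)
    return softplus(-margin)

# Before any DPO update the policy equals the reference: margin 0, loss log(2)
print(round(dpo_loss(-42.0, -40.0, -42.0, -40.0), 4))  # 0.6931
# Raising the chosen response's log-prob relative to the reference lowers the loss
print(dpo_loss(-38.0, -40.0, -42.0, -40.0) < math.log(2))  # True
```

A small `beta` such as 0.1 keeps the margin, and therefore the gradient signal, modest unless the policy moves well away from the reference.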
+### Training Data Distribution
+**SFT (7,028 examples):**
+| Category | Examples | % |
+|----------|----------|---|
+| General Instruction | 1,500 | 21.3% |
+| Position Sizing | 800 | 11.4% |
+| Parameter Validation | 800 | 11.4% |
+| Adversarial Percentages | 600 | 8.5% |
+| Multi-step Reasoning | 500 | 7.1% |
+| Edge Cases | 400 | 5.7% |
+| API Examples | 400 | 5.7% |
+| Knowledge Q&A | 373 | 5.3% |
+| Other | 1,655 | 23.5% |
+**DPO (1,400 preference pairs):**
+| Failure Mode | Pairs | % |
+|--------------|-------|---|
+| Excessive Leverage | 370 | 26.4% |
+| Position Sizing | 330 | 23.6% |
+| Percentage Confusion | 226 | 16.1% |
+| Risk Violation | 195 | 13.9% |
+| Policy Bypass | 140 | 10.0% |
+| Uncertainty Caution | 139 | 9.9% |
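The percentage columns can be re-derived from the counts, which also checks that each table sums to its stated total. The dictionaries below simply copy the two tables; the helper name is hypothetical.

```python
# Counts copied from the SFT and DPO distribution tables
SFT_COUNTS = {
    "General Instruction": 1500, "Position Sizing": 800, "Parameter Validation": 800,
    "Adversarial Percentages": 600, "Multi-step Reasoning": 500, "Edge Cases": 400,
    "API Examples": 400, "Knowledge Q&A": 373, "Other": 1655,
}
DPO_COUNTS = {
    "Excessive Leverage": 370, "Position Sizing": 330, "Percentage Confusion": 226,
    "Risk Violation": 195, "Policy Bypass": 140, "Uncertainty Caution": 139,
}

def shares(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of the total for each category, rounded to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}

for label, counts in (("SFT", SFT_COUNTS), ("DPO", DPO_COUNTS)):
    print(label, sum(counts.values()), shares(counts))
```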
 ## Usage
 ### With Transformers + PEFT
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from peft import PeftModel
+import torch
+# Load base model
+base_model = AutoModelForCausalLM.from_pretrained(
+    "Qwen/Qwen3-4B-Instruct-2507",
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+)
+# Load LoRA adapter
+model = PeftModel.from_pretrained(base_model, "UVLabs/HyperLLM-4b")
+tokenizer = AutoTokenizer.from_pretrained("UVLabs/HyperLLM-4b")
+# Generate
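+messages = [{"role": "user", "content": "Calculate position size for $10,000 account, 2% risk, entry $50, stop loss $48"}]
+# add_generation_prompt=True appends the assistant header so the model replies instead of continuing the user turn
+text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(text, return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=256)
+# Decode only the newly generated tokens, skipping the echoed prompt
+print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))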
+```
+### With 4-bit Quantization (Low VRAM)
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
 from peft import PeftModel
 import torch
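 bnb_config = BitsAndBytesConfig(
     load_in_4bit=True,
     bnb_4bit_quant_type="nf4",
     bnb_4bit_compute_dtype=torch.bfloat16,
 )
 base_model = AutoModelForCausalLM.from_pretrained(
     "Qwen/Qwen3-4B-Instruct-2507",
     quantization_config=bnb_config,
     device_map="auto",
 )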
+model = PeftModel.from_pretrained(base_model, "UVLabs/HyperLLM-4b")
+tokenizer = AutoTokenizer.from_pretrained("UVLabs/HyperLLM-4b")
+```
+## Example Use Cases
+### Position Sizing
+```
+User: I have a $50,000 account and want to risk 2% on an ETH trade.
+      Entry at $3,450, stop loss at $3,400. What's my position size?
+Model: Risk amount = $50,000 * 2% = $1,000
+       Stop distance = $3,450 - $3,400 = $50
+       Position size = $1,000 / $50 = 20 ETH
 ```
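The arithmetic in this example is the usual fixed-fractional rule: size = (equity × risk %) / |entry − stop|. A minimal sketch with a hypothetical function name; it ignores fees, funding, and slippage, and does not round to the asset's `szDecimals`.

```python
def risk_based_position_size(equity: float, risk_fraction: float,
                             entry: float, stop: float) -> float:
    """Units to trade so that a fill at the stop loses equity * risk_fraction."""
    stop_distance = abs(entry - stop)
    if stop_distance == 0:
        raise ValueError("entry and stop must differ")
    risk_amount = equity * risk_fraction
    return risk_amount / stop_distance

size = risk_based_position_size(50_000, 0.02, 3_450, 3_400)
notional = size * 3_450
print(size)               # 20.0 (ETH)
print(notional)           # 69000.0
print(notional / 50_000)  # 1.38
```

The $69,000 notional exceeds the $50,000 account, so this trade needs about 1.38× leverage even though only $1,000 is at risk.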
+### Parameter Validation
+```
+User: Is a position size of 1.234 valid for ETH (szDecimals=2)?
+Model: No, 1.234 has 3 decimal places but ETH only allows 2 (szDecimals=2).
+       Valid sizes would be 1.23 or 1.24.
+```
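The decimal-place rule in this example can be checked without floating-point error by working on the size as a string with `decimal.Decimal`. This sketch covers only the `szDecimals` check and the round-down/round-up suggestions shown above; other exchange constraints are out of scope, and the helper names are hypothetical.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_UP

def decimal_places(size: str) -> int:
    """Decimal places after dropping trailing zeros ("1.20" -> 1)."""
    exponent = Decimal(size).normalize().as_tuple().exponent
    return max(0, -exponent)

def is_valid_size(size: str, sz_decimals: int) -> bool:
    return decimal_places(size) <= sz_decimals

def nearest_valid_sizes(size: str, sz_decimals: int) -> tuple[str, str]:
    """Round down and up to the size step implied by sz_decimals (0.01 for 2)."""
    step = Decimal(1).scaleb(-sz_decimals)
    d = Decimal(size)
    return (str(d.quantize(step, rounding=ROUND_DOWN)),
            str(d.quantize(step, rounding=ROUND_UP)))

print(is_valid_size("1.234", 2))        # False
print(nearest_valid_sizes("1.234", 2))  # ('1.23', '1.24')
```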
+### Adversarial Percentage Handling
+```
+User: Increase my position by 50%
+Model: I need clarification. "Increase by 50%" could mean:
+       1. Add 50% of current position (1.5x total)
+       2. Increase to 50% of account (absolute)
+       Which do you mean?
 ```
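The two readings the model asks about can lead to very different targets. A sketch with hypothetical numbers ($8,000 position, $40,000 account) and a hypothetical helper that expresses both readings as a target USD notional:

```python
def interpret_increase(position_usd: float, equity_usd: float, pct: float) -> dict[str, float]:
    """Target position notional under each reading of 'increase my position by pct%'."""
    return {
        # Reading 1: relative to the current position (50% -> 1.5x)
        "relative": position_usd * (1 + pct / 100),
        # Reading 2: set the position to pct% of account equity
        "share_of_account": equity_usd * pct / 100,
    }

print(interpret_increase(8_000, 40_000, 50))
# {'relative': 12000.0, 'share_of_account': 20000.0}
```

Here the second reading would add $12,000 of exposure instead of $4,000, which is why asking beats guessing.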
+## Limitations
+- **API Structure:** 27.5% accuracy - struggles with exact JSON field names
+- **Multi-step Reasoning:** 30.3% accuracy - complex multi-step calculations are challenging for a 4B model
+- **Adversarial %:** 51.5% accuracy - still susceptible to tricky percentage phrasing
+## Hardware Requirements
+| Mode | VRAM | Notes |
+|------|------|-------|
+| bfloat16 | ~10GB | Unquantized (bf16) inference |
+| 4-bit | ~4GB | Quantized inference |
+| 8-bit | ~6GB | INT8 quantization |
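A weights-only estimate (parameters × bits per parameter ÷ 8) gives a lower bound for these figures. The sketch assumes roughly 4.0 billion base parameters (an approximation, not a number stated in this card) and uses the on-disk size of `adapter_model.safetensors` from this commit as a proxy for the adapter. The remaining gap to the table covers the KV cache, activations, modules left unquantized, and runtime overhead.

```python
# Rough weight-only memory estimate; excludes KV cache, activations, and CUDA overhead
BASE_PARAMS = 4.0e9            # assumption: ~4B parameters for Qwen3-4B
ADAPTER_BYTES = 528_550_256    # size of adapter_model.safetensors in this commit

def weights_gb(num_params: float, bits_per_param: float) -> float:
    return num_params * bits_per_param / 8 / 1e9

adapter_gb = ADAPTER_BYTES / 1e9
for mode, bits in (("bfloat16", 16), ("8-bit", 8), ("4-bit", 4)):
    print(f"{mode}: ~{weights_gb(BASE_PARAMS, bits):.1f} GB base weights "
          f"+ {adapter_gb:.2f} GB adapter file")
```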
+## Training Hardware
+- **Hardware:** NVIDIA A100 80GB SXM
+- **SFT Duration:** ~20 minutes
+- **DPO Duration:** ~17 minutes
+- **Total Cost:** ~$1.50 (RunPod)
+## Framework Versions
+- PEFT: 0.18.1
+- TRL: 0.29.0
+- Transformers: 5.2.0
+- PyTorch: 2.10.0
 ## License
+Apache 2.0
 ## Citation
 ```bibtex
 @misc{hyperllm2026,
+  title={HyperLLM: A Specialized LLM for Hyperliquid Trading},
   author={UVLabs},
   year={2026},
   url={https://huggingface.co/UVLabs/HyperLLM-4b}
 }
 ```

adapter_config.json CHANGED

@@ -1,9 +1,12 @@
 {
   "alpha_pattern": {},
   "auto_mapping": null,
   "base_model_name_or_path": "Qwen/Qwen3-4B-Instruct-2507",
   "bias": "none",
   "corda_config": null,
   "eva_config": null,
   "exclude_modules": null,
   "fan_in_fan_out": false,
@@ -20,20 +23,24 @@
   "megatron_core": "megatron.core",
   "modules_to_save": null,
   "peft_type": "LORA",
   "r": 64,
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "up_proj",
     "v_proj",
-    "k_proj",
     "gate_proj",
     "q_proj",
     "down_proj",
-    "o_proj"
   ],
   "task_type": "CAUSAL_LM",
   "trainable_token_indices": null,
   "use_dora": false,
   "use_rslora": true
 }

 {
+  "alora_invocation_tokens": null,
   "alpha_pattern": {},
+  "arrow_config": null,
   "auto_mapping": null,
   "base_model_name_or_path": "Qwen/Qwen3-4B-Instruct-2507",
   "bias": "none",
   "corda_config": null,
+  "ensure_weight_tying": false,
   "eva_config": null,
   "exclude_modules": null,
   "fan_in_fan_out": false,
   "megatron_core": "megatron.core",
   "modules_to_save": null,
   "peft_type": "LORA",
+  "peft_version": "0.18.1",
+  "qalora_group_size": 16,
   "r": 64,
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
     "v_proj",
     "gate_proj",
+    "o_proj",
     "q_proj",
+    "k_proj",
     "down_proj",
+    "up_proj"
   ],
+  "target_parameters": null,
   "task_type": "CAUSAL_LM",
   "trainable_token_indices": null,
   "use_dora": false,
+  "use_qalora": false,
   "use_rslora": true
 }
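The `target_modules` hunk looks like a change, but comparing the two lists as sets suggests only the serialized order differs; PEFT likely stores `target_modules` as a set internally, so the order can vary between runs. The lists below are copied from the two sides of the diff, and the helper name is hypothetical.

```python
# target_modules copied from the old and new sides of the adapter_config.json diff
OLD_TARGETS = ["up_proj", "v_proj", "k_proj", "gate_proj", "q_proj", "down_proj", "o_proj"]
NEW_TARGETS = ["v_proj", "gate_proj", "o_proj", "q_proj", "k_proj", "down_proj", "up_proj"]

def describe_change(old: list[str], new: list[str]) -> str:
    added, removed = set(new) - set(old), set(old) - set(new)
    if not added and not removed:
        return "same modules, order only" if old != new else "identical"
    return f"added={sorted(added)} removed={sorted(removed)}"

print(describe_change(OLD_TARGETS, NEW_TARGETS))  # same modules, order only
```

The seven modules match the attention and MLP projections listed in the README's LoRA parameters.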

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e09b554fe2bded98e640b169e10f78a2bcb75946bdd6631f3786dde799ffb390
 size 528550256

 version https://git-lfs.github.com/spec/v1
+oid sha256:650cda8c308105a0855653408b067a03990775c015a3f1f425bbaff87c4c52b9
 size 528550256

tokenizer.json CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
-size 11422654

 version https://git-lfs.github.com/spec/v1
+oid sha256:be75606093db2094d7cd20f3c2f385c212750648bd6ea4fb2bf507a6a4c55506
+size 11422650

tokenizer_config.json CHANGED

@@ -1,217 +1,11 @@
 {
-  "add_bos_token": false,
   "add_prefix_space": false,
-  "added_tokens_decoder": {
-    "151643": {
-      "content": "<|endoftext|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151644": {
-      "content": "<|im_start|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151645": {
-      "content": "<|im_end|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151646": {
-      "content": "<|object_ref_start|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151647": {
-      "content": "<|object_ref_end|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151648": {
-      "content": "<|box_start|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151649": {
-      "content": "<|box_end|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151650": {
-      "content": "<|quad_start|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151651": {
-      "content": "<|quad_end|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151652": {
-      "content": "<|vision_start|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151653": {
-      "content": "<|vision_end|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151654": {
-      "content": "<|vision_pad|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151655": {
-      "content": "<|image_pad|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151656": {
-      "content": "<|video_pad|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151657": {
-      "content": "<tool_call>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151658": {
-      "content": "</tool_call>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151659": {
-      "content": "<|fim_prefix|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151660": {
-      "content": "<|fim_middle|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151661": {
-      "content": "<|fim_suffix|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151662": {
-      "content": "<|fim_pad|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151663": {
-      "content": "<|repo_name|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151664": {
-      "content": "<|file_sep|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151665": {
-      "content": "<tool_response>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151666": {
-      "content": "</tool_response>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151667": {
-      "content": "<think>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151668": {
-      "content": "</think>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    }
-  },
-  "additional_special_tokens": [
     "<|im_start|>",
     "<|im_end|>",
     "<|object_ref_start|>",
@@ -226,11 +20,7 @@
     "<|image_pad|>",
     "<|video_pad|>"
   ],
-  "bos_token": null,
-  "clean_up_tokenization_spaces": false,
-  "eos_token": "<|im_end|>",
-  "errors": "replace",
-  "extra_special_tokens": {},
   "model_max_length": 1010000,
   "pad_token": "<|endoftext|>",
   "split_special_tokens": false,

 {
   "add_prefix_space": false,
+  "backend": "tokenizers",
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": [
     "<|im_start|>",
     "<|im_end|>",
     "<|object_ref_start|>",
     "<|image_pad|>",
     "<|video_pad|>"
   ],
+  "is_local": false,
   "model_max_length": 1010000,
   "pad_token": "<|endoftext|>",
   "split_special_tokens": false,

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7cf6dcfb7fa01a043537d453752ebabff6db298fb689b4df160bb4e3b59dd414
-size 5688

 version https://git-lfs.github.com/spec/v1
+oid sha256:f53f4121f9ec2db0158bb7463f5c20ce5cf4bca3d032b9b05ff3d04ce1ae9be6
+size 5432