TimberGu committed
Commit 66fa74c · verified · 1 Parent(s): 031c0ce

Upload 10 files

Files changed (3)
  1. README.md +16 -17
  2. adapter_model.safetensors +2 -2
  3. training_args.bin +2 -2
README.md CHANGED
@@ -14,36 +14,35 @@ tags:

 # Llama for Finance (LoRA)

- A financial-domain instruction-tuned LoRA adapter for `meta-llama/Meta-Llama-3.1-8B-Instruct`. Trained on a filtered subset of Finance-Instruct-500k with English-only enforcement and length-aware batching to reduce padding waste.

 ## Model Details
 - **Base model:** meta-llama/Meta-Llama-3.1-8B-Instruct
 - **Adapter type:** LoRA (PEFT)
 - **Target modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
 - **LoRA hyperparams:** r=64, alpha=128, dropout=0.1, bias=none
- - **Precision:** fp16 (bf16 if available) with gradient checkpointing
- - **Length bucketing:** enabled (`group_by_length=True`, custom bucket boundaries)
- - **Context length:** adaptively capped (up to 2048 in this run)
- - **Language:** English (non-English texts filtered via ASCII ratio heuristic)

 ## Training Data & Filtering
 - **Source dataset:** `Josephgflowers/Finance-Instruct-500k`
- - **Sampling caps:** 40k train / 4k validation (post-filtering counts may be lower)
- - **Chat formatting:** `apply_chat_template` for system/user/assistant turns
 - **Filters:**
- - drop rows without user/assistant text
- - truncate to max_length (adaptive)
- - minimum length (≥30 tokens)
- - English-only heuristic (configurable `filter_english_only`, `min_english_ratio`)

 ## Training Setup
- - **Epochs:** 5
- - **Batching:** per-device batch 16, grad accumulation 4 (effective 64)
 - **Optimizer:** paged_adamw_8bit
 - **LR / schedule:** 1e-4, cosine, warmup_ratio 0.05
 - **Regularization:** weight_decay 0.01, max_grad_norm 1.0
- - **Eval/save:** eval_steps=50, save_steps=100, load_best_model_at_end=True
- - **Length-aware sampler:** custom bucket sampler to reduce padding

 ## Usage
 ```python
@@ -56,7 +55,7 @@ adapter = "TimberGu/Llama_for_Finance"

 tokenizer = AutoTokenizer.from_pretrained(adapter)
 tokenizer.pad_token = tokenizer.eos_token
- tokenizer.padding_side = "left"

 dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
 base_model = AutoModelForCausalLM.from_pretrained(base, dtype=dtype, device_map="auto")
@@ -70,7 +69,7 @@ print(tokenizer.decode(out[0], skip_special_tokens=True))
 ```

 ## Evaluation
- - See `test_results.json` for the held-out validation metrics produced after training. (No public benchmark beyond the split provided in Finance-Instruct-500k.)

 ## Limitations & Risks
 - Domain-focused on finance/economics; may underperform on general tasks.
 

 # Llama for Finance (LoRA)

+ A financial-domain instruction-tuned LoRA adapter for `meta-llama/Meta-Llama-3.1-8B-Instruct`, trained with length-aware batching and an English-only heuristic.

 ## Model Details
 - **Base model:** meta-llama/Meta-Llama-3.1-8B-Instruct
 - **Adapter type:** LoRA (PEFT)
 - **Target modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
 - **LoRA hyperparams:** r=64, alpha=128, dropout=0.1, bias=none
+ - **Precision:** fp16 (bf16 when available); gradient checkpointing on
+ - **Length bucketing:** enabled (`group_by_length=True`, boundaries 512/1024/1536/2048)
+ - **Context length:** up to 2048 tokens
+ - **Language:** English (non-English filtered via ASCII-ratio heuristic)

 ## Training Data & Filtering
 - **Source dataset:** `Josephgflowers/Finance-Instruct-500k`
+ - **Sampling caps:** max_train_samples=25k, max_val_samples=2.5k after filtering
+ - **Chat formatting:** preformatted `text` field with system/user/assistant turns
 - **Filters:**
+ - drop rows without text
+ - English-only heuristic (`min_english_ratio` ≈ 0.85, `min_chars_for_lang_check` = 40)
+ - EOS enforced at end of samples

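The ASCII-ratio language filter listed above can be approximated as follows. This is a minimal sketch, not the repo's code: the function name and the keep-short-rows behavior are assumptions, while the thresholds come from the README.

```python
def is_probably_english(text: str,
                        min_english_ratio: float = 0.85,
                        min_chars_for_lang_check: int = 40) -> bool:
    """ASCII-ratio heuristic: treat mostly-ASCII text as English."""
    # Too short to judge reliably: keep the row rather than risk dropping it.
    if len(text) < min_chars_for_lang_check:
        return True
    # Fraction of ASCII characters in the text.
    ascii_chars = sum(1 for ch in text if ch.isascii())
    return ascii_chars / len(text) >= min_english_ratio
```

A heuristic like this is cheap but coarse: it passes any Latin-script language, so it filters CJK/Cyrillic text rather than guaranteeing English.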
 
 ## Training Setup
+ - **Epochs:** 2
+ - **Batching:** per-device train 16, grad accumulation 4 (effective 64); eval batch 8
 - **Optimizer:** paged_adamw_8bit
 - **LR / schedule:** 1e-4, cosine, warmup_ratio 0.05
 - **Regularization:** weight_decay 0.01, max_grad_norm 1.0
+ - **Eval/save:** eval_steps=50, save_steps=100 (load_best_model_at_end=True)
+ - **Length-aware sampler:** custom bucket sampler reduces padding waste

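The length-aware sampler can be sketched as grouping examples by token length into the bucket boundaries the README states (512/1024/1536/2048) and batching within each bucket, so sequences of similar length are padded together. A simplified standalone version, assuming token lengths are precomputed and inputs are already truncated to 2048 upstream (this is an illustration, not the repo's implementation):

```python
import random

def bucket_batches(lengths, batch_size,
                   boundaries=(512, 1024, 1536, 2048), seed=0):
    """Group example indices by length bucket, then batch within each bucket."""
    buckets = {b: [] for b in boundaries}
    for idx, n in enumerate(lengths):
        # Place the example in the first bucket whose boundary fits it.
        for b in boundaries:
            if n <= b:
                buckets[b].append(idx)
                break
    rng = random.Random(seed)
    batches = []
    for b in boundaries:
        ids = buckets[b]
        rng.shuffle(ids)
        batches.extend(ids[i:i + batch_size]
                       for i in range(0, len(ids), batch_size))
    rng.shuffle(batches)  # vary training order across buckets
    return batches
```

Because every batch draws from one bucket, the per-batch padding overhead is bounded by the bucket width instead of the full context length.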
 ## Usage
 ```python

 tokenizer = AutoTokenizer.from_pretrained(adapter)
 tokenizer.pad_token = tokenizer.eos_token
+ tokenizer.padding_side = "right"  # matches training setup

 dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
 base_model = AutoModelForCausalLM.from_pretrained(base, dtype=dtype, device_map="auto")

 ```

 ## Evaluation
+ - Held-out validation (`test_results.json`): eval_loss ≈ 1.05 over 2 epochs. No public benchmark beyond the filtered split.

 ## Limitations & Risks
 - Domain-focused on finance/economics; may underperform on general tasks.
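As a quick sanity check on the reported metric, a cross-entropy eval_loss of about 1.05 nats corresponds to a token-level perplexity of exp(1.05) ≈ 2.86:

```python
import math

# eval_loss as reported in test_results.json (approximate, per the README above)
eval_loss = 1.05

# cross-entropy loss in nats -> token-level perplexity
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.2f}")  # ≈ 2.86
```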
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:514660419b63f1e3441974ff1333e8e5ebd527c76f57cea27804c5cb6f7f2c9c
- size 134
+ oid sha256:f1e250c9fc37f1ef99c278a376e517d65f87ba92a2c4d8275e72d16c2b8aff49
+ size 671149168
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:d784636fdb3a5cdb109dea866892821408d3647e3be2c651c8d0a7a9bf0317e5
- size 129
+ oid sha256:85d222ca2fe3ff64de47b929a9048a06670381a176a331bbcf3de4cff4f64239
+ size 5905