TimberGu committed
Commit 66fa74c · verified · 1 Parent(s): 031c0ce

Upload 10 files

Files changed (3)
  1. README.md +16 -17
  2. adapter_model.safetensors +2 -2
  3. training_args.bin +2 -2
README.md CHANGED
@@ -14,36 +14,35 @@ tags:

 # Llama for Finance (LoRA)

- A financial-domain instruction-tuned LoRA adapter for `meta-llama/Meta-Llama-3.1-8B-Instruct`. Trained on a filtered subset of Finance-Instruct-500k with English-only enforcement and length-aware batching to reduce padding waste.

 ## Model Details
 - **Base model:** meta-llama/Meta-Llama-3.1-8B-Instruct
 - **Adapter type:** LoRA (PEFT)
 - **Target modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
 - **LoRA hyperparams:** r=64, alpha=128, dropout=0.1, bias=none
- - **Precision:** fp16 (bf16 if available) with gradient checkpointing
- - **Length bucketing:** enabled (`group_by_length=True`, custom bucket boundaries)
- - **Context length:** adaptively capped (up to 2048 in this run)
- - **Language:** English (non-English texts filtered via ASCII ratio heuristic)

 ## Training Data & Filtering
 - **Source dataset:** `Josephgflowers/Finance-Instruct-500k`
- - **Sampling caps:** 40k train / 4k validation (post-filtering counts may be lower)
- - **Chat formatting:** `apply_chat_template` for system/user/assistant turns
 - **Filters:**
- - drop rows without user/assistant text
- - truncate to max_length (adaptive)
- - minimum length (≥30 tokens)
- - English-only heuristic (configurable `filter_english_only`, `min_english_ratio`)

 ## Training Setup
- - **Epochs:** 5
- - **Batching:** per-device batch 16, grad accumulation 4 (effective 64)
 - **Optimizer:** paged_adamw_8bit
 - **LR / schedule:** 1e-4, cosine, warmup_ratio 0.05
 - **Regularization:** weight_decay 0.01, max_grad_norm 1.0
- - **Eval/save:** eval_steps=50, save_steps=100, load_best_model_at_end=True
- - **Length-aware sampler:** custom bucket sampler to reduce padding

 ## Usage
 ```python
@@ -56,7 +55,7 @@ adapter = "TimberGu/Llama_for_Finance"

 tokenizer = AutoTokenizer.from_pretrained(adapter)
 tokenizer.pad_token = tokenizer.eos_token
- tokenizer.padding_side = "left"

 dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
 base_model = AutoModelForCausalLM.from_pretrained(base, dtype=dtype, device_map="auto")
@@ -70,7 +69,7 @@ print(tokenizer.decode(out[0], skip_special_tokens=True))
 ```

 ## Evaluation
- - See `test_results.json` for the held-out validation metrics produced after training. (No public benchmark beyond the split provided in Finance-Instruct-500k.)

 ## Limitations & Risks
 - Domain-focused on finance/economics; may underperform on general tasks.
 

 # Llama for Finance (LoRA)

+ A financial-domain instruction-tuned LoRA adapter for `meta-llama/Meta-Llama-3.1-8B-Instruct`, trained with length-aware batching and an English-only heuristic.

 ## Model Details
 - **Base model:** meta-llama/Meta-Llama-3.1-8B-Instruct
 - **Adapter type:** LoRA (PEFT)
 - **Target modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
 - **LoRA hyperparams:** r=64, alpha=128, dropout=0.1, bias=none
+ - **Precision:** fp16 (bf16 when available); gradient checkpointing on
+ - **Length bucketing:** enabled (`group_by_length=True`, boundaries 512/1024/1536/2048)
+ - **Context length:** up to 2048 tokens
+ - **Language:** English (non-English filtered via ASCII-ratio heuristic)

 ## Training Data & Filtering
 - **Source dataset:** `Josephgflowers/Finance-Instruct-500k`
+ - **Sampling caps:** max_train_samples=25k, max_val_samples=2.5k after filtering
+ - **Chat formatting:** preformatted `text` field with system/user/assistant turns
 - **Filters:**
+ - drop rows without text
+ - English-only heuristic (`min_english_ratio` ≈ 0.85, `min_chars_for_lang_check` = 40)
+ - EOS enforced at end of samples

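The ASCII-ratio language filter listed above can be approximated as follows. This is a minimal sketch, not the repo's code: the function name and the keep-short-rows behavior are assumptions, while the thresholds come from the README.

```python
def is_probably_english(text: str,
                        min_english_ratio: float = 0.85,
                        min_chars_for_lang_check: int = 40) -> bool:
    """ASCII-ratio heuristic: treat mostly-ASCII text as English."""
    # Too short to judge reliably: keep the row rather than risk dropping it.
    if len(text) < min_chars_for_lang_check:
        return True
    # Fraction of ASCII characters in the text.
    ascii_chars = sum(1 for ch in text if ch.isascii())
    return ascii_chars / len(text) >= min_english_ratio
```

A heuristic like this is cheap but coarse: it passes any Latin-script language, so it filters CJK/Cyrillic text rather than guaranteeing English.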
 
 ## Training Setup
+ - **Epochs:** 2
+ - **Batching:** per-device train 16, grad accumulation 4 (effective 64); eval batch 8
 - **Optimizer:** paged_adamw_8bit
 - **LR / schedule:** 1e-4, cosine, warmup_ratio 0.05
 - **Regularization:** weight_decay 0.01, max_grad_norm 1.0
+ - **Eval/save:** eval_steps=50, save_steps=100 (load_best_model_at_end=True)
+ - **Length-aware sampler:** custom bucket sampler reduces padding waste

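The length-aware sampler can be sketched as grouping examples by token length into the bucket boundaries the README states (512/1024/1536/2048) and batching within each bucket, so sequences of similar length are padded together. A simplified standalone version, assuming token lengths are precomputed and inputs are already truncated to 2048 upstream (this is an illustration, not the repo's implementation):

```python
import random

def bucket_batches(lengths, batch_size,
                   boundaries=(512, 1024, 1536, 2048), seed=0):
    """Group example indices by length bucket, then batch within each bucket."""
    buckets = {b: [] for b in boundaries}
    for idx, n in enumerate(lengths):
        # Place the example in the first bucket whose boundary fits it.
        for b in boundaries:
            if n <= b:
                buckets[b].append(idx)
                break
    rng = random.Random(seed)
    batches = []
    for b in boundaries:
        ids = buckets[b]
        rng.shuffle(ids)
        batches.extend(ids[i:i + batch_size]
                       for i in range(0, len(ids), batch_size))
    rng.shuffle(batches)  # vary training order across buckets
    return batches
```

Because every batch draws from one bucket, the per-batch padding overhead is bounded by the bucket width instead of the full context length.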
 ## Usage
 ```python

 tokenizer = AutoTokenizer.from_pretrained(adapter)
 tokenizer.pad_token = tokenizer.eos_token
+ tokenizer.padding_side = "right"  # matches training setup

 dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
 base_model = AutoModelForCausalLM.from_pretrained(base, dtype=dtype, device_map="auto")

 ```

 ## Evaluation
+ - Held-out validation (`test_results.json`): eval_loss ≈ 1.05 over 2 epochs. No public benchmark beyond the filtered split.

 ## Limitations & Risks
 - Domain-focused on finance/economics; may underperform on general tasks.
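As a quick sanity check on the reported metric, a cross-entropy eval_loss of about 1.05 nats corresponds to a token-level perplexity of exp(1.05) ≈ 2.86:

```python
import math

# eval_loss as reported in test_results.json (approximate, per the README above)
eval_loss = 1.05

# cross-entropy loss in nats -> token-level perplexity
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.2f}")  # ≈ 2.86
```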
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:514660419b63f1e3441974ff1333e8e5ebd527c76f57cea27804c5cb6f7f2c9c
- size 134
+ oid sha256:f1e250c9fc37f1ef99c278a376e517d65f87ba92a2c4d8275e72d16c2b8aff49
+ size 671149168
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:d784636fdb3a5cdb109dea866892821408d3647e3be2c651c8d0a7a9bf0317e5
- size 129
+ oid sha256:85d222ca2fe3ff64de47b929a9048a06670381a176a331bbcf3de4cff4f64239
+ size 5905