Upload 10 files

Files changed:
- README.md +16 -17
- adapter_model.safetensors +2 -2
- training_args.bin +2 -2
README.md
CHANGED

# Llama for Finance (LoRA)

A financial-domain instruction-tuned LoRA adapter for `meta-llama/Meta-Llama-3.1-8B-Instruct`, trained with length-aware batching and an English-only heuristic.

## Model Details
- **Base model:** meta-llama/Meta-Llama-3.1-8B-Instruct
- **Adapter type:** LoRA (PEFT)
- **Target modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **LoRA hyperparams:** r=64, alpha=128, dropout=0.1, bias=none
- **Precision:** fp16 (bf16 when available); gradient checkpointing on
- **Length bucketing:** enabled (`group_by_length=True`, bucket boundaries 512/1024/1536/2048)
- **Context length:** up to 2048 tokens
- **Language:** English (non-English filtered via an ASCII-ratio heuristic)
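The bullets above map directly onto a PEFT `LoraConfig`. A plausible reconstruction, for readers who want to reproduce the adapter — the `adapter_config.json` shipped in this repo is the authoritative source:

```python
from peft import LoraConfig

# Reconstructed from the Model Details bullets; check adapter_config.json
# for the exact values used in training.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```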

## Training Data & Filtering
- **Source dataset:** `Josephgflowers/Finance-Instruct-500k`
- **Sampling caps:** max_train_samples=25k, max_val_samples=2.5k after filtering
- **Chat formatting:** preformatted `text` field with system/user/assistant turns
- **Filters:**
  - drop rows without text
  - English-only heuristic (`min_english_ratio` ≈ 0.85, `min_chars_for_lang_check` = 40)
  - EOS enforced at end of samples
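The English-only filter is described only by its thresholds. A minimal stand-alone sketch of one plausible ASCII-ratio implementation (the function name and exact logic are our assumption, not the training script's):

```python
def is_mostly_english(text, min_english_ratio=0.85, min_chars_for_lang_check=40):
    """Heuristic language filter: keep rows too short to judge, and longer
    rows whose share of ASCII characters meets the threshold."""
    stripped = text.strip()
    if len(stripped) < min_chars_for_lang_check:
        return True  # too short for a reliable language estimate
    ascii_chars = sum(1 for ch in stripped if ord(ch) < 128)
    return ascii_chars / len(stripped) >= min_english_ratio
```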

## Training Setup
- **Epochs:** 2
- **Batching:** per-device train batch 16, gradient accumulation 4 (effective batch 64); eval batch 8
- **Optimizer:** paged_adamw_8bit
- **LR / schedule:** 1e-4, cosine, warmup_ratio 0.05
- **Regularization:** weight_decay 0.01, max_grad_norm 1.0
- **Eval/save:** eval_steps=50, save_steps=100 (load_best_model_at_end=True)
- **Length-aware sampler:** custom bucket sampler reduces padding waste
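The custom sampler itself is not included in the card. A simplified pure-Python sketch of length-bucketed batching (names are ours; bucket boundaries taken from the Model Details section): examples are grouped by length bucket, then batched within each bucket so similar-length sequences are padded together.

```python
import random
from collections import defaultdict

def bucket_batches(lengths, boundaries=(512, 1024, 1536, 2048), batch_size=16, seed=0):
    """Group example indices into length buckets, then batch within each
    bucket so sequences of similar length share a batch (less padding)."""
    buckets = defaultdict(list)
    for idx, length in enumerate(lengths):
        bucket = sum(length > b for b in boundaries)  # bucket id 0..len(boundaries)
        buckets[bucket].append(idx)
    rng = random.Random(seed)
    batches = []
    for indices in buckets.values():
        rng.shuffle(indices)
        for i in range(0, len(indices), batch_size):
            batches.append(indices[i:i + batch_size])
    rng.shuffle(batches)  # randomize batch order across buckets
    return batches
```

A real implementation would subclass `torch.utils.data.Sampler`, but the bucketing logic is the same.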

## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter = "TimberGu/Llama_for_Finance"

tokenizer = AutoTokenizer.from_pretrained(adapter)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"  # matches training setup

dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
base_model = AutoModelForCausalLM.from_pretrained(base, dtype=dtype, device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter)

prompt = "Explain the difference between a stock and a bond."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

## Evaluation
- Held-out validation (`test_results.json`): eval_loss ≈ 1.05 over 2 epochs. No public benchmark beyond the filtered split.
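For intuition, a token-level cross-entropy loss converts to perplexity via the exponential; with the reported eval_loss:

```python
import math

eval_loss = 1.05  # reported held-out cross-entropy (nats/token)
perplexity = math.exp(eval_loss)
print(f"perplexity ~= {perplexity:.2f}")  # ~= 2.86
```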

## Limitations & Risks
- Domain-focused on finance/economics; may underperform on general tasks.
adapter_model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:f1e250c9fc37f1ef99c278a376e517d65f87ba92a2c4d8275e72d16c2b8aff49
+size 671149168
training_args.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:85d222ca2fe3ff64de47b929a9048a06670381a176a331bbcf3de4cff4f64239
+size 5905