---
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
tags:
- lora
- finance
- instruction-tuning
- english
- transformers
- adapter
---
# Llama for Finance (LoRA)

A financial-domain, instruction-tuned LoRA adapter for `meta-llama/Meta-Llama-3.1-8B-Instruct`, trained with length-aware batching and an English-only data filter.

## Model Details

- **Base model:** meta-llama/Meta-Llama-3.1-8B-Instruct
- **Adapter type:** LoRA (PEFT)
- **Target modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **LoRA hyperparams:** r=64, alpha=128, dropout=0.1, bias=none (see the `LoraConfig` sketch after this list)
- **Precision:** fp16 (bf16 when available); gradient checkpointing enabled
- **Length bucketing:** enabled (`group_by_length=True`, boundaries 512/1024/1536/2048)
- **Context length:** up to 2048 tokens
- **Language:** English (non-English samples filtered via an ASCII-ratio heuristic)
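
For reference, the adapter settings above map onto a standard PEFT `LoraConfig`. A minimal sketch, using only the values listed above (everything else is the stock PEFT API):

```python
from peft import LoraConfig

# LoRA settings matching the hyperparameters listed above.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```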

## Training Data & Filtering

- **Source dataset:** `Josephgflowers/Finance-Instruct-500k`
- **Sampling caps:** max_train_samples=25k, max_val_samples=2.5k (applied after filtering)
- **Chat formatting:** preformatted `text` field with system/user/assistant turns
- **Filters:**
  - drop rows with an empty `text` field
  - English-only heuristic (`min_english_ratio`≈0.85, `min_chars_for_lang_check`=40); see the sketch after this list
  - EOS token enforced at the end of each sample
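
The exact filtering code is not published here; the following is an illustrative sketch of an ASCII-ratio check (the function name and the `filter` call are assumptions), using the thresholds listed above:

```python
def is_probably_english(text: str,
                        min_english_ratio: float = 0.85,
                        min_chars_for_lang_check: int = 40) -> bool:
    """Keep a sample if its ASCII-character ratio suggests English text."""
    if not text:
        return False
    if len(text) < min_chars_for_lang_check:
        return True  # too short to judge reliably, keep it
    ascii_chars = sum(1 for ch in text if ord(ch) < 128)
    return ascii_chars / len(text) >= min_english_ratio

# Example: drop empty rows and non-English rows from a datasets.Dataset in one pass.
# dataset = dataset.filter(lambda ex: is_probably_english(ex.get("text", "")))
```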

## Training Setup

- **Epochs:** 2
- **Batching:** per-device train batch 16, gradient accumulation 4 (effective batch 64); eval batch 8
- **Optimizer:** paged_adamw_8bit
- **LR / schedule:** 1e-4, cosine decay, warmup_ratio 0.05
- **Regularization:** weight_decay 0.01, max_grad_norm 1.0
- **Eval/save:** eval_steps=50, save_steps=100, load_best_model_at_end=True
- **Length-aware sampler:** a custom bucket sampler reduces padding waste (see the sketch after this list)
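
The custom sampler itself is not included in this repository. As an illustration of the idea only, here is one way to group sample indices by tokenized length using the 512/1024/1536/2048 boundaries, so each batch pads to a similar length (class name and all details are assumptions, not the actual training code):

```python
import random
from torch.utils.data import Sampler

class LengthBucketSampler(Sampler):
    """Yield indices grouped by length bucket so batches need less padding."""

    def __init__(self, lengths, batch_size, boundaries=(512, 1024, 1536, 2048), seed=0):
        self.batch_size = batch_size
        self.seed = seed
        self.buckets = {b: [] for b in boundaries}
        for idx, length in enumerate(lengths):
            # Assign each sample to the smallest boundary that fits it;
            # sequences are assumed to be truncated to the last boundary (2048).
            for b in boundaries:
                if length <= b:
                    self.buckets[b].append(idx)
                    break

    def __iter__(self):
        rng = random.Random(self.seed)
        batches = []
        for indices in self.buckets.values():
            rng.shuffle(indices)
            batches += [indices[i:i + self.batch_size]
                        for i in range(0, len(indices), self.batch_size)]
        rng.shuffle(batches)  # mix buckets across the epoch
        for batch in batches:
            yield from batch

    def __len__(self):
        return sum(len(idx) for idx in self.buckets.values())
```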

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter = "TimberGu/Llama_for_Finance"

tokenizer = AutoTokenizer.from_pretrained(adapter)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"  # matches the training setup

# Prefer bf16 where the GPU supports it, otherwise fall back to fp16.
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
base_model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=dtype, device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter)
model.eval()

prompt = "Explain what a yield curve inversion implies for equities."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, temperature=0.8, top_p=0.9)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
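
Since training samples were formatted as system/user/assistant turns and the repo ships a `chat_template.jinja`, prompting through the tokenizer's chat template should match the training format more closely than a raw string. A sketch building on the snippet above (the system prompt is illustrative):

```python
messages = [
    {"role": "system", "content": "You are a helpful financial assistant."},
    {"role": "user", "content": "Explain what a yield curve inversion implies for equities."},
]
# Render the turns with the adapter's chat template and tokenize in one step.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.8, top_p=0.9)
# Decode only the newly generated tokens.
print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```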

## Evaluation

- Held-out validation (`eval_50_gpt_judged_raw.jsonl`): eval_loss ≈ 1.05 over the 2-epoch run. No public benchmark results beyond the held-out filtered split.

## Limitations & Risks

- Domain-focused on finance/economics; may underperform on general tasks.
- English-centric; non-English input was filtered out during training.
- Hallucinations remain possible; do not rely on outputs for financial advice without human review.

## Files

- `adapter_model.safetensors`, `adapter_config.json`: LoRA weights and adapter config
- `tokenizer.json`, `tokenizer_config.json`, `special_tokens_map.json`, `chat_template.jinja`: tokenizer and chat template files
- `training_config.json`, `training_args.bin`, `test_results.json`: training configuration and test results