---
base_model:
- swiss-ai/Apertus-8B-2509
library_name: peft
license: apache-2.0
tags:
- finance
- financialcrime
- compliance
---
# Model Card for Apertus-8B-Instruct-OFAC-FAQ
A model fine-tuned for sanctions- and AML-related OFAC FAQ questions, starting from the Swiss AI
Apertus 8B Instruct model, which was then used as a teacher and distilled to TinyLlama 1.1B. The distilled model is 6-7x smaller than the original. Quantization to INT8 should allow even low-memory CPU inference
deployments if model latency is not a primary concern. A PEFT LoRA adapter is included for use with the base model.
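The memory saving from INT8 can be illustrated with PyTorch's dynamic quantization. This is a minimal sketch on a toy stand-in module (the released INT8 weights may have been produced by a different pipeline): `nn.Linear` weights are stored as INT8 and dequantized on the fly at inference, cutting weight memory roughly 4x versus FP32.

```python
import torch
import torch.nn as nn

# Toy stand-in for a transformer MLP block, purely for illustration.
model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 64))

# Dynamic quantization: replace nn.Linear modules with INT8-weight versions.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 64)
with torch.inference_mode():
    y = qmodel(x)

print(type(qmodel[0]).__module__)  # the Linear layers are now quantized modules
print(y.shape)                     # output shape is unchanged: (1, 64)
```

The same trade-off applies to the full model: lower memory at the cost of some per-token dequantization overhead, which is why latency-insensitive CPU deployments are the target.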
## Model Details
### Model Description
The model includes INT8 quantized weights for CPU inference and a LoRA adapter for GPU inference with
a matching base.
- **Developed by:** Soteria Initiative
- **Funded by:** Soteria Initiative
- **Shared by:** Soteria Initiative
- **Model type:** Text generation, LlamaForCausalLM, context length 2048
- **Language(s) (NLP):** English, Others
- **License:** Apache-2.0
- **Finetuned from model:** Apertus 8B Instruct
### Model Sources
- **Repository:** https://huggingface.co/SoteriaInitiative/Apertus-8B-Instruct-OFAC-FAQ
- **Demo:** _WIP_
## Uses
Use in chat or assistant applications where compliance or financial crime analysts need
answers on FATF or OFAC FAQ matters.
### Direct Use
This model can directly be used with the FCCAssistant https://github.com/SoteriaInitiative/fccassistant
once a model endpoint has been deployed.
### Out-of-Scope Use
This model is not intended for production deployment.
## Bias, Risks, and Limitations
The model is fine-tuned for FATF and OFAC FAQ matters and should therefore be restricted to
use cases within that scope.
### Recommendations
Perform model quality evaluation before use.
## How to Get Started with the Model
Use the Jupyter Notebook linked in the **Demo** references for a comprehensive overview.
For a quick start try:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
BASE = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
ADAPTER = "./peft" # or "org/repo-name" if pushed to HF
# Tokenizer (includes the chat template)
tokenizer = AutoTokenizer.from_pretrained(BASE)
# Base model (GPU, 8-bit). For CPU, remove load_in_8bit and device_map.
model = AutoModelForCausalLM.from_pretrained(
BASE,
device_map="auto",
load_in_8bit=True,
)
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()
# Chat prompt via tokenizer's chat_template
messages = [
{"role": "system", "content": "You are a helpful assistant for sanctions/AML."},
{"role": "user", "content": "Summarize the key OFAC FAQ topics."},
]
inputs = tokenizer.apply_chat_template(
messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
with torch.inference_mode():
out = model.generate(
inputs,
max_new_tokens=256,
temperature=0.7,
top_p=0.9,
do_sample=True,
pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
Notes:
- GPU 8-bit loading is shown. For CPU-only inference, drop `load_in_8bit=True` and `device_map="auto"`, then call `model.to("cpu")`.
- If you plan to export a merged model, load the base in full precision and then call `model = model.merge_and_unload()` (optional; not needed for standard PEFT inference).
## Training Details
### Training Data
The following sources were used for fine-tuning:
- OFAC FAQ: https://ofac.treasury.gov/faqs
- FATF Recommendations: https://www.fatf-gafi.org/content/dam/fatf-gafi/recommendations/FATF%20Recommendations%202012.pdf.coredownload.inline.pdf
### Training Procedure
Supervised fine-tuning was applied to the Apertus 8B Instruct model with a training dataset
of OFAC FAQ question/answer pairs as well as FATF recommendation title/text pairs.
## Evaluation
Model evaluation has NOT been performed yet!
### Framework versions
- PEFT 0.13.2