---
base_model:
- swiss-ai/Apertus-8B-2509
library_name: peft
license: apache-2.0
tags:
- finance
- financialcrime
- compliance
---
# Model Card for Apertus-8B-Instruct-OFAC-FAQ
A model fine-tuned for sanctions- and AML-related OFAC FAQ questions, starting from the Swiss AI
Apertus 8B Instruct model, which was then used as a teacher and distilled to TinyLlama 1.1B. The distilled model is 6-7x smaller than the original. Quantization to INT8 should allow even low-memory CPU inference
deployments if model latency is not a primary concern. A PEFT LoRA adapter is included for use with the base model.
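The memory saving from INT8 can be illustrated with PyTorch's dynamic quantization. This is a minimal sketch on a toy stand-in module (the released INT8 weights may have been produced by a different pipeline): `nn.Linear` weights are stored as INT8 and dequantized on the fly at inference, cutting weight memory roughly 4x versus FP32.

```python
import torch
import torch.nn as nn

# Toy stand-in for a transformer MLP block, purely for illustration.
model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 64))

# Dynamic quantization: replace nn.Linear modules with INT8-weight versions.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 64)
with torch.inference_mode():
    y = qmodel(x)

print(type(qmodel[0]).__module__)  # the Linear layers are now quantized modules
print(y.shape)                     # output shape is unchanged: (1, 64)
```

The same trade-off applies to the full model: lower memory at the cost of some per-token dequantization overhead, which is why latency-insensitive CPU deployments are the target.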
## Model Details
### Model Description
The model includes INT8 quantized weights for CPU inference and a LoRA adapter for GPU inference with
a matching base.
- **Developed by:** Soteria Initiative
- **Funded by:** Soteria Initiative
- **Shared by:** Soteria Initiative
- **Model type:** Text generation, LlamaForCausalLM, context length 2048
- **Language(s) (NLP):** English, Others
- **License:** Apache-2.0
- **Finetuned from model:** Apertus 8B Instruct
### Model Sources
- **Repository:** https://huggingface.co/SoteriaInitiative/Apertus-8B-Instruct-OFAC-FAQ
- **Demo:** _WIP_
## Uses
Use in chat or assistant applications where compliance or financial crime analysts need
answers on FATF or OFAC FAQ matters.
### Direct Use
This model can directly be used with the FCCAssistant https://github.com/SoteriaInitiative/fccassistant
once a model endpoint has been deployed.
### Out-of-Scope Use
This model is not intended for production deployment.
## Bias, Risks, and Limitations
The model is fine-tuned for FATF and OFAC FAQ matters and should therefore be restricted to
use cases within that scope.
### Recommendations
Perform model quality evaluation before use.
## How to Get Started with the Model
Use the Jupyter Notebook linked in the **Demo** references for a comprehensive overview.
For a quick start try:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
BASE = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
ADAPTER = "./peft" # or "org/repo-name" if pushed to HF
# Tokenizer (includes the chat template)
tokenizer = AutoTokenizer.from_pretrained(BASE)
# Base model (GPU, 8-bit). For CPU, remove load_in_8bit and device_map.
model = AutoModelForCausalLM.from_pretrained(
BASE,
device_map="auto",
load_in_8bit=True,
)
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()
# Chat prompt via tokenizer's chat_template
messages = [
{"role": "system", "content": "You are a helpful assistant for sanctions/AML."},
{"role": "user", "content": "Summarize the key OFAC FAQ topics."},
]
inputs = tokenizer.apply_chat_template(
messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
with torch.inference_mode():
out = model.generate(
inputs,
max_new_tokens=256,
temperature=0.7,
top_p=0.9,
do_sample=True,
pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
Notes:
- GPU 8-bit loading is shown. For CPU-only inference, drop `load_in_8bit=True` and `device_map="auto"`, then call `model.to("cpu")`.
- If you plan to export a merged model, load the base in full precision and then call `model = model.merge_and_unload()` (optional; not needed for standard PEFT inference).
## Training Details
### Training Data
The following sources were used for fine-tuning:
- OFAC FAQ: https://ofac.treasury.gov/faqs
- FATF Recommendations: https://www.fatf-gafi.org/content/dam/fatf-gafi/recommendations/FATF%20Recommendations%202012.pdf.coredownload.inline.pdf
### Training Procedure
Supervised fine-tuning was applied to the Apertus 8B Instruct model with a training dataset
of OFAC FAQ question/answer pairs as well as FATF recommendation title/text pairs.
## Evaluation
Model evaluation has NOT been performed yet!
### Framework versions
- PEFT 0.13.2