File size: 10,642 Bytes
6b2608a | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 | ---
base_model: meta-llama/Llama-3.2-1B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:meta-llama/Llama-3.2-1B-Instruct
- lora
- transformers
- financial
- compliance
- xbrl
- sentiment-analysis
- sec-filings
---
# FinGPT Compliance Agents
A specialized language model for financial compliance and regulatory tasks, fine-tuned on SEC filings analysis, regulatory compliance, sentiment analysis, and XBRL data processing.
## Model Details
### Model Description
FinGPT Compliance Agents is a LoRA fine-tuned version of Llama-3.2-1B-Instruct, specifically designed for financial compliance and regulatory tasks. The model excels at:
- **SEC Filings Analysis**: Extract insights from SEC filings and XBRL data processing
- **Financial Q&A**: Answer questions about company filings and financial statements
- **Sentiment Analysis**: Classify financial text sentiment with high accuracy
- **XBRL Processing**: Extract tags, values, and construct formulas from XBRL data
- **Regulatory Compliance**: Handle real-time financial data retrieval and analysis
- **Developed by:** SecureFinAI Contest 2025 - Task 2 Team
- **Model type:** Causal Language Model with LoRA adaptation
- **Language(s) (NLP):** English (primary), Russian (audio processing)
- **License:** Apache 2.0
- **Finetuned from model:** meta-llama/Llama-3.2-1B-Instruct
### Model Sources
- **Repository:** [GitHub Repository](https://github.com/your-repo/fingpt-compliance-agents)
- **Base Model:** [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)
- **Training Data:** FinanceBench, XBRL Analysis, Financial Sentiment datasets
## Uses
### Direct Use
This model is designed for direct use in financial compliance applications:
- **Financial Q&A Systems**: Answer questions about company filings and financial data
- **Sentiment Analysis**: Classify financial news, earnings calls, and market sentiment
- **XBRL Data Processing**: Extract and analyze structured financial data
- **Regulatory Compliance**: Process SEC filings and regulatory documents
- **Audio Processing**: Transcribe and analyze financial audio content
### Downstream Use
The model can be further fine-tuned for specific financial domains:
- **Banking Compliance**: Anti-money laundering, fraud detection
- **Insurance**: Risk assessment, claims processing
- **Investment Analysis**: Portfolio management, risk evaluation
- **Regulatory Reporting**: Automated compliance reporting
### Out-of-Scope Use
This model should not be used for:
- Financial advice or investment recommendations
- Legal advice or regulatory interpretation
- High-stakes financial decisions without human oversight
- Non-financial compliance tasks
## Bias, Risks, and Limitations
### Known Limitations
- **Model Size**: Limited to 1B parameters, may not capture complex financial relationships
- **Training Data**: Primarily English financial data, limited multilingual support
- **Temporal Scope**: Training data may not include recent financial events
- **Domain Specificity**: Optimized for compliance tasks, not general financial advice
### Recommendations
Users should:
- Validate model outputs with domain experts
- Use appropriate guardrails for financial applications
- Regularly retrain with updated financial data
- Implement human oversight for critical decisions
## How to Get Started with the Model
### Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
# Load the model
base_model = AutoModelForCausalLM.from_pretrained(
"meta-llama/Llama-3.2-1B-Instruct",
torch_dtype=torch.float16,
device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "QXPS/fingpt-compliance-agents")
tokenizer = AutoTokenizer.from_pretrained("QXPS/fingpt-compliance-agents")
# Generate response
def generate_response(prompt, max_length=512):
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
outputs = model.generate(
**inputs,
max_new_tokens=max_length,
temperature=0.7,
do_sample=True,
pad_token_id=tokenizer.eos_token_id
)
return tokenizer.decode(outputs[0], skip_special_tokens=True)
# Example usage
prompt = "Analyze the sentiment of this financial news: 'Company X reported strong quarterly earnings with 15% revenue growth.'"
response = generate_response(prompt)
print(response)
```
### Financial Q&A
```python
# Financial Q&A example
qa_prompt = """
Question: What was the company's revenue growth in Q3 2023?
Context: The company reported Q3 2023 revenue of $2.5B, up 15% from Q3 2022 revenue of $2.17B.
Answer:
"""
response = generate_response(qa_prompt)
```
### Sentiment Analysis
```python
# Sentiment analysis example
sentiment_prompt = """
Classify the sentiment of this financial text as positive, negative, or neutral:
"The company's stock price plummeted 20% after missing earnings expectations."
Sentiment:
"""
response = generate_response(sentiment_prompt)
```
## Training Details
### Training Data
The model was trained on a diverse collection of financial datasets:
- **FinanceBench**: 150 financial Q&A examples from SEC filings
- **XBRL Analysis**: 574 examples of XBRL tag extraction, value extraction, and formula construction
- **Financial Sentiment**: 826 examples from FPB (Financial Phrase Bank) dataset
- **Total Training Examples**: 7,153 (5,722 train, 1,431 test)
### Training Procedure
#### Preprocessing
- **Text Processing**: Standardized to conversation format with system/user/assistant roles
- **Tokenization**: Using Llama-3.2 tokenizer with 2048 max length
- **Data Splitting**: 80/20 train/test split with stratified sampling
#### Training Hyperparameters
- **Training regime**: LoRA fine-tuning with 4-bit quantization
- **Base Model**: meta-llama/Llama-3.2-1B-Instruct
- **LoRA Parameters**: r=8, alpha=16, dropout=0.1
- **Batch Size**: 1 with gradient accumulation of 4 steps
- **Learning Rate**: 1e-4 with linear warmup
- **Epochs**: 1 (845 training steps)
- **Optimizer**: AdamW
- **Scheduler**: Linear with warmup
#### Speeds, Sizes, Times
- **Training Time**: ~2 hours on single GPU
- **Model Size**: ~1.1GB (base model + LoRA weights)
- **Inference Speed**: ~50 tokens/second on GPU
- **Memory Usage**: ~4GB VRAM for inference
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
- **FinanceBench**: 31 financial Q&A examples
- **XBRL Analysis**: 574 XBRL processing examples
- **Financial Sentiment**: 826 sentiment classification examples
- **Audio Processing**: 5 financial audio samples
#### Metrics
- **Accuracy**: Overall correctness across all tasks
- **F1-Score**: Harmonic mean of precision and recall
- **Precision**: True positives / (True positives + False positives)
- **Recall**: True positives / (True positives + False negatives)
### Results
#### Financial Q&A Performance
- **Accuracy**: 67.7% (21/31 correct)
- **Sample Size**: 31 questions
#### Sentiment Analysis Performance
- **Accuracy**: 43.5% (359/826 correct)
- **F1-Score**: 46.7%
- **Precision**: 54.6%
- **Recall**: 43.5%
- **Sample Size**: 826 examples
#### XBRL Processing Performance
- **Tag Extraction**: 89.6% accuracy
- **Value Extraction**: 63.6% accuracy
- **Formula Construction**: 99.4% accuracy
- **Formula Calculation**: 82.2% accuracy
- **Overall XBRL**: 88.3% accuracy
- **Sample Size**: 574 examples
#### Overall Performance
- **Accuracy**: 55.6%
- **F1-Score**: 46.7%
- **Precision**: 54.6%
- **Recall**: 43.5%
#### Summary
The model shows strong performance in XBRL processing tasks (88.3% accuracy) and moderate performance in financial Q&A (67.7% accuracy). Sentiment analysis performance is lower (43.5%) but shows room for improvement with additional training data.
## Model Examination
### Key Strengths
1. **XBRL Processing**: Excellent performance on structured financial data
2. **Formula Construction**: Near-perfect accuracy (99.4%)
3. **Financial Q&A**: Solid performance on factual questions
4. **Efficiency**: Fast inference with 1B parameter model
### Areas for Improvement
1. **Sentiment Analysis**: Needs more diverse training data
2. **Complex Reasoning**: Limited by model size for complex financial analysis
3. **Multilingual Support**: Primarily English-focused
## Environmental Impact
- **Hardware Type**: NVIDIA GPU (training), CPU/GPU (inference)
- **Hours used**: ~2 hours training
- **Cloud Provider**: Local development
- **Compute Region**: N/A
- **Carbon Emitted**: Estimated <1kg CO2
## Technical Specifications
### Model Architecture and Objective
- **Architecture**: Transformer-based causal language model
- **Parameters**: 1.1B (1B base + 0.1B LoRA)
- **Context Length**: 2048 tokens
- **Vocabulary Size**: 128,256 tokens
- **Objective**: Next token prediction with instruction following
### Compute Infrastructure
#### Hardware
- **Training**: Single GPU (NVIDIA RTX 4090 or similar)
- **Inference**: CPU or GPU
#### Software
- **Framework**: PyTorch 2.0+
- **LoRA**: PEFT 0.17.1
- **Transformers**: 4.44.0+
- **Quantization**: bitsandbytes 0.41.0+
## Citation
**BibTeX:**
```bibtex
@misc{fingpt-compliance-agents2025,
title={FinGPT Compliance Agents: A Specialized Language Model for Financial Compliance},
author={SecureFinAI Contest 2025 Team},
year={2025},
publisher={Hugging Face},
howpublished={\url{https://huggingface.co/QXPS/fingpt-compliance-agents}}
}
```
**APA:**
SecureFinAI Contest 2025 Team. (2025). FinGPT Compliance Agents: A Specialized Language Model for Financial Compliance. Hugging Face. https://huggingface.co/QXPS/fingpt-compliance-agents
## Glossary
- **XBRL**: eXtensible Business Reporting Language - XML-based standard for financial reporting
- **LoRA**: Low-Rank Adaptation - Parameter-efficient fine-tuning method
- **SEC Filings**: Securities and Exchange Commission regulatory filings
- **FinanceBench**: Financial question-answering benchmark dataset
- **FPB**: Financial Phrase Bank - sentiment analysis dataset
## Model Card Authors
- **Primary Authors**: SecureFinAI Contest 2025 - Task 2 Team
- **Contributors**: FinGPT development community
- **Reviewers**: Financial compliance domain experts
## Model Card Contact
For questions about this model:
- **GitHub Issues**: [Repository Issues](https://github.com/your-repo/fingpt-compliance-agents/issues)
- **Hugging Face**: [Model Discussion](https://huggingface.co/QXPS/fingpt-compliance-agents/discussions)
### Framework versions
- PEFT 0.17.1
- Transformers 4.44.0
- PyTorch 2.0.0
- bitsandbytes 0.41.0 |