---
base_model: meta-llama/Llama-3.2-1B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:meta-llama/Llama-3.2-1B-Instruct
- lora
- transformers
- financial
- compliance
- xbrl
- sentiment-analysis
- sec-filings
---

# FinGPT Compliance Agents

A specialized language model for financial compliance and regulatory tasks, fine-tuned on SEC filings analysis, regulatory compliance, sentiment analysis, and XBRL data processing.

## Model Details

### Model Description

FinGPT Compliance Agents is a LoRA fine-tuned version of Llama-3.2-1B-Instruct, designed for financial compliance and regulatory tasks. The model targets:

- **SEC Filings Analysis**: Extract insights from SEC filings and the associated XBRL data
- **Financial Q&A**: Answer questions about company filings and financial statements
- **Sentiment Analysis**: Classify the sentiment of financial text
- **XBRL Processing**: Extract tags and values, and construct formulas, from XBRL data
- **Regulatory Compliance**: Handle real-time financial data retrieval and analysis

- **Developed by:** SecureFinAI Contest 2025 - Task 2 Team
- **Model type:** Causal Language Model with LoRA adaptation
- **Language(s) (NLP):** English (primary), Russian (audio processing)
- **License:** Apache 2.0
- **Finetuned from model:** meta-llama/Llama-3.2-1B-Instruct

### Model Sources

- **Repository:** [GitHub Repository](https://github.com/your-repo/fingpt-compliance-agents)
- **Base Model:** [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)
- **Training Data:** FinanceBench, XBRL Analysis, Financial Sentiment datasets

## Uses

### Direct Use

This model is designed for direct use in financial compliance applications:

- **Financial Q&A Systems**: Answer questions about company filings and financial data
- **Sentiment Analysis**: Classify financial news, earnings calls, and market sentiment
- **XBRL Data Processing**: Extract and analyze structured financial data
- **Regulatory Compliance**: Process SEC filings and regulatory documents
- **Audio Processing**: Transcribe and analyze financial audio content

### Downstream Use

The model can be further fine-tuned for specific financial domains:

- **Banking Compliance**: Anti-money laundering, fraud detection
- **Insurance**: Risk assessment, claims processing
- **Investment Analysis**: Portfolio management, risk evaluation
- **Regulatory Reporting**: Automated compliance reporting

### Out-of-Scope Use

This model should not be used for:

- Financial advice or investment recommendations
- Legal advice or regulatory interpretation
- High-stakes financial decisions without human oversight
- Non-financial compliance tasks

## Bias, Risks, and Limitations

### Known Limitations

- **Model Size**: At 1B parameters, the model may not capture complex financial relationships
- **Training Data**: Primarily English financial data, limited multilingual support
- **Temporal Scope**: Training data may not include recent financial events
- **Domain Specificity**: Optimized for compliance tasks, not general financial advice

### Recommendations

Users should:

- Validate model outputs with domain experts
- Use appropriate guardrails for financial applications
- Regularly retrain with updated financial data
- Implement human oversight for critical decisions

## How to Get Started with the Model

### Basic Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "QXPS/fingpt-compliance-agents")
tokenizer = AutoTokenizer.from_pretrained("QXPS/fingpt-compliance-agents")

# Generate a response; inputs must be moved to the model's device
# (the base model may be placed on GPU by device_map="auto")
def generate_response(prompt, max_new_tokens=512):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage
prompt = "Analyze the sentiment of this financial news: 'Company X reported strong quarterly earnings with 15% revenue growth.'"
response = generate_response(prompt)
print(response)
```

### Financial Q&A

```python
# Financial Q&A example
qa_prompt = """
Question: What was the company's revenue growth in Q3 2023?
Context: The company reported Q3 2023 revenue of $2.5B, up 15% from Q3 2022 revenue of $2.17B.
Answer:
"""
response = generate_response(qa_prompt)
```

### Sentiment Analysis

```python
# Sentiment analysis example
sentiment_prompt = """
Classify the sentiment of this financial text as positive, negative, or neutral:
"The company's stock price plummeted 20% after missing earnings expectations."
Sentiment:
"""
response = generate_response(sentiment_prompt)
```
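
### XBRL Processing

XBRL tasks can be prompted the same way. A minimal sketch; the concept description and prompt wording below are illustrative, not drawn from the training set:

```python
# XBRL tag extraction example (hypothetical concept description)
xbrl_prompt = """
Extract the US GAAP tag for the following financial concept:
"Total revenue from contracts with customers for the fiscal year."
Tag:
"""
response = generate_response(xbrl_prompt)
```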

## Training Details

### Training Data

The model was trained on a diverse collection of financial datasets:

- **FinanceBench**: 150 financial Q&A examples from SEC filings
- **XBRL Analysis**: 574 examples of XBRL tag extraction, value extraction, and formula construction
- **Financial Sentiment**: 826 examples from FPB (Financial Phrase Bank) dataset
- **Total Training Examples**: 7,153 (5,722 train, 1,431 test)

### Training Procedure

#### Preprocessing

- **Text Processing**: Standardized to a conversation format with system/user/assistant roles (sketched below)
- **Tokenization**: Llama-3.2 tokenizer with a 2048-token maximum length
- **Data Splitting**: 80/20 train/test split with stratified sampling
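
A minimal sketch of this preprocessing, assuming the standard Llama-3.2 chat template; the record fields and system prompt are illustrative, not the actual dataset schema:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

# Hypothetical raw record; field names are illustrative.
example = {
    "question": "What was Q3 2023 revenue growth?",
    "answer": "Revenue grew 15% year over year.",
}

# Standardize to the system/user/assistant conversation format.
messages = [
    {"role": "system", "content": "You are a financial compliance assistant."},
    {"role": "user", "content": example["question"]},
    {"role": "assistant", "content": example["answer"]},
]

# Render with the model's chat template, then tokenize to the 2048-token limit.
text = tokenizer.apply_chat_template(messages, tokenize=False)
tokens = tokenizer(text, truncation=True, max_length=2048)
```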

#### Training Hyperparameters

- **Training regime**: LoRA fine-tuning with 4-bit quantization
- **Base Model**: meta-llama/Llama-3.2-1B-Instruct
- **LoRA Parameters**: r=8, alpha=16, dropout=0.1 (see the configuration sketch after this list)
- **Batch Size**: 1 with gradient accumulation of 4 steps
- **Learning Rate**: 1e-4 with linear warmup
- **Epochs**: 1 (845 training steps)
- **Optimizer**: AdamW
- **Scheduler**: Linear with warmup
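
A minimal sketch of how these hyperparameters map to PEFT and bitsandbytes configuration; the target modules are an assumption (typical Llama attention projections), not confirmed from the training code:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantization for the frozen base model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA settings from the list above; target_modules is an assumption.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```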

#### Speeds, Sizes, Times

- **Training Time**: ~2 hours on a single GPU
- **Model Size**: ~1.1GB (base model + LoRA weights)
- **Inference Speed**: ~50 tokens/second on GPU (a rough reproduction sketch follows this list)
- **Memory Usage**: ~4GB VRAM for inference
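
A rough way to reproduce the throughput figure; the prompt and token budget are arbitrary, and `model`/`tokenizer` are assumed to be loaded as in "Basic Usage" above:

```python
import time

prompt = "Summarize the key risks disclosed in this filing:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
elapsed = time.perf_counter() - start

# Count only newly generated tokens, not the prompt.
generated = outputs.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{generated / elapsed:.1f} tokens/second")
```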

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- **FinanceBench**: 31 financial Q&A examples
- **XBRL Analysis**: 574 XBRL processing examples
- **Financial Sentiment**: 826 sentiment classification examples
- **Audio Processing**: 5 financial audio samples

#### Metrics

- **Accuracy**: Overall correctness across all tasks (a computation sketch follows this list)
- **F1-Score**: Harmonic mean of precision and recall
- **Precision**: True positives / (True positives + False positives)
- **Recall**: True positives / (True positives + False negatives)
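
These can be computed with scikit-learn; the weighted averaging below is an assumption about how the multi-class sentiment scores were aggregated:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Illustrative labels, not the actual evaluation outputs.
y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "neutral", "neutral", "positive"]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```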

### Results

#### Financial Q&A Performance
- **Accuracy**: 67.7% (21/31 correct)
- **Sample Size**: 31 questions

#### Sentiment Analysis Performance
- **Accuracy**: 43.5% (359/826 correct)
- **F1-Score**: 46.7%
- **Precision**: 54.6%
- **Recall**: 43.5%
- **Sample Size**: 826 examples

#### XBRL Processing Performance
- **Tag Extraction**: 89.6% accuracy
- **Value Extraction**: 63.6% accuracy
- **Formula Construction**: 99.4% accuracy
- **Formula Calculation**: 82.2% accuracy
- **Overall XBRL**: 88.3% accuracy
- **Sample Size**: 574 examples

#### Overall Performance
- **Accuracy**: 55.6%
- **F1-Score**: 46.7%
- **Precision**: 54.6%
- **Recall**: 43.5%

#### Summary

The model shows strong performance in XBRL processing tasks (88.3% accuracy) and moderate performance in financial Q&A (67.7% accuracy). Sentiment analysis performance is lower (43.5%) but shows room for improvement with additional training data.

## Model Examination

### Key Strengths

1. **XBRL Processing**: Excellent performance on structured financial data
2. **Formula Construction**: Near-perfect accuracy (99.4%)
3. **Financial Q&A**: Solid performance on factual questions
4. **Efficiency**: Fast inference with 1B parameter model

### Areas for Improvement

1. **Sentiment Analysis**: Needs more diverse training data
2. **Complex Reasoning**: Limited by model size for complex financial analysis
3. **Multilingual Support**: Primarily English-focused

## Environmental Impact

- **Hardware Type**: NVIDIA GPU (training), CPU/GPU (inference)
- **Hours used**: ~2 hours training
- **Cloud Provider**: Local development
- **Compute Region**: N/A
- **Carbon Emitted**: Estimated <1kg CO2

## Technical Specifications

### Model Architecture and Objective

- **Architecture**: Transformer-based causal language model
- **Parameters**: ~1B (base model) plus a small LoRA adapter (r=8, well under 1% of base parameters)
- **Context Length**: 2048 tokens
- **Vocabulary Size**: 128,256 tokens
- **Objective**: Next token prediction with instruction following

### Compute Infrastructure

#### Hardware
- **Training**: Single GPU (NVIDIA RTX 4090 or similar)
- **Inference**: CPU or GPU

#### Software
- **Framework**: PyTorch 2.0+
- **LoRA**: PEFT 0.17.1
- **Transformers**: 4.44.0+
- **Quantization**: bitsandbytes 0.41.0+

## Citation

**BibTeX:**
```bibtex
@misc{fingpt-compliance-agents2025,
  title={FinGPT Compliance Agents: A Specialized Language Model for Financial Compliance},
  author={SecureFinAI Contest 2025 Team},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/QXPS/fingpt-compliance-agents}}
}
```

**APA:**
SecureFinAI Contest 2025 Team. (2025). FinGPT Compliance Agents: A Specialized Language Model for Financial Compliance. Hugging Face. https://huggingface.co/QXPS/fingpt-compliance-agents

## Glossary

- **XBRL**: eXtensible Business Reporting Language - XML-based standard for financial reporting
- **LoRA**: Low-Rank Adaptation - Parameter-efficient fine-tuning method
- **SEC Filings**: Securities and Exchange Commission regulatory filings
- **FinanceBench**: Financial question-answering benchmark dataset
- **FPB**: Financial Phrase Bank - sentiment analysis dataset

## Model Card Authors

- **Primary Authors**: SecureFinAI Contest 2025 - Task 2 Team
- **Contributors**: FinGPT development community
- **Reviewers**: Financial compliance domain experts

## Model Card Contact

For questions about this model:
- **GitHub Issues**: [Repository Issues](https://github.com/your-repo/fingpt-compliance-agents/issues)
- **Hugging Face**: [Model Discussion](https://huggingface.co/QXPS/fingpt-compliance-agents/discussions)

### Framework versions

- PEFT 0.17.1
- Transformers 4.44.0
- PyTorch 2.0.0
- bitsandbytes 0.41.0