File size: 10,642 Bytes

6b2608a

---
base_model: meta-llama/Llama-3.2-1B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:meta-llama/Llama-3.2-1B-Instruct
- lora
- transformers
- financial
- compliance
- xbrl
- sentiment-analysis
- sec-filings
---

# FinGPT Compliance Agents

A specialized language model for financial compliance and regulatory tasks, fine-tuned on SEC filings analysis, regulatory compliance, sentiment analysis, and XBRL data processing.

## Model Details

### Model Description

FinGPT Compliance Agents is a LoRA fine-tuned version of Llama-3.2-1B-Instruct, specifically designed for financial compliance and regulatory tasks. The model excels at:

- **SEC Filings Analysis**: Extract insights from SEC filings and XBRL data processing
- **Financial Q&A**: Answer questions about company filings and financial statements
- **Sentiment Analysis**: Classify financial text sentiment with high accuracy
- **XBRL Processing**: Extract tags, values, and construct formulas from XBRL data
- **Regulatory Compliance**: Handle real-time financial data retrieval and analysis

- **Developed by:** SecureFinAI Contest 2025 - Task 2 Team
- **Model type:** Causal Language Model with LoRA adaptation
- **Language(s) (NLP):** English (primary), Russian (audio processing)
- **License:** Apache 2.0
- **Finetuned from model:** meta-llama/Llama-3.2-1B-Instruct

### Model Sources

- **Repository:** [GitHub Repository](https://github.com/your-repo/fingpt-compliance-agents)
- **Base Model:** [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)
- **Training Data:** FinanceBench, XBRL Analysis, Financial Sentiment datasets

## Uses

### Direct Use

This model is designed for direct use in financial compliance applications:

- **Financial Q&A Systems**: Answer questions about company filings and financial data
- **Sentiment Analysis**: Classify financial news, earnings calls, and market sentiment
- **XBRL Data Processing**: Extract and analyze structured financial data
- **Regulatory Compliance**: Process SEC filings and regulatory documents
- **Audio Processing**: Transcribe and analyze financial audio content

### Downstream Use

The model can be further fine-tuned for specific financial domains:

- **Banking Compliance**: Anti-money laundering, fraud detection
- **Insurance**: Risk assessment, claims processing
- **Investment Analysis**: Portfolio management, risk evaluation
- **Regulatory Reporting**: Automated compliance reporting

### Out-of-Scope Use

This model should not be used for:

- Financial advice or investment recommendations
- Legal advice or regulatory interpretation
- High-stakes financial decisions without human oversight
- Non-financial compliance tasks

## Bias, Risks, and Limitations

### Known Limitations

- **Model Size**: Limited to 1B parameters, may not capture complex financial relationships
- **Training Data**: Primarily English financial data, limited multilingual support
- **Temporal Scope**: Training data may not include recent financial events
- **Domain Specificity**: Optimized for compliance tasks, not general financial advice

### Recommendations

Users should:

- Validate model outputs with domain experts
- Use appropriate guardrails for financial applications
- Regularly retrain with updated financial data
- Implement human oversight for critical decisions

## How to Get Started with the Model

### Basic Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "QXPS/fingpt-compliance-agents")
tokenizer = AutoTokenizer.from_pretrained("QXPS/fingpt-compliance-agents")

# Generate response
def generate_response(prompt, max_length=512):
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_length,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage
prompt = "Analyze the sentiment of this financial news: 'Company X reported strong quarterly earnings with 15% revenue growth.'"
response = generate_response(prompt)
print(response)
```

### Financial Q&A

```python
# Financial Q&A example
qa_prompt = """
Question: What was the company's revenue growth in Q3 2023?
Context: The company reported Q3 2023 revenue of $2.5B, up 15% from Q3 2022 revenue of $2.17B.
Answer:
"""
response = generate_response(qa_prompt)
```

### Sentiment Analysis

```python
# Sentiment analysis example
sentiment_prompt = """
Classify the sentiment of this financial text as positive, negative, or neutral:
"The company's stock price plummeted 20% after missing earnings expectations."
Sentiment:
"""
response = generate_response(sentiment_prompt)
```

## Training Details

### Training Data

The model was trained on a diverse collection of financial datasets:

- **FinanceBench**: 150 financial Q&A examples from SEC filings
- **XBRL Analysis**: 574 examples of XBRL tag extraction, value extraction, and formula construction
- **Financial Sentiment**: 826 examples from FPB (Financial Phrase Bank) dataset
- **Total Training Examples**: 7,153 (5,722 train, 1,431 test)

### Training Procedure

#### Preprocessing

- **Text Processing**: Standardized to conversation format with system/user/assistant roles
- **Tokenization**: Using Llama-3.2 tokenizer with 2048 max length
- **Data Splitting**: 80/20 train/test split with stratified sampling

#### Training Hyperparameters

- **Training regime**: LoRA fine-tuning with 4-bit quantization
- **Base Model**: meta-llama/Llama-3.2-1B-Instruct
- **LoRA Parameters**: r=8, alpha=16, dropout=0.1
- **Batch Size**: 1 with gradient accumulation of 4 steps
- **Learning Rate**: 1e-4 with linear warmup
- **Epochs**: 1 (845 training steps)
- **Optimizer**: AdamW
- **Scheduler**: Linear with warmup

#### Speeds, Sizes, Times

- **Training Time**: ~2 hours on single GPU
- **Model Size**: ~1.1GB (base model + LoRA weights)
- **Inference Speed**: ~50 tokens/second on GPU
- **Memory Usage**: ~4GB VRAM for inference

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- **FinanceBench**: 31 financial Q&A examples
- **XBRL Analysis**: 574 XBRL processing examples
- **Financial Sentiment**: 826 sentiment classification examples
- **Audio Processing**: 5 financial audio samples

#### Metrics

- **Accuracy**: Overall correctness across all tasks
- **F1-Score**: Harmonic mean of precision and recall
- **Precision**: True positives / (True positives + False positives)
- **Recall**: True positives / (True positives + False negatives)

### Results

#### Financial Q&A Performance
- **Accuracy**: 67.7% (21/31 correct)
- **Sample Size**: 31 questions

#### Sentiment Analysis Performance
- **Accuracy**: 43.5% (359/826 correct)
- **F1-Score**: 46.7%
- **Precision**: 54.6%
- **Recall**: 43.5%
- **Sample Size**: 826 examples

#### XBRL Processing Performance
- **Tag Extraction**: 89.6% accuracy
- **Value Extraction**: 63.6% accuracy
- **Formula Construction**: 99.4% accuracy
- **Formula Calculation**: 82.2% accuracy
- **Overall XBRL**: 88.3% accuracy
- **Sample Size**: 574 examples

#### Overall Performance
- **Accuracy**: 55.6%
- **F1-Score**: 46.7%
- **Precision**: 54.6%
- **Recall**: 43.5%

#### Summary

The model shows strong performance in XBRL processing tasks (88.3% accuracy) and moderate performance in financial Q&A (67.7% accuracy). Sentiment analysis performance is lower (43.5%) but shows room for improvement with additional training data.

## Model Examination

### Key Strengths

1. **XBRL Processing**: Excellent performance on structured financial data
2. **Formula Construction**: Near-perfect accuracy (99.4%)
3. **Financial Q&A**: Solid performance on factual questions
4. **Efficiency**: Fast inference with 1B parameter model

### Areas for Improvement

1. **Sentiment Analysis**: Needs more diverse training data
2. **Complex Reasoning**: Limited by model size for complex financial analysis
3. **Multilingual Support**: Primarily English-focused

## Environmental Impact

- **Hardware Type**: NVIDIA GPU (training), CPU/GPU (inference)
- **Hours used**: ~2 hours training
- **Cloud Provider**: Local development
- **Compute Region**: N/A
- **Carbon Emitted**: Estimated <1kg CO2

## Technical Specifications

### Model Architecture and Objective

- **Architecture**: Transformer-based causal language model
- **Parameters**: 1.1B (1B base + 0.1B LoRA)
- **Context Length**: 2048 tokens
- **Vocabulary Size**: 128,256 tokens
- **Objective**: Next token prediction with instruction following

### Compute Infrastructure

#### Hardware
- **Training**: Single GPU (NVIDIA RTX 4090 or similar)
- **Inference**: CPU or GPU

#### Software
- **Framework**: PyTorch 2.0+
- **LoRA**: PEFT 0.17.1
- **Transformers**: 4.44.0+
- **Quantization**: bitsandbytes 0.41.0+

## Citation

**BibTeX:**
```bibtex
@misc{fingpt-compliance-agents2025,
  title={FinGPT Compliance Agents: A Specialized Language Model for Financial Compliance},
  author={SecureFinAI Contest 2025 Team},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/QXPS/fingpt-compliance-agents}}
}
```

**APA:**
SecureFinAI Contest 2025 Team. (2025). FinGPT Compliance Agents: A Specialized Language Model for Financial Compliance. Hugging Face. https://huggingface.co/QXPS/fingpt-compliance-agents

## Glossary

- **XBRL**: eXtensible Business Reporting Language - XML-based standard for financial reporting
- **LoRA**: Low-Rank Adaptation - Parameter-efficient fine-tuning method
- **SEC Filings**: Securities and Exchange Commission regulatory filings
- **FinanceBench**: Financial question-answering benchmark dataset
- **FPB**: Financial Phrase Bank - sentiment analysis dataset

## Model Card Authors

- **Primary Authors**: SecureFinAI Contest 2025 - Task 2 Team
- **Contributors**: FinGPT development community
- **Reviewers**: Financial compliance domain experts

## Model Card Contact

For questions about this model:
- **GitHub Issues**: [Repository Issues](https://github.com/your-repo/fingpt-compliance-agents/issues)
- **Hugging Face**: [Model Discussion](https://huggingface.co/QXPS/fingpt-compliance-agents/discussions)

### Framework versions

- PEFT 0.17.1
- Transformers 4.44.0
- PyTorch 2.0.0
- bitsandbytes 0.41.0