---
base_model: meta-llama/Llama-3.2-1B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:meta-llama/Llama-3.2-1B-Instruct
- lora
- transformers
- financial
- compliance
- xbrl
- sentiment-analysis
- sec-filings
---
# FinGPT Compliance Agents
A specialized language model for financial compliance and regulatory tasks, fine-tuned on SEC filings analysis, regulatory compliance, sentiment analysis, and XBRL data processing.
## Model Details
### Model Description
FinGPT Compliance Agents is a LoRA fine-tuned version of Llama-3.2-1B-Instruct, specifically designed for financial compliance and regulatory tasks. The model excels at:
- **SEC Filings Analysis**: Extract insights from SEC filings and their associated XBRL data
- **Financial Q&A**: Answer questions about company filings and financial statements
- **Sentiment Analysis**: Classify financial text sentiment with high accuracy
- **XBRL Processing**: Extract tags, values, and construct formulas from XBRL data
- **Regulatory Compliance**: Handle real-time financial data retrieval and analysis
- **Developed by:** SecureFinAI Contest 2025 - Task 2 Team
- **Model type:** Causal Language Model with LoRA adaptation
- **Language(s) (NLP):** English (primary), Russian (audio processing)
- **License:** Apache 2.0
- **Finetuned from model:** meta-llama/Llama-3.2-1B-Instruct
### Model Sources
- **Repository:** [GitHub Repository](https://github.com/your-repo/fingpt-compliance-agents)
- **Base Model:** [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)
- **Training Data:** FinanceBench, XBRL Analysis, Financial Sentiment datasets
## Uses
### Direct Use
This model is designed for direct use in financial compliance applications:
- **Financial Q&A Systems**: Answer questions about company filings and financial data
- **Sentiment Analysis**: Classify financial news, earnings calls, and market sentiment
- **XBRL Data Processing**: Extract and analyze structured financial data
- **Regulatory Compliance**: Process SEC filings and regulatory documents
- **Audio Processing**: Transcribe and analyze financial audio content
### Downstream Use
The model can be further fine-tuned for specific financial domains:
- **Banking Compliance**: Anti-money laundering, fraud detection
- **Insurance**: Risk assessment, claims processing
- **Investment Analysis**: Portfolio management, risk evaluation
- **Regulatory Reporting**: Automated compliance reporting
### Out-of-Scope Use
This model should not be used for:
- Financial advice or investment recommendations
- Legal advice or regulatory interpretation
- High-stakes financial decisions without human oversight
- Non-financial compliance tasks
## Bias, Risks, and Limitations
### Known Limitations
- **Model Size**: At 1B parameters, the base model may not capture complex financial relationships
- **Training Data**: Primarily English financial data, limited multilingual support
- **Temporal Scope**: Training data may not include recent financial events
- **Domain Specificity**: Optimized for compliance tasks, not general financial advice
### Recommendations
Users should:
- Validate model outputs with domain experts
- Use appropriate guardrails for financial applications
- Regularly retrain with updated financial data
- Implement human oversight for critical decisions
## How to Get Started with the Model
### Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
# Load the base model and attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "QXPS/fingpt-compliance-agents")
tokenizer = AutoTokenizer.from_pretrained("QXPS/fingpt-compliance-agents")

# Generate a response
def generate_response(prompt, max_new_tokens=512):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
# Example usage
prompt = "Analyze the sentiment of this financial news: 'Company X reported strong quarterly earnings with 15% revenue growth.'"
response = generate_response(prompt)
print(response)
```
### Financial Q&A
```python
# Financial Q&A example
qa_prompt = """
Question: What was the company's revenue growth in Q3 2023?
Context: The company reported Q3 2023 revenue of $2.5B, up 15% from Q3 2022 revenue of $2.17B.
Answer:
"""
response = generate_response(qa_prompt)
```
### Sentiment Analysis
```python
# Sentiment analysis example
sentiment_prompt = """
Classify the sentiment of this financial text as positive, negative, or neutral:
"The company's stock price plummeted 20% after missing earnings expectations."
Sentiment:
"""
response = generate_response(sentiment_prompt)
```
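Because the model returns free text rather than a bare label, evaluation needs a post-processing step that maps the completion back to one of the three classes. A minimal sketch is shown below; the keyword-scan approach and the `parse_sentiment` helper are illustrative assumptions, not the exact parsing used by the evaluation harness.

```python
# Sketch: map the model's free-text completion to a sentiment label.
# A simple keyword scan after the "Sentiment:" marker; the actual
# evaluation harness may parse differently (assumption).
def parse_sentiment(generated_text):
    tail = generated_text.lower().rsplit("sentiment:", 1)[-1]
    for label in ("positive", "negative", "neutral"):
        if label in tail:
            return label
    return "neutral"  # fallback when no label is found

print(parse_sentiment("... Sentiment: Negative"))  # negative
```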
## Training Details
### Training Data
The model was trained on a diverse collection of financial datasets:
- **FinanceBench**: 150 financial Q&A examples from SEC filings
- **XBRL Analysis**: 574 examples of XBRL tag extraction, value extraction, and formula construction
- **Financial Sentiment**: 826 examples from FPB (Financial Phrase Bank) dataset
- **Total Training Examples**: 7,153 (5,722 train, 1,431 test)
### Training Procedure
#### Preprocessing
- **Text Processing**: Standardized to conversation format with system/user/assistant roles
- **Tokenization**: Using Llama-3.2 tokenizer with 2048 max length
- **Data Splitting**: 80/20 train/test split with stratified sampling
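The conversation-format preprocessing above can be sketched as follows. The record field names (`question`, `answer`) and the system prompt are illustrative assumptions; the card does not state the exact ones used in training.

```python
# Sketch: convert a raw Q&A record into the system/user/assistant
# conversation format described above. Field names and the system
# prompt are assumptions for illustration.
def to_conversation(record):
    return [
        {"role": "system", "content": "You are a financial compliance assistant."},
        {"role": "user", "content": record["question"]},
        {"role": "assistant", "content": record["answer"]},
    ]

# With a Llama-3.2 tokenizer loaded, each conversation can then be
# rendered and tokenized in one step, truncated to the 2048-token limit:
#   tokenizer.apply_chat_template(to_conversation(rec), tokenize=True,
#                                 max_length=2048, truncation=True)

example = to_conversation({"question": "What was Q3 revenue?", "answer": "$2.5B."})
print(example[1]["content"])
```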
#### Training Hyperparameters
- **Training regime**: LoRA fine-tuning with 4-bit quantization
- **Base Model**: meta-llama/Llama-3.2-1B-Instruct
- **LoRA Parameters**: r=8, alpha=16, dropout=0.1
- **Batch Size**: 1 with gradient accumulation of 4 steps
- **Learning Rate**: 1e-4 with linear warmup
- **Epochs**: 1 (845 training steps)
- **Optimizer**: AdamW
- **Scheduler**: Linear with warmup
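Under the hyperparameters listed above, the training setup might be configured roughly as follows using `peft` and `transformers`. This is a sketch, not the team's actual script: the `target_modules` choice and the warmup ratio are assumptions, since the card does not state them.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model
import torch

# 4-bit quantization for the frozen base weights
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA parameters from the card: r=8, alpha=16, dropout=0.1
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],  # assumption: not stated in the card
)
model = get_peft_model(base, lora_config)

# Batch size 1 with 4-step gradient accumulation, lr 1e-4, 1 epoch
args = TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=1e-4,
    num_train_epochs=1,
    lr_scheduler_type="linear",
    warmup_ratio=0.03,  # assumption: "linear warmup" without a stated ratio
    optim="adamw_torch",
)
```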
#### Speeds, Sizes, Times
- **Training Time**: ~2 hours on a single GPU
- **Model Size**: ~1.1GB (base model + LoRA weights)
- **Inference Speed**: ~50 tokens/second on GPU
- **Memory Usage**: ~4GB VRAM for inference
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
- **FinanceBench**: 31 financial Q&A examples
- **XBRL Analysis**: 574 XBRL processing examples
- **Financial Sentiment**: 826 sentiment classification examples
- **Audio Processing**: 5 financial audio samples
#### Metrics
- **Accuracy**: Overall correctness across all tasks
- **F1-Score**: Harmonic mean of precision and recall
- **Precision**: True positives / (True positives + False positives)
- **Recall**: True positives / (True positives + False negatives)
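The definitions above can be computed per label and combined into the reported aggregates. A minimal pure-Python sketch for the three-way sentiment task (the label names and the averaging scheme are illustrative assumptions):

```python
def precision_recall_f1(y_true, y_pred, label):
    """Per-label precision, recall, and F1, as defined above."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == label and t == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == label and t != label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != label and t == label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy example with the three sentiment labels
y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "neutral", "neutral", "negative"]
print(accuracy(y_true, y_pred))  # 0.5
```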
### Results
#### Financial Q&A Performance
- **Accuracy**: 67.7% (21/31 correct)
- **Sample Size**: 31 questions
#### Sentiment Analysis Performance
- **Accuracy**: 43.5% (359/826 correct)
- **F1-Score**: 46.7%
- **Precision**: 54.6%
- **Recall**: 43.5%
- **Sample Size**: 826 examples
#### XBRL Processing Performance
- **Tag Extraction**: 89.6% accuracy
- **Value Extraction**: 63.6% accuracy
- **Formula Construction**: 99.4% accuracy
- **Formula Calculation**: 82.2% accuracy
- **Overall XBRL**: 88.3% accuracy
- **Sample Size**: 574 examples
#### Overall Performance
- **Accuracy**: 55.6%
- **F1-Score**: 46.7%
- **Precision**: 54.6%
- **Recall**: 43.5%
#### Summary
The model shows strong performance in XBRL processing tasks (88.3% accuracy) and moderate performance in financial Q&A (67.7% accuracy). Sentiment analysis performance is lower (43.5%) but shows room for improvement with additional training data.
## Model Examination
### Key Strengths
1. **XBRL Processing**: Excellent performance on structured financial data
2. **Formula Construction**: Near-perfect accuracy (99.4%)
3. **Financial Q&A**: Solid performance on factual questions
4. **Efficiency**: Fast inference with 1B parameter model
### Areas for Improvement
1. **Sentiment Analysis**: Needs more diverse training data
2. **Complex Reasoning**: Limited by model size for complex financial analysis
3. **Multilingual Support**: Primarily English-focused
## Environmental Impact
- **Hardware Type**: NVIDIA GPU (training), CPU/GPU (inference)
- **Hours used**: ~2 hours training
- **Cloud Provider**: Local development
- **Compute Region**: N/A
- **Carbon Emitted**: Estimated <1kg CO2
## Technical Specifications
### Model Architecture and Objective
- **Architecture**: Transformer-based causal language model
- **Parameters**: ~1.1B total (1B base plus a small LoRA adapter; a few million trainable parameters at r=8)
- **Context Length**: 2048 tokens
- **Vocabulary Size**: 128,256 tokens
- **Objective**: Next token prediction with instruction following
### Compute Infrastructure
#### Hardware
- **Training**: Single GPU (NVIDIA RTX 4090 or similar)
- **Inference**: CPU or GPU
#### Software
- **Framework**: PyTorch 2.0+
- **LoRA**: PEFT 0.17.1
- **Transformers**: 4.44.0+
- **Quantization**: bitsandbytes 0.41.0+
## Citation
**BibTeX:**
```bibtex
@misc{fingpt-compliance-agents2025,
title={FinGPT Compliance Agents: A Specialized Language Model for Financial Compliance},
author={SecureFinAI Contest 2025 Team},
year={2025},
publisher={Hugging Face},
howpublished={\url{https://huggingface.co/QXPS/fingpt-compliance-agents}}
}
```
**APA:**
SecureFinAI Contest 2025 Team. (2025). FinGPT Compliance Agents: A Specialized Language Model for Financial Compliance. Hugging Face. https://huggingface.co/QXPS/fingpt-compliance-agents
## Glossary
- **XBRL**: eXtensible Business Reporting Language - XML-based standard for financial reporting
- **LoRA**: Low-Rank Adaptation - Parameter-efficient fine-tuning method
- **SEC Filings**: Securities and Exchange Commission regulatory filings
- **FinanceBench**: Financial question-answering benchmark dataset
- **FPB**: Financial Phrase Bank - sentiment analysis dataset
## Model Card Authors
- **Primary Authors**: SecureFinAI Contest 2025 - Task 2 Team
- **Contributors**: FinGPT development community
- **Reviewers**: Financial compliance domain experts
## Model Card Contact
For questions about this model:
- **GitHub Issues**: [Repository Issues](https://github.com/your-repo/fingpt-compliance-agents/issues)
- **Hugging Face**: [Model Discussion](https://huggingface.co/QXPS/fingpt-compliance-agents/discussions)
### Framework versions
- PEFT 0.17.1
- Transformers 4.44.0
- PyTorch 2.0.0
- bitsandbytes 0.41.0