---
base_model: meta-llama/Llama-3.2-1B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:meta-llama/Llama-3.2-1B-Instruct
- lora
- transformers
- financial
- compliance
- xbrl
- sentiment-analysis
- sec-filings
---

# FinGPT Compliance Agents

A specialized language model for financial compliance and regulatory tasks, fine-tuned on SEC filings analysis, regulatory compliance, sentiment analysis, and XBRL data processing.

## Model Details

### Model Description

FinGPT Compliance Agents is a LoRA fine-tuned version of Llama-3.2-1B-Instruct, specifically designed for financial compliance and regulatory tasks. The model excels at:

- **SEC Filings Analysis**: Extract insights from SEC filings
- **Financial Q&A**: Answer questions about company filings and financial statements
- **Sentiment Analysis**: Classify the sentiment of financial text
- **XBRL Processing**: Extract tags and values from XBRL data and construct formulas
- **Regulatory Compliance**: Handle real-time financial data retrieval and analysis

- **Developed by:** SecureFinAI Contest 2025 - Task 2 Team
- **Model type:** Causal language model with LoRA adaptation
- **Language(s) (NLP):** English (primary), Russian (audio processing)
- **License:** Apache 2.0
- **Finetuned from model:** meta-llama/Llama-3.2-1B-Instruct

### Model Sources

- **Repository:** [GitHub Repository](https://github.com/your-repo/fingpt-compliance-agents)
- **Base Model:** [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)
- **Training Data:** FinanceBench, XBRL Analysis, Financial Sentiment datasets

## Uses

### Direct Use

This model is designed for direct use in financial compliance applications:

- **Financial Q&A Systems**: Answer questions about company filings and financial data
- **Sentiment Analysis**: Classify financial news, earnings calls, and market sentiment
- **XBRL Data Processing**: Extract and analyze structured financial data
- **Regulatory Compliance**: Process SEC filings and regulatory documents
- **Audio Processing**: Transcribe and analyze financial audio content

### Downstream Use

The model can be further fine-tuned for specific financial domains:

- **Banking Compliance**: Anti-money laundering, fraud detection
- **Insurance**: Risk assessment, claims processing
- **Investment Analysis**: Portfolio management, risk evaluation
- **Regulatory Reporting**: Automated compliance reporting

### Out-of-Scope Use

This model should not be used for:

- Financial advice or investment recommendations
- Legal advice or regulatory interpretation
- High-stakes financial decisions without human oversight
- Non-financial compliance tasks

## Bias, Risks, and Limitations

### Known Limitations

- **Model Size**: Limited to 1B parameters, which may not capture complex financial relationships
- **Training Data**: Primarily English financial data, with limited multilingual support
- **Temporal Scope**: Training data may not include recent financial events
- **Domain Specificity**: Optimized for compliance tasks, not general financial advice

### Recommendations

Users should:

- Validate model outputs with domain experts
- Use appropriate guardrails for financial applications
- Regularly retrain with updated financial data
- Implement human oversight for critical decisions

## How to Get Started with the Model

### Basic Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base model and attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "QXPS/fingpt-compliance-agents")
tokenizer = AutoTokenizer.from_pretrained("QXPS/fingpt-compliance-agents")

# Generate a response
def generate_response(prompt, max_new_tokens=512):
    # Move inputs to the model's device so generation works with device_map="auto"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage
prompt = "Analyze the sentiment of this financial news: 'Company X reported strong quarterly earnings with 15% revenue growth.'"
response = generate_response(prompt)
print(response)
```
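
Since the base model is instruction-tuned, wrapping prompts in the Llama 3.2 chat template can improve instruction following. A minimal sketch reusing the `model` and `tokenizer` loaded above (the system prompt here is an illustrative assumption, not part of the training setup):

```python
# Build a chat-formatted prompt and generate from it
messages = [
    {"role": "system", "content": "You are a financial compliance assistant."},
    {"role": "user", "content": "Classify the sentiment of: 'Company X beat earnings expectations.'"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header so the model answers next
    return_tensors="pt",
).to(model.device)
with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=128,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```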

### Financial Q&A

```python
# Financial Q&A example
qa_prompt = """
Question: What was the company's revenue growth in Q3 2023?
Context: The company reported Q3 2023 revenue of $2.5B, up 15% from Q3 2022 revenue of $2.17B.
Answer:
"""
response = generate_response(qa_prompt)
```

### Sentiment Analysis

```python
# Sentiment analysis example
sentiment_prompt = """
Classify the sentiment of this financial text as positive, negative, or neutral:
"The company's stock price plummeted 20% after missing earnings expectations."
Sentiment:
"""
response = generate_response(sentiment_prompt)
```

## Training Details

### Training Data

The model was trained on a diverse collection of financial datasets:

- **FinanceBench**: 150 financial Q&A examples from SEC filings
- **XBRL Analysis**: 574 examples of XBRL tag extraction, value extraction, and formula construction
- **Financial Sentiment**: 826 examples from the FPB (Financial Phrase Bank) dataset
- **Total Training Examples**: 7,153 (5,722 train, 1,431 test)

### Training Procedure

#### Preprocessing

- **Text Processing**: Standardized to a conversation format with system/user/assistant roles (see the sketch below)
- **Tokenization**: Llama-3.2 tokenizer with a 2048-token maximum length
- **Data Splitting**: 80/20 train/test split with stratified sampling
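
As an illustration of this format, each example can be rendered with the tokenizer's chat template and truncated to the 2048-token limit (the field contents below are hypothetical, not drawn from the training data):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

# One hypothetical training example in system/user/assistant form
conversation = [
    {"role": "system", "content": "You are a financial compliance assistant."},
    {"role": "user", "content": "Classify the sentiment: 'Revenue grew 15% year over year.'"},
    {"role": "assistant", "content": "positive"},
]
text = tokenizer.apply_chat_template(conversation, tokenize=False)
encoded = tokenizer(text, truncation=True, max_length=2048)
```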

#### Training Hyperparameters

- **Training regime**: LoRA fine-tuning with 4-bit quantization (see the configuration sketch below)
- **Base Model**: meta-llama/Llama-3.2-1B-Instruct
- **LoRA Parameters**: r=8, alpha=16, dropout=0.1
- **Batch Size**: 1, with gradient accumulation over 4 steps (effective batch size 4)
- **Learning Rate**: 1e-4 with linear warmup
- **Epochs**: 1 (845 training steps)
- **Optimizer**: AdamW
- **Scheduler**: Linear with warmup
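
A minimal configuration sketch matching these hyperparameters (the target modules, warmup ratio, and NF4 settings are assumptions, not the team's exact training script):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit quantized base model (NF4 settings assumed)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

# LoRA adapter with r=8, alpha=16, dropout=0.1 as listed above
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],  # assumed; a common choice for Llama models
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

# Batch size 1 with 4 accumulation steps gives an effective batch size of 4
training_args = TrainingArguments(
    output_dir="./fingpt-compliance",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=1e-4,
    num_train_epochs=1,
    lr_scheduler_type="linear",
    warmup_ratio=0.03,  # assumed; the card specifies only "linear warmup"
    optim="adamw_torch",
)
```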

#### Speeds, Sizes, Times

- **Training Time**: ~2 hours on a single GPU
- **Model Size**: ~1.1GB (base model + LoRA weights)
- **Inference Speed**: ~50 tokens/second on GPU
- **Memory Usage**: ~4GB VRAM for inference

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- **FinanceBench**: 31 financial Q&A examples
- **XBRL Analysis**: 574 XBRL processing examples
- **Financial Sentiment**: 826 sentiment classification examples
- **Audio Processing**: 5 financial audio samples

#### Metrics

- **Accuracy**: Overall correctness across all tasks
- **F1-Score**: Harmonic mean of precision and recall
- **Precision**: True positives / (True positives + False positives)
- **Recall**: True positives / (True positives + False negatives)
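
These metrics can be reproduced with scikit-learn; a minimal sketch (`y_true` and `y_pred` are hypothetical label lists, and weighted averaging across classes is an assumption):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical gold labels and model predictions for the sentiment task
y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "neutral", "neutral", "positive"]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```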

### Results

#### Financial Q&A Performance

- **Accuracy**: 67.7% (21/31 correct)
- **Sample Size**: 31 questions

#### Sentiment Analysis Performance

- **Accuracy**: 43.5% (359/826 correct)
- **F1-Score**: 46.7%
- **Precision**: 54.6%
- **Recall**: 43.5%
- **Sample Size**: 826 examples

#### XBRL Processing Performance

- **Tag Extraction**: 89.6% accuracy
- **Value Extraction**: 63.6% accuracy
- **Formula Construction**: 99.4% accuracy
- **Formula Calculation**: 82.2% accuracy
- **Overall XBRL**: 88.3% accuracy
- **Sample Size**: 574 examples

#### Overall Performance

- **Accuracy**: 55.6%
- **F1-Score**: 46.7%
- **Precision**: 54.6%
- **Recall**: 43.5%

#### Summary

The model performs strongly on XBRL processing (88.3% accuracy) and moderately on financial Q&A (67.7% accuracy). Sentiment analysis performance is lower (43.5%) and should improve with additional training data.

## Model Examination

### Key Strengths

1. **XBRL Processing**: Excellent performance on structured financial data
2. **Formula Construction**: Near-perfect accuracy (99.4%)
3. **Financial Q&A**: Solid performance on factual questions
4. **Efficiency**: Fast inference with a 1B-parameter model

### Areas for Improvement

1. **Sentiment Analysis**: Needs more diverse training data
2. **Complex Reasoning**: Limited by model size for complex financial analysis
3. **Multilingual Support**: Primarily English-focused

## Environmental Impact

- **Hardware Type**: NVIDIA GPU (training), CPU/GPU (inference)
- **Hours used**: ~2 hours of training
- **Cloud Provider**: Local development
- **Compute Region**: N/A
- **Carbon Emitted**: Estimated <1 kg CO2

## Technical Specifications

### Model Architecture and Objective

- **Architecture**: Transformer-based causal language model
- **Parameters**: ~1B base parameters plus lightweight LoRA adapter weights (r=8)
- **Context Length**: 2048 tokens
- **Vocabulary Size**: 128,256 tokens
- **Objective**: Next-token prediction with instruction following
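
The adapter's share of the parameter count can be checked directly on the loaded `PeftModel` from the usage example above (a sketch; with r=8 the adapter is typically on the order of millions of parameters):

```python
# Count LoRA adapter parameters vs. the full model
adapter_params = sum(p.numel() for n, p in model.named_parameters() if "lora_" in n)
total_params = sum(p.numel() for p in model.parameters())
print(f"LoRA adapter parameters: {adapter_params:,} of {total_params:,} total")
```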

### Compute Infrastructure

#### Hardware

- **Training**: Single GPU (NVIDIA RTX 4090 or similar)
- **Inference**: CPU or GPU

#### Software

- **Framework**: PyTorch 2.0+
- **LoRA**: PEFT 0.17.1
- **Transformers**: 4.44.0+
- **Quantization**: bitsandbytes 0.41.0+

## Citation

**BibTeX:**

```bibtex
@misc{fingpt-compliance-agents2025,
  title={FinGPT Compliance Agents: A Specialized Language Model for Financial Compliance},
  author={SecureFinAI Contest 2025 Team},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/QXPS/fingpt-compliance-agents}}
}
```

**APA:**

SecureFinAI Contest 2025 Team. (2025). FinGPT Compliance Agents: A Specialized Language Model for Financial Compliance. Hugging Face. https://huggingface.co/QXPS/fingpt-compliance-agents

## Glossary

- **XBRL**: eXtensible Business Reporting Language, an XML-based standard for financial reporting
- **LoRA**: Low-Rank Adaptation, a parameter-efficient fine-tuning method
- **SEC Filings**: Securities and Exchange Commission regulatory filings
- **FinanceBench**: A financial question-answering benchmark dataset
- **FPB**: Financial Phrase Bank, a sentiment analysis dataset

## Model Card Authors

- **Primary Authors**: SecureFinAI Contest 2025 - Task 2 Team
- **Contributors**: FinGPT development community
- **Reviewers**: Financial compliance domain experts

## Model Card Contact

For questions about this model:

- **GitHub Issues**: [Repository Issues](https://github.com/your-repo/fingpt-compliance-agents/issues)
- **Hugging Face**: [Model Discussion](https://huggingface.co/QXPS/fingpt-compliance-agents/discussions)

### Framework versions

- PEFT 0.17.1
- Transformers 4.44.0
- PyTorch 2.0.0
- bitsandbytes 0.41.0