---
base_model: meta-llama/Llama-3.2-1B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:meta-llama/Llama-3.2-1B-Instruct
- lora
- transformers
- financial
- compliance
- xbrl
- sentiment-analysis
- sec-filings
---

# FinGPT Compliance Agents

A specialized language model for financial compliance and regulatory tasks, fine-tuned for SEC filings analysis, regulatory compliance, sentiment analysis, and XBRL data processing.

## Model Details

### Model Description

FinGPT Compliance Agents is a LoRA fine-tuned version of Llama-3.2-1B-Instruct, designed specifically for financial compliance and regulatory tasks. The model excels at:

- **SEC Filings Analysis**: Extract insights from SEC filings and XBRL data
- **Financial Q&A**: Answer questions about company filings and financial statements
- **Sentiment Analysis**: Classify the sentiment of financial text with high accuracy
- **XBRL Processing**: Extract tags and values and construct formulas from XBRL data
- **Regulatory Compliance**: Handle real-time financial data retrieval and analysis

- **Developed by:** SecureFinAI Contest 2025 - Task 2 Team
- **Model type:** Causal language model with LoRA adaptation
- **Language(s) (NLP):** English (primary), Russian (audio processing)
- **License:** Apache 2.0
- **Finetuned from model:** meta-llama/Llama-3.2-1B-Instruct

### Model Sources

- **Repository:** [GitHub Repository](https://github.com/your-repo/fingpt-compliance-agents)
- **Base Model:** [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)
- **Training Data:** FinanceBench, XBRL Analysis, and Financial Sentiment datasets

## Uses

### Direct Use

This model is designed for direct use in financial compliance applications:

- **Financial Q&A Systems**: Answer questions about company filings and financial data
- **Sentiment Analysis**: Classify financial news, earnings calls, and market sentiment
- **XBRL Data Processing**: Extract and analyze structured financial data
- **Regulatory Compliance**: Process SEC filings and regulatory documents
- **Audio Processing**: Transcribe and analyze financial audio content

### Downstream Use

The model can be further fine-tuned for specific financial domains:

- **Banking Compliance**: Anti-money laundering, fraud detection
- **Insurance**: Risk assessment, claims processing
- **Investment Analysis**: Portfolio management, risk evaluation
- **Regulatory Reporting**: Automated compliance reporting

### Out-of-Scope Use

This model should not be used for:

- Financial advice or investment recommendations
- Legal advice or regulatory interpretation
- High-stakes financial decisions without human oversight
- Non-financial compliance tasks

## Bias, Risks, and Limitations

### Known Limitations

- **Model Size**: At 1B parameters, the model may not capture complex financial relationships
- **Training Data**: Primarily English financial data, with limited multilingual support
- **Temporal Scope**: Training data may not include recent financial events
- **Domain Specificity**: Optimized for compliance tasks, not general financial advice

### Recommendations

Users should:

- Validate model outputs with domain experts
- Use appropriate guardrails for financial applications
- Regularly retrain with updated financial data
- Implement human oversight for critical decisions

## How to Get Started with the Model

### Basic Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base model and apply the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "QXPS/fingpt-compliance-agents")
tokenizer = AutoTokenizer.from_pretrained("QXPS/fingpt-compliance-agents")

# Generate a response for a given prompt
def generate_response(prompt, max_new_tokens=512):
    # Move inputs to the model's device (the base model may be on GPU via device_map)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage
prompt = "Analyze the sentiment of this financial news: 'Company X reported strong quarterly earnings with 15% revenue growth.'"
response = generate_response(prompt)
print(response)
```

### Financial Q&A

```python
# Financial Q&A example
qa_prompt = """
Question: What was the company's revenue growth in Q3 2023?
Context: The company reported Q3 2023 revenue of $2.5B, up 15% from Q3 2022 revenue of $2.17B.
Answer:
"""
response = generate_response(qa_prompt)
```

### Sentiment Analysis

```python
# Sentiment analysis example
sentiment_prompt = """
Classify the sentiment of this financial text as positive, negative, or neutral:
"The company's stock price plummeted 20% after missing earnings expectations."
Sentiment:
"""
response = generate_response(sentiment_prompt)
```
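### XBRL Processing

XBRL tasks can be prompted the same way, reusing the `generate_response` helper from above. The prompt below is an illustrative sketch; the exact instruction format used during training is not documented here, so treat the wording as an assumption:

```python
# XBRL tag extraction example (illustrative prompt; the exact
# instruction format used during fine-tuning may differ)
xbrl_prompt = """
Extract the US GAAP tag that reports total revenue in this XBRL fragment:
<us-gaap:Revenues contextRef="FY2023" unitRef="usd" decimals="-6">2500000000</us-gaap:Revenues>
Tag:
"""
response = generate_response(xbrl_prompt)
print(response)
```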
## Training Details

### Training Data

The model was trained on a diverse collection of financial datasets:

- **FinanceBench**: 150 financial Q&A examples from SEC filings
- **XBRL Analysis**: 574 examples of XBRL tag extraction, value extraction, and formula construction
- **Financial Sentiment**: 826 examples from the FPB (Financial Phrase Bank) dataset
- **Total Training Examples**: 7,153 (5,722 train, 1,431 test)

### Training Procedure

#### Preprocessing

- **Text Processing**: Standardized to a conversation format with system/user/assistant roles
- **Tokenization**: Llama-3.2 tokenizer with a 2048-token maximum length
- **Data Splitting**: 80/20 train/test split with stratified sampling

#### Training Hyperparameters

- **Training regime**: LoRA fine-tuning with 4-bit quantization
- **Base Model**: meta-llama/Llama-3.2-1B-Instruct
- **LoRA Parameters**: r=8, alpha=16, dropout=0.1
- **Batch Size**: 1, with gradient accumulation over 4 steps
- **Learning Rate**: 1e-4 with linear warmup
- **Epochs**: 1 (845 training steps)
- **Optimizer**: AdamW
- **Scheduler**: Linear with warmup
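For reference, a minimal PEFT/bitsandbytes configuration consistent with the hyperparameters above might look like the following. This is a sketch rather than the exact training script; the quantization type, LoRA target modules, and warmup ratio are assumptions not stated in this card:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model

# 4-bit quantization for the frozen base model
# (NF4 is an assumption; the card only states "4-bit quantization")
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA settings from the card: r=8, alpha=16, dropout=0.1
# (target_modules is an assumption; attention projections are a common choice)
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)

# Batch size 1 with 4-step gradient accumulation, lr=1e-4,
# linear scheduler with warmup, 1 epoch, AdamW
training_args = TrainingArguments(
    output_dir="fingpt-compliance-agents",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=1e-4,
    num_train_epochs=1,
    lr_scheduler_type="linear",
    warmup_ratio=0.03,  # assumption; the card only says "linear warmup"
    optim="adamw_torch",
)
```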
#### Speeds, Sizes, Times

- **Training Time**: ~2 hours on a single GPU
- **Model Size**: ~1.1GB (base model + LoRA weights)
- **Inference Speed**: ~50 tokens/second on GPU
- **Memory Usage**: ~4GB VRAM for inference

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- **FinanceBench**: 31 financial Q&A examples
- **XBRL Analysis**: 574 XBRL processing examples
- **Financial Sentiment**: 826 sentiment classification examples
- **Audio Processing**: 5 financial audio samples

#### Metrics

- **Accuracy**: Overall correctness across all tasks
- **F1-Score**: Harmonic mean of precision and recall
- **Precision**: True positives / (true positives + false positives)
- **Recall**: True positives / (true positives + false negatives)

### Results

#### Financial Q&A Performance

- **Accuracy**: 67.7% (21/31 correct)
- **Sample Size**: 31 questions

#### Sentiment Analysis Performance

- **Accuracy**: 43.5% (359/826 correct)
- **F1-Score**: 46.7%
- **Precision**: 54.6%
- **Recall**: 43.5%
- **Sample Size**: 826 examples

#### XBRL Processing Performance

- **Tag Extraction**: 89.6% accuracy
- **Value Extraction**: 63.6% accuracy
- **Formula Construction**: 99.4% accuracy
- **Formula Calculation**: 82.2% accuracy
- **Overall XBRL**: 88.3% accuracy
- **Sample Size**: 574 examples

#### Overall Performance

- **Accuracy**: 55.6%
- **F1-Score**: 46.7%
- **Precision**: 54.6%
- **Recall**: 43.5%

#### Summary

The model performs strongly on XBRL processing (88.3% accuracy) and moderately on financial Q&A (67.7% accuracy). Sentiment analysis performance is lower (43.5%) but leaves room for improvement with additional training data.

## Model Examination

### Key Strengths

1. **XBRL Processing**: Excellent performance on structured financial data
2. **Formula Construction**: Near-perfect accuracy (99.4%)
3. **Financial Q&A**: Solid performance on factual questions
4. **Efficiency**: Fast inference with a 1B-parameter model

### Areas for Improvement

1. **Sentiment Analysis**: Needs more diverse training data
2. **Complex Reasoning**: Limited by model size for complex financial analysis
3. **Multilingual Support**: Primarily English-focused

## Environmental Impact

- **Hardware Type**: NVIDIA GPU (training), CPU/GPU (inference)
- **Hours used**: ~2 hours of training
- **Cloud Provider**: Local development
- **Compute Region**: N/A
- **Carbon Emitted**: Estimated <1 kg CO2

## Technical Specifications

### Model Architecture and Objective

- **Architecture**: Transformer-based causal language model
- **Parameters**: 1.1B (1B base + 0.1B LoRA)
- **Context Length**: 2048 tokens
- **Vocabulary Size**: 128,256 tokens
- **Objective**: Next-token prediction with instruction following

### Compute Infrastructure

#### Hardware

- **Training**: Single GPU (NVIDIA RTX 4090 or similar)
- **Inference**: CPU or GPU

#### Software

- **Framework**: PyTorch 2.0+
- **LoRA**: PEFT 0.17.1
- **Transformers**: 4.44.0+
- **Quantization**: bitsandbytes 0.41.0+

## Citation

**BibTeX:**

```bibtex
@misc{fingpt-compliance-agents2025,
  title={FinGPT Compliance Agents: A Specialized Language Model for Financial Compliance},
  author={SecureFinAI Contest 2025 Team},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/QXPS/fingpt-compliance-agents}}
}
```

**APA:**

SecureFinAI Contest 2025 Team. (2025). *FinGPT Compliance Agents: A Specialized Language Model for Financial Compliance*. Hugging Face. https://huggingface.co/QXPS/fingpt-compliance-agents

## Glossary

- **XBRL**: eXtensible Business Reporting Language, an XML-based standard for financial reporting
- **LoRA**: Low-Rank Adaptation, a parameter-efficient fine-tuning method
- **SEC Filings**: Securities and Exchange Commission regulatory filings
- **FinanceBench**: A financial question-answering benchmark dataset
- **FPB**: Financial Phrase Bank, a sentiment analysis dataset

## Model Card Authors

- **Primary Authors**: SecureFinAI Contest 2025 - Task 2 Team
- **Contributors**: FinGPT development community
- **Reviewers**: Financial compliance domain experts

## Model Card Contact

For questions about this model:

- **GitHub Issues**: [Repository Issues](https://github.com/your-repo/fingpt-compliance-agents/issues)
- **Hugging Face**: [Model Discussion](https://huggingface.co/QXPS/fingpt-compliance-agents/discussions)

### Framework versions

- PEFT 0.17.1
- Transformers 4.44.0
- PyTorch 2.0.0
- bitsandbytes 0.41.0