# FinGPT Compliance Agent with RAG for XBRL Specifications

This project demonstrates a specialized compliance agent built using a Retrieval-Augmented Generation (RAG) framework. The core Large Language Model (LLM) is **TheFinAI/Fin-o1-8B**, which is augmented with a custom-built knowledge base of XBRL (eXtensible Business Reporting Language) specifications.

The agent can handle two types of queries:
1. **General Financial Questions**: Answered directly by the Fin-o1-8B model, for example:
   ```python
   from transformers import AutoModelForCausalLM, AutoTokenizer

   # Load the Fin-o1-8B model and tokenizer from the Hugging Face Hub
   model_name = "TheFinAI/Fin-o1-8B"
   tokenizer = AutoTokenizer.from_pretrained(model_name)
   model = AutoModelForCausalLM.from_pretrained(model_name)

   # Ask a general question directly, without any retrieval step
   input_text = "What is the result of 3 - 5?"
   inputs = tokenizer(input_text, return_tensors="pt")
   output = model.generate(**inputs, max_new_tokens=200)
   print(tokenizer.decode(output[0], skip_special_tokens=True))
   ```
2. **XBRL-Specific Compliance Questions**: Answered using the RAG pipeline, which retrieves relevant context from a local knowledge base (`xbrl_results_2_spec_filtered_reindexed.json`) before generating a response. This ensures that answers related to XBRL are accurate, detailed, and grounded in official documentation.
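
The retrieval logic lives in `inference.py` and is not reproduced here. The sketch below only illustrates the general shape of the RAG step under stated assumptions: it assumes each knowledge-base entry exposes a `"text"` field and substitutes a simple TF-IDF retriever (scikit-learn) for whatever retriever `inference.py` actually uses; the prompt format is likewise illustrative.

```python
import json

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the crawled XBRL specification passages. The schema is assumed:
# a list of records, each with a "text" field holding one passage.
with open("xbrl_results_2_spec_filtered_reindexed.json", encoding="utf-8") as f:
    knowledge_base = json.load(f)
passages = [entry["text"] for entry in knowledge_base]

# Build a simple TF-IDF index over the passages (a stand-in retriever).
vectorizer = TfidfVectorizer(stop_words="english")
passage_matrix = vectorizer.fit_transform(passages)

def retrieve(question: str, top_k: int = 3) -> list[str]:
    """Return the top_k knowledge-base passages most similar to the question."""
    query_vec = vectorizer.transform([question])
    scores = cosine_similarity(query_vec, passage_matrix)[0]
    return [passages[i] for i in scores.argsort()[::-1][:top_k]]

# Load Fin-o1-8B as in the snippet above.
model_name = "TheFinAI/Fin-o1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Ground the answer in the retrieved XBRL context.
question = "Which linkbase defines summation relationships between numeric facts?"
context = "\n\n".join(retrieve(question))
prompt = (
    "Answer the XBRL compliance question using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=300)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

A dense-embedding retriever (e.g., a sentence-embedding model with cosine similarity) can be swapped in for the TF-IDF index without changing the rest of the flow.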
## Project Structure
To use this framework correctly, organize your project files as follows: every file except `inference.py` and the JSON knowledge base should be placed inside a directory named after the model, `Fin-o1-8B/`.
- **`Fin-o1-8B/`**: This directory should contain the downloaded model artifacts for `TheFinAI/Fin-o1-8B`. If you save the model into this directory, the scripts can load it from the local path instead of re-downloading it from the Hub (see the sketch after this list).
- **`inference.py`**: The main script for running the RAG-powered XBRL compliance agent.
- **`xbrl_results_2_spec_filtered_reindexed.json`**: The pre-built knowledge base containing crawled data from XBRL specification websites.
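
The project does not prescribe a download procedure; one way to populate `Fin-o1-8B/` is with `huggingface_hub` (an assumption, not the project's documented method):

```python
from huggingface_hub import snapshot_download

# Download the TheFinAI/Fin-o1-8B repository into the local Fin-o1-8B/ directory
# so the agent can load the model from a local path.
snapshot_download(repo_id="TheFinAI/Fin-o1-8B", local_dir="Fin-o1-8B")

# The model can then be loaded from the local directory instead of the Hub:
# model = AutoModelForCausalLM.from_pretrained("Fin-o1-8B")
# tokenizer = AutoTokenizer.from_pretrained("Fin-o1-8B")
```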