# FinGPT Compliance Agent with RAG for XBRL Specifications
This project demonstrates a specialized compliance agent built using a Retrieval-Augmented Generation (RAG) framework. The core Large Language Model (LLM) is **TheFinAI/Fin-o1-8B**, which is augmented with a custom-built knowledge base of XBRL (eXtensible Business Reporting Language) specifications.
The agent can handle two types of queries:
1. **General Financial Questions**: Answered directly by the Fin-o1-8B model, for example:

   ```python
   from transformers import AutoModelForCausalLM, AutoTokenizer

   # Load the tokenizer and model weights for TheFinAI/Fin-o1-8B.
   model_name = "TheFinAI/Fin-o1-8B"
   tokenizer = AutoTokenizer.from_pretrained(model_name)
   model = AutoModelForCausalLM.from_pretrained(model_name)

   # Tokenize a sample question and generate an answer.
   input_text = "What is the result of 3-5?"
   inputs = tokenizer(input_text, return_tensors="pt")
   output = model.generate(**inputs, max_new_tokens=200)
   print(tokenizer.decode(output[0], skip_special_tokens=True))
   ```
2. **XBRL-Specific Compliance Questions**: Answered using the RAG pipeline, which retrieves relevant context from a local knowledge base (`xbrl_results_2_spec_filtered_reindexed.json`) before generating a response. This ensures that answers related to XBRL are accurate, detailed, and grounded in official documentation.
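For illustration, the following is a minimal sketch of such a retrieval step. It assumes the knowledge base is a JSON list of objects with a `text` field and uses simple keyword-overlap scoring; the actual schema of `xbrl_results_2_spec_filtered_reindexed.json` and the retrieval logic in `inference.py` may differ (for example, embedding-based search).

```python
import json

# Assumption: the knowledge base is a list of entries, each with a "text" field.
with open("xbrl_results_2_spec_filtered_reindexed.json", "r", encoding="utf-8") as f:
    knowledge_base = json.load(f)

def retrieve(query, kb, top_k=3):
    """Naive keyword-overlap retrieval; a real pipeline may use embeddings instead."""
    query_terms = set(query.lower().split())
    scored = []
    for entry in kb:
        text = entry.get("text", "")
        score = len(query_terms & set(text.lower().split()))
        scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for score, text in scored[:top_k] if score > 0]

# Hypothetical XBRL compliance question for demonstration.
query = "How should decimals be declared on a monetary fact in XBRL?"
context = "\n\n".join(retrieve(query, knowledge_base))

# The retrieved context is prepended to the question and passed to Fin-o1-8B
# for generation, in the same way as the snippet above.
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```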
## Project Structure
To use this framework correctly, please organize your project files as follows. All project files, except for `inference.py` and the JSON knowledge base, should be placed inside a directory named after the model, `Fin-o1-8B`.
- **`Fin-o1-8B/`**: This directory should contain the downloaded model artifacts for `TheFinAI/Fin-o1-8B`. The `transformers` library will place the model here if you point it at this path as the cache or save directory (for example via `cache_dir` or `save_pretrained`).
- **`inference.py`**: The main script for running the RAG-powered XBRL compliance agent.
- **`xbrl_results_2_spec_filtered_reindexed.json`**: The pre-built knowledge base containing crawled data from XBRL specification websites.
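If the model weights are not already present locally, one way to populate the `Fin-o1-8B/` directory is with `huggingface_hub` (a sketch; adjust the paths to your setup):

```python
from huggingface_hub import snapshot_download

# Download the TheFinAI/Fin-o1-8B weights into a local "Fin-o1-8B" directory
# so the agent can load them from disk.
snapshot_download(repo_id="TheFinAI/Fin-o1-8B", local_dir="Fin-o1-8B")
```

With the model directory, the JSON knowledge base, and `inference.py` in place, the agent can then be started with `python inference.py` (assuming the script takes no required arguments; check the script itself for any options).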