# FinGPT Compliance Agent with RAG for XBRL Specifications
This project demonstrates a specialized compliance agent built using a Retrieval-Augmented Generation (RAG) framework. The core Large Language Model (LLM) is `TheFinAI/Fin-o1-8B`, which is augmented with a custom-built knowledge base of XBRL (eXtensible Business Reporting Language) specifications.
The agent can handle two types of queries:
- General Financial Questions: Answered directly by the Fin-o1-8B model, for example:

  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_name = "TheFinAI/Fin-o1-8B"
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForCausalLM.from_pretrained(model_name)

  input_text = "What is the result of 3-5?"
  inputs = tokenizer(input_text, return_tensors="pt")
  output = model.generate(**inputs, max_new_tokens=200)
  print(tokenizer.decode(output[0], skip_special_tokens=True))
  ```
- XBRL-Specific Compliance Questions: Answered using the RAG pipeline, which retrieves relevant context from a local knowledge base (`xbrl_results_2_spec_filtered_reindexed.json`) before generating a response. This ensures that answers related to XBRL are accurate, detailed, and grounded in official documentation.
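The retrieval step of the pipeline above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: it assumes the JSON knowledge base is a list of records with an `id` and a `text` field (the real schema of `xbrl_results_2_spec_filtered_reindexed.json` may differ), and it uses naive keyword overlap in place of whatever retriever `inference.py` actually employs.

```python
import re

# Hypothetical miniature knowledge base mirroring the *assumed* structure of
# the JSON file: a list of records, each with an "id" and a "text" field.
KB = [
    {"id": 0, "text": "XBRL facts are reported against a taxonomy of concepts."},
    {"id": 1, "text": "A context element associates a fact with an entity and a period."},
    {"id": 2, "text": "Compound interest grows the principal over time."},
]

def retrieve(query, kb=KB, top_k=2):
    """Rank records by keyword overlap with the query (naive bag-of-words)."""
    q_terms = set(re.findall(r"\w+", query.lower()))
    scored = sorted(
        kb,
        key=lambda rec: len(q_terms & set(re.findall(r"\w+", rec["text"].lower()))),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query, kb=KB):
    """Prepend the retrieved context so the LLM's answer stays grounded."""
    context = "\n".join(rec["text"] for rec in retrieve(query, kb))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

The resulting prompt would then be tokenized and passed to `model.generate` exactly as in the general-question example above; only the added context differs between the two query paths.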
## Project Structure
To use this framework correctly, organize your project files as follows: the downloaded model artifacts go inside a directory named after the model, `Fin-o1-8B/`, while `inference.py` and the JSON knowledge base sit alongside it at the project root.
- `Fin-o1-8B/`: This directory should contain the downloaded model artifacts for `TheFinAI/Fin-o1-8B`. The `transformers` library will automatically cache the model here if you specify it as the save directory.
- `inference.py`: The main script for running the RAG-powered XBRL compliance agent.
- `xbrl_results_2_spec_filtered_reindexed.json`: The pre-built knowledge base containing crawled data from XBRL specification websites.