# FinGPT Compliance Agent with RAG for XBRL Specifications

This project demonstrates a specialized compliance agent built using a Retrieval-Augmented Generation (RAG) framework. The core Large Language Model (LLM) is **TheFinAI/Fin-o1-8B**, which is augmented with a custom-built knowledge base of XBRL (eXtensible Business Reporting Language) specifications.
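The retrieve-then-generate flow described above can be sketched in a few lines. This is a minimal illustration, not the project's actual retriever. It assumes each knowledge-base entry is a dict with a hypothetical `"text"` field (the real schema of `xbrl_results_2_spec_filtered_reindexed.json` may differ), and it ranks entries by simple word overlap instead of embeddings. The three entries are illustrative toy data.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    # Lowercase word set; a crude stand-in for a real embedding model.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, entries: List[Dict[str, str]], k: int = 2) -> List[str]:
    # Score each entry by how many question words it shares (hypothetical "text" field).
    q = tokenize(question)
    scored = [(len(q & tokenize(e["text"])), e["text"]) for e in entries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep the top-k entries, dropping any with no overlap at all.
    return [text for score, text in scored[:k] if score > 0]


def build_prompt(question: str, passages: List[str]) -> str:
    # Prepend the retrieved passages so the LLM answers from the specification text.
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the XBRL specification excerpts below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Illustrative toy knowledge base, not the project's JSON file.
kb = [
    {"text": "A context element identifies the entity and period of a fact."},
    {"text": "Units such as USD are declared in unit elements."},
    {"text": "Dimensional qualifiers are defined by the XBRL Dimensions 1.0 specification."},
]
question = "What does a context element identify?"
print(build_prompt(question, retrieve(question, kb)))
```

The resulting prompt string is what would be passed to Fin-o1-8B in place of the bare question; swapping `tokenize` for an embedding model changes only the scoring step.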

The agent can handle two types of queries:
1.  **General Financial Questions**: Answered directly by the Fin-o1-8B model, using code like the following:

    ```python
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "TheFinAI/Fin-o1-8B"

    # Download (or load from the local cache) the tokenizer and model weights.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Tokenize a plain-text question into PyTorch tensors.
    input_text = "What is the result of 3-5?"
    inputs = tokenizer(input_text, return_tensors="pt")

    # Generate up to 200 new tokens, then decode the full sequence (prompt + answer).
    output = model.generate(**inputs, max_new_tokens=200)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
    ```

2.  **XBRL-Specific Compliance Questions**: Answered using the RAG pipeline, which retrieves relevant context from a local knowledge base (`xbrl_results_2_spec_filtered_reindexed.json`) before generating a response. This grounds XBRL answers in the official specification text rather than relying on the model's built-in knowledge alone, which makes them more accurate and detailed.
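The choice between the two query types above can be pictured as a router in front of the model. The sketch below assumes a simple keyword rule with a hypothetical `XBRL_TERMS` list; the project may decide differently (for example with a classifier or by asking the LLM itself), so treat it as an illustration of the control flow only. The stub functions stand in for the real model call and the RAG prompt builder.

```python
from typing import Callable, Iterable

# Hypothetical trigger terms; the real agent's routing rule is not documented here.
XBRL_TERMS = ("xbrl", "ixbrl", "taxonomy", "linkbase", "instance document", "us-gaap")


def is_xbrl_question(question: str, terms: Iterable[str] = XBRL_TERMS) -> bool:
    # Case-insensitive substring match against the trigger terms.
    q = question.lower()
    return any(term in q for term in terms)


def answer(question: str,
           generate: Callable[[str], str],
           rag_prompt: Callable[[str], str]) -> str:
    # XBRL questions get a context-augmented prompt; everything else goes straight to the LLM.
    if is_xbrl_question(question):
        return generate(rag_prompt(question))
    return generate(question)


def echo_llm(prompt: str) -> str:
    # Stub for model.generate + decode, so the routing can run without the model.
    return f"LLM<{prompt}>"


def add_context(question: str) -> str:
    # Stub for the retrieval + prompt-building step.
    return f"CONTEXT + {question}"


print(answer("What is the result of 3-5?", echo_llm, add_context))
print(answer("How is a calculation linkbase validated in XBRL?", echo_llm, add_context))
```

Passing the generator and prompt builder as callables keeps the routing logic testable without loading the 8B model.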

## Project Structure

To use this framework correctly, organize your project files as follows. The model artifacts go in a directory named after the model, `Fin-o1-8B`; `inference.py` and the JSON knowledge base stay outside that directory, next to it.

- **`Fin-o1-8B/`**: This directory should contain the downloaded model artifacts for `TheFinAI/Fin-o1-8B`. By default, `transformers` caches downloads in the Hugging Face cache directory rather than here, so save the model into this directory explicitly (for example with `save_pretrained("Fin-o1-8B")`).
- **`inference.py`**: The main script for running the RAG-powered XBRL compliance agent.
- **`xbrl_results_2_spec_filtered_reindexed.json`**: The pre-built knowledge base containing crawled data from XBRL specification websites.
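One way to populate `Fin-o1-8B/` and confirm the layout is sketched below. It assumes the project root is the directory holding all three items and that `transformers` is installed; `save_model_locally` and `missing_files` are hypothetical helpers, not part of the project.

```python
from pathlib import Path
from typing import List

MODEL_ID = "TheFinAI/Fin-o1-8B"
MODEL_DIR = "Fin-o1-8B"
KB_FILE = "xbrl_results_2_spec_filtered_reindexed.json"


def save_model_locally(root: Path = Path(".")) -> None:
    # Download from the Hub (into the default HF cache), then write a flat copy to Fin-o1-8B/.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    target = root / MODEL_DIR
    AutoTokenizer.from_pretrained(MODEL_ID).save_pretrained(target)
    AutoModelForCausalLM.from_pretrained(MODEL_ID).save_pretrained(target)


def missing_files(root: Path = Path(".")) -> List[str]:
    # Report which expected items are absent; config.json marks a saved model directory.
    missing = []
    if not (root / MODEL_DIR / "config.json").is_file():
        missing.append(f"{MODEL_DIR}/config.json")
    for name in ("inference.py", KB_FILE):
        if not (root / name).is_file():
            missing.append(name)
    return missing
```

After saving, `AutoModelForCausalLM.from_pretrained("Fin-o1-8B")` loads from the local directory instead of the Hub. Loading the full model only to re-save it is memory-heavy; `snapshot_download(repo_id="TheFinAI/Fin-o1-8B", local_dir="Fin-o1-8B")` from `huggingface_hub` copies the files without loading the weights.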