# FinGPT Compliance Agent with RAG for XBRL Specifications
This project demonstrates a specialized compliance agent built using a Retrieval-Augmented Generation (RAG) framework. The core Large Language Model (LLM) is **TheFinAI/Fin-o1-8B**, which is augmented with a custom-built knowledge base of XBRL (eXtensible Business Reporting Language) specifications.
The agent can handle two types of queries:
1. **General Financial Questions**: Answered directly by the Fin-o1-8B model, for example:

   ```python
   from transformers import AutoModelForCausalLM, AutoTokenizer

   # Load the tokenizer and model weights for TheFinAI/Fin-o1-8B.
   model_name = "TheFinAI/Fin-o1-8B"
   tokenizer = AutoTokenizer.from_pretrained(model_name)
   model = AutoModelForCausalLM.from_pretrained(model_name)

   # Tokenize a sample question and generate an answer.
   input_text = "What is the result of 3-5?"
   inputs = tokenizer(input_text, return_tensors="pt")
   output = model.generate(**inputs, max_new_tokens=200)
   print(tokenizer.decode(output[0], skip_special_tokens=True))
   ```
2. **XBRL-Specific Compliance Questions**: Answered using the RAG pipeline, which retrieves relevant context from a local knowledge base (`xbrl_results_2_spec_filtered_reindexed.json`) before generating a response. This ensures that answers related to XBRL are accurate, detailed, and grounded in official documentation.
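For illustration, the following is a minimal sketch of such a retrieval step. It assumes the knowledge base is a JSON list of objects with a `text` field and uses simple keyword-overlap scoring; the actual schema of `xbrl_results_2_spec_filtered_reindexed.json` and the retrieval logic in `inference.py` may differ (for example, embedding-based search).

```python
import json

# Assumption: the knowledge base is a list of entries, each with a "text" field.
with open("xbrl_results_2_spec_filtered_reindexed.json", "r", encoding="utf-8") as f:
    knowledge_base = json.load(f)

def retrieve(query, kb, top_k=3):
    """Naive keyword-overlap retrieval; a real pipeline may use embeddings instead."""
    query_terms = set(query.lower().split())
    scored = []
    for entry in kb:
        text = entry.get("text", "")
        score = len(query_terms & set(text.lower().split()))
        scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for score, text in scored[:top_k] if score > 0]

# Hypothetical XBRL compliance question for demonstration.
query = "How should decimals be declared on a monetary fact in XBRL?"
context = "\n\n".join(retrieve(query, knowledge_base))

# The retrieved context is prepended to the question and passed to Fin-o1-8B
# for generation, in the same way as the snippet above.
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```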
## Project Structure
To use this framework correctly, please organize your project files as follows. All project files, except for `inference.py` and the JSON knowledge base, should be placed inside a directory named after the model, `Fin-o1-8B`.
- **`Fin-o1-8B/`**: This directory should contain the downloaded model artifacts for `TheFinAI/Fin-o1-8B`. The `transformers` library will place the model here if you point it at this path as the cache or save directory (for example via `cache_dir` or `save_pretrained`).
- **`inference.py`**: The main script for running the RAG-powered XBRL compliance agent.
- **`xbrl_results_2_spec_filtered_reindexed.json`**: The pre-built knowledge base containing crawled data from XBRL specification websites.
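If the model weights are not already present locally, one way to populate the `Fin-o1-8B/` directory is with `huggingface_hub` (a sketch; adjust the paths to your setup):

```python
from huggingface_hub import snapshot_download

# Download the TheFinAI/Fin-o1-8B weights into a local "Fin-o1-8B" directory
# so the agent can load them from disk.
snapshot_download(repo_id="TheFinAI/Fin-o1-8B", local_dir="Fin-o1-8B")
```

With the model directory, the JSON knowledge base, and `inference.py` in place, the agent can then be started with `python inference.py` (assuming the script takes no required arguments; check the script itself for any options).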