Spaces:

yashgori20
/

FinLLM-RAG

Sleeping

App Files Files Community

FinLLM-RAG / README.md

yashgori20

Update README.md

42c2248 verified 10 months ago

preview code

raw

history blame contribute delete

4.51 kB

A newer version of the Streamlit SDK is available: 1.58.0

Upgrade

metadata

title: FinLLM RAG
emoji: ⚡
colorFrom: purple
colorTo: gray
sdk: streamlit
sdk_version: 1.40.1
app_file: app.py
pinned: false

💸 Finance Assistant

This project is a multi-functional financial assistant built with Streamlit. It leverages large language models and retrieval-augmented generation (RAG) to provide a suite of tools for financial analysis, compliance, and data retrieval.

Features

The application is divided into several key functionalities:

Circular Compliance Assistant: Analyzes user-provided scenarios for compliance against RBI Master Circulars on Management of Advances. It uses a FAISS vector database to retrieve relevant sections of the circular and a language model to generate a detailed compliance report.
Industry Classification Assistant: Suggests appropriate industry classification codes based on user-provided keywords. This feature also utilizes a RAG pipeline to search through an industry classification master document.
Calculation Methodology: Provides interactive calculators for key financial metrics:
- Maximum Permissible Bank Finance (MPBF)
- Drawing Power (DP)
Financial Data Assistant: Answers questions about historical (1980-2015) state-wise financial data for India. It can retrieve specific metrics for a given state and year.
Model 1 Chat: A general-purpose chat interface powered by the gemma2-9b-it model via the Groq API.

How It Works

The core of the "Circular Compliance" and "Industry Classification" assistants is a Retrieval-Augmented Generation (RAG) pipeline.

Indexing: Source documents (Master Circular.pdf, Industry Classification Master.pdf) are chunked, and the text chunks are converted into vector embeddings using a SentenceTransformer model. These embeddings are stored in a FAISS index for efficient similarity search.
Retrieval: When a user enters a query, the query is embedded, and the FAISS index is searched to find the most relevant document chunks.
Generation: The retrieved chunks are passed as context, along with the user's query, to a large language model (gemma2-9b-it). The model then generates a comprehensive and context-aware response.

The "Financial Data Assistant" works by directly parsing the user's query for state, year, and metric information and looking up the corresponding data from a pre-loaded data file.

Setup and Installation

Clone the Repository:

git clone <your-repository-url>
cd <your-repository-directory>

Install Dependencies: Install the necessary Python libraries using the requirements.txt file.
```
pip install -r requirements.txt
```
Set Up Assets: The application requires pre-built FAISS indexes and data files.
- Create a folder named assets in the root directory.
- Generate and place the following files into the assets folder (You will need a separate script to process the source PDFs and JSON to create these files):
  - industry_index.faiss
  - industry_chunks.pkl
  - circular_index.faiss
  - circular_chunks.pkl
  - financial_index.faiss
  - financial_statements.pkl
API Key: Insert your Groq API key directly into the app.py file at the following line:
```
GROQ_API_KEY = "your-groq-api-key-here"
```

Usage

Run the Streamlit App: Execute the following command in your terminal:
```
streamlit run app.py
```
Interact with the Application:
- Open the URL provided by Streamlit (usually http://localhost:8501) in your web browser.
- Use the radio buttons at the top of the page to navigate between the different functionalities: "Calculation Methodology", "Circular Compliance", "Industry Classification", "Model 1", and "Model 2".
- Follow the on-screen instructions for each tool.

Dependencies

This project relies on the following major libraries:

streamlit: For creating the web application interface.
groq: The client for accessing the Groq API.
sentence-transformers: For generating text embeddings.
faiss-cpu: For efficient similarity search in the vector database.
pandas: For data manipulation, particularly in the financial data assistant.
numpy: For numerical operations.
torch & transformers: Core dependencies for the sentence transformer models.