FinLLM-RAG / README.md
yashgori20's picture
Update README.md
42c2248 verified

A newer version of the Streamlit SDK is available: 1.58.0

Upgrade
metadata
title: FinLLM RAG
emoji: 
colorFrom: purple
colorTo: gray
sdk: streamlit
sdk_version: 1.40.1
app_file: app.py
pinned: false

💸 Finance Assistant

This project is a multi-functional financial assistant built with Streamlit. It leverages large language models and retrieval-augmented generation (RAG) to provide a suite of tools for financial analysis, compliance, and data retrieval.

Features

The application is divided into several key functionalities:

  • Circular Compliance Assistant: Analyzes user-provided scenarios for compliance against RBI Master Circulars on Management of Advances. It uses a FAISS vector database to retrieve relevant sections of the circular and a language model to generate a detailed compliance report.
  • Industry Classification Assistant: Suggests appropriate industry classification codes based on user-provided keywords. This feature also utilizes a RAG pipeline to search through an industry classification master document.
  • Calculation Methodology: Provides interactive calculators for key financial metrics:
    • Maximum Permissible Bank Finance (MPBF)
    • Drawing Power (DP)
  • Financial Data Assistant: Answers questions about historical (1980-2015) state-wise financial data for India. It can retrieve specific metrics for a given state and year.
  • Model 1 Chat: A general-purpose chat interface powered by the gemma2-9b-it model via the Groq API.

How It Works

The core of the "Circular Compliance" and "Industry Classification" assistants is a Retrieval-Augmented Generation (RAG) pipeline.

  1. Indexing: Source documents (Master Circular.pdf, Industry Classification Master.pdf) are chunked, and the text chunks are converted into vector embeddings using a SentenceTransformer model. These embeddings are stored in a FAISS index for efficient similarity search.
  2. Retrieval: When a user enters a query, the query is embedded, and the FAISS index is searched to find the most relevant document chunks.
  3. Generation: The retrieved chunks are passed as context, along with the user's query, to a large language model (gemma2-9b-it). The model then generates a comprehensive and context-aware response.

The "Financial Data Assistant" works by directly parsing the user's query for state, year, and metric information and looking up the corresponding data from a pre-loaded data file.

Setup and Installation

  1. Clone the Repository:

    git clone <your-repository-url>
    cd <your-repository-directory>
    
  2. Install Dependencies: Install the necessary Python libraries using the requirements.txt file.

    pip install -r requirements.txt
    
  3. Set Up Assets: The application requires pre-built FAISS indexes and data files.

    • Create a folder named assets in the root directory.
    • Generate and place the following files into the assets folder (You will need a separate script to process the source PDFs and JSON to create these files):
      • industry_index.faiss
      • industry_chunks.pkl
      • circular_index.faiss
      • circular_chunks.pkl
      • financial_index.faiss
      • financial_statements.pkl
  4. API Key: Insert your Groq API key directly into the app.py file at the following line:

    GROQ_API_KEY = "your-groq-api-key-here"
    

Usage

  1. Run the Streamlit App: Execute the following command in your terminal:

    streamlit run app.py
    
  2. Interact with the Application:

    • Open the URL provided by Streamlit (usually http://localhost:8501) in your web browser.
    • Use the radio buttons at the top of the page to navigate between the different functionalities: "Calculation Methodology", "Circular Compliance", "Industry Classification", "Model 1", and "Model 2".
    • Follow the on-screen instructions for each tool.

Dependencies

This project relies on the following major libraries:

  • streamlit: For creating the web application interface.
  • groq: The client for accessing the Groq API.
  • sentence-transformers: For generating text embeddings.
  • faiss-cpu: For efficient similarity search in the vector database.
  • pandas: For data manipulation, particularly in the financial data assistant.
  • numpy: For numerical operations.
  • torch & transformers: Core dependencies for the sentence transformer models.