---
title: Mshauri Fedha
emoji: 🦁
colorFrom: green
colorTo: gray
sdk: docker
app_port: 7860
pinned: false
---
# Mshauri Fedha 🇰🇪
**An AI-Powered Financial Research Assistant for the Kenyan Economy.**
Mshauri Fedha ("Financial Advisor") is a Proof of Concept (PoC) AI agent designed to provide accurate, data-driven insights into the Kenyan financial landscape. Unlike general-purpose chatbots, which can hallucinate numbers, Mshauri Fedha uses a **Dual-Brain Architecture** to separate exact statistical retrieval from qualitative analysis.
## The Architecture
This project solves the "Broken Table Problem" and the "Hallucination Problem" by splitting the AI's cognition into two distinct systems managed by a Supervisor Agent:
### 1. The Left Brain (Structured / SQL)
* **Role:** The Mathematician.
* **Source:** A SQLite database (`mshauri_fedha_v6.db`) containing rigorous data from **KNBS (Kenya National Bureau of Statistics)** and **CBK (Central Bank of Kenya)**.
* **Capability:** Executes precise SQL queries to answer questions like *"What was the exact inflation rate in 2023?"* or *"Calculate the average tea export value for the last 5 years."*
* **Tech:** LangChain SQL Toolkit.
### 2. The Right Brain (Unstructured / Vector RAG)
* **Role:** The Analyst.
* **Source:** A Vector Database (`ChromaDB`) containing:
    * **Official Reports:** Full Markdown conversions of KNBS Economic Surveys and CBK reports (tables preserved).
    * **Market News:** Cleaned and deduplicated business news articles (stripped of URLs and noise).
* **Capability:** Performs Semantic Search to answer questions like *"Why did the shilling depreciate?"* or *"What is the sentiment regarding the new Finance Bill?"*
* **Tech:** `nomic-embed-text` embeddings, ChromaDB.
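To make "Semantic Search" concrete, the toy sketch below ranks documents by cosine similarity between vectors, which is one common similarity measure for embeddings. The three-dimensional vectors are hand-made placeholders, not real `nomic-embed-text` output, and `cosine_similarity` and `top_k` are hypothetical helpers; in the project itself ChromaDB handles storage and retrieval.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], docs: list[tuple[str, list[float]]], k: int = 1) -> list[str]:
    """Return the texts of the k documents most similar to the query vector."""
    ranked = sorted(
        docs, key=lambda d: cosine_similarity(query_vec, d[1]), reverse=True
    )
    return [text for text, _ in ranked[:k]]


# Placeholder "embeddings": (label, vector) pairs invented for illustration.
docs = [
    ("shilling depreciation report", [0.9, 0.1, 0.0]),
    ("tea export news", [0.0, 0.2, 0.9]),
]
query = [0.8, 0.2, 0.1]  # stands in for an embedded question about the shilling

print(top_k(query, docs, k=1))
```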
### 3. The Supervisor (ReAct Agent)
* **Role:** The Manager.
* **Logic:** A custom-built **Zero-Dependency ReAct Agent** (Python-based) that analyzes user intent and routes the query to the correct "Brain", or to both, to synthesize a comprehensive answer.
---
## Technology Stack
* **LLM Inference:** [Ollama](https://ollama.com/) (Local)
* **Model:** `qwen2.5:14b` (Chosen for high reasoning capability)
* **Embeddings:** `nomic-embed-text` (High performance for long documents)
* **Orchestration:** Python (Custom ReAct Implementation) & LangChain Community
* **Vector Store:** ChromaDB
* **Database:** SQLite
---
## Setup & Installation
### 1. Prerequisites
* Python 3.10+
* [Ollama](https://ollama.com/) installed and running.
### 2. Install Dependencies
```bash
pip install pandas langchain-ollama langchain-community langchain-chroma chromadb duckduckgo-search tqdm
```
### 3. Pull Required Models
Ensure your local Ollama instance has both models installed: the LLM (the "Brain") and the embedding model (the "Eyes"):
```bash
ollama pull qwen2.5:14b
ollama pull nomic-embed-text
```
### 4. Configuration
Ensure your Ollama server is running (for example, via `ollama serve`) before ingesting data or starting the agent.
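One way to confirm that the server is up and that both models are pulled is to query Ollama's `/api/tags` endpoint, which lists locally installed models. The sketch below assumes the default address `http://localhost:11434`; `missing_models` and `check_ollama` are hypothetical helpers, not part of this repository.

```python
import json
import urllib.request

REQUIRED_MODELS = ["qwen2.5:14b", "nomic-embed-text"]


def missing_models(tags_payload: dict, required: list[str]) -> list[str]:
    """Return the required model names absent from an /api/tags response."""
    installed = {m["name"] for m in tags_payload.get("models", [])}
    # Also accept bare names, e.g. "nomic-embed-text" for "nomic-embed-text:latest".
    installed |= {name.split(":")[0] for name in installed}
    return [name for name in required if name not in installed]


def check_ollama(base_url: str = "http://localhost:11434") -> list[str]:
    """Ask the local Ollama server which required models are missing."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return missing_models(json.load(resp), REQUIRED_MODELS)


if __name__ == "__main__":
    try:
        missing = check_ollama()
    except OSError as exc:  # covers connection errors and timeouts
        print(f"Ollama is not reachable: {exc}")
    else:
        print("All models present" if not missing else f"Missing models: {missing}")
```

If the script reports missing models, rerun the `ollama pull` commands from the previous step.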
## Data Ingestion (Building the Brains)
Before the agent can work, you must populate its knowledge base.
### Step 1: Ingest News (Right Brain)
Parses CSVs and chunks text.
```bash
python src/load/ingest_news.py
```
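For readers unfamiliar with chunking, the sketch below shows one simple approach: reading rows from a CSV and splitting each article body into fixed-size, overlapping character windows. It is not taken from `ingest_news.py`; the `title` and `content` column names, the chunk sizes, and the `chunk_text` helper are all assumptions, and the sample row is placeholder text.

```python
import csv
import io


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters sharing `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


# Placeholder CSV; real column names in the news dataset may differ.
sample_csv = "title,content\nShilling update,abcdefghij\n"

for row in csv.DictReader(io.StringIO(sample_csv)):
    print(row["title"], chunk_text(row["content"], chunk_size=4, overlap=1))
```

The overlap keeps a sentence that straddles a boundary at least partly visible in both neighbouring chunks.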
### Step 2: Ingest Reports (Right Brain)
Loads Markdown files (converted from PDFs via Marker), preserving table structures for the AI to read.
```bash
python src/load/ingest_md.py
```
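One simple way to keep tables intact while splitting a Markdown report is to cut only at blank lines, so that the consecutive rows of a table always stay in the same block. The sketch below illustrates that idea; it is an assumption about a possible approach, not a description of what `ingest_md.py` does, and `split_markdown_blocks` is a hypothetical helper.

```python
def split_markdown_blocks(markdown: str) -> list[str]:
    """Split Markdown into blocks separated by blank lines."""
    blocks: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            # A blank line closes the block being collected.
            blocks.append("\n".join(current))
            current = []
    if current:
        blocks.append("\n".join(current))
    return blocks


# Placeholder document: a heading, a small table, and a paragraph.
doc = "# Title\n\n| a | b |\n|---|---|\n| 1 | 2 |\n\nSome text."

for block in split_markdown_blocks(doc):
    print(repr(block))
```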
### Step 3: Verify SQL (Left Brain)
Ensure `mshauri_fedha_v6.db` exists.
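A quick check could look like the sketch below, which opens the database read-only and lists its tables. It assumes the file sits in the current working directory; `list_tables` is a hypothetical helper, not part of the project.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """Return the table names in a SQLite file, raising if the file is missing."""
    path = Path(db_path)
    if not path.is_file():
        raise FileNotFoundError(f"{db_path} not found")
    # mode=ro opens the file read-only, so the check cannot create or alter it.
    conn = sqlite3.connect(f"{path.resolve().as_uri()}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    try:
        print(list_tables("mshauri_fedha_v6.db"))
    except FileNotFoundError as exc:
        print(exc)
```

An empty list means the file exists but contains no tables, which usually indicates that the database has not been populated.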
## Usage
You can interact with Mshauri Fedha directly via the modular agent script or within a Jupyter Notebook.
**Running via Python Script**
```python
from mshauri_demo import create_mshauri_agent, ask_mshauri
# Initialize the Supervisor
agent = create_mshauri_agent()
# Ask a Hybrid Question (Uses both SQL and Vector)
ask_mshauri(agent, "What is the inflation rate in 2023 and why is it rising?")
```
**Sample Output**
```plaintext
User: What is the inflation rate in 2023 and why is it rising?
----------------------------------------
Starting Agent Loop...
Step 1: Thought: The user is asking for a specific number (inflation rate) and a reason (why).
I should first check the SQL database for the exact rate, then check the reports for the reasons.
Action: sql_db_query
Action Input: SELECT rate FROM inflation WHERE year = 2023
Step 2: Observation: [(7.7,)]
Thought: I have the rate (7.7%). Now I need the reasons.
Action: search_financial_reports_and_news
Action Input: reasons for high inflation 2023 Kenya
Step 3: Observation: ...Reports mention high fuel prices, depreciation of the shilling...
...
Mshauri: The inflation rate in 2023 was 7.7%. This rise was primarily driven by increased fuel costs and the depreciation of the Kenyan Shilling against major currencies, which raised the cost of imports [Source: KNBS/News].
```
## Future Work
* **Chainlit UI:** Deploy a chat interface for easier interaction.
* **GraphRAG:** Implement knowledge graphs to better link entities (e.g., specific politicians to policies).
* **Live Scraping:** Automate the news ingestion to run daily.
-----
**Status:** Prototype (Proof of Concept)
**Author:** Teofilo Ligawa