KPatelis committed on
Commit d0ee835 · verified · 1 Parent(s): 40b30b8

Upload 2 files

Files changed (2)
  1. .gitignore +18 -0
  2. README.md +115 -15
.gitignore CHANGED
@@ -1 +1,19 @@
+ # Agent files
+ CLAUDE.md
+
+ # Python-generated files
+ __pycache__/
+ *.py[oc]
+ build/
+ dist/
+ wheels/
+ *.egg-info
+
+ # Virtual environments
+ .venv
  .env
+
+ # Test Files
+ test*
+
+ models/*
README.md CHANGED
@@ -1,15 +1,115 @@
- ---
- title: Template Final Assignment
- emoji: 🕵🏻‍♂️
- colorFrom: indigo
- colorTo: indigo
- sdk: gradio
- sdk_version: 5.25.2
- app_file: app.py
- pinned: false
- hf_oauth: true
- # optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
- hf_oauth_expiration_minutes: 480
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # 🌍 GAIA Benchmark Agent
+
+ **An Advanced Multi-Modal AI Agent designed to solve complex, real-world reasoning tasks.**
+
+ ![Python](https://img.shields.io/badge/Python-3.13-blue.svg)
+ ![LangGraph](https://img.shields.io/badge/LangGraph-Orchestration-orange.svg)
+ ![HuggingFace](https://img.shields.io/badge/HuggingFace-Models-yellow.svg)
+ ![Supabase](https://img.shields.io/badge/Supabase-Vector_Store-green.svg)
+
+ > [!NOTE]
+ > This project was developed as part of the **Hugging Face Agents Course (Unit 4: GAIA)**.
+
+ ## 📖 Overview
+
+ This project implements a sophisticated autonomous agent capable of solving General AI Assistants (GAIA) benchmark problems. These problems require multi-step reasoning, tool usage, and the ability to process diverse file types (documents, spreadsheets, audio, images, code).
+
+ The agent leverages **LangGraph** for orchestration, allowing it to maintain state, plan its actions, and iteratively refine its answers. It integrates with **Hugging Face** for powerful LLM inference (`Qwen/Qwen3-32B-Instruct`) and **Supabase** for Retrieval-Augmented Generation (RAG) to learn from similar past examples.
+
+ ## 🚀 Key Features
+
+ - **🧠 Advanced Reasoning Loop**: Uses a "Plan-Execute-Observe-Refine" Chain-of-Thought approach to tackle complex questions.
+ - **📂 Multi-Modal File Processing**: Native support for analyzing a wide range of files:
+   - **Documents**: PDF, Word (`.docx`), PowerPoint (`.pptx`), Text
+   - **Data**: Excel (`.xlsx`), CSV (`.csv`), JSON-LD, PDB (Protein Data Bank)
+   - **Media**: Audio transcription (Whisper), Intelligent Image Analysis (`Qwen3-VL`)
+   - **Code**: Python source code reading
+   - **Archives**: ZIP extraction and inspection
+ - **🔍 Intelligent Information Retrieval**:
+   - **RAG**: Finds similar solved questions in a vector database to guide complex reasoning.
+   - **Web Search**: DuckDuckGo, Tavily, Wikipedia, and ArXiv for real-time information.
+ - **🛠️ Extensible Tool Suite**: Modular design allows easy addition of new capabilities.
+
+ ## 🏗️ Architecture
+
+ The agent operates on a graph-based workflow defined in `agent.py`:
+
+ ```mermaid
+ graph TD
+     START --> Retriever["Retriever Node<br/>(Hybrid: Vector + BM25 + RRF)"]
+     Retriever --> Reranker["Reranker Node<br/>(ModernBERT Cross-Encoder)"]
+     Reranker --> Processor["Processor Node<br/>(Qwen 3 32B)"]
+     Processor -->|Decide Tool| Condition{"Requires Tool?"}
+     Condition -->|Yes| Tools["Tool Node<br/>(Execute Actions)"]
+     Condition -->|No| END
+     Tools --> Processor
+ ```
+
+ 1. **Retriever Node (Hybrid)**:
+    - **Vector Search**: Finds semantically similar questions in Supabase (using [`Alibaba-NLP/gte-modernbert-base`](https://huggingface.co/Alibaba-NLP/gte-modernbert-base) embeddings).
+    - **BM25 Search**: Finds keyword-based matches in the local `metadata.jsonl` corpus using [`bm25s`](https://github.com/xhluca/bm25s).
+    - **RRF Fusion**: Combines results from both methods using Reciprocal Rank Fusion.
+ 2. **Reranker Node**: Uses a ModernBERT Cross-Encoder ([`Alibaba-NLP/gte-reranker-modernbert-base`](https://huggingface.co/Alibaba-NLP/gte-reranker-modernbert-base)) to select the top 3 most relevant examples.
+ 3. **Processor Node**: The core brain (Qwen 3 32B). It decides whether to answer directly or use a tool.
+ 4. **Tool Node**: Executes the requested tool (e.g., `read_excel`, `duck_web_search`) and returns results.
+
+ ## 🛠️ Tools & Stack
+
+ | Category | Tools / Libraries | Purpose |
+ | :--- | :--- | :--- |
+ | **Orchestration** | `langgraph`, `langchain` | State management and graph flow control. |
+ | **LLM Inference** | `huggingface_hub` | Inference via `Qwen/Qwen3-32B-Instruct` and `Qwen/Qwen3-VL-32B-Instruct`. |
+ | **Vector Store** | `supabase`, `sentence-transformers` | Storing and retrieving semantic embeddings. |
+ | **Data Processing** | `polars`, `biopython` | High-performance data manipulation. |
+ | **Documents** | `pypdf`, `python-docx`, `python-pptx` | Extracting text from office documents. |
+ | **Media** | `transformers` (Whisper), `pillow` | Audio transcription and image handling. |
+ | **Web** | `duckduckgo-search`, `tavily-python` | Internet research. |
+
+ ## 💻 Installation & Setup
+
+ 1. **Clone the repository**:
+
+ ```bash
+ git clone <repo_url>
+ cd <repo_name>
+ ```
+
+ 2. **Install dependencies**:
+
+ ```bash
+ pip install -r requirements.txt
+ # OR using uv
+ uv sync
+ ```
+
+ 3. **Configure Environment**:
+ Create a `.env` file in the root directory with the following keys:
+
+ ```ini
+ HF_INFERENCE_KEY=hf_... # Hugging Face Inference Token
+ SUPABASE_URL=... # Supabase Project URL
+ SUPABASE_SERVICE_KEY=... # Supabase Service Role Key
+ TAVILY_API_KEY=tvly-... # Tavily Search API Key (Optional)
+ ```
+
+ ## 🎮 Usage
+
+ Start the Gradio interface to interact with the agent:
+
+ ```bash
+ python app.py
+ ```
+
+ This will launch a web interface where you can:
+
+ 1. **Log in** with your Hugging Face account.
+ 2. **Run Evaluation**: Automatically fetch GAIA questions, execute the agent, and submit answers for scoring.
+ 3. **Inspect Results**: View the agent's reasoning, tool outputs, and final answers in real time.
+
+ ## 📂 Project Structure
+
+ - `agent.py`: Core logic defining the LangGraph workflow and state.
+ - `app.py`: Gradio application for the user interface and evaluation runner.
+ - `tools.py`: Consolidated file containing all agent tools (web, math, file processing, VLM).
+ - `prompts/`: Directory containing system prompts (`prompt.yaml`, `vlm_prompt.yaml`).
+ - `requirements.txt`: Project dependencies.
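
The Retriever Node in the README above names Reciprocal Rank Fusion (RRF) as the step that merges the vector-search and BM25 rankings, but the diff does not show how it is computed. The following is a minimal sketch of the general technique, not the code in `agent.py`. It assumes each retriever returns an ordered list of question IDs, and it uses the commonly cited smoothing constant `k = 60`, since the value used by this repository is not stated. The function name `reciprocal_rank_fusion` and the sample IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into a single ranking.

    Each ID receives 1 / (k + rank) from every list it appears in
    (rank starts at 1), and the contributions are summed.
    k=60 is an assumed default, not a value taken from the repository.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two retrievers.
vector_hits = ["q7", "q2", "q9"]
bm25_hits = ["q2", "q5", "q7"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

IDs that appear in both lists accumulate two contributions, so they tend to rise above IDs that only one retriever found. Because RRF uses only ranks, the cosine similarities and BM25 scores never need to be normalized against each other.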