mayar-waleed committed on
Commit 98e532c · 2 Parent(s): 9847e53 3a95d45
Files changed (3)
  1. .gitattributes +35 -0
  2. Legal_Chatbot +1 -0
  3. README.md +11 -380
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
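The Python sketch below checks which files the rules above would route through Git LFS. It approximates git's attribute matching with `fnmatch.fnmatchcase`:

* Patterns without a slash are matched against the file's basename at any depth.
* The one slash pattern (`saved_model/**/*`) is matched against the whole repo-relative path, and `/**/` may also match zero directories.

The `is_lfs_tracked` helper is hypothetical. Some finer points of git's matcher are not reproduced; for example, in git a single `*` does not cross `/`. For an authoritative answer, `git check-attr filter -- <path>` reports what git itself resolves.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# LFS patterns copied from the .gitattributes added in this commit.
LFS_PATTERNS = [
    "*.7z", "*.arrow", "*.bin", "*.bz2", "*.ckpt", "*.ftz", "*.gz", "*.h5",
    "*.joblib", "*.lfs.*", "*.mlmodel", "*.model", "*.msgpack", "*.npy",
    "*.npz", "*.onnx", "*.ot", "*.parquet", "*.pb", "*.pickle", "*.pkl",
    "*.pt", "*.pth", "*.rar", "*.safetensors", "saved_model/**/*",
    "*.tar.*", "*.tar", "*.tflite", "*.tgz", "*.wasm", "*.xz", "*.zip",
    "*.zst", "*tfevents*",
]


def is_lfs_tracked(path: str) -> bool:
    """Approximate git's attribute matching for a repo-relative POSIX path."""
    p = PurePosixPath(path)
    for pattern in LFS_PATTERNS:
        if "/" not in pattern:
            # No slash: the pattern applies to the basename at any directory depth.
            if fnmatchcase(p.name, pattern):
                return True
        else:
            # Slash present: anchored to the repo root; "/**/" may match zero dirs.
            candidates = {pattern, pattern.replace("/**/", "/")}
            if any(fnmatchcase(str(p), c) for c in candidates):
                return True
    return False


if __name__ == "__main__":
    for name in ("reranker/model.safetensors", "app.py", "saved_model/fingerprint"):
        print(name, is_lfs_tracked(name))
```

With these rules, the reranker's `model.safetensors` and any TensorBoard `*tfevents*` logs would be stored as LFS pointers. JSON data files and Python sources stay as regular git blobs.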
Legal_Chatbot ADDED
@@ -0,0 +1 @@
+ Subproject commit 3a95d45832ecd0125af7de34e122f040a1fc13f4
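`Subproject commit <sha>` is how git diff renders a gitlink. A gitlink is a tree entry with mode `160000` that records only the commit the submodule should be checked out at, not the submodule's files.

The minimal Python sketch below spots such entries in `git ls-tree` output. The helper names are hypothetical, and the sketch assumes a SHA-1 repository (40-hex object IDs).

The changed-files list for this commit shows no `.gitmodules`. If the repository has none, `git submodule update --init` has no URL to clone `Legal_Chatbot` from.

```python
import re
from typing import NamedTuple, Optional


class TreeEntry(NamedTuple):
    mode: str
    obj_type: str
    sha: str
    path: str


# `git ls-tree` prints: <mode> SP <type> SP <object> TAB <path>
# (git quotes unusual paths unless -z is used; this sketch ignores that case).
LS_TREE_LINE = re.compile(r"^(\d{6}) (blob|tree|commit) ([0-9a-f]{40})\t(.+)$")


def parse_ls_tree_line(line: str) -> Optional[TreeEntry]:
    """Parse one line of `git ls-tree` output; return None if it doesn't fit."""
    match = LS_TREE_LINE.match(line.rstrip("\n"))
    return TreeEntry(*match.groups()) if match else None


def is_gitlink(entry: TreeEntry) -> bool:
    """Mode 160000 with object type 'commit' marks a submodule pointer."""
    return entry.mode == "160000" and entry.obj_type == "commit"


if __name__ == "__main__":
    sample = "160000 commit 3a95d45832ecd0125af7de34e122f040a1fc13f4\tLegal_Chatbot"
    entry = parse_ls_tree_line(sample)
    print(entry.path, is_gitlink(entry))
```

In a real checkout, the input lines could come from `subprocess.run(["git", "ls-tree", "HEAD"], capture_output=True, text=True).stdout.splitlines()`.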
README.md CHANGED
@@ -1,383 +1,14 @@
- # ⚖️ Constitutional Legal Assistant - Egyptian Constitution Chatbot
-
- An intelligent RAG-based chatbot for answering questions about the Egyptian Constitution in Arabic.
-
- ---
-
- ## 📁 Project Structure
-
- ```
- Chatbot_me/
- ├── app_final.py # Main Streamlit app (v1 - basic)
- ├── app_final_pheonix.py # Streamlit app with Phoenix tracing
- ├── app_final_updated.py # Latest production version with improvements
- ├── evaluate_rag.py # RAG evaluation with RAGAS metrics (simplified output)
- ├── evaluate.py # Full standalone evaluation script
- ├── requirements.txt # Python dependencies
- ├── .env # Environment variables (create this - NOT in repo)
- ├── .gitignore # Git ignore rules
- ├── test_dataset_5_questions.json # Test dataset (5 questions from different categories)
- ├── data/ # Legal documents (NOT in repo)
- │ ├── Egyptian_Constitution_legalnature_only.json
- │ ├── Egyptian_Civil.json
- │ ├── Egyptian_Labour_Law.json
- │ ├── Egyptian_Personal Status Laws.json
- │ ├── Technology Crimes Law.json
- │ └── قانون_الإجراءات_الجنائية.json
- ├── chroma_db/ # Vector database (auto-generated - NOT in repo)
- ├── reranker/ # Arabic reranker model files (NOT in repo)
- │ ├── model.safetensors
- │ ├── config.json
- │ └── ...
- └── *.whl # Local wheel packages for Phoenix (NOT in repo)
- ```
-
- ---
-
- ## 🚀 Quick Start
-
- ### Step 1: Create Virtual Environment (Recommended)
-
- ```powershell
- # Create virtual environment
- python -m venv venv
-
- # Activate it (Windows PowerShell)
- .\venv\Scripts\Activate.ps1
-
- # Or (Windows CMD)
- .\venv\Scripts\activate.bat
- ```
-
- ### Step 2: Install Dependencies
-
- ```powershell
- # Install all requirements
- pip install -r requirements.txt
- ```
-
- ### Step 3: Install Local Wheel Packages (For Phoenix Tracing)
-
- ```powershell
- # Install OpenInference instrumentation packages
- pip install openinference_instrumentation_langchain-0.1.56-py3-none-any.whl
- pip install openinference_instrumentation_openai-0.1.41-py3-none-any.whl
- ```
-
- ### Step 4: Create `.env` File
-
- Create a `.env` file in the project root with:
-
- ```env
- # Required: Groq API Key (get from https://console.groq.com)
- GROQ_API_KEY=gsk_your_groq_api_key_here
-
- # Optional: For Phoenix tracing
- PHOENIX_OTLP_ENDPOINT=http://localhost:6006/v1/traces
- PHOENIX_SERVICE_NAME=constitutional-assistant
- ```
-
- ---
-
- ## 🏃 Running the Applications
-
- ### 1. Run Latest Production App (`app_final_updated.py`) ⭐ RECOMMENDED
-
- The most recent version with improved prompt engineering and decision tree logic:
-
- ```powershell
- streamlit run app_final_updated.py
- ```
-
- Then open: **http://localhost:8501**
-
- **Features:**
- - Enhanced Arabic RTL support
- - Improved decision tree for handling different question types
- - Better handling of procedural vs. constitutional questions
- - Cleaner response formatting
-
- ---
-
- ### 2. Run Basic App (`app_final.py`)
-
- The original version:
-
- ```powershell
- streamlit run app_final.py
- ```
-
- Then open: **http://localhost:8501**
-
- ---
-
- ### 3. Run App with Phoenix Tracing (`app_final_pheonix.py`)
-
- This version includes observability/tracing with Phoenix.
-
- #### Step A: Start Phoenix Server First
-
- ```powershell
- # In a separate terminal
- python -m phoenix.server.main serve
- ```
-
- Phoenix UI will be at: **http://localhost:6006**
-
- #### Step B: Run the App
-
- ```powershell
- streamlit run app_final_pheonix.py
- ```
-
- Then open:
- - **App**: http://localhost:8501
- - **Phoenix Traces**: http://localhost:6006
-
- ---
-
- ### 4. Run Evaluation (`evaluate_rag.py`) ⭐ NEW SIMPLIFIED FORMAT
-
- Evaluate the RAG system with simplified output showing only essential information:
-
- ```powershell
- # Uses default test dataset (test_dataset_5_questions.json)
- python evaluate_rag.py
-
- # With custom test file
- python evaluate_rag.py path/to/your_test.json
-
- # Set via environment variable
- set QA_FILE_PATH=test_dataset_5_questions.json
- python evaluate_rag.py
- ```
-
- **Output Files:**
- - `evaluation_breakdown.json` - **Simplified format** with:
- - Question
- - Ground truth
- - Actual answer
- - Score (average of all metrics per question)
- - Average score across all questions
- - `evaluation_results.json` - Detailed metrics breakdown
- - `evaluation_detailed.json` - Full raw evaluation data
-
- **Sample Output Format:**
- ```json
- {
- "questions": [
- {
- "question": "ما الطبيعة القانونية لحق العمل في الدستور المصري؟",
- "ground_truth": "حق أساسي/حرية: العمل حق وواجب...",
- "actual_answer": "حسب المادة (12) من الدستور المصري...",
- "score": 0.8542
- }
- ],
- "average_score": 0.8542
- }
- ```
-
- **⚠️ Note:** This script has a **60-second delay** between questions to avoid Groq API rate limits.
-
  ---
-
- ### 5. Run Full Evaluation (`evaluate.py`)
-
- More comprehensive evaluation with external test dataset and rate limiting:
-
- ```powershell
- # Basic run (uses test_dataset.json)
- python evaluate.py
-
- # With custom test file
- python evaluate.py test_dataset_small.json
-
- # With custom test and output files
- python evaluate.py test_dataset_small.json my_results.json
- ```
-
- **⚠️ Note:** This script has a **2-minute delay** between questions to avoid Groq API rate limits.
-
- ---
-
- ## 📊 Test Dataset
-
- The project includes a curated test dataset with 5 questions covering different legal categories:
-
- **`test_dataset_5_questions.json`** includes:
- 1. **الدستور (Constitution)** - Constitutional rights and principles
- 2. **قانون العمل (Labour Law)** - Workplace rights and regulations
- 3. **الإجراءات الجنائية (Criminal Procedures)** - Criminal law procedures
- 4. **جرائم تقنية المعلومات (Technology Crimes)** - Cybercrime laws
- 5. **الأحوال الشخصية (Personal Status Laws)** - Family law matters
-
- This diverse dataset ensures comprehensive testing across all major legal domains covered by the system.
-
- ---
-
- ## 📊 Understanding RAGAS Metrics
-
- The evaluation system uses RAGAS metrics to assess the quality of the RAG pipeline. The simplified output combines these into a single score per question:
-
- | Metric | Description | Good Score |
- |--------|-------------|------------|
- | **faithfulness** | Is answer grounded in context? | > 0.7 |
- | **answer_relevancy** | Does answer match the question? | > 0.8 |
- | **context_precision** | How much context was useful? | > 0.6 |
- | **context_recall** | Did we retrieve all needed info? | > 0.7 |
-
- **Question Score** = Average of all four metrics (0-1 scale)
-
- **Overall Score** = Average of all question scores
-
  ---
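The removed README defines two scores:

* The question score is the mean of the four RAGAS metrics.
* The overall score is the mean of the question scores.

The plain-Python sketch below follows that rule and emits the same keys as the sample `evaluation_breakdown.json` earlier in this hunk. The input row layout and the `build_breakdown` / `write_breakdown` helpers are hypothetical, not taken from `evaluate_rag.py`.

```python
import json
from statistics import mean

# The four RAGAS metrics listed in the README table.
METRICS = ("faithfulness", "answer_relevancy", "context_precision", "context_recall")


def build_breakdown(rows):
    """Build an evaluation_breakdown-style dict from per-question metric rows.

    Each row is assumed (hypothetically) to hold 'question', 'ground_truth',
    'answer', and one float per metric name in METRICS.
    """
    questions, raw_scores = [], []
    for row in rows:
        # Question score: mean of the four metric values.
        score = mean(float(row[m]) for m in METRICS)
        raw_scores.append(score)
        questions.append({
            "question": row["question"],
            "ground_truth": row["ground_truth"],
            "actual_answer": row["answer"],
            "score": round(score, 4),
        })
    # Overall score: mean of the unrounded per-question scores (0.0 if none).
    average = mean(raw_scores) if raw_scores else 0.0
    return {"questions": questions, "average_score": round(average, 4)}


def write_breakdown(breakdown, path="evaluation_breakdown.json"):
    # ensure_ascii=False keeps Arabic text readable instead of \u escapes.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(breakdown, f, ensure_ascii=False, indent=2)
```

The overall score is averaged from the unrounded per-question values, so rounding only affects the displayed numbers.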
 
- ## Repository Structure & Git
-
- ### Files NOT Included in Repository (via `.gitignore`)
-
- The following files are excluded from version control for security, size, or privacy reasons:
-
- 1. **`reranker/`** - Large model files (download separately or train locally)
- 2. **`__pycache__/`** - Python compiled bytecode
- 3. **`chroma_db/`** - Vector database (auto-generated on first run)
- 4. **`.env`** - Environment variables with API keys (NEVER commit this!)
- 5. **`*.json`** - All JSON files EXCEPT `test_dataset_5_questions.json`
- 6. **`*.csv`** - CSV data files
- 7. **`*.md`** - All markdown files EXCEPT `README.md`
- 8. **`*.whl`** - Wheel package files
-
- ### First-Time Setup
-
- When cloning this repository, you'll need to:
-
- 1. **Create `.env` file** with your API keys
- 2. **Download/prepare data files** in the `data/` folder
- 3. **Download reranker model** to `reranker/` folder
- 4. **Install dependencies** from `requirements.txt`
- 5. **Run the app** - ChromaDB will auto-generate on first run
-
- ---
-
- ## 🔧 Troubleshooting
-
- ### "GROQ_API_KEY not found"
- Make sure your `.env` file exists and contains:
- ```env
- GROQ_API_KEY=gsk_your_key_here
- ```
-
- ### "Reranker path not found"
- Ensure the `reranker/` folder exists with model files:
- ```
- reranker/
- ├── model.safetensors
- ├── config.json
- ├── tokenizer.json
- └── ...
- ```
-
- ### "Phoenix connection refused"
- Start Phoenix server first:
- ```powershell
- python -m phoenix.server.main serve
- ```
-
- ### Rate Limit Errors (Groq)
- - Wait a few minutes and try again
- - Use `test_dataset_small.json` for fewer questions
- - The `evaluate.py` script has built-in 2-minute delays
-
- ### Import Errors
- ```powershell
- # Reinstall all dependencies
- pip install -r requirements.txt --force-reinstall
- ```
-
- ---
-
- ## 📝 API Keys Required
-
- | Service | Purpose | Get Key From |
- |---------|---------|--------------|
- | **Groq** | LLM (Llama 3.1 8B) | https://console.groq.com |
- | **HuggingFace** | Embeddings (auto-download) | No key needed |
-
- ---
-
- ## 🔄 How the System Works
-
- ```
- User Question (Arabic)
-
- ┌─────────────────────────────────┐
- │ Decision Tree Logic │
- │ (app_final_updated.py) │
- │ ├── Constitutional questions │
- │ ├── Procedural questions │
- │ ├── General legal advice │
- │ └── Out-of-scope filtering │
- └─────────────────────────────────┘
-
- ┌─────────────────────────────────┐
- │ Hybrid Retrieval (RRF) │
- │ ├── Semantic Search (50%) │
- │ ├── BM25 Keyword (30%) │
- │ └── Metadata Filter (20%) │
- └─────────────────────────────────┘
-
- ┌─────────────────────────────────┐
- │ Cross-Reference Expansion │
- │ (Fetch related articles) │
- └─────────────────────────────────┘
-
- ┌─────────────────────────────────┐
- │ Arabic Reranker (ARM-V1) │
- │ (Select top 5 most relevant) │
- └─────────────────────────────────┘
-
- ┌─────────────────────────────────┐
- │ LLM (Llama 3.1 via Groq) │
- │ (Generate Arabic answer) │
- │ - Separate system/user prompts │
- │ - Citation with article numbers│
- │ - Temperature: 0.3 │
- └─────────────────────────────────┘
-
- Final Answer
- ```
-
- ---
-
- ## 📋 Version History
-
- ### Latest Updates (Feb 2026)
- - ✅ Added `app_final_updated.py` with improved decision tree logic
- - ✅ Simplified evaluation output (question, ground_truth, answer, score)
- - ✅ Created curated 5-question test dataset covering 5 legal categories
- - ✅ Added comprehensive `.gitignore` for repository management
- - ✅ Updated documentation with all recent changes
- - ✅ Improved Arabic RTL support and number formatting
-
- ### Previous Features
- - Multi-source legal document support (Constitution, Civil, Labour, etc.)
- - Hybrid retrieval with RRF (Reciprocal Rank Fusion)
- - Arabic-specific reranker integration
- - Phoenix tracing for observability
- - RAGAS-based evaluation system
-
- ---
-
- ## 📞 Support
-
- For issues, check:
- 1. `.env` file has correct API keys
- 2. All dependencies installed
- 3. `reranker/` folder exists with model files
- 4. Internet connection for API calls
-
- ---
-
- ## 📄 License
-
- This project is for educational purposes - Egyptian Constitution Legal Assistant.
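The removed pipeline diagram has a "Hybrid Retrieval (RRF)" stage that weights semantic search at 50%, BM25 at 30% and metadata filtering at 20%. The README does not say how those weights enter the fusion.

The sketch below assumes a weighted Reciprocal Rank Fusion: score(d) = Σᵢ wᵢ / (k + rankᵢ(d)), where k = 60 is the constant commonly used with RRF. The `weighted_rrf` function and the article IDs are hypothetical; this is not the app's actual retriever code.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, k=60, top_n=None):
    """Fuse ranked lists of document IDs with weighted Reciprocal Rank Fusion.

    score(d) = sum over lists i of weights[i] / (k + rank_i(d)), ranks start at 1.
    A document missing from a list contributes nothing for that list.
    """
    if len(ranked_lists) != len(weights):
        raise ValueError("expected one weight per ranked list")
    scores = defaultdict(float)
    for docs, weight in zip(ranked_lists, weights):
        # Each list is assumed to hold unique IDs, best match first.
        for rank, doc_id in enumerate(docs, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; the sort is stable, so ties keep first-seen order.
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return fused if top_n is None else fused[:top_n]


if __name__ == "__main__":
    semantic = ["art_12", "art_13", "art_53"]   # hypothetical vector-search ranking
    bm25 = ["art_13", "art_12", "art_92"]       # hypothetical keyword ranking
    metadata = ["art_12"]                       # hypothetical metadata-filter hits
    for doc_id, score in weighted_rrf([semantic, bm25, metadata], [0.5, 0.3, 0.2]):
        print(doc_id, round(score, 5))
```

RRF uses only rank positions, so cosine similarities and BM25 scores never need to share a scale. In the diagram's flow, the fused candidates would then pass through cross-reference expansion and the reranker, which keeps the top 5.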
  ---
+ title: Legal Chatbot
+ emoji: 🏆
+ colorFrom: red
+ colorTo: indigo
+ sdk: gradio
+ sdk_version: 6.6.0
+ app_file: app.py
+ pinned: false
+ license: mit
+ short_description: Legal RAG Chatbot
  ---
 
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
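The new README is now mostly a Hugging Face Spaces configuration block. The YAML between the two `---` lines tells the Hub to build a Gradio Space with `sdk_version` 6.6.0 and to launch `app.py`.

The minimal sketch below reads those flat `key: value` pairs using only the standard library. It is not a YAML parser: values stay strings, so `pinned` comes back as `"false"`. `parse_front_matter` is a hypothetical helper; for anything beyond flat keys, a real YAML loader is the safer choice.

```python
def parse_front_matter(text: str) -> dict:
    """Return the flat key/value pairs from a README's leading '---' block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at the top of the file
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep and key.strip():
            # Split on the first colon only; values are kept as plain strings.
            meta[key.strip()] = value.strip()
    raise ValueError("front matter is missing its closing '---' line")


if __name__ == "__main__":
    readme = (
        "---\n"
        "title: Legal Chatbot\n"
        "sdk: gradio\n"
        "sdk_version: 6.6.0\n"
        "app_file: app.py\n"
        "---\n"
    )
    meta = parse_front_matter(readme)
    print(meta["sdk"], meta["sdk_version"], meta["app_file"])
```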