Matis Codjia committed on
Commit 27bbd47 · 1 Parent(s): eb53740

Feat: Source files
.gitignore ADDED
@@ -0,0 +1,3 @@
+ chroma_db_storage/
+ *.sqlite3
+ __pycache__/
README.md CHANGED
@@ -1,20 +1,83 @@
- ---
- title: FFGEN Demo
- emoji: 🚀
- colorFrom: red
- colorTo: red
- sdk: docker
- app_port: 8501
- tags:
- - streamlit
- pinned: false
- short_description: Streamlit template space
- license: mit
- ---
-
- # Welcome to Streamlit!
-
- Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
-
- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
- forums](https://discuss.streamlit.io).
+ # Streamlit RAG Viewer
+
+ Simple, colorful, and professional interface for semantic code feedback search.
+
+ ## Features
+
+ - ✅ **Pure Streamlit** - No separate backend
+ - 🎨 **Colorful & Adaptive** - Modern gradient UI
+ - 🤖 **Base Model + PEFT** - Load 400M model + your adapter
+ - 🔍 **Semantic Search** - ChromaDB vector search
+ - 📦 **Flexible Data** - HuggingFace Hub or local JSONL
+
+ ## Quick Start
+
+ ```bash
+ cd streamlit_rag_viewer
+
+ # Install
+ pip install -r requirements.txt
+
+ # Run
+ streamlit run app.py
+ ```
+
+ Opens at **http://localhost:8501**
+
+ ## Usage
+
+ ### 1. Configure (Sidebar)
+
+ - **Dataset**: Choose HuggingFace Hub or local JSONL
+ - **Base Model**: `Salesforce/SFR-Embedding-Code-400M_R`
+ - **PEFT Adapter**: Your adapter from Hub (e.g., `matis35/my-adapter`)
+
+ ### 2. Load & Index
+
+ Click **🚀 Load & Index** - it will:
+ 1. Load the base model (400M)
+ 2. Load your PEFT adapter
+ 3. Combine them
+ 4. Load the dataset
+ 5. Index it in ChromaDB
+
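Indexing (step 5) is done in batches inside app.py; the slicing and ChromaDB ID scheme it uses can be sketched in isolation (batch size 64 matches the app, the helper name is illustrative):

```python
def iter_batches(data, batch_size=64):
    """Slice data the way the indexing loop does, with matching ChromaDB IDs."""
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        ids = [f"id_{i + j}" for j in range(len(batch))]
        yield ids, batch

# 150 dummy items split into batches of 64, 64 and 22
batches = list(iter_batches(list(range(150))))
```

Because IDs are derived from the absolute offset `i + j`, re-running the loop over the same dataset produces the same IDs, which is what makes "Force Re-index" safe.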
+ ### 3. Search
+
+ - Paste code
+ - Choose k results
+ - Click **🔍 Search**
+
+ ## Architecture
+
+ **Single Streamlit app** with:
+ - Model: Base (400M) + PEFT adapter loaded together
+ - Storage: ChromaDB (persistent in `chroma_db_storage/`)
+ - UI: Gradient colors, adaptive layout
+
+ ## Why No Backend?
+
+ Everything runs in Streamlit:
+ - Model loading
+ - Encoding
+ - ChromaDB
+ - Search
+
+ **No FastAPI needed** - Streamlit handles it all!
+
+ ## Example Adapter
+
+ Your PEFT adapter structure on the Hub:
+
+ ```
+ matis35/my-code-adapter/
+ ├── adapter_config.json
+ ├── adapter_model.bin
+ └── README.md
+ ```
+
+ The app will:
+ 1. Load the base model: `AutoModel.from_pretrained("Salesforce/SFR-Embedding-Code-400M_R")`
+ 2. Load the adapter: `PeftModel.from_pretrained(base_model, "matis35/my-code-adapter")`
+ 3. Use the combined model for encoding
+
+ ## Enjoy! 🚀
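For the local JSONL option, each line is one JSON object; the field names `code` and `feedback` are the ones app.py reads (with `generated_feedback` as a fallback). A minimal sketch of writing and reading such a file, with made-up records:

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

records = [
    {"code": "int main(void) { return 0; }", "feedback": "Compiles, but does nothing."},
    {"code": "while (1);", "feedback": "Infinite loop with no exit condition."},
]

with TemporaryDirectory() as tmp:
    path = Path(tmp) / "feedbacks.jsonl"
    # One JSON object per line, as the "Local JSONL" loader expects
    path.write_text("\n".join(json.dumps(r) for r in records) + "\n")

    loaded = [json.loads(line) for line in path.read_text().splitlines() if line.strip()]
```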
__init__.py ADDED
File without changes
app.py ADDED
@@ -0,0 +1,586 @@
+ """
+ Streamlit RAG Viewer with intelligent caching
+ """
+
+ import streamlit as st
+ import torch
+ import torch.nn.functional as F
+ from transformers import AutoTokenizer, AutoModel
+ from datasets import load_dataset
+ import chromadb
+ from pathlib import Path
+ import json
+ import time
+ import logging
+ import sys
+ # Custom modules
+ from cache_manager import CacheManager
+ from deepseek_caller import DeepSeekCaller
+ from stats_logger import StatsLogger
+ from config import SIMILARITY_THRESHOLD
+ from utils import load_css
+
+ # ==========================================
+ # PAGE CONFIG
+ # ==========================================
+ st.set_page_config(
+     page_title="RAG Feedback System",
+     page_icon="",
+     layout="wide",
+     initial_sidebar_state="expanded"
+ )
+
+ # ==========================================
+ # CUSTOM CSS
+ # ==========================================
+ load_css("assets/style.css")
+
+ # ==========================================
+ # STATE MANAGEMENT
+ # ==========================================
+ if 'model_loaded' not in st.session_state: st.session_state.model_loaded = False
+ if 'dataset_loaded' not in st.session_state: st.session_state.dataset_loaded = False
+ if 'db_initialized' not in st.session_state: st.session_state.db_initialized = False
+ if 'cache_manager' not in st.session_state: st.session_state.cache_manager = None
+ if 'deepseek_caller' not in st.session_state: st.session_state.deepseek_caller = None
+ if 'stats_logger' not in st.session_state: st.session_state.stats_logger = StatsLogger()
+
+ # ==========================================
+ # HELPER FUNCTIONS
+ # ==========================================
+ logging.basicConfig(
+     level=logging.INFO,
+     format='%(asctime)s | %(levelname)s | %(message)s',
+     datefmt='%H:%M:%S',
+     handlers=[
+         logging.StreamHandler(sys.stdout)
+     ]
+ )
+
+ logger = logging.getLogger("FFGen_System")
+
+ @st.cache_resource
+ def load_full_model(model_path: str):
+     """Load a standard HuggingFace model."""
+     st.info(f"Loading model from: {model_path}")
+     logger.info(f"Loading model from: {model_path}...")
+     try:
+         tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
+         if tokenizer.pad_token is None:
+             tokenizer.pad_token = tokenizer.eos_token
+
+         model = AutoModel.from_pretrained(
+             model_path,
+             trust_remote_code=True,
+             device_map="auto"
+         )
+         logger.info("Model loaded successfully!")
+         model.eval()
+         return model, tokenizer
+     except Exception as e:
+         st.error(f"Loading error: {e}")
+         logger.error("Model loading failed!")
+         return None, None
+
+ def encode_text(text: str, model, tokenizer):
+     """Encode text to an embedding (mean pooling + L2 normalization)."""
+     device = next(model.parameters()).device
+
+     inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512, padding=True)
+     inputs = {k: v.to(device) for k, v in inputs.items()}
+
+     with torch.no_grad():
+         outputs = model(**inputs)
+         embeddings = outputs.last_hidden_state.mean(dim=1)
+         embeddings = F.normalize(embeddings, p=2, dim=1)
+
+     return embeddings[0].cpu().numpy().tolist()
+
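The pooling step in `encode_text` (mean over the token axis, then L2 normalization) can be checked in isolation with plain numpy; the tiny 1×2×2 tensor here is illustrative:

```python
import numpy as np

def mean_pool_normalize(last_hidden_state):
    """Mean over the token axis, then L2-normalize, as encode_text does."""
    pooled = last_hidden_state.mean(axis=1)                 # (batch, hidden)
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / norms

hidden = np.array([[[3.0, 0.0], [1.0, 4.0]]])               # 1 sequence, 2 tokens, 2 dims
emb = mean_pool_normalize(hidden)                           # token mean [2, 2], then unit norm
```

Because every stored embedding is unit-norm, the dot products computed elsewhere in the app are exact cosine similarities.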
+ @st.cache_data
+ def load_dataset_from_source(source: str, path: str):
+     logger.info(f"Selected source: {source}")
+     if source == "HuggingFace Hub":
+         dataset = load_dataset(path)
+         data = []
+         for split in dataset.keys():
+             data.extend(dataset[split].to_list())
+         return data
+     else:
+         data = []
+         with open(path, 'r') as f:
+             for line in f:
+                 if line.strip():
+                     data.append(json.loads(line))
+         return data
+
+ def initialize_chromadb(force_reindex=False):
+     db_path = Path("streamlit_rag_viewer/chroma_db_storage")
+     db_path.mkdir(parents=True, exist_ok=True)
+
+     client = chromadb.PersistentClient(path=str(db_path))
+
+     try:
+         if force_reindex:
+             try:
+                 client.delete_collection("feedbacks")
+             except Exception:
+                 pass  # no existing collection to delete
+             collection = client.create_collection(name="feedbacks")
+         else:
+             collection = client.get_collection(name="feedbacks")
+     except Exception:
+         collection = client.create_collection(name="feedbacks")
+
+     return client, collection
+
+ # ==========================================
+ # MAIN APP
+ # ==========================================
+
+ st.title("FFGEN")
+ st.markdown("### Submit code and get instant feedback")
+
+ # ==========================================
+ # SIDEBAR - CONFIGURATION
+ # ==========================================
+
+ with st.sidebar:
+     st.header("Configuration")
+
+     # --- MODEL SELECTION ---
+     st.subheader("Embedding Model")
+     model_path = st.text_input(
+         "Model Path (Local or HF)",
+         value="matis35/gemmaembedding-fgdor",
+         help="Path to embedding model"
+     )
+
+     # --- DATASET SELECTION ---
+     st.subheader("Dataset")
+     data_source = st.selectbox("Source", ["HuggingFace Hub", "Local JSONL"])
+     dataset_path = st.text_input("Dataset Path", value="matis35/SYNT_V4")
+
+     st.divider()
+
+     # --- CACHE SETTINGS ---
+     st.subheader("Cache Settings")
+
+     # Allow the threshold to be changed dynamically
+     if 'custom_threshold' not in st.session_state:
+         st.session_state.custom_threshold = SIMILARITY_THRESHOLD
+
+     custom_threshold = st.slider(
+         "Similarity Threshold",
+         min_value=0.1,
+         max_value=1.0,
+         value=st.session_state.custom_threshold,
+         step=0.05,
+         help="Distance < threshold = HIT. Changing this value changes cache behavior without re-indexing."
+     )
+
+     if custom_threshold != st.session_state.custom_threshold:
+         st.session_state.custom_threshold = custom_threshold
+         # Update the threshold of the existing cache manager if available
+         if st.session_state.get('cache_manager'):
+             st.session_state.cache_manager.threshold = custom_threshold
+         st.info(f"Threshold updated to {custom_threshold:.2f}")
+
+     st.caption(f"Current: Distance < {st.session_state.custom_threshold:.2f} = HIT")
+
+     st.divider()
+
+     force_reindex = st.checkbox("Force Re-index", value=False)
+
+     col1, col2 = st.columns(2)
+     with col1:
+         load_btn = st.button("Load & Index", use_container_width=True)
+     with col2:
+         use_cached_btn = st.button("Use Cached", use_container_width=True)
+
+     # --- LOAD CACHED DB ---
+     if use_cached_btn:
+         try:
+             client, collection = initialize_chromadb(force_reindex=False)
+             count = collection.count()
+             if count > 0:
+                 st.session_state.client = client
+                 st.session_state.collection = collection
+                 st.session_state.db_initialized = True
+                 st.success(f"DB Loaded: {count} docs")
+                 logger.info(f"Database started successfully: {count} entries")
+                 if not st.session_state.model_loaded:
+                     model, tokenizer = load_full_model(model_path)
+                     if model:
+                         st.session_state.model = model
+                         st.session_state.tokenizer = tokenizer
+                         st.session_state.model_loaded = True
+
+                         # Initialize the cache manager with the dynamic threshold
+                         encoder_fn = lambda text: encode_text(text, model, tokenizer)
+                         st.session_state.cache_manager = CacheManager(
+                             collection,
+                             encoder_fn,
+                             threshold=st.session_state.custom_threshold
+                         )
+
+                         # Initialize the DeepSeek caller
+                         try:
+                             st.session_state.deepseek_caller = DeepSeekCaller()
+                             st.success("DeepSeek API Ready")
+                             logger.info("API ready")
+                         except Exception as e:
+                             st.warning(f"DeepSeek API unavailable: {e}")
+                             logger.error(f"API unavailable: {e}")
+             else:
+                 st.warning("Empty DB. Please Load & Index first.")
+         except Exception as e:
+             st.error(f"Error: {e}")
+             logger.error(f"Database problem: {e}")
+
+     # --- LOAD AND INDEX ---
+     if load_btn:
+         with st.spinner("Loading Model..."):
+             model, tokenizer = load_full_model(model_path)
+             if model:
+                 st.session_state.model = model
+                 st.session_state.tokenizer = tokenizer
+                 st.session_state.model_loaded = True
+             else:
+                 st.stop()
+
+         with st.spinner("Loading Dataset..."):
+             logger.info("Loading dataset")
+             try:
+                 data = load_dataset_from_source(data_source, dataset_path)
+                 st.session_state.dataset = data
+                 st.session_state.dataset_loaded = True
+             except Exception as e:
+                 st.error(f"Dataset Error: {e}")
+                 logger.error("Dataset loading problem")
+                 st.stop()
+
+         if st.session_state.dataset_loaded:
+             with st.spinner(f"Indexing {len(data)} items..."):
+                 client, collection = initialize_chromadb(force_reindex=force_reindex)
+
+                 batch_size = 64
+                 progress_bar = st.progress(0)
+
+                 for i in range(0, len(data), batch_size):
+                     batch = data[i:i+batch_size]
+
+                     feedbacks = [item.get("feedback", item.get("generated_feedback", "")) for item in batch]
+                     codes = [item.get("code") for item in batch]
+
+                     # IMPORTANT: Encode FEEDBACK for bi-encoder retrieval (code→feedback)
+                     embeddings = [encode_text(fb, model, tokenizer) for fb in feedbacks]
+
+                     # Store code as metadata for later comparison
+                     metadatas = [{"code": c if c else ""} for c in codes]
+                     ids = [f"id_{i+j}" for j in range(len(batch))]
+
+                     collection.add(
+                         embeddings=embeddings,
+                         documents=feedbacks,
+                         metadatas=metadatas,
+                         ids=ids
+                     )
+                     progress_bar.progress(min(1.0, (i + batch_size) / len(data)))
+
+                 st.session_state.client = client
+                 st.session_state.collection = collection
+                 st.session_state.db_initialized = True
+
+                 # Initialize the cache manager with the dynamic threshold
+                 encoder_fn = lambda text: encode_text(text, model, tokenizer)
+                 st.session_state.cache_manager = CacheManager(
+                     collection,
+                     encoder_fn,
+                     threshold=st.session_state.custom_threshold
+                 )
+
+                 # Initialize DeepSeek (optional; misses will report it as unavailable)
+                 try:
+                     st.session_state.deepseek_caller = DeepSeekCaller()
+                 except Exception:
+                     pass
+
+             st.success("Indexing Complete!")
+
+ # ==========================================
+ # MAIN INTERFACE - QUERY
+ # ==========================================
+
+ if st.session_state.db_initialized and st.session_state.cache_manager:
+
+     st.header("Submit Your Code")
+
+     # Submission form with context fields
+     with st.form("code_submission"):
+         col1, col2 = st.columns([2, 1])
+
+         with col1:
+             code_input = st.text_area(
+                 "C Code",
+                 height=300,
+                 placeholder="Paste your C code here...",
+                 help="The code you want feedback on"
+             )
+
+         with col2:
+             theme = st.text_input(
+                 "Exercise Theme",
+                 placeholder="e.g., Binary Search",
+                 help="What is this exercise about?"
+             )
+
+             difficulty = st.selectbox(
+                 "Difficulty Level",
+                 ["beginner", "intermediate", "advanced"]
+             )
+
+             error_category = st.text_input(
+                 "Error Category (optional)",
+                 placeholder="e.g., Off-by-one Error",
+                 help="If you know the type of error"
+             )
+
+         instructions = st.text_area(
+             "Exercise Instructions (optional)",
+             placeholder="Describe what the function should do...",
+             help="Helps generate better feedback on cache miss"
+         )
+
+         col1, col2 = st.columns(2)
+         with col1:
+             test_scope = st.text_input(
+                 "Test Cases Scope (optional)",
+                 placeholder="e.g., Test with n=0, n=5, n=10",
+                 help="What tests should pass"
+             )
+
+         with col2:
+             failed_tests = st.text_input(
+                 "Failed Tests (optional)",
+                 placeholder="e.g., Test n=0 returns wrong value",
+                 help="Which tests are failing"
+             )
+
+         submit_btn = st.form_submit_button("Search Feedback", use_container_width=True)
+
+     # QUERY PROCESSING
+     if submit_btn and code_input:
+         start_time = time.time()
+
+         # Full context
+         context = {
+             "code": code_input,
+             "theme": theme or "N/A",
+             "difficulty": difficulty,
+             "error_category": error_category or "Unknown",
+             "instructions": instructions or "No instructions provided",
+             "test_cases_scope": [test_scope] if test_scope else [],
+             "failed_tests": [failed_tests] if failed_tests else []
+         }
+
+         # Query cache
+         with st.spinner("Searching cache..."):
+             cache_result = st.session_state.cache_manager.query_cache(code_input, context)
+
+         response_time = (time.time() - start_time) * 1000  # ms
+
+         # CACHE HIT or PERFECT MATCH
+         if cache_result['status'] in ['hit', 'perfect_match']:
+             is_perfect = cache_result['status'] == 'perfect_match'
+
+             st.markdown('<div class="hit-card">', unsafe_allow_html=True)
+
+             if is_perfect:
+                 st.markdown("### PERFECT CODE MATCH - Exact Feedback Found")
+                 st.success("The submitted code is identical (similarity > 95%) to a code in the database. This feedback is 100% accurate.")
+             else:
+                 st.markdown("### Cache HIT - Feedback from Database")
+
+             col1, col2, col3 = st.columns(3)
+             with col1:
+                 st.metric("Confidence", f"{cache_result['confidence']:.2f}")
+             with col2:
+                 st.metric("Best Match Distance (code→feedback)", f"{cache_result['similarity_scores'][0]:.4f}")
+             with col3:
+                 st.metric("Response Time", f"{response_time:.0f} ms")
+
+             # Show code similarity if available
+             if cache_result.get('code_similarity') is not None:
+                 st.metric("Code Similarity", f"{cache_result['code_similarity']:.4f}",
+                           help="Similarity between your code and reference code (1.0 = identical)")
+
+             if cache_result['needs_warning'] and not is_perfect:
+                 st.warning("**Note:** Confidence is moderate. Review carefully.")
+
+             # Display results
+             import numpy as np  # hoisted out of the loop below
+             for result in cache_result['results']:
+                 # Compute distance between submitted code and reference code
+                 code_ref = result['code']
+                 if code_ref and code_ref != 'N/A':
+                     code_ref_embedding = encode_text(code_ref, st.session_state.model, st.session_state.tokenizer)
+                     code_submitted_embedding = encode_text(code_input, st.session_state.model, st.session_state.tokenizer)
+
+                     # Cosine similarity (embeddings are L2-normalized)
+                     similarity = np.dot(code_ref_embedding, code_submitted_embedding)
+                     code_distance = 1 - similarity
+                 else:
+                     code_distance = None
+
+                 with st.expander(f"Match #{result['rank']} (code→feedback distance: {result['distance']:.4f})"):
+                     # Side-by-side metrics
+                     col1, col2 = st.columns(2)
+                     with col1:
+                         st.metric("Code → Feedback", f"{result['distance']:.4f}", help="Distance between your code and this feedback (bi-encoder objective)")
+                     with col2:
+                         if code_distance is not None:
+                             st.metric("Code → Code Ref", f"{code_distance:.4f}", help="Distance between your code and the reference code for this feedback")
+
+                     st.markdown("**Feedback:**")
+                     st.write(result['feedback'])
+
+                     st.markdown("**Reference Code (this feedback was given for):**")
+                     st.code(result['code'], language='c')
+
+             st.markdown('</div>', unsafe_allow_html=True)
+
+             # Log stats
+             st.session_state.stats_logger.log_query({
+                 "query_id": cache_result['query_id'],
+                 "status": "hit",
+                 "similarity_score": cache_result['similarity_scores'][0],
+                 "confidence": cache_result['confidence'],
+                 "response_time_ms": response_time,
+                 "theme": theme,
+                 "error_category": error_category,
+                 "difficulty": difficulty,
+                 "deepseek_tokens": 0,
+                 "cache_size": st.session_state.collection.count()
+             })
+
+         # CACHE MISS
+         elif cache_result['status'] == 'miss':
+             st.markdown('<div class="miss-card">', unsafe_allow_html=True)
+             st.markdown("### Cache MISS - Generating New Feedback")
+
+             st.info(f"Closest match distance: {cache_result.get('closest_distance', 1.0):.4f} (threshold: {st.session_state.custom_threshold:.2f})")
+
+             # Show the closest codes even on a miss
+             if cache_result['results']:
+                 st.markdown("#### Closest matches found (but below threshold):")
+                 import numpy as np  # hoisted out of the loop below
+                 for result in cache_result['results']:
+                     # Compute distance between submitted code and reference code
+                     code_ref = result['code']
+                     if code_ref and code_ref != 'N/A':
+                         code_ref_embedding = encode_text(code_ref, st.session_state.model, st.session_state.tokenizer)
+                         code_submitted_embedding = encode_text(code_input, st.session_state.model, st.session_state.tokenizer)
+
+                         similarity = np.dot(code_ref_embedding, code_submitted_embedding)
+                         code_distance = 1 - similarity
+                     else:
+                         code_distance = None
+
+                     with st.expander(f"Match #{result['rank']} (code→feedback: {result['distance']:.4f})"):
+                         # Side-by-side metrics
+                         col1, col2 = st.columns(2)
+                         with col1:
+                             st.metric("Code → Feedback", f"{result['distance']:.4f}", help="Bi-encoder distance (training objective)")
+                         with col2:
+                             if code_distance is not None:
+                                 st.metric("Code → Code Ref", f"{code_distance:.4f}", help="Submitted code vs. reference code distance")
+
+                         st.markdown("**Feedback (given for reference code):**")
+                         st.write(result['feedback'])
+
+                         st.markdown("**Reference Code:**")
+                         st.code(result['code'], language='c')
+
+                 st.divider()
+
+             # Call DeepSeek
+             if st.session_state.deepseek_caller:
+                 with st.spinner("Generating feedback with DeepSeek..."):
+                     deepseek_result = st.session_state.deepseek_caller.generate_feedback(context)
+
+                 if deepseek_result.get('feedback'):
+                     feedback = deepseek_result['feedback']
+                     tokens_used = deepseek_result['tokens_total']
+
+                     st.success("Feedback Generated!")
+
+                     col1, col2, col3 = st.columns(3)
+                     with col1:
+                         st.metric("Tokens Used", tokens_used)
+                     with col2:
+                         st.metric("Generation Time", f"{deepseek_result['generation_time_ms']:.0f} ms")
+                     with col3:
+                         st.metric("Total Time", f"{response_time + deepseek_result['generation_time_ms']:.0f} ms")
+
+                     st.markdown("**Generated Feedback:**")
+                     st.write(feedback)
+
+                     # Distillation: add the new feedback to the cache
+                     with st.spinner("Adding to cache (distillation)..."):
+                         # Encode the feedback
+                         feedback_embedding = encode_text(feedback, st.session_state.model, st.session_state.tokenizer)
+
+                         success = st.session_state.cache_manager.add_to_cache(
+                             code=code_input,
+                             feedback=feedback,
+                             metadata=context,
+                             embedding=feedback_embedding
+                         )
+
+                     if success:
+                         st.success("Feedback added to cache for future queries!")
+
+                     # Log the cache miss (dataset format)
+                     miss_data = {
+                         **context,
+                         "tags": [tag.strip() for tag in error_category.split(',') if tag.strip()] if error_category else [],
+                         "feedback": feedback,
+                         "query_id": cache_result['query_id'],
+                         "tokens_used": tokens_used
+                     }
+                     st.session_state.stats_logger.log_cache_miss(miss_data)
+
+                     # Log stats
+                     st.session_state.stats_logger.log_query({
+                         "query_id": cache_result['query_id'],
+                         "status": "miss",
+                         "similarity_score": cache_result.get('closest_distance', 1.0),
+                         "confidence": 1.0,  # LLM-generated = high confidence
+                         "response_time_ms": response_time + deepseek_result['generation_time_ms'],
+                         "theme": theme,
+                         "error_category": error_category,
+                         "difficulty": difficulty,
+                         "deepseek_tokens": tokens_used,
+                         "cache_size": st.session_state.collection.count()
+                     })
+                 else:
+                     st.error(f"Error: {deepseek_result.get('error', 'Unknown error')}")
+             else:
+                 st.error("DeepSeek API not available. Cannot generate feedback.")
+
+             st.markdown('</div>', unsafe_allow_html=True)
+
+ else:
+     st.info("Please configure and load the model + dataset from the sidebar first.")
+
+     st.markdown("""
+     ### How to use:
+     1. **Load Model & Dataset** (or use cached DB)
+     2. **Fill in the form** with your code and its context
+     3. **Submit** to get feedback
+     4. **Check the Stats page** to see cache performance
+
+     ### Cache System:
+     - **Hit**: Similar code found in database (instant response)
+     - **Miss**: No match found, generates new feedback (slower, uses API tokens)
+     - **Distillation**: New feedback is automatically added to the cache
+     """)
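The hit/miss rule described in app.py reduces to two comparisons; a minimal standalone sketch (the 0.95 perfect-match cutoff matches cache_manager.py, while the 0.35 default threshold here is only illustrative, since the real value comes from config and the sidebar slider):

```python
def classify(best_distance, code_similarity, threshold=0.35):
    """Mirror the cache decision: perfect match, hit, or miss.

    best_distance: bi-encoder distance to the closest stored feedback.
    code_similarity: cosine similarity of submitted vs. reference code, or None.
    """
    if code_similarity is not None and code_similarity > 0.95:
        return "perfect_match"   # near-identical code forces a hit
    if best_distance < threshold:
        return "hit"
    return "miss"

statuses = [
    classify(0.10, 0.99),   # near-identical code
    classify(0.20, 0.50),   # close feedback embedding
    classify(0.80, 0.50),   # nothing close enough
]
```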
assets/style.css ADDED
@@ -0,0 +1,66 @@
+ :root {
+     --primary: #3b82f6;
+     --secondary: #8b5cf6;
+     --success: #10b981;
+     --warning: #f59e0b;
+     --danger: #ef4444;
+ }
+
+ .block-container {
+     padding-top: 2rem;
+     padding-bottom: 2rem;
+ }
+
+ h1 {
+     background: linear-gradient(135deg, #3b82f6 0%, #8b5cf6 100%);
+     -webkit-background-clip: text;
+     -webkit-text-fill-color: transparent;
+ }
+
+ .hit-card {
+     background: linear-gradient(135deg, #10b981 0%, #059669 100%);
+     color: white;
+     padding: 1.5rem;
+     border-radius: 1rem;
+     margin: 1rem 0;
+     box-shadow: 0 4px 15px rgba(0,0,0,0.1);
+ }
+
+ .miss-card {
+     background: linear-gradient(135deg, #ef4444 0%, #dc2626 100%);
+     color: white;
+     padding: 1.5rem;
+     border-radius: 1rem;
+     margin: 1rem 0;
+     box-shadow: 0 4px 15px rgba(0,0,0,0.1);
+ }
+
+ .result-header {
+     display: flex;
+     justify-content: space-between;
+     margin-bottom: 1rem;
+     font-weight: 600;
+ }
+
+ .confidence-badge {
+     background: rgba(255, 255, 255, 0.2);
+     padding: 0.2rem 0.8rem;
+     border-radius: 2rem;
+     font-size: 0.85rem;
+ }
+
+ .code-block {
+     background: rgba(0, 0, 0, 0.3);
+     padding: 1rem;
+     border-radius: 0.5rem;
+     font-family: monospace;
+     font-size: 0.85rem;
+     overflow-x: auto;
+     margin-top: 0.5rem;
+ }
+
+ .stButton > button {
+     background: linear-gradient(135deg, #3b82f6 0%, #2563eb 100%);
+     color: white;
+     border: none;
+ }
assets/style_search.css ADDED
@@ -0,0 +1,52 @@
+ /* Modern result card styling */
+ .result-card {
+     background: white;
+     border: 1px solid #e2e8f0;
+     border-left: 5px solid #3b82f6;
+     color: #1e293b;
+     padding: 1.5rem;
+     border-radius: 0.5rem;
+     margin: 1rem 0;
+     box-shadow: 0 4px 6px -1px rgba(0, 0, 0, 0.1), 0 2px 4px -1px rgba(0, 0, 0, 0.06);
+     transition: transform 0.2s;
+ }
+
+ .result-card:hover {
+     transform: translateY(-2px);
+     box-shadow: 0 10px 15px -3px rgba(0, 0, 0, 0.1);
+ }
+
+ /* Debug console styling (terminal-like) */
+ .debug-terminal {
+     background-color: #1e1e1e;
+     color: #4ade80;
+     font-family: 'JetBrains Mono', 'Courier New', monospace;
+     padding: 15px;
+     border-radius: 8px;
+     font-size: 0.85rem;
+     margin-bottom: 10px;
+     border: 1px solid #333;
+     overflow-x: auto;
+     line-height: 1.5;
+ }
+
+ /* Status badges */
+ .badge {
+     padding: 4px 12px;
+     border-radius: 9999px;
+     font-weight: 600;
+     font-size: 0.75rem;
+     text-transform: uppercase;
+     letter-spacing: 0.05em;
+ }
+
+ .badge-success { background-color: #dcfce7; color: #166534; }
+ .badge-warning { background-color: #fef9c3; color: #854d0e; }
+ .badge-error { background-color: #fee2e2; color: #991b1b; }
+
+ /* Gradient title */
+ h1 {
+     background: linear-gradient(90deg, #3b82f6 0%, #8b5cf6 100%);
+     -webkit-background-clip: text;
+     -webkit-text-fill-color: transparent;
+ }
benchmarks/human_evaluation_results.csv ADDED
@@ -0,0 +1,13 @@
+ dataset_id,evaluator,timestamp,tech_validity,diagnostic_score,has_spoiler,is_maieutic,actionability,comment
+ 0,Expert_1,2026-01-08T11:23:21.960462,Oui (Correct),Cause Racine (ex: malloc manquant),False,True,Immédiat,Le feedback est un peu trop long
+ 0,Expert_1,2026-01-08T13:09:23.940171,Oui (Correct),Cause Racine (ex: malloc manquant),False,False,Immédiat,Feedback trop long
+ 1,Expert_1,2026-01-08T13:11:07.431838,Oui (Correct),Cause Racine (ex: malloc manquant),False,True,Immédiat,"Parfait
+ @"
+ 2,Expert_1,2026-01-08T13:11:50.182131,Discutable (Imprécis),Cause Racine (ex: malloc manquant),False,True,Immédiat,Parfait
+ 3,Expert_1,2026-01-08T13:12:44.663000,Discutable (Imprécis),Cause Racine (ex: malloc manquant),False,True,Immédiat,Parfait
+ 4,Expert_1,2026-01-08T13:12:50.609774,Discutable (Imprécis),Cause Racine (ex: malloc manquant),False,True,Immédiat,Parfait
+ 5,Expert_1,2026-01-08T13:12:55.645751,Oui (Correct),Cause Racine (ex: malloc manquant),False,True,Immédiat,Parfait
+ 6,Expert_1,2026-01-08T13:13:00.590676,Oui (Correct),Cause Racine (ex: malloc manquant),False,True,Immédiat,Parfait
+ 7,Expert_1,2026-01-08T13:13:03.098498,Oui (Correct),Cause Racine (ex: malloc manquant),False,True,Immédiat,Parfait
+ 8,Expert_1,2026-01-08T13:13:04.732185,Oui (Correct),Cause Racine (ex: malloc manquant),False,True,Immédiat,Parfait
+ 9,Expert_1,2026-01-08T13:13:21.744115,Oui (Correct),Cause Racine (ex: malloc manquant),False,True,Immédiat,Parfait
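These evaluation rows (including the quoted multi-line comment in one of them) parse cleanly with the stdlib csv module; the two sample rows below reuse the file's header but are only illustrative:

```python
import csv
from io import StringIO

sample = """dataset_id,evaluator,timestamp,tech_validity,diagnostic_score,has_spoiler,is_maieutic,actionability,comment
0,Expert_1,2026-01-08T11:23:21.960462,Oui (Correct),Cause Racine (ex: malloc manquant),False,True,Immédiat,Le feedback est un peu trop long
1,Expert_1,2026-01-08T13:11:07.431838,Oui (Correct),Cause Racine (ex: malloc manquant),False,True,Immédiat,"Parfait
@"
"""

rows = list(csv.DictReader(StringIO(sample)))
# The quoted field with an embedded newline is read as a single comment value
valid = [r for r in rows if r["tech_validity"].startswith("Oui")]
```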
cache_manager.py ADDED
@@ -0,0 +1,195 @@
+ """
+ Cache Manager - Handles hit/miss logic and distillation
+ """
+
+ import numpy as np
+ from typing import Dict, List, Any, Tuple
+ import uuid
+ from datetime import datetime
+ from config import SIMILARITY_THRESHOLD, TOP_K_RESULTS, CONFIDENCE_THRESHOLD_WARNING
+
+ class CacheManager:
+     def __init__(self, chroma_collection, encoder_fn, threshold=None):
+         """
+         Args:
+             chroma_collection: ChromaDB collection
+             encoder_fn: Function that encodes text into an embedding
+             threshold: Custom similarity threshold (if None, uses config default)
+         """
+         self.collection = chroma_collection
+         self.encoder_fn = encoder_fn
+         self.threshold = threshold if threshold is not None else SIMILARITY_THRESHOLD
+
+     def calculate_confidence(self, distances: List[float]) -> float:
+         """
+         Compute a confidence score from the distances.
+         Lower distance = higher confidence.
+
+         Returns:
+             float between 0 and 1
+         """
+         if not distances:
+             return 0.0
+
+         # Mean distance
+         avg_distance = np.mean(distances)
+
+         # Convert distance to confidence (inversion and normalization)
+         # Distance 0.0 = confidence 1.0
+         # Distance 0.5 = confidence 0.5
+         # Distance 1.0 = confidence 0.0
+         confidence = max(0.0, 1.0 - avg_distance)
+
+         return round(confidence, 3)
45
+ def query_cache(self, code: str, context: Dict[str, Any]) -> Dict[str, Any]:
46
+ """
47
+ Interroge le cache avec le code soumis.
48
+
49
+ Returns:
50
+ {
51
+ "status": "hit" | "miss" | "perfect_match",
52
+ "results": [...], # Si hit
53
+ "similarity_scores": [...], # Si hit
54
+ "confidence": float,
55
+ "needs_deepseek": bool,
56
+ "query_embedding": [...],
57
+ "query_id": str,
58
+ "code_similarity": float | None # Similarité code-à-code si calculée
59
+ }
60
+ """
61
+ # Générer l'embedding du code soumis
62
+ query_embedding = self.encoder_fn(code)
63
+
64
+ # Requête ChromaDB
65
+ query_results = self.collection.query(
66
+ query_embeddings=[query_embedding],
67
+ n_results=TOP_K_RESULTS
68
+ )
69
+
70
+ # Extraire résultats
71
+ distances = query_results['distances'][0] if query_results['distances'] else []
72
+ documents = query_results['documents'][0] if query_results['documents'] else []
73
+ metadatas = query_results['metadatas'][0] if query_results['metadatas'] else []
74
+
75
+ # FALLBACK: Vérifier similarité code-à-code pour le meilleur match
76
+ # Si codes identiques (similarité ~1.0), forcer un HIT même si bi-encoder faible
77
+ code_similarity = None
78
+ perfect_code_match = False
79
+
80
+ if metadatas and metadatas[0].get('code'):
81
+ ref_code = metadatas[0].get('code')
82
+ if ref_code and ref_code != 'N/A':
83
+ # Calculer similarité entre code soumis et code de référence
84
+ ref_code_embedding = self.encoder_fn(ref_code)
85
+ code_similarity = float(np.dot(query_embedding, ref_code_embedding))
86
+
87
+ # Si codes quasi-identiques (similarité > 0.95), c'est un match parfait
88
+ if code_similarity > 0.95:
89
+ perfect_code_match = True
90
+
91
+ # Déterminer Hit ou Miss
92
+ # HIT si: distance bi-encoder < threshold OU match parfait des codes
93
+ if distances and (distances[0] < self.threshold or perfect_code_match):
94
+ # CACHE HIT
95
+ confidence = self.calculate_confidence(distances)
96
+
97
+ results = []
98
+ for i, (feedback, metadata, distance) in enumerate(zip(documents, metadatas, distances)):
99
+ results.append({
100
+ "rank": i + 1,
101
+ "feedback": feedback,
102
+ "code": metadata.get('code', 'N/A'),
103
+ "distance": round(distance, 4),
104
+ "metadata": metadata
105
+ })
106
+
107
+ # Déterminer le type de hit
108
+ hit_status = "perfect_match" if perfect_code_match else "hit"
109
+
110
+ return {
111
+ "status": hit_status,
112
+ "results": results,
113
+ "similarity_scores": [round(d, 4) for d in distances],
114
+ "confidence": 1.0 if perfect_code_match else confidence, # Confiance max si match parfait
115
+ "needs_deepseek": False,
116
+ "needs_warning": False if perfect_code_match else (confidence < CONFIDENCE_THRESHOLD_WARNING),
117
+ "query_embedding": query_embedding,
118
+ "query_id": str(uuid.uuid4()),
119
+ "code_similarity": round(code_similarity, 4) if code_similarity is not None else None,
120
+ "perfect_code_match": perfect_code_match
121
+ }
122
+
123
+ else:
124
+ # CACHE MISS
125
+ # Préparer les résultats les plus proches même en cas de miss
126
+ closest_results = []
127
+ for i, (feedback, metadata, distance) in enumerate(zip(documents, metadatas, distances)):
128
+ closest_results.append({
129
+ "rank": i + 1,
130
+ "feedback": feedback,
131
+ "code": metadata.get('code', 'N/A'),
132
+ "distance": round(distance, 4),
133
+ "metadata": metadata
134
+ })
135
+
136
+ return {
137
+ "status": "miss",
138
+ "results": closest_results, # Inclure les résultats les plus proches
139
+ "similarity_scores": [round(d, 4) for d in distances] if distances else [],
140
+ "confidence": 0.0,
141
+ "needs_deepseek": True,
142
+ "needs_warning": False,
143
+ "query_embedding": query_embedding,
144
+ "query_id": str(uuid.uuid4()),
145
+ "closest_distance": round(distances[0], 4) if distances else 1.0
146
+ }
147
+
148
+ def add_to_cache(self, code: str, feedback: str, metadata: Dict[str, Any], embedding: List[float]) -> bool:
149
+ """
150
+ Ajoute une nouvelle entrée au cache (distillation online).
151
+
152
+ Args:
153
+ code: Code source
154
+ feedback: Feedback généré
155
+ metadata: Métadonnées complètes (theme, difficulty, etc.)
156
+ embedding: Embedding du feedback
157
+
158
+ Returns:
159
+ bool: True si succès
160
+ """
161
+ try:
162
+ doc_id = f"miss_{uuid.uuid4()}"
163
+
164
+ # Préparer metadata pour ChromaDB (seulement le code car limitation)
165
+ chroma_metadata = {
166
+ "code": code,
167
+ "timestamp": datetime.now().isoformat(),
168
+ "source": "cache_miss"
169
+ }
170
+
171
+ self.collection.add(
172
+ embeddings=[embedding],
173
+ documents=[feedback],
174
+ metadatas=[chroma_metadata],
175
+ ids=[doc_id]
176
+ )
177
+
178
+ return True
179
+
180
+ except Exception as e:
181
+ print(f"Error adding to cache: {e}")
182
+ return False
183
+
184
+ def get_cache_stats(self) -> Dict[str, Any]:
185
+ """Retourne des stats sur le cache"""
186
+ try:
187
+ total_docs = self.collection.count()
188
+
189
+ return {
190
+ "total_documents": total_docs,
191
+ "similarity_threshold": SIMILARITY_THRESHOLD,
192
+ "top_k": TOP_K_RESULTS
193
+ }
194
+ except Exception as e:
195
+ return {"error": str(e)}
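The hit/miss decision in `cache_manager.py` reduces to two scalar rules: the mean retrieval distance is inverted into a confidence score, and a query is a hit when the best distance beats the threshold or the code-to-code fallback fires. A minimal, self-contained sketch of just that decision logic (the `SIMILARITY_THRESHOLD` value mirrors `config.py`; the encoder, ChromaDB, and everything else from the module are stubbed out here):

```python
from statistics import mean  # the module itself uses np.mean; stdlib mean behaves the same here

SIMILARITY_THRESHOLD = 0.6  # mirrors the value in config.py

def calculate_confidence(distances):
    # Lower mean distance -> higher confidence, clamped to [0, 1]
    if not distances:
        return 0.0
    return round(max(0.0, 1.0 - mean(distances)), 3)

def is_hit(distances, perfect_code_match=False):
    # HIT when the best bi-encoder distance beats the threshold,
    # or the code-to-code fallback found a near-identical submission
    return bool(distances) and (distances[0] < SIMILARITY_THRESHOLD or perfect_code_match)

print(calculate_confidence([0.2, 0.3, 0.4]))    # 0.7
print(is_hit([0.25, 0.4, 0.5]))                 # True
print(is_hit([0.75]))                           # False
print(is_hit([0.75], perfect_code_match=True))  # True (fallback overrides the weak distance)
```

This is why `perfect_match` results report `confidence = 1.0` regardless of the bi-encoder distances: the fallback is treated as stronger evidence than the embedding score.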
config.py ADDED
@@ -0,0 +1,81 @@
+ """
+ Configuration for RAG Feedback System with Cache
+ """
+
+ import os
+
+ # ==========================================
+ # CACHE CONFIGURATION
+ # ==========================================
+ SIMILARITY_THRESHOLD = 0.6  # If distance < 0.6, the query counts as a HIT
+ CONFIDENCE_THRESHOLD_WARNING = 0.9  # If confidence < 0.9, display a warning
+ TOP_K_RESULTS = 3  # Number of similar candidates to return
+
+ # ==========================================
+ # DEEPSEEK API
+ # ==========================================
+ DEEPSEEK_API_KEY = os.environ.get('DEEPSEEK_API_KEY')
+ DEEPSEEK_BASE_URL = "https://api.deepseek.com"
+ DEEPSEEK_MODEL = "deepseek-chat"
+ DEEPSEEK_TEMPERATURE = 0.7
+ DEEPSEEK_MAX_TOKENS = 1500
+
+ # ==========================================
+ # DATA PATHS
+ # ==========================================
+ CACHE_MISS_LOG = "streamlit_rag_viewer/data/cache_miss.jsonl"
+ STATS_LOG = "streamlit_rag_viewer/data/stats.jsonl"
+ CHROMA_DB_PATH = "streamlit_rag_viewer/chroma_db_storage"
+
+ # ==========================================
+ # SYSTEM PROMPT (Instructor)
+ # ==========================================
+ INSTRUCTOR_SYSTEM_PROMPT = """You are an expert C Programming Instructor helping students understand and fix bugs in their code.
+
+ ## YOUR ROLE
+ You provide educational feedback to help students learn from their mistakes, not just fix the code.
+
+ ## FEEDBACK GUIDELINES
+ 1. **Educational Focus**: Explain the underlying concept, not just the solution
+ 2. **No Direct Solutions**: Never say "change line X to Y" - guide understanding instead
+ 3. **Conceptual Depth**: Reference domain definitions (e.g., what makes a number prime)
+ 4. **Diagnostic Approach**: Help students understand WHY the bug exists
+ 5. **Encouraging Tone**: Be supportive and constructive
+
+ ## RESPONSE FORMAT
+ Provide a single, clear paragraph of feedback that:
+ - Identifies the conceptual error
+ - Explains the underlying principle
+ - Guides the student toward understanding the fix
+ - Keeps technical language appropriate for the student's level
+
+ ## EXAMPLE GOOD FEEDBACK
+ "The bug relates to how arrays are passed to functions in C. When you use 'sizeof' on an array parameter, it doesn't give you the original array size—it returns the size of the pointer. Consider how array information is lost during function calls and what additional parameter you might need to track the actual array length."
+
+ ## AVOID
+ - Direct code fixes: "Change `return 1` to `return 0`"
+ - Line-specific instructions: "On line 5, modify..."
+ - Giving away the answer: "The problem is you start count at 1 instead of 0"
+ """
+
+ INSTRUCTOR_USER_PROMPT_TEMPLATE = """**Student Submission:**
+
+ **Exercise**: {theme}
+ **Difficulty**: {difficulty}
+ **Error Type**: {error_category}
+
+ **Instructions**: {instructions}
+
+ **Student's Buggy Code**:
+ ```c
+ {code}
+ ```
+
+ **Test Results**:
+ - Test Scope: {test_cases_scope}
+ - Failed Tests: {failed_tests}
+
+ ---
+
+ **Task**: As an instructor, provide educational feedback to help this student understand and fix the bug. Focus on the underlying concepts, not the direct solution.
+ """
data/.gitkeep ADDED
File without changes
datasets/dataset_c_piscine_semantic.jsonl ADDED
@@ -0,0 +1,10 @@
+ {"theme": "Check for Prime Number", "error_category": "Incorrect Base Case Return Value", "instructions": "Write a C function that checks if a given integer is a prime number. A prime number is a natural number greater than 1 that has no positive divisors other than 1 and itself. The function should return 1 if the number is prime, and 0 otherwise. Ensure to handle edge cases correctly, such as numbers less than 2.", "code": "#include <stdio.h>\n#include <stdbool.h>\n\nint is_prime(int n) {\n if (n <= 1) {\n return 1; // Target Bug: Should return 0 for non-prime numbers (n <= 1)\n }\n \n for (int i = 2; i * i <= n; i++) {\n if (n % i == 0) {\n return 0;\n }\n }\n return 1;\n}\n\nint main() {\n int num;\n printf(\"Enter an integer: \");\n scanf(\"%d\", &num);\n \n if (is_prime(num)) {\n printf(\"%d is a prime number.\\n\", num);\n } else {\n printf(\"%d is not a prime number.\\n\", num);\n }\n \n return 0;\n}", "test_cases_scope": ["Input: 1", "Input: 0", "Input: -5", "Input: 2", "Input: 3", "Input: 4", "Input: 17", "Input: 100"], "failed_tests": ["Input: 1", "Input: 0", "Input: -5"], "feedback": "The bug in this code is related to the base case handling for prime number checking. In the function, when checking if a number is prime, it's crucial to correctly identify numbers that are not prime by definition. Specifically, prime numbers are defined as natural numbers greater than 1. This means any number less than or equal to 1 (including negative numbers, zero, and one) should be classified as non-prime. The current implementation incorrectly returns 1 (true) for these cases, which mislabels them as prime. To fix this, review the mathematical definition of prime numbers and ensure the base case returns the appropriate value (0 for false) when the input does not meet the criteria. This teaches the importance of aligning code logic with domain-specific definitions and carefully handling edge cases in conditional statements."}
+ {"theme": "Array Processing", "error_category": "Logic Error", "instructions": "Write a function 'find_max' that takes an integer array and its size as arguments, and returns the maximum value in the array. The array may contain both positive and negative integers. If the array is empty (size 0), return the minimum possible integer value (INT_MIN).", "code": "#include <stdio.h>\n#include <limits.h>\n\nint find_max(int arr[], int n) {\n if (n <= 0) {\n return INT_MIN;\n }\n int max_val = arr[0];\n for (int i = 1; i < n; i = i) { // Target Bug: Infinite loop due to missing increment (i = i instead of i++)\n if (arr[i] > max_val) {\n max_val = arr[i];\n }\n }\n return max_val;\n}\n\nint main() {\n // Example usage (not part of core function, but included for context)\n int test1[] = {3, 7, 2, 9, 1};\n int size1 = 5;\n printf(\"Max in test1: %d\\n\", find_max(test1, size1));\n \n int test2[] = {-5, -1, -10};\n int size2 = 3;\n printf(\"Max in test2: %d\\n\", find_max(test2, size2));\n \n return 0;\n}", "test_cases_scope": "The function should handle arrays of various sizes, including edge cases like empty arrays, single-element arrays, and arrays with negative values.", "failed_tests": [{"input": "arr = [3, 7, 2, 9, 1], n = 5", "expected_output": "9", "actual_output": "Program hangs (infinite loop), no output produced"}, {"input": "arr = [-5, -1, -10], n = 3", "expected_output": "-1", "actual_output": "Program hangs (infinite loop), no output produced"}, {"input": "arr = [42], n = 1", "expected_output": "42", "actual_output": "Program hangs (infinite loop), no output produced"}], "feedback": "The code correctly initializes the maximum value and has a proper loop condition to iterate through the array indices. However, the loop does not progress because the loop variable is not updated in each iteration. In C, for-loops require a well-defined update expression to ensure termination; without it, the loop condition remains true indefinitely if the initial condition is met. This highlights the importance of ensuring that loop control variables are modified appropriately to avoid infinite execution, which can cause programs to hang and fail to produce results."}
+ {"theme": "N-Queens Problem (Backtracking)", "error_category": "Integer Overflow (Unchecked addition/multiplication)", "instructions": "Implement a function to count the number of solutions to the N-Queens problem using backtracking. The function should take an integer n (board size) and return the total number of valid queen placements. Use a standard backtracking approach with a 1D array to store column positions.", "code": "#include <stdio.h>\n#include <stdlib.h>\n#include <string.h>\n\nint isSafe(int *board, int row, int col) {\n for (int i = 0; i < row; i++) {\n if (board[i] == col || abs(board[i] - col) == abs(i - row)) {\n return 0;\n }\n }\n return 1;\n}\n\nvoid solveNQueensUtil(int n, int row, int *board, int *count) {\n if (row == n) {\n (*count)++;\n return;\n }\n for (int col = 0; col < n; col++) {\n if (isSafe(board, row, col)) {\n board[row] = col;\n solveNQueensUtil(n, row + 1, board, count);\n }\n }\n}\n\nint totalNQueens(int n) {\n if (n <= 0) return 0;\n int *board = (int *)malloc(n * sizeof(int));\n if (!board) return -1;\n memset(board, -1, n * sizeof(int));\n int count = 0;\n solveNQueensUtil(n, 0, board, &count);\n free(board);\n return count;\n}\n\nint main() {\n int n;\n printf(\"Enter board size n: \");\n scanf(\"%d\", &n);\n int result = totalNQueens(n);\n if (result >= 0) {\n printf(\"Number of solutions for %d-Queens: %d\\n\", n, result);\n } else {\n printf(\"Memory allocation failed.\\n\");\n }\n return 0;\n}", "test_cases_scope": ["Test with n=1 (should return 1)", "Test with n=4 (should return 2)", "Test with n=8 (should return 92)", "Test with n=10 (should return 724)", "Test with n=12 (should return 14200)", "Test with n=14 (should return 365596)", "Test with n=0 or negative (should return 0)", "Memory check for n=10 (no leaks)"], "failed_tests": ["Test with n=12 (should return 14200, but may return incorrect value due to overflow)", "Test with n=14 (should return 365596, but may return incorrect value due to overflow)", "Larger n values (e.g., n=15 or higher) may produce wildly incorrect or negative results due to integer overflow in the count variable"], "feedback": "Your backtracking logic is correct, but consider the range of values your count variable can hold. For larger board sizes, the number of solutions grows rapidly—what happens when it exceeds the maximum value of an int? Think about how to handle potential overflow in arithmetic operations, especially when incrementing counters in recursive functions."}
+ {"theme": "Bit Manipulation with Pointer Errors", "error_category": "Invalid Pointer Cast", "instructions": "Write a C function that performs bit operations on an integer: set a specific bit to 1, unset a specific bit to 0, and toggle a specific bit (flip its value). The function should take an integer pointer and an operation code, and modify the integer in place. The bug involves an incorrect pointer cast that leads to undefined behavior.", "code": "#include <stdio.h>\n#include <stdlib.h>\n\ntypedef enum {\n SET_BIT,\n UNSET_BIT,\n TOGGLE_BIT\n} BitOperation;\n\nvoid bit_operation(int *num, BitOperation op, int bit_pos) {\n if (bit_pos < 0 || bit_pos >= (int)(sizeof(int) * 8)) {\n printf(\"Error: Bit position %d out of range.\\n\", bit_pos);\n return;\n }\n \n // Target Bug: Invalid pointer cast from int* to char*\n // This cast is incorrect because it assumes char* points to the same memory layout as int*,\n // leading to potential misalignment and undefined behavior when dereferencing.\n char *byte_ptr = (char *)num; // Bug: Casting int* to char* without proper handling\n \n // Attempt to access the specific byte containing the bit\n int byte_index = bit_pos / 8;\n int bit_in_byte = bit_pos % 8;\n \n // Dereferencing byte_ptr after invalid cast can cause issues\n char target_byte = byte_ptr[byte_index]; // Potential misalignment or segfault\n \n switch (op) {\n case SET_BIT:\n target_byte |= (1 << bit_in_byte);\n break;\n case UNSET_BIT:\n target_byte &= ~(1 << bit_in_byte);\n break;\n case TOGGLE_BIT:\n target_byte ^= (1 << bit_in_byte);\n break;\n default:\n printf(\"Error: Invalid operation.\\n\");\n return;\n }\n \n // Write back the modified byte\n byte_ptr[byte_index] = target_byte;\n}\n\nint main() {\n // Example usage\n int value = 0; // Binary: 0000...0000\n \n printf(\"Initial value: %d\\n\", value);\n \n bit_operation(&value, SET_BIT, 3); // Should set bit 3 to 1\n printf(\"After setting bit 3: %d\\n\", value);\n \n bit_operation(&value, TOGGLE_BIT, 3); // Should toggle bit 3 back to 0\n printf(\"After toggling bit 3: %d\\n\", value);\n \n bit_operation(&value, UNSET_BIT, 5); // Should unset bit 5 (already 0)\n printf(\"After unsetting bit 5: %d\\n\", value);\n \n return 0;\n}", "test_cases_scope": "The tests cover valid bit positions (0-31 for typical 32-bit ints), all operation types, and edge cases like out-of-range positions. The bug causes failures in memory access and alignment.", "failed_tests": [{"input": "bit_operation(&value, SET_BIT, 3) with value = 0", "expected_output": "value becomes 8 (binary 1000)", "actual_output": "Undefined behavior: may output 8, crash with segfault, or produce incorrect value due to misaligned access"}, {"input": "bit_operation(&value, TOGGLE_BIT, 10) with value = 1024", "expected_output": "value becomes 0 (toggling bit 10 in 1024)", "actual_output": "Undefined behavior: potential segfault or memory corruption"}, {"input": "bit_operation(&value, UNSET_BIT, 31) with value = 2147483648 on 32-bit system", "expected_output": "value becomes 0", "actual_output": "Undefined behavior: alignment error or crash"}, {"input": "Memory check with Valgrind or sanitizers", "expected_output": "No memory errors", "actual_output": "Errors reported: invalid read/write due to misaligned pointer access"}], "feedback": "The code attempts to manipulate individual bits by casting an integer pointer to a character pointer to access bytes directly. However, this approach overlooks critical aspects of pointer aliasing and memory alignment in C. When you cast between pointer types of different sizes and alignments (like from int* to char*), the compiler may not guarantee safe dereferencing because char* has different alignment requirements. This can lead to undefined behavior, such as segmentation faults or incorrect memory access, especially on architectures with strict alignment rules. The issue is not just about type conversion but about how pointers interpret memory layouts and the guarantees provided by the C standard regarding pointer arithmetic and access. Consider the implications of pointer casts on memory safety and alignment constraints."}
+ {"theme": "Print Binary Representation of Int", "error_category": "Incorrect Base Case Return Value", "instructions": "Write a recursive function `print_binary` that takes an integer `n` and prints its binary representation to stdout. For example, input 5 should output '101'. The function should handle positive integers only (assume input is non-negative). Use recursion to break down the number by dividing by 2.", "code": "#include <stdio.h>\n\nvoid print_binary(int n) {\n if (n == 0) {\n printf(\"0\");\n return;\n }\n \n if (n == 1) {\n printf(\"1\");\n return;\n }\n \n print_binary(n / 2);\n printf(\"%d\", n % 2);\n}\n\nint main() {\n int num;\n printf(\"Enter a non-negative integer: \");\n scanf(\"%d\", &num);\n \n if (num < 0) {\n printf(\"Error: Input must be non-negative.\\n\");\n return 1;\n }\n \n printf(\"Binary representation: \");\n print_binary(num);\n printf(\"\\n\");\n return 0;\n}", "test_cases_scope": ["Test with input 0: should output '0'", "Test with input 1: should output '1'", "Test with input 5: should output '101'", "Test with input 10: should output '1010'", "Test with input 2: should output '10'", "Test with input 7: should output '111'"], "failed_tests": ["Test with input 0: outputs '0' (correct)", "Test with input 1: outputs '1' (correct)", "Test with input 5: outputs '101' (correct)", "Test with input 10: outputs '1010' (correct)", "Test with input 2: outputs '10' (correct)", "Test with input 7: outputs '111' (correct)"], "feedback": "Your recursive function correctly handles the base cases for n == 0 and n == 1, but consider whether the base case logic aligns with the mathematical definition of binary representation. Think about what happens when n is exactly 1: does printing '1' and returning immediately ensure proper recursion for all inputs? Review how the recursive call and printing order work together, especially for numbers where the most significant bit might be affected by the base case handling."}
+ {"theme": "Implement 'tail' (Last N Lines)", "error_category": "Array decay misunderstanding (sizeof on array parameter)", "instructions": "Write a function 'print_last_n_lines' that takes an array of strings (lines) and an integer n, and prints the last n lines. If n is greater than the number of lines, print all lines. The function should handle the array correctly.", "code": "#include <stdio.h>\n#include <string.h>\n\nvoid print_last_n_lines(char *lines[], int n) {\n int total_lines = sizeof(lines) / sizeof(lines[0]);\n int start_index;\n \n if (n <= 0) {\n return;\n }\n \n if (n > total_lines) {\n start_index = 0;\n } else {\n start_index = total_lines - n;\n }\n \n for (int i = start_index; i < total_lines; i++) {\n printf(\"%s\\n\", lines[i]);\n }\n}\n\nint main() {\n char *test_lines[] = {\"Line 1\", \"Line 2\", \"Line 3\", \"Line 4\", \"Line 5\"};\n printf(\"Last 3 lines:\\n\");\n print_last_n_lines(test_lines, 3);\n return 0;\n}", "test_cases_scope": ["Test with n=3 on a 5-element array (should print lines 3-5)", "Test with n=5 on a 5-element array (should print all lines)", "Test with n=10 on a 5-element array (should print all lines)", "Test with n=0 (should print nothing)", "Test with different array sizes to check boundary conditions"], "failed_tests": ["Test with n=3 on a 5-element array (prints incorrect lines or crashes)", "Test with n=5 on a 5-element array (prints incorrect lines or crashes)", "Test with n=10 on a 5-element array (prints incorrect lines or crashes)", "Test with different array sizes (consistently produces wrong output)"], "feedback": "Think about how arrays are passed to functions in C. When you use 'sizeof' on an array parameter inside a function, does it give you the size of the original array or something else? Consider what happens to array information during function calls and how you might need to pass additional information to handle this correctly."}
+ {"theme": "Stack Implementation (Array-based)", "error_category": "Switch Case Fallthrough (Missing break)", "instructions": "Implement a stack using an array with functions to push, pop, and display elements. The program should present a menu to the user with options: 1) Push, 2) Pop, 3) Display, 4) Exit. Use a switch statement to handle the menu choices. The bug is a missing break statement in the switch case, causing unintended fallthrough behavior.", "code": "#include <stdio.h>\n#include <stdlib.h>\n\n#define MAX 5\n\nint stack[MAX];\nint top = -1;\n\nvoid push(int value) {\n if (top == MAX - 1) {\n printf(\"Stack Overflow!\\n\");\n } else {\n top++;\n stack[top] = value;\n printf(\"Pushed %d onto the stack.\\n\", value);\n }\n}\n\nvoid pop() {\n if (top == -1) {\n printf(\"Stack Underflow!\\n\");\n } else {\n printf(\"Popped %d from the stack.\\n\", stack[top]);\n top--;\n }\n}\n\nvoid display() {\n if (top == -1) {\n printf(\"Stack is empty.\\n\");\n } else {\n printf(\"Stack elements: \");\n for (int i = 0; i <= top; i++) {\n printf(\"%d \", stack[i]);\n }\n printf(\"\\n\");\n }\n}\n\nint main() {\n int choice, value;\n \n while (1) {\n printf(\"\\nMenu:\\n\");\n printf(\"1. Push\\n\");\n printf(\"2. Pop\\n\");\n printf(\"3. Display\\n\");\n printf(\"4. Exit\\n\");\n printf(\"Enter your choice: \");\n scanf(\"%d\", &choice);\n \n switch (choice) {\n case 1:\n printf(\"Enter value to push: \");\n scanf(\"%d\", &value);\n push(value);\n // Missing break statement here\n case 2:\n pop();\n break;\n case 3:\n display();\n break;\n case 4:\n printf(\"Exiting...\\n\");\n exit(0);\n default:\n printf(\"Invalid choice!\\n\");\n }\n }\n \n return 0;\n}", "test_cases_scope": ["Test 1: Push operation (e.g., push 10, then display)", "Test 2: Pop operation (e.g., push 20, then pop)", "Test 3: Multiple operations sequence (e.g., push 30, push 40, pop, display)", "Test 4: Invalid menu choice handling", "Test 5: Stack overflow and underflow conditions"], "failed_tests": ["Test 1: Input: Choose option 1, push 10 -> Expected Output: 'Pushed 10 onto the stack.' then menu again. Actual Output: 'Pushed 10 onto the stack.' followed immediately by 'Stack Underflow!' (since it falls through to case 2).", "Test 3: Input: Choose option 1, push 30; then choose option 1, push 40; then choose option 2 -> Expected Output: 'Pushed 30 onto the stack.', 'Pushed 40 onto the stack.', 'Popped 40 from the stack.'. Actual Output: After pushing 30, it falls through to pop() and may pop incorrectly or cause underflow, disrupting the sequence."], "feedback": "In C, switch cases execute sequentially until a break statement is encountered. Without break, control 'falls through' to the next case, which can lead to unintended multiple operations. Review how switch statements handle flow control to ensure each case is properly terminated."}
+ {"theme": "Singly Linked List: Get Element at Index", "error_category": "Off-by-one Error (0-indexed vs 1-indexed)", "instructions": "Write a function `get_at_index` that returns the data value at a given index in a singly linked list. The list uses 0-based indexing (first node at index 0). Return -1 if the index is out of bounds.", "code": "#include <stdio.h>\n#include <stdlib.h>\n\ntypedef struct Node {\n int data;\n struct Node* next;\n} Node;\n\nNode* create_node(int value) {\n Node* new_node = (Node*)malloc(sizeof(Node));\n if (new_node == NULL) {\n printf(\"Memory allocation failed\\n\");\n return NULL;\n }\n new_node->data = value;\n new_node->next = NULL;\n return new_node;\n}\n\nint get_at_index(Node* head, int index) {\n Node* current = head;\n int count = 1; // Bug: Should start at 0 for 0-based indexing\n \n while (current != NULL) {\n if (count == index) { // Bug: Compares count with index, but count starts at 1\n return current->data;\n }\n current = current->next;\n count++;\n }\n \n return -1; // Index out of bounds\n}\n\nint main() {\n // Example usage\n Node* head = create_node(10);\n head->next = create_node(20);\n head->next->next = create_node(30);\n \n printf(\"Element at index 0: %d\\n\", get_at_index(head, 0));\n printf(\"Element at index 1: %d\\n\", get_at_index(head, 1));\n printf(\"Element at index 2: %d\\n\", get_at_index(head, 2));\n printf(\"Element at index 3: %d\\n\", get_at_index(head, 3));\n \n // Free memory (simplified for example)\n free(head->next->next);\n free(head->next);\n free(head);\n \n return 0;\n}", "test_cases_scope": ["Test 1: Index 0 on list [10,20,30] -> Expected 10", "Test 2: Index 1 on list [10,20,30] -> Expected 20", "Test 3: Index 2 on list [10,20,30] -> Expected 30", "Test 4: Index 3 on list [10,20,30] -> Expected -1 (out of bounds)", "Test 5: Index -1 on list [10,20,30] -> Expected -1 (out of bounds)", "Test 6: Empty list, any index -> Expected -1"], "failed_tests": ["Test 1: Index 0 on list [10,20,30] -> Returns -1 instead of 10", "Test 2: Index 1 on list [10,20,30] -> Returns 10 instead of 20", "Test 3: Index 2 on list [10,20,30] -> Returns 20 instead of 30", "Test 4: Index 3 on list [10,20,30] -> Returns 30 instead of -1"], "feedback": "When implementing functions that use indices, carefully consider whether your indexing starts at 0 or 1. In C, linked lists and arrays typically use 0-based indexing, meaning the first element is at position 0. Ensure your counter variable and comparison logic align with this convention to avoid off-by-one errors."}
+ {"theme": "Sieve of Eratosthenes (Primes up to N)", "error_category": "Semicolon after Loop/If (Empty body)", "instructions": "This code implements the Sieve of Eratosthenes algorithm to find all prime numbers up to a given integer N. It uses a boolean array to mark non-prime numbers, starting from 2. The code has a deliberate error where a semicolon is placed after a loop or if statement, causing an empty body. Your task is to analyze the code, identify the bug, and understand how it leads to incorrect behavior without fixing it directly.", "code": "#include <stdio.h>\n#include <stdlib.h>\n#include <stdbool.h>\n\nvoid sieve_of_eratosthenes(int n) {\n if (n < 2) {\n printf(\"No primes less than 2.\\n\");\n return;\n }\n \n bool *is_prime = (bool *)malloc((n + 1) * sizeof(bool));\n if (is_prime == NULL) {\n printf(\"Memory allocation failed.\\n\");\n return;\n }\n \n for (int i = 0; i <= n; i++) {\n is_prime[i] = true;\n }\n is_prime[0] = false;\n is_prime[1] = false;\n \n for (int p = 2; p * p <= n; p++) {\n if (is_prime[p] == true); {\n for (int i = p * p; i <= n; i += p) {\n is_prime[i] = false;\n }\n }\n }\n \n printf(\"Prime numbers up to %d:\\n\", n);\n for (int i = 2; i <= n; i++) {\n if (is_prime[i]) {\n printf(\"%d \", i);\n }\n }\n printf(\"\\n\");\n \n free(is_prime);\n}\n\nint main() {\n int n;\n printf(\"Enter a positive integer N: \");\n scanf(\"%d\", &n);\n sieve_of_eratosthenes(n);\n return 0;\n}", "test_cases_scope": "The test cases cover various inputs to verify the correctness of the Sieve of Eratosthenes algorithm, including edge cases, small numbers, and typical values. They check for correct prime identification and proper memory management.", "failed_tests": ["Input: 10, Expected Output: Prime numbers up to 10: 2 3 5 7, Actual Output: Prime numbers up to 10: 2 3 4 5 6 7 8 9 10", "Input: 20, Expected Output: Prime numbers up to 20: 2 3 5 7 11 13 17 19, Actual Output: Prime numbers up to 20: 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20", "Input: 5, Expected Output: Prime numbers up to 5: 2 3 5, Actual Output: Prime numbers up to 5: 2 3 4 5"], "feedback": "The bug is a semicolon placed immediately after the 'if' condition in the marking loop, creating an empty body. This causes the inner 'for' loop, intended to mark multiples of prime numbers as non-prime, to always execute regardless of the condition. As a result, no numbers are properly marked as non-prime, leading the algorithm to output all numbers from 2 to N as primes. Focus on how control flow is affected by misplaced semicolons and ensure that loop or if bodies are correctly enclosed without unintended termination."}
+ {"theme": "Flood Fill Algorithm", "error_category": "Shadowing Variable (Local variable hides parameter)", "instructions": "Implement a flood fill function that recursively fills a 2D grid of characters from a starting point (x, y) with a new character 'new_char', replacing the target character 'target_char'. The grid is represented as a 2D array with dimensions 'rows' and 'cols'. Use recursion to traverse up, down, left, and right, ensuring bounds checking to avoid out-of-range access.", "code": "#include <stdio.h>\n#include <stdlib.h>\n\nvoid flood_fill(char **grid, int rows, int cols, int x, int y, char target_char, char new_char) {\n if (x < 0 || x >= rows || y < 0 || y >= cols) {\n return;\n }\n if (grid[x][y] != target_char) {\n return;\n }\n \n grid[x][y] = new_char;\n \n int rows = rows; // Shadowing variable: local 'rows' hides parameter 'rows'\n int cols = cols; // Shadowing variable: local 'cols' hides parameter 'cols'\n \n flood_fill(grid, rows, cols, x + 1, y, target_char, new_char);\n flood_fill(grid, rows, cols, x - 1, y, target_char, new_char);\n flood_fill(grid, rows, cols, x, y + 1, target_char, new_char);\n flood_fill(grid, rows, cols, x, y - 1, target_char, new_char);\n}\n\nint main() {\n // Example usage for testing\n int rows = 5, cols = 5;\n char **grid = (char **)malloc(rows * sizeof(char *));\n for (int i = 0; i < rows; i++) {\n grid[i] = (char *)malloc(cols * sizeof(char));\n for (int j = 0; j < cols; j++) {\n grid[i][j] = '.';\n }\n }\n \n // Set a region to fill\n grid[1][1] = 'X';\n grid[1][2] = 'X';\n grid[2][1] = 'X';\n grid[2][2] = 'X';\n \n printf(\"Before fill:\\n\");\n for (int i = 0; i < rows; i++) {\n for (int j = 0; j < cols; j++) {\n printf(\"%c \", grid[i][j]);\n }\n printf(\"\\n\");\n }\n \n flood_fill(grid, rows, cols, 1, 1, 'X', 'O');\n \n printf(\"\\nAfter fill:\\n\");\n for (int i = 0; i < rows; i++) {\n for (int j = 0; j < cols; j++) {\n printf(\"%c \", grid[i][j]);\n }\n printf(\"\\n\");\n }\n \n for (int i = 0; i < rows; i++) {\n free(grid[i]);\n }\n free(grid);\n return 0;\n}", "test_cases_scope": ["Test 1: Fill a 5x5 grid with a 2x2 'X' region starting at (1,1), expecting all 'X's replaced with 'O's", "Test 2: Fill a 3x3 grid with a single 'X' at (0,0), expecting only that cell changed", "Test 3: Fill a 4x4 grid with no 'X's, expecting no changes", "Test 4: Fill starting out of bounds, expecting no changes to grid"], "failed_tests": ["Test 1: Input grid with 'X's at (1,1), (1,2), (2,1), (2,2); Output grid has only starting cell (1,1) changed to 'O', others remain 'X'", "Test 2: Input grid with 'X' at (0,0); Output grid has cell (0,0) changed to 'O', but recursion stops early, missing adjacent cells if present"], "feedback": "In recursive functions, parameter variables are used to pass data between calls. If you declare a local variable with the same name as a parameter, it creates a shadowing effect where the local variable hides the parameter within its scope. This can lead to incorrect values being passed in recursive calls, especially when those values are meant to be constant (like array dimensions). Review how variable scope works in C and ensure that recursive calls receive the intended parameters without unintended modifications."}
deepseek_caller.py ADDED
@@ -0,0 +1,110 @@
+ """
+ DeepSeek API Wrapper for Feedback Generation
+ """
+
+ from openai import OpenAI
+ from typing import Dict, Any
+ import time
+ from config import (
+     DEEPSEEK_API_KEY,
+     DEEPSEEK_BASE_URL,
+     DEEPSEEK_MODEL,
+     DEEPSEEK_TEMPERATURE,
+     DEEPSEEK_MAX_TOKENS,
+     INSTRUCTOR_SYSTEM_PROMPT,
+     INSTRUCTOR_USER_PROMPT_TEMPLATE
+ )
+
+ class DeepSeekCaller:
+     def __init__(self):
+         if not DEEPSEEK_API_KEY:
+             raise ValueError("DEEPSEEK_API_KEY not found in environment")
+
+         self.client = OpenAI(
+             api_key=DEEPSEEK_API_KEY,
+             base_url=DEEPSEEK_BASE_URL
+         )
+
+     def generate_feedback(self, context: Dict[str, Any]) -> Dict[str, Any]:
+         """
+         Generate pedagogical feedback for a buggy piece of code.
+
+         Args:
+             context: {
+                 "theme": str,
+                 "difficulty": str,
+                 "error_category": str,
+                 "instructions": str,
+                 "code": str,
+                 "test_cases_scope": list,
+                 "failed_tests": list
+             }
+
+         Returns:
+             {
+                 "feedback": str,
+                 "tokens_prompt": int,
+                 "tokens_completion": int,
+                 "tokens_total": int,
+                 "generation_time_ms": float
+             }
+         """
+         start_time = time.time()
+
+         # Format the prompt
+         user_prompt = INSTRUCTOR_USER_PROMPT_TEMPLATE.format(
+             theme=context.get('theme', 'N/A'),
+             difficulty=context.get('difficulty', 'intermediate'),
+             error_category=context.get('error_category', 'Unknown'),
+             instructions=context.get('instructions', 'No instructions provided'),
+             code=context.get('code', ''),
+             test_cases_scope=str(context.get('test_cases_scope', [])),
+             failed_tests=str(context.get('failed_tests', []))
+         )
+
+         try:
+             response = self.client.chat.completions.create(
+                 model=DEEPSEEK_MODEL,
+                 messages=[
+                     {"role": "system", "content": INSTRUCTOR_SYSTEM_PROMPT},
+                     {"role": "user", "content": user_prompt}
+                 ],
+                 temperature=DEEPSEEK_TEMPERATURE,
+                 max_tokens=DEEPSEEK_MAX_TOKENS
+             )
+
+             feedback = response.choices[0].message.content
+             usage = response.usage
+
+             generation_time = (time.time() - start_time) * 1000  # in ms
+
+             return {
+                 "feedback": feedback,
+                 "tokens_prompt": usage.prompt_tokens,
+                 "tokens_completion": usage.completion_tokens,
+                 "tokens_total": usage.total_tokens,
+                 "generation_time_ms": generation_time
+             }
+
+         except Exception as e:
+             return {
+                 "feedback": None,
+                 "error": str(e),
+                 "tokens_prompt": 0,
+                 "tokens_completion": 0,
+                 "tokens_total": 0,
+                 "generation_time_ms": (time.time() - start_time) * 1000
+             }
+
+     def test_connection(self) -> bool:
+         """Check whether the DeepSeek API is reachable"""
+         try:
+             self.client.chat.completions.create(
+                 model=DEEPSEEK_MODEL,
+                 messages=[{"role": "user", "content": "Hi"}],
+                 max_tokens=5
+             )
+             return True
+         except Exception as e:
+             print(f"API Test Failed: {e}")
+             return False
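The prompt assembly inside `generate_feedback` can be exercised in isolation. The sketch below is illustrative: `TEMPLATE` is a stand-in with the same placeholder names (the real `INSTRUCTOR_USER_PROMPT_TEMPLATE` lives in `config.py`), but the defaulting strategy via `context.get(...)` mirrors the code above.

```python
# Minimal sketch of the prompt assembly used by generate_feedback().
# TEMPLATE is a hypothetical stand-in for INSTRUCTOR_USER_PROMPT_TEMPLATE.
TEMPLATE = (
    "Theme: {theme}\n"
    "Difficulty: {difficulty}\n"
    "Error: {error_category}\n"
    "Instructions: {instructions}\n"
    "Code:\n{code}\n"
    "Scope: {test_cases_scope}\n"
    "Failed: {failed_tests}"
)

def build_user_prompt(context: dict) -> str:
    # Same defaulting strategy as DeepSeekCaller.generate_feedback():
    # every missing field falls back to a safe placeholder.
    return TEMPLATE.format(
        theme=context.get('theme', 'N/A'),
        difficulty=context.get('difficulty', 'intermediate'),
        error_category=context.get('error_category', 'Unknown'),
        instructions=context.get('instructions', 'No instructions provided'),
        code=context.get('code', ''),
        test_cases_scope=str(context.get('test_cases_scope', [])),
        failed_tests=str(context.get('failed_tests', [])),
    )

prompt = build_user_prompt({"theme": "Pointers", "code": "int *p;"})
print(prompt.splitlines()[0])  # → Theme: Pointers
```

Because every field has a default, a partially filled context dict still yields a well-formed prompt rather than raising `KeyError`.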
pages/benchmark.py ADDED
@@ -0,0 +1,196 @@
+ import streamlit as st
+ import pandas as pd
+ import json
+ import os
+ from datetime import datetime
+
+ # ==========================================
+ # CONFIGURATION & CONSTANTS
+ # ==========================================
+ st.set_page_config(layout="wide", page_title="FFGen - Dataset Benchmark")
+
+ # File paths
+ DATASET_SOURCE = "datasets/dataset_c_piscine_semantic.jsonl"
+ OUTPUT_CSV = "benchmarks/human_evaluation_results.csv"
+
+ # Ensure directories exist
+ os.makedirs("benchmarks", exist_ok=True)
+ os.makedirs("datasets", exist_ok=True)
+
+ # ==========================================
+ # UTILITY FUNCTIONS
+ # ==========================================
+
+ @st.cache_data
+ def load_dataset_sample(filepath, sample_size=100):
+     """Loads a sample of the dataset for evaluation"""
+     data = []
+     if os.path.exists(filepath):
+         with open(filepath, 'r') as f:
+             for i, line in enumerate(f):
+                 if i >= sample_size: break
+                 if line.strip():
+                     try:
+                         data.append(json.loads(line))
+                     except json.JSONDecodeError: continue
+     else:
+         st.error(f"File not found: {filepath}")
+         return pd.DataFrame()
+     return pd.DataFrame(data)
+
+ def save_evaluation(eval_data):
+     """Saves a single evaluation line"""
+     df = pd.DataFrame([eval_data])
+     if not os.path.exists(OUTPUT_CSV):
+         df.to_csv(OUTPUT_CSV, index=False)
+     else:
+         df.to_csv(OUTPUT_CSV, mode='a', header=False, index=False)
+
+ # ==========================================
+ # STATE MANAGEMENT (SESSION STATE)
+ # ==========================================
+ if 'current_idx' not in st.session_state:
+     st.session_state.current_idx = 0
+
+ # Load data
+ df = load_dataset_sample(DATASET_SOURCE)
+
+ # ==========================================
+ # MAIN INTERFACE
+ # ==========================================
+
+ st.title("FFGen Benchmark: Dataset Quality Audit")
+ st.markdown("""
+ **Evaluation Protocol:**
+ This interface aims to establish an objective quality score for the dataset.
+ Please evaluate each Code/Feedback pair according to the strict criteria below.
+ """)
+
+ if not df.empty:
+     # Progress bar
+     progress = st.session_state.current_idx / len(df)
+     st.progress(progress, text=f"Progress: {st.session_state.current_idx + 1}/{len(df)}")
+
+     # Get current entry
+     if st.session_state.current_idx < len(df):
+         row = df.iloc[st.session_state.current_idx]
+
+         # --- LAYOUT: 2 COLUMNS ---
+         col_data, col_eval = st.columns([1.2, 0.8])
+
+         # --- LEFT COLUMN: DATA TO EVALUATE ---
+         with col_data:
+             st.subheader("🔍 Dataset Entry")
+
+             # Context
+             with st.expander("View exercise context", expanded=False):
+                 st.markdown(f"**Theme:** {row.get('theme', 'N/A')}")
+                 st.markdown(f"**Simulated Error:** {row.get('error_category', 'N/A')}")
+                 st.text(row.get('instructions', ''))
+
+             # Code
+             st.markdown("### Student Code (Input)")
+             st.code(row.get('code', ''), language='c')
+
+             # Feedback
+             st.markdown("### AI Feedback (Output)")
+             st.info(row.get('feedback', ''))
+
+         # --- RIGHT COLUMN: EVALUATION GRID ---
+         with col_eval:
+             st.subheader("Audit Grid")
+
+             with st.form("benchmark_form"):
+
+                 # CRITERION 1: TECHNICAL VALIDITY (Factual)
+                 st.markdown("#### 1. Technical Validity")
+                 technical_validity = st.radio(
+                     "Does the feedback describe a correct technical reality in C?",
+                     options=["Yes (Correct)", "No (Hallucination/False)", "Debatable (Imprecise)"],
+                     index=None,
+                     help="If the AI mentions a non-existent function or incorrect memory behavior, select NO."
+                 )
+
+                 # CRITERION 2: DIAGNOSTIC (Factual)
+                 st.markdown("#### 2. Diagnostic Precision")
+                 diagnostic_quality = st.radio(
+                     "Does the feedback identify the root cause?",
+                     options=[
+                         "Root Cause (e.g., missing malloc)",
+                         "Symptom Only (e.g., infinite loop/memory leak)",
+                         "Off-topic"
+                     ],
+                     index=None
+                 )
+
+                 # CRITERION 3: PEDAGOGY (Epitech Method)
+                 st.markdown("#### 3. Pedagogical Compliance")
+
+                 col_c1, col_c2 = st.columns(2)
+                 with col_c1:
+                     has_spoiler = st.checkbox("Contains solution (Fixed code/Spoiler)")
+                 with col_c2:
+                     is_maieutic = st.checkbox("Asks a question or uses a guiding concept")
+
+                 # CRITERION 4: ACTIONABILITY
+                 st.markdown("#### 4. Actionability")
+                 actionability = st.select_slider(
+                     "Can a beginner student fix their code using this feedback?",
+                     options=["Impossible", "Difficult", "Doable", "Immediate"]
+                 )
+
+                 expert_name = st.text_input("Evaluator Name", value="Expert_1")
+                 comment = st.text_area("Specific comment (optional)")
+
+                 # SUBMISSION BUTTONS
+                 st.divider()
+                 submitted = st.form_submit_button("💾 Save Evaluation", type="primary")
+
+                 if submitted:
+                     if technical_validity is None or diagnostic_quality is None:
+                         st.error("Technical and diagnostic criteria are mandatory.")
+                     else:
+                         # Create data object
+                         eval_entry = {
+                             "dataset_id": row.get('id', st.session_state.current_idx),
+                             "evaluator": expert_name,
+                             "timestamp": datetime.now().isoformat(),
+                             "tech_validity": technical_validity,
+                             "diagnostic_score": diagnostic_quality,
+                             "has_spoiler": has_spoiler,
+                             "is_maieutic": is_maieutic,
+                             "actionability": actionability,
+                             "comment": comment
+                         }
+
+                         save_evaluation(eval_entry)
+                         st.success("Validated!")
+                         st.session_state.current_idx += 1
+                         st.rerun()
+
+     else:
+         st.success("Audit complete for this sample!")
+
+         # Immediate summary dashboard
+         if os.path.exists(OUTPUT_CSV):
+             res_df = pd.read_csv(OUTPUT_CSV)
+             st.divider()
+             st.subheader("Preliminary Results")
+
+             c1, c2, c3 = st.columns(3)
+
+             # Calculate validity rate
+             # Note: We check if string contains "Yes" to match the English option
+             valid_count = res_df[res_df['tech_validity'].str.contains("Yes")].shape[0]
+             valid_rate = (valid_count / len(res_df)) * 100
+             c1.metric("Technical Validity", f"{valid_rate:.1f}%")
+
+             # Calculate spoiler rate
+             spoil_count = res_df[res_df['has_spoiler'] == True].shape[0]
+             spoil_rate = (spoil_count / len(res_df)) * 100
+             c2.metric("Spoiler Rate (Should be low)", f"{spoil_rate:.1f}%", delta_color="inverse")
+
+             c3.metric("Audited Samples", len(res_df))
+
+ else:
+     st.warning(f"No dataset found at location: `{DATASET_SOURCE}`. Please check the path.")
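The append-mode persistence behind `save_evaluation` (write the header only on the first write, then append one row per call) can be sketched without pandas or Streamlit. The file path and field names below are illustrative only:

```python
import csv
import os
import tempfile

def append_row(path: str, row: dict) -> None:
    # Write the header only when the file does not exist yet,
    # then append one row per call -- same idea as save_evaluation().
    file_exists = os.path.exists(path)
    with open(path, 'a', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=list(row.keys()))
        if not file_exists:
            writer.writeheader()
        writer.writerow(row)

# Hypothetical usage with a temporary file:
path = os.path.join(tempfile.mkdtemp(), "evals.csv")
append_row(path, {"dataset_id": 0, "tech_validity": "Yes (Correct)"})
append_row(path, {"dataset_id": 1, "tech_validity": "No (Hallucination/False)"})

with open(path) as f:
    lines = f.read().splitlines()
print(len(lines))  # → 3 (one header + two rows)
```

The existence check matters: appending a header on every call would interleave header rows with data and corrupt later `pd.read_csv` reads.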
pages/search.py ADDED
@@ -0,0 +1,129 @@
+ import streamlit as st
+ import torch
+ import torch.nn.functional as F
+ import numpy as np
+ import time
+ from pathlib import Path
+
+ # ==========================================
+ # PAGE CONFIG
+ # ==========================================
+ st.set_page_config(page_title="Search & Logs Analysis", page_icon="🔍", layout="wide")
+
+ def load_css(file_name):
+     try:
+         css_file = Path(__file__).parent.parent / file_name
+         with open(css_file) as f:
+             st.markdown(f'<style>{f.read()}</style>', unsafe_allow_html=True)
+     except FileNotFoundError: st.error(f"CSS file not found: {file_name}")
+
+ load_css("assets/style_search.css")
+
+ # ==========================================
+ # UTILS
+ # ==========================================
+ def encode_text(text, model, tokenizer):
+     if not text: return None
+     device = next(model.parameters()).device
+     inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512, padding=True)
+     inputs = {k: v.to(device) for k, v in inputs.items()}
+     with torch.no_grad():
+         outputs = model(**inputs)
+         embeddings = outputs.last_hidden_state.mean(dim=1)
+         embeddings = F.normalize(embeddings, p=2, dim=1)
+     return embeddings[0].cpu().numpy()
+
+ def calculate_cosine_distance(vec_a, vec_b):
+     if vec_a is None or vec_b is None: return 1.0
+     return 1.0 - min(max(np.dot(vec_a, vec_b), -1.0), 1.0)
+
+ # ==========================================
+ # UI
+ # ==========================================
+ if 'db_initialized' not in st.session_state or not st.session_state.db_initialized:
+     st.warning("**System Not Initialized**")
+     st.info("Please go to the **Home** page to load the model and dataset first.")
+     st.stop()
+
+ st.title("Semantic Search & Debugger")
+ st.markdown("Debug vector search and verify semantic distance between query and reference code.")
+
+ col_main, col_sidebar = st.columns([3, 1])
+ with col_main:
+     query_code = st.text_area("Input Code Snippet", height=200, placeholder="Paste code here...")
+
+ with col_sidebar:
+     st.subheader("Search Params")
+     k = st.slider("Retrieval Count (k)", 1, 20, 5)
+     threshold_diff = st.slider("Divergence Threshold", 0.0, 1.0, 0.25, step=0.01)
+     search_btn = st.button("🚀 Run Analysis", use_container_width=True)
+
+ st.markdown("### System Logs")
+ log_container = st.container()
+
+ if search_btn and query_code:
+     start_time = time.time()
+     with log_container:
+         st.markdown('<div class="debug-terminal">', unsafe_allow_html=True)
+         st.text("[*] Starting Analysis Pipeline...")
+
+     try:
+         with st.spinner("Encoding query..."):
+             query_emb = encode_text(query_code, st.session_state.model, st.session_state.tokenizer)
+
+         with log_container:
+             st.text(f"[+] Query Encoded: Length {len(query_code)}, Norm {np.linalg.norm(query_emb):.4f}")
+
+         with st.spinner(f"Querying ChromaDB (k={k})..."):
+             results = st.session_state.collection.query(
+                 query_embeddings=[query_emb.tolist()], n_results=k, include=['documents', 'metadatas', 'distances']
+             )
+
+         st.divider()
+         st.subheader("Analysis Results")
+         valid_results = 0
+
+         for i in range(len(results['documents'][0])):
+             doc_id = results['ids'][0][i]
+             feedback = results['documents'][0][i]
+             ref_code = results['metadatas'][0][i].get('code', '')
+
+             ref_code_emb = encode_text(ref_code, st.session_state.model, st.session_state.tokenizer)
+             code_sem_dist = calculate_cosine_distance(query_emb, ref_code_emb)
+
+             status_text = "REJECTED"
+             status_class = "badge-error"
+             if code_sem_dist < 0.05:
+                 status_class = "badge-success"; status_text = "EXACT MATCH"
+             elif code_sem_dist < threshold_diff:
+                 status_class = "badge-warning"; status_text = "SEMANTIC MATCH"
+
+             with log_container:
+                 st.text(f"  [Candidate #{i+1} - {doc_id}]")
+                 st.text(f"    > Code Dist : {code_sem_dist:.4f} | Status: {status_text}")
+
+             if code_sem_dist < threshold_diff:
+                 valid_results += 1
+                 with st.expander(f"Result #{i+1} - {status_text} (Dist: {code_sem_dist:.3f})", expanded=True):
+                     st.markdown(f"""
+                     <div class="result-card">
+                         <div style="display:flex; justify-content:space-between; margin-bottom:10px;">
+                             <span class="badge {status_class}">{status_text}</span>
+                             <small style="color:#64748b">ID: {doc_id}</small>
+                         </div>
+                         <p style="font-size:1.05rem; line-height:1.6;">{feedback}</p>
+                     </div>""", unsafe_allow_html=True)
+                     c1, c2 = st.columns(2)
+                     with c1: st.caption("Your Code"); st.code(query_code, language='c')
+                     with c2: st.caption("DB Code"); st.code(ref_code, language='c')
+
+         total_time = (time.time() - start_time) * 1000
+         with log_container:
+             st.text(f"[*] Finished in {total_time:.2f}ms. Valid: {valid_results}/{k}")
+             st.markdown('</div>', unsafe_allow_html=True)
+
+         if valid_results == 0:
+             st.warning("No results found within the semantic threshold.")
+
+     except Exception as e:
+         st.error(f"Analysis Failed: {str(e)}")
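The distance logic in `calculate_cosine_distance` relies on `encode_text` returning L2-normalized embeddings, so a plain dot product already equals cosine similarity. A dependency-free sketch of the same math (pure Python standing in for the torch/numpy calls):

```python
import math

def l2_normalize(v):
    # Mirrors F.normalize(..., p=2, dim=1): divide by the Euclidean norm.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_distance(a, b):
    # For unit vectors, dot(a, b) is the cosine similarity;
    # clamp to [-1, 1] before converting to a distance, as the page does.
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - min(max(dot, -1.0), 1.0)

a = l2_normalize([1.0, 0.0])
b = l2_normalize([1.0, 1.0])
print(round(cosine_distance(a, a), 6))  # → 0.0 (identical vectors)
print(round(cosine_distance(a, b), 4))  # → 0.2929 (45 degrees apart)
```

The clamp guards against floating-point dot products drifting marginally outside [-1, 1], which would otherwise produce slightly negative distances.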
pages/stats.py ADDED
@@ -0,0 +1,276 @@
+ """
+ Statistics Dashboard
+ Displays metrics for the cache system
+ """
+
+ import streamlit as st
+ import pandas as pd
+ import plotly.express as px
+ import json
+ from datetime import datetime
+ import sys
+ from pathlib import Path
+
+ # Add parent directory to path for imports
+ sys.path.insert(0, str(Path(__file__).parent.parent))
+
+ from stats_logger import StatsLogger
+
+ # ==========================================
+ # PAGE CONFIG
+ # ==========================================
+ st.set_page_config(
+     page_title="Cache Statistics",
+     layout="wide"
+ )
+
+ st.title("Cache Performance Statistics")
+
+ # ==========================================
+ # LOAD DATA
+ # ==========================================
+
+ logger = StatsLogger()
+
+ # Load data
+ stats = logger.read_stats()
+ summary = logger.get_summary_stats()
+ cache_misses = logger.read_cache_misses()
+
+ if not stats:
+     st.warning("No data yet. Submit some queries first!")
+     st.stop()
+
+ # Convert to DataFrame
+ df = pd.DataFrame(stats)
+
+ # Convert timestamp to datetime
+ if 'timestamp' in df.columns:
+     df['timestamp'] = pd.to_datetime(df['timestamp'])
+     df = df.sort_values('timestamp')
+
+ # ==========================================
+ # KPI METRICS
+ # ==========================================
+
+ st.header("Key Performance Indicators")
+
+ col1, col2, col3, col4 = st.columns(4)
+
+ with col1:
+     st.metric(
+         "Total Queries",
+         f"{summary['total_queries']:,}",
+         help="Total number of queries submitted"
+     )
+
+ with col2:
+     st.metric(
+         "Cache Hit Rate",
+         f"{summary['hit_rate']:.1f}%",
+         delta=f"{summary['total_hits']} hits" if summary['total_hits'] > 0 else None,
+         help="Percentage of queries resolved via cache"
+     )
+
+ with col3:
+     st.metric(
+         "Avg Confidence",
+         f"{summary['avg_confidence']:.2f}",
+         help="Average confidence score for cache hits"
+     )
+
+ with col4:
+     st.metric(
+         "DeepSeek Tokens",
+         f"{summary['total_deepseek_tokens']:,}",
+         delta=f"{summary['total_misses']} calls",
+         delta_color="inverse",
+         help="Total tokens consumed via DeepSeek API"
+     )
+
+ st.divider()
+
+ # ==========================================
+ # TIME SERIES
+ # ==========================================
+
+ st.header("Query Timeline")
+
+ col1, col2 = st.columns(2)
+
+ with col1:
+     # Hit/Miss over time
+     fig = px.scatter(
+         df,
+         x='timestamp',
+         y='confidence',
+         color='status',
+         size='response_time_ms',
+         color_discrete_map={'hit': '#10b981', 'miss': '#ef4444'},
+         title="Cache Hit/Miss Over Time",
+         labels={
+             'timestamp': 'Time',
+             'confidence': 'Confidence Score',
+             'status': 'Status',
+             'response_time_ms': 'Response Time (ms)'
+         }
+     )
+     fig.update_layout(height=400)
+     st.plotly_chart(fig, use_container_width=True)
+
+ with col2:
+     # Response time distribution
+     fig = px.box(
+         df,
+         x='status',
+         y='response_time_ms',
+         color='status',
+         color_discrete_map={'hit': '#10b981', 'miss': '#ef4444'},
+         title="Response Time Distribution",
+         labels={'response_time_ms': 'Response Time (ms)', 'status': 'Cache Status'}
+     )
+     fig.update_layout(height=400)
+     st.plotly_chart(fig, use_container_width=True)
+
+ # ==========================================
+ # SIMILARITY SCORES
+ # ==========================================
+
+ st.header("Similarity Analysis")
+
+ col1, col2 = st.columns(2)
+
+ with col1:
+     # Similarity distribution
+     if 'similarity_score' in df.columns:
+         fig = px.histogram(
+             df,
+             x='similarity_score',
+             color='status',
+             nbins=30,
+             title="Similarity Score Distribution",
+             labels={'similarity_score': 'Similarity Score (lower = more similar)'},
+             color_discrete_map={'hit': '#10b981', 'miss': '#ef4444'}
+         )
+         fig.add_vline(x=0.3, line_dash="dash", line_color="orange",
+                       annotation_text="Threshold (0.3)")
+         fig.update_layout(height=400)
+         st.plotly_chart(fig, use_container_width=True)
+
+ with col2:
+     # Confidence vs Similarity
+     hits_df = df[df['status'] == 'hit']
+     if not hits_df.empty and 'similarity_score' in hits_df.columns:
+         fig = px.scatter(
+             hits_df,
+             x='similarity_score',
+             y='confidence',
+             size='response_time_ms',
+             title="Confidence vs Similarity (Hits Only)",
+             labels={
+                 'similarity_score': 'Similarity Score',
+                 'confidence': 'Confidence',
+                 'response_time_ms': 'Response Time (ms)'
+             },
+             color='confidence',
+             color_continuous_scale='viridis'
+         )
+         fig.update_layout(height=400)
+         st.plotly_chart(fig, use_container_width=True)
+
+ # ==========================================
+ # ERROR CATEGORIES
+ # ==========================================
+
+ st.header("Error Categories Analysis")
+
+ col1, col2 = st.columns(2)
+
+ with col1:
+     # Top error categories
+     if 'error_category' in df.columns:
+         error_counts = df['error_category'].value_counts().head(10)
+         fig = px.bar(
+             x=error_counts.values,
+             y=error_counts.index,
+             orientation='h',
+             title="Top 10 Error Categories",
+             labels={'x': 'Count', 'y': 'Error Category'},
+             color=error_counts.values,
+             color_continuous_scale='blues'
+         )
+         fig.update_layout(height=400, showlegend=False)
+         st.plotly_chart(fig, use_container_width=True)
+
+ with col2:
+     # Difficulty distribution
+     if 'difficulty' in df.columns:
+         diff_counts = df['difficulty'].value_counts()
+         fig = px.pie(
+             values=diff_counts.values,
+             names=diff_counts.index,
+             title="Difficulty Distribution",
+             color_discrete_sequence=px.colors.sequential.RdBu
+         )
+         fig.update_layout(height=400)
+         st.plotly_chart(fig, use_container_width=True)
+
+ # ==========================================
+ # CACHE MISSES LOG
+ # ==========================================
+
+ st.header("Recent Cache Misses")
+
+ if cache_misses:
+     st.info(f"{len(cache_misses)} cache misses logged (ready for retraining)")
+
+     # Display the last 5
+     recent_misses = cache_misses[-5:]
+
+     for i, miss in enumerate(reversed(recent_misses), 1):
+         with st.expander(f"Miss #{len(cache_misses) - i + 1} - {miss.get('theme', 'N/A')} ({miss.get('error_category', 'N/A')})"):
+             col1, col2 = st.columns([1, 1])
+
+             with col1:
+                 st.markdown("**Code:**")
+                 st.code(miss.get('code', 'N/A'), language='c')
+
+             with col2:
+                 st.markdown("**Generated Feedback:**")
+                 st.write(miss.get('feedback', 'N/A'))
+
+             st.markdown(f"**Tokens Used:** {miss.get('tokens_used', 0)}")
+             st.markdown(f"**Timestamp:** {miss.get('timestamp', 'N/A')}")
+ else:
+     st.success("No cache misses yet - all queries resolved from cache!")
+
+ # ==========================================
+ # EXPORT DATA
+ # ==========================================
+
+ st.divider()
+
+ st.header("Export Data")
+
+ col1, col2 = st.columns(2)
+
+ with col1:
+     csv = df.to_csv(index=False)
+     st.download_button(
+         label="Download stats.csv",
+         data=csv,
+         file_name=f"cache_stats_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv",
+         mime="text/csv"
+     )
+
+ with col2:
+     if cache_misses:
+         jsonl_content = "\n".join(json.dumps(miss) for miss in cache_misses)
+         st.download_button(
+             label="Download cache_miss.jsonl",
+             data=jsonl_content,
+             file_name=f"cache_miss_{datetime.now().strftime('%Y%m%d_%H%M%S')}.jsonl",
+             mime="application/jsonl"
+         )
reload_config.py ADDED
@@ -0,0 +1,18 @@
+ """
+ Utility to force reload config without restarting Streamlit
+ """
+
+ import importlib
+ import sys
+
+ def reload_config():
+     """Force reload the config module"""
+     if 'config' in sys.modules:
+         import config
+         importlib.reload(config)
+         return True
+     return False
+
+ if __name__ == "__main__":
+     reload_config()
+     print("Config reloaded!")
requirements.txt CHANGED
@@ -1,3 +1,10 @@
- altair
- pandas
- streamlit
+ streamlit==1.29.0
+ torch==2.1.0
+ transformers==4.36.0
+ peft==0.7.1
+ datasets==2.15.0
+ chromadb==0.4.18
+ pandas
+ plotly
+ numpy
+ openai
run.sh ADDED
@@ -0,0 +1,15 @@
+ #!/bin/bash
+ # Quick launcher for RAG Feedback System
+
+ cd "$(dirname "$0")"
+
+ echo "🚀 Starting RAG Feedback System..."
+ echo ""
+ echo "📊 Pages available:"
+ echo "   - Main:  http://localhost:8501"
+ echo "   - Stats: http://localhost:8501/stats"
+ echo ""
+ echo "Press Ctrl+C to stop"
+ echo ""
+
+ streamlit run app.py
stats_logger.py ADDED
@@ -0,0 +1,180 @@
1
+ """
2
+ Stats Logger - Enregistre toutes les requêtes pour analyse
3
+ """
4
+
5
+ import json
6
+ from datetime import datetime
7
+ from pathlib import Path
8
+ from typing import Dict, Any
9
+ from config import STATS_LOG, CACHE_MISS_LOG
10
+
11
+ class StatsLogger:
12
+ def __init__(self):
13
+ # Créer les dossiers si nécessaire
14
+ Path(STATS_LOG).parent.mkdir(parents=True, exist_ok=True)
15
+ Path(CACHE_MISS_LOG).parent.mkdir(parents=True, exist_ok=True)
16
+
17
+ def log_query(self, query_data: Dict[str, Any]) -> None:
18
+ """
19
+ Enregistre une requête dans stats.jsonl
20
+
21
+ Args:
22
+ query_data: {
23
+ "query_id": str,
24
+ "timestamp": str,
25
+ "status": "hit" | "miss",
26
+ "similarity_score": float,
27
+ "confidence": float,
28
+ "response_time_ms": float,
29
+ "theme": str,
30
+ "error_category": str,
31
+ "difficulty": str,
32
+ "deepseek_tokens": int,
33
+ "cache_size": int
34
+ }
35
+ """
36
+ # Ajouter timestamp si pas présent
37
+ if 'timestamp' not in query_data:
38
+ query_data['timestamp'] = datetime.now().isoformat()
39
+
40
+ try:
41
+ with open(STATS_LOG, 'a', encoding='utf-8') as f:
42
+ f.write(json.dumps(query_data, ensure_ascii=False) + '\n')
43
+ except Exception as e:
44
+ print(f"Error logging stats: {e}")
45
+
46
+ def log_cache_miss(self, miss_data: Dict[str, Any]) -> None:
47
+ """
48
+ Enregistre un cache miss avec toutes les données (format dataset).
49
+
50
+ Args:
51
+ miss_data: {
52
+ "theme": str,
53
+ "difficulty": str,
54
+ "tags": list,
55
+ "error_category": str,
56
+ "instructions": str,
57
+ "code": str,
58
+ "test_cases_scope": list,
59
+ "failed_tests": list,
60
+ "feedback": str,
61
+ "query_id": str,
62
+ "timestamp": str,
63
+ "tokens_used": int
64
+ }
65
+ """
66
+ # Ajouter timestamp
67
+ if 'timestamp' not in miss_data:
68
+ miss_data['timestamp'] = datetime.now().isoformat()
69
+
70
+ try:
71
+ with open(CACHE_MISS_LOG, 'a', encoding='utf-8') as f:
72
+ f.write(json.dumps(miss_data, ensure_ascii=False) + '\n')
73
+ except Exception as e:
74
+ print(f"Error logging cache miss: {e}")
75
+
76
+ def read_stats(self, limit: int = None) -> list:
77
+ """
78
+ Lit les stats depuis le fichier.
79
+
80
+ Args:
81
+ limit: Nombre max de lignes à retourner (None = toutes)
82
+
83
+ Returns:
84
+ Liste de dicts
85
+ """
86
+ stats = []
87
+ try:
88
+ with open(STATS_LOG, 'r', encoding='utf-8') as f:
89
+ for line in f:
90
+ if line.strip():
91
+ stats.append(json.loads(line))
92
+
93
+ if limit:
94
+ return stats[-limit:]
95
+ return stats
96
+
97
+ except FileNotFoundError:
98
+            return []
+        except Exception as e:
+            print(f"Error reading stats: {e}")
+            return []
+
+    def read_cache_misses(self, limit: int = None) -> list:
+        """
+        Read cache misses from the log file.
+
+        Args:
+            limit: Maximum number of lines to return (None = all)
+
+        Returns:
+            List of dicts (dataset format)
+        """
+        misses = []
+        try:
+            with open(CACHE_MISS_LOG, 'r', encoding='utf-8') as f:
+                for line in f:
+                    if line.strip():
+                        misses.append(json.loads(line))
+
+            if limit:
+                return misses[-limit:]
+            return misses
+
+        except FileNotFoundError:
+            return []
+        except Exception as e:
+            print(f"Error reading cache misses: {e}")
+            return []
+
+    def get_summary_stats(self) -> Dict[str, Any]:
+        """
+        Return a summary of the logged statistics.
+
+        Returns:
+            {
+                "total_queries": int,
+                "total_hits": int,
+                "total_misses": int,
+                "hit_rate": float,
+                "avg_confidence": float,
+                "total_deepseek_tokens": int,
+                "avg_response_time_ms": float
+            }
+        """
+        stats = self.read_stats()
+
+        if not stats:
+            return {
+                "total_queries": 0,
+                "total_hits": 0,
+                "total_misses": 0,
+                "hit_rate": 0.0,
+                "avg_confidence": 0.0,
+                "total_deepseek_tokens": 0,
+                "avg_response_time_ms": 0.0
+            }
+
+        total_queries = len(stats)
+        total_hits = sum(1 for s in stats if s.get('status') == 'hit')
+        total_misses = total_queries - total_hits
+
+        hit_rate = (total_hits / total_queries) * 100 if total_queries > 0 else 0.0
+
+        confidences = [s.get('confidence', 0) for s in stats if s.get('confidence') is not None]
+        avg_confidence = sum(confidences) / len(confidences) if confidences else 0.0
+
+        total_tokens = sum(s.get('deepseek_tokens', 0) for s in stats)
+
+        response_times = [s.get('response_time_ms', 0) for s in stats if s.get('response_time_ms')]
+        avg_response_time = sum(response_times) / len(response_times) if response_times else 0.0
+
+        return {
+            "total_queries": total_queries,
+            "total_hits": total_hits,
+            "total_misses": total_misses,
+            "hit_rate": round(hit_rate, 2),
+            "avg_confidence": round(avg_confidence, 3),
+            "total_deepseek_tokens": total_tokens,
+            "avg_response_time_ms": round(avg_response_time, 2)
+        }
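The hit-rate and token arithmetic in `get_summary_stats` can be sketched standalone. The entries below are hand-made stand-ins shaped like the records `log_query` appends (all field values invented for illustration):

```python
# Standalone sketch of the summary arithmetic in get_summary_stats(),
# using fabricated entries shaped like the ones log_query() writes.
stats = [
    {"status": "hit", "confidence": 0.95, "deepseek_tokens": 0, "response_time_ms": 120.0},
    {"status": "miss", "confidence": 0.40, "deepseek_tokens": 850, "response_time_ms": 2400.0},
    {"status": "hit", "confidence": 0.90, "deepseek_tokens": 0, "response_time_ms": 95.0},
]

total_queries = len(stats)
total_hits = sum(1 for s in stats if s.get("status") == "hit")
hit_rate = round((total_hits / total_queries) * 100, 2)
total_tokens = sum(s.get("deepseek_tokens", 0) for s in stats)

print(total_queries, total_hits, hit_rate, total_tokens)  # 3 2 66.67 850
```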
test_system.py ADDED
@@ -0,0 +1,162 @@
+#!/usr/bin/env python3
+"""
+Test script for the cache system
+"""
+
+import os
+from deepseek_caller import DeepSeekCaller
+from stats_logger import StatsLogger
+from config import DEEPSEEK_API_KEY
+
+
+def test_deepseek_connection():
+    """Test if DeepSeek API is accessible"""
+    print("=" * 70)
+    print("🧪 TEST 1: DeepSeek API Connection")
+    print("=" * 70)
+
+    if not DEEPSEEK_API_KEY:
+        print("DEEPSEEK_API_KEY not found in environment")
+        print("Set it with: export DEEPSEEK_API_KEY='your-key'")
+        return False
+
+    try:
+        caller = DeepSeekCaller()
+        is_connected = caller.test_connection()
+
+        if is_connected:
+            print("DeepSeek API is accessible")
+            return True
+        else:
+            print("DeepSeek API test failed")
+            return False
+
+    except Exception as e:
+        print(f"Error: {e}")
+        return False
+
+
+def test_feedback_generation():
+    """Test feedback generation"""
+    print("\n" + "=" * 70)
+    print("🧪 TEST 2: Feedback Generation")
+    print("=" * 70)
+
+    caller = DeepSeekCaller()
+
+    test_context = {
+        "theme": "Prime Number Check",
+        "difficulty": "beginner",
+        "error_category": "Incorrect Base Case Return Value",
+        "instructions": "Write a function to check if a number is prime",
+        "code": """
+int is_prime(int n) {
+    if (n <= 1) {
+        return 1;  // Bug: should return 0
+    }
+    for (int i = 2; i * i <= n; i++) {
+        if (n % i == 0) return 0;
+    }
+    return 1;
+}
+""",
+        "test_cases_scope": ["Input: 1", "Input: 2", "Input: 17"],
+        "failed_tests": ["Input: 1 (returns 1, expected 0)"]
+    }
+
+    print("Generating feedback...")
+    result = caller.generate_feedback(test_context)
+
+    if result.get('feedback'):
+        print("Feedback generated successfully!")
+        print("\nMetrics:")
+        print(f"  Tokens (prompt): {result['tokens_prompt']}")
+        print(f"  Tokens (completion): {result['tokens_completion']}")
+        print(f"  Total tokens: {result['tokens_total']}")
+        print(f"  Generation time: {result['generation_time_ms']:.0f} ms")
+        print("\nFeedback:")
+        print(f"  {result['feedback'][:200]}...")
+        return True
+    else:
+        print(f"Error: {result.get('error')}")
+        return False
+
+
+def test_stats_logger():
+    """Test stats logging"""
+    print("\n" + "=" * 70)
+    print("🧪 TEST 3: Stats Logger")
+    print("=" * 70)
+
+    logger = StatsLogger()
+
+    # Test query log
+    test_query = {
+        "query_id": "test-123",
+        "status": "hit",
+        "similarity_score": 0.15,
+        "confidence": 0.95,
+        "response_time_ms": 123.45,
+        "theme": "Test Theme",
+        "error_category": "Test Error",
+        "difficulty": "beginner",
+        "deepseek_tokens": 0,
+        "cache_size": 100
+    }
+
+    try:
+        logger.log_query(test_query)
+        print("Query logged successfully")
+
+        # Read back
+        stats = logger.read_stats(limit=1)
+        if stats:
+            print(f"Read back: {stats[-1]['query_id']}")
+        else:
+            print("No stats found (empty file)")
+
+        return True
+
+    except Exception as e:
+        print(f"Error: {e}")
+        return False
+
+
+def main():
+    print("TESTING CACHE SYSTEM COMPONENTS")
+    print()
+
+    results = []
+
+    # Test 1: API Connection
+    results.append(("DeepSeek API", test_deepseek_connection()))
+
+    # Test 2: Feedback Generation
+    if results[0][1]:  # Only if API works
+        results.append(("Feedback Generation", test_feedback_generation()))
+    else:
+        print("\n⏭ Skipping feedback generation test (API unavailable)")
+
+    # Test 3: Stats Logger
+    results.append(("Stats Logger", test_stats_logger()))
+
+    # Summary
+    print("\n" + "=" * 70)
+    print("TEST SUMMARY")
+    print("=" * 70)
+
+    for test_name, passed in results:
+        status = "PASS" if passed else "FAIL"
+        print(f"{status:12} {test_name}")
+
+    total_tests = len(results)
+    passed_tests = sum(1 for _, passed in results if passed)
+
+    print()
+    print(f"Total: {passed_tests}/{total_tests} tests passed")
+
+    if passed_tests == total_tests:
+        print("All tests passed! System is ready.")
+        return 0
+    else:
+        print("Some tests failed. Check configuration.")
+        return 1
+
+
+if __name__ == "__main__":
+    exit(main())
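The pass/fail bookkeeping and exit-code convention in `main()` can be sketched with dummy results (the test names and outcomes below are made up):

```python
# Sketch of the summary/exit-code logic in main(), with fabricated results.
results = [("DeepSeek API", True), ("Feedback Generation", False), ("Stats Logger", True)]

passed_tests = sum(1 for _, passed in results if passed)
exit_code = 0 if passed_tests == len(results) else 1  # non-zero signals failure to the shell

print(f"Total: {passed_tests}/{len(results)} tests passed")  # Total: 2/3 tests passed
print(exit_code)  # 1
```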
trend_analyzer.py ADDED
@@ -0,0 +1,556 @@
+"""
+Trend Analysis Module for RAG Feedback Viewer
+==============================================
+
+This module analyzes patterns in student code submissions by tracking:
+- Recurring difficulties by author_id and code_id
+- Common error patterns through feedback clustering
+- Topic identification for problematic areas
+"""
+
+import numpy as np
+import pandas as pd
+from collections import Counter, defaultdict
+from typing import Dict, List, Tuple, Optional
+from sklearn.cluster import KMeans, DBSCAN
+from sklearn.manifold import TSNE
+import plotly.express as px
+import plotly.graph_objects as go
+
+
+class TrendAnalyzer:
+    """Analyze trends and patterns in code feedback data"""
+
+    def __init__(self):
+        self.data_cache = None
+        self.cluster_cache = None
+
+    def load_data_from_collection(self, collection, dataset: List[Dict]) -> pd.DataFrame:
+        """
+        Load and structure data from ChromaDB collection and original dataset
+
+        Args:
+            collection: ChromaDB collection
+            dataset: Original dataset with code_id and author_id
+
+        Returns:
+            DataFrame with all necessary fields
+        """
+        # Get all data from collection
+        results = collection.get(include=['embeddings', 'documents', 'metadatas'])
+
+        # Build DataFrame
+        data = []
+        for i, (doc_id, embedding, document, metadata) in enumerate(
+            zip(results['ids'], results['embeddings'], results['documents'], results['metadatas'])
+        ):
+            # Extract index from doc_id (format: "doc_123")
+            idx = int(doc_id.split('_')[1])
+
+            # Get corresponding dataset entry
+            if idx < len(dataset):
+                original_entry = dataset[idx]
+                data.append({
+                    'id': doc_id,
+                    'feedback': document,
+                    'code': metadata.get('code', ''),
+                    'code_id': original_entry.get('code_id', 'unknown'),
+                    'author_id': original_entry.get('author_id', 'unknown'),
+                    'embedding': embedding
+                })
+
+        df = pd.DataFrame(data)
+        self.data_cache = df
+        return df
+
+    def analyze_authors_cluster_diversity(self, df_clustered: pd.DataFrame) -> pd.DataFrame:
+        """
+        Analyze how many different clusters each author appears in.
+        Authors in many clusters = struggling with multiple different concepts.
+
+        Args:
+            df_clustered: DataFrame with cluster labels and author_id
+
+        Returns:
+            DataFrame with author statistics sorted by cluster diversity
+        """
+        author_stats = df_clustered.groupby('author_id').agg({
+            'cluster': lambda x: list(x),
+            'feedback': 'count',
+            'code_id': lambda x: list(x)
+        }).reset_index()
+
+        # Calculate cluster diversity metrics
+        author_stats['unique_clusters'] = author_stats['cluster'].apply(lambda x: len(set(x)))
+        author_stats['total_feedbacks'] = author_stats['feedback']
+        author_stats['cluster_diversity_ratio'] = author_stats['unique_clusters'] / author_stats['total_feedbacks']
+        author_stats['cluster_list'] = author_stats['cluster'].apply(lambda x: sorted(set(x)))
+
+        # Rename columns
+        author_stats = author_stats[['author_id', 'total_feedbacks', 'unique_clusters',
+                                     'cluster_diversity_ratio', 'cluster_list', 'code_id']]
+        author_stats.columns = ['author_id', 'total_feedbacks', 'unique_clusters',
+                                'diversity_ratio', 'clusters', 'code_ids']
+
+        # Sort by unique clusters (descending) then by total feedbacks
+        author_stats = author_stats.sort_values(['unique_clusters', 'total_feedbacks'],
+                                                ascending=[False, False])
+
+        return author_stats
+
+    def find_recurring_authors(self, df: pd.DataFrame, min_submissions: int = 3) -> pd.DataFrame:
+        """
+        Identify authors with multiple submissions
+
+        Args:
+            df: DataFrame with author_id column
+            min_submissions: Minimum number of submissions to be considered recurring
+
+        Returns:
+            DataFrame with author statistics
+        """
+        author_stats = df.groupby('author_id').agg({
+            'code_id': 'count',
+            'feedback': lambda x: list(x)
+        }).reset_index()
+
+        author_stats.columns = ['author_id', 'submission_count', 'feedbacks']
+        author_stats = author_stats[author_stats['submission_count'] >= min_submissions]
+        author_stats = author_stats.sort_values('submission_count', ascending=False)
+
+        return author_stats
+
+    def find_recurring_code_patterns(self, df: pd.DataFrame, min_occurrences: int = 2) -> pd.DataFrame:
+        """
+        Identify code_ids that appear multiple times (same exercise attempted multiple times)
+
+        Args:
+            df: DataFrame with code_id column
+            min_occurrences: Minimum occurrences to be considered recurring
+
+        Returns:
+            DataFrame with code pattern statistics
+        """
+        code_stats = df.groupby('code_id').agg({
+            'author_id': 'count',
+            'feedback': lambda x: list(x),
+            'code': 'first'
+        }).reset_index()
+
+        code_stats.columns = ['code_id', 'occurrence_count', 'feedbacks', 'sample_code']
+        code_stats = code_stats[code_stats['occurrence_count'] >= min_occurrences]
+        code_stats = code_stats.sort_values('occurrence_count', ascending=False)
+
+        return code_stats
+
+    def find_optimal_clusters(self, embeddings: np.ndarray, max_k: int = 20) -> int:
+        """
+        Find optimal number of clusters using silhouette score
+
+        Args:
+            embeddings: Embedding matrix
+            max_k: Maximum number of clusters to try
+
+        Returns:
+            Optimal number of clusters
+        """
+        from sklearn.metrics import silhouette_score
+
+        # Try different k values
+        min_k = max(2, min(5, len(embeddings) // 10))  # At least 2, typically start at 5
+        max_k = min(max_k, len(embeddings) // 5)  # Don't have too many small clusters
+
+        best_score = -1
+        best_k = 10  # Default fallback
+
+        for k in range(min_k, max_k + 1):
+            try:
+                clusterer = KMeans(n_clusters=k, random_state=42, n_init=10)
+                labels = clusterer.fit_predict(embeddings)
+                score = silhouette_score(embeddings, labels, metric='cosine', sample_size=min(1000, len(embeddings)))
+
+                if score > best_score:
+                    best_score = score
+                    best_k = k
+            except Exception:
+                continue
+
+        return best_k
+
+    def cluster_feedbacks(
+        self,
+        df: pd.DataFrame,
+        n_clusters: int = None,
+        method: str = 'kmeans'
+    ) -> Tuple[pd.DataFrame, np.ndarray, int]:
+        """
+        Cluster feedbacks based on embeddings to identify common themes
+
+        Args:
+            df: DataFrame with embedding column
+            n_clusters: Number of clusters (if None, will find optimal)
+            method: 'kmeans' or 'dbscan'
+
+        Returns:
+            Tuple of (DataFrame with cluster labels, cluster info, number of clusters used)
+        """
+        embeddings = np.array(df['embedding'].tolist())
+
+        if method == 'kmeans':
+            # Find optimal if not specified
+            if n_clusters is None:
+                n_clusters = self.find_optimal_clusters(embeddings)
+
+            clusterer = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
+            labels = clusterer.fit_predict(embeddings)
+            cluster_info = clusterer.cluster_centers_
+        else:  # dbscan
+            clusterer = DBSCAN(eps=0.5, min_samples=5, metric='cosine')
+            labels = clusterer.fit_predict(embeddings)
+            cluster_info = labels
+            n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
+
+        df_clustered = df.copy()
+        df_clustered['cluster'] = labels
+
+        self.cluster_cache = {
+            'labels': labels,
+            'method': method,
+            'cluster_info': cluster_info,
+            'n_clusters': n_clusters
+        }
+
+        return df_clustered, cluster_info, n_clusters
+
+    def get_cluster_statistics(self, df_clustered: pd.DataFrame) -> pd.DataFrame:
+        """
+        Get statistics for each cluster
+
+        Args:
+            df_clustered: DataFrame with cluster labels
+
+        Returns:
+            DataFrame with cluster statistics
+        """
+        cluster_stats = df_clustered.groupby('cluster').agg({
+            'code_id': 'count',
+            'author_id': lambda x: len(set(x)),
+            'feedback': lambda x: list(x)[:5],  # Top 5 samples
+            'code': lambda x: list(x)[:3]  # Top 3 code samples
+        }).reset_index()
+
+        cluster_stats.columns = [
+            'cluster_id',
+            'feedback_count',
+            'unique_authors',
+            'sample_feedbacks',
+            'sample_codes'
+        ]
+
+        cluster_stats = cluster_stats.sort_values('feedback_count', ascending=False)
+
+        return cluster_stats
+
+    def find_author_difficulties(
+        self,
+        df_clustered: pd.DataFrame,
+        author_id: str
+    ) -> Dict:
+        """
+        Analyze specific author's difficulty patterns
+
+        Args:
+            df_clustered: DataFrame with cluster labels
+            author_id: Author ID to analyze
+
+        Returns:
+            Dictionary with author difficulty analysis
+        """
+        author_data = df_clustered[df_clustered['author_id'] == author_id]
+
+        if len(author_data) == 0:
+            return {'error': 'Author not found'}
+
+        # Cluster distribution
+        cluster_dist = author_data['cluster'].value_counts().to_dict()
+
+        # Most common issues (based on cluster membership)
+        primary_clusters = author_data['cluster'].value_counts().head(3).index.tolist()
+
+        return {
+            'author_id': author_id,
+            'total_submissions': len(author_data),
+            'cluster_distribution': cluster_dist,
+            'primary_difficulty_clusters': primary_clusters,
+            'feedbacks': author_data['feedback'].tolist(),
+            'codes': author_data['code'].tolist()
+        }
+
+    def identify_common_topics(
+        self,
+        df_clustered: pd.DataFrame,
+        top_n: int = 10,
+        use_llm: bool = True
+    ) -> List[Dict]:
+        """
+        Identify most common topics/difficulties across all students
+
+        Args:
+            df_clustered: DataFrame with cluster labels
+            top_n: Number of top topics to return
+            use_llm: Use LLM-based topic extraction (better quality)
+
+        Returns:
+            List of topic dictionaries
+        """
+        cluster_stats = self.get_cluster_statistics(df_clustered)
+
+        topics = []
+        for _, row in cluster_stats.head(top_n).iterrows():
+            if use_llm:
+                # Use LLM to extract meaningful concepts
+                concepts = self._extract_concepts_with_llm(row['sample_feedbacks'])
+            else:
+                # Fallback to basic keyword extraction
+                concepts = self._extract_keywords_basic(row['sample_feedbacks'])
+
+            topics.append({
+                'cluster_id': int(row['cluster_id']),
+                'student_count': int(row['unique_authors']),
+                'occurrence_count': int(row['feedback_count']),
+                'key_concepts': concepts,
+                'sample_feedbacks': row['sample_feedbacks'][:3],
+                'sample_codes': row['sample_codes'][:2]
+            })
+
+        return topics
+
+    def _extract_concepts_with_llm(self, feedbacks: List[str]) -> List[str]:
+        """
+        Extract programming concepts using zero-shot classification
+
+        Args:
+            feedbacks: List of feedback texts
+
+        Returns:
+            List of key programming concepts
+        """
+        from transformers import pipeline
+
+        # Combine top feedbacks
+        combined_text = ' '.join(feedbacks[:5])[:1000]  # Limit length
+
+        # Candidate programming concepts
+        candidate_labels = [
+            "edge cases",
+            "loop conditions",
+            "variable initialization",
+            "return values",
+            "pointer management",
+            "memory allocation",
+            "array indexing",
+            "conditional logic",
+            "recursion",
+            "null handling",
+            "boundary conditions",
+            "type conversion",
+            "function parameters",
+            "error handling",
+            "algorithm efficiency",
+            "data structure usage",
+            "integer overflow",
+            "off-by-one errors",
+            "scope issues",
+            "logic errors"
+        ]
+
+        try:
+            # Use zero-shot classification
+            classifier = pipeline("zero-shot-classification",
+                                  model="facebook/bart-large-mnli",
+                                  device=-1)  # CPU
+
+            result = classifier(combined_text, candidate_labels, multi_label=True)
+
+            # Get top 5 concepts with score > 0.3
+            concepts = [
+                label for label, score in zip(result['labels'], result['scores'])
+                if score > 0.3
+            ][:5]
+
+            return concepts if concepts else ["general programming"]
+
+        except Exception as e:
+            print(f"LLM extraction failed: {e}, falling back to basic")
+            return self._extract_keywords_basic(feedbacks)
+
+    def _extract_keywords_basic(self, feedbacks: List[str]) -> List[str]:
+        """
+        Basic keyword extraction (fallback)
+
+        Args:
+            feedbacks: List of feedback texts
+
+        Returns:
+            List of keywords
+        """
+        all_feedbacks = ' '.join(feedbacks)
+        words = all_feedbacks.lower().split()
+
+        # Extended stop words - more comprehensive
+        stop_words = {
+            'the', 'a', 'an', 'and', 'or', 'but', 'in', 'on', 'at', 'to', 'for',
+            'of', 'with', 'by', 'from', 'is', 'are', 'was', 'were', 'be', 'been',
+            'have', 'has', 'had', 'do', 'does', 'did', 'will', 'would', 'should',
+            'can', 'could', 'may', 'might', 'your', 'you', 'this', 'that', 'these',
+            'those', 'when', 'where', 'why', 'how', 'which', 'who', 'what',
+            'after', 'before', 'during', 'while', 'about', 'into', 'through',
+            'their', 'there', 'them', 'they', 'then', 'than', 'such', 'some',
+            'it', 'its', 'if', 'else', 'not', 'all', 'any', 'each', 'every'
+        }
+
+        # Programming-specific keywords to keep
+        prog_keywords = {
+            'loop', 'variable', 'function', 'return', 'pointer', 'array',
+            'condition', 'edge', 'case', 'null', 'memory', 'allocation',
+            'index', 'recursion', 'parameter', 'overflow', 'boundary'
+        }
+
+        keywords = []
+        for w in words:
+            w_clean = w.strip('.,;:!?()')
+            if (w_clean in prog_keywords) or (w_clean not in stop_words and len(w_clean) > 4):
+                keywords.append(w_clean)
+
+        keyword_counts = Counter(keywords).most_common(5)
+        return [kw for kw, _ in keyword_counts]
+
+    def visualize_clusters_2d(
+        self,
+        df_clustered: pd.DataFrame,
+        sample_size: int = 1000
+    ) -> go.Figure:
+        """
+        Create 2D visualization of feedback clusters using t-SNE
+
+        Args:
+            df_clustered: DataFrame with embeddings and cluster labels
+            sample_size: Max number of points to visualize (for performance)
+
+        Returns:
+            Plotly figure
+        """
+        # Sample data if too large
+        if len(df_clustered) > sample_size:
+            df_sample = df_clustered.sample(n=sample_size, random_state=42)
+        else:
+            df_sample = df_clustered
+
+        # Extract embeddings
+        embeddings = np.array(df_sample['embedding'].tolist())
+
+        # Reduce to 2D using t-SNE
+        tsne = TSNE(n_components=2, random_state=42, perplexity=30)
+        embeddings_2d = tsne.fit_transform(embeddings)
+
+        # Create visualization DataFrame with full feedback
+        viz_df = pd.DataFrame({
+            'x': embeddings_2d[:, 0],
+            'y': embeddings_2d[:, 1],
+            'cluster': df_sample['cluster'].astype(str),
+            'feedback': df_sample['feedback'].tolist(),  # Full feedback
+            'author_id': df_sample['author_id'].tolist(),  # Full author_id
+            'code_id': df_sample['code_id'].tolist()
+        })
+
+        # Create scatter plot with custom hover template
+        fig = go.Figure()
+
+        # Add trace for each cluster
+        for cluster_id in sorted(viz_df['cluster'].unique()):
+            cluster_data = viz_df[viz_df['cluster'] == cluster_id]
+
+            fig.add_trace(go.Scatter(
+                x=cluster_data['x'],
+                y=cluster_data['y'],
+                mode='markers',
+                name=f'Cluster {cluster_id}',
+                marker=dict(size=8, opacity=0.7),
+                customdata=np.column_stack((
+                    cluster_data['feedback'],
+                    cluster_data['author_id'],
+                    cluster_data['code_id']
+                )),
+                hovertemplate='<b>Cluster %{fullData.name}</b><br><br>' +
+                              '<b>Feedback:</b><br>%{customdata[0]}<br><br>' +
+                              '<b>Author ID:</b> %{customdata[1]}<br>' +
+                              '<b>Code ID:</b> %{customdata[2]}<br>' +
+                              '<extra></extra>'
+            ))
+
+        fig.update_layout(
+            title='Feedback Clusters - 2D Projection (t-SNE)',
+            xaxis_title='',  # Remove dimension labels
+            yaxis_title='',
+            xaxis=dict(showticklabels=False),  # Hide axis ticks
+            yaxis=dict(showticklabels=False),
+            height=600,
+            hovermode='closest',
+            legend_title='Cluster'
+        )
+
+        return fig
+
+    def get_difficulty_heatmap(
+        self,
+        df_clustered: pd.DataFrame,
+        top_n_authors: int = 20,
+        top_n_clusters: int = 10
+    ) -> go.Figure:
+        """
+        Create heatmap showing which authors struggle with which topics
+
+        Args:
+            df_clustered: DataFrame with cluster labels and author_id
+            top_n_authors: Number of top authors to show
+            top_n_clusters: Number of top clusters to show
+
+        Returns:
+            Plotly figure
+        """
+        # Get top authors by submission count
+        top_authors = df_clustered['author_id'].value_counts().head(top_n_authors).index
+
+        # Get top clusters by frequency
+        top_clusters = df_clustered['cluster'].value_counts().head(top_n_clusters).index
+
+        # Filter data
+        df_filtered = df_clustered[
+            (df_clustered['author_id'].isin(top_authors)) &
+            (df_clustered['cluster'].isin(top_clusters))
+        ]
+
+        # Create pivot table
+        heatmap_data = df_filtered.groupby(['author_id', 'cluster']).size().reset_index(name='count')
+        heatmap_pivot = heatmap_data.pivot(index='author_id', columns='cluster', values='count').fillna(0)
+
+        # Create heatmap
+        fig = go.Figure(data=go.Heatmap(
+            z=heatmap_pivot.values,
+            x=[f'Cluster {c}' for c in heatmap_pivot.columns],
+            y=[f'Author {a[:8]}...' for a in heatmap_pivot.index],
+            colorscale='YlOrRd',
+            text=heatmap_pivot.values,
+            texttemplate='%{text}',
+            textfont={"size": 10},
+            colorbar=dict(title="Submission Count")
+        ))
+
+        fig.update_layout(
+            title='Student Difficulty Patterns - Author vs Topic Clusters',
+            xaxis_title='Difficulty Cluster',
+            yaxis_title='Student ID',
+            height=600
+        )
+
+        return fig
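The core of `cluster_feedbacks` (embeddings column → KMeans labels) can be sketched on synthetic data; the 8-dimensional random vectors below stand in for real model embeddings, and `n_clusters=3` is arbitrary:

```python
# Minimal sketch of the KMeans branch of cluster_feedbacks(), on synthetic
# embeddings (real embeddings would come from the 400M code model).
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "feedback": [f"feedback {i}" for i in range(30)],
    "embedding": list(rng.normal(size=(30, 8))),
})

embeddings = np.array(df["embedding"].tolist())
labels = KMeans(n_clusters=3, random_state=42, n_init=10).fit_predict(embeddings)
df["cluster"] = labels

print(len(df), sorted(set(labels)))
```

Grouping `df` by `cluster` afterwards gives exactly the shape `get_cluster_statistics` expects.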
utils.py ADDED
@@ -0,0 +1,4 @@
+import streamlit as st
+
+
+def load_css(file_name):
+    with open(file_name) as f:
+        st.markdown(f'<style>{f.read()}</style>', unsafe_allow_html=True)
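Outside a Streamlit session, the markup `load_css` injects can be checked directly; the CSS rule and temporary file below are illustrative only:

```python
# Check the <style> wrapping load_css() builds, without a Streamlit runtime.
# The gradient rule is just an example stylesheet.
import os
import tempfile

css = ".stApp { background: linear-gradient(90deg, #ff6b6b, #ee5a24); }"
with tempfile.NamedTemporaryFile("w", suffix=".css", delete=False) as f:
    f.write(css)
    path = f.name

with open(path) as f:
    markup = f"<style>{f.read()}</style>"
os.unlink(path)

print(markup == f"<style>{css}</style>")  # True
```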