chunchu-08 committed on
Commit bd2e6df · 1 Parent(s): de3d14a

refactor: Centralize core logic and unify visualizations


This major refactor introduces universal_model_wrapper.py to handle all LLM interactions, real-time detection, and prompt logic. The round-robin evaluation is now fully dynamic and adapts to model selection in the UI. All visualizations have been restyled for a consistent, professional appearance across all prompt types. The documentation has been updated to reflect this new, more robust architecture.

Files changed (6)
  1. README.md +55 -179
  2. app.py +102 -47
  3. information +82 -0
  4. realtime_detector.py +0 -29
  5. response_generator.py +22 -97
  6. universal_model_wrapper.py +163 -0
README.md CHANGED
@@ -20,37 +20,70 @@ This application provides a complete LLM comparison and evaluation system that g
 
 ## Key Features
 
- - **Multi-Model Response Generation**: Generate responses from GPT-4, Claude 3, and Gemini 1.5
- - **Round-Robin Evaluation System**: Each model evaluates all other models for comprehensive comparison
- - **Real-time Query Detection**: Automatically detect and enhance real-time queries with Google search
- - **ATS Scoring**: Resume vs Job Description matching with detailed feedback
- - **Interactive Data Analysis & Visualization**: Generate interactive charts, heatmaps, and performance reports
- - **Batch Processing**: Handle multiple prompts from CSV files
- - **Modular Architecture**: Clean, production-ready code with separated concerns
- - **Gradio Web Interface**: User-friendly web UI for all features
- - **Export Capabilities**: ZIP bundles with all results and interactive visualizations
- - **Automated Deployment**: GitHub Actions for continuous deployment to Hugging Face Spaces
 
 ## Project Architecture
 
 ### Core Application Files
 
- - **`app.py`** - Main Gradio web interface (UI orchestration and deployment)
- - **`response_generator.py`** - Handles all LLM response generation and comparison
- - **`round_robin_evaluator.py`** - Comprehensive model evaluation system
- - **`llm_prompt_eval_analysis.py`** - Data analysis and visualization engine
- - **`llm_response_logger.py`** - Quick testing and logging tool
 
 ### Supporting Modules
 
- - **`realtime_detector.py`** - Detects real-time queries that need current information
- - **`search_fallback.py`** - Integrates Google search for real-time information enhancement
 
- ### Configuration Files
-
- - **`requirements.txt`** - Python dependencies and versions
- - **`.env`** - API keys and configuration (create this file)
- - **`.github/workflows/deploy-to-hf.yml`** - GitHub Actions for automated deployment
 
 ## Installation
 
@@ -82,57 +115,6 @@ This application provides a complete LLM comparison and evaluation system that g
 GOOGLE_CSE_ID=your_google_cse_id_here
 ```
 
- ## Usage
-
- ### Web Interface (Recommended)
-
- Launch the Gradio web interface:
- ```bash
- python app.py
- ```
-
- The interface provides:
- - **Input Section**: Enter prompts, upload files, and configure options
- - **Results Tabs**: View responses, evaluations, search results, and interactive visualizations
- - **Export Options**: Download results as ZIP bundles with interactive HTML charts
- - **Real-time Features**: Automatic query detection and search enhancement
-
- ### Standalone Tools
-
- Each module can be used independently for specific tasks:
-
- #### Response Generator
- ```bash
- python response_generator.py
- ```
- - Interactive mode for single prompts
- - Batch mode for multiple prompts from file
- - Side-by-side response comparison
-
- #### Round-Robin Evaluator
- ```bash
- python round_robin_evaluator.py
- ```
- - Test the evaluation system
- - View evaluation metrics and scores
- - Export results to CSV
-
- #### Analysis Tool
- ```bash
- python llm_prompt_eval_analysis.py
- ```
- - Analyze latest CSV results
- - Generate visualizations and charts
- - Create comprehensive performance reports
-
- #### Response Logger
- ```bash
- python llm_response_logger.py
- ```
- - Quick testing of all models
- - Batch testing from files
- - Rapid evaluation and logging
-
 ## API Requirements
 
 ### Required APIs
@@ -181,113 +163,7 @@ When a resume and job description are provided, the system performs ATS (Applica
 - `heatmap.html`, `radar.html`, `barchart.html` - Interactive visualization files
 - `bundle.zip` - Complete export package
 
- ## Technical Architecture
-
- ### Design Principles
- - **Separation of Concerns**: Each file has a specific responsibility
- - **Clean Code**: Production-ready without decorative elements
- - **Error Handling**: Comprehensive error handling and logging
- - **Reusable Components**: Modules can be used independently
- - **Configurable**: Easy to modify and extend
- - **Hugging Face Compatible**: No external browser dependencies for chart generation
-
- ### Module Responsibilities
-
- | Module | Responsibility |
- |--------|---------------|
- | `app.py` | UI orchestration and deployment |
- | `response_generator.py` | LLM API calls and response collection |
- | `round_robin_evaluator.py` | Model evaluation and scoring |
- | `realtime_detector.py` | Real-time query detection |
- | `search_fallback.py` | Google search integration |
- | `llm_prompt_eval_analysis.py` | Data analysis and visualization |
-
- ## Deployment
-
- ### Automated Deployment with GitHub Actions
-
- The project includes automated deployment to Hugging Face Spaces using GitHub Actions:
-
- #### Setup Requirements
-
- 1. **Hugging Face Access Token**:
-    - Go to [Hugging Face Settings](https://huggingface.co/settings/tokens)
-    - Create a new token with **Write** permissions
-    - Copy the token (starts with `hf_...`)
-
- 2. **GitHub Repository Secrets**:
-    - Go to your GitHub repository Settings
-    - Navigate to Secrets and variables → Actions
-    - Add a new repository secret:
-      - **Name**: `HF_TOKEN`
-      - **Value**: Your Hugging Face token
-
- #### Deployment Workflow
-
- The `.github/workflows/deploy-to-hf.yml` file automatically:
- - Triggers on pushes to the main branch
- - Deploys changes to Hugging Face Spaces
- - Maintains continuous integration
-
- #### Usage
-
- After setup, simply push to GitHub:
- ```bash
- git add .
- git commit -m "Update application"
- git push origin main
- ```
-
- The GitHub Action will automatically deploy to Hugging Face Spaces.
-
- ### Manual Deployment
-
- For local deployment, ensure all dependencies are installed and API keys are configured.
-
- ## Error Handling
-
- The system includes comprehensive error handling:
- - **API Failures**: Graceful handling of API errors with fallback options
- - **Missing Keys**: Clear indication of missing API keys
- - **Network Issues**: Retry logic and connection management
- - **Data Validation**: Input validation and sanitization
- - **File Processing**: Robust handling of various file formats
-
 ## Development and Testing
 
 ### Testing Tools
- - **`test_standalone_tools.py`**: Demonstrates usage of all standalone tools
- - **Batch Testing**: Process multiple prompts efficiently
- - **Performance Monitoring**: Track evaluation metrics over time
-
- ### Development Guidelines
- 1. Follow the modular architecture
- 2. Maintain clean, production-ready code
- 3. Add proper error handling
- 4. Update documentation for new features
- 5. Test all modules independently
-
- ## Contributing
-
- 1. Follow the established modular architecture
- 2. Maintain clean, production-ready code standards
- 3. Add comprehensive error handling
- 4. Update documentation for any new features
- 5. Test all modules independently before submission
-
- ## License
-
- This project is licensed under the MIT License - see the LICENSE file for details.
-
- ## Support
-
- For issues and questions:
- 1. Check the API key configuration in `.env`
- 2. Verify all dependencies are installed correctly
- 3. Review error messages in the console output
- 4. Check the results directory for output files
- 5. Consult the project documentation for detailed module descriptions
-
- ## Live Application
-
- Access the live application at: [https://huggingface.co/spaces/chunchu-08/LLM-Comparison-Hub](https://huggingface.co/spaces/chunchu-08/LLM-Comparison-Hub) "<!-- trigger deploy -->"
 
 ## Key Features
 
+ - **Multi-Model Response Generation**: Dynamically generate responses from any combination of GPT-4, Claude 3, and Gemini 1.5 using a simple model selector.
+ - **Dynamic Round-Robin Evaluation**: A robust evaluation system where selected models evaluate each other. If a model is deselected, the evaluation logic adapts automatically.
+ - **Real-time Query Detection**: Automatically detects if a prompt requires current information and fetches it using a Google search fallback.
+ - **ATS Scoring**: Performs detailed resume vs. job description matching and scoring.
+ - **Interactive Data Analysis & Visualization**: Generates consistent, professionally styled charts (Heatmap, Radar, Bar) for all prompt types.
+ - **Batch Processing**: Handles multiple prompts from CSV files.
+ - **Modular Architecture**: A clean, production-ready codebase with a new `universal_model_wrapper.py` that centralizes core logic.
+ - **Gradio Web Interface**: A user-friendly web UI with a model selector to easily choose which LLMs to run.
+ - **Export Capabilities**: Download a ZIP bundle with all evaluation results and interactive HTML charts.
+ - **Automated Deployment**: GitHub Actions for continuous deployment to Hugging Face Spaces.
 
 ## Project Architecture
 
+ The architecture has been refactored for simplicity and robustness.
+
 ### Core Application Files
 
+ - **`app.py`** - Main Gradio web interface, including UI logic and the model selector.
+ - **`universal_model_wrapper.py`** - **New core module!** Centralizes all LLM API calls, real-time detection, search fallback, and ATS/general prompt logic.
+ - **`response_generator.py`** - A simplified wrapper that interfaces between the app and the `universal_model_wrapper`.
+ - **`round_robin_evaluator.py`** - A dynamic evaluation engine that adapts to the models selected in the UI.
+ - **`llm_prompt_eval_analysis.py`** - Data analysis and visualization engine.
+ - **`llm_response_logger.py`** - Quick testing and logging tool.
 
 ### Supporting Modules
 
+ - **`search_fallback.py`**: Provides the Google snippet lookup. It is now invoked from `universal_model_wrapper.py` (rather than from `app.py` directly), keeping the search flow self-contained in the core wrapper.
+
+ ## Usage
+
+ ### Web Interface (Recommended)
+
+ Launch the Gradio web interface:
+ ```bash
+ python app.py
+ ```
+
+ The interface provides:
+ - **Input Section**: Enter prompts, upload files, and use the **Model Selector** checkboxes to choose which LLMs to run.
+ - **Results Tabs**: View responses, evaluations, search results, and interactive visualizations.
+ - **Export Options**: Download results as ZIP bundles with interactive HTML charts.
+ - **Real-time Features**: Automatic query detection and search enhancement.
+
+ ### Model Selection
+ The UI now includes a set of checkboxes allowing you to select any combination of models (GPT-4, Claude 3, Gemini 1.5) for a given query. The application, including the round-robin evaluation, will dynamically adapt to your selection.
+
+ ## Technical Architecture
+
+ ### Design Principles
+ - **Centralized Logic**: The new `universal_model_wrapper.py` acts as a single source of truth for model interaction.
+ - **Dynamic & Robust**: The evaluation system is no longer static; it adapts to user input, preventing crashes when models are deselected.
+ - **Separation of Concerns**: Each file has a clear, specific responsibility.
+ - **Clean Code**: Production-ready and easy to maintain.
+ - **Hugging Face Compatible**: No external browser dependencies for chart generation.
 
+ ### Module Responsibilities
 
+ | Module | Responsibility |
+ |--------|---------------|
+ | `app.py` | UI orchestration, including the model selector and deployment. |
+ | `universal_model_wrapper.py` | Handles all LLM calls, prompt logic, and search. |
+ | `response_generator.py` | Connects the UI to the universal wrapper. |
+ | `round_robin_evaluator.py` | Dynamically evaluates the currently selected models. |
+ | `llm_prompt_eval_analysis.py` | Data analysis and visualization. |
 
 ## Installation
 
 GOOGLE_CSE_ID=your_google_cse_id_here
 ```
 
 ## API Requirements
 
 ### Required APIs
 
 - `heatmap.html`, `radar.html`, `barchart.html` - Interactive visualization files
 - `bundle.zip` - Complete export package
 
 ## Development and Testing
 
 ### Testing Tools
+ - **`test_standalone_tools.py`**: Demonstrates usage of all standalone tools
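A note on the model selector described in the new README: `gr.CheckboxGroup` returns the list of checked labels, so downstream code can filter by membership. Below is a minimal, self-contained sketch of that wiring; the component names mirror `app.py`, but the `run` handler is a hypothetical stand-in for the real `process_prompt`.

```python
import gradio as gr

MODELS = ["GPT-4", "Claude 3", "Gemini 1.5"]

def run(prompt, model_selection):
    # CheckboxGroup passes the checked labels themselves, e.g. ["GPT-4", "Gemini 1.5"],
    # so filtering by membership keeps the pipeline in sync with the UI selection.
    selected = [m for m in MODELS if m in model_selection]
    return f"Would query: {', '.join(selected) or 'no models selected'}"

with gr.Blocks() as demo:
    prompt = gr.Textbox(label="Prompt")
    model_selector = gr.CheckboxGroup(label="Select Models", choices=MODELS, value=MODELS)
    output = gr.Markdown()
    gr.Button("Run Evaluation").click(run, inputs=[prompt, model_selector], outputs=output)

if __name__ == "__main__":
    demo.launch()
```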
app.py CHANGED
@@ -1,4 +1,4 @@
- # gradio_full_llm_eval.py – Final Updated Version with ATS Scoring and Visualized UI
 import gradio as gr
 import os
 import pandas as pd
@@ -12,8 +12,6 @@ from dotenv import load_dotenv
 
 from response_generator import generate_all_responses_with_reasoning
 from round_robin_evaluator import comprehensive_round_robin_evaluation
- from realtime_detector import is_realtime_prompt
- from search_fallback import get_google_snippets
 
 load_dotenv()
 pio.kaleido.scope.default_format = "png"
@@ -70,31 +68,72 @@ Return JSON:
         return {"ats_score": 50, "strengths": [], "gaps": [], "suggestions": ["Check formatting."]}
 
 def create_visualizations(df, results_dir):
-     image_files = []
     summary = df.groupby('target_model')[metrics].mean().reset_index()
-
-     heatmap = px.imshow(summary[metrics].values, x=metrics, y=summary['target_model'],
-                         labels=dict(x="Metric", y="Model", color="Score"),
-                         title="Heatmap: Metrics Across Models", color_continuous_scale='Viridis')
     heatmap_path = os.path.join(results_dir, "heatmap.html")
     heatmap.write_html(heatmap_path)
-     image_files.append(heatmap_path)
 
     radar = go.Figure()
     for _, row in summary.iterrows():
-         radar.add_trace(go.Scatterpolar(r=list(row[metrics]), theta=metrics, fill='toself', name=row['target_model']))
-     radar.update_layout(title="Radar Chart: Model Score Profiles", polar=dict(radialaxis=dict(visible=True, range=[0, 1])))
     radar_path = os.path.join(results_dir, "radar.html")
     radar.write_html(radar_path)
-     image_files.append(radar_path)
-
-     bar = px.bar(summary.melt(id_vars='target_model'), x='variable', y='value', color='target_model', barmode='group',
-                  title="Bar Chart: Metric Comparison")
     bar_path = os.path.join(results_dir, "barchart.html")
     bar.write_html(bar_path)
-     image_files.append(bar_path)
 
-     return (heatmap, radar, bar), image_files
 
 def format_ats_feedback(score, strengths, gaps, suggestions):
     color = "🟢" if score >= 75 else "🟡" if score >= 50 else "🔴"
@@ -114,8 +153,9 @@ def format_ats_feedback(score, strengths, gaps, suggestions):
 def process_prompt(prompt, enable_realtime, enable_eval, enable_analysis, user_file, model_selection):
     selected_models = [m for m, enabled in zip(["GPT-4", "Claude 3", "Gemini 1.5"], model_selection) if enabled]
     resume_text = ""
-     batch_mode = user_file and user_file.name.endswith(".csv")
-     resume_mode = user_file and user_file.name.lower().endswith(('.pdf', '.docx', '.txt'))
 
     prompts = [prompt]
     ats_summary_texts = []
@@ -131,38 +171,49 @@ def process_prompt(prompt, enable_realtime, enable_eval, enable_analysis, user_f
     zip_path, ats_table_markdown = None, ""
 
     for prompt_text in prompts:
-         search_results = get_google_snippets(prompt_text) if enable_realtime and is_realtime_prompt(prompt_text) else ""
-         final_prompt = f"{prompt_text}\n\nRecent info: {search_results}" if search_results else prompt_text
-         responses = generate_all_responses_with_reasoning(final_prompt, selected_models)
 
         ats_rows = []
         for model in responses:
            model_resp = responses[model]['response']
-             if resume_text:
-                 ats_result = ats_score_advanced(model_resp, resume_text, prompt_text)
-                 feedback = format_ats_feedback(ats_result['ats_score'], ats_result.get('strengths', []), ats_result.get('gaps', []), ats_result.get('suggestions', []))
-                 responses[model]['ats_embed'] = f"### Response\n\n{model_resp}\n\n---\n\n### ATS Evaluation\n\n{feedback}"
-                 ats_rows.append(f"| {model} | {ats_result['ats_score']} | {', '.join(ats_result.get('strengths', []))} | {', '.join(ats_result.get('suggestions', []))} |")
-             else:
-                 responses[model]['ats_embed'] = f"### Response\n\n{model_resp}\n\n---\n\n**Explainability:**\n{responses[model]['reasoning']}"
        if ats_rows:
            ats_table_markdown = "| Model | Score | Strengths | Suggestions |\n|-------|-------|-----------|-------------|\n" + "\n".join(ats_rows)
 
-         if enable_eval:
-             compact = {k: v['response'] for k, v in responses.items()}
-             eval_result = comprehensive_round_robin_evaluation(compact, final_prompt)
-             for model, data in eval_result.items():
-                 for evaluator, scores in data['evaluations'].items():
-                     row = {
-                         'prompt': prompt_text,
-                         'target_model': model,
-                         'evaluator': evaluator,
-                         'response': responses[model]['response'],
-                         'explainability': responses[model]['reasoning']
-                     }
-                     row.update({k: scores.get(k, 0.5) for k in metrics})
-                     row.update({f"avg_{k}": data['average_scores'].get(k, 0.5) for k in metrics})
-                     all_rows.append(row)
 
     df_all = pd.DataFrame(all_rows)
     if not df_all.empty:
@@ -182,14 +233,18 @@ def process_prompt(prompt, enable_realtime, enable_eval, enable_analysis, user_f
         df_batch['ATS Summary'] = ats_summary_texts
         df_batch.to_csv(os.path.join(results_dir, "batch_prompts_output.csv"), index=False)
         zipf.write(os.path.join(results_dir, "batch_prompts_output.csv"), arcname="batch_prompts_output.csv")
 
     return tuple(
         responses[model].get('ats_embed', responses[model]['response']) for model in ["GPT-4", "Claude 3", "Gemini 1.5"]
     ) + (
         search_results or "N/A",
         *all_charts,
-         df_all[['target_model', 'evaluator'] + metrics] if not df_all.empty else pd.DataFrame(),
-         ats_table_markdown,
         zip_path
     )
 
@@ -222,7 +277,7 @@ This app compares LLM responses using round-robin evaluations, with real-time qu
         model_selector = gr.CheckboxGroup(label="Select Models", choices=["GPT-4", "Claude 3", "Gemini 1.5"], value=["GPT-4", "Claude 3", "Gemini 1.5"])
         enable_realtime = gr.Checkbox(label="Enable real-time detection", value=True)
         enable_eval = gr.Checkbox(label="Enable evaluation", value=True)
-         enable_analysis = gr.Checkbox(label="Enable analysis", value=True)
         submit = gr.Button("Run Evaluation")
 
     with gr.Column():
 
+ # app.py – Final Updated Version with Unified Visualization (Model selection-safe + Visualization Fixes)
 import gradio as gr
 import os
 import pandas as pd
 
 from response_generator import generate_all_responses_with_reasoning
 from round_robin_evaluator import comprehensive_round_robin_evaluation
 
 load_dotenv()
 pio.kaleido.scope.default_format = "png"
 
         return {"ats_score": 50, "strengths": [], "gaps": [], "suggestions": ["Check formatting."]}
 
 def create_visualizations(df, results_dir):
+     html_files = []
     summary = df.groupby('target_model')[metrics].mean().reset_index()
+     font_style = dict(family="Arial, sans-serif", size=12, color="black")
+
+     # 1. Heatmap with professional styling
+     heatmap = px.imshow(
+         summary[metrics].values,
+         x=metrics,
+         y=summary['target_model'],
+         labels=dict(x="Metric", y="Model", color="Score"),
+         title="<b>Heatmap: Metrics Across Models</b>",
+         color_continuous_scale='Viridis'
+     )
+     heatmap.update_layout(
+         margin=dict(l=80, r=40, t=80, b=120),
+         xaxis_tickangle=-45,
+         title_font=dict(size=18, family="Arial, sans-serif"),
+         font=font_style
+     )
     heatmap_path = os.path.join(results_dir, "heatmap.html")
     heatmap.write_html(heatmap_path)
+     html_files.append(heatmap_path)
 
+     # 2. Radar Chart with professional styling
     radar = go.Figure()
     for _, row in summary.iterrows():
+         radar.add_trace(go.Scatterpolar(
+             r=list(row[metrics]),
+             theta=metrics,
+             fill='toself',
+             name=row['target_model']
+         ))
+     radar.update_layout(
+         title="<b>Radar Chart: Model Score Profiles</b>",
+         polar=dict(radialaxis=dict(visible=True, range=[0, 1])),
+         legend_title_text='Models',
+         title_font=dict(size=18, family="Arial, sans-serif"),
+         font=font_style,
+         margin=dict(l=60, r=60, t=80, b=80)
+     )
     radar_path = os.path.join(results_dir, "radar.html")
     radar.write_html(radar_path)
+     html_files.append(radar_path)
+
+     # 3. Bar Chart with professional styling
+     bar = px.bar(
+         summary.melt(id_vars='target_model'),
+         x='variable',
+         y='value',
+         color='target_model',
+         barmode='group',
+         title="<b>Bar Chart: Metric Comparison</b>",
+         labels={'variable': 'Metric', 'value': 'Score', 'target_model': 'Model'}
+     )
+     bar.update_layout(
+         margin=dict(l=60, r=20, t=80, b=120),
+         xaxis_tickangle=-45,
+         legend_title_text='Model',
+         title_font=dict(size=18, family="Arial, sans-serif"),
+         font=font_style
+     )
     bar_path = os.path.join(results_dir, "barchart.html")
     bar.write_html(bar_path)
+     html_files.append(bar_path)
 
+     return (heatmap, radar, bar), html_files
 
 def format_ats_feedback(score, strengths, gaps, suggestions):
     color = "🟢" if score >= 75 else "🟡" if score >= 50 else "🔴"
 
 def process_prompt(prompt, enable_realtime, enable_eval, enable_analysis, user_file, model_selection):
     selected_models = [m for m, enabled in zip(["GPT-4", "Claude 3", "Gemini 1.5"], model_selection) if enabled]
     resume_text = ""
+     job_description = prompt
+     batch_mode = user_file and hasattr(user_file, 'name') and user_file.name.endswith(".csv")
+     resume_mode = user_file and hasattr(user_file, 'name') and user_file.name.lower().endswith(('.pdf', '.docx', '.txt'))
 
     prompts = [prompt]
     ats_summary_texts = []
 
     zip_path, ats_table_markdown = None, ""
 
     for prompt_text in prompts:
+         responses = generate_all_responses_with_reasoning(
+             prompt_text,
+             selected_models,
+             resume_text if resume_mode else None,
+             job_description if resume_mode else None
+         )
+
+         if responses:
+             first_response = list(responses.values())[0]
+             search_results = first_response.get('search_results', '')
+             is_ats = first_response.get('is_ats', False)
 
         ats_rows = []
         for model in responses:
            model_resp = responses[model]['response']
+             model_reasoning = responses[model]['reasoning']
+             responses[model]['ats_embed'] = f"### Response\n\n{model_resp}\n\n---\n\n**Explainability:**\n{model_reasoning}"
+
+             if resume_mode and is_ats:
+                 try:
+                     ats_result = ats_score_advanced(model_resp, resume_text, prompt_text)
+                     ats_rows.append(f"| {model} | {ats_result['ats_score']} | {', '.join(ats_result.get('strengths', []))} | {', '.join(ats_result.get('suggestions', []))} |")
+                 except:
+                     ats_rows.append(f"| {model} | N/A | N/A | N/A |")
+
        if ats_rows:
            ats_table_markdown = "| Model | Score | Strengths | Suggestions |\n|-------|-------|-----------|-------------|\n" + "\n".join(ats_rows)
 
+         # Always run evaluation to generate chart data
+         compact = {k: v['response'] for k, v in responses.items()}
+         eval_result = comprehensive_round_robin_evaluation(compact, prompt_text)
+         for model, data in eval_result.items():
+             for evaluator, scores in data['evaluations'].items():
+                 row = {
+                     'prompt': prompt_text,
+                     'target_model': model,
+                     'evaluator': evaluator,
+                     'response': responses[model]['response'],
+                     'explainability': responses[model]['reasoning']
+                 }
+                 row.update({k: scores.get(k, 0.5) for k in metrics})
+                 row.update({f"avg_{k}": data['average_scores'].get(k, 0.5) for k in metrics})
+                 all_rows.append(row)
 
     df_all = pd.DataFrame(all_rows)
     if not df_all.empty:
 
         df_batch['ATS Summary'] = ats_summary_texts
         df_batch.to_csv(os.path.join(results_dir, "batch_prompts_output.csv"), index=False)
         zipf.write(os.path.join(results_dir, "batch_prompts_output.csv"), arcname="batch_prompts_output.csv")
+
+     # Conditional UI updates
+     eval_table = df_all[['target_model', 'evaluator'] + metrics] if not df_all.empty and enable_eval else pd.DataFrame()
+     ats_md = ats_table_markdown if resume_mode else ""
 
     return tuple(
         responses[model].get('ats_embed', responses[model]['response']) for model in ["GPT-4", "Claude 3", "Gemini 1.5"]
     ) + (
         search_results or "N/A",
         *all_charts,
+         eval_table,
+         ats_md,
         zip_path
     )
 
         model_selector = gr.CheckboxGroup(label="Select Models", choices=["GPT-4", "Claude 3", "Gemini 1.5"], value=["GPT-4", "Claude 3", "Gemini 1.5"])
         enable_realtime = gr.Checkbox(label="Enable real-time detection", value=True)
         enable_eval = gr.Checkbox(label="Enable evaluation", value=True)
+         enable_analysis = gr.Checkbox(label="Enable analysis (currently not used)", value=True)
         submit = gr.Button("Run Evaluation")
 
     with gr.Column():
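An editorial aside on the restyled `create_visualizations` above: all three charts intentionally share the same font, title size, and tick-angle settings. If that invariant ever needs enforcing in one place, the shared layout could be factored into a helper. This is a sketch only, not part of the commit, and `apply_professional_style` is a hypothetical name:

```python
import plotly.graph_objects as go

# Shared styling used verbatim by the heatmap, radar, and bar charts above.
FONT = dict(family="Arial, sans-serif", size=12, color="black")
TITLE_FONT = dict(size=18, family="Arial, sans-serif")

def apply_professional_style(fig: go.Figure, **overrides) -> go.Figure:
    """Apply the common layout; each chart passes its own margins/tick angles."""
    fig.update_layout(title_font=TITLE_FONT, font=FONT, **overrides)
    return fig

# e.g. apply_professional_style(bar, margin=dict(l=60, r=20, t=80, b=120), xaxis_tickangle=-45)
```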
information ADDED
@@ -0,0 +1,82 @@
+ LLM-Compare-Hub Project File Structure and Use Cases
+ ====================================================
+
+ Core Application Files
+ ----------------------
+
+ app.py
+ - Use Case: Main Gradio web interface and application entry point.
+ - Function: Orchestrates the UI, handles user input (including model selection), and manages the data flow between all other modules.
+
+ universal_model_wrapper.py
+ - Use Case: The new, central engine for all LLM interactions.
+ - Function: Contains the complete logic for handling API calls to GPT-4, Claude 3, and Gemini 1.5. It also includes self-contained functions for real-time prompt detection and Google search fallback. It intelligently determines whether a prompt is a general query or an ATS evaluation.
+ - Status: This is the core logic hub of the application.
+
+ response_generator.py
+ - Use Case: A lean interface between the UI (app.py) and the model logic.
+ - Function: Takes requests from the UI, passes them to universal_model_wrapper.py, and returns the formatted responses back to the UI.
+ - Status: Simplified to be a clean pass-through, improving modularity.
+
+ round_robin_evaluator.py
+ - Use Case: Dynamic, comprehensive model evaluation system.
+ - Function: Each *selected* model evaluates its peers. The evaluation logic is fully dynamic and adapts to whichever models are selected in the UI, preventing crashes.
+ - Status: The core evaluation engine.
+
+ llm_prompt_eval_analysis.py
+ - Use Case: Data analysis and visualization.
+ - Function: Analyzes evaluation results and generates consistently styled, professional charts and reports.
+ - Status: Standalone analysis tool, also used by the Gradio app.
+
+ llm_response_logger.py
+ - Use Case: Quick testing and logging tool for developers.
+ - Function: Allows for rapid testing of models with single or batch prompts.
+ - Status: Standalone testing tool.
+
+ Supporting Modules
+ ------------------
+
+ search_fallback.py
+ - Use Case: Provides Google search functionality.
+ - Status: The real-time gating has moved into universal_model_wrapper.py, which still imports get_google_snippets from this file; it is no longer called from app.py directly.
+
+ Configuration & Documentation
+ -----------------------------
+
+ requirements.txt
+ - Use Case: Python dependencies.
+ - Function: Lists all required packages for the project to run.
+
+ .env
+ - Use Case: API key configuration.
+ - Function: Securely stores all necessary API keys.
+
+ .gitignore
+ - Use Case: Git version control.
+ - Function: Excludes sensitive files and unnecessary directories from the repository.
+
+ README.md
+ - Use Case: Main project documentation.
+ - Function: Provides setup instructions, usage guides, and an overview of the architecture.
+
+ Testing & Development
+ ---------------------
+
+ test_standalone_tools.py
+ - Use Case: Testing and demonstration.
+ - Function: Shows how to use the standalone modules.
+ - Status: Development/testing tool.
+
+ Project Summary
+ ===============
+
+ The project has been refactored into a more robust, modular architecture with universal_model_wrapper.py at its core. This new structure centralizes the most complex logic, making the application easier to maintain and less prone to errors. The evaluation system is now fully dynamic, adapting to the user's model selection in the UI.
+
+ Key Features:
+ - **Dynamic Model Selection**: Choose any combination of models to run.
+ - **Robust Round-Robin Evaluation**: The evaluation system adapts to your model selection.
+ - **Centralized Logic**: universal_model_wrapper.py handles all core model interactions.
+ - **Consistent Visualizations**: All charts are now generated with the same professional styling.
+ - **Self-Contained Search**: Real-time detection and search are handled within the core wrapper.
+ - **Clean and Maintainable**: The architecture is simplified and easier to understand.
+
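The dynamic round-robin behaviour described for `round_robin_evaluator.py` comes down to generating evaluator/target pairs over whatever models are currently selected. The module's real scoring logic is not shown in this commit; the sketch below only illustrates the pairing idea, with `round_robin_pairs` as a hypothetical name:

```python
from itertools import permutations

def round_robin_pairs(selected_models):
    """Each selected model evaluates every other selected model.

    Deselecting a model simply shrinks the set, so no pairing ever
    references a model that was not run.
    """
    return list(permutations(selected_models, 2))

# round_robin_pairs(["GPT-4", "Gemini 1.5"])
# -> [('GPT-4', 'Gemini 1.5'), ('Gemini 1.5', 'GPT-4')]
```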
realtime_detector.py DELETED
@@ -1,29 +0,0 @@
- # realtime_detector.py
- import os
- from openai import OpenAI
- from dotenv import load_dotenv
-
- load_dotenv()
-
- client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
-
- def is_realtime_prompt(prompt: str) -> bool:
-     try:
-         system_msg = "You are a classifier that determines whether a user's question requires real-time information or not. Answer 'yes' or 'no'."
-         user_msg = f"Question: {prompt}\nAnswer with yes or no:"
-
-         response = client.chat.completions.create(
-             model="gpt-3.5-turbo",
-             messages=[
-                 {"role": "system", "content": system_msg},
-                 {"role": "user", "content": user_msg}
-             ],
-             temperature=0
-         )
-
-         reply = response.choices[0].message.content.strip().lower()
-         return "yes" in reply
-
-     except Exception as e:
-         print("[RealTime Detector Error]", e)
-         return False
response_generator.py CHANGED
@@ -1,103 +1,28 @@
 import os
 from dotenv import load_dotenv
- from openai import OpenAI
- import anthropic
- import google.generativeai as genai
 
 # Load API keys from .env
 load_dotenv()
- openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
- anthropic_client = anthropic.Anthropic(api_key=os.getenv("CLAUDE_API_KEY"))
- genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
 
- def get_gpt4_response(prompt):
-     try:
-         if "Recent info:" in prompt:
-             user_prompt, realtime_info = prompt.split("Recent info:", 1)
-             messages = [
-                 {
-                     "role": "system",
-                     "content": (
-                         "You are an expert ATS evaluator. You are comparing a job description (JD) and a resume to produce an ATS score. "
-                         "Highlight matches, gaps, suggestions for improvement, and an overall score."
-                     )
-                 },
-                 {"role": "user", "content": user_prompt.strip()},
-                 {
-                     "role": "user",
-                     "content": (
-                         f"Here is some recent real-time context for your reference:\n\n{realtime_info.strip()}\n\n"
-                         "Based on this, tailor your response as if the data is accurate."
-                     )
-                 }
-             ]
-         else:
-             messages = [
-                 {
-                     "role": "system",
-                     "content": (
-                         "You are an expert ATS evaluator. You are comparing a job description (JD) and a resume to produce an ATS score. "
-                         "Highlight matches, gaps, suggestions for improvement, and an overall score."
-                     )
-                 },
-                 {"role": "user", "content": prompt}
-             ]
-
-         response = openai_client.chat.completions.create(
-             model="gpt-4",
-             messages=messages,
-             temperature=0.7
-         )
-         return response.choices[0].message.content
-
-     except Exception as e:
-         print(f"Error with GPT-4: {e}")
-         return "GPT-4 failed."
-
- def get_claude_response(prompt):
-     try:
-         response = anthropic_client.messages.create(
-             model="claude-3-opus-20240229",
-             max_tokens=1000,
-             temperature=0.7,
-             messages=[{"role": "user", "content": prompt}]
-         )
-         return response.content[0].text
-     except Exception as e:
-         print(f"Error with Claude 3: {e}")
-         return "Claude 3 failed."
-
- def get_gemini_response(prompt):
-     try:
-         model = genai.GenerativeModel("gemini-1.5-pro")
-         response = model.generate_content(prompt)
-         return response.text
-     except Exception as e:
-         print(f"Error with Gemini: {e}")
-         return "Gemini 1.5 failed."
-
- def generate_all_responses_with_reasoning(prompt, selected_models=None):
-     all_models = {
-         "GPT-4": get_gpt4_response,
-         "Claude 3": get_claude_response,
-         "Gemini 1.5": get_gemini_response
-     }
-     models_to_use = selected_models if selected_models else list(all_models.keys())
-
-     responses = {}
-     for model_name in models_to_use:
-         fetch_fn = all_models[model_name]
-         try:
-             response = fetch_fn(prompt)
-             reason_prompt = (
-                 f"Why did you generate this response to the prompt:\n\n"
-                 f"\"{prompt}\"\n\n"
-                 f"Your Response:\n\"{response}\"\n\n"
-                 "Explain your reasoning behind structuring or phrasing it that way."
-             )
-             reasoning = fetch_fn(reason_prompt)
-             responses[model_name] = {"response": response, "reasoning": reasoning}
-         except Exception as e:
-             responses[model_name] = {"response": "Failed", "reasoning": str(e)}
-
-     return responses
 
 import os
 from dotenv import load_dotenv
+ from universal_model_wrapper import universal_model_responses
 
 # Load API keys from .env
 load_dotenv()
 
+ def generate_all_responses(prompt, resume=None, job_description=None):
+     return universal_model_responses(prompt, resume, job_description)
+
+ def generate_all_responses_with_reasoning(prompt, selected_models=None, resume=None, job_description=None):
+     """
+     Generate responses from all selected models with reasoning.
+     Uses the universal model wrapper for enhanced functionality.
+     """
+     # Get responses from the universal wrapper
+     all_responses = universal_model_responses(prompt, resume, job_description)
+
+     # Filter by selected models if specified
+     if selected_models:
+         filtered_responses = {
+             model: data
+             for model, data in all_responses.items()
+             if model in selected_models
+         }
+         return filtered_responses
+
+     return all_responses
 
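A typical call into the simplified module above, assuming valid API keys in `.env`. Note that `selected_models` acts as a post-filter: `universal_model_responses` is invoked without it here, so every configured model is queried before filtering.

```python
from response_generator import generate_all_responses_with_reasoning

responses = generate_all_responses_with_reasoning(
    "Summarize the benefits of unit testing.",
    selected_models=["GPT-4", "Claude 3"],
)
for model, data in responses.items():
    print(f"== {model} ==")
    print(data["response"])
    print("-- reasoning --")
    print(data["reasoning"])
```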
universal_model_wrapper.py ADDED
@@ -0,0 +1,163 @@
+ import os
+ from dotenv import load_dotenv
+ from openai import OpenAI
+ import anthropic
+ import google.generativeai as genai
+ from search_fallback import get_google_snippets
+
+ load_dotenv()
+
+ openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
+ anthropic_client = anthropic.Anthropic(api_key=os.getenv("CLAUDE_API_KEY"))
+ genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
+
+ def detect_realtime_prompt(prompt: str) -> bool:
+     try:
+         resp = openai_client.chat.completions.create(
+             model="gpt-4",
+             messages=[
+                 {"role": "system", "content": "Respond only with 'True' or 'False'. Does the following prompt require real-time or up-to-date information to answer accurately?"},
+                 {"role": "user", "content": prompt}
+             ],
+             temperature=0
+         )
+         return "true" in resp.choices[0].message.content.strip().lower()
+     except Exception as e:
+         print("Realtime detection failed:", e)
+         return False
+
+ def build_prompt(prompt, resume=None, jd=None, search_snippets=None):
+     if resume and jd:
+         return f"""### Task: Evaluate ATS Fit
+
+ Resume:
+ {resume}
+
+ Job Description:
+ {jd}
+
+ Instructions:
+ - Provide an ATS match score out of 100
+ - Justify the score with 3 bullet points
+ - Highlight missing skills, if any
+ """
+     if search_snippets:
+         return f"{prompt}\n\n[Latest Info Retrieved via Google Search]\n{search_snippets}"
+     return prompt
+
+ def ask_reasoning(model_name, response, prompt):
+     follow_up = f"""You answered:
+
+ {response}
+
+ Now explain why you gave this answer to the prompt: "{prompt}".
+ Respond in 2-3 bullet points."""
+     try:
+         if model_name == "GPT-4":
+             resp = openai_client.chat.completions.create(
+                 model="gpt-4",
+                 messages=[
+                     {"role": "system", "content": "You're an AI that explains why a given answer was provided."},
+                     {"role": "user", "content": follow_up}
+                 ],
+                 temperature=0
+             )
+             return resp.choices[0].message.content.strip()
+         elif model_name == "Claude 3":
+             resp = anthropic_client.messages.create(
+                 model="claude-3-opus-20240229",
+                 max_tokens=500,
+                 temperature=0,
+                 messages=[{"role": "user", "content": follow_up}]
+             )
+             return resp.content[0].text.strip()
+         elif model_name == "Gemini 1.5":
+             model = genai.GenerativeModel("gemini-1.5-pro")
+             return model.generate_content(follow_up).text.strip()
+     except Exception as e:
+         return f"[Reasoning Error] {e}"
+
+ def get_gpt4_response(prompt, resume=None, jd=None, search_snippets=None):
+     system_instruction = (
+         "You are ChatGPT. If search results are included in the prompt, use them explicitly. "
+         "Do not ignore them or hallucinate."
+     )
+     full_prompt = build_prompt(prompt, resume, jd, search_snippets)
+     try:
+         resp = openai_client.chat.completions.create(
+             model="gpt-4",
+             messages=[
+                 {"role": "system", "content": system_instruction},
+                 {"role": "user", "content": full_prompt}
+             ],
+             temperature=0.3
+         )
+         return resp.choices[0].message.content.strip()
+     except Exception as e:
+         return f"[GPT-4 Error] {e}"
+
+ def get_claude_response(prompt, resume=None, jd=None, search_snippets=None):
+     full_prompt = build_prompt(prompt, resume, jd, search_snippets)
+     try:
+         resp = anthropic_client.messages.create(
+             model="claude-3-opus-20240229",
+             max_tokens=1000,
+             temperature=0.3,
+             messages=[{"role": "user", "content": full_prompt}]
+         )
+         return resp.content[0].text.strip()
+     except Exception as e:
+         return f"[Claude Error] {e}"
+
+ def get_gemini_response(prompt, resume=None, jd=None, search_snippets=None):
+     full_prompt = build_prompt(prompt, resume, jd, search_snippets)
+     try:
+         model = genai.GenerativeModel("gemini-1.5-pro")
+         return model.generate_content(full_prompt).text.strip()
+     except Exception as e:
+         return f"[Gemini Error] {e}"
+
+ def universal_model_responses(prompt, resume=None, jd=None, selected_models=None):
+     if selected_models is None:
+         selected_models = ["GPT-4", "Claude 3", "Gemini 1.5"]
+
+     is_ats = bool(resume and jd)
+
+     # Use realtime search only if not ATS prompt
+     search_snippets = None
+     if not is_ats and detect_realtime_prompt(prompt):
+         try:
+             search_snippets = get_google_snippets(prompt)
+         except Exception as e:
+             print("[Search Fallback Error]", e)
+
+     results = {}
+
+     if "GPT-4" in selected_models:
+         gpt_resp = get_gpt4_response(prompt, resume, jd, search_snippets)
+         results["GPT-4"] = {
+             "response": gpt_resp,
+             "reasoning": ask_reasoning("GPT-4", gpt_resp, prompt),
+             "search_results": search_snippets,
+             "is_ats": is_ats
+         }
+
+     if "Claude 3" in selected_models:
+         claude_resp = get_claude_response(prompt, resume, jd, search_snippets)
+         results["Claude 3"] = {
+             "response": claude_resp,
+             "reasoning": ask_reasoning("Claude 3", claude_resp, prompt),
+             "search_results": search_snippets,
+             "is_ats": is_ats
+         }
+
+     if "Gemini 1.5" in selected_models:
+         gemini_resp = get_gemini_response(prompt, resume, jd, search_snippets)
+         results["Gemini 1.5"] = {
+             "response": gemini_resp,
+             "reasoning": ask_reasoning("Gemini 1.5", gemini_resp, prompt),
+             "search_results": search_snippets,
+             "is_ats": is_ats
+         }
+
+     return results
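For reference, a minimal usage sketch of the new wrapper (API keys from `.env` are required; the prompt, resume, and job-description strings are placeholders). Passing both `resume` and `jd` switches `build_prompt` to the ATS template and skips the real-time search path, while a general prompt may instead trigger `detect_realtime_prompt` and the Google snippet fallback.

```python
from universal_model_wrapper import universal_model_responses

# General prompt: realtime detection (and, if positive, Google snippets) applies.
general = universal_model_responses(
    "What changed in this week's Python release?",
    selected_models=["GPT-4"],
)

# ATS prompt: resume + jd select the ATS template; the search path is skipped.
ats = universal_model_responses(
    prompt="Evaluate this candidate",
    resume="...resume text...",
    jd="...job description text...",
)

for model, data in ats.items():
    print(model, "| ATS mode:", data["is_ats"])
    print(data["response"][:200])
```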