chunchu-08 committed on
Commit bd2e6df · 1 Parent(s): de3d14a

refactor: Centralize core logic and unify visualizations


This major refactor introduces universal_model_wrapper.py to handle all LLM interactions, real-time detection, and prompt logic. The round-robin evaluation is now fully dynamic and adapts to model selection in the UI. All visualizations have been restyled for a consistent, professional appearance across all prompt types. The documentation has been updated to reflect this new, more robust architecture.

Files changed (6)
  1. README.md +55 -179
  2. app.py +102 -47
  3. information +82 -0
  4. realtime_detector.py +0 -29
  5. response_generator.py +22 -97
  6. universal_model_wrapper.py +163 -0
README.md CHANGED
@@ -20,37 +20,70 @@ This application provides a complete LLM comparison and evaluation system that g
 
 ## Key Features
 
- - **Multi-Model Response Generation**: Generate responses from GPT-4, Claude 3, and Gemini 1.5
- - **Round-Robin Evaluation System**: Each model evaluates all other models for comprehensive comparison
- - **Real-time Query Detection**: Automatically detect and enhance real-time queries with Google search
- - **ATS Scoring**: Resume vs Job Description matching with detailed feedback
- - **Interactive Data Analysis & Visualization**: Generate interactive charts, heatmaps, and performance reports
- - **Batch Processing**: Handle multiple prompts from CSV files
- - **Modular Architecture**: Clean, production-ready code with separated concerns
- - **Gradio Web Interface**: User-friendly web UI for all features
- - **Export Capabilities**: ZIP bundles with all results and interactive visualizations
- - **Automated Deployment**: GitHub Actions for continuous deployment to Hugging Face Spaces
 
 ## Project Architecture
 
 ### Core Application Files
 
- - **`app.py`** - Main Gradio web interface (UI orchestration and deployment)
- - **`response_generator.py`** - Handles all LLM response generation and comparison
- - **`round_robin_evaluator.py`** - Comprehensive model evaluation system
- - **`llm_prompt_eval_analysis.py`** - Data analysis and visualization engine
- - **`llm_response_logger.py`** - Quick testing and logging tool
 
 ### Supporting Modules
 
- - **`realtime_detector.py`** - Detects real-time queries that need current information
- - **`search_fallback.py`** - Integrates Google search for real-time information enhancement
 
- ### Configuration Files
-
- - **`requirements.txt`** - Python dependencies and versions
- - **`.env`** - API keys and configuration (create this file)
- - **`.github/workflows/deploy-to-hf.yml`** - GitHub Actions for automated deployment
 
 ## Installation
 
@@ -82,57 +115,6 @@ This application provides a complete LLM comparison and evaluation system that g
 GOOGLE_CSE_ID=your_google_cse_id_here
 ```
 
- ## Usage
-
- ### Web Interface (Recommended)
-
- Launch the Gradio web interface:
- ```bash
- python app.py
- ```
-
- The interface provides:
- - **Input Section**: Enter prompts, upload files, and configure options
- - **Results Tabs**: View responses, evaluations, search results, and interactive visualizations
- - **Export Options**: Download results as ZIP bundles with interactive HTML charts
- - **Real-time Features**: Automatic query detection and search enhancement
-
- ### Standalone Tools
-
- Each module can be used independently for specific tasks:
-
- #### Response Generator
- ```bash
- python response_generator.py
- ```
- - Interactive mode for single prompts
- - Batch mode for multiple prompts from file
- - Side-by-side response comparison
-
- #### Round-Robin Evaluator
- ```bash
- python round_robin_evaluator.py
- ```
- - Test the evaluation system
- - View evaluation metrics and scores
- - Export results to CSV
-
- #### Analysis Tool
- ```bash
- python llm_prompt_eval_analysis.py
- ```
- - Analyze latest CSV results
- - Generate visualizations and charts
- - Create comprehensive performance reports
-
- #### Response Logger
- ```bash
- python llm_response_logger.py
- ```
- - Quick testing of all models
- - Batch testing from files
- - Rapid evaluation and logging
-
 ## API Requirements
 
 ### Required APIs
@@ -181,113 +163,7 @@ When a resume and job description are provided, the system performs ATS (Applica
 - `heatmap.html`, `radar.html`, `barchart.html` - Interactive visualization files
 - `bundle.zip` - Complete export package
 
- ## Technical Architecture
-
- ### Design Principles
- - **Separation of Concerns**: Each file has a specific responsibility
- - **Clean Code**: Production-ready without decorative elements
- - **Error Handling**: Comprehensive error handling and logging
- - **Reusable Components**: Modules can be used independently
- - **Configurable**: Easy to modify and extend
- - **Hugging Face Compatible**: No external browser dependencies for chart generation
-
- ### Module Responsibilities
-
- | Module | Responsibility |
- |--------|---------------|
- | `app.py` | UI orchestration and deployment |
- | `response_generator.py` | LLM API calls and response collection |
- | `round_robin_evaluator.py` | Model evaluation and scoring |
- | `realtime_detector.py` | Real-time query detection |
- | `search_fallback.py` | Google search integration |
- | `llm_prompt_eval_analysis.py` | Data analysis and visualization |
-
- ## Deployment
-
- ### Automated Deployment with GitHub Actions
-
- The project includes automated deployment to Hugging Face Spaces using GitHub Actions:
-
- #### Setup Requirements
-
- 1. **Hugging Face Access Token**:
-    - Go to [Hugging Face Settings](https://huggingface.co/settings/tokens)
-    - Create a new token with **Write** permissions
-    - Copy the token (starts with `hf_...`)
-
- 2. **GitHub Repository Secrets**:
-    - Go to your GitHub repository Settings
-    - Navigate to Secrets and variables → Actions
-    - Add a new repository secret:
-      - **Name**: `HF_TOKEN`
-      - **Value**: Your Hugging Face token
-
- #### Deployment Workflow
-
- The `.github/workflows/deploy-to-hf.yml` file automatically:
- - Triggers on pushes to the main branch
- - Deploys changes to Hugging Face Spaces
- - Maintains continuous integration
-
- #### Usage
-
- After setup, simply push to GitHub:
- ```bash
- git add .
- git commit -m "Update application"
- git push origin main
- ```
-
- The GitHub Action will automatically deploy to Hugging Face Spaces.
-
- ### Manual Deployment
-
- For local deployment, ensure all dependencies are installed and API keys are configured.
-
- ## Error Handling
-
- The system includes comprehensive error handling:
- - **API Failures**: Graceful handling of API errors with fallback options
- - **Missing Keys**: Clear indication of missing API keys
- - **Network Issues**: Retry logic and connection management
- - **Data Validation**: Input validation and sanitization
- - **File Processing**: Robust handling of various file formats
-
 ## Development and Testing
 
 ### Testing Tools
- - **`test_standalone_tools.py`**: Demonstrates usage of all standalone tools
- - **Batch Testing**: Process multiple prompts efficiently
- - **Performance Monitoring**: Track evaluation metrics over time
-
- ### Development Guidelines
- 1. Follow the modular architecture
- 2. Maintain clean, production-ready code
- 3. Add proper error handling
- 4. Update documentation for new features
- 5. Test all modules independently
-
- ## Contributing
-
- 1. Follow the established modular architecture
- 2. Maintain clean, production-ready code standards
- 3. Add comprehensive error handling
- 4. Update documentation for any new features
- 5. Test all modules independently before submission
-
- ## License
-
- This project is licensed under the MIT License - see the LICENSE file for details.
-
- ## Support
-
- For issues and questions:
- 1. Check the API key configuration in `.env`
- 2. Verify all dependencies are installed correctly
- 3. Review error messages in the console output
- 4. Check the results directory for output files
- 5. Consult the project documentation for detailed module descriptions
-
- ## Live Application
-
- Access the live application at: [https://huggingface.co/spaces/chunchu-08/LLM-Comparison-Hub](https://huggingface.co/spaces/chunchu-08/LLM-Comparison-Hub) "<!-- trigger deploy -->"
 
 ## Key Features
 
+ - **Multi-Model Response Generation**: Dynamically generate responses from any combination of GPT-4, Claude 3, and Gemini 1.5 using a simple model selector.
+ - **Dynamic Round-Robin Evaluation**: A robust evaluation system where selected models evaluate each other. If a model is deselected, the evaluation logic adapts automatically.
+ - **Real-time Query Detection**: Automatically detects if a prompt requires current information and fetches it using a Google search fallback.
+ - **ATS Scoring**: Performs detailed resume vs. job description matching and scoring.
+ - **Interactive Data Analysis & Visualization**: Generates consistent, professionally styled charts (Heatmap, Radar, Bar) for all prompt types.
+ - **Batch Processing**: Handles multiple prompts from CSV files.
+ - **Modular Architecture**: A clean, production-ready codebase with a new `universal_model_wrapper.py` that centralizes core logic.
+ - **Gradio Web Interface**: A user-friendly web UI with a model selector to easily choose which LLMs to run.
+ - **Export Capabilities**: Download a ZIP bundle with all evaluation results and interactive HTML charts.
+ - **Automated Deployment**: GitHub Actions for continuous deployment to Hugging Face Spaces.
 
 ## Project Architecture
 
+ The architecture has been refactored for simplicity and robustness.
+
 ### Core Application Files
 
+ - **`app.py`** - Main Gradio web interface, including UI logic and the model selector.
+ - **`universal_model_wrapper.py`** - **New core module!** Centralizes all LLM API calls, real-time detection, search fallback, and ATS/general prompt logic.
+ - **`response_generator.py`** - A simplified wrapper that interfaces between the app and the `universal_model_wrapper`.
+ - **`round_robin_evaluator.py`** - A dynamic evaluation engine that adapts to the models selected in the UI.
+ - **`llm_prompt_eval_analysis.py`** - Data analysis and visualization engine.
+ - **`llm_response_logger.py`** - Quick testing and logging tool.
 
 ### Supporting Modules
 
+ - **`search_fallback.py`**: Provides the Google snippet lookup. It is now invoked from `universal_model_wrapper.py` (rather than from `app.py` directly), keeping the search flow self-contained in the core wrapper.
+
+ ## Usage
+
+ ### Web Interface (Recommended)
+
+ Launch the Gradio web interface:
+ ```bash
+ python app.py
+ ```
+
+ The interface provides:
+ - **Input Section**: Enter prompts, upload files, and use the **Model Selector** checkboxes to choose which LLMs to run.
+ - **Results Tabs**: View responses, evaluations, search results, and interactive visualizations.
+ - **Export Options**: Download results as ZIP bundles with interactive HTML charts.
+ - **Real-time Features**: Automatic query detection and search enhancement.
+
+ ### Model Selection
+ The UI now includes a set of checkboxes allowing you to select any combination of models (GPT-4, Claude 3, Gemini 1.5) for a given query. The application, including the round-robin evaluation, will dynamically adapt to your selection.
+
+ ## Technical Architecture
+
+ ### Design Principles
+ - **Centralized Logic**: The new `universal_model_wrapper.py` acts as a single source of truth for model interaction.
+ - **Dynamic & Robust**: The evaluation system is no longer static; it adapts to user input, preventing crashes when models are deselected.
+ - **Separation of Concerns**: Each file has a clear, specific responsibility.
+ - **Clean Code**: Production-ready and easy to maintain.
+ - **Hugging Face Compatible**: No external browser dependencies for chart generation.
 
+ ### Module Responsibilities
 
+ | Module | Responsibility |
+ |--------|---------------|
+ | `app.py` | UI orchestration, including the model selector and deployment. |
+ | `universal_model_wrapper.py` | Handles all LLM calls, prompt logic, and search. |
+ | `response_generator.py` | Connects the UI to the universal wrapper. |
+ | `round_robin_evaluator.py` | Dynamically evaluates the currently selected models. |
+ | `llm_prompt_eval_analysis.py` | Data analysis and visualization. |
 
 ## Installation
 
 GOOGLE_CSE_ID=your_google_cse_id_here
 ```
 
 ## API Requirements
 
 ### Required APIs
 
 - `heatmap.html`, `radar.html`, `barchart.html` - Interactive visualization files
 - `bundle.zip` - Complete export package
 
 ## Development and Testing
 
 ### Testing Tools
+ - **`test_standalone_tools.py`**: Demonstrates usage of all standalone tools
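A note on the model selector described in the new README: `gr.CheckboxGroup` returns the list of checked labels, so downstream code can filter by membership. Below is a minimal, self-contained sketch of that wiring; the component names mirror `app.py`, but the `run` handler is a hypothetical stand-in for the real `process_prompt`.

```python
import gradio as gr

MODELS = ["GPT-4", "Claude 3", "Gemini 1.5"]

def run(prompt, model_selection):
    # CheckboxGroup passes the checked labels themselves, e.g. ["GPT-4", "Gemini 1.5"],
    # so filtering by membership keeps the pipeline in sync with the UI selection.
    selected = [m for m in MODELS if m in model_selection]
    return f"Would query: {', '.join(selected) or 'no models selected'}"

with gr.Blocks() as demo:
    prompt = gr.Textbox(label="Prompt")
    model_selector = gr.CheckboxGroup(label="Select Models", choices=MODELS, value=MODELS)
    output = gr.Markdown()
    gr.Button("Run Evaluation").click(run, inputs=[prompt, model_selector], outputs=output)

if __name__ == "__main__":
    demo.launch()
```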
app.py CHANGED
@@ -1,4 +1,4 @@
- # gradio_full_llm_eval.py – Final Updated Version with ATS Scoring and Visualized UI
 import gradio as gr
 import os
 import pandas as pd
@@ -12,8 +12,6 @@ from dotenv import load_dotenv
 
 from response_generator import generate_all_responses_with_reasoning
 from round_robin_evaluator import comprehensive_round_robin_evaluation
- from realtime_detector import is_realtime_prompt
- from search_fallback import get_google_snippets
 
 load_dotenv()
 pio.kaleido.scope.default_format = "png"
@@ -70,31 +68,72 @@ Return JSON:
         return {"ats_score": 50, "strengths": [], "gaps": [], "suggestions": ["Check formatting."]}
 
 def create_visualizations(df, results_dir):
-     image_files = []
     summary = df.groupby('target_model')[metrics].mean().reset_index()
-
-     heatmap = px.imshow(summary[metrics].values, x=metrics, y=summary['target_model'],
-                         labels=dict(x="Metric", y="Model", color="Score"),
-                         title="Heatmap: Metrics Across Models", color_continuous_scale='Viridis')
     heatmap_path = os.path.join(results_dir, "heatmap.html")
     heatmap.write_html(heatmap_path)
-     image_files.append(heatmap_path)
 
     radar = go.Figure()
     for _, row in summary.iterrows():
-         radar.add_trace(go.Scatterpolar(r=list(row[metrics]), theta=metrics, fill='toself', name=row['target_model']))
-     radar.update_layout(title="Radar Chart: Model Score Profiles", polar=dict(radialaxis=dict(visible=True, range=[0, 1])))
     radar_path = os.path.join(results_dir, "radar.html")
     radar.write_html(radar_path)
-     image_files.append(radar_path)
-
-     bar = px.bar(summary.melt(id_vars='target_model'), x='variable', y='value', color='target_model', barmode='group',
-                  title="Bar Chart: Metric Comparison")
     bar_path = os.path.join(results_dir, "barchart.html")
     bar.write_html(bar_path)
-     image_files.append(bar_path)
 
-     return (heatmap, radar, bar), image_files
 
 def format_ats_feedback(score, strengths, gaps, suggestions):
     color = "🟢" if score >= 75 else "🟡" if score >= 50 else "🔴"
@@ -114,8 +153,9 @@ def format_ats_feedback(score, strengths, gaps, suggestions):
 def process_prompt(prompt, enable_realtime, enable_eval, enable_analysis, user_file, model_selection):
     selected_models = [m for m, enabled in zip(["GPT-4", "Claude 3", "Gemini 1.5"], model_selection) if enabled]
     resume_text = ""
-     batch_mode = user_file and user_file.name.endswith(".csv")
-     resume_mode = user_file and user_file.name.lower().endswith(('.pdf', '.docx', '.txt'))
 
     prompts = [prompt]
     ats_summary_texts = []
@@ -131,38 +171,49 @@ def process_prompt(prompt, enable_realtime, enable_eval, enable_analysis, user_f
     zip_path, ats_table_markdown = None, ""
 
     for prompt_text in prompts:
-         search_results = get_google_snippets(prompt_text) if enable_realtime and is_realtime_prompt(prompt_text) else ""
-         final_prompt = f"{prompt_text}\n\nRecent info: {search_results}" if search_results else prompt_text
-         responses = generate_all_responses_with_reasoning(final_prompt, selected_models)
 
         ats_rows = []
         for model in responses:
            model_resp = responses[model]['response']
-             if resume_text:
-                 ats_result = ats_score_advanced(model_resp, resume_text, prompt_text)
-                 feedback = format_ats_feedback(ats_result['ats_score'], ats_result.get('strengths', []), ats_result.get('gaps', []), ats_result.get('suggestions', []))
-                 responses[model]['ats_embed'] = f"### Response\n\n{model_resp}\n\n---\n\n### ATS Evaluation\n\n{feedback}"
-                 ats_rows.append(f"| {model} | {ats_result['ats_score']} | {', '.join(ats_result.get('strengths', []))} | {', '.join(ats_result.get('suggestions', []))} |")
-             else:
-                 responses[model]['ats_embed'] = f"### Response\n\n{model_resp}\n\n---\n\n**Explainability:**\n{responses[model]['reasoning']}"
        if ats_rows:
            ats_table_markdown = "| Model | Score | Strengths | Suggestions |\n|-------|-------|-----------|-------------|\n" + "\n".join(ats_rows)
 
-         if enable_eval:
-             compact = {k: v['response'] for k, v in responses.items()}
-             eval_result = comprehensive_round_robin_evaluation(compact, final_prompt)
-             for model, data in eval_result.items():
-                 for evaluator, scores in data['evaluations'].items():
-                     row = {
-                         'prompt': prompt_text,
-                         'target_model': model,
-                         'evaluator': evaluator,
-                         'response': responses[model]['response'],
-                         'explainability': responses[model]['reasoning']
-                     }
-                     row.update({k: scores.get(k, 0.5) for k in metrics})
-                     row.update({f"avg_{k}": data['average_scores'].get(k, 0.5) for k in metrics})
-                     all_rows.append(row)
 
     df_all = pd.DataFrame(all_rows)
     if not df_all.empty:
@@ -182,14 +233,18 @@ def process_prompt(prompt, enable_realtime, enable_eval, enable_analysis, user_f
         df_batch['ATS Summary'] = ats_summary_texts
         df_batch.to_csv(os.path.join(results_dir, "batch_prompts_output.csv"), index=False)
         zipf.write(os.path.join(results_dir, "batch_prompts_output.csv"), arcname="batch_prompts_output.csv")
 
     return tuple(
         responses[model].get('ats_embed', responses[model]['response']) for model in ["GPT-4", "Claude 3", "Gemini 1.5"]
     ) + (
         search_results or "N/A",
         *all_charts,
-         df_all[['target_model', 'evaluator'] + metrics] if not df_all.empty else pd.DataFrame(),
-         ats_table_markdown,
         zip_path
     )
 
@@ -222,7 +277,7 @@ This app compares LLM responses using round-robin evaluations, with real-time qu
         model_selector = gr.CheckboxGroup(label="Select Models", choices=["GPT-4", "Claude 3", "Gemini 1.5"], value=["GPT-4", "Claude 3", "Gemini 1.5"])
         enable_realtime = gr.Checkbox(label="Enable real-time detection", value=True)
         enable_eval = gr.Checkbox(label="Enable evaluation", value=True)
-         enable_analysis = gr.Checkbox(label="Enable analysis", value=True)
         submit = gr.Button("Run Evaluation")
 
     with gr.Column():
 
+ # app.py – Final Updated Version with Unified Visualization (Model selection-safe + Visualization Fixes)
 import gradio as gr
 import os
 import pandas as pd
 
 from response_generator import generate_all_responses_with_reasoning
 from round_robin_evaluator import comprehensive_round_robin_evaluation
 
 load_dotenv()
 pio.kaleido.scope.default_format = "png"
 
         return {"ats_score": 50, "strengths": [], "gaps": [], "suggestions": ["Check formatting."]}
 
 def create_visualizations(df, results_dir):
+     html_files = []
     summary = df.groupby('target_model')[metrics].mean().reset_index()
+     font_style = dict(family="Arial, sans-serif", size=12, color="black")
+
+     # 1. Heatmap with professional styling
+     heatmap = px.imshow(
+         summary[metrics].values,
+         x=metrics,
+         y=summary['target_model'],
+         labels=dict(x="Metric", y="Model", color="Score"),
+         title="<b>Heatmap: Metrics Across Models</b>",
+         color_continuous_scale='Viridis'
+     )
+     heatmap.update_layout(
+         margin=dict(l=80, r=40, t=80, b=120),
+         xaxis_tickangle=-45,
+         title_font=dict(size=18, family="Arial, sans-serif"),
+         font=font_style
+     )
     heatmap_path = os.path.join(results_dir, "heatmap.html")
     heatmap.write_html(heatmap_path)
+     html_files.append(heatmap_path)
 
+     # 2. Radar Chart with professional styling
     radar = go.Figure()
     for _, row in summary.iterrows():
+         radar.add_trace(go.Scatterpolar(
+             r=list(row[metrics]),
+             theta=metrics,
+             fill='toself',
+             name=row['target_model']
+         ))
+     radar.update_layout(
+         title="<b>Radar Chart: Model Score Profiles</b>",
+         polar=dict(radialaxis=dict(visible=True, range=[0, 1])),
+         legend_title_text='Models',
+         title_font=dict(size=18, family="Arial, sans-serif"),
+         font=font_style,
+         margin=dict(l=60, r=60, t=80, b=80)
+     )
     radar_path = os.path.join(results_dir, "radar.html")
     radar.write_html(radar_path)
+     html_files.append(radar_path)
+
+     # 3. Bar Chart with professional styling
+     bar = px.bar(
+         summary.melt(id_vars='target_model'),
+         x='variable',
+         y='value',
+         color='target_model',
+         barmode='group',
+         title="<b>Bar Chart: Metric Comparison</b>",
+         labels={'variable': 'Metric', 'value': 'Score', 'target_model': 'Model'}
+     )
+     bar.update_layout(
+         margin=dict(l=60, r=20, t=80, b=120),
+         xaxis_tickangle=-45,
+         legend_title_text='Model',
+         title_font=dict(size=18, family="Arial, sans-serif"),
+         font=font_style
+     )
     bar_path = os.path.join(results_dir, "barchart.html")
     bar.write_html(bar_path)
+     html_files.append(bar_path)
 
+     return (heatmap, radar, bar), html_files
 
 def format_ats_feedback(score, strengths, gaps, suggestions):
     color = "🟢" if score >= 75 else "🟡" if score >= 50 else "🔴"
 
 def process_prompt(prompt, enable_realtime, enable_eval, enable_analysis, user_file, model_selection):
     selected_models = [m for m, enabled in zip(["GPT-4", "Claude 3", "Gemini 1.5"], model_selection) if enabled]
     resume_text = ""
+     job_description = prompt
+     batch_mode = user_file and hasattr(user_file, 'name') and user_file.name.endswith(".csv")
+     resume_mode = user_file and hasattr(user_file, 'name') and user_file.name.lower().endswith(('.pdf', '.docx', '.txt'))
 
     prompts = [prompt]
     ats_summary_texts = []
 
     zip_path, ats_table_markdown = None, ""
 
     for prompt_text in prompts:
+         responses = generate_all_responses_with_reasoning(
+             prompt_text,
+             selected_models,
+             resume_text if resume_mode else None,
+             job_description if resume_mode else None
+         )
+
+         if responses:
+             first_response = list(responses.values())[0]
+             search_results = first_response.get('search_results', '')
+             is_ats = first_response.get('is_ats', False)
 
         ats_rows = []
         for model in responses:
            model_resp = responses[model]['response']
+             model_reasoning = responses[model]['reasoning']
+             responses[model]['ats_embed'] = f"### Response\n\n{model_resp}\n\n---\n\n**Explainability:**\n{model_reasoning}"
+
+             if resume_mode and is_ats:
+                 try:
+                     ats_result = ats_score_advanced(model_resp, resume_text, prompt_text)
+                     ats_rows.append(f"| {model} | {ats_result['ats_score']} | {', '.join(ats_result.get('strengths', []))} | {', '.join(ats_result.get('suggestions', []))} |")
+                 except:
+                     ats_rows.append(f"| {model} | N/A | N/A | N/A |")
+
        if ats_rows:
            ats_table_markdown = "| Model | Score | Strengths | Suggestions |\n|-------|-------|-----------|-------------|\n" + "\n".join(ats_rows)
 
+         # Always run evaluation to generate chart data
+         compact = {k: v['response'] for k, v in responses.items()}
+         eval_result = comprehensive_round_robin_evaluation(compact, prompt_text)
+         for model, data in eval_result.items():
+             for evaluator, scores in data['evaluations'].items():
+                 row = {
+                     'prompt': prompt_text,
+                     'target_model': model,
+                     'evaluator': evaluator,
+                     'response': responses[model]['response'],
+                     'explainability': responses[model]['reasoning']
+                 }
+                 row.update({k: scores.get(k, 0.5) for k in metrics})
+                 row.update({f"avg_{k}": data['average_scores'].get(k, 0.5) for k in metrics})
+                 all_rows.append(row)
 
     df_all = pd.DataFrame(all_rows)
     if not df_all.empty:
 
         df_batch['ATS Summary'] = ats_summary_texts
         df_batch.to_csv(os.path.join(results_dir, "batch_prompts_output.csv"), index=False)
         zipf.write(os.path.join(results_dir, "batch_prompts_output.csv"), arcname="batch_prompts_output.csv")
+
+     # Conditional UI updates
+     eval_table = df_all[['target_model', 'evaluator'] + metrics] if not df_all.empty and enable_eval else pd.DataFrame()
+     ats_md = ats_table_markdown if resume_mode else ""
 
     return tuple(
         responses[model].get('ats_embed', responses[model]['response']) for model in ["GPT-4", "Claude 3", "Gemini 1.5"]
     ) + (
         search_results or "N/A",
         *all_charts,
+         eval_table,
+         ats_md,
         zip_path
     )
 
         model_selector = gr.CheckboxGroup(label="Select Models", choices=["GPT-4", "Claude 3", "Gemini 1.5"], value=["GPT-4", "Claude 3", "Gemini 1.5"])
         enable_realtime = gr.Checkbox(label="Enable real-time detection", value=True)
         enable_eval = gr.Checkbox(label="Enable evaluation", value=True)
+         enable_analysis = gr.Checkbox(label="Enable analysis (currently not used)", value=True)
         submit = gr.Button("Run Evaluation")
 
     with gr.Column():
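An editorial aside on the restyled `create_visualizations` above: all three charts intentionally share the same font, title size, and tick-angle settings. If that invariant ever needs enforcing in one place, the shared layout could be factored into a helper. This is a sketch only, not part of the commit, and `apply_professional_style` is a hypothetical name:

```python
import plotly.graph_objects as go

# Shared styling used verbatim by the heatmap, radar, and bar charts above.
FONT = dict(family="Arial, sans-serif", size=12, color="black")
TITLE_FONT = dict(size=18, family="Arial, sans-serif")

def apply_professional_style(fig: go.Figure, **overrides) -> go.Figure:
    """Apply the common layout; each chart passes its own margins/tick angles."""
    fig.update_layout(title_font=TITLE_FONT, font=FONT, **overrides)
    return fig

# e.g. apply_professional_style(bar, margin=dict(l=60, r=20, t=80, b=120), xaxis_tickangle=-45)
```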
information ADDED
@@ -0,0 +1,82 @@
+ LLM-Compare-Hub Project File Structure and Use Cases
+ ====================================================
+
+ Core Application Files
+ ----------------------
+
+ app.py
+ - Use Case: Main Gradio web interface and application entry point.
+ - Function: Orchestrates the UI, handles user input (including model selection), and manages the data flow between all other modules.
+
+ universal_model_wrapper.py
+ - Use Case: The new, central engine for all LLM interactions.
+ - Function: Contains the complete logic for handling API calls to GPT-4, Claude 3, and Gemini 1.5. It also includes self-contained functions for real-time prompt detection and Google search fallback. It intelligently determines whether a prompt is a general query or an ATS evaluation.
+ - Status: This is the core logic hub of the application.
+
+ response_generator.py
+ - Use Case: A lean interface between the UI (app.py) and the model logic.
+ - Function: Takes requests from the UI, passes them to universal_model_wrapper.py, and returns the formatted responses back to the UI.
+ - Status: Simplified to be a clean pass-through, improving modularity.
+
+ round_robin_evaluator.py
+ - Use Case: Dynamic, comprehensive model evaluation system.
+ - Function: Each *selected* model evaluates its peers. The evaluation logic is fully dynamic and adapts to whichever models are selected in the UI, preventing crashes.
+ - Status: The core evaluation engine.
+
+ llm_prompt_eval_analysis.py
+ - Use Case: Data analysis and visualization.
+ - Function: Analyzes evaluation results and generates consistently styled, professional charts and reports.
+ - Status: Standalone analysis tool, also used by the Gradio app.
+
+ llm_response_logger.py
+ - Use Case: Quick testing and logging tool for developers.
+ - Function: Allows for rapid testing of models with single or batch prompts.
+ - Status: Standalone testing tool.
+
+ Supporting Modules
+ ------------------
+
+ search_fallback.py
+ - Use Case: Provides Google search functionality.
+ - Status: The real-time gating has moved into universal_model_wrapper.py, which still imports get_google_snippets from this file; it is no longer called from app.py directly.
+
+ Configuration & Documentation
+ -----------------------------
+
+ requirements.txt
+ - Use Case: Python dependencies.
+ - Function: Lists all required packages for the project to run.
+
+ .env
+ - Use Case: API key configuration.
+ - Function: Securely stores all necessary API keys.
+
+ .gitignore
+ - Use Case: Git version control.
+ - Function: Excludes sensitive files and unnecessary directories from the repository.
+
+ README.md
+ - Use Case: Main project documentation.
+ - Function: Provides setup instructions, usage guides, and an overview of the architecture.
+
+ Testing & Development
+ ---------------------
+
+ test_standalone_tools.py
+ - Use Case: Testing and demonstration.
+ - Function: Shows how to use the standalone modules.
+ - Status: Development/testing tool.
+
+ Project Summary
+ ===============
+
+ The project has been refactored into a more robust, modular architecture with universal_model_wrapper.py at its core. This new structure centralizes the most complex logic, making the application easier to maintain and less prone to errors. The evaluation system is now fully dynamic, adapting to the user's model selection in the UI.
+
+ Key Features:
+ - **Dynamic Model Selection**: Choose any combination of models to run.
+ - **Robust Round-Robin Evaluation**: The evaluation system adapts to your model selection.
+ - **Centralized Logic**: universal_model_wrapper.py handles all core model interactions.
+ - **Consistent Visualizations**: All charts are now generated with the same professional styling.
+ - **Self-Contained Search**: Real-time detection and search are handled within the core wrapper.
+ - **Clean and Maintainable**: The architecture is simplified and easier to understand.
+
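The dynamic round-robin behaviour described for `round_robin_evaluator.py` comes down to generating evaluator/target pairs over whatever models are currently selected. The module's real scoring logic is not shown in this commit; the sketch below only illustrates the pairing idea, with `round_robin_pairs` as a hypothetical name:

```python
from itertools import permutations

def round_robin_pairs(selected_models):
    """Each selected model evaluates every other selected model.

    Deselecting a model simply shrinks the set, so no pairing ever
    references a model that was not run.
    """
    return list(permutations(selected_models, 2))

# round_robin_pairs(["GPT-4", "Gemini 1.5"])
# -> [('GPT-4', 'Gemini 1.5'), ('Gemini 1.5', 'GPT-4')]
```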
realtime_detector.py DELETED
@@ -1,29 +0,0 @@
- # realtime_detector.py
- import os
- from openai import OpenAI
- from dotenv import load_dotenv
-
- load_dotenv()
-
- client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
-
- def is_realtime_prompt(prompt: str) -> bool:
-     try:
-         system_msg = "You are a classifier that determines whether a user's question requires real-time information or not. Answer 'yes' or 'no'."
-         user_msg = f"Question: {prompt}\nAnswer with yes or no:"
-
-         response = client.chat.completions.create(
-             model="gpt-3.5-turbo",
-             messages=[
-                 {"role": "system", "content": system_msg},
-                 {"role": "user", "content": user_msg}
-             ],
-             temperature=0
-         )
-
-         reply = response.choices[0].message.content.strip().lower()
-         return "yes" in reply
-
-     except Exception as e:
-         print("[RealTime Detector Error]", e)
-         return False
response_generator.py CHANGED
@@ -1,103 +1,28 @@
 import os
 from dotenv import load_dotenv
- from openai import OpenAI
- import anthropic
- import google.generativeai as genai
 
 # Load API keys from .env
 load_dotenv()
- openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
- anthropic_client = anthropic.Anthropic(api_key=os.getenv("CLAUDE_API_KEY"))
- genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
 
- def get_gpt4_response(prompt):
-     try:
-         if "Recent info:" in prompt:
-             user_prompt, realtime_info = prompt.split("Recent info:", 1)
-             messages = [
-                 {
-                     "role": "system",
-                     "content": (
-                         "You are an expert ATS evaluator. You are comparing a job description (JD) and a resume to produce an ATS score. "
-                         "Highlight matches, gaps, suggestions for improvement, and an overall score."
-                     )
-                 },
-                 {"role": "user", "content": user_prompt.strip()},
-                 {
-                     "role": "user",
-                     "content": (
-                         f"Here is some recent real-time context for your reference:\n\n{realtime_info.strip()}\n\n"
-                         "Based on this, tailor your response as if the data is accurate."
-                     )
-                 }
-             ]
-         else:
-             messages = [
-                 {
-                     "role": "system",
-                     "content": (
-                         "You are an expert ATS evaluator. You are comparing a job description (JD) and a resume to produce an ATS score. "
-                         "Highlight matches, gaps, suggestions for improvement, and an overall score."
-                     )
-                 },
-                 {"role": "user", "content": prompt}
-             ]
-
-         response = openai_client.chat.completions.create(
-             model="gpt-4",
-             messages=messages,
-             temperature=0.7
-         )
-         return response.choices[0].message.content
-
-     except Exception as e:
-         print(f"Error with GPT-4: {e}")
-         return "GPT-4 failed."
-
- def get_claude_response(prompt):
-     try:
-         response = anthropic_client.messages.create(
-             model="claude-3-opus-20240229",
-             max_tokens=1000,
-             temperature=0.7,
-             messages=[{"role": "user", "content": prompt}]
-         )
-         return response.content[0].text
-     except Exception as e:
-         print(f"Error with Claude 3: {e}")
-         return "Claude 3 failed."
-
- def get_gemini_response(prompt):
-     try:
-         model = genai.GenerativeModel("gemini-1.5-pro")
-         response = model.generate_content(prompt)
-         return response.text
-     except Exception as e:
-         print(f"Error with Gemini: {e}")
-         return "Gemini 1.5 failed."
-
- def generate_all_responses_with_reasoning(prompt, selected_models=None):
-     all_models = {
-         "GPT-4": get_gpt4_response,
-         "Claude 3": get_claude_response,
-         "Gemini 1.5": get_gemini_response
-     }
-     models_to_use = selected_models if selected_models else list(all_models.keys())
-
-     responses = {}
-     for model_name in models_to_use:
-         fetch_fn = all_models[model_name]
-         try:
-             response = fetch_fn(prompt)
-             reason_prompt = (
-                 f"Why did you generate this response to the prompt:\n\n"
-                 f"\"{prompt}\"\n\n"
-                 f"Your Response:\n\"{response}\"\n\n"
-                 "Explain your reasoning behind structuring or phrasing it that way."
-             )
-             reasoning = fetch_fn(reason_prompt)
-             responses[model_name] = {"response": response, "reasoning": reasoning}
-         except Exception as e:
-             responses[model_name] = {"response": "Failed", "reasoning": str(e)}
-
-     return responses
 
 import os
 from dotenv import load_dotenv
+ from universal_model_wrapper import universal_model_responses
 
 # Load API keys from .env
 load_dotenv()
 
+ def generate_all_responses(prompt, resume=None, job_description=None):
+     return universal_model_responses(prompt, resume, job_description)
+
+ def generate_all_responses_with_reasoning(prompt, selected_models=None, resume=None, job_description=None):
+     """
+     Generate responses from all selected models with reasoning.
+     Uses the universal model wrapper for enhanced functionality.
+     """
+     # Get responses from the universal wrapper
+     all_responses = universal_model_responses(prompt, resume, job_description)
+
+     # Filter by selected models if specified
+     if selected_models:
+         filtered_responses = {
+             model: data
+             for model, data in all_responses.items()
+             if model in selected_models
+         }
+         return filtered_responses
+
+     return all_responses
 
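A typical call into the simplified module above, assuming valid API keys in `.env`. Note that `selected_models` acts as a post-filter: `universal_model_responses` is invoked without it here, so every configured model is queried before filtering.

```python
from response_generator import generate_all_responses_with_reasoning

responses = generate_all_responses_with_reasoning(
    "Summarize the benefits of unit testing.",
    selected_models=["GPT-4", "Claude 3"],
)
for model, data in responses.items():
    print(f"== {model} ==")
    print(data["response"])
    print("-- reasoning --")
    print(data["reasoning"])
```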
universal_model_wrapper.py ADDED
@@ -0,0 +1,163 @@
+ import os
+ from dotenv import load_dotenv
+ from openai import OpenAI
+ import anthropic
+ import google.generativeai as genai
+ from search_fallback import get_google_snippets
+
+ load_dotenv()
+
+ openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
+ anthropic_client = anthropic.Anthropic(api_key=os.getenv("CLAUDE_API_KEY"))
+ genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
+
+ def detect_realtime_prompt(prompt: str) -> bool:
+     try:
+         resp = openai_client.chat.completions.create(
+             model="gpt-4",
+             messages=[
+                 {"role": "system", "content": "Respond only with 'True' or 'False'. Does the following prompt require real-time or up-to-date information to answer accurately?"},
+                 {"role": "user", "content": prompt}
+             ],
+             temperature=0
+         )
+         return "true" in resp.choices[0].message.content.strip().lower()
+     except Exception as e:
+         print("Realtime detection failed:", e)
+         return False
+
+ def build_prompt(prompt, resume=None, jd=None, search_snippets=None):
+     if resume and jd:
+         return f"""### Task: Evaluate ATS Fit
+
+ Resume:
+ {resume}
+
+ Job Description:
+ {jd}
+
+ Instructions:
+ - Provide an ATS match score out of 100
+ - Justify the score with 3 bullet points
+ - Highlight missing skills, if any
+ """
+     if search_snippets:
+         return f"{prompt}\n\n[Latest Info Retrieved via Google Search]\n{search_snippets}"
+     return prompt
+
+ def ask_reasoning(model_name, response, prompt):
+     follow_up = f"""You answered:
+
+ {response}
+
+ Now explain why you gave this answer to the prompt: "{prompt}".
+ Respond in 2-3 bullet points."""
+     try:
+         if model_name == "GPT-4":
+             resp = openai_client.chat.completions.create(
+                 model="gpt-4",
+                 messages=[
+                     {"role": "system", "content": "You're an AI that explains why a given answer was provided."},
+                     {"role": "user", "content": follow_up}
+                 ],
+                 temperature=0
+             )
+             return resp.choices[0].message.content.strip()
+         elif model_name == "Claude 3":
+             resp = anthropic_client.messages.create(
+                 model="claude-3-opus-20240229",
+                 max_tokens=500,
+                 temperature=0,
+                 messages=[{"role": "user", "content": follow_up}]
+             )
+             return resp.content[0].text.strip()
+         elif model_name == "Gemini 1.5":
+             model = genai.GenerativeModel("gemini-1.5-pro")
+             return model.generate_content(follow_up).text.strip()
+     except Exception as e:
+         return f"[Reasoning Error] {e}"
+
+ def get_gpt4_response(prompt, resume=None, jd=None, search_snippets=None):
+     system_instruction = (
+         "You are ChatGPT. If search results are included in the prompt, use them explicitly. "
+         "Do not ignore them or hallucinate."
+     )
+     full_prompt = build_prompt(prompt, resume, jd, search_snippets)
+     try:
+         resp = openai_client.chat.completions.create(
+             model="gpt-4",
+             messages=[
+                 {"role": "system", "content": system_instruction},
+                 {"role": "user", "content": full_prompt}
+             ],
+             temperature=0.3
+         )
+         return resp.choices[0].message.content.strip()
+     except Exception as e:
+         return f"[GPT-4 Error] {e}"
+
+ def get_claude_response(prompt, resume=None, jd=None, search_snippets=None):
+     full_prompt = build_prompt(prompt, resume, jd, search_snippets)
+     try:
+         resp = anthropic_client.messages.create(
+             model="claude-3-opus-20240229",
+             max_tokens=1000,
+             temperature=0.3,
+             messages=[{"role": "user", "content": full_prompt}]
+         )
+         return resp.content[0].text.strip()
+     except Exception as e:
+         return f"[Claude Error] {e}"
+
+ def get_gemini_response(prompt, resume=None, jd=None, search_snippets=None):
+     full_prompt = build_prompt(prompt, resume, jd, search_snippets)
+     try:
+         model = genai.GenerativeModel("gemini-1.5-pro")
+         return model.generate_content(full_prompt).text.strip()
+     except Exception as e:
+         return f"[Gemini Error] {e}"
+
+ def universal_model_responses(prompt, resume=None, jd=None, selected_models=None):
+     if selected_models is None:
+         selected_models = ["GPT-4", "Claude 3", "Gemini 1.5"]
+
+     is_ats = bool(resume and jd)
+
+     # Use realtime search only if not ATS prompt
+     search_snippets = None
+     if not is_ats and detect_realtime_prompt(prompt):
+         try:
+             search_snippets = get_google_snippets(prompt)
+         except Exception as e:
+             print("[Search Fallback Error]", e)
+
+     results = {}
+
+     if "GPT-4" in selected_models:
+         gpt_resp = get_gpt4_response(prompt, resume, jd, search_snippets)
+         results["GPT-4"] = {
+             "response": gpt_resp,
+             "reasoning": ask_reasoning("GPT-4", gpt_resp, prompt),
+             "search_results": search_snippets,
+             "is_ats": is_ats
+         }
+
+     if "Claude 3" in selected_models:
+         claude_resp = get_claude_response(prompt, resume, jd, search_snippets)
+         results["Claude 3"] = {
+             "response": claude_resp,
+             "reasoning": ask_reasoning("Claude 3", claude_resp, prompt),
+             "search_results": search_snippets,
+             "is_ats": is_ats
+         }
+
+     if "Gemini 1.5" in selected_models:
+         gemini_resp = get_gemini_response(prompt, resume, jd, search_snippets)
+         results["Gemini 1.5"] = {
+             "response": gemini_resp,
+             "reasoning": ask_reasoning("Gemini 1.5", gemini_resp, prompt),
+             "search_results": search_snippets,
+             "is_ats": is_ats
+         }
+
+     return results
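For reference, a minimal usage sketch of the new wrapper (API keys from `.env` are required; the prompt, resume, and job-description strings are placeholders). Passing both `resume` and `jd` switches `build_prompt` to the ATS template and skips the real-time search path, while a general prompt may instead trigger `detect_realtime_prompt` and the Google snippet fallback.

```python
from universal_model_wrapper import universal_model_responses

# General prompt: realtime detection (and, if positive, Google snippets) applies.
general = universal_model_responses(
    "What changed in this week's Python release?",
    selected_models=["GPT-4"],
)

# ATS prompt: resume + jd select the ATS template; the search path is skipped.
ats = universal_model_responses(
    prompt="Evaluate this candidate",
    resume="...resume text...",
    jd="...job description text...",
)

for model, data in ats.items():
    print(model, "| ATS mode:", data["is_ats"])
    print(data["response"][:200])
```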