dheeraxspide committed
Commit ff0df44 · Parent(s): df4b24a

Deploy to Hugging Face Spaces (Clean)
.gitignore CHANGED
@@ -39,3 +39,14 @@ ENV/
 # Large Data Files (Generated on startup)
 shl_recommender/data/*.index
 shl_recommender/data/*.pkl
+
+# Local Documentation (Fluff)
+project_report.txt
+submission_document.txt
+deployment_guide.md
+
+# Assignment Files & Scripts (Fluff)
+Gen_AI Dataset.xlsx
+SHL AI Intern RE Generative AI assignment.pdf
+assignment_text.txt
+push_to_hf.sh
deployment_guide.md DELETED
@@ -1,39 +0,0 @@
-
-# Deployment Guide: Hugging Face Spaces (Recommended)
-
-Since your application uses `sentence-transformers` and `faiss`, it requires more RAM than most standard free tiers (like Render/Vercel) provide. **Hugging Face Spaces** offers 16GB RAM for free, which is perfect for this.
-
-## Step 1: Create a Hugging Face Space
-1. Go to [huggingface.co/spaces](https://huggingface.co/spaces) and create a new Space.
-2. **Space Name**: `shl-assessment-api` (or similar).
-3. **License**: MIT (or whatever you prefer).
-4. **SDK**: Select **Docker**.
-5. **Visibility**: Public.
-
-## Step 2: Upload Files
-You can upload files directly via the browser or use git.
-The easiest way if you have the code locally is to just upload these specific files:
-1. `Dockerfile` (I just created this for you)
-2. `requirements.txt`
-3. `shl_recommender/` (The whole folder)
-4. `submission.csv` (Optional, but good to have)
-
-## Step 3: Set Secrets (Environment Variables)
-1. Go to the **Settings** tab of your Space.
-2. Scroll to **Variables and secrets**.
-3. Click **New Secret**.
-4. Name: `GOOGLE_API_KEY`
-5. Value: (Paste your Gemini API Key from your `.env` file)
-
-## Step 4: Access Your API
-Once the space builds (it takes a few minutes), your API will be live!
-- **API URL**: `https://<your-username>-<space-name>.hf.space/recommend`
-- **Swagger UI**: `https://<your-username>-<space-name>.hf.space/docs`
-
-## Alternative: Render.com (Might be tight on RAM)
-1. Push your code to GitHub.
-2. Create a new **Web Service** on Render.
-3. Connect your repo.
-4. Runtime: **Docker**.
-5. Add Environment Variable `GOOGLE_API_KEY`.
-6. *Note: Render's free tier has 512MB RAM. If your app crashes with "Out of Memory", switch to Hugging Face.*
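For reference, the `/recommend` endpoint from the deleted guide could be exercised with a short stdlib-only script like the one below. This is a sketch, not part of the repo: the `{"query": ...}` payload shape is an assumption inferred from the `RecommendRequest` model in `shl_recommender/src/app.py`, and the placeholder URL must be replaced with your actual Space hostname.

```python
import json
import urllib.request

# Placeholder from the guide -- substitute your own username and Space name.
BASE_URL = "https://<your-username>-<space-name>.hf.space"

def build_request(query: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the /recommend endpoint.

    The {"query": ...} payload shape is an assumption based on the
    RecommendRequest Pydantic model in shl_recommender/src/app.py.
    """
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/recommend",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    # Network call; only runs when executed directly against a live Space.
    req = build_request("Hiring a Java developer with strong communication skills")
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read()))
```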
project_report.txt DELETED
@@ -1,69 +0,0 @@
-
-SHL Assessment Recommendation System - Project Report
-
-1. Approach Overview
---------------------
-The goal was to build an intelligent recommendation system that maps natural language queries (job descriptions or role requirements) to specific SHL assessments. The solution is built using a Retrieval-Augmented Generation (RAG) pipeline with a hybrid search strategy.
-
-1.1 Data Pipeline
-- Ingestion: The SHL assessment catalog was scraped/ingested into a structured format (JSON).
-- Indexing:
-  - Semantic Index: Used `sentence-transformers/all-mpnet-base-v2` to create dense vector embeddings of assessment descriptions and metadata. These are stored in a FAISS index for efficient semantic retrieval.
-  - Keyword Index: Created a BM25 index using `rank_bm25` on tokenized assessment names and descriptions to capture exact keyword matches (crucial for specific tool names like "Excel" or "Java").
-
-1.2 Retrieval Engine (Hybrid Search)
-- Query Expansion: An LLM (Gemini 2.5 Flash via LangChain) is used to expand the user's query. It adds relevant synonyms and related skills (e.g., "Java" -> "Spring, Hibernate, OOP") to improve retrieval recall.
-- Hybrid Retrieval: The system performs two parallel searches:
-  1. Semantic Search (FAISS): Finds conceptually similar assessments.
-  2. Keyword Search (BM25): Finds exact matches for tools and technologies.
-- Fusion: Results are combined using Reciprocal Rank Fusion (RRF) to prioritize candidates that appear in both lists or rank highly in one.
-
-1.3 Reranking & Selection
-- The top candidates (e.g., 20) from the retrieval stage are passed to the LLM for reranking.
-- The LLM is provided with the full metadata (name, duration, description) and a strict prompt to select the best 5-10 assessments.
-- Logic: The prompt enforces client-specific priorities, such as preferring specific tool tests (e.g., "Microsoft Excel") over general role assessments when specific skills are mentioned.
-
-2. Problems Faced & Solutions
------------------------------
-
-2.1 Problem: Low Recall on Specific Tools
-- Issue: Initial semantic search often returned general "Software Engineer" assessments for queries asking specifically for "Java" or "Excel", missing the exact tool tests.
-- Solution: Implemented Hybrid Search with BM25. BM25 is excellent at exact keyword matching, ensuring that if "Excel" is in the query, "Microsoft Excel" assessments are retrieved even if their semantic vector is slightly different from the query's intent.
-
-2.2 Problem: LLM Output Formatting Errors
-- Issue: The LLM used for reranking occasionally returned indices as strings (e.g., `["1", "2"]`) instead of integers, or included markdown formatting that broke the JSON parser. This caused the inference script to crash.
-- Solution: Added robust error handling and type casting in `engine.py`. The code now sanitizes the LLM output (removing markdown backticks) and explicitly casts indices to integers before using them.
-
-2.3 Problem: Balancing Hard and Soft Skills
-- Issue: Queries often require a mix of technical skills (Hard Skills) and behavioral traits (Soft Skills). Simple retrieval might bias towards one.
-- Solution:
-  1. Query Expansion: The LLM explicitly expands the query to include terms for both.
-  2. Reranking Prompt: The system prompt explicitly instructs the LLM to "balance" the recommendations if the query implies multiple domains, ensuring a mix of "Knowledge & Skills" and "Personality & Behavior" tests.
-
-2.4 Problem: "Vibe-Coding" vs. Reliability
-- Issue: Relying solely on the LLM to "hallucinate" recommendations from memory is unreliable.
-- Solution: The system is grounded in the retrieved context. The LLM *only* selects from the retrieved candidates and does not invent new assessments. This ensures all recommendations are valid URLs from the catalog.
-
-3. Optimization Efforts
------------------------
-- Reciprocal Rank Fusion (RRF): Tuned the `k` parameter in RRF to balance the influence of keyword vs. semantic scores.
-- Prompt Engineering: Iterated on the Reranking prompt to strictly enforce the "minimum 5, maximum 10" constraint and to prioritize "Individual Test Solutions" over "Pre-packaged Solutions" as per requirements.
-- Performance: Used `faiss-cpu` for fast vector similarity search, ensuring the API responds within reasonable time limits.
-
-4. Repository Structure for Submission
---------------------------------------
-- `shl_recommender/`: Core package containing source code and data.
-  - `src/engine.py`: Main logic for the recommendation engine.
-  - `src/app.py`: FastAPI application serving the endpoints.
-  - `data/`: Contains the FAISS index, metadata pickle, and raw JSON.
-- `submission.csv`: The generated predictions for the test set.
-- `requirements.txt`: List of dependencies.
-- `project_report.txt`: This document.
-- `README.md`: Instructions for running the code.
-
-5. How to Run
--------------
-1. Install dependencies: `pip install -r requirements.txt`
-2. Set up `.env` with `GOOGLE_API_KEY`.
-3. Run the API: `python shl_recommender/src/app.py`
-4. Run inference: `python generate_submission.py`
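The Reciprocal Rank Fusion step described in section 1.2 of the deleted report can be sketched in a few lines of plain Python. This is an illustrative reconstruction, not the actual `engine.py` code; the candidate IDs are invented, and `k=60` is an assumption (the value commonly used for RRF), since the report only says the `k` parameter was tuned.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: combine several ranked lists of doc IDs.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked highly in either list (or present in both) float up.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: FAISS (semantic) and BM25 (keyword) candidate lists.
semantic = ["sw_engineer", "java_8", "python_basics"]
keyword = ["java_8", "excel", "sw_engineer"]
fused = rrf_fuse([semantic, keyword])  # "java_8" wins: high in both lists
```

Note how `java_8` outranks `sw_engineer` even though `sw_engineer` is first in the semantic list: appearing near the top of both lists beats a single first place, which is exactly the behaviour the report relies on.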
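The fix described in section 2.2 of the deleted report, sanitizing the reranker's output before parsing, can be illustrated as below. It is a minimal sketch of the idea, not the actual `engine.py` implementation; the fence-stripping regex and function name are assumptions.

```python
import json
import re

def parse_rerank_indices(llm_output: str) -> list:
    """Parse a list of candidate indices from raw LLM text.

    Handles the two failure modes mentioned in the report: markdown code
    fences wrapped around the JSON, and indices returned as strings.
    """
    # Strip leading/trailing ```json ... ``` style fences if present.
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", llm_output.strip())
    raw = json.loads(cleaned)
    # Cast every index to int so downstream list lookups don't crash.
    return [int(i) for i in raw]
```

With this guard, both `'[1, 2]'` and the problematic `'```json\n["1", "2"]\n```'` parse to the same list of integers.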
requirements.txt CHANGED
@@ -11,3 +11,4 @@ beautifulsoup4
 pandas
 openpyxl
 rank_bm25
+gradio
shl_recommender/src/app.py CHANGED
@@ -1,5 +1,6 @@
 import os
 import requests
+import gradio as gr
 from bs4 import BeautifulSoup
 from fastapi import FastAPI, HTTPException
 from pydantic import BaseModel
@@ -60,6 +61,52 @@ def recommend(request: RecommendRequest):
 
     return response
 
+# Gradio Interface Logic
+def gradio_recommend(query, url):
+    if not query and not url:
+        return "Please provide a query or a URL."
+
+    # Reuse the logic from the API
+    query_text = query
+    if url:
+        scraped_text = scrape_url(url)
+        if scraped_text:
+            if query_text:
+                query_text += f"\n\nContext from URL:\n{scraped_text}"
+            else:
+                query_text = scraped_text
+
+    if not query_text:
+        return "Could not extract text from URL."
+
+    results = engine.recommend(query_text, top_n=10)
+
+    # Format for display
+    output = []
+    for item in results:
+        output.append([item['name'], item['url'], ", ".join(item['test_type'])])
+
+    return output
+
+# Create Gradio Interface
+iface = gr.Interface(
+    fn=gradio_recommend,
+    inputs=[
+        gr.Textbox(lines=2, placeholder="Enter your job description or query here...", label="Query"),
+        gr.Textbox(placeholder="Optional: Paste a job posting URL", label="Job Posting URL")
+    ],
+    outputs=gr.Dataframe(headers=["Assessment Name", "URL", "Type"], label="Recommendations"),
+    title="SHL Assessment Recommender",
+    description="Enter a job description or URL to get SHL assessment recommendations.",
+    examples=[
+        ["Looking for a Java developer with good communication skills", ""],
+        ["", "https://www.shl.com/careers/"]
+    ]
+)
+
+# Mount Gradio App
+app = gr.mount_gradio_app(app, iface, path="/")
+
 if __name__ == "__main__":
     import uvicorn
     uvicorn.run(app, host="0.0.0.0", port=8002)
submission_document.txt DELETED
@@ -1,50 +0,0 @@
-
-SHL Assessment Recommendation System - Approach & Optimization Report
-
-1. Problem Statement & Solution Overview
-The objective was to build an intelligent recommendation system that maps natural language queries (job descriptions or role requirements) to relevant SHL assessments. The core challenge lies in bridging the semantic gap between high-level job descriptions (e.g., "hiring a Java dev") and specific assessment titles (e.g., "Java 8 - Knowledge & Skills").
-
-My solution implements a Retrieval-Augmented Generation (RAG) pipeline with a Hybrid Search strategy, leveraging both semantic understanding and exact keyword matching, followed by an LLM-based reranking step.
-
-2. Technical Approach
-
-2.1 Data Ingestion & Indexing
-- The SHL assessment catalog was ingested and structured into a JSON format.
-- Semantic Index: I used `sentence-transformers/all-mpnet-base-v2` to generate dense vector embeddings for each assessment's description and metadata. These are stored in a FAISS index for efficient similarity search.
-- Keyword Index: A BM25 index was created using `rank_bm25` to capture exact keyword matches. This is critical for distinguishing between specific tools (e.g., "Excel" vs. "PowerPoint") which might be semantically close but functionally distinct.
-
-2.2 Retrieval Engine (Hybrid Search)
-- Query Expansion: The user query is first processed by an LLM (Gemini 2.5 Flash) to expand it with relevant synonyms and related skills (e.g., "Frontend" -> "HTML, CSS, React, JavaScript").
-- Hybrid Retrieval: The system executes two parallel searches:
-  1. Semantic Search (FAISS): Retrieves assessments that are conceptually similar to the query.
-  2. Keyword Search (BM25): Retrieves assessments that contain exact matches for tools mentioned in the query.
-- Fusion: Results are combined using Reciprocal Rank Fusion (RRF) to produce a robust candidate list.
-
-2.3 Reranking & Selection
-- The top 20 candidates from the retrieval stage are passed to the LLM for reranking.
-- The LLM is provided with full metadata (name, duration, description) and a strict prompt to select the best 5-10 assessments based on client-specific priorities (e.g., prioritizing "Individual Test Solutions" over pre-packaged ones).
-
-3. Optimization Efforts & Performance Improvement
-
-3.1 Optimization Journey & Recall Improvements
-- Phase 1: Baseline Semantic Search (FAISS + LLM)
-  - Metric: Mean Recall@10 = 0.20
-  - Analysis: The initial approach used simple semantic search followed by LLM generation. The low recall was primarily due to **Ground Truth Alignment** issues. The semantic model retrieved conceptually similar assessments, but often missed the specific URLs defined in the ground truth, which required exact keyword matches.
-
-- Phase 2: LLM Filtering + FAISS + LLM Reranking
-  - Metric: Mean Recall@10 = 0.33
-  - Action: Introduced an LLM-based filtering step and a dedicated LLM reranking stage to refine the results.
-  - Result: The score improved to 0.33. However, analysis revealed that the bottleneck was not the presence of irrelevant "pre-packaged" solutions, but rather the strictness of the **Ground Truth**. The system was recommending valid, relevant assessments, but they were penalized because they didn't match the exact arbitrary URLs labeled in the dataset.
-
-- Phase 3: Query Expansion + Hybrid Search (FAISS + BM25) + LLM Reranking
-  - Metric: Mean Recall@10 = 0.41
-  - Action: Implemented the full pipeline: 1) LLM-based Query Expansion to bridge vocabulary gaps, 2) Hybrid Search (FAISS + BM25) to capture both semantic meaning and exact tool names, and 3) Final LLM Reranking.
-  - Result: This combination provided the best performance. BM25 fixed the exact match issues (e.g., "Excel"), Query Expansion handled synonyms, and Reranking ensured the final top 10 were highly relevant and balanced.
-  - **Critical Observation**: It was noted that the provided Ground Truth dataset contained approximately 11 "Pre-packaged Job Solution" URLs. This contradicts the assignment instruction to explicitly ignore/filter out pre-packaged solutions. Since our system correctly followed the instruction to filter these out, it naturally resulted in a "miss" against the ground truth for those specific entries, artificially capping the maximum possible recall score.
-
-3.2 Solving Specific Challenges
-- "Vibe-Coding" vs. Reliability: To prevent hallucination, the LLM is strictly constrained to select *only* from the retrieved candidates. It cannot invent new URLs.
-- Balancing Hard & Soft Skills: The reranking prompt was engineered to explicitly look for a mix of "Knowledge & Skills" and "Personality & Behavior" tests when the query implies a multi-faceted role (e.g., "Manager" or "Team Lead").
-
-4. Conclusion
-The final system is a robust, low-latency recommendation engine that effectively combines the precision of keyword search with the understanding of semantic search. The addition of LLM-based query expansion and reranking ensures that the recommendations are not just textually similar, but contextually relevant to the hiring need.
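The Mean Recall@10 figures quoted in section 3.1 of the deleted report follow from the usual definition of the metric: the fraction of ground-truth URLs recovered in the top 10 predictions, averaged over queries. Since the actual evaluation script is not shown in this commit, the helper below is a generic sketch under that assumed definition.

```python
def recall_at_k(predicted, relevant, k=10):
    """Fraction of ground-truth items found in the top-k predictions."""
    if not relevant:
        return 0.0
    hits = len(set(predicted[:k]) & set(relevant))
    return hits / len(relevant)

def mean_recall_at_k(runs, k=10):
    """Average Recall@k over (predicted, relevant) pairs, one per query."""
    return sum(recall_at_k(p, r, k) for p, r in runs) / len(runs)

# Toy example with two queries (invented URLs stand in for assessment links).
runs = [
    (["a", "b", "c"], ["a", "d"]),  # 1 of 2 relevant items found -> 0.5
    (["x", "y"], ["y", "z"]),       # 1 of 2 relevant items found -> 0.5
]
```

Under this definition, the "Critical Observation" in the report makes sense: any ground-truth URL the system deliberately filters out (the pre-packaged solutions) counts as a guaranteed miss in the denominator, capping the achievable score.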