dheeraxspide committed
Commit ff0df44 · Parent(s): df4b24a

Deploy to Hugging Face Spaces (Clean)
.gitignore CHANGED
@@ -39,3 +39,14 @@ ENV/
 # Large Data Files (Generated on startup)
 shl_recommender/data/*.index
 shl_recommender/data/*.pkl
+
+# Local Documentation (Fluff)
+project_report.txt
+submission_document.txt
+deployment_guide.md
+
+# Assignment Files & Scripts (Fluff)
+Gen_AI Dataset.xlsx
+SHL AI Intern RE Generative AI assignment.pdf
+assignment_text.txt
+push_to_hf.sh
deployment_guide.md DELETED
@@ -1,39 +0,0 @@
-
-# Deployment Guide: Hugging Face Spaces (Recommended)
-
-Since your application uses `sentence-transformers` and `faiss`, it requires more RAM than most standard free tiers (like Render/Vercel) provide. **Hugging Face Spaces** offers 16GB RAM for free, which is perfect for this.
-
-## Step 1: Create a Hugging Face Space
-1. Go to [huggingface.co/spaces](https://huggingface.co/spaces) and create a new Space.
-2. **Space Name**: `shl-assessment-api` (or similar).
-3. **License**: MIT (or whatever you prefer).
-4. **SDK**: Select **Docker**.
-5. **Visibility**: Public.
-
-## Step 2: Upload Files
-You can upload files directly via the browser or use git.
-The easiest way if you have the code locally is to just upload these specific files:
-1. `Dockerfile` (I just created this for you)
-2. `requirements.txt`
-3. `shl_recommender/` (The whole folder)
-4. `submission.csv` (Optional, but good to have)
-
-## Step 3: Set Secrets (Environment Variables)
-1. Go to the **Settings** tab of your Space.
-2. Scroll to **Variables and secrets**.
-3. Click **New Secret**.
-4. Name: `GOOGLE_API_KEY`
-5. Value: (Paste your Gemini API Key from your `.env` file)
-
-## Step 4: Access Your API
-Once the space builds (it takes a few minutes), your API will be live!
-- **API URL**: `https://<your-username>-<space-name>.hf.space/recommend`
-- **Swagger UI**: `https://<your-username>-<space-name>.hf.space/docs`
-
-## Alternative: Render.com (Might be tight on RAM)
-1. Push your code to GitHub.
-2. Create a new **Web Service** on Render.
-3. Connect your repo.
-4. Runtime: **Docker**.
-5. Add Environment Variable `GOOGLE_API_KEY`.
-6. *Note: Render's free tier has 512MB RAM. If your app crashes with "Out of Memory", switch to Hugging Face.*
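For reference, the `/recommend` endpoint from the deleted guide could be exercised with a short stdlib-only script like the one below. This is a sketch, not part of the repo: the `{"query": ...}` payload shape is an assumption inferred from the `RecommendRequest` model in `shl_recommender/src/app.py`, and the placeholder URL must be replaced with your actual Space hostname.

```python
import json
import urllib.request

# Placeholder from the guide -- substitute your own username and Space name.
BASE_URL = "https://<your-username>-<space-name>.hf.space"

def build_request(query: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the /recommend endpoint.

    The {"query": ...} payload shape is an assumption based on the
    RecommendRequest Pydantic model in shl_recommender/src/app.py.
    """
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/recommend",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    # Network call; only runs when executed directly against a live Space.
    req = build_request("Hiring a Java developer with strong communication skills")
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read()))
```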
project_report.txt DELETED
@@ -1,69 +0,0 @@
-
-SHL Assessment Recommendation System - Project Report
-
-1. Approach Overview
---------------------
-The goal was to build an intelligent recommendation system that maps natural language queries (job descriptions or role requirements) to specific SHL assessments. The solution is built using a Retrieval-Augmented Generation (RAG) pipeline with a hybrid search strategy.
-
-1.1 Data Pipeline
-- Ingestion: The SHL assessment catalog was scraped/ingested into a structured format (JSON).
-- Indexing:
-  - Semantic Index: Used `sentence-transformers/all-mpnet-base-v2` to create dense vector embeddings of assessment descriptions and metadata. These are stored in a FAISS index for efficient semantic retrieval.
-  - Keyword Index: Created a BM25 index using `rank_bm25` on tokenized assessment names and descriptions to capture exact keyword matches (crucial for specific tool names like "Excel" or "Java").
-
-1.2 Retrieval Engine (Hybrid Search)
-- Query Expansion: An LLM (Gemini 2.5 Flash via LangChain) is used to expand the user's query. It adds relevant synonyms and related skills (e.g., "Java" -> "Spring, Hibernate, OOP") to improve retrieval recall.
-- Hybrid Retrieval: The system performs two parallel searches:
-  1. Semantic Search (FAISS): Finds conceptually similar assessments.
-  2. Keyword Search (BM25): Finds exact matches for tools and technologies.
-- Fusion: Results are combined using Reciprocal Rank Fusion (RRF) to prioritize candidates that appear in both lists or rank highly in one.
-
-1.3 Reranking & Selection
-- The top candidates (e.g., 20) from the retrieval stage are passed to the LLM for reranking.
-- The LLM is provided with the full metadata (name, duration, description) and a strict prompt to select the best 5-10 assessments.
-- Logic: The prompt enforces client-specific priorities, such as preferring specific tool tests (e.g., "Microsoft Excel") over general role assessments when specific skills are mentioned.
-
-2. Problems Faced & Solutions
------------------------------
-
-2.1 Problem: Low Recall on Specific Tools
-- Issue: Initial semantic search often returned general "Software Engineer" assessments for queries asking specifically for "Java" or "Excel", missing the exact tool tests.
-- Solution: Implemented Hybrid Search with BM25. BM25 is excellent at exact keyword matching, ensuring that if "Excel" is in the query, "Microsoft Excel" assessments are retrieved even if their semantic vector is slightly different from the query's intent.
-
-2.2 Problem: LLM Output Formatting Errors
-- Issue: The LLM used for reranking occasionally returned indices as strings (e.g., `["1", "2"]`) instead of integers, or included markdown formatting that broke the JSON parser. This caused the inference script to crash.
-- Solution: Added robust error handling and type casting in `engine.py`. The code now sanitizes the LLM output (removing markdown backticks) and explicitly casts indices to integers before using them.
-
-2.3 Problem: Balancing Hard and Soft Skills
-- Issue: Queries often require a mix of technical skills (Hard Skills) and behavioral traits (Soft Skills). Simple retrieval might bias towards one.
-- Solution:
-  1. Query Expansion: The LLM explicitly expands the query to include terms for both.
-  2. Reranking Prompt: The system prompt explicitly instructs the LLM to "balance" the recommendations if the query implies multiple domains, ensuring a mix of "Knowledge & Skills" and "Personality & Behavior" tests.
-
-2.4 Problem: "Vibe-Coding" vs. Reliability
-- Issue: Relying solely on the LLM to "hallucinate" recommendations from memory is unreliable.
-- Solution: The system is grounded in the retrieved context. The LLM *only* selects from the retrieved candidates and does not invent new assessments. This ensures all recommendations are valid URLs from the catalog.
-
-3. Optimization Efforts
------------------------
-- Reciprocal Rank Fusion (RRF): Tuned the `k` parameter in RRF to balance the influence of keyword vs. semantic scores.
-- Prompt Engineering: Iterated on the Reranking prompt to strictly enforce the "minimum 5, maximum 10" constraint and to prioritize "Individual Test Solutions" over "Pre-packaged Solutions" as per requirements.
-- Performance: Used `faiss-cpu` for fast vector similarity search, ensuring the API responds within reasonable time limits.
-
-4. Repository Structure for Submission
---------------------------------------
-- `shl_recommender/`: Core package containing source code and data.
-  - `src/engine.py`: Main logic for the recommendation engine.
-  - `src/app.py`: FastAPI application serving the endpoints.
-  - `data/`: Contains the FAISS index, metadata pickle, and raw JSON.
-- `submission.csv`: The generated predictions for the test set.
-- `requirements.txt`: List of dependencies.
-- `project_report.txt`: This document.
-- `README.md`: Instructions for running the code.
-
-5. How to Run
--------------
-1. Install dependencies: `pip install -r requirements.txt`
-2. Set up `.env` with `GOOGLE_API_KEY`.
-3. Run the API: `python shl_recommender/src/app.py`
-4. Run inference: `python generate_submission.py`
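The Reciprocal Rank Fusion step described in section 1.2 of the deleted report can be sketched in a few lines of plain Python. This is an illustrative reconstruction, not the actual `engine.py` code; the candidate IDs are invented, and `k=60` is an assumption (the value commonly used for RRF), since the report only says the `k` parameter was tuned.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: combine several ranked lists of doc IDs.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked highly in either list (or present in both) float up.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: FAISS (semantic) and BM25 (keyword) candidate lists.
semantic = ["sw_engineer", "java_8", "python_basics"]
keyword = ["java_8", "excel", "sw_engineer"]
fused = rrf_fuse([semantic, keyword])  # "java_8" wins: high in both lists
```

Note how `java_8` outranks `sw_engineer` even though `sw_engineer` is first in the semantic list: appearing near the top of both lists beats a single first place, which is exactly the behaviour the report relies on.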
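The fix described in section 2.2 of the deleted report, sanitizing the reranker's output before parsing, can be illustrated as below. It is a minimal sketch of the idea, not the actual `engine.py` implementation; the fence-stripping regex and function name are assumptions.

```python
import json
import re

def parse_rerank_indices(llm_output: str) -> list:
    """Parse a list of candidate indices from raw LLM text.

    Handles the two failure modes mentioned in the report: markdown code
    fences wrapped around the JSON, and indices returned as strings.
    """
    # Strip leading/trailing ```json ... ``` style fences if present.
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", llm_output.strip())
    raw = json.loads(cleaned)
    # Cast every index to int so downstream list lookups don't crash.
    return [int(i) for i in raw]
```

With this guard, both `'[1, 2]'` and the problematic `'```json\n["1", "2"]\n```'` parse to the same list of integers.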
requirements.txt CHANGED
@@ -11,3 +11,4 @@ beautifulsoup4
 pandas
 openpyxl
 rank_bm25
+gradio
shl_recommender/src/app.py CHANGED
@@ -1,5 +1,6 @@
 import os
 import requests
+import gradio as gr
 from bs4 import BeautifulSoup
 from fastapi import FastAPI, HTTPException
 from pydantic import BaseModel
@@ -60,6 +61,52 @@ def recommend(request: RecommendRequest):
 
     return response
 
+# Gradio Interface Logic
+def gradio_recommend(query, url):
+    if not query and not url:
+        return "Please provide a query or a URL."
+
+    # Reuse the logic from the API
+    query_text = query
+    if url:
+        scraped_text = scrape_url(url)
+        if scraped_text:
+            if query_text:
+                query_text += f"\n\nContext from URL:\n{scraped_text}"
+            else:
+                query_text = scraped_text
+
+    if not query_text:
+        return "Could not extract text from URL."
+
+    results = engine.recommend(query_text, top_n=10)
+
+    # Format for display
+    output = []
+    for item in results:
+        output.append([item['name'], item['url'], ", ".join(item['test_type'])])
+
+    return output
+
+# Create Gradio Interface
+iface = gr.Interface(
+    fn=gradio_recommend,
+    inputs=[
+        gr.Textbox(lines=2, placeholder="Enter your job description or query here...", label="Query"),
+        gr.Textbox(placeholder="Optional: Paste a job posting URL", label="Job Posting URL")
+    ],
+    outputs=gr.Dataframe(headers=["Assessment Name", "URL", "Type"], label="Recommendations"),
+    title="SHL Assessment Recommender",
+    description="Enter a job description or URL to get SHL assessment recommendations.",
+    examples=[
+        ["Looking for a Java developer with good communication skills", ""],
+        ["", "https://www.shl.com/careers/"]
+    ]
+)
+
+# Mount Gradio App
+app = gr.mount_gradio_app(app, iface, path="/")
+
 if __name__ == "__main__":
     import uvicorn
     uvicorn.run(app, host="0.0.0.0", port=8002)
submission_document.txt DELETED
@@ -1,50 +0,0 @@
-
-SHL Assessment Recommendation System - Approach & Optimization Report
-
-1. Problem Statement & Solution Overview
-The objective was to build an intelligent recommendation system that maps natural language queries (job descriptions or role requirements) to relevant SHL assessments. The core challenge lies in bridging the semantic gap between high-level job descriptions (e.g., "hiring a Java dev") and specific assessment titles (e.g., "Java 8 - Knowledge & Skills").
-
-My solution implements a Retrieval-Augmented Generation (RAG) pipeline with a Hybrid Search strategy, leveraging both semantic understanding and exact keyword matching, followed by an LLM-based reranking step.
-
-2. Technical Approach
-
-2.1 Data Ingestion & Indexing
-- The SHL assessment catalog was ingested and structured into a JSON format.
-- Semantic Index: I used `sentence-transformers/all-mpnet-base-v2` to generate dense vector embeddings for each assessment's description and metadata. These are stored in a FAISS index for efficient similarity search.
-- Keyword Index: A BM25 index was created using `rank_bm25` to capture exact keyword matches. This is critical for distinguishing between specific tools (e.g., "Excel" vs. "PowerPoint") which might be semantically close but functionally distinct.
-
-2.2 Retrieval Engine (Hybrid Search)
-- Query Expansion: The user query is first processed by an LLM (Gemini 2.5 Flash) to expand it with relevant synonyms and related skills (e.g., "Frontend" -> "HTML, CSS, React, JavaScript").
-- Hybrid Retrieval: The system executes two parallel searches:
-  1. Semantic Search (FAISS): Retrieves assessments that are conceptually similar to the query.
-  2. Keyword Search (BM25): Retrieves assessments that contain exact matches for tools mentioned in the query.
-- Fusion: Results are combined using Reciprocal Rank Fusion (RRF) to produce a robust candidate list.
-
-2.3 Reranking & Selection
-- The top 20 candidates from the retrieval stage are passed to the LLM for reranking.
-- The LLM is provided with full metadata (name, duration, description) and a strict prompt to select the best 5-10 assessments based on client-specific priorities (e.g., prioritizing "Individual Test Solutions" over pre-packaged ones).
-
-3. Optimization Efforts & Performance Improvement
-
-3.1 Optimization Journey & Recall Improvements
-- Phase 1: Baseline Semantic Search (FAISS + LLM)
-  - Metric: Mean Recall@10 = 0.20
-  - Analysis: The initial approach used simple semantic search followed by LLM generation. The low recall was primarily due to **Ground Truth Alignment** issues. The semantic model retrieved conceptually similar assessments, but often missed the specific URLs defined in the ground truth, which required exact keyword matches.
-
-- Phase 2: LLM Filtering + FAISS + LLM Reranking
-  - Metric: Mean Recall@10 = 0.33
-  - Action: Introduced an LLM-based filtering step and a dedicated LLM reranking stage to refine the results.
-  - Result: The score improved to 0.33. However, analysis revealed that the bottleneck was not the presence of irrelevant "pre-packaged" solutions, but rather the strictness of the **Ground Truth**. The system was recommending valid, relevant assessments, but they were penalized because they didn't match the exact arbitrary URLs labeled in the dataset.
-
-- Phase 3: Query Expansion + Hybrid Search (FAISS + BM25) + LLM Reranking
-  - Metric: Mean Recall@10 = 0.41
-  - Action: Implemented the full pipeline: 1) LLM-based Query Expansion to bridge vocabulary gaps, 2) Hybrid Search (FAISS + BM25) to capture both semantic meaning and exact tool names, and 3) Final LLM Reranking.
-  - Result: This combination provided the best performance. BM25 fixed the exact match issues (e.g., "Excel"), Query Expansion handled synonyms, and Reranking ensured the final top 10 were highly relevant and balanced.
-  - **Critical Observation**: It was noted that the provided Ground Truth dataset contained approximately 11 "Pre-packaged Job Solution" URLs. This contradicts the assignment instruction to explicitly ignore/filter out pre-packaged solutions. Since our system correctly followed the instruction to filter these out, it naturally resulted in a "miss" against the ground truth for those specific entries, artificially capping the maximum possible recall score.
-
-3.2 Solving Specific Challenges
-- "Vibe-Coding" vs. Reliability: To prevent hallucination, the LLM is strictly constrained to select *only* from the retrieved candidates. It cannot invent new URLs.
-- Balancing Hard & Soft Skills: The reranking prompt was engineered to explicitly look for a mix of "Knowledge & Skills" and "Personality & Behavior" tests when the query implies a multi-faceted role (e.g., "Manager" or "Team Lead").
-
-4. Conclusion
-The final system is a robust, low-latency recommendation engine that effectively combines the precision of keyword search with the understanding of semantic search. The addition of LLM-based query expansion and reranking ensures that the recommendations are not just textually similar, but contextually relevant to the hiring need.
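The Mean Recall@10 figures quoted in section 3.1 of the deleted report follow from the usual definition of the metric: the fraction of ground-truth URLs recovered in the top 10 predictions, averaged over queries. Since the actual evaluation script is not shown in this commit, the helper below is a generic sketch under that assumed definition.

```python
def recall_at_k(predicted, relevant, k=10):
    """Fraction of ground-truth items found in the top-k predictions."""
    if not relevant:
        return 0.0
    hits = len(set(predicted[:k]) & set(relevant))
    return hits / len(relevant)

def mean_recall_at_k(runs, k=10):
    """Average Recall@k over (predicted, relevant) pairs, one per query."""
    return sum(recall_at_k(p, r, k) for p, r in runs) / len(runs)

# Toy example with two queries (invented URLs stand in for assessment links).
runs = [
    (["a", "b", "c"], ["a", "d"]),  # 1 of 2 relevant items found -> 0.5
    (["x", "y"], ["y", "z"]),       # 1 of 2 relevant items found -> 0.5
]
```

Under this definition, the "Critical Observation" in the report makes sense: any ground-truth URL the system deliberately filters out (the pre-packaged solutions) counts as a guaranteed miss in the denominator, capping the achievable score.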