---
title: shl-ai-agent
sdk: docker
app_port: 7860
license: mit
---

# SHL Assessment Recommendation Agent

A conversational AI agent for recommending SHL psychometric assessments from the SHL Individual Test Solutions catalog.
Built for the SHL Research Intern, AI Assignment, and deployed on Hugging Face Spaces using Docker.

---
## What it does

- Accepts a full conversation history via `POST /chat` and returns a recommendation reply.
- Recommends 1–10 SHL assessments per response when enough context is available.
- Asks clarifying questions when the query is vague.
- Refuses off-topic requests (legal advice, compensation, prompt-injection).
- Tracks constraints across conversation turns (role, seniority, domain, language).
- Returns `end_of_conversation: true` when the user confirms the shortlist.
- Stateless: no server-side session storage.

---
## API Schema

### `GET /health`

```json
{"status": "ok"}
```

### `POST /chat`

**Request:**

```json
{
  "messages": [
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."}
  ]
}
```

**Response:**

```json
{
  "reply": "string",
  "recommendations": [
    {"name": "string", "url": "string", "test_type": "string"}
  ],
  "end_of_conversation": false
}
```

- `recommendations` is `[]` when clarifying or refusing.
- `recommendations` has 1–10 items when shortlisting.
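
Because the service is stateless, a client has to resend the whole history on every turn. The following is a minimal standard-library sketch of such a client, not part of the project itself; the helper names `build_chat_request`, `send_chat`, and `next_history` are hypothetical and only illustrate the request shape documented above.

```python
import json
import urllib.request


def build_chat_request(base_url: str, messages: list) -> urllib.request.Request:
    """Build a POST /chat request carrying the full conversation history."""
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(base_url: str, messages: list, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_chat_request(base_url, messages)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def next_history(messages: list, reply: str, user_text: str) -> list:
    """Append the assistant reply and the next user turn for the following call."""
    return messages + [
        {"role": "assistant", "content": reply},
        {"role": "user", "content": user_text},
    ]
```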
---

## Project Structure

```
shl-agent/
├── app/
│   ├── __init__.py          # Package marker
│   ├── main.py              # FastAPI app, routes, lifespan
│   ├── schemas.py           # Pydantic request/response models
│   ├── agent.py             # LLM orchestration, refusal logic, response parsing
│   ├── retrieval.py         # TF-IDF index build + query
│   └── catalog_loader.py    # Catalog I/O and validation
├── data/
│   └── shl_catalog.json     # SHL catalog (35 items extracted from sample conversations)
├── scripts/
│   └── build_index.py       # Precompute TF-IDF index artifacts
├── tests/
│   ├── sample_requests.json # 10 test scenarios
│   └── evaluate.py          # Automated evaluation script
├── Dockerfile
├── requirements.txt
├── .gitignore
└── README.md
```

---
## Local Setup and Run

### Prerequisites

- Python 3.11+
- An Anthropic API key with access to the `claude-sonnet-4-20250514` model

### Steps

```bash
# 1. Clone the repo
git clone <your-repo-url>
cd shl-agent
# 2. Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate  # Linux/macOS
# .venv\Scripts\activate   # Windows
# 3. Install dependencies
pip install -r requirements.txt
# 4. Set your API key
export ANTHROPIC_API_KEY="sk-ant-..."
# 5. (Optional but recommended) Pre-build the TF-IDF index
python scripts/build_index.py
# 6. Start the server
uvicorn app.main:app --host 0.0.0.0 --port 7860 --reload
```

The server is then available at `http://localhost:7860`.

### Docker Local Run

```bash
# Build the Docker image
docker build -t shl-agent .
# Run the container with your API key
docker run -p 7860:7860 -e ANTHROPIC_API_KEY="sk-ant-..." shl-agent
```

---
## Curl Commands

### Health check

```bash
curl http://localhost:7860/health
# Expected: {"status":"ok"}
```

### Vague query (should clarify)

```bash
curl -X POST http://localhost:7860/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "We need a solution for senior leadership."}
    ]
  }'
```

### Clear query (should recommend)

```bash
curl -X POST http://localhost:7860/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "I need a cognitive ability test and personality test for graduate management trainees."}
    ]
  }'
```

### Multi-turn conversation (add constraint)

```bash
curl -X POST http://localhost:7860/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "I need a cognitive ability test and personality test for graduate management trainees."},
      {"role": "assistant", "content": "For graduate trainees I recommend Verify G+ and OPQ32r."},
      {"role": "user", "content": "Can you also add a situational judgement element?"}
    ]
  }'
```

### Comparison question

```bash
curl -X POST http://localhost:7860/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the difference between OPQ32r and OPQ MQ Sales Report?"}
    ]
  }'
```

### Off-topic refusal

```bash
curl -X POST http://localhost:7860/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Are we legally required under HIPAA to test all staff?"}
    ]
  }'
```

---
## Hugging Face Spaces Deployment

### Prerequisites

- A Hugging Face account and a Space (Docker SDK).
- `huggingface_hub` CLI installed: `pip install huggingface_hub`.
- Your Anthropic API key added as a **Space Secret** (not in code).

### Add the API key as a Space Secret

In your Space settings → Secrets → add:

```
ANTHROPIC_API_KEY = sk-ant-...
```

### Git commands to push to Hugging Face Spaces

```bash
# 1. Enable git-lfs hooks (required for HF; git-lfs itself must already be installed)
git lfs install
# 2. Clone your HF Space repo
git clone https://huggingface.co/spaces/<your-username>/shl-ai-agent
cd shl-ai-agent
# 3. Copy project files into the cloned repo
#    Note: the * glob skips dotfiles such as .gitignore; copy those separately if needed.
cp -r /path/to/shl-agent/* .
# 4. Commit and push
git add .
git commit -m "Initial deployment: SHL AI Agent"
git push
# HF will build the Docker image automatically on push.
# Monitor the build in: https://huggingface.co/spaces/<your-username>/shl-ai-agent
```

---
## Running the Evaluation Script

```bash
# Against local server
python tests/evaluate.py --base-url http://localhost:7860
# Against deployed HF Space
python tests/evaluate.py --base-url https://<your-username>-shl-ai-agent.hf.space
```

The script exits with code 0 when every scenario passes and code 1 on any failure, which makes it suitable for CI.

---
## Common Deployment Mistakes on HF Spaces (and how to avoid them)

| Mistake | Fix |
|---------|-----|
| Binding to `127.0.0.1` instead of `0.0.0.0` | Always use `--host 0.0.0.0` in uvicorn CMD |
| Wrong port | `app_port: 7860` in README front matter must match Dockerfile EXPOSE and uvicorn `--port` |
| API key in code | Set as a Space Secret; read via `os.environ.get("ANTHROPIC_API_KEY")` |
| Running as root | Add `useradd` and `USER` in Dockerfile |
| Importing heavy ML libraries (torch) | We use scikit-learn only, which stays within HF free-tier RAM |
| Cold build takes too long | Pre-build TF-IDF index in Dockerfile (`RUN python scripts/build_index.py`) |
| Missing README YAML front matter | The `---` block must be the first thing in README.md |
| git-lfs not installed | Run `git lfs install` before cloning HF repo |
| Forgetting to set Space to Public | Public required for evaluator to reach your endpoint |
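
For the "API key in code" row, a fail-fast read of the secret could look like the sketch below. The helper name `load_api_key` and the error message are hypothetical; only the `os.environ.get("ANTHROPIC_API_KEY")` lookup comes from the table above.

```python
import os


def load_api_key(env_var: str = "ANTHROPIC_API_KEY") -> str:
    """Return the API key from the environment, failing fast if it is absent."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Surfacing this at startup is easier to debug than a failed LLM call later.
        raise RuntimeError(
            f"{env_var} is not set; add it as a Space Secret or export it locally."
        )
    return key
```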
---

## Approach Document

### 1. Problem Framing

Given a multi-turn conversation between an HR professional and an AI agent, the system must recommend the most relevant SHL psychometric assessments from a fixed catalog. The agent must handle vague queries, accumulate constraints, support comparisons, and refuse off-topic requests.

### 2. Data Ingestion

The SHL catalog is stored as a structured JSON file (`data/shl_catalog.json`) with 35 items extracted from the 10 provided sample conversations. Each item has: `name`, `url`, `test_type`, `description`, `duration`, `languages`, `keys`, `seniority`, and `domains`. The catalog is the single source of truth: no external APIs are called for catalog data.

### 3. Retrieval Design

We use TF-IDF (bigrams) over rich document strings constructed from all catalog fields. The query is the concatenation of all user messages, with the latest message doubled for recency bias. Similarity is cosine similarity computed via `linear_kernel` (a plain dot product, which equals cosine similarity because scikit-learn L2-normalizes TF-IDF vectors by default). The top 10 results above a score threshold of 0.05 are injected into the system prompt as context.

**Why TF-IDF over sentence-transformers?** At 35 items, neural embeddings provide minimal recall benefit while adding ~2 GB of model weight. TF-IDF with bigrams and rich field concatenation is fast, transparent, and interview-defensible.
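
The query construction and the top-k cut-off described above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in `retrieval.py`: the function names `build_query` and `select_top` are hypothetical, and `scores` stands in for the similarity row that `linear_kernel` would return for the query against the catalog.

```python
def build_query(messages: list) -> str:
    """Join all user messages, repeating the latest one to weight it more heavily."""
    user_texts = [m["content"] for m in messages if m.get("role") == "user"]
    if not user_texts:
        return ""
    # Appending the last message a second time doubles its term frequencies.
    return " ".join(user_texts + [user_texts[-1]])


def select_top(scores: list, k: int = 10, threshold: float = 0.05) -> list:
    """Return indices of the k highest scores above the threshold, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked if scores[i] > threshold][:k]
```

For example, `select_top([0.01, 0.3, 0.05, 0.2])` keeps indices 1 and 3 only, since a score of exactly 0.05 is not above the threshold.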
### 4. Agent Policy and Decision Logic

The LLM (Claude Sonnet) receives:

- A fixed system prompt defining scope, refusal rules, and output format.
- Retrieved catalog items as grounding context.
- Full conversation history.

The system prompt uses XML output tags (`<reply>`, `<recommendations>`, `<end_of_conversation>`) for reliable parsing. This avoids the fragility of parsing JSON emitted directly by the LLM.
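
A tolerant parser for those tags might look like the sketch below. The layout inside `<recommendations>` is not specified in this document, so the sketch assumes one `name | url | test_type` line per item; that layout and the function names `extract_tag` and `parse_llm_output` are hypothetical.

```python
import re
from typing import Optional


def extract_tag(text: str, tag: str) -> Optional[str]:
    """Return the stripped contents of the first <tag>...</tag> pair, or None."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def parse_llm_output(text: str) -> dict:
    """Map tagged LLM output onto the /chat response fields."""
    reply = extract_tag(text, "reply")
    raw_recs = extract_tag(text, "recommendations")
    eoc = extract_tag(text, "end_of_conversation")

    recommendations = []
    # Assumed layout: one "name | url | test_type" line per recommendation.
    for line in (raw_recs or "").splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            name, url, test_type = parts
            recommendations.append({"name": name, "url": url, "test_type": test_type})

    return {
        # Fall back to the raw text if the model omitted the <reply> tag.
        "reply": reply if reply is not None else text.strip(),
        "recommendations": recommendations[:10],
        "end_of_conversation": (eoc or "").lower() == "true",
    }
```

Malformed lines are skipped rather than raising, so a partially well-formed answer still yields a usable reply.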
### 5. Scope Control and Refusal Strategy

Two layers:

1. **Pre-LLM regex guard**: checks the latest user message against known refusal patterns (legal, compensation, prompt-injection). It fires before any LLM call, so a refusal costs zero tokens.
2. **System prompt instructions**: tells the LLM to refuse anything outside the catalog scope. Belt-and-suspenders.

URL validation post-parse: every URL returned by the LLM is checked against the catalog URL set. Non-catalog URLs are silently dropped. This eliminates hallucinated URLs.
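
Both the regex guard and the URL filter fit in a few lines. The sketch below is illustrative only: the patterns in `REFUSAL_PATTERNS` are hypothetical examples (the project's actual list is not shown in this document), as are the function names.

```python
import re

# Hypothetical patterns for illustration; the real list lives in the project code.
REFUSAL_PATTERNS = [
    re.compile(r"\blegal(ly)?\b|\blawsuit\b|\bHIPAA\b", re.IGNORECASE),
    re.compile(r"\bsalary\b|\bcompensation\b|\bpay (band|scale)\b", re.IGNORECASE),
    re.compile(r"ignore (all |your )?(previous|prior) instructions", re.IGNORECASE),
]


def should_refuse(messages: list) -> bool:
    """Check only the latest user message against the refusal patterns."""
    for message in reversed(messages):
        if message.get("role") == "user":
            return any(p.search(message["content"]) for p in REFUSAL_PATTERNS)
    return False


def drop_unknown_urls(recommendations: list, catalog_urls: set) -> list:
    """Keep only recommendations whose URL appears in the catalog URL set."""
    return [r for r in recommendations if r.get("url") in catalog_urls]
```

With these example patterns, the HIPAA question from the curl section is caught by the first pattern and never reaches the LLM.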
### 6. Evaluation Strategy

10 test scenarios covering: vague→clarify, clear→recommend, add constraint→refine, comparison, off-topic refusal, prompt injection, EOC detection, technical roles, high-volume screening, compensation refusal. Each test checks: recommendations empty/non-empty, end_of_conversation flag, reply non-empty, URL format.
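
The four per-test checks could be expressed as a single function along these lines. This is a sketch, not the contents of `tests/evaluate.py`: the name `check_response`, its parameters, and the assumption that a well-formed URL starts with `https://` are all hypothetical.

```python
def check_response(response: dict, expect_recommendations: bool, expect_eoc: bool) -> list:
    """Return a list of failure messages; an empty list means the scenario passed."""
    failures = []

    reply = response.get("reply")
    if not isinstance(reply, str) or not reply.strip():
        failures.append("reply is empty")

    recs = response.get("recommendations")
    if not isinstance(recs, list):
        failures.append("recommendations is not a list")
        recs = []
    if expect_recommendations and not 1 <= len(recs) <= 10:
        failures.append("expected 1-10 recommendations")
    if not expect_recommendations and recs:
        failures.append("expected no recommendations")

    for rec in recs:
        url = rec.get("url") if isinstance(rec, dict) else None
        # Assumption: catalog URLs are absolute https:// links.
        if not (isinstance(url, str) and url.startswith("https://")):
            failures.append(f"bad URL format: {url!r}")

    if response.get("end_of_conversation") is not expect_eoc:
        failures.append("end_of_conversation flag mismatch")
    return failures
```

A runner can collect the failures across all scenarios and exit with status 1 if any list is non-empty, which matches the CI-friendly exit codes described earlier.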
### 7. Trade-offs and Future Improvements

| Current | Improvement |
|---------|-------------|
| TF-IDF retrieval | Sentence-transformers + FAISS for semantic recall on large catalogs |
| Static JSON catalog | Live SHL API feed with caching |
| Regex refusal guards | Fine-tuned classifier for nuanced refusal detection |
| Single-worker uvicorn | Gunicorn + multiple uvicorn workers for throughput |
| Closing phrase heuristic | LLM-based intent classification for EOC detection |

### 8. Use of AI Assistance

Claude (Anthropic) was used for: scaffolding boilerplate (FastAPI app structure, Dockerfile patterns), suggesting retrieval approaches, and reviewing code for obvious bugs. All architecture decisions, retrieval design, refusal logic, schema choices, and system prompt engineering were made by the developer and are fully explainable and reviewable.