---
title: shl-ai-agent
sdk: docker
app_port: 7860
license: mit
---

# SHL Assessment Recommendation Agent

A conversational AI agent for recommending SHL psychometric assessments from the SHL Individual Test Solutions catalog. Built for the SHL Research Intern, AI Assignment, and deployed on Hugging Face Spaces using Docker.

---

## What it does

- Accepts a full conversation history via `POST /chat` and returns a recommendation reply.
- Recommends 1–10 SHL assessments per response when enough context is available.
- Asks clarifying questions when the query is vague.
- Refuses off-topic requests (legal advice, compensation, prompt injection).
- Tracks constraints across conversation turns (role, seniority, domain, language).
- Returns `end_of_conversation: true` when the user confirms the shortlist.
- Stateless: no server-side session storage.

---

## API Schema

### `GET /health`

```json
{"status": "ok"}
```

### `POST /chat`

**Request:**

```json
{
  "messages": [
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."}
  ]
}
```

**Response:**

```json
{
  "reply": "string",
  "recommendations": [
    {"name": "string", "url": "string", "test_type": "string"}
  ],
  "end_of_conversation": false
}
```

- `recommendations` is `[]` when clarifying or refusing.
- `recommendations` has 1–10 items when shortlisting.
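
Because the service is stateless, the client has to resend the whole conversation on every call. The following is a minimal client-side sketch of that loop using only the Python standard library. The helper names (`build_chat_payload`, `post_chat`, `append_reply`) are hypothetical and not part of this repository; only the request and response fields come from the schema above.

```python
import json
import urllib.request


def build_chat_payload(history, user_message):
    """Return the JSON body for POST /chat: prior turns plus the new user turn."""
    messages = [dict(m) for m in history]  # copy so the caller's list is untouched
    messages.append({"role": "user", "content": user_message})
    return {"messages": messages}


def post_chat(base_url, payload, timeout=30):
    """Send the payload to POST /chat and return the decoded JSON response."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def append_reply(payload, response):
    """Fold the assistant reply into the history for the next turn."""
    return payload["messages"] + [
        {"role": "assistant", "content": response["reply"]}
    ]
```

A typical loop would call `build_chat_payload`, pass the result to `post_chat("http://localhost:7860", payload)`, then use `append_reply` to produce the history for the next turn, stopping once `end_of_conversation` is `true`.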
---

## Project Structure

```
shl-agent/
├── app/
│   ├── __init__.py          # Package marker
│   ├── main.py              # FastAPI app, routes, lifespan
│   ├── schemas.py           # Pydantic request/response models
│   ├── agent.py             # LLM orchestration, refusal logic, response parsing
│   ├── retrieval.py         # TF-IDF index build + query
│   └── catalog_loader.py    # Catalog I/O and validation
├── data/
│   └── shl_catalog.json     # SHL catalog (35 items extracted from sample conversations)
├── scripts/
│   └── build_index.py       # Precompute TF-IDF index artifacts
├── tests/
│   ├── sample_requests.json # 10 test scenarios
│   └── evaluate.py          # Automated evaluation script
├── Dockerfile
├── requirements.txt
├── .gitignore
└── README.md
```

---

## Local Setup and Run

### Prerequisites

- Python 3.11+
- An Anthropic API key (the agent uses the `claude-sonnet-4-20250514` model)

### Steps

```bash
# 1. Clone the repo
git clone
cd shl-agent

# 2. Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate   # Linux/macOS
# .venv\Scripts\activate    # Windows

# 3. Install dependencies
pip install -r requirements.txt

# 4. Set your API key
export ANTHROPIC_API_KEY="sk-ant-..."

# 5. (Optional but recommended) Pre-build the TF-IDF index
python scripts/build_index.py

# 6. Start the server
uvicorn app.main:app --host 0.0.0.0 --port 7860 --reload
```

The server is now running at `http://localhost:7860`.

### Docker Local Run

```bash
# Build the Docker image
docker build -t shl-agent .

# Run the container with your API key
docker run -p 7860:7860 -e ANTHROPIC_API_KEY="sk-ant-..." \
shl-agent
```

---

## Curl Commands

### Health check

```bash
curl http://localhost:7860/health
# Expected: {"status":"ok"}
```

### Vague query (should clarify)

```bash
curl -X POST http://localhost:7860/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "We need a solution for senior leadership."}
    ]
  }'
```

### Clear query (should recommend)

```bash
curl -X POST http://localhost:7860/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "I need a cognitive ability test and personality test for graduate management trainees."}
    ]
  }'
```

### Multi-turn conversation (add constraint)

```bash
curl -X POST http://localhost:7860/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "I need a cognitive ability test and personality test for graduate management trainees."},
      {"role": "assistant", "content": "For graduate trainees I recommend Verify G+ and OPQ32r."},
      {"role": "user", "content": "Can you also add a situational judgement element?"}
    ]
  }'
```

### Comparison question

```bash
curl -X POST http://localhost:7860/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the difference between OPQ32r and OPQ MQ Sales Report?"}
    ]
  }'
```

### Off-topic refusal

```bash
curl -X POST http://localhost:7860/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Are we legally required under HIPAA to test all staff?"}
    ]
  }'
```

---

## Hugging Face Spaces Deployment

### Prerequisites

- A Hugging Face account and a Space (Docker SDK).
- `huggingface_hub` CLI installed: `pip install huggingface_hub`.
- Your Anthropic API key added as a **Space Secret** (not in code).

### Add the API key as a Space Secret

In your Space settings → Secrets → add:

```
ANTHROPIC_API_KEY = sk-ant-...
```

### Git commands to push to Hugging Face Spaces

```bash
# 1.
# Install git-lfs (required for HF)
git lfs install

# 2. Clone your HF Space repo
git clone https://huggingface.co/spaces//shl-ai-agent
cd shl-ai-agent

# 3. Copy project files into the cloned repo
cp -r /path/to/shl-agent/* .

# 4. Commit and push
git add .
git commit -m "Initial deployment: SHL AI Agent"
git push

# HF will build the Docker image automatically on push.
# Monitor the build in: https://huggingface.co/spaces//shl-ai-agent
```

---

## Running the Evaluation Script

```bash
# Against local server
python tests/evaluate.py --base-url http://localhost:7860

# Against deployed HF Space
python tests/evaluate.py --base-url https://-shl-ai-agent.hf.space
```

The script exits with code 0 on a full pass and code 1 on any failure, which makes it suitable for CI.

---

## Common Deployment Mistakes on HF Spaces (and how to avoid them)

| Mistake | Fix |
|---------|-----|
| Binding to `127.0.0.1` instead of `0.0.0.0` | Always use `--host 0.0.0.0` in uvicorn CMD |
| Wrong port | `app_port: 7860` in README front matter must match Dockerfile EXPOSE and uvicorn `--port` |
| API key in code | Set as a Space Secret; read via `os.environ.get("ANTHROPIC_API_KEY")` |
| Running as root | Add `useradd` and `USER` in Dockerfile |
| Importing heavy ML libraries (torch) | We use scikit-learn only, which stays within HF free-tier RAM |
| Cold build takes too long | Pre-build TF-IDF index in Dockerfile (`RUN python scripts/build_index.py`) |
| Missing README YAML front matter | The `---` block must be the first thing in README.md |
| git-lfs not installed | Run `git lfs install` before cloning HF repo |
| Forgetting to set Space to Public | Public required for evaluator to reach your endpoint |

---

## Approach Document

### 1. Problem Framing

Given a multi-turn conversation between an HR professional and an AI agent, the system must recommend the most relevant SHL psychometric assessments from a fixed catalog.
The agent must handle vague queries, accumulate constraints, support comparisons, and refuse off-topic requests.

### 2. Data Ingestion

The SHL catalog is stored as a structured JSON file (`data/shl_catalog.json`) with 35 items extracted from the 10 provided sample conversations. Each item has: `name`, `url`, `test_type`, `description`, `duration`, `languages`, `keys`, `seniority`, and `domains`. The catalog is the single source of truth; no external APIs are called for catalog data.

### 3. Retrieval Design

We use TF-IDF (with bigrams) over rich document strings constructed from all catalog fields. The query is the concatenation of all user messages, with the latest message included twice to bias retrieval toward the most recent turn. Similarity is cosine similarity, computed with `linear_kernel` (scikit-learn's TF-IDF vectors are L2-normalized by default, so the dot product equals the cosine). The top 10 results scoring above a threshold of 0.05 are injected into the system prompt as context.

**Why TF-IDF over sentence-transformers?** At 35 items, neural embeddings provide minimal recall benefit while adding ~2 GB of model weights. TF-IDF with bigrams and rich field concatenation is fast, transparent, and easy to defend in an interview.

### 4. Agent Policy and Decision Logic

The LLM (Claude Sonnet) receives:

- A fixed system prompt defining scope, refusal rules, and output format.
- Retrieved catalog items as grounding context.
- Full conversation history.

The system prompt uses XML output tags (``, ``, ``) for reliable parsing. This avoids the fragility of parsing JSON from LLM output.

### 5. Scope Control and Refusal Strategy

Two layers:

1. **Pre-LLM regex guard**: checks the latest user message against known refusal patterns (legal, compensation, prompt injection). It fires before any LLM call, so a refusal costs zero tokens.
2. **System prompt instructions**: tells the LLM to refuse anything outside the catalog scope. Belt-and-suspenders.

URL validation post-parse: every URL returned by the LLM is checked against the catalog URL set. Non-catalog URLs are silently dropped. This eliminates hallucinated URLs.

### 6.
Evaluation Strategy

10 test scenarios covering: vague→clarify, clear→recommend, add constraint→refine, comparison, off-topic refusal, prompt injection, end-of-conversation (EOC) detection, technical roles, high-volume screening, compensation refusal. Each test checks: recommendations empty/non-empty, `end_of_conversation` flag, reply non-empty, URL format.

### 7. Trade-offs and Future Improvements

| Current | Improvement |
|---------|-------------|
| TF-IDF retrieval | Sentence-transformers + FAISS for semantic recall on large catalogs |
| Static JSON catalog | Live SHL API feed with caching |
| Regex refusal guards | Fine-tuned classifier for nuanced refusal detection |
| Single-worker uvicorn | Gunicorn + multiple uvicorn workers for throughput |
| Closing phrase heuristic | LLM-based intent classification for EOC detection |

### 8. Use of AI Assistance

Claude (Anthropic) was used for: scaffolding boilerplate (FastAPI app structure, Dockerfile patterns), suggesting retrieval approaches, and reviewing code for obvious bugs. All architecture decisions, retrieval design, refusal logic, schema choices, and system prompt engineering were made by the developer and are fully explainable and reviewable.
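
For reference, the per-test checks listed under Evaluation Strategy (recommendations empty or non-empty, `end_of_conversation` flag, non-empty reply, URL format) could be sketched roughly as below. This is an illustrative sketch, not the contents of `tests/evaluate.py`: the function name `check_response` and its parameters are hypothetical, and "URL format" is assumed here to mean an `http`/`https` URL with a host.

```python
from urllib.parse import urlparse


def check_response(response, expect_recommendations, expect_eoc=False):
    """Return a list of failure messages; an empty list means the checks passed."""
    failures = []

    # Reply must be a non-empty string.
    reply = response.get("reply")
    if not isinstance(reply, str) or not reply.strip():
        failures.append("reply is empty")

    # Recommendations must be a list, empty or 1-10 items depending on the scenario.
    recs = response.get("recommendations")
    if not isinstance(recs, list):
        failures.append("recommendations is not a list")
        recs = []
    if expect_recommendations and not 1 <= len(recs) <= 10:
        failures.append(f"expected 1-10 recommendations, got {len(recs)}")
    if not expect_recommendations and recs:
        failures.append("expected no recommendations")

    # The end_of_conversation flag must match the scenario's expectation.
    if response.get("end_of_conversation") is not expect_eoc:
        failures.append("end_of_conversation flag mismatch")

    # Each recommendation URL must be a well-formed http(s) URL (assumed format rule).
    for rec in recs:
        url = rec.get("url") if isinstance(rec, dict) else None
        if not isinstance(url, str):
            failures.append("recommendation has no URL string")
            continue
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            failures.append(f"bad URL: {url!r}")

    return failures
```

A runner could call this once per scenario and exit with code 1 if any scenario returns a non-empty list, which matches the exit-code behavior described for the evaluation script.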