# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a Hugging Face Spaces application that provides text embeddings using 15+ state-of-the-art embedding models, including Nomic, BGE, Snowflake Arctic, IBM Granite, and sentence-transformers models. It runs on CPU and provides both a web interface and API endpoints for generating text embeddings with model selection.

## Key Commands

### Local Development
```bash
# Install dependencies
pip install -r requirements.txt

# Run the application locally
python app.py
```
### Git Operations

```bash
# Push to Hugging Face Spaces (requires authentication)
git push origin main

# Note: may need to authenticate first with:
huggingface-cli login
```
## Architecture

The application consists of a single `app.py` file with the following components (a sketch of this structure follows the list):

- **Model Configuration**: Dictionary of 15+ embedding models with `trust_remote_code` settings (lines 10-26)
- **Model Caching**: Dynamic model loading with caching to avoid reloading (lines 32-42)
- **FastAPI App**: Direct HTTP endpoints at `/embed` and `/models` (lines 44, 57-102)
- **Embedding Function**: Multi-model wrapper that calls `model.encode()` (lines 49-53)
- **Gradio Interface**: UI with a model dropdown selector and an API endpoint (lines 106-135)
- **Dual Server**: Gradio app mounted onto the FastAPI app and served with uvicorn (lines 214-219)
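
The following is a minimal sketch of that structure, assuming `sentence-transformers` is used for encoding; names such as `load_model`, `EmbedRequest`, and `embed_ui` are illustrative and not copied from the real `app.py`:

```python
# Hypothetical sketch of the app.py structure described above (not the real code).
import gradio as gr
import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

_cache = {}  # model name -> loaded model, so each model is loaded only once


def load_model(name: str) -> SentenceTransformer:
    """Load a model on first use and cache it for later requests."""
    if name not in _cache:
        _cache[name] = SentenceTransformer(name, device="cpu")  # CPU-only
    return _cache[name]


app = FastAPI()


class EmbedRequest(BaseModel):
    text: str
    model: str


@app.post("/embed")
def embed(req: EmbedRequest):
    # Multi-model wrapper: pick the requested model, encode, return a plain list
    return {"embedding": load_model(req.model).encode(req.text).tolist()}


def embed_ui(text: str, model_name: str):
    return load_model(model_name).encode(text).tolist()


# Gradio UI with a model dropdown, mounted onto the FastAPI app
demo = gr.Interface(
    fn=embed_ui,
    inputs=[gr.Textbox(label="Text"),
            gr.Dropdown(choices=["nomic-ai/nomic-embed-text-v1.5"], label="Model")],
    outputs=gr.JSON(label="Embedding"),
)
app = gr.mount_gradio_app(app, demo, path="/")

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=7860)
```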
## Important Configuration Details

- **Queue**: Hugging Face Spaces enforces queuing at the infrastructure level, even without `.queue()` in the code
- **CPU Mode**: Explicitly set to CPU to avoid GPU requirements
- **Trust Remote Code**: Only models predefined in the MODELS dict are loaded with `trust_remote_code=True` (see the sketch below)
- **Any HF Model**: The API accepts any Hugging Face model name but uses `trust_remote_code=False` for unlisted models
- **API Access**: Direct HTTP access is available via the FastAPI endpoints
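
A hedged illustration of how that `trust_remote_code` gating might look; the `MODELS` entries shown here are a made-up subset, not the real configuration:

```python
# Hypothetical illustration of the gating described above (not the real app.py).
from sentence_transformers import SentenceTransformer

MODELS = {
    "nomic-ai/nomic-embed-text-v1.5": {"trust_remote_code": True},    # predefined: may opt in
    "mixedbread-ai/mxbai-embed-large-v1": {"trust_remote_code": False},
}


def load_untrusted_by_default(name: str) -> SentenceTransformer:
    # Unlisted Hugging Face models are still accepted, but they always load
    # with trust_remote_code=False; only predefined entries can enable it.
    trust = MODELS.get(name, {}).get("trust_remote_code", False)
    return SentenceTransformer(name, trust_remote_code=trust, device="cpu")
```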
## API Usage

There are two options for API access:

1. **Direct FastAPI endpoint** (no queue):
```bash
# List available models
curl https://ipepe-nomic-embeddings.hf.space/models

# Generate an embedding with a specific model
curl -X POST https://ipepe-nomic-embeddings.hf.space/embed \
  -H "Content-Type: application/json" \
  -d '{"text": "your text", "model": "mixedbread-ai/mxbai-embed-large-v1"}'
```
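
The same endpoint can be called from Python. This is a sketch using `requests`; only the request format is documented above, so the shape of the printed response is an assumption:

```python
# Calling the direct FastAPI /embed endpoint from Python.
import requests

resp = requests.post(
    "https://ipepe-nomic-embeddings.hf.space/embed",
    json={"text": "your text", "model": "mixedbread-ai/mxbai-embed-large-v1"},
    timeout=120,  # CPU inference can be slow, especially on a cold model load
)
resp.raise_for_status()
print(resp.json())  # assumed to contain the embedding vector
```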
2. **Gradio client** (handles the queue automatically):

```python
from gradio_client import Client

client = Client("ipepe/nomic-embeddings")
result = client.predict("text to embed", "model-name", api_name="/predict")
```
## Deployment Notes

- Deployed on Hugging Face Spaces at https://huggingface.co/spaces/ipepe/nomic-embeddings
- Runs on port 7860
- Uses Gradio 4.36.1 (newer versions are available)
- PyTorch is installed CPU-only via `--extra-index-url` in requirements.txt (see the illustrative excerpt below)
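
An illustrative requirements.txt excerpt for the CPU-only setup; apart from the Gradio version noted above, the exact package list, pins, and index URL are assumptions:

```text
--extra-index-url https://download.pytorch.org/whl/cpu
torch
gradio==4.36.1
sentence-transformers
fastapi
uvicorn
```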
## Development Constraints

- Python is not installed locally; every change must be pushed to Hugging Face Spaces before it can be run.