# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a Hugging Face Spaces application that provides text embeddings using 15+ state-of-the-art embedding models, including Nomic, BGE, Snowflake Arctic, IBM Granite, and sentence-transformers models. It runs on CPU and provides both a web interface and API endpoints for generating text embeddings with model selection.

## Key Commands

### Local Development
```bash
# Install dependencies
pip install -r requirements.txt

# Run the application locally
python app.py
```
### Git Operations

```bash
# Push to Hugging Face Spaces (requires authentication)
git push origin main

# Note: may need to authenticate first with:
huggingface-cli login
```
## Architecture

The application consists of a single `app.py` file with the following components (a sketch of this structure follows the list):

- **Model Configuration**: Dictionary of 15+ embedding models with `trust_remote_code` settings (lines 10-26)
- **Model Caching**: Dynamic model loading with caching to avoid reloading (lines 32-42)
- **FastAPI App**: Direct HTTP endpoints at `/embed` and `/models` (lines 44, 57-102)
- **Embedding Function**: Multi-model wrapper that calls `model.encode()` (lines 49-53)
- **Gradio Interface**: UI with a model dropdown selector and an API endpoint (lines 106-135)
- **Dual Server**: Gradio app mounted onto the FastAPI app and served with uvicorn (lines 214-219)
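
The following is a minimal sketch of that structure, assuming `sentence-transformers` is used for encoding; names such as `load_model`, `EmbedRequest`, and `embed_ui` are illustrative and not copied from the real `app.py`:

```python
# Hypothetical sketch of the app.py structure described above (not the real code).
import gradio as gr
import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

_cache = {}  # model name -> loaded model, so each model is loaded only once


def load_model(name: str) -> SentenceTransformer:
    """Load a model on first use and cache it for later requests."""
    if name not in _cache:
        _cache[name] = SentenceTransformer(name, device="cpu")  # CPU-only
    return _cache[name]


app = FastAPI()


class EmbedRequest(BaseModel):
    text: str
    model: str


@app.post("/embed")
def embed(req: EmbedRequest):
    # Multi-model wrapper: pick the requested model, encode, return a plain list
    return {"embedding": load_model(req.model).encode(req.text).tolist()}


def embed_ui(text: str, model_name: str):
    return load_model(model_name).encode(text).tolist()


# Gradio UI with a model dropdown, mounted onto the FastAPI app
demo = gr.Interface(
    fn=embed_ui,
    inputs=[gr.Textbox(label="Text"),
            gr.Dropdown(choices=["nomic-ai/nomic-embed-text-v1.5"], label="Model")],
    outputs=gr.JSON(label="Embedding"),
)
app = gr.mount_gradio_app(app, demo, path="/")

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=7860)
```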
## Important Configuration Details

- **Queue**: Hugging Face Spaces enforces queuing at the infrastructure level, even without `.queue()` in the code
- **CPU Mode**: Explicitly set to CPU to avoid GPU requirements
- **Trust Remote Code**: Only models predefined in the MODELS dict are loaded with `trust_remote_code=True` (see the sketch below)
- **Any HF Model**: The API accepts any Hugging Face model name but uses `trust_remote_code=False` for unlisted models
- **API Access**: Direct HTTP access is available via the FastAPI endpoints
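
A hedged illustration of how that `trust_remote_code` gating might look; the `MODELS` entries shown here are a made-up subset, not the real configuration:

```python
# Hypothetical illustration of the gating described above (not the real app.py).
from sentence_transformers import SentenceTransformer

MODELS = {
    "nomic-ai/nomic-embed-text-v1.5": {"trust_remote_code": True},    # predefined: may opt in
    "mixedbread-ai/mxbai-embed-large-v1": {"trust_remote_code": False},
}


def load_untrusted_by_default(name: str) -> SentenceTransformer:
    # Unlisted Hugging Face models are still accepted, but they always load
    # with trust_remote_code=False; only predefined entries can enable it.
    trust = MODELS.get(name, {}).get("trust_remote_code", False)
    return SentenceTransformer(name, trust_remote_code=trust, device="cpu")
```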
## API Usage

There are two options for API access:

1. **Direct FastAPI endpoint** (no queue):
```bash
# List available models
curl https://ipepe-nomic-embeddings.hf.space/models

# Generate an embedding with a specific model
curl -X POST https://ipepe-nomic-embeddings.hf.space/embed \
  -H "Content-Type: application/json" \
  -d '{"text": "your text", "model": "mixedbread-ai/mxbai-embed-large-v1"}'
```
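
The same endpoint can be called from Python. This is a sketch using `requests`; only the request format is documented above, so the shape of the printed response is an assumption:

```python
# Calling the direct FastAPI /embed endpoint from Python.
import requests

resp = requests.post(
    "https://ipepe-nomic-embeddings.hf.space/embed",
    json={"text": "your text", "model": "mixedbread-ai/mxbai-embed-large-v1"},
    timeout=120,  # CPU inference can be slow, especially on a cold model load
)
resp.raise_for_status()
print(resp.json())  # assumed to contain the embedding vector
```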
2. **Gradio client** (handles the queue automatically):

```python
from gradio_client import Client

client = Client("ipepe/nomic-embeddings")
result = client.predict("text to embed", "model-name", api_name="/predict")
```
## Deployment Notes

- Deployed on Hugging Face Spaces at https://huggingface.co/spaces/ipepe/nomic-embeddings
- Runs on port 7860
- Uses Gradio 4.36.1 (newer versions are available)
- PyTorch is installed CPU-only via `--extra-index-url` in requirements.txt (see the illustrative excerpt below)
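
An illustrative requirements.txt excerpt for the CPU-only setup; apart from the Gradio version noted above, the exact package list, pins, and index URL are assumptions:

```text
--extra-index-url https://download.pytorch.org/whl/cpu
torch
gradio==4.36.1
sentence-transformers
fastapi
uvicorn
```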
## Development Constraints

- Python is not installed locally; every change must be pushed to Hugging Face Spaces before it can be run.