---
title: Ndl Core Data Api
emoji: 🏃
colorFrom: yellow
colorTo: purple
sdk: docker
pinned: false
short_description: Semantic search API for NDL Core datasets
---

# NDL Core Data API

A FastAPI-based service that provides semantic search and data download capabilities for NDL Core datasets. The API uses LanceDB for vector search, with Sentence Transformers for embeddings.

## Base URL

```
https://theodi-ndl-core-data-api.hf.space
```

## Endpoints

### Search

**GET** `/search`

Perform a semantic search across NDL Core datasets using a natural language query. Each result provides the dataset details along with its download links.

**Parameters:**

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `query` | string | Yes | - | Natural language search query |
| `limit` | integer | No | 5 | Maximum number of results to return |

**Example:**

```bash
curl "https://theodi-ndl-core-data-api.hf.space/search?query=Police%20use%20of%20force&limit=10"
```

**Response:**

```json
[
  {
    "identifier": "UUID1",
    "title": "Police use of force dataset1",
    "description": "...",
    "format": "parquet",
    "download": ["https://huggingface.co/datasets/theodi/ndl-core-structured-data/resolve/main/ac923bbd-57ca-4a84-8d6d-53dbb3614d3d/eca8b02a-c09a-43e9-86c4-b5a9294bce67.parquet"],
    ...
  },
  {
    "identifier": "UUID2",
    "title": "Police use of force dataset2",
    "description": "...",
    "format": "text",
    "download": ["https://theodi-ndl-core-data-api.hf.space/download/text/e06b5cf8-3e2f-4bc6-a6e9-0530b5bd165d"],
    ...
  }
]
```

See [NDL Corpus](https://huggingface.co/datasets/theodi/ndl-core-corpus) for the definition of all fields.

---

### Download Text File

**GET** `/download/text/{identifier}`

Stream text content as a downloadable `.txt` file.


**Parameters:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `identifier` | path | Yes | The dataset identifier |

**Example:**

```bash
curl -O "https://theodi-ndl-core-data-api.hf.space/download/text/UUID2"
```

**Response:**

- Returns a `text/plain` file download with `Content-Disposition: attachment`

**Errors:**

- `404` - No record found with the given identifier
- `400` - Record exists but is not in text format

---

## Data Sources

- **Vector Index:** [theodi/ndl-core-rag-index](https://huggingface.co/datasets/theodi/ndl-core-rag-index)
- **Structured Data:** [theodi/ndl-core-structured-data](https://huggingface.co/datasets/theodi/ndl-core-structured-data)

## Technology Stack

- **Framework:** FastAPI
- **Vector Database:** LanceDB
- **Embeddings:** Sentence Transformers (all-MiniLM-L6-v2)
- **Deployment:** Docker on Hugging Face Spaces
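
## Example Client (Python)

The `/search` and `/download/text/{identifier}` endpoints described above can also be called from Python. The following is a minimal sketch using only the standard library, not an official client. The helper names (`build_search_url`, `search`, `text_download_links`) and the 30-second timeout are illustrative assumptions. The `format` and `download` fields are taken from the example response; records lacking them are skipped.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://theodi-ndl-core-data-api.hf.space"


def build_search_url(query: str, limit: int = 5) -> str:
    """Build a /search URL with the query percent-encoded (spaces become %20)."""
    params = urllib.parse.urlencode(
        {"query": query, "limit": limit},
        quote_via=urllib.parse.quote,
    )
    return f"{BASE_URL}/search?{params}"


def search(query: str, limit: int = 5, timeout: float = 30.0) -> list:
    """Call GET /search and return the parsed JSON list of result records.

    The timeout value is an assumption, not something the API documents.
    """
    with urllib.request.urlopen(build_search_url(query, limit), timeout=timeout) as resp:
        return json.load(resp)


def text_download_links(results: list) -> list:
    """Collect /download/text/... links from records whose format is "text"."""
    prefix = f"{BASE_URL}/download/text/"
    return [
        url
        for record in results
        if record.get("format") == "text"
        for url in record.get("download", [])
        if url.startswith(prefix)
    ]
```

Calling `search("Police use of force", limit=10)` sends the same request as the `curl` example in the Search section. Passing its result to `text_download_links` yields the URLs that can then be fetched as plain-text files.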