---
title: Ndl Core Data Api
emoji: 🏃
colorFrom: yellow
colorTo: purple
sdk: docker
pinned: false
short_description: Semantic search API for NDL Core datasets
---
# NDL Core Data API
A FastAPI-based service that provides semantic search and data download capabilities for NDL Core datasets. The API uses LanceDB for vector search and Sentence Transformers for embeddings.
## Base URL
```
https://theodi-ndl-core-data-api.hf.space
```
## Endpoints
### Search
**GET** `/search`
Performs semantic search across NDL Core datasets using natural language queries and returns dataset details along with download links.
**Parameters:**
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `query` | string | Yes | - | Natural language search query |
| `limit` | integer | No | 5 | Maximum number of results to return |
**Example:**
```bash
curl "https://theodi-ndl-core-data-api.hf.space/search?query=Police%20use%20of%20force&limit=10"
```
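The same request can be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_search_url` and `search` are illustrative and not part of the API, and the response is assumed to be a JSON list as shown in the example response.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://theodi-ndl-core-data-api.hf.space"


def build_search_url(query: str, limit: int = 5) -> str:
    """Return the /search URL with the query percent-encoded.

    Hypothetical helper; `limit` defaults to 5 to mirror the documented default.
    """
    params = urllib.parse.urlencode(
        {"query": query, "limit": limit},
        quote_via=urllib.parse.quote,  # encode spaces as %20 rather than +
    )
    return f"{BASE_URL}/search?{params}"


def search(query: str, limit: int = 5, timeout: float = 30.0) -> list:
    """Call the search endpoint and return the decoded JSON body.

    Hypothetical helper; performs a network request when called.
    """
    url = build_search_url(query, limit)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Example call (needs network access):
#   results = search("Police use of force", limit=10)
```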
**Response:**
```json
[
  {
    "identifier": "UUID1",
    "title": "Police use of force dataset1",
    "description": "...",
    "format": "parquet",
    "download": ["https://huggingface.co/datasets/theodi/ndl-core-structured-data/resolve/main/ac923bbd-57ca-4a84-8d6d-53dbb3614d3d/eca8b02a-c09a-43e9-86c4-b5a9294bce67.parquet"],
    ...
  },
  {
    "identifier": "UUID2",
    "title": "Police use of force dataset2",
    "description": "...",
    "format": "text",
    "download": ["https://theodi-ndl-core-data-api.hf.space/download/text/e06b5cf8-3e2f-4bc6-a6e9-0530b5bd165d"],
    ...
  }
]
```
See [NDL Corpus](https://huggingface.co/datasets/theodi/ndl-core-corpus) for the definition of all fields.
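
A client will often want to collect the download links from a search response. The sketch below groups them by the `format` field. It assumes only the `format` and `download` fields shown in the example response; `downloads_by_format` is an illustrative helper name, and `sample` is a trimmed copy of that example rather than real output.

```python
from collections import defaultdict


def downloads_by_format(results: list[dict]) -> dict[str, list[str]]:
    """Group the download URLs of a search response by record format."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for record in results:
        fmt = record.get("format")
        links = record.get("download") or []
        # Records without a format or without links are skipped.
        if fmt and links:
            grouped[fmt].extend(links)
    return dict(grouped)


# Trimmed copy of the example response (other fields omitted).
sample = [
    {
        "identifier": "UUID1",
        "format": "parquet",
        "download": [
            "https://huggingface.co/datasets/theodi/ndl-core-structured-data/resolve/main/ac923bbd-57ca-4a84-8d6d-53dbb3614d3d/eca8b02a-c09a-43e9-86c4-b5a9294bce67.parquet"
        ],
    },
    {
        "identifier": "UUID2",
        "format": "text",
        "download": [
            "https://theodi-ndl-core-data-api.hf.space/download/text/e06b5cf8-3e2f-4bc6-a6e9-0530b5bd165d"
        ],
    },
]

for fmt, links in downloads_by_format(sample).items():
    print(fmt, len(links))
```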
---
### Download Text File
**GET** `/download/text/{identifier}`
Stream text content as a downloadable `.txt` file.
**Parameters:**
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `identifier` | string (path) | Yes | The dataset identifier |
**Example:**
```bash
curl -O "https://theodi-ndl-core-data-api.hf.space/download/text/UUID2"
```
**Response:**
- Returns a `text/plain` file download with `Content-Disposition: attachment`
**Errors:**
- `404` - No record found with the given identifier
- `400` - Record exists but is not in text format
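
One way a Python client might save the file and surface the documented errors is sketched below, using only the standard library. The helper names `explain_status` and `download_text`, the chunk size, and the timeout are assumptions for illustration; only the endpoint path and the two status codes come from this document.

```python
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "https://theodi-ndl-core-data-api.hf.space"

# Status codes documented for this endpoint, mapped to their meaning.
DOCUMENTED_ERRORS = {
    404: "No record found with the given identifier",
    400: "Record exists but is not in text format",
}


def explain_status(status: int) -> str:
    """Translate an HTTP status code into a readable message."""
    return DOCUMENTED_ERRORS.get(status, f"Unexpected HTTP status {status}")


def download_text(identifier: str, dest_path: str, timeout: float = 30.0) -> None:
    """Stream /download/text/{identifier} to dest_path in 64 KiB chunks.

    Hypothetical helper; performs a network request when called.
    """
    safe_id = urllib.parse.quote(identifier, safe="")
    url = f"{BASE_URL}/download/text/{safe_id}"
    try:
        # urlopen runs first, so no file is created if the request fails.
        with urllib.request.urlopen(url, timeout=timeout) as resp, \
                open(dest_path, "wb") as out:
            while chunk := resp.read(64 * 1024):
                out.write(chunk)
    except urllib.error.HTTPError as err:
        raise RuntimeError(explain_status(err.code)) from err


# Example call (needs network access):
#   download_text("UUID2", "UUID2.txt")
```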
---
## Data Sources
- **Vector Index:** [theodi/ndl-core-rag-index](https://huggingface.co/datasets/theodi/ndl-core-rag-index)
- **Structured Data:** [theodi/ndl-core-structured-data](https://huggingface.co/datasets/theodi/ndl-core-structured-data)
## Technology Stack
- **Framework:** FastAPI
- **Vector Database:** LanceDB
- **Embeddings:** Sentence Transformers (all-MiniLM-L6-v2)
- **Deployment:** Docker on Hugging Face Spaces