---
title: Ndl Core Data Api
emoji: ๐Ÿƒ
colorFrom: yellow
colorTo: purple
sdk: docker
pinned: false
short_description: Semantic search API for NDL Core datasets
---
# NDL Core Data API
A FastAPI-based service that provides semantic search and data download capabilities for NDL Core datasets. The API uses LanceDB for vector search and Sentence Transformers to generate embeddings.
## Base URL
```
https://theodi-ndl-core-data-api.hf.space
```
## Endpoints
### Search
**GET** `/search`
Performs semantic search across NDL Core datasets using natural language queries and returns dataset details along with their download links.
**Parameters:**
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `query` | string | Yes | - | Natural language search query |
| `limit` | integer | No | 5 | Maximum number of results to return |
**Example:**
```bash
curl "https://theodi-ndl-core-data-api.hf.space/search?query=Police%20use%20of%20force&limit=10"
```
**Response:**
```json
[
  {
    "identifier": "UUID1",
    "title": "Police use of force dataset1",
    "description": "...",
    "format": "parquet",
    "download": ["https://huggingface.co/datasets/theodi/ndl-core-structured-data/resolve/main/ac923bbd-57ca-4a84-8d6d-53dbb3614d3d/eca8b02a-c09a-43e9-86c4-b5a9294bce67.parquet"],
    ...
  },
  {
    "identifier": "UUID2",
    "title": "Police use of force dataset2",
    "description": "...",
    "format": "text",
    "download": ["https://theodi-ndl-core-data-api.hf.space/download/text/e06b5cf8-3e2f-4bc6-a6e9-0530b5bd165d"],
    ...
  }
]
```
See [NDL Corpus](https://huggingface.co/datasets/theodi/ndl-core-corpus) for the definition of all fields.
---
### Download Text File
**GET** `/download/text/{identifier}`
Streams the text content of a record as a downloadable `.txt` file.
**Parameters:**
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `identifier` | string (path) | Yes | The dataset identifier |
**Example:**
```bash
curl -O "https://theodi-ndl-core-data-api.hf.space/download/text/UUID2"
```
**Response:**
- Returns a `text/plain` file download with `Content-Disposition: attachment`
**Errors:**
- `404` - No record found with the given identifier
- `400` - Record exists but is not in text format
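
**Python example (sketch):**

A client may want to turn the documented error codes into readable messages. The following is a minimal standard-library sketch; the helper names (`build_download_url`, `explain_status`, `download_text`) are hypothetical, and the assumption is that the service signals errors through the HTTP status code alone.

```python
import shutil
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://theodi-ndl-core-data-api.hf.space"

# Status codes documented for this endpoint, with their meanings.
ERROR_MESSAGES = {
    404: "No record found with the given identifier",
    400: "Record exists but is not in text format",
}


def build_download_url(identifier: str) -> str:
    """Return the /download/text URL for a record identifier."""
    return f"{BASE_URL}/download/text/{quote(identifier, safe='')}"


def explain_status(status: int) -> str:
    """Map an HTTP status code to the documented error description."""
    return ERROR_MESSAGES.get(status, f"Unexpected HTTP status {status}")


def download_text(identifier: str, destination: str, timeout: float = 30.0) -> None:
    """Stream the text file to `destination`, raising RuntimeError on failure."""
    try:
        with urlopen(build_download_url(identifier), timeout=timeout) as response:
            # The output file is only created once the request has succeeded.
            with open(destination, "wb") as out:
                shutil.copyfileobj(response, out)
    except HTTPError as err:
        raise RuntimeError(explain_status(err.code)) from err


# Usage (performs a network request):
#   download_text("UUID2", "UUID2.txt")
```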
---
## Data Sources
- **Vector Index:** [theodi/ndl-core-rag-index](https://huggingface.co/datasets/theodi/ndl-core-rag-index)
- **Structured Data:** [theodi/ndl-core-structured-data](https://huggingface.co/datasets/theodi/ndl-core-structured-data)
## Technology Stack
- **Framework:** FastAPI
- **Vector Database:** LanceDB
- **Embeddings:** Sentence Transformers (all-MiniLM-L6-v2)
- **Deployment:** Docker on Hugging Face Spaces