metadata
title: Ndl Core Data Api
emoji: 🏃
colorFrom: yellow
colorTo: purple
sdk: docker
pinned: false
short_description: Semantic search API for NDL Core datasets
NDL Core Data API
A FastAPI-based service that provides semantic search and data download capabilities for NDL Core datasets. The API uses LanceDB for vector search with sentence transformers for embedding.
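To illustrate the idea behind vector search, here is a toy sketch in plain Python. It does not use LanceDB or a real embedding model: the three-dimensional vectors, the dataset names, and the helper names `cosine_similarity` and `top_k` are all made up for illustration. Cosine similarity is one common ranking metric; which metric the service actually uses is not stated here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=5):
    """Return the k (name, score) pairs most similar to query_vec."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical embeddings; real ones would come from an embedding model
index = {
    "police-use-of-force": [0.9, 0.1, 0.0],
    "school-attendance": [0.1, 0.9, 0.1],
    "road-traffic-counts": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of a query about policing
ranked = top_k(query_vec, index, k=2)
```

In this toy index the policing dataset ranks first because its vector points in nearly the same direction as the query vector.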
Base URL
https://theodi-ndl-core-data-api.hf.space
Endpoints
Search
GET /search
Performs semantic search across NDL Core datasets using a natural language query and returns dataset details along with download links.
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | Yes | - | Natural language search query |
| limit | integer | No | 5 | Maximum number of results to return |
Example:
curl "https://theodi-ndl-core-data-api.hf.space/search?query=Police%20use%20of%20force&limit=10"
Response:
[
{
"identifier": "UUID1",
"title": "Police use of force dataset1",
"description": "...",
"format": "parquet",
"download": ["https://huggingface.co/datasets/theodi/ndl-core-structured-data/resolve/main/ac923bbd-57ca-4a84-8d6d-53dbb3614d3d/eca8b02a-c09a-43e9-86c4-b5a9294bce67.parquet"],
...
},
{
"identifier": "UUID2",
"title": "Police use of force dataset2",
"description": "...",
"format": "text",
"download": ["https://theodi-ndl-core-data-api.hf.space/download/text/e06b5cf8-3e2f-4bc6-a6e9-0530b5bd165d"],
...
},
]
See NDL Corpus for the definition of all fields.
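A minimal Python sketch of calling this endpoint, using only the standard library. The helper names `build_search_url`, `search`, and `download_links` are hypothetical, and the response is assumed to have the shape shown in the example above (a list of objects with `format` and `download` fields).

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "https://theodi-ndl-core-data-api.hf.space"


def build_search_url(query, limit=5):
    """Return the /search URL with the query string percent-encoded."""
    # quote_via=quote encodes spaces as %20 rather than '+'
    params = urlencode({"query": query, "limit": limit}, quote_via=quote)
    return f"{BASE_URL}/search?{params}"


def search(query, limit=5, timeout=30):
    """Call GET /search and return the decoded JSON (performs a network request)."""
    with urlopen(build_search_url(query, limit), timeout=timeout) as resp:
        return json.load(resp)


def download_links(results, fmt=None):
    """Collect download URLs from search results, optionally filtered by format."""
    links = []
    for item in results:
        if fmt is None or item.get("format") == fmt:
            links.extend(item.get("download", []))
    return links
```

For example, `download_links(search("Police use of force", limit=10), fmt="parquet")` would collect only the parquet download URLs from the results.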
Download Text File
GET /download/text/{identifier}
Stream text content as a downloadable .txt file.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| identifier | path | Yes | The dataset identifier |
Example:
curl -O "https://theodi-ndl-core-data-api.hf.space/download/text/UUID2"
Response:
- Returns a text/plain file download with Content-Disposition: attachment
Errors:
- 404 - No record found with the given identifier
- 400 - Record exists but is not in text format
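The download and its documented error cases could be handled from Python along these lines. This is a sketch using only the standard library; the helper names `text_download_url`, `explain_status`, and `download_text` are hypothetical, and the messages for 404 and 400 are taken from the list above.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://theodi-ndl-core-data-api.hf.space"

# Status codes documented for this endpoint
ERROR_MESSAGES = {
    404: "No record found with the given identifier",
    400: "Record exists but is not in text format",
}


def text_download_url(identifier):
    """Return the /download/text URL for a dataset identifier."""
    return f"{BASE_URL}/download/text/{quote(identifier, safe='')}"


def explain_status(status):
    """Map an HTTP status code to the documented error message."""
    return ERROR_MESSAGES.get(status, f"Unexpected HTTP status {status}")


def download_text(identifier, dest_path, timeout=30):
    """Save the text for identifier to dest_path (performs a network request)."""
    try:
        with urlopen(text_download_url(identifier), timeout=timeout) as resp, \
                open(dest_path, "wb") as out:
            # Copy the streamed body in chunks instead of loading it all at once
            for chunk in iter(lambda: resp.read(8192), b""):
                out.write(chunk)
    except HTTPError as err:
        raise RuntimeError(explain_status(err.code)) from err
```

Calling `download_text("UUID2", "UUID2.txt")` would mirror the curl example, raising a `RuntimeError` with the matching message on a 404 or 400 response.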
Data Sources
- Vector Index: theodi/ndl-core-rag-index
- Structured Data: theodi/ndl-core-structured-data
Technology Stack
- Framework: FastAPI
- Vector Database: LanceDB
- Embeddings: Sentence Transformers (all-MiniLM-L6-v2)
- Deployment: Docker on Hugging Face Spaces