metadata
title: Ndl Core Data Api
emoji: 🏃
colorFrom: yellow
colorTo: purple
sdk: docker
pinned: false
short_description: Semantic search API for NDL Core datasets

NDL Core Data API

A FastAPI-based service that provides semantic search and data download capabilities for NDL Core datasets. The API uses LanceDB for vector search and Sentence Transformers (all-MiniLM-L6-v2) to generate embeddings.

Base URL

https://theodi-ndl-core-data-api.hf.space

Endpoints

Search

GET /search

Performs a semantic search across NDL Core datasets using a natural language query and returns dataset details along with their download links.

Parameters:

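| Parameter | Type    | Required | Default | Description                         |
|-----------|---------|----------|---------|-------------------------------------|
| query     | string  | Yes      | -       | Natural language search query       |
| limit     | integer | No       | 5       | Maximum number of results to return |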

Example:

curl "https://theodi-ndl-core-data-api.hf.space/search?query=Police%20use%20of%20force&limit=10"
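The same request can be made from Python. The following is a minimal sketch using only the standard library; `build_search_url` and `search` are hypothetical helper names chosen for illustration, not part of the API.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "https://theodi-ndl-core-data-api.hf.space"


def build_search_url(query: str, limit: int = 5) -> str:
    """Return the /search URL with the parameters percent-encoded."""
    # quote_via=quote encodes spaces as %20, matching the curl example.
    params = urlencode({"query": query, "limit": limit}, quote_via=quote)
    return f"{BASE_URL}/search?{params}"


def search(query: str, limit: int = 5) -> list:
    """Call GET /search and decode the JSON array of results."""
    with urlopen(build_search_url(query, limit), timeout=30) as response:
        return json.load(response)


# Usage (performs a network request):
#   for record in search("Police use of force", limit=10):
#       print(record["identifier"], record["title"])
```

Building the URL with `urlencode` avoids the quoting mistakes that are easy to make when a query containing spaces is typed by hand.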

Response:

[
  {
    "identifier": "UUID1",
    "title": "Police use of force dataset1",
    "description": "...",
    "format": "parquet",
    "download": ["https://huggingface.co/datasets/theodi/ndl-core-structured-data/resolve/main/ac923bbd-57ca-4a84-8d6d-53dbb3614d3d/eca8b02a-c09a-43e9-86c4-b5a9294bce67.parquet"],
    ...
  },
  {
    "identifier": "UUID2",
    "title": "Police use of force dataset2",
    "description": "...",
    "format": "text",
    "download": ["https://theodi-ndl-core-data-api.hf.space/download/text/e06b5cf8-3e2f-4bc6-a6e9-0530b5bd165d"],
    ...
  }
]

See NDL Corpus for the definition of all fields.
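Because results can mix formats (parquet files hosted on Hugging Face, text records served by this API), a client may want to group the download links by format. The sketch below assumes every record carries the `format` and `download` fields shown in the example response and that `download` is always a list; `group_downloads_by_format` is a hypothetical helper, and the sample records only repeat the fields visible above.

```python
from collections import defaultdict

# Sample records copied from the example response (other fields omitted).
SAMPLE_RESULTS = [
    {
        "identifier": "UUID1",
        "format": "parquet",
        "download": [
            "https://huggingface.co/datasets/theodi/ndl-core-structured-data/resolve/main/ac923bbd-57ca-4a84-8d6d-53dbb3614d3d/eca8b02a-c09a-43e9-86c4-b5a9294bce67.parquet"
        ],
    },
    {
        "identifier": "UUID2",
        "format": "text",
        "download": [
            "https://theodi-ndl-core-data-api.hf.space/download/text/e06b5cf8-3e2f-4bc6-a6e9-0530b5bd165d"
        ],
    },
]


def group_downloads_by_format(results: list) -> dict:
    """Map each format (e.g. "parquet", "text") to its download URLs."""
    grouped = defaultdict(list)
    for record in results:
        # Records without a "format" or "download" key are tolerated.
        grouped[record.get("format", "unknown")].extend(record.get("download", []))
    return dict(grouped)


links = group_downloads_by_format(SAMPLE_RESULTS)
print(sorted(links))  # the formats present in the sample
```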


Download Text File

GET /download/text/{identifier}

Stream text content as a downloadable .txt file.

Parameters:

| Parameter  | Type | Required | Description            |
|------------|------|----------|------------------------|
| identifier | path | Yes      | The dataset identifier |

Example:

curl -O "https://theodi-ndl-core-data-api.hf.space/download/text/UUID2"

Response:

  • Returns a text/plain file download with Content-Disposition: attachment

Errors:

  • 404 - No record found with the given identifier
  • 400 - Record exists but is not in text format
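A client can translate the two documented error codes into readable messages when downloading. The following is a minimal sketch with the standard library; the helper names (`text_download_url`, `explain_download_error`, `download_text`) are hypothetical, and any status other than 404 or 400 is reported generically because the README does not document it.

```python
import shutil
import urllib.error
import urllib.request
from urllib.parse import quote

BASE_URL = "https://theodi-ndl-core-data-api.hf.space"

# Messages taken from the Errors list above.
ERROR_MESSAGES = {
    404: "No record found with the given identifier",
    400: "Record exists but is not in text format",
}


def text_download_url(identifier: str) -> str:
    """Return the /download/text URL for a dataset identifier."""
    return f"{BASE_URL}/download/text/{quote(identifier, safe='')}"


def explain_download_error(status: int) -> str:
    """Map an HTTP status code to a human-readable message."""
    return ERROR_MESSAGES.get(status, f"Unexpected HTTP status {status}")


def download_text(identifier: str, destination: str) -> None:
    """Stream the text file to `destination`, raising RuntimeError on failure."""
    try:
        with urllib.request.urlopen(text_download_url(identifier), timeout=30) as response:
            # The file is only created once the request has succeeded.
            with open(destination, "wb") as out:
                shutil.copyfileobj(response, out)
    except urllib.error.HTTPError as err:
        raise RuntimeError(explain_download_error(err.code)) from err


# Usage (performs a network request):
#   download_text("UUID2", "UUID2.txt")
```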

Data Sources

Technology Stack

  • Framework: FastAPI
  • Vector Database: LanceDB
  • Embeddings: Sentence Transformers (all-MiniLM-L6-v2)
  • Deployment: Docker on Hugging Face Spaces
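To illustrate the idea behind the vector search in this stack, here is a conceptual sketch in plain Python. It is not the service's code: LanceDB and the all-MiniLM-L6-v2 model are replaced by hand-written three-dimensional toy vectors, the identifiers are made up, and cosine similarity is assumed as the ranking metric purely for illustration.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_matches(query_vector: list, index: dict, limit: int = 5) -> list:
    """Return up to `limit` identifiers ordered by similarity to the query."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [identifier for identifier, _ in ranked[:limit]]


# Toy "embeddings" (hypothetical values; a real model yields far more dimensions).
TOY_INDEX = {
    "toy-a": [0.9, 0.1, 0.0],
    "toy-b": [0.8, 0.3, 0.1],
    "toy-c": [0.0, 0.2, 0.9],
}

print(top_matches([1.0, 0.0, 0.0], TOY_INDEX, limit=2))
```

In the deployed service the query text would first be embedded by the model and the nearest-neighbour lookup delegated to the vector database; the `limit` argument here plays the same role as the `limit` parameter of the search endpoint.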