---
title: Ndl Core Data Api
colorFrom: yellow
colorTo: purple
sdk: docker
pinned: false
short_description: Semantic search API for NDL Core datasets
---
# NDL Core Data API

A FastAPI-based service that provides semantic search and data download capabilities for NDL Core datasets. The API uses LanceDB for vector search with sentence transformers for embedding.
## Base URL

```
https://theodi-ndl-core-data-api.hf.space
```
## Endpoints

### Search

**GET** `/search`

Performs semantic search across NDL Core datasets using natural language queries and returns dataset details along with download links.

**Parameters:**

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `query` | string | Yes | - | Natural language search query |
| `limit` | integer | No | 5 | Maximum number of results to return |

**Example:**

```bash
curl "https://theodi-ndl-core-data-api.hf.space/search?query=Police%20use%20of%20force&limit=10"
```
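The same request can be issued from Python. The sketch below uses only the standard library; the helper names `build_search_url` and `search` and the `timeout` argument are illustrative assumptions, not part of the API.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "https://theodi-ndl-core-data-api.hf.space"


def build_search_url(query: str, limit: int = 5) -> str:
    """Return the /search URL with both parameters percent-encoded.

    Hypothetical helper; only `query` and `limit` come from the API docs.
    """
    # quote_via=quote encodes spaces as %20, matching the curl example.
    params = urlencode({"query": query, "limit": limit}, quote_via=quote)
    return f"{BASE_URL}/search?{params}"


def search(query: str, limit: int = 5, timeout: float = 30.0) -> list:
    """Call the endpoint and decode the JSON array of results."""
    with urlopen(build_search_url(query, limit), timeout=timeout) as response:
        return json.load(response)
```

Calling `search("Police use of force", limit=10)` sends the same request as the curl command above.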
**Response:**

```json
[
  {
    "identifier": "UUID1",
    "title": "Police use of force dataset1",
    "description": "...",
    "format": "parquet",
    "download": ["https://huggingface.co/datasets/theodi/ndl-core-structured-data/resolve/main/ac923bbd-57ca-4a84-8d6d-53dbb3614d3d/eca8b02a-c09a-43e9-86c4-b5a9294bce67.parquet"],
    ...
  },
  {
    "identifier": "UUID2",
    "title": "Police use of force dataset2",
    "description": "...",
    "format": "text",
    "download": ["https://theodi-ndl-core-data-api.hf.space/download/text/e06b5cf8-3e2f-4bc6-a6e9-0530b5bd165d"],
    ...
  }
]
```
See [NDL Corpus](https://huggingface.co/datasets/theodi/ndl-core-corpus) for the definition of all fields.
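Each result carries a `format` and a list of `download` links, so a client will usually want to separate them by format before fetching anything. A minimal sketch, assuming every record has the `format` and `download` fields shown in the sample response; the helper name `links_by_format` and the `"unknown"` fallback are illustrative:

```python
from collections import defaultdict


def links_by_format(results: list) -> dict:
    """Group download URLs from /search results by their `format` field."""
    grouped = defaultdict(list)
    for record in results:
        # Fall back gracefully if a record lacks one of the fields.
        record_format = record.get("format", "unknown")
        grouped[record_format].extend(record.get("download", []))
    return dict(grouped)


# Trimmed-down records modelled on the sample response.
sample_results = [
    {"identifier": "UUID1", "format": "parquet",
     "download": ["https://huggingface.co/datasets/theodi/ndl-core-structured-data/resolve/main/ac923bbd-57ca-4a84-8d6d-53dbb3614d3d/eca8b02a-c09a-43e9-86c4-b5a9294bce67.parquet"]},
    {"identifier": "UUID2", "format": "text",
     "download": ["https://theodi-ndl-core-data-api.hf.space/download/text/e06b5cf8-3e2f-4bc6-a6e9-0530b5bd165d"]},
]

links = links_by_format(sample_results)
```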

---

### Download Text File

**GET** `/download/text/{identifier}`

Stream text content as a downloadable `.txt` file.

**Parameters:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `identifier` | path | Yes | The dataset identifier |

**Example:**

```bash
curl -O "https://theodi-ndl-core-data-api.hf.space/download/text/UUID2"
```
**Response:**

- Returns a `text/plain` file download with `Content-Disposition: attachment`

**Errors:**

- `404` - No record found with the given identifier
- `400` - Record exists but is not in text format
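
A client can translate the two documented error codes into readable messages. The sketch below is one possible approach using the standard library; the names `explain_status` and `download_text` are hypothetical, and decoding the body as UTF-8 is an assumption, since the API only states that it returns `text/plain`.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://theodi-ndl-core-data-api.hf.space"

# Messages taken from the documented error list.
ERROR_MESSAGES = {
    404: "No record found with the given identifier",
    400: "Record exists but is not in text format",
}


def explain_status(status: int) -> str:
    """Map an HTTP status code to the documented meaning, if any."""
    return ERROR_MESSAGES.get(status, f"Unexpected HTTP status {status}")


def download_text(identifier: str, timeout: float = 30.0) -> str:
    """Fetch the text for one record, raising RuntimeError on HTTP errors."""
    url = f"{BASE_URL}/download/text/{quote(identifier, safe='')}"
    try:
        with urlopen(url, timeout=timeout) as response:
            # Assumption: the text is UTF-8 encoded.
            return response.read().decode("utf-8")
    except HTTPError as err:
        raise RuntimeError(explain_status(err.code)) from err
```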

---

## Data Sources

- **Vector Index:** [theodi/ndl-core-rag-index](https://huggingface.co/datasets/theodi/ndl-core-rag-index)
- **Structured Data:** [theodi/ndl-core-structured-data](https://huggingface.co/datasets/theodi/ndl-core-structured-data)

## Technology Stack

- **Framework:** FastAPI
- **Vector Database:** LanceDB
- **Embeddings:** Sentence Transformers (all-MiniLM-L6-v2)
- **Deployment:** Docker on Hugging Face Spaces
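
To illustrate how embedding-based search ranks results, here is a toy sketch in plain Python. It is not the service's code and does not use LanceDB or Sentence Transformers: the three-dimensional vectors are made up, and cosine similarity is an assumed metric, since this README does not say which distance the index is configured with.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_matches(query_vector: list, index: dict, limit: int = 5) -> list:
    """Return (identifier, score) pairs, most similar first."""
    scored = [
        (identifier, cosine_similarity(query_vector, vector))
        for identifier, vector in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Hypothetical embeddings; real ones would come from the embedding model.
toy_index = {
    "UUID1": [0.9, 0.1, 0.0],
    "UUID2": [0.7, 0.3, 0.1],
    "UUID3": [0.0, 0.2, 0.9],
}

matches = top_matches([1.0, 0.0, 0.0], toy_index, limit=2)
```

In the real service the query would first be embedded by the model and the nearest-neighbour lookup delegated to the vector database; the `limit` argument here plays the same role as the `limit` query parameter.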