Alex Latipov committed on
Commit
86083ec
·
1 Parent(s): ac1fd0a

Expose raw DBpedia and corporate SPARQL proxy endpoints

Browse files
deployment/hf_eval_backend/README.md CHANGED
@@ -1,35 +1,46 @@
1
- # HF Evaluation Backend
2
 
3
- This package is the frozen evaluation backend for the Text2SPARQL challenge.
4
 
5
- Frozen configuration:
6
- - model: `gpt-5-mini`
7
- - judge mode: `single_general`
8
- - linker: `internal_max`
9
- - syntax attempts: `5`
10
- - semantic attempts: `5`
11
- - repair prompt: `local_edit`
12
 
13
- Main files:
14
- - `configs/final_gpt5mini_single_general_5_5_internal_max_local_edit.yaml`
15
- - `app.py`
16
- - `start_backend.sh`
17
- - `Dockerfile`
18
 
19
- API:
20
  - `GET /health`
21
- - `GET /text2sparql?question=...&dataset=...` -> returns plain SPARQL text
22
- - `GET /text2sparql.json?question=...&dataset=...` -> returns JSON with query + status
23
-
24
- Environment variables:
25
- - `OPENAI_API_KEY` required
26
- - `DBPEDIA_ENDPOINT_URL` optional override for DBpedia endpoint
27
- - `CORPORATE_ENDPOINT_URL` optional override for corporate endpoint
28
- - `CORPORATE_GRAPH_URI` optional override for the locally loaded corporate graph
29
- - `PORT` optional, defaults to `7860`
30
-
31
- Notes:
32
- - This package is intended to stay stable while research continues elsewhere.
33
- - It reuses the main repo code from `src/`, but the serving config is frozen here.
34
- - In the HF Space, DBpedia is restored from the snapshot bucket and the small
35
- 2026 corporate KG is loaded at boot from `data/corporate_2026/`.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # HF KG Endpoint Backend
2
 
3
+ This package now serves the Hugging Face Space as a **raw SPARQL endpoint proxy** over the locally booted Virtuoso instance.
4
 
5
+ It no longer acts as the active challenge Text2SPARQL generation API.
 
 
 
 
 
 
6
 
7
+ ## Active Routes
 
 
 
 
8
 
 
9
  - `GET /health`
10
+ - `GET /sparql/dbpedia`
11
+ - `POST /sparql/dbpedia`
12
+ - `GET /sparql/corporate`
13
+ - `POST /sparql/corporate`
14
+
15
+ ## Behavior
16
+
17
+ - DBpedia proxy forwards to the local DBpedia Virtuoso endpoint restored from the snapshot bucket.
18
+ - Corporate proxy forwards to the same local Virtuoso instance, but through the corporate graph-aware endpoint URL.
19
+ - The startup flow is unchanged:
20
+ - restore DBpedia snapshot
21
+ - start Virtuoso
22
+ - load the corporate graph
23
+ - start FastAPI
24
+
25
+ ## Environment Variables
26
+
27
+ - `DBPEDIA_ENDPOINT_URL` optional override for the internal DBpedia upstream
28
+ - `CORPORATE_ENDPOINT_URL` optional override for the internal corporate upstream
29
+ - `CORPORATE_GRAPH_URI` optional corporate graph URI override
30
+ - `PORT` optional FastAPI port, default `7860`
31
+
32
+ ## Expected Public URLs
33
+
34
+ If the Space URL is:
35
+
36
+ - `https://insanalex-iris-at-text2sparql.hf.space`
37
+
38
+ then the public raw SPARQL endpoints are:
39
+
40
+ - `https://insanalex-iris-at-text2sparql.hf.space/sparql/dbpedia`
41
+ - `https://insanalex-iris-at-text2sparql.hf.space/sparql/corporate`
42
+
43
+ ## Notes
44
+
45
+ - The old challenge-specific `/text2sparql` routes are no longer part of the active HF app.
46
+ - The main repository still contains the repair pipeline code for local paper experiments; this deployment package now focuses only on stable KG access.
deployment/hf_eval_backend/app.py CHANGED
@@ -1,20 +1,18 @@
1
  from __future__ import annotations
2
 
3
  import os
4
- import uuid
5
  from functools import lru_cache
6
  from pathlib import Path
7
 
8
- from fastapi import FastAPI, HTTPException
 
9
  from fastapi.responses import PlainTextResponse
10
  from pydantic import BaseModel
11
 
12
  from src.config import RuntimeConfig, load_config
13
- from src.models import QueryRequest
14
- from src.pipeline import Text2SPARQLPipeline
15
 
16
 
17
- APP_TITLE = "TEXT2SPARQL Evaluation Backend"
18
  PROJECT_ROOT = Path(__file__).resolve().parents[2]
19
  BASE_CONFIG_PATH = PROJECT_ROOT / "configs" / "default.yaml"
20
  FROZEN_CONFIG_PATH = (
@@ -22,29 +20,29 @@ FROZEN_CONFIG_PATH = (
22
  / "configs"
23
  / "final_gpt5mini_single_general_5_5_internal_max_local_edit.yaml"
24
  )
25
-
26
- SUPPORTED_DATASETS = {
27
- "https://text2sparql.aksw.org/2025/dbpedia/",
28
- "https://text2sparql.aksw.org/2025/corporate/",
29
- "https://text2sparql.aksw.org/2026/dbpedia/",
30
- "https://text2sparql.aksw.org/2026/corporate/",
 
 
 
 
 
 
31
  }
32
 
33
 
34
  class HealthResponse(BaseModel):
35
  status: str
36
  service: str
37
- dbpedia_endpoint_url: str
38
- corporate_endpoint_url: str
39
- config_path: str
40
-
41
-
42
- class QueryResponse(BaseModel):
43
- dataset: str
44
- question: str
45
- query: str
46
- status: str
47
- request_id: str
48
 
49
 
50
  def _apply_endpoint_overrides(runtime: RuntimeConfig) -> RuntimeConfig:
@@ -76,9 +74,67 @@ def get_runtime() -> RuntimeConfig:
76
  return _apply_endpoint_overrides(runtime)
77
 
78
 
79
- @lru_cache(maxsize=1)
80
- def get_pipeline() -> Text2SPARQLPipeline:
81
- return Text2SPARQLPipeline(get_runtime())
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
82
 
83
 
84
  app = FastAPI(title=APP_TITLE)
@@ -86,73 +142,37 @@ app = FastAPI(title=APP_TITLE)
86
 
87
  @app.get("/health", response_model=HealthResponse)
88
  def health() -> HealthResponse:
89
- runtime = get_runtime()
90
  return HealthResponse(
91
  status="ok",
92
  service=APP_TITLE,
93
- dbpedia_endpoint_url=runtime.datasets["https://text2sparql.aksw.org/2026/dbpedia/"]["endpoint_url"],
94
- corporate_endpoint_url=runtime.datasets["https://text2sparql.aksw.org/2026/corporate/"]["endpoint_url"],
95
- config_path=str(FROZEN_CONFIG_PATH),
 
 
96
  )
97
 
98
 
99
- @app.get("/")
100
- def root(question: str | None = None, dataset: str | None = None):
101
- # Challenge client calls the API at "/" with dataset/question query params.
102
- if question is not None and dataset is not None:
103
- query, status, request_id = _run_query(dataset, question)
104
- return QueryResponse(
105
- dataset=dataset,
106
- question=question,
107
- query=query,
108
- status=status,
109
- request_id=request_id,
110
- )
111
- return PlainTextResponse("ok")
112
-
113
-
114
- def _run_query(dataset: str, question: str) -> tuple[str, str, str]:
115
- if dataset not in SUPPORTED_DATASETS:
116
- raise HTTPException(status_code=404, detail="Unsupported dataset identifier.")
117
-
118
- request_id = f"eval_{uuid.uuid4().hex[:12]}"
119
- request = QueryRequest(
120
- request_id=request_id,
121
- dataset_id=dataset,
122
- question=question,
123
- language=None,
124
- )
125
- trace = get_pipeline().run(request)
126
- if not trace.final_query:
127
- raise HTTPException(status_code=500, detail=f"Pipeline returned empty query ({trace.final_status}).")
128
- return trace.final_query, trace.final_status, request_id
129
-
130
-
131
- @app.get("/text2sparql", response_model=QueryResponse)
132
- def text2sparql(question: str, dataset: str) -> QueryResponse:
133
- query, status, request_id = _run_query(dataset, question)
134
- return QueryResponse(
135
- dataset=dataset,
136
- question=question,
137
- query=query,
138
- status=status,
139
- request_id=request_id,
140
  )
141
 
142
 
143
- @app.get("/text2sparql.raw", response_class=PlainTextResponse)
144
- def text2sparql_raw(question: str, dataset: str) -> str:
145
- query, _, _ = _run_query(dataset, question)
146
- return query
147
 
148
 
149
- @app.get("/text2sparql.json", response_model=QueryResponse)
150
- def text2sparql_json(question: str, dataset: str) -> QueryResponse:
151
- query, status, request_id = _run_query(dataset, question)
152
- return QueryResponse(
153
- dataset=dataset,
154
- question=question,
155
- query=query,
156
- status=status,
157
- request_id=request_id,
158
- )
 
1
from __future__ import annotations

import asyncio
import os
from functools import lru_cache
from pathlib import Path

import requests
from fastapi import FastAPI, HTTPException, Request, Response
from fastapi.responses import PlainTextResponse
from pydantic import BaseModel

from src.config import RuntimeConfig, load_config
 
 
13
 
14
 
15
+ APP_TITLE = "TEXT2SPARQL KG Endpoint Backend"
16
  PROJECT_ROOT = Path(__file__).resolve().parents[2]
17
  BASE_CONFIG_PATH = PROJECT_ROOT / "configs" / "default.yaml"
18
  FROZEN_CONFIG_PATH = (
 
20
  / "configs"
21
  / "final_gpt5mini_single_general_5_5_internal_max_local_edit.yaml"
22
  )
23
+ REQUEST_TIMEOUT_SEC = 180
24
+ EXCLUDED_PROXY_HEADERS = {
25
+ "connection",
26
+ "content-length",
27
+ "content-encoding",
28
+ "keep-alive",
29
+ "proxy-authenticate",
30
+ "proxy-authorization",
31
+ "te",
32
+ "trailer",
33
+ "transfer-encoding",
34
+ "upgrade",
35
  }
36
 
37
 
38
  class HealthResponse(BaseModel):
39
  status: str
40
  service: str
41
+ mode: str
42
+ dbpedia_proxy_path: str
43
+ corporate_proxy_path: str
44
+ dbpedia_upstream_url: str
45
+ corporate_upstream_url: str
 
 
 
 
 
 
46
 
47
 
48
  def _apply_endpoint_overrides(runtime: RuntimeConfig) -> RuntimeConfig:
 
74
  return _apply_endpoint_overrides(runtime)
75
 
76
 
77
+ def _upstream_url(kind: str) -> str:
78
+ runtime = get_runtime()
79
+ dataset_id_map = {
80
+ "dbpedia": "https://text2sparql.aksw.org/2026/dbpedia/",
81
+ "corporate": "https://text2sparql.aksw.org/2026/corporate/",
82
+ }
83
+ dataset_id = dataset_id_map[kind]
84
+ return runtime.datasets[dataset_id]["endpoint_url"]
85
+
86
+
87
+ def _proxy_headers(request: Request) -> dict[str, str]:
88
+ headers: dict[str, str] = {}
89
+ for name in ("accept", "content-type", "user-agent"):
90
+ value = request.headers.get(name)
91
+ if value:
92
+ headers[name] = value
93
+ return headers
94
+
95
+
96
+ def _response_from_upstream(upstream: requests.Response) -> Response:
97
+ headers = {
98
+ key: value
99
+ for key, value in upstream.headers.items()
100
+ if key.lower() not in EXCLUDED_PROXY_HEADERS
101
+ }
102
+ return Response(
103
+ content=upstream.content,
104
+ status_code=upstream.status_code,
105
+ headers=headers,
106
+ )
107
+
108
+
109
+ async def _proxy_sparql(request: Request, kind: str) -> Response:
110
+ endpoint_url = _upstream_url(kind)
111
+ params = list(request.query_params.multi_items())
112
+ headers = _proxy_headers(request)
113
+
114
+ try:
115
+ if request.method == "GET":
116
+ upstream = requests.get(
117
+ endpoint_url,
118
+ params=params,
119
+ headers=headers,
120
+ timeout=REQUEST_TIMEOUT_SEC,
121
+ )
122
+ else:
123
+ body = await request.body()
124
+ upstream = requests.post(
125
+ endpoint_url,
126
+ params=params,
127
+ data=body,
128
+ headers=headers,
129
+ timeout=REQUEST_TIMEOUT_SEC,
130
+ )
131
+ except requests.RequestException as exc:
132
+ raise HTTPException(
133
+ status_code=502,
134
+ detail=f"Upstream {kind} SPARQL endpoint request failed: {exc}",
135
+ ) from exc
136
+
137
+ return _response_from_upstream(upstream)
138
 
139
 
140
  app = FastAPI(title=APP_TITLE)
 
142
 
143
  @app.get("/health", response_model=HealthResponse)
144
  def health() -> HealthResponse:
 
145
  return HealthResponse(
146
  status="ok",
147
  service=APP_TITLE,
148
+ mode="sparql_proxy",
149
+ dbpedia_proxy_path="/sparql/dbpedia",
150
+ corporate_proxy_path="/sparql/corporate",
151
+ dbpedia_upstream_url=_upstream_url("dbpedia"),
152
+ corporate_upstream_url=_upstream_url("corporate"),
153
  )
154
 
155
 
156
+ @app.get("/", response_class=PlainTextResponse)
157
+ def root() -> str:
158
+ return (
159
+ "ok\n"
160
+ "Available SPARQL proxy endpoints:\n"
161
+ " /sparql/dbpedia\n"
162
+ " /sparql/corporate\n"
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
163
  )
164
 
165
 
166
+ @app.get("/sparql", response_class=PlainTextResponse)
167
+ def sparql_index() -> str:
168
+ return "Use /sparql/dbpedia or /sparql/corporate"
 
169
 
170
 
171
+ @app.api_route("/sparql/dbpedia", methods=["GET", "POST"])
172
+ async def sparql_dbpedia(request: Request) -> Response:
173
+ return await _proxy_sparql(request, "dbpedia")
174
+
175
+
176
+ @app.api_route("/sparql/corporate", methods=["GET", "POST"])
177
+ async def sparql_corporate(request: Request) -> Response:
178
+ return await _proxy_sparql(request, "corporate")