\documentclass[aspectratio=169]{beamer}
\usetheme{Madrid}
\usecolortheme{default}
\setbeamertemplate{navigation symbols}{}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{lmodern}
\usepackage{booktabs}
\usepackage{array}
\usepackage{ragged2e}
\usepackage{xcolor}
\title{Why Join the TEXT2SPARQL'26 Challenge?}
\subtitle{A compact testbed for reliable LLM-to-KG interaction}
\author{Alex Latipov}
\institute{Draft for supervisor discussion}
\date{March 2026}
\begin{document}
\begin{frame}
\titlepage
\end{frame}
\begin{frame}{Why TEXT2SPARQL'26, Why Now?}
\begin{itemize}
\item I already have a working research direction on \textbf{semantic SPARQL error detection}.
\item The next natural step is to apply that knowledge to a real \textbf{NL2SPARQL translation pipeline}.
\item TEXT2SPARQL'26 offers a \textbf{compact, externally validated, and time-bounded} setting to test this idea.
\item The challenge creates a concrete deliverable: not only an analysis framework, but a \textbf{working system}.
\item This is scientifically useful and directly relevant to ongoing industrial conversations around KG access and query reliability.
\end{itemize}
\end{frame}
\begin{frame}{The Research Gap: LLMs Still Need a Reliable Interface to Knowledge Graphs}
\begin{itemize}
\item LLMs communicate well in natural language, but they still struggle to interact \textbf{reliably} with structured knowledge.
\item Knowledge graphs become much more useful when users can query them through language.
\item In practice, this requires dependable \textbf{natural language to SPARQL translation}.
\item The central problem is not just syntax; many failures are \textbf{semantic}:
\begin{itemize}
\item wrong entities or relations,
\item incorrect joins,
\item invalid type assumptions,
\item wrong aggregations or filters,
\item queries that execute but return the wrong result set.
\end{itemize}
\item This is exactly where my existing SPARQL error detection work can contribute (a minimal check is sketched on the next slide).
\end{itemize}
\end{frame}
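\begin{frame}[fragile]{Sketch: A Minimal Semantic Plausibility Check}
One concrete flavor of semantic checking is vocabulary validation: a query can parse and even execute while using an IRI the KG schema never declares. A minimal sketch, assuming an \texttt{rdflib}-loadable schema; the file name \texttt{schema.ttl} and the regex-based IRI extraction are illustrative choices, not the actual pipeline.
\footnotesize
\begin{verbatim}
import re
import rdflib  # pip install rdflib

# Load the KG vocabulary; "schema.ttl" is an illustrative file name.
schema = rdflib.Graph().parse("schema.ttl", format="turtle")
# Every term that appears anywhere in the schema graph.
known = {str(term) for triple in schema for term in triple}

def unknown_iris(sparql: str) -> set:
    """IRIs used in the query that the schema has never seen."""
    return set(re.findall(r"<([^>]+)>", sparql)) - known

q = "SELECT ?c WHERE { ?p <http://dbpedia.org/ontology/birthPlace> ?c }"
print(unknown_iris(q))  # non-empty set => likely semantic problem
\end{verbatim}
\end{frame}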
\begin{frame}{How This Fits My PhD Trajectory}
\begin{block}{Working PhD direction}
Making LLM interaction with structured knowledge \textbf{more reliable}, from graph construction to graph querying.
\end{block}
\begin{itemize}
\item \textbf{Study 1:} Triple-based factual correctness evaluation of AI-generated summaries
\begin{itemize}
\item focus: extracting graph structure for factuality assessment
\end{itemize}
\item \textbf{Study 2:} LLM-based ontology learning
\begin{itemize}
\item focus: enriching and filtering structured semantic representations
\end{itemize}
\item \textbf{Study 3:} SPARQL semantic error analysis
\begin{itemize}
\item focus: diagnosing semantic failures in knowledge graph queries
\end{itemize}
\item \textbf{Next step:} apply semantic error detection to improve \textbf{end-to-end NL2SPARQL translation}
\end{itemize}
\end{frame}
\begin{frame}{What the Challenge Actually Is}
\begin{itemize}
\item TEXT2SPARQL'26 evaluates systems that translate \textbf{natural language questions} into \textbf{SPARQL queries}.
\item Evaluation is performed on two knowledge graphs:
\begin{itemize}
\item \textbf{DBpedia}: a larger open-domain KG
\item \textbf{Corporate KG}: a smaller domain-specific KG released shortly before evaluation
\end{itemize}
\item Participants do not upload predictions as files; instead, they expose a \textbf{live API endpoint}.
\item Organizers call the endpoint with questions, retrieve predicted SPARQL, execute it, and score the resulting answers.
\item This makes the challenge closer to a realistic system deployment than to a static benchmark submission.
\end{itemize}
\end{frame}
\begin{frame}{Challenge Mechanics: API, Data, and Evaluation Workflow}
\begin{itemize}
\item \textbf{API requirement.} The system is evaluated as a live web API. It receives a natural language question together with a dataset identifier and returns a JSON response containing the predicted SPARQL query. During evaluation, organizers query this endpoint directly rather than asking for a static prediction file (a minimal endpoint sketch follows on the next slide).
\item \textbf{Data situation.} For DBpedia, the challenge provides the KG dump, but no clearly exposed 2026 train split is available, so the training setup has to be designed independently. Public NL2SPARQL resources such as DBNQA, QALD, and LC-QuAD can be reused. For the corporate setting, only the KG dump is released, roughly 24 hours before evaluation.
\item \textbf{Official scoring workflow.} The organizers provide an official command-line client that can query the API, execute predicted SPARQL queries, and compute evaluation metrics. This is useful because it allows local self-evaluation with the same workflow that is used in the official challenge.
\end{itemize}
\end{frame}
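\begin{frame}[fragile]{Sketch: A Challenge-Compatible Endpoint}
To make the API requirement concrete, a minimal endpoint sketch. FastAPI is my implementation choice here, and the JSON field names are assumptions to be verified against the official client rather than a confirmed schema.
\footnotesize
\begin{verbatim}
# api.py -- minimal challenge-style endpoint sketch.
from fastapi import FastAPI  # pip install fastapi uvicorn

app = FastAPI()

@app.get("/")
def translate(question: str, dataset: str) -> dict:
    # Real system: generate candidates, run semantic checks, pick best.
    sparql = "SELECT * WHERE { ?s ?p ?o } LIMIT 1"  # placeholder query
    return {"dataset": dataset, "question": question, "query": sparql}

# Run locally:  uvicorn api:app --host 127.0.0.1 --port 8000
# Then:         GET /?question=...&dataset=...
\end{verbatim}
\end{frame}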
\begin{frame}{My Proposed Solution: NL2SPARQL with Semantic Error Detection}
\begin{block}{Core idea}
Use semantic error detection as a reliability layer on top of NL2SPARQL generation.
\end{block}
\begin{enumerate}
\item Generate multiple candidate SPARQL queries from the natural language question.
\item Check the candidates for semantic problems.
\item Rank or filter candidates based on semantic plausibility.
\item Return the best query through the challenge API.
\end{enumerate}
\vspace{0.4em}
\textbf{Selling point:}
the system is not ``just another generator''; it explicitly tries to \textbf{detect and reduce semantic failures} (control flow sketched on the next slide).
\end{frame}
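\begin{frame}[fragile]{Sketch: Candidate Selection via Semantic Checks}
A sketch of the intended control flow. Both helper functions are hypothetical stubs standing in for components that do not yet exist in this form.
\footnotesize
\begin{verbatim}
def generate_candidates(question, dataset, n=5):
    # Stub: the real system samples n SPARQL queries from an LLM.
    return ["SELECT * WHERE { ?s ?p ?o } LIMIT 1"] * n

def semantic_issues(sparql, dataset):
    # Stub: the real system runs the semantic error-detection checks.
    return []

def best_query(question, dataset, n=5):
    candidates = generate_candidates(question, dataset, n)
    # Prefer the candidate with the fewest detected semantic issues;
    # Python's sort is stable, so ties keep generation order.
    ranked = sorted(candidates,
                    key=lambda q: len(semantic_issues(q, dataset)))
    return ranked[0]
\end{verbatim}
\end{frame}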
\begin{frame}{Why I Believe This Approach Can Work}
\begin{itemize}
\item NL2SPARQL errors are often semantic rather than purely syntactic.
\item Candidate generation plus reranking is a natural place to inject semantic knowledge.
\item My SPARQL error-bench work already provides insight into:
\begin{itemize}
\item common failure types,
\item likely semantic inconsistencies,
\item patterns that can be reused for validation and selection.
\end{itemize}
\item The hidden corporate KG favors \textbf{general semantic robustness} over pure dataset memorization.
\item DBpedia is close enough to my current experience with structured knowledge graphs to make rapid adaptation feasible.
\end{itemize}
\end{frame}
\begin{frame}{Why Doing It as a Challenge Is Better Than Doing It in Isolation}
\begin{itemize}
\item It gives an \textbf{external evaluation setting} instead of an internal-only experiment.
\item The hidden test setup reduces overfitting to my own assumptions.
\item The challenge forces a complete research artifact:
\begin{itemize}
\item model or pipeline,
\item API,
\item evaluation,
\item deployment discipline.
\end{itemize}
\item It provides visibility and comparability against other approaches in the field.
\item It is a \textbf{time-compact pilot}:
useful evidence can be obtained quickly without committing to a large standalone project first.
\end{itemize}
\end{frame}
\begin{frame}{Registration and Technical Setup}
\begin{itemize}
\item Challenge participation requires a \textbf{public API endpoint}.
\item The endpoint must answer simple HTTP GET requests with two query parameters (a local smoke test is sketched on the next slide):
\begin{itemize}
\item \texttt{question}
\item \texttt{dataset}
\end{itemize}
\item I already prepared:
\begin{itemize}
\item a challenge-compatible API scaffold,
\item a local testing environment,
\item compatibility checks with the official challenge client,
\item a containerized deployment setup.
\end{itemize}
\item Registration is done by adding the endpoint URL and team information to the official \texttt{CHALLENGERS.yaml}.
\end{itemize}
\end{frame}
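\begin{frame}[fragile]{Sketch: Local Smoke Test Before Exposure}
Before any public exposure, the endpoint can be smoke-tested with a request shaped like the ones the organizers will send. The port, the dataset identifier, and the \texttt{query} field mirror the endpoint sketch earlier in this deck; they are assumptions, not the official spec.
\footnotesize
\begin{verbatim}
import requests  # pip install requests

resp = requests.get(
    "http://127.0.0.1:8000/",
    params={"question": "Where was Ada Lovelace born?",
            "dataset": "dbpedia"},
    timeout=30,
)
resp.raise_for_status()
payload = resp.json()
assert "query" in payload, "response must contain predicted SPARQL"
print(payload["query"])
\end{verbatim}
\end{frame}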
\begin{frame}{Safety, Deployment, and Risk Mitigation}
\begin{itemize}
\item The deployment will not expose my full university account or working environment.
\item The plan is to run the system in an \textbf{isolated container}.
\item Safety measures already considered:
\begin{itemize}
\item non-root execution inside the container,
\item mounting only the required directories,
\item keeping secrets out of the image,
\item exposing only the API port,
\item preferring local binding plus a tunnel or reverse proxy over raw public host exposure.
\end{itemize}
\item Local testing comes first; public exposure is added only after controlled validation.
\item This keeps the engineering risk limited while still making the system deployable for the challenge.
\end{itemize}
\end{frame}
\begin{frame}{Expected Outcomes for Research and External Stakeholders}
\begin{columns}[T]
\begin{column}{0.48\textwidth}
\textbf{Research outcomes}
\begin{itemize}
\item evidence on whether semantic error detection improves NL2SPARQL reliability
\item a reusable end-to-end testbed
\item benchmarked challenge results
\item possible basis for a paper or follow-up study
\end{itemize}
\end{column}
\begin{column}{0.48\textwidth}
\textbf{External relevance}
\begin{itemize}
\item Semaku and NXP are interested not only in error analysis, but also in translation systems
\item a challenge result would provide a visible proof-of-concept
\item this strengthens the practical credibility of the semantic error detection line of work
\end{itemize}
\end{column}
\end{columns}
\end{frame}
\begin{frame}{Decision Needed and Immediate Next Steps}
\begin{block}{Decision needed}
Treat challenge participation as a focused short-term research activity that tests whether semantic SPARQL error detection can improve NL2SPARQL systems.
\end{block}
\textbf{Immediate next steps}
\begin{itemize}
\item finalize the public endpoint for registration
\item connect the current NL2SPARQL generator to the challenge API
\item integrate semantic error detection into candidate selection
\item run self-evaluation with the official client
\item register and participate in the evaluation
\end{itemize}
\vspace{0.5em}
\centering
\textit{The challenge is not the end goal; it is a compact and credible testbed for the next stage of the PhD.}
\end{frame}
\end{document}