Semantic IR: Presentation Script & Sample Queries
Query 1: The "Agent + Action" Match
Search for: Bush nominated
- Type: Simple Structural Query
- Why I chose it: To demonstrate the fundamental difference between Keyword matching and Grammar parsing.
- What to expect & point out:
- BM25: Finds the words "Bush" and "nominated" anywhere in the text, even if they aren't grammatically related.
- SRL: Parses the query to understand that `Bush` is the `ARG0` (the Doer/Agent) and `nominate` is the `Predicate` (the Action). It strictly returns sentences where Bush is the one actively doing the nominating.
- Hybrid: Combines the keyword strength of BM25 with the grammatical strictness of SRL. Point to the top Hybrid result and show the red and blue highlights proving the grammar was enforced.
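The contrast above can be shown in a few lines. This is a minimal sketch, not the demo's actual backend: the SRL frames are hand-written stand-ins for a real parser's output, and `bm25_like` is a naive term-overlap count, not true BM25 scoring.

```python
docs = [
    "Bush was nominated by the committee after the hearings.",
    "Bush nominated Roberts to the Supreme Court.",
]
# Hypothetical precomputed SRL frames: (ARG0 agent, predicate lemma)
frames = [
    ("committee", "nominate"),   # doc 0: Bush is the one nominated, not the agent
    ("Bush", "nominate"),        # doc 1: Bush is the agent
]

def bm25_like(query_terms, doc):
    # Naive keyword view: counts query terms found anywhere in the text
    tokens = doc.lower().replace(".", "").split()
    return sum(t.lower() in tokens for t in query_terms)

def srl_match(agent, predicate, frame):
    # Structural view: agent AND predicate roles must both line up
    return frame == (agent, predicate)

print([bm25_like(["Bush", "nominated"], d) for d in docs])  # both docs match
print([srl_match("Bush", "nominate", f) for f in frames])   # only doc 1 passes
```

Both documents contain both keywords, so keyword matching cannot separate them; the role check keeps only the sentence where Bush actually does the nominating.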
Query 2: The Multi-Argument Constraint (Directionality)
Search for: Google acquired Firefox
- Type: Complex Directional Query
- Why I chose it: To prove that our system understands "Who did what to whom." In Information Retrieval, Dense Neural Networks notoriously struggle with directionality (they confuse "Man bit dog" with "Dog bit man" because the vectors are almost identical).
- What to expect & point out:
- Dense: Might return sentences about general tech acquisitions or confuse the acquirer with the acquired.
- SRL: Strictly maps `Google` to `ARG0` (Acquirer) and `Firefox` to `ARG1` (Acquired).
- Hybrid: Look at the top result. Click "View Parse Tree" to visually prove to the evaluator that the backend built a dependency graph and correctly mapped Google to the subject and Firefox to the object.
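The directionality problem can be demonstrated concretely. This sketch uses a bag-of-words multiset as a stand-in for an order-insensitive representation (averaged embeddings behave similarly), and hand-written frames as a stand-in for real SRL output:

```python
from collections import Counter

a = "Google acquired Firefox"
b = "Firefox acquired Google"

# An order-insensitive view sees the two sentences as identical...
print(Counter(a.lower().split()) == Counter(b.lower().split()))  # True

# ...while role tuples keep the direction of the action distinct.
frame_a = {"ARG0": "Google", "predicate": "acquire", "ARG1": "Firefox"}
frame_b = {"ARG0": "Firefox", "predicate": "acquire", "ARG1": "Google"}
print(frame_a == frame_b)  # False
```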
Query 3: The Role Reversal Disambiguation
Search for: guerrillas killed
- Type: Ambiguity Resolution
- Why I chose it: "Guerrillas killed" is highly ambiguous. Does it mean guerrillas committed murder, or does it mean they were the victims?
- What to expect & point out:
- BM25 & Dense: Will return a messy mix of both scenarios.
- SRL & Hybrid: Because "guerrillas" precedes the active verb, the system maps it to `ARG0` (the Killer). Point out that the Hybrid column actively suppressed sentences where guerrillas were the victims (`ARG1`), proving that Semantic Roles act as a powerful precision guardrail.
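The suppression step amounts to a role-constrained filter over candidate results. A minimal sketch, again with hand-written frames standing in for a real SRL parse:

```python
candidates = [
    {"text": "Guerrillas killed three soldiers in an ambush.",
     "frame": {"ARG0": "guerrillas", "predicate": "kill.01", "ARG1": "soldiers"}},
    {"text": "The army killed twelve guerrillas near the border.",
     "frame": {"ARG0": "army", "predicate": "kill.01", "ARG1": "guerrillas"}},
]

def role_filter(results, agent, predicate):
    # Keep only sentences where the query noun fills the agent (ARG0) slot
    return [r["text"] for r in results
            if r["frame"]["ARG0"] == agent and r["frame"]["predicate"] == predicate]

print(role_filter(candidates, "guerrillas", "kill.01"))
# Only the ambush sentence survives; the victim reading is suppressed.
```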
Query 4: The Conversational Concept
Search for: buy a phone
- Type: Broad/Conversational Concept
- Why I chose it: To show that the system still functions perfectly as a modern search engine for everyday queries, not just complex political sentences.
- What to expect & point out:
- Dense: Does the heavy lifting here, expanding the search to concepts related to purchasing and technology.
- Hybrid: Takes the broad semantic meaning from Dense, but uses the SRL weight to ensure the action is actually related to "buying" (`Predicate: buy.01`) and the object is a "phone" (`ARG1`).
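One way to picture the hybrid weighting is a linear combination where the SRL term is a bonus awarded only when the roles line up. The weights and function below are illustrative assumptions, not the demo's actual formula:

```python
def hybrid_score(dense_sim, frame, want_pred="buy.01", want_arg1="phone",
                 w_dense=0.7, w_srl=0.3):
    # SRL bonus fires only when the predicate sense and ARG1 both match
    srl_hit = float(frame.get("predicate") == want_pred
                    and want_arg1 in frame.get("ARG1", ""))
    return w_dense * dense_sim + w_srl * srl_hit

# A purchase sentence with matching roles outranks a merely related one,
# even when its dense similarity is slightly lower.
on_topic = hybrid_score(0.80, {"predicate": "buy.01", "ARG1": "a new phone"})
related  = hybrid_score(0.82, {"predicate": "review.01", "ARG1": "a new phone"})
print(on_topic > related)  # True
```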
Query 5: The "Out-of-Vocabulary" Fallback (Empty State)
Search for: himanshu (or any completely made-up word)
- Type: Hallucination Guardrail Test
- Why I chose it: To demonstrate how the architecture handles complete failures and how Lexical/Structural algorithms differ from Neural Networks.
- What to expect & point out:
- BM25 & SRL: Will instantly show clean "No lexical/structural matches found" empty states. Explain that because the word doesn't exist, the strict algorithms correctly scored it `0.0` and our backend dynamically filtered them out.
- Dense: Will actually return results! Point this out to the evaluator as a classic flaw of Vector Search: Neural Networks always guess the "closest mathematical vector", even if the distance is huge, leading to hallucinated results.
- Conclusion: This proves why the Hybrid approach is the safest: it relies on Dense for broad recall but uses BM25 and SRL to pull the emergency brake when the neural network is hallucinating.
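The zero-score filtering described above can be sketched as follows. `bm25_score` is a toy term-overlap count, not real BM25, but it shares the key property: an out-of-vocabulary query scores exactly 0.0, so it can be filtered honestly, whereas a nearest-neighbour vector index always returns its k closest vectors no matter how far away they are.

```python
def bm25_score(query, doc_tokens):
    # Zero when no query term appears in the document at all
    return float(sum(t in doc_tokens for t in query.lower().split()))

docs = [["bush", "nominated", "roberts"],
        ["google", "acquired", "the", "startup"]]

scores = [bm25_score("himanshu", d) for d in docs]
results = [d for d, s in zip(docs, scores) if s > 0.0]
print(results)  # [] -> the UI can show an honest empty state
```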