Semantic IR: Presentation Script & Sample Queries
Query 1: The "Agent + Action" Match
Search for: Bush nominated
- Type: Simple Structural Query
- Why I chose it: To demonstrate the fundamental difference between Keyword matching and Grammar parsing.
- What to expect & point out:
- BM25: Finds the words "Bush" and "nominated" anywhere in the text, even if they aren't grammatically related.
- SRL: Parses the query to understand that `Bush` is the `ARG0` (the Doer/Agent) and `nominate` is the `Predicate` (the Action). It strictly returns sentences where Bush is the one actively doing the nominating.
- Hybrid: Combines the keyword strength of BM25 with the grammatical strictness of SRL. Point to the top Hybrid result and show the red and blue highlights proving the grammar was enforced.
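The contrast above can be shown in a few lines. This is a minimal sketch, not the demo's actual backend: the SRL frames are hand-written stand-ins for a real parser's output, and `bm25_like` is a naive term-overlap count, not true BM25 scoring.

```python
docs = [
    "Bush was nominated by the committee after the hearings.",
    "Bush nominated Roberts to the Supreme Court.",
]
# Hypothetical precomputed SRL frames: (ARG0 agent, predicate lemma)
frames = [
    ("committee", "nominate"),   # doc 0: Bush is the one nominated, not the agent
    ("Bush", "nominate"),        # doc 1: Bush is the agent
]

def bm25_like(query_terms, doc):
    # Naive keyword view: counts query terms found anywhere in the text
    tokens = doc.lower().replace(".", "").split()
    return sum(t.lower() in tokens for t in query_terms)

def srl_match(agent, predicate, frame):
    # Structural view: agent AND predicate roles must both line up
    return frame == (agent, predicate)

print([bm25_like(["Bush", "nominated"], d) for d in docs])  # both docs match
print([srl_match("Bush", "nominate", f) for f in frames])   # only doc 1 passes
```

Both documents contain both keywords, so keyword matching cannot separate them; the role check keeps only the sentence where Bush actually does the nominating.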
Query 2: The Multi-Argument Constraint (Directionality)
Search for: Google acquired Firefox
- Type: Complex Directional Query
- Why I chose it: To prove that our system understands "Who did what to whom." In Information Retrieval, Dense Neural Networks notoriously struggle with directionality (they confuse "Man bit dog" with "Dog bit man" because the vectors are almost identical).
- What to expect & point out:
- Dense: Might return sentences about general tech acquisitions or confuse the acquirer with the acquired.
- SRL: Strictly maps `Google` to `ARG0` (Acquirer) and `Firefox` to `ARG1` (Acquired).
- Hybrid: Look at the top result. Click "View Parse Tree" to visually prove to the evaluator that the backend built a dependency graph and correctly mapped Google to the subject and Firefox to the object.
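The directionality problem can be demonstrated concretely. This sketch uses a bag-of-words multiset as a stand-in for an order-insensitive representation (averaged embeddings behave similarly), and hand-written frames as a stand-in for real SRL output:

```python
from collections import Counter

a = "Google acquired Firefox"
b = "Firefox acquired Google"

# An order-insensitive view sees the two sentences as identical...
print(Counter(a.lower().split()) == Counter(b.lower().split()))  # True

# ...while role tuples keep the direction of the action distinct.
frame_a = {"ARG0": "Google", "predicate": "acquire", "ARG1": "Firefox"}
frame_b = {"ARG0": "Firefox", "predicate": "acquire", "ARG1": "Google"}
print(frame_a == frame_b)  # False
```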
Query 3: The Role Reversal Disambiguation
Search for: guerrillas killed
- Type: Ambiguity Resolution
- Why I chose it: "Guerrillas killed" is highly ambiguous. Does it mean guerrillas committed murder, or does it mean they were the victims?
- What to expect & point out:
- BM25 & Dense: Will return a messy mix of both scenarios.
- SRL & Hybrid: Because "guerrillas" precedes the active verb, the system maps it to `ARG0` (the Killer). Point out that the Hybrid column actively suppressed sentences where guerrillas were the victims (`ARG1`), proving that Semantic Roles act as a powerful precision guardrail.
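The suppression step amounts to a role-constrained filter over candidate results. A minimal sketch, again with hand-written frames standing in for a real SRL parse:

```python
candidates = [
    {"text": "Guerrillas killed three soldiers in an ambush.",
     "frame": {"ARG0": "guerrillas", "predicate": "kill.01", "ARG1": "soldiers"}},
    {"text": "The army killed twelve guerrillas near the border.",
     "frame": {"ARG0": "army", "predicate": "kill.01", "ARG1": "guerrillas"}},
]

def role_filter(results, agent, predicate):
    # Keep only sentences where the query noun fills the agent (ARG0) slot
    return [r["text"] for r in results
            if r["frame"]["ARG0"] == agent and r["frame"]["predicate"] == predicate]

print(role_filter(candidates, "guerrillas", "kill.01"))
# Only the ambush sentence survives; the victim reading is suppressed.
```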
Query 4: The Conversational Concept
Search for: buy a phone
- Type: Broad/Conversational Concept
- Why I chose it: To show that the system still functions perfectly as a modern search engine for everyday queries, not just complex political sentences.
- What to expect & point out:
- Dense: Does the heavy lifting here, expanding the search to concepts related to purchasing and technology.
- Hybrid: Takes the broad semantic meaning from Dense, but uses the SRL weight to ensure the action is actually related to "buying" (`Predicate: buy.01`) and the object is a "phone" (`ARG1`).
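One way to picture the hybrid weighting is a linear combination where the SRL term is a bonus awarded only when the roles line up. The weights and function below are illustrative assumptions, not the demo's actual formula:

```python
def hybrid_score(dense_sim, frame, want_pred="buy.01", want_arg1="phone",
                 w_dense=0.7, w_srl=0.3):
    # SRL bonus fires only when the predicate sense and ARG1 both match
    srl_hit = float(frame.get("predicate") == want_pred
                    and want_arg1 in frame.get("ARG1", ""))
    return w_dense * dense_sim + w_srl * srl_hit

# A purchase sentence with matching roles outranks a merely related one,
# even when its dense similarity is slightly lower.
on_topic = hybrid_score(0.80, {"predicate": "buy.01", "ARG1": "a new phone"})
related  = hybrid_score(0.82, {"predicate": "review.01", "ARG1": "a new phone"})
print(on_topic > related)  # True
```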
Query 5: The "Out-of-Vocabulary" Fallback (Empty State)
Search for: himanshu (or any completely made-up word)
- Type: Hallucination Guardrail Test
- Why I chose it: To demonstrate how the architecture handles complete failures and how Lexical/Structural algorithms differ from Neural Networks.
- What to expect & point out:
- BM25 & SRL: Will instantly show clean "No lexical/structural matches found" empty states. Explain that because the word doesn't exist, the strict algorithms correctly scored it `0.0` and our backend dynamically filtered them out.
- Dense: Will actually return results! Point this out to the evaluator as a classic flaw of Vector Search: Neural Networks always guess the "closest mathematical vector", even if the distance is huge, leading to hallucinated results.
- Conclusion: This proves why the Hybrid approach is the safest: it relies on Dense for broad recall but uses BM25 and SRL to pull the emergency brake when the neural network is hallucinating.
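The zero-score filtering described above can be sketched as follows. `bm25_score` is a toy term-overlap count, not real BM25, but it shares the key property: an out-of-vocabulary query scores exactly 0.0, so it can be filtered honestly, whereas a nearest-neighbour vector index always returns its k closest vectors no matter how far away they are.

```python
def bm25_score(query, doc_tokens):
    # Zero when no query term appears in the document at all
    return float(sum(t in doc_tokens for t in query.lower().split()))

docs = [["bush", "nominated", "roberts"],
        ["google", "acquired", "the", "startup"]]

scores = [bm25_score("himanshu", d) for d in docs]
results = [d for d, s in zip(docs, scores) if s > 0.0]
print(results)  # [] -> the UI can show an honest empty state
```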