Understanding the Enhanced Lesk Algorithm

What is Word Sense Disambiguation?

Word Sense Disambiguation (WSD) is the task of identifying which meaning of a word is intended in a sentence when the word has several possible meanings. For example, a WSD system must determine whether "bank" refers to a financial institution or the side of a river.

The Original Lesk Algorithm

The Lesk algorithm, introduced by Michael Lesk in 1986, is a classical approach to WSD that works by comparing the dictionary definition of each possible sense with the words in the context.

Basic Idea: Choose the sense whose dictionary definition shares the most words with the context in which the target word appears.
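The basic idea can be sketched in a few lines of Python. This is a minimal illustration, not the tool's implementation: the stopword list, the two glosses for "bank", and the function names `tokenize` and `simple_lesk` are all assumptions made for the example.

```python
import re

# Small illustrative stopword list (an assumption, not taken from the tool).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "at", "for", "with", "to", "and", "is"}

def tokenize(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def simple_lesk(context, glosses):
    """Return the sense label whose gloss shares the most words with the context."""
    context_words = tokenize(context)
    overlaps = {
        sense: len(context_words & tokenize(gloss))
        for sense, gloss in glosses.items()
    }
    return max(overlaps, key=overlaps.get)

# Hand-written glosses for "bank" (hypothetical, for illustration only).
glosses = {
    "financial": "an institution that accepts deposits of money and makes loans",
    "river": "sloping land beside a river or other body of water",
}

print(simple_lesk("He sat on the bank of the river and watched the water", glosses))
```

Here the river sense wins because its gloss shares "river" and "water" with the context, while the financial gloss shares nothing.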

Our Enhanced Lesk Algorithm

We've significantly improved the original Lesk algorithm with several enhancements:

1. BERT Semantic Similarity

Instead of just counting overlapping words, we use BERT embeddings to calculate semantic similarity between the context and each sense definition, capturing deeper meaning relationships.
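As a rough sketch of the comparison step: once the context and each sense definition have been turned into vectors, senses can be ranked by cosine similarity. The three-dimensional vectors below are toy placeholders standing in for real BERT embeddings (which have hundreds of dimensions), and `rank_senses` is a hypothetical helper name. Cosine similarity itself is an assumption, since the similarity measure is not specified above.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_senses(context_vec, sense_vecs):
    """Return (sense, similarity) pairs sorted from most to least similar."""
    scores = {s: cosine_similarity(context_vec, v) for s, v in sense_vecs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy vectors standing in for embeddings of the context and of each definition.
context_vec = [0.9, 0.1, 0.2]
sense_vecs = {
    "animal": [0.8, 0.2, 0.1],
    "sports": [0.1, 0.9, 0.3],
}

ranking = rank_senses(context_vec, sense_vecs)
print(ranking[0][0])  # the best-matching sense label
```

In a real pipeline the vectors would come from a BERT encoder applied to the sentence and to each definition. Only the ranking logic is shown here.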

2. Context Weighting

Words closer to the target word are given higher weight, as they're more likely to be relevant to its meaning. This proximity-based weighting improves accuracy.
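One simple way to express proximity weighting is to give each context word a weight that shrinks with its distance from the target. The 1/distance decay and the function names below are assumptions chosen for illustration. The actual weighting scheme is not specified here.

```python
def proximity_weights(tokens, target_index):
    """Map each context token to a weight of 1/distance from the target word."""
    weights = {}
    for i, tok in enumerate(tokens):
        if i == target_index:
            continue  # the target word itself is not part of its own context
        w = 1.0 / abs(i - target_index)
        # If a word occurs more than once, keep its largest (closest) weight.
        weights[tok] = max(weights.get(tok, 0.0), w)
    return weights

def weighted_overlap(weights, signature):
    """Sum the weights of context words that also appear in a sense signature."""
    return sum(w for tok, w in weights.items() if tok in signature)

tokens = ["she", "saw", "a", "bat", "flying", "in", "the", "dark"]
weights = proximity_weights(tokens, tokens.index("bat"))

# "flying" is adjacent to "bat" (weight 1.0); "dark" is four words away (weight 0.25).
print(weighted_overlap(weights, {"flying", "dark", "night"}))
```

With this scheme a matching word right next to the target counts four times as much as one four positions away.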

3. Rich Sense Signatures

We expand sense definitions with examples, hypernyms, hyponyms, and other related terms from WordNet to create richer signatures for comparison.
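The sketch below shows how such a signature might be assembled. To stay self-contained it uses a small hand-written dictionary (`TOY_LEXICON`, a hypothetical stand-in) rather than WordNet itself. With NLTK's WordNet interface, the same fields would come from a synset's `definition()`, `examples()`, `hypernyms()`, and `hyponyms()` methods.

```python
import re

# Small illustrative stopword list (an assumption).
STOPWORDS = {"a", "an", "the", "of", "in", "at", "for", "with", "to", "and", "is"}

# Hypothetical, hand-written entries standing in for WordNet data.
TOY_LEXICON = {
    "bat.animal": {
        "definition": "a nocturnal mammal with wings",
        "examples": ["bats fly at night"],
        "hypernyms": ["mammal", "animal"],
        "hyponyms": ["fruit bat", "vampire bat"],
    },
    "bat.sports": {
        "definition": "an implement used for hitting a ball in sports",
        "examples": ["he swung the bat at the ball"],
        "hypernyms": ["implement", "sports equipment"],
        "hyponyms": ["baseball bat", "cricket bat"],
    },
}

def build_signature(entry):
    """Collect content words from the definition, examples, hypernyms, and hyponyms."""
    texts = [entry["definition"], *entry["examples"],
             *entry["hypernyms"], *entry["hyponyms"]]
    signature = set()
    for text in texts:
        signature.update(
            w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS
        )
    return signature

signatures = {sense: build_signature(entry) for sense, entry in TOY_LEXICON.items()}
print(sorted(signatures["bat.animal"]))
```

The enriched animal signature now contains words such as "night" and "fly" that never appear in the bare definition, which gives the overlap step more to match against.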

4. Collocation Detection

We identify common word combinations (like "river bank" or "baseball bat") that strongly indicate specific senses.
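A minimal sketch of collocation detection, assuming collocations are stored as adjacent word pairs mapped to a sense label. The table contents and the name `detect_collocations` are hypothetical. A real system would likely use a much larger list or learn the pairs from a corpus.

```python
# Hypothetical collocation table: (left word, right word) -> indicated sense.
COLLOCATIONS = {
    ("river", "bank"): "river",
    ("bank", "account"): "financial",
    ("baseball", "bat"): "sports",
    ("bat", "flying"): "animal",
}

def detect_collocations(tokens, target_index):
    """Return the senses indicated by bigrams the target forms with its neighbors."""
    hits = []
    target = tokens[target_index]
    # Bigram with the word immediately to the left of the target.
    if target_index > 0:
        sense = COLLOCATIONS.get((tokens[target_index - 1], target))
        if sense is not None:
            hits.append(sense)
    # Bigram with the word immediately to the right of the target.
    if target_index + 1 < len(tokens):
        sense = COLLOCATIONS.get((target, tokens[target_index + 1]))
        if sense is not None:
            hits.append(sense)
    return hits

tokens = ["she", "saw", "a", "bat", "flying", "in", "the", "dark"]
print(detect_collocations(tokens, tokens.index("bat")))
```

A detected collocation could then add a fixed bonus to the matching sense's score before the final comparison.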

5. User Feedback Learning

The system learns from user corrections, improving its accuracy over time by adjusting sense scores based on feedback.
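One possible shape for this feedback loop is a per-word, per-sense bias that is nudged each time a user corrects a prediction. The `FeedbackStore` class, its method names, and the fixed step size are assumptions for illustration. How the tool actually stores or applies feedback is not described here.

```python
class FeedbackStore:
    """Keeps an additive score adjustment for each (word, sense) pair."""

    def __init__(self, step=0.5):
        self.step = step  # how much one correction shifts a score (assumed value)
        self.bias = {}    # (word, sense) -> accumulated adjustment

    def record_correction(self, word, predicted, correct):
        """Reward the sense the user chose and penalize the one that was predicted."""
        if predicted == correct:
            return
        self.bias[(word, correct)] = self.bias.get((word, correct), 0.0) + self.step
        self.bias[(word, predicted)] = self.bias.get((word, predicted), 0.0) - self.step

    def adjust(self, word, scores):
        """Return a copy of the scores with the learned adjustments applied."""
        return {s: v + self.bias.get((word, s), 0.0) for s, v in scores.items()}

store = FeedbackStore(step=0.5)
raw_scores = {"animal": 2.0, "sports": 2.5}

# Two users report that "sports" was wrong and "animal" was right.
store.record_correction("bat", predicted="sports", correct="animal")
store.record_correction("bat", predicted="sports", correct="animal")

adjusted = store.adjust("bat", raw_scores)
print(max(adjusted, key=adjusted.get))
```

After the two corrections the animal sense overtakes the sports sense even though the raw scores favored sports.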

Example

For the sentence "She saw a bat flying in the dark":

Target word: "bat"

Possible senses:
1. "a nocturnal mammal with wings"
2. "an implement used for hitting a ball in sports"

Context words: [she, saw, flying, dark]

Collocation check: "bat flying" → strong indicator of the animal sense

Rule application: "flying" → animal sense rule triggered

Sense 1 signature: [nocturnal, mammal, wing, fly, night, animal, cave, ...]
Sense 2 signature: [implement, hit, ball, sport, game, baseball, cricket, ...]

Overlap scores:
- Sense 1: High overlap with "flying" and "dark" (related to nocturnal, night)
- Sense 2: Low overlap with context words

BERT similarity:
- Sense 1: High similarity between "bat flying in the dark" and "nocturnal mammal with wings"
- Sense 2: Lower similarity with the sports equipment definition

Final scores:
- Sense 1 (animal): 8.7
- Sense 2 (sports): 2.3

Result: Sense 1 is selected as the correct meaning.

Advantages Over Basic Lesk

Higher accuracy for common ambiguous words
Better handling of contextual nuances
Integration of modern NLP techniques
Adaptive learning from user feedback
Combination of statistical and rule-based approaches