Word Sense Disambiguation (WSD) is the task of identifying which meaning of a word is used in a sentence when the word has multiple meanings. For example, a WSD system must determine whether "bank" refers to a financial institution or the side of a river.
The Lesk algorithm, introduced by Michael Lesk in 1986, is a classical approach to WSD that works by comparing the dictionary definition of each possible sense with the words in the context.
Basic Idea: Choose the sense whose dictionary definition shares the most words with the context in which the target word appears.
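As a rough illustration of the basic idea, here is a minimal sketch of the simplified Lesk overlap count. The `SENSES` inventory, the stopword list, and the function names are hypothetical stand-ins for a real dictionary such as WordNet; they are not taken from this project.

```python
# Toy sense inventory (assumption): hand-written glosses, not real dictionary entries.
SENSES = {
    "bank": {
        "financial_institution": "an institution that accepts deposits and lends money",
        "river_side": "sloping land beside a body of water such as a river",
    }
}

# Small illustrative stopword list (assumption).
STOPWORDS = {"a", "an", "the", "of", "and", "that", "as", "such", "to", "in", "on", "at", "by"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, and drop stopwords."""
    words = (w.strip(".,;:!?\"'").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def simplified_lesk(target, sentence, senses=SENSES):
    """Return the sense of `target` whose gloss shares the most words with the sentence."""
    context = tokenize(sentence) - {target}
    overlaps = {
        name: len(context & tokenize(gloss))
        for name, gloss in senses[target].items()
    }
    return max(overlaps, key=overlaps.get)


print(simplified_lesk("bank", "He sat on the bank of the river and watched the water"))
# river_side
```

Exact word matching is brittle: "deposit" in a sentence does not match "deposits" in a gloss, which is one reason the plain overlap count often needs help.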
We extend the original Lesk algorithm with the following enhancements:
Instead of just counting overlapping words, we use BERT embeddings to calculate semantic similarity between the context and each sense definition, capturing deeper meaning relationships.
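The scoring step can be sketched as cosine similarity between a context vector and each gloss vector. To keep the example runnable without downloading a model, the `embed` argument is pluggable and `toy_embed` below is a hand-made two-dimensional stand-in (an assumption for illustration only); in a real system that function would return BERT sentence embeddings instead.

```python
import math

# Hand-made 2-d "topic" vectors (assumption): axis 0 = animal, axis 1 = sport.
TOY_VECTORS = {
    "flying": (1.0, 0.0), "dark": (1.0, 0.0), "nocturnal": (1.0, 0.0),
    "mammal": (1.0, 0.0), "wings": (1.0, 0.0),
    "club": (0.0, 1.0), "hitting": (0.0, 1.0), "ball": (0.0, 1.0), "baseball": (0.0, 1.0),
}


def toy_embed(text):
    """Sum the toy vectors of known words; unknown words contribute nothing."""
    x = y = 0.0
    for word in text.lower().split():
        vx, vy = TOY_VECTORS.get(word, (0.0, 0.0))
        x, y = x + vx, y + vy
    return (x, y)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_sense_by_embedding(context, glosses, embed):
    """Score every gloss against the context and return (best sense, all scores)."""
    ctx_vec = embed(context)
    scores = {sense: cosine(ctx_vec, embed(gloss)) for sense, gloss in glosses.items()}
    return max(scores, key=scores.get), scores


# Toy glosses (assumption), chosen so that neither shares a word with the context.
glosses = {
    "animal": "nocturnal mammal with wings",
    "sports_equipment": "club used for hitting a ball in baseball",
}
best, scores = best_sense_by_embedding("She saw a bat flying in the dark", glosses, toy_embed)
print(best)
```

Neither gloss shares a content word with the sentence, so a pure overlap count would tie at zero; the vector comparison still separates the two senses.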
Context words closer to the target word are given higher weight, as they're more likely to be relevant to its meaning. This proximity-based weighting is intended to improve accuracy.
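One possible weighting scheme is sketched below. The inverse-distance formula `1 / (1 + decay * distance)` and the `decay` parameter are assumptions chosen for illustration; the text above does not specify the actual weighting function.

```python
def proximity_weights(tokens, target_index, decay=1.0):
    """Map each context position to 1 / (1 + decay * distance from the target)."""
    return {
        i: 1.0 / (1.0 + decay * abs(i - target_index))
        for i in range(len(tokens))
        if i != target_index
    }


def weighted_overlap(tokens, target_index, gloss_words, decay=1.0):
    """Sum the proximity weights of context tokens that also occur in the gloss."""
    weights = proximity_weights(tokens, target_index, decay)
    return sum(w for i, w in weights.items() if tokens[i] in gloss_words)


tokens = "the river bank was near the money exchange".split()
target_index = tokens.index("bank")

# Toy gloss word sets (assumption).
river_score = weighted_overlap(tokens, target_index, {"river", "water", "land"})
finance_score = weighted_overlap(tokens, target_index, {"money", "deposits"})
print(river_score, finance_score)
```

Both senses match exactly one context word here, so an unweighted count would tie; "river" sits next to the target while "money" is four tokens away, and the weighting breaks the tie in favor of the river sense.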
We expand sense definitions with examples, hypernyms, hyponyms, and other related terms from WordNet to create richer signatures for comparison.
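A sketch of how such an expanded signature might be assembled follows. The `Sense` record and its sample data are hypothetical and hand-written so the example runs without any downloads; with NLTK's WordNet interface the same fields would come from `Synset.definition()`, `Synset.examples()`, `Synset.hypernyms()`, and `Synset.hyponyms()`.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    """Hypothetical in-memory record mirroring the WordNet fields used for expansion."""
    name: str
    definition: str
    examples: list = field(default_factory=list)
    hypernyms: list = field(default_factory=list)  # more general terms
    hyponyms: list = field(default_factory=list)   # more specific terms


def build_signature(sense):
    """Collect words from the definition, examples, and related terms into one set."""
    texts = [sense.definition, *sense.examples, *sense.hypernyms, *sense.hyponyms]
    return {w.strip(".,;").lower() for text in texts for w in text.split()}


# Toy data (assumption), loosely modeled on a dictionary entry for the animal sense.
bat_animal = Sense(
    name="bat (animal)",
    definition="nocturnal mouselike mammal with forelimbs modified to form wings",
    examples=["bats were flying out of the cave at dusk"],
    hypernyms=["placental mammal"],
    hyponyms=["fruit bat", "vampire bat"],
)

signature = build_signature(bat_animal)
print("flying" in signature)  # True: contributed by the example sentence
```

The definition alone never mentions "flying"; only the expanded signature lets that context word count toward the animal sense.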
We identify common word combinations (like "river bank" or "baseball bat") that strongly indicate specific senses.
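Collocation detection can be sketched as a lookup on the word pairs formed by the target and its immediate neighbors. The `COLLOCATIONS` table and the sense labels are hypothetical; a real system would presumably use a larger, curated or corpus-derived list.

```python
# Hypothetical collocation table: (left word, right word) -> sense label.
COLLOCATIONS = {
    ("river", "bank"): "river_side",
    ("bank", "account"): "financial_institution",
    ("baseball", "bat"): "sports_equipment",
}


def collocation_sense(tokens, target_index):
    """Return the sense signalled by a bigram containing the target, or None."""
    target = tokens[target_index]
    # Check the pair formed with the preceding word first.
    if target_index > 0:
        sense = COLLOCATIONS.get((tokens[target_index - 1], target))
        if sense is not None:
            return sense
    # Then check the pair formed with the following word.
    if target_index + 1 < len(tokens):
        return COLLOCATIONS.get((target, tokens[target_index + 1]))
    return None


print(collocation_sense("he swung the baseball bat".split(), 4))
# sports_equipment
```

A `None` result means no known collocation was found, in which case the other scoring signals decide.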
The system learns from user corrections, improving its accuracy over time by adjusting sense scores based on feedback.
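One simple way to realize this kind of feedback loop is a per-sense bias that is nudged on every correction. The `FeedbackStore` class, its method names, and the fixed `step` size are assumptions for illustration; the text above does not describe the actual update rule.

```python
class FeedbackStore:
    """Hypothetical store of per-(word, sense) score adjustments learned from corrections."""

    def __init__(self, step=0.1):
        self.step = step
        self.bias = {}  # (word, sense) -> cumulative adjustment

    def record_correction(self, word, predicted, correct):
        """Penalize the wrongly predicted sense and reward the sense the user chose."""
        if predicted == correct:
            return
        self.bias[(word, predicted)] = self.bias.get((word, predicted), 0.0) - self.step
        self.bias[(word, correct)] = self.bias.get((word, correct), 0.0) + self.step

    def adjust(self, word, scores):
        """Return a copy of `scores` with the learned adjustments added."""
        return {s: v + self.bias.get((word, s), 0.0) for s, v in scores.items()}


store = FeedbackStore(step=0.1)
raw_scores = {"animal": 0.40, "sports_equipment": 0.45}

# The user reports that the predicted sense was wrong for this word.
store.record_correction("bat", predicted="sports_equipment", correct="animal")

adjusted = store.adjust("bat", raw_scores)
print(max(adjusted, key=adjusted.get))
# animal
```

This sketch keeps one bias per word and sense regardless of context, so a correction affects every later occurrence of the word; a context-sensitive scheme would need a richer key.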
For the sentence "She saw a bat flying in the dark":