
NLP-progress

Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

Word Sense Disambiguation

The task of Word Sense Disambiguation (WSD) consists of associating words in context with their most suitable entry in a predefined sense inventory. The de facto sense inventory for English WSD is WordNet. For example, given the word “mouse” and the following sentence:

“A mouse consists of an object held in one’s hand, with one or more buttons.”

we would assign to “mouse” its electronic-device sense (the 4th sense in the WordNet sense inventory).

Fine-grained WSD:

The evaluation framework of Raganato et al. (2017) [1] includes two training sets, SemCor (Miller et al., 1993) and OMSTI (Taghipour and Ng, 2015), and five test sets from the Senseval/SemEval series (Edmonds and Cotton, 2001; Snyder and Palmer, 2004; Pradhan et al., 2007; Navigli et al., 2013; Moro and Navigli, 2015), all standardized to the same format and sense inventory (i.e. WordNet 3.0).

There are typically two kinds of approaches to WSD: supervised systems, which make use of sense-annotated training data, and knowledge-based systems, which exploit the properties of lexical resources.

Supervised: The most widely used training corpus is SemCor, with 226,036 sense annotations from 352 manually annotated documents. All supervised systems in the evaluation table are trained on SemCor; entries whose name includes WNGC additionally use the WordNet Gloss Corpus. Some supervised methods, particularly neural architectures, use the SemEval 2007 dataset as a development set (marked with *). The most common baseline is the Most Frequent Sense (MFS) heuristic, which selects for each target word its most frequent sense in the training data.
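The MFS heuristic is simple enough to sketch in a few lines. This is a minimal illustration, not the framework's own baseline code: it assumes the training corpus has already been reduced to (lemma, POS, sense) triples, and the function names, data format and sense labels are hypothetical.

```python
from collections import Counter, defaultdict

def train_mfs(annotations):
    """Build a table mapping (lemma, pos) to its most frequent sense.

    `annotations` is an iterable of (lemma, pos, sense) triples, e.g. the
    sense-annotated tokens of a corpus such as SemCor (format assumed here).
    """
    counts = defaultdict(Counter)
    for lemma, pos, sense in annotations:
        counts[(lemma, pos)][sense] += 1
    # Keep only the most frequent sense for each (lemma, pos) pair.
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}

def predict_mfs(mfs_table, lemma, pos, fallback=None):
    """Return the most frequent sense, or `fallback` for unseen words."""
    return mfs_table.get((lemma, pos), fallback)

# Toy training data with hypothetical WordNet-style sense labels.
train = [
    ("mouse", "n", "mouse.n.04"),
    ("mouse", "n", "mouse.n.01"),
    ("mouse", "n", "mouse.n.04"),
    ("bank", "n", "bank.n.01"),
]
table = train_mfs(train)
print(predict_mfs(table, "mouse", "n"))  # mouse.n.04
```

The `fallback` parameter is a design choice of this sketch: words never seen in training get no MFS, so a system has to back off to something else, for instance the first sense listed in the inventory.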

Knowledge-based: Knowledge-based systems usually exploit WordNet or BabelNet as a semantic network. The first sense given by the underlying sense inventory (i.e. WordNet 3.0) is included as a baseline.
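One well-known way to exploit a semantic network is Personalized PageRank (PPR), the random-walk approach behind UKB [9] [12]: probability mass is repeatedly spread along graph edges while part of it restarts at nodes related to the context, and each target word receives its highest-ranked candidate sense. The sketch below is a simplification under stated assumptions: the graph and sense labels are invented, and the walk restarts directly at the senses of the context words, whereas UKB's actual graph construction and personalization differ in detail.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power-iteration PPR on an undirected graph given as adjacency lists.

    The restart (teleport) mass is spread uniformly over the `seeds` nodes.
    """
    nodes = list(graph)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            neighbours = graph[n]
            if not neighbours:
                continue  # dangling node: its mass is dropped in this sketch
            share = damping * rank[n] / len(neighbours)
            for m in neighbours:
                new_rank[m] += share
        rank = new_rank
    return rank

def disambiguate(graph, candidate_senses, context_senses):
    """Pick the candidate sense that the context-personalized walk ranks highest."""
    ranks = personalized_pagerank(graph, set(context_senses))
    return max(candidate_senses, key=lambda s: ranks.get(s, 0.0))

# Hypothetical toy network; every undirected edge is listed in both directions.
TOY_GRAPH = {
    "mouse.n.01": ["rodent.n.01"],                   # the animal
    "mouse.n.04": ["device.n.01", "computer.n.01"],  # the input device
    "rodent.n.01": ["mouse.n.01", "cat.n.01"],
    "cat.n.01": ["rodent.n.01"],
    "device.n.01": ["mouse.n.04", "keyboard.n.01"],
    "computer.n.01": ["mouse.n.04", "keyboard.n.01"],
    "keyboard.n.01": ["device.n.01", "computer.n.01"],
}

print(disambiguate(TOY_GRAPH, ["mouse.n.01", "mouse.n.04"], ["keyboard.n.01"]))  # mouse.n.04
```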

The main evaluation measure is F1-score.
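As a rough sketch of how this F1 is computed: precision is taken over the instances a system actually answers and recall over all gold instances, and an answer counts as correct if it is among the gold senses for that instance. This follows our understanding of the framework's scorer; the function name, the dictionary inputs and the sense labels are assumptions for illustration.

```python
def wsd_scores(gold, system):
    """Return (precision, recall, f1) for a WSD system.

    gold:   dict instance_id -> set of acceptable senses.
    system: dict instance_id -> predicted sense; unanswered instances are absent.
    Instances the system answers but that are not in `gold` are ignored.
    """
    attempted = [i for i in system if i in gold]
    correct = sum(1 for i in attempted if system[i] in gold[i])
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1

# Hypothetical instance ids and senses.
gold = {
    "d0.t0": {"mouse.n.04"},
    "d0.t1": {"bank.n.01"},
    "d0.t2": {"cold.a.01"},
    "d0.t3": {"cold.n.01"},
}
system = {"d0.t0": "mouse.n.04", "d0.t1": "bank.n.02", "d0.t2": "cold.a.01"}
print(wsd_scores(gold, system))  # precision 2/3, recall 1/2, F1 4/7
```

A system that answers every instance has identical precision, recall and F1, which is why F1 only differs from accuracy for systems that abstain on some instances.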

Supervised:

Model	Senseval 2	Senseval 3	SemEval 2007	SemEval 2013	SemEval 2015	Paper / Source
MFS baseline	65.6	66.0	54.5	63.8	67.1	[1]
Bi-LSTM_att+LEX	72.0	69.4	63.7*	66.4	72.4	[2]
Bi-LSTM_att+LEX+POS	72.0	69.1	64.8*	66.9	71.5	[2]
context2vec	71.8	69.1	61.3	65.6	71.9	[3]
ELMo	71.6	69.6	62.2	66.2	71.3	[4]
GAS (Linear)	72.0	70.0	–*	66.7	71.6	[5]
GAS (Concatenation)	72.1	70.2	–*	67.0	71.8	[5]
GAS_ext (Linear)	72.4	70.1	–*	67.1	72.1	[5]
GAS_ext (Concatenation)	72.2	70.5	–*	67.2	72.6	[5]
supWSD	71.3	68.8	60.2	65.8	70.0	[6] [11]
supWSD_emb	72.7	70.6	63.1	66.8	71.8	[7] [11]
BERT (nearest neighbor)	73.8	71.6	63.3	69.2	74.4	[13] [code]
BERT (linear projection)	75.5	73.6	68.1	71.1	76.2	[13] [code]
GlossBERT	77.7	75.2	72.5	76.1	80.4	[14]
SemCor+WNGC, hypernyms	79.7	77.8	73.4	78.7	82.6	[15]
BEM	79.4	77.4	74.5	79.7	81.7	[17][code]
EWISER	78.9	78.4	71.0	78.9	79.3	[18][code]
EWISER+WNGC	80.8	79.0	75.2	80.7	81.8	[18][code]
SparseLMMS	77.9	77.8	68.8	76.1	77.5	[19][code]
SparseLMMS+WNGC	79.6	77.3	73.0	79.4	81.3	[19][code]
ARES	78.0	77.1	71.0	77.3	83.2	[20]
ESCHER	81.7	77.8	76.3	82.2	83.2	[21]
ESR	81.3	79.9	77.0	81.5	84.1	[22][code]
ESR+WNGC	82.5	80.2	78.5	82.3	85.3	[22][code]
ConSeC	82.3	79.9	77.4	83.2	85.2	[23][code]
ConSeC+WNGC	82.7	81.0	78.5	85.2	87.5	[23][code]

Knowledge-based:

Model	All	Senseval 2	Senseval 3	SemEval 2007	SemEval 2013	SemEval 2015	Paper / Source
WN 1st sense baseline	65.2	66.8	66.2	55.2	63.0	67.8	[1]
Babelfy	65.5	67.0	63.5	51.6	66.4	70.3	[8]
UKB_{ppr_w2w-nf}	57.5	64.2	54.8	40.0	64.5	64.5	[9] [12]
UKB_{ppr_w2w}	67.3	68.8	66.1	53.0	68.8	70.3	[9] [12]
WSD-TM	66.9	69.0	66.9	55.6	65.3	69.6	[10]
KEF	68.0	69.6	66.1	56.9	68.4	72.3	[16] [code]

Note: ‘All’ is the concatenation of all five test sets, as described in [10] and [12]. The scores of [6,7] and [9] are not taken from the original papers but from the results of the implementations of [11] and [12], respectively.
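Because ‘All’ pools the instances of every test set before scoring, it is not the plain average of the per-dataset columns: larger test sets carry more weight. The sketch below contrasts the two aggregations; the dataset names and counts are invented for illustration.

```python
def pooled_vs_macro(per_dataset):
    """Compare the concatenated ('All') F1 with a plain average of per-dataset F1.

    per_dataset: dict name -> (correct, attempted, total) counts.
    Returns (pooled_f1, macro_f1).
    """
    def f1(correct, attempted, total):
        p = correct / attempted if attempted else 0.0
        r = correct / total if total else 0.0
        return 2 * p * r / (p + r) if correct else 0.0

    # Pooling: sum the raw counts first, then score once.
    correct = sum(v[0] for v in per_dataset.values())
    attempted = sum(v[1] for v in per_dataset.values())
    total = sum(v[2] for v in per_dataset.values())
    pooled = f1(correct, attempted, total)
    # Macro-average: score each dataset, then average the scores.
    macro = sum(f1(*v) for v in per_dataset.values()) / len(per_dataset)
    return pooled, macro

# Hypothetical counts: a large test set at 70% and a small one at 90%.
print(pooled_vs_macro({"big": (700, 1000, 1000), "small": (90, 100, 100)}))
# pooled is about 0.718, macro is 0.8
```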

[1] Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison

[2] Neural Sequence Learning Models for Word Sense Disambiguation

[3] context2vec: Learning generic context embedding with bidirectional LSTM

[4] Deep contextualized word representations

[5] Incorporating Glosses into Neural Word Sense Disambiguation

[6] It makes sense: A wide-coverage word sense disambiguation system for free text

[7] Embeddings for Word Sense Disambiguation: An Evaluation Study

[8] Entity Linking meets Word Sense Disambiguation: A Unified Approach

[9] Random walks for knowledge-based word sense disambiguation

[10] Knowledge-based Word Sense Disambiguation using Topic Models

[11] SupWSD: A Flexible Toolkit for Supervised Word Sense Disambiguation

[12] The risk of sub-optimal use of Open Source NLP Software: UKB is inadvertently state-of-the-art in knowledge-based WSD

[13] Improved Word Sense Disambiguation Using Pre-Trained Contextualized Word Representations

[14] GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge

[15] Sense Vocabulary Compression through the Semantic Knowledge of WordNet for Neural Word Sense Disambiguation

[16] Word Sense Disambiguation: A Comprehensive Knowledge Exploitation Framework

[17] Moving Down the Long Tail of Word Sense Disambiguation with Gloss Informed Bi-encoders

[18] Breaking Through the 80% Glass Ceiling: Raising the State of the Art in Word Sense Disambiguation by Incorporating Knowledge Graph Information

[19] Sparsity Makes Sense: Word Sense Disambiguation Using Sparse Contextualized Word Representations

[20] With More Contexts Comes Better Performance: Contextualized Sense Embeddings for All-Round Word Sense Disambiguation

[21] ESC: Redesigning WSD with Extractive Sense Comprehension

[22] Improved Word Sense Disambiguation with Enhanced Sense Representations

[23] ConSeC: Word Sense Disambiguation as Continuous Sense Comprehension

WSD Lexical Sample task:

The task above is called all-words WSD because systems attempt to disambiguate all of the words in a document. In the related lexical sample task, a fixed set of target words is selected, and the system only has to disambiguate the occurrences of these words in a test set. Iacobacci et al. (2016) report the state-of-the-art results up to 2016 [1]. The main benchmarks are Senseval 2, Senseval 3 and SemEval 2007, and the evaluation metrics are the same as for the all-words task.

Lexical Sample results:

Model	Senseval 2	Senseval 3	SemEval 2007	Paper / Source
IMSE + heuristics	71.4	76.2	–	[Preprint] [2]
IMS + Word2vec	69.9	75.2	89.4	[1]
AutoExtend	66.5	73.6	–	[3] [4]
Taghipour and Ng	66.2	73.4	–	[4]
IMS	65.3	72.9	87.9	[6]

Word Sense Induction

Word sense induction (WSI) is widely known as the “unsupervised version” of WSD. The problem can be stated as follows: given a target word (e.g., “cold”) and a collection of sentences that use it (e.g., “I caught a cold”, “The weather is cold”), cluster the sentences according to the different senses of the word. The sense of each cluster does not need to be identified, but all sentences inside a cluster should use the target word in the same sense.
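To make the problem concrete, here is a deliberately naive WSI sketch: each sentence is represented by its set of context words, sentences whose word sets overlap enough (Jaccard similarity above a threshold) are linked, and each connected component becomes one induced sense. This is only a toy illustration under assumed settings (stopword list, threshold, example sentences are all hypothetical); it is not the method of any system in the tables below, which use much richer representations, such as the language-model substitutes of substitution-based approaches like BERT+DP.

```python
import re

# Hypothetical, tiny stopword list for the toy example.
STOPWORDS = {"a", "an", "the", "i", "she", "is", "and", "with"}

def context_words(sentence, target):
    """Lowercased content words of `sentence`, excluding the target word."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return {t for t in tokens if t != target and t not in STOPWORDS}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def induce_senses(sentences, target, threshold=0.2):
    """Cluster sentences into induced senses of `target` (connected components)."""
    contexts = [context_words(s, target) for s in sentences]
    parent = list(range(len(sentences)))  # union-find forest over sentence indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair of sentences whose contexts overlap enough.
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if jaccard(contexts[i], contexts[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i, s in enumerate(sentences):
        clusters.setdefault(find(i), []).append(s)
    return list(clusters.values())

sents = [
    "I caught a cold and a fever",
    "She caught a cold with a cough",
    "The weather is cold and windy",
    "Cold weather is coming with snow",
]
print(induce_senses(sents, "cold"))
# two clusters: the two illness sentences and the two weather sentences
```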

There are two widely used datasets, SemEval 2010 and SemEval 2013, and they use different metrics: V-Measure (V-M) and paired F-Score (F-S) for SemEval 2010, and fuzzy B-Cubed (F-BC) and fuzzy normalized mutual information (F-NMI) for SemEval 2013. For ease of system comparison, the two metrics of each dataset are usually aggregated using their geometric mean (AVG).
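The SemEval 2010 metrics and the AVG aggregation can be sketched as follows for a single target word, using the standard definitions: V-Measure is the harmonic mean of homogeneity and completeness, and paired F-Score is the F1 over the sets of instance pairs that share a cluster. This is an illustrative sketch with hypothetical function names, not the official task scorers (which, as we understand it, score each target word and then average over words); the fuzzy SemEval 2013 variants are not covered.

```python
import math
from collections import Counter
from itertools import combinations

def entropy(counts, n):
    return -sum((c / n) * math.log(c / n) for c in counts if c)

def v_measure(gold, pred):
    """V-Measure for one target word; gold/pred hold one label per instance."""
    n = len(gold)
    joint = Counter(zip(gold, pred))
    gold_sizes, pred_sizes = Counter(gold), Counter(pred)
    h_c = entropy(gold_sizes.values(), n)
    h_k = entropy(pred_sizes.values(), n)
    # Conditional entropies H(C|K) and H(K|C) from the contingency table.
    h_c_given_k = -sum((m / n) * math.log(m / pred_sizes[k]) for (c, k), m in joint.items())
    h_k_given_c = -sum((m / n) * math.log(m / gold_sizes[c]) for (c, k), m in joint.items())
    homogeneity = 1.0 if h_c == 0 else 1.0 - h_c_given_k / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - h_k_given_c / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)

def paired_f_score(gold, pred):
    """F1 over the sets of instance pairs that share a cluster."""
    def pairs(labels):
        return {(i, j) for i, j in combinations(range(len(labels)), 2)
                if labels[i] == labels[j]}
    gold_pairs, pred_pairs = pairs(gold), pairs(pred)
    common = len(gold_pairs & pred_pairs)
    if common == 0:
        return 0.0
    precision = common / len(pred_pairs)
    recall = common / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)

def geometric_mean(a, b):
    """AVG column: geometric mean of the two metrics."""
    return math.sqrt(a * b)

print(round(geometric_mean(61.7, 9.8), 2))  # 24.59, AutoSense on SemEval 2010
```

The two SemEval 2010 metrics pull in opposite directions: V-Measure tends to reward systems that produce many small clusters, while paired F-Score tends to reward a few large ones (compare the F-S and V-M columns of LDA below), which is why the geometric mean is the usual single number.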

SemEval 2010

Model	F-S	V-M	AVG	Paper/source	Code
BERT+DP (Amrami and Goldberg, 2019)	71.3	40.4	53.6	Towards better substitution-based word sense induction	Code
AutoSense (Amplayo et al., 2019)	61.7	9.8	24.59	AutoSense Model for Word Sense Induction	Code
SE-WSI-fix (Song et al., 2016)	55.1	9.8	23.24	Sense Embedding Learning for Word Sense Induction
BNP-HC (Chang et al., 2014)	23.1	21.4	22.23	Inducing Word Sense with Automatically Learned Hidden Concepts
LDA (Goyal and Hovy, 2014)	60.7	4.4	16.34	Unsupervised Word Sense Induction using Distributional Statistics

SemEval 2013

Model	F-BC	F-NMI	AVG	Paper/source	Code
BERT+DP (Amrami and Goldberg, 2019)	64.0	21.4	37.0	Towards better substitution-based word sense induction	Code
LSDP (Amrami and Goldberg, 2018)	57.5	11.3	25.4	Word Sense Induction with Neural biLM and Symmetric Patterns	Code
AutoSense (Amplayo et al., 2019)	61.7	7.96	22.16	AutoSense Model for Word Sense Induction	Code
MCC-S (Komninos and Manandhar, 2016)	55.6	7.62	20.58	Structured Generative Models of Continuous Features for Word Sense Induction
STM+w2v (Wang et al., 2016)	55.4	7.14	19.89	A Sense-Topic Model for Word Sense Induction with Unsupervised Data Enrichment
AI-KU (Baskaya et al., 2013)	39.0	6.5	15.92	AI-KU: Using Substitute Vectors and Co-Occurrence Modeling For Word Sense Induction and Disambiguation