

Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

Coreference resolution

Coreference resolution is the task of clustering mentions in text that refer to the same underlying real-world entities.


               +-----------+
               |           |
"I voted for Obama because he was most aligned with my values", she said.
 |                                                 |            |
 +-------------------------------------------------+------------+

“I”, “my”, and “she” belong to one cluster, and “Obama” and “he” belong to another.
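
As a rough illustration of the clustering in the example sentence, the sketch below stores each cluster as a list of token indices. The tokenization, the `clusters` layout, and the helper names `mention_to_cluster` and `coreferent` are assumptions made for this example, not the format of any particular dataset or library (real corpora typically store mentions as start/end spans).

```python
# Tokens of the example sentence (tokenization simplified for illustration).
tokens = ["I", "voted", "for", "Obama", "because", "he", "was", "most",
          "aligned", "with", "my", "values", ",", "she", "said", "."]

# Each cluster is a list of token indices (single-token mentions only here).
clusters = [
    [0, 10, 13],  # "I", "my", "she"
    [3, 5],       # "Obama", "he"
]


def mention_to_cluster(clusters):
    """Map each mention index to the id of the cluster that contains it."""
    return {m: cid for cid, cluster in enumerate(clusters) for m in cluster}


def coreferent(clusters, a, b):
    """Return True if mentions a and b belong to the same cluster."""
    lookup = mention_to_cluster(clusters)
    return a in lookup and b in lookup and lookup[a] == lookup[b]


for cluster in clusters:
    print([tokens[i] for i in cluster])

print(coreferent(clusters, 0, 13))  # "I" and "she"
print(coreferent(clusters, 0, 3))   # "I" and "Obama"
```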

CoNLL 2012

Experiments are conducted on the data of the CoNLL-2012 shared task, which uses OntoNotes coreference annotations. Papers report the precision, recall, and F1 of the MUC, B³, and CEAFφ4 metrics using the official CoNLL-2012 evaluation scripts. The main evaluation metric is the average F1 of the three metrics.

| Model | Avg F1 | Paper / Source | Code |
| --- | :---: | --- | --- |
| wl-coref + RoBERTa | 81.0 | Word-Level Coreference Resolution | Official |
| s2e+Longformer-Large | 80.3 | Coreference Resolution without Span Representations | Official |
| Xu et al. (2020) | 80.2 | Revealing the Myth of Higher-Order Inference in Coreference Resolution | Official |
| Joshi et al. (2019) [1] | 79.6 | SpanBERT: Improving Pre-training by Representing and Predicting Spans | Official |
| Joshi et al. (2019) [2] | 76.9 | BERT for Coreference Resolution: Baselines and Analysis | Official |
| Kantor and Globerson (2019) | 76.6 | Coreference Resolution with Entity Equalization | Official |
| Fei et al. (2019) | 73.8 | End-to-end Deep Reinforcement Learning Based Coreference Resolution | |
| (Lee et al., 2017)+ELMo (Peters et al., 2018)+coarse-to-fine & second-order inference (Lee et al., 2018) | 73.0 | Higher-order Coreference Resolution with Coarse-to-fine Inference | Official |
| (Lee et al., 2017)+ELMo (Peters et al., 2018) | 70.4 | Deep contextualized word representations | |
| Lee et al. (2017) | 67.2 | End-to-end Neural Coreference Resolution | |

[1] Joshi et al. (2019): (Lee et al., 2017)+coarse-to-fine & second-order inference (Lee et al., 2018)+SpanBERT (Joshi et al., 2019)

[2] Joshi et al. (2019): (Lee et al., 2017)+coarse-to-fine & second-order inference (Lee et al., 2018)+BERT (Devlin et al., 2019)
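
The headline number for this benchmark is the unweighted mean of the MUC, B³, and CEAFφ4 F1 scores. The sketch below shows only that final arithmetic; it does not reimplement the three metrics or the official scorer. The function names and the precision/recall values are made up for illustration.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def conll_average(muc_f1, b3_f1, ceaf_f1):
    """Unweighted mean of the three coreference F1 scores."""
    return (muc_f1 + b3_f1 + ceaf_f1) / 3


# Hypothetical per-metric (precision, recall) pairs, in percent.
# These are NOT results from any paper; they only exercise the functions.
scores = {
    "MUC": (85.0, 83.0),
    "B3": (78.0, 76.0),
    "CEAF_phi4": (75.0, 73.0),
}

per_metric_f1 = {name: f1(p, r) for name, (p, r) in scores.items()}
avg = conll_average(per_metric_f1["MUC"],
                    per_metric_f1["B3"],
                    per_metric_f1["CEAF_phi4"])
print(per_metric_f1)
print(f"CoNLL average F1: {avg:.1f}")
```

In practice the per-metric precision and recall come from the official evaluation scripts, and only the averaging step is as simple as shown here.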

Gendered Ambiguous Pronoun Resolution

Experiments are conducted on the GAP dataset. The metrics are F1 on the Masculine (M) and Feminine (F) examples, Overall F1, and a Bias factor calculated as F / M.

| Model | Overall F1 | Masculine F1 (M) | Feminine F1 (F) | Bias (F/M) | Paper / Source | Code |
| --- | :---: | :---: | :---: | :---: | --- | --- |
| Attree et al. (2019) | 92.5 | 94.0 | 91.1 | 0.97 | Gendered Ambiguous Pronouns Shared Task: Boosting Model Confidence by Evidence Pooling | GREP |
| Chada et al. (2019) | 90.2 | 90.9 | 89.5 | 0.98 | Gendered Pronoun Resolution using BERT and an extractive question answering formulation | CorefQA |
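
The Bias column can be recomputed from the Masculine and Feminine F1 scores. A minimal sketch, using the two rows reported above (the function name `gap_bias` is an assumption for this example):

```python
def gap_bias(feminine_f1, masculine_f1):
    """Bias factor F / M; a value of 1.0 means equal F1 on both subsets."""
    if masculine_f1 == 0:
        raise ValueError("masculine F1 must be non-zero")
    return feminine_f1 / masculine_f1


# (Masculine F1, Feminine F1) as listed in the table above.
reported = {
    "Attree et al. (2019)": (94.0, 91.1),
    "Chada et al. (2019)": (90.9, 89.5),
}

for model, (m_f1, f_f1) in reported.items():
    print(f"{model}: bias = {gap_bias(f_f1, m_f1):.2f}")
```

A value below 1.0 indicates that the model scores lower on the feminine examples than on the masculine ones.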
