Coreference resolution is the task of clustering the mentions in a text that refer to the same underlying real-world entity.
“I voted for Obama because he was most aligned with my values”, she said.
In this example, “I”, “my”, and “she” belong to one cluster, and “Obama” and “he” belong to another.
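The clustering for the example sentence can be sketched as a small data structure. This is only an illustration: the tokenization, the inclusive `(start, end)` span convention, and the names `TOKENS`, `CLUSTERS`, and `cluster_texts` are assumptions made here, not a format prescribed by any particular dataset or toolkit.

```python
# Hypothetical tokenization of the example sentence (an assumption for illustration).
TOKENS = ["I", "voted", "for", "Obama", "because", "he", "was", "most",
          "aligned", "with", "my", "values", ",", "she", "said", "."]

# Each cluster is a list of mentions; each mention is an inclusive
# (start, end) token span. Both mentions here are single tokens.
CLUSTERS = [
    [(0, 0), (10, 10), (13, 13)],  # "I", "my", "she"
    [(3, 3), (5, 5)],              # "Obama", "he"
]


def cluster_texts(tokens, clusters):
    """Return the surface text of every mention, grouped by cluster."""
    return [
        [" ".join(tokens[start:end + 1]) for start, end in cluster]
        for cluster in clusters
    ]


if __name__ == "__main__":
    for i, mentions in enumerate(cluster_texts(TOKENS, CLUSTERS)):
        print(f"cluster {i}: {mentions}")
```

Representing mentions as spans rather than strings keeps repeated words apart: a second occurrence of “he” elsewhere in the text would be a different span and could belong to a different cluster.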
Experiments are conducted on the data of the CoNLL-2012 shared task, which uses OntoNotes coreference annotations. Papers report the precision, recall, and F1 of the MUC, B³, and CEAFφ4 metrics using the official CoNLL-2012 evaluation scripts. The main evaluation metric (Avg F1 in the table below) is the average of the three F1 scores.
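As a minimal sketch of how the main metric is assembled, the snippet below computes F1 for each metric from its precision and recall, then averages the three F1 values. It does not implement MUC, B³, or CEAFφ4 themselves (the official scripts do that); the function names and the input numbers are made up purely for illustration.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def average_f1(scores):
    """Average the per-metric F1 values.

    `scores` maps a metric name to a (precision, recall) pair.
    """
    return sum(f1(p, r) for p, r in scores.values()) / len(scores)


if __name__ == "__main__":
    # Illustrative values only, not results from any paper.
    example_scores = {
        "MUC": (0.8, 0.6),
        "B3": (0.5, 0.5),
        "CEAF_phi4": (0.6, 0.4),
    }
    print(f"Avg F1: {average_f1(example_scores):.4f}")
```

Note that the average is taken over the three F1 scores, not over the precisions and recalls.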
|Model||Avg F1||Paper / Source||Code|
|Joshi et al. (2019) [1]||79.6||SpanBERT: Improving Pre-training by Representing and Predicting Spans||Official|
|Joshi et al. (2019) [2]||76.9||BERT for Coreference Resolution: Baselines and Analysis||Official|
|Kantor and Globerson (2019)||76.6||Coreference Resolution with Entity Equalization||Official|
|Fei et al. (2019)||73.8||End-to-end Deep Reinforcement Learning Based Coreference Resolution|
|(Lee et al., 2017)+ELMo (Peters et al., 2018)+coarse-to-fine & second-order inference (Lee et al., 2018)||73.0||Higher-order Coreference Resolution with Coarse-to-fine Inference||Official|
|(Lee et al., 2017)+ELMo (Peters et al., 2018)||70.4||Deep contextualized word representations|
|Lee et al. (2017)||67.2||End-to-end Neural Coreference Resolution|
[1] Joshi et al. (2019): (Lee et al., 2017)+coarse-to-fine & second-order inference (Lee et al., 2018)+SpanBERT (Joshi et al., 2019)
[2] Joshi et al. (2019): (Lee et al., 2017)+coarse-to-fine & second-order inference (Lee et al., 2018)+BERT (Devlin et al., 2019)
Gendered Ambiguous Pronoun Resolution
Experiments are conducted on the GAP dataset. The metrics are the F1 score on Masculine (M) and Feminine (F) examples, the Overall F1, and a Bias factor calculated as F / M.
|Model||Overall F1||Masculine F1 (M)||Feminine F1 (F)||Bias (F/M)||Paper / Source||Code|
|Attree et al. (2019)||92.5||94.0||91.1||0.97||Gendered Ambiguous Pronouns Shared Task: Boosting Model Confidence by Evidence Pooling||GREP|
|Chada et al. (2019)||90.2||90.9||89.5||0.98||Gendered Pronoun Resolution using BERT and an extractive question answering formulation||CorefQA|
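The Bias column can be checked directly from the M and F columns. The sketch below recomputes F / M for the two rows above; the function name `bias_factor` and the two-decimal rounding are assumptions chosen to match how the table is presented.

```python
def bias_factor(feminine_f1, masculine_f1):
    """Bias as defined above: feminine F1 divided by masculine F1."""
    return feminine_f1 / masculine_f1


# (masculine F1, feminine F1) taken from the table above.
ROWS = {
    "Attree et al. (2019)": (94.0, 91.1),
    "Chada et al. (2019)": (90.9, 89.5),
}

if __name__ == "__main__":
    for model, (m_f1, f_f1) in ROWS.items():
        # Rounded to two decimals, as in the table.
        print(f"{model}: bias = {round(bias_factor(f_f1, m_f1), 2)}")
```

A value of 1.0 would mean equal F1 on feminine and masculine examples; values below 1.0 indicate lower F1 on feminine examples.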