Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

Constituency parsing

Constituency parsing aims to extract a constituency-based parse tree from a sentence that represents its syntactic structure according to a phrase structure grammar.


             Sentence (S)
                 |
   +-------------+------------+
   |                          |
 Noun (N)                Verb Phrase (VP)
   |                          |
 John                 +-------+--------+
                      |                |
                    Verb (V)         Noun (N)
                      |                |
                    sees              Bill

Recent approaches linearize the parse tree with a depth-first traversal so that sequence-to-sequence models can be applied to it. The linearized version of the parse tree above, with the words omitted, is: (S (N) (VP V N)).
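The depth-first linearization can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any specific paper: the nested-tuple tree representation and the `linearize` function are assumptions made for this example. The sketch brackets every node uniformly, so with the words dropped it yields (S (N) (VP (V) (N))) rather than the abbreviated form shown above.

```python
from typing import Tuple, Union

# Hypothetical tree representation for this sketch:
# a leaf is a word (str); an internal node is a (label, children) pair.
Tree = Union[str, Tuple[str, list]]

EXAMPLE_TREE: Tree = (
    "S",
    [
        ("N", ["John"]),
        ("VP", [("V", ["sees"]), ("N", ["Bill"])]),
    ],
)


def linearize(tree: Tree, keep_words: bool = True) -> str:
    """Return a bracketed string by visiting nodes depth-first, left to right."""
    if isinstance(tree, str):
        # Leaf: emit the word, or nothing when only the structure is wanted.
        return tree if keep_words else ""
    label, children = tree
    parts = [linearize(child, keep_words) for child in children]
    parts = [p for p in parts if p]  # drop leaves that were omitted
    return "(" + " ".join([label] + parts) + ")"


if __name__ == "__main__":
    print(linearize(EXAMPLE_TREE))                    # (S (N John) (VP (V sees) (N Bill)))
    print(linearize(EXAMPLE_TREE, keep_words=False))  # (S (N) (VP (V) (N)))
```

Splitting the resulting string on whitespace gives a token sequence that a sequence-to-sequence model could be trained to produce. How brackets and labels are tokenized varies between papers.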

Penn Treebank

The Wall Street Journal (WSJ) section of the Penn Treebank is used for evaluating constituency parsers. Section 22 is used for development and Section 23 for evaluation. Models are evaluated based on F1. Most of the models below incorporate external data or features. For a comparison of single models trained only on WSJ, refer to Kitaev and Klein (2018).
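As a rough illustration of the metric, the sketch below computes an F1 score over labeled brackets, assuming the usual definition in which a predicted constituent counts as correct when its label and span both match the gold tree. The tree representation and the names `labeled_spans` and `bracket_f1` are hypothetical. The sketch counts every labeled node, including part-of-speech nodes, and does not reproduce the additional conventions of the standard scoring tools, so its numbers are not comparable to the table. It returns a fraction in [0, 1], whereas the table reports percentages.

```python
from collections import Counter
from typing import List, Tuple, Union

# Hypothetical tree representation for this sketch:
# a leaf is a word (str); an internal node is a (label, children) pair.
Tree = Union[str, Tuple[str, list]]
Span = Tuple[str, int, int]  # (label, start, end) over word positions


def labeled_spans(tree: Tree, start: int = 0) -> Tuple[int, List[Span]]:
    """Return (end position, labeled spans) for the subtree starting at `start`."""
    if isinstance(tree, str):
        return start + 1, []  # a word covers one position and adds no bracket
    label, children = tree
    spans: List[Span] = []
    pos = start
    for child in children:
        pos, child_spans = labeled_spans(child, pos)
        spans.extend(child_spans)
    spans.append((label, start, pos))
    return pos, spans


def bracket_f1(gold: Tree, pred: Tree) -> float:
    """Harmonic mean of labeled-bracket precision and recall."""
    gold_spans = Counter(labeled_spans(gold)[1])
    pred_spans = Counter(labeled_spans(pred)[1])
    matched = sum((gold_spans & pred_spans).values())  # multiset intersection
    if matched == 0:
        return 0.0
    precision = matched / sum(pred_spans.values())
    recall = matched / sum(gold_spans.values())
    return 2 * precision * recall / (precision + recall)


GOLD: Tree = ("S", [("N", ["John"]), ("VP", [("V", ["sees"]), ("N", ["Bill"])])])
# A prediction that attaches "Bill" to S instead of VP.
PRED: Tree = ("S", [("N", ["John"]), ("VP", [("V", ["sees"])]), ("N", ["Bill"])])

if __name__ == "__main__":
    print(round(bracket_f1(GOLD, PRED), 2))  # 0.8: four of five brackets match
```

In a corpus-level evaluation, matched, predicted, and gold bracket counts are normally summed over all sentences before precision and recall are computed, rather than averaging per-sentence scores.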

| Model | F1 score | Paper / Source | Code |
| --- | --- | --- | --- |
| Span Attention + XLNet (Tian et al., 2020) | 96.40 | Improving Constituency Parsing with Span Attention | Official |
| Label Attention Layer + HPSG + XLNet (Mrini et al., 2020) | 96.38 | Rethinking Self-Attention: Towards Interpretability for Neural Parsing | Official |
| Attach-Juxtapose Parser + XLNet (Yang and Deng, 2020) | 96.34 | Strongly Incremental Constituency Parsing with Graph Neural Networks | Official |
| Head-Driven Phrase Structure Grammar Parsing (Joint) + XLNet (Zhou and Zhao, 2019) | 96.33 | Head-Driven Phrase Structure Grammar Parsing on Penn Treebank | |
| Head-Driven Phrase Structure Grammar Parsing (Joint) + BERT (Zhou and Zhao, 2019) | 95.84 | Head-Driven Phrase Structure Grammar Parsing on Penn Treebank | |
| CRF Parser + BERT (Zhang et al., 2020) | 95.69 | Fast and Accurate Neural CRF Constituency Parsing | Official |
| Self-attentive encoder + ELMo (Kitaev and Klein, 2018) | 95.13 | Constituency Parsing with a Self-Attentive Encoder | Official |
| Model combination (Fried et al., 2017) | 94.66 | Improving Neural Parsing by Disentangling Model Combination and Reranking Effects | |
| LSTM Encoder-Decoder + LSTM-LM (Takase et al., 2018) | 94.47 | Direct Output Connection for a High-Rank Language Model | |
| LSTM Encoder-Decoder + LSTM-LM (Suzuki et al., 2018) | 94.32 | An Empirical Study of Building a Strong Baseline for Constituency Parsing | |
| In-order (Liu and Zhang, 2017) | 94.2 | In-Order Transition-based Constituent Parsing | |
| CRF Parser (Zhang et al., 2020) | 94.12 | Fast and Accurate Neural CRF Constituency Parsing | Official |
| Semi-supervised LSTM-LM (Choe and Charniak, 2016) | 93.8 | Parsing as Language Modeling | |
| Stack-only RNNG (Kuncoro et al., 2017) | 93.6 | What Do Recurrent Neural Network Grammars Learn About Syntax? | |
| RNN Grammar (Dyer et al., 2016) | 93.3 | Recurrent Neural Network Grammars | |
| Transformer (Vaswani et al., 2017) | 92.7 | Attention Is All You Need | |
| Combining Constituent Parsers (Fossum and Knight, 2009) | 92.4 | Combining constituent parsers via parse selection or parse hybridization | |
| Semi-supervised LSTM (Vinyals et al., 2015) | 92.1 | Grammar as a Foreign Language | |
| Self-trained parser (McClosky et al., 2006) | 92.1 | Effective Self-Training for Parsing | |
