
NLP-progress

Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

Chinese Word Segmentation

Task

Chinese word segmentation is the task of splitting Chinese text (a sequence of Chinese characters) into words.

Example:

'上海浦东开发与建设同步' → ['上海', '浦东', '开发', '与', '建设', '同步']
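One way to view the task: a segmentation of a string of n characters is fully determined by which of the n - 1 gaps between adjacent characters are word boundaries. The minimal Python sketch below recovers the example above from its boundary offsets and back. The helper names `boundaries_of` and `split_at` are hypothetical and do not come from any of the systems listed here.

```python
from typing import List, Set


def boundaries_of(words: List[str]) -> Set[int]:
    """Return the internal character offsets at which a new word starts."""
    offsets: Set[int] = set()
    position = 0
    for word in words[:-1]:  # the end of the last word is not an internal gap
        position += len(word)
        offsets.add(position)
    return offsets


def split_at(text: str, boundaries: Set[int]) -> List[str]:
    """Cut `text` at the given internal offsets and return the resulting words."""
    cuts = [0] + sorted(boundaries) + [len(text)]
    return [text[start:end] for start, end in zip(cuts, cuts[1:])]


sentence = "上海浦东开发与建设同步"
gold = ["上海", "浦东", "开发", "与", "建设", "同步"]

offsets = boundaries_of(gold)
print(sorted(offsets))  # [2, 4, 6, 7, 9]
assert split_at(sentence, offsets) == gold
assert "".join(gold) == sentence  # segmentation never adds or drops characters
```

Because the words always concatenate back to the input, a segmenter only has to decide, gap by gap, whether to place a boundary.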

Systems

♠ marks systems that use character unigrams as input. ♣ marks systems that use character bigrams as input.
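To make the two input types concrete, the Python sketch below lists, for each character position, its character unigram and the bigram it forms with the following character. The end-of-sentence padding symbol `</s>` and the choice of the following (rather than preceding) character are assumptions for illustration; individual systems define their bigram windows differently.

```python
from typing import List, Tuple

PAD = "</s>"  # hypothetical end-of-sentence padding symbol


def char_ngrams(text: str) -> List[Tuple[str, str]]:
    """Pair each character (unigram) with the bigram it forms with the next character."""
    features = []
    for i, char in enumerate(text):
        following = text[i + 1] if i + 1 < len(text) else PAD
        features.append((char, char + following))
    return features


for unigram, bigram in char_ngrams("上海浦东"):
    print(unigram, bigram)
# 上 上海
# 海 海浦
# 浦 浦东
# 东 东</s>
```

Bigram inputs give each position a view of its right neighbour, which helps decide whether a word boundary falls after that character.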

Evaluation

Metrics

F1-score
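F1 for word segmentation is normally computed over words: a predicted word counts as correct only if both its start and end offsets match a gold word. The minimal Python sketch below computes precision, recall, and F1 under that assumption. It is an illustration, not the official scoring script used for any of the datasets below, and the example prediction is made up.

```python
from typing import List, Set, Tuple


def spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map a segmentation to the set of (start, end) character offsets of its words."""
    result = set()
    start = 0
    for word in words:
        result.add((start, start + len(word)))
        start += len(word)
    return result


def segmentation_f1(gold: List[str], predicted: List[str]) -> Tuple[float, float, float]:
    """Word-level precision, recall, and F1 of `predicted` against `gold`."""
    gold_spans, pred_spans = spans(gold), spans(predicted)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = ["上海", "浦东", "开发", "与", "建设", "同步"]
# Hypothetical system output that wrongly merges 上海浦东 into one word.
predicted = ["上海浦东", "开发", "与", "建设", "同步"]
p, r, f = segmentation_f1(gold, predicted)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.800 R=0.667 F1=0.727
```

A single merge error costs two gold words (recall) and one predicted word (precision), which is why F1 drops well below the fraction of correctly placed boundaries.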

Dataset

Chinese Treebank 6

Model F1 Paper / Source Code
Huang et al. (2019) 97.6 Toward Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning  
Tian et al. (2020) 97.3 Improving Chinese Word Segmentation with Wordhood Memory Networks Github
Ma et al. (2018) 96.7 State-of-the-art Chinese Word Segmentation with Bi-LSTMs  
Yang et al. (2018) 96.3 Subword Encoding in Lattice LSTM for Chinese Word Segmentation Github
Yang et al. (2017) 96.2 Neural Word Segmentation with Rich Pretraining Github
Zhou et al. (2017) 96.2 Word-Context Character Embeddings for Chinese Word Segmentation  
Chen et al. (2017) 96.2 Adversarial Multi-Criteria Learning for Chinese Word Segmentation Github
Liu et al. (2016) 95.5 Exploring Segment Representations for Neural Segmentation Models Github
Chen et al. (2015b) 96.0 Long Short-Term Memory Neural Networks for Chinese Word Segmentation Github

Chinese Treebank 7

Model F1 Paper / Source Code
Ma et al. (2018) 96.6 State-of-the-art Chinese Word Segmentation with Bi-LSTMs  
Kurita et al. (2017) 96.2 Neural Joint Model for Transition-based Chinese Syntactic Analysis  

AS

Model F1 Paper / Source Code
Tian et al. (2020) 96.6 Improving Chinese Word Segmentation with Wordhood Memory Networks Github
Huang et al. (2019) 96.6 Toward Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning  
Ma et al. (2018) 96.2 State-of-the-art Chinese Word Segmentation with Bi-LSTMs  
Yang et al. (2017) 95.7 Neural Word Segmentation with Rich Pretraining Github
Cai et al. (2017) 95.3 Fast and Accurate Neural Word Segmentation for Chinese Github
Chen et al. (2017) 94.8 Adversarial Multi-Criteria Learning for Chinese Word Segmentation Github

CityU

Model F1 Paper / Source Code
Tian et al. (2020) 97.9 Improving Chinese Word Segmentation with Wordhood Memory Networks Github
Huang et al. (2019) 97.6 Toward Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning  
Ma et al. (2018) 97.2 State-of-the-art Chinese Word Segmentation with Bi-LSTMs  
Yang et al. (2017) 96.9 Neural Word Segmentation with Rich Pretraining Github
Cai et al. (2017) 95.6 Fast and Accurate Neural Word Segmentation for Chinese Github
Chen et al. (2017) 95.6 Adversarial Multi-Criteria Learning for Chinese Word Segmentation Github

PKU

Model F1 Paper / Source Code
Huang et al. (2019) 96.6 Toward Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning  
Tian et al. (2020) 96.5 Improving Chinese Word Segmentation with Wordhood Memory Networks Github
Yang et al. (2017) 96.3 Neural Word Segmentation with Rich Pretraining Github
Ma et al. (2018) 96.1 State-of-the-art Chinese Word Segmentation with Bi-LSTMs  
Yang et al. (2018) 95.9 Subword Encoding in Lattice LSTM for Chinese Word Segmentation Github
Cai et al. (2017) 95.8 Fast and Accurate Neural Word Segmentation for Chinese Github
Chen et al. (2017) 94.3 Adversarial Multi-Criteria Learning for Chinese Word Segmentation Github
Liu et al. (2016) 95.7 Exploring Segment Representations for Neural Segmentation Models Github
Cai and Zhao (2016) 95.7 Neural Word Segmentation Learning for Chinese Github

MSR

Model F1 Paper / Source Code
Tian et al. (2020) 98.4 Improving Chinese Word Segmentation with Wordhood Memory Networks Github
Ma et al. (2018) 98.1 State-of-the-art Chinese Word Segmentation with Bi-LSTMs  
Huang et al. (2019) 97.9 Toward Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning  
Yang et al. (2018) 97.8 Subword Encoding in Lattice LSTM for Chinese Word Segmentation Github
Yang et al. (2017) 97.5 Neural Word Segmentation with Rich Pretraining Github
Cai et al. (2017) 97.1 Fast and Accurate Neural Word Segmentation for Chinese Github
Chen et al. (2017) 96.0 Adversarial Multi-Criteria Learning for Chinese Word Segmentation Github
Liu et al. (2016) 97.6 Exploring Segment Representations for Neural Segmentation Models Github
Cai and Zhao (2016) 96.4 Neural Word Segmentation Learning for Chinese Github
