NLP-progress

Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

Vietnamese NLP tasks

Dependency parsing

| POS tags | Model | LAS | UAS | Paper | Code |
| --- | --- | --- | --- | --- | --- |
| Predicted POS | VnCoreNLP (2018) | 70.23 | 76.93 | VnCoreNLP: A Vietnamese Natural Language Processing Toolkit | Official |
| Gold POS | VnCoreNLP (2018) | 73.39 | 79.02 | VnCoreNLP: A Vietnamese Natural Language Processing Toolkit | Official |
| Gold POS | BiLSTM graph-based parser (2016) | 73.17 | 79.39 | Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations | Official |
| Gold POS | BiLSTM transition-based parser (2016) | 72.53 | 79.33 | Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations | Official |
| Gold POS | MSTparser (2006) | 70.29 | 76.47 | Online large-margin training of dependency parsers | |
| Gold POS | MaltParser (2007) | 69.10 | 74.91 | MaltParser: A language-independent system for data-driven dependency parsing | |
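
For readers unfamiliar with the two metrics, the sketch below shows how UAS (unlabeled attachment score) and LAS (labeled attachment score) are conventionally computed: UAS is the percentage of tokens whose predicted head is correct, and LAS additionally requires the dependency label to match. The function name `attachment_scores`, the `(head, label)` tuple layout, and the toy arcs are illustrative assumptions, not part of any listed toolkit. Whether punctuation is counted varies by evaluation setup; this sketch counts every token.

```python
from typing import List, Tuple

# Each token is represented as (head_index, dependency_label); head 0 is the root.
# This representation is an assumption made for the example.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (LAS, UAS) as percentages over all tokens."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    # UAS: the predicted head matches the gold head.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: both the head and the label match.
    labeled_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100.0 * labeled_hits / n, 100.0 * head_hits / n


# Toy example with made-up arcs (not taken from any treebank).
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "nmod")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (2, "nmod")]
las, uas = attachment_scores(gold, pred)
print(f"LAS={las:.2f} UAS={uas:.2f}")
```

Because a correct label only counts when the head is also correct, LAS can never exceed UAS, which is consistent with every row in the table above.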

Machine translation

English-to-Vietnamese translation

| Model | BLEU | Paper | Code |
| --- | --- | --- | --- |
| CVT (2018) | 29.6 | Semi-Supervised Sequence Modeling with Cross-View Training | |
| ELMo (2018) | 29.3 | Deep contextualized word representations | |
| Transformer (2017) | 28.9 | Attention is all you need | Link |
| Google (2017) | 26.1 | Neural machine translation (seq2seq) tutorial | Official |
| Stanford (2015) | 23.3 | Stanford Neural Machine Translation Systems for Spoken Language Domains | |

Named entity recognition

| Model | F1 | Paper | Code |
| --- | --- | --- | --- |
| VnCoreNLP (2018) | 88.55 | VnCoreNLP: A Vietnamese Natural Language Processing Toolkit | Official |
| BiLSTM-CRF + CNN-char (2016) | 88.28 | End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF | Official / Link |
| BiLSTM-CRF + LSTM-char (2016) | 87.71 | Neural Architectures for Named Entity Recognition | Link |
| BiLSTM-CRF (2015) | 86.48 | Bidirectional LSTM-CRF Models for Sequence Tagging | Link |

Part-of-speech tagging

| Model | Accuracy | Paper | Code |
| --- | --- | --- | --- |
| VnCoreNLP-VnMarMoT (2017) | 95.88 | From Word Segmentation to POS Tagging for Vietnamese | Official |
| BiLSTM-CRF + CNN-char (2016) | 95.40 | End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF | Official / Link |
| BiLSTM-CRF + LSTM-char (2016) | 95.31 | Neural Architectures for Named Entity Recognition | Link |
| BiLSTM-CRF (2015) | 95.06 | Bidirectional LSTM-CRF Models for Sequence Tagging | Link |
| RDRPOSTagger (2014) | 95.11 | RDRPOSTagger: A Ripple Down Rules-based Part-Of-Speech Tagger | Official |

Word segmentation

| Model | F1 | Paper | Code |
| --- | --- | --- | --- |
| VnCoreNLP-RDRsegmenter (2018) | 97.90 | A Fast and Accurate Vietnamese Word Segmenter | Official |
| UETsegmenter (2016) | 97.87 | A hybrid approach to Vietnamese word segmentation | Official |
| vnTokenizer (2008) | 97.33 | A Hybrid Approach to Word Segmentation of Vietnamese Texts | |
| JVnSegmenter (2006) | 97.06 | Vietnamese Word Segmentation with CRFs and SVMs: An Investigation | |
| DongDu (2012) | 96.90 | Ứng dụng phương pháp Pointwise vào bài toán tách từ cho tiếng Việt (Applying the Pointwise Method to Vietnamese Word Segmentation) | |
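
As a rough illustration of what a word-segmentation F1 score measures, the sketch below compares predicted and gold words as spans over syllable positions and computes the harmonic mean of precision and recall. It assumes that syllables of a multi-syllable word are joined with an underscore (for example `sinh_viên`); the helper names `word_spans` and `segmentation_f1` and the sample sentence are hypothetical, and the exact scoring scripts behind the numbers above may differ.

```python
from typing import List, Set, Tuple


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map segmented words to (start, end) syllable offsets.

    Assumption: syllables inside one word are joined with "_".
    """
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for word in words:
        length = len(word.split("_"))
        spans.add((start, start + length))
        start += length
    return spans


def segmentation_f1(gold: List[str], pred: List[str]) -> float:
    """Return word-level F1 (as a percentage) for one sentence."""
    gold_spans, pred_spans = word_spans(gold), word_spans(pred)
    # A predicted word is correct only if both of its boundaries match.
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 100.0 * 2 * precision * recall / (precision + recall)


# Toy example: the prediction wrongly splits "sinh_viên" into two words.
gold = ["Tôi", "là", "sinh_viên"]
pred = ["Tôi", "là", "sinh", "viên"]
score = segmentation_f1(gold, pred)
print(f"F1={score:.2f}")
```

Scoring spans rather than individual boundaries means a single wrong split costs both precision (two spurious words) and recall (one missed word).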