View on GitHub

NLP-progress

Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

Sentiment analysis

Sentiment analysis is the task of classifying the polarity of a given text.

IMDb

The IMDb dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb) labeled as positive or negative. Models are evaluated based on accuracy.

Model Score Paper / Source
ULMFiT (Howard and Ruder, 2018) 95.4 Universal Language Model Fine-tuning for Text Classification
Block-sparse LSTM (Gray et al., 2017) 94.99 GPU Kernels for Block-Sparse Weights
oh-LSTM (Johnson and Zhang, 2016) 94.1 Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings
Virtual adversarial training (Miyato et al., 2016) 94.1 Adversarial Training Methods for Semi-Supervised Text Classification
BCN+Char+CoVe (McCann et al., 2017) 91.8 Learned in Translation: Contextualized Word Vectors

SST

The Stanford Sentiment Treebank contains of 215,154 phrases with fine-grained sentiment labels in the parse trees of 11,855 sentences in movie reviews. Models are evaluated either on fine-grained (five-way) or binary classification based on accuracy.

Fine-grained classification:

Model Accuracy Paper / Source
BCN+ELMo (Peters et al., 2018) 54.7 Deep contextualized word representations
BCN+Char+CoVe (McCann et al., 2017) 53.7 Learned in Translation: Contextualized Word Vectors

Binary classification:

Model Accuracy Paper / Source
Block-sparse LSTM (Gray et al., 2017) 93.2 GPU Kernels for Block-Sparse Weights
bmLSTM (Radford et al., 2017) 91.8 Learning to Generate Reviews and Discovering Sentiment
BCN+Char+CoVe (McCann et al., 2017) 90.3 Learned in Translation: Contextualized Word Vectors
Neural Semantic Encoder (Munkhdalai and Yu, 2017) 89.7 Neural Semantic Encoders
BLSTM-2DCNN (Zhou et al., 2017) 89.5 Text Classification Improved by Integrating Bidirectional LSTM with Two-dimensional Max Pooling

Yelp

The Yelp Review dataset consists of more than 500,000 Yelp reviews. There is both a binary and a fine-grained (five-class) version of the dataset. Models are evaluated based on error (1 - accuracy; lower is better).

Fine-grained classification:

Model Error Paper / Source
ULMFiT (Howard and Ruder, 2018) 29.98 Universal Language Model Fine-tuning for Text Classification
DPCNN (Johnson and Zhang, 2017) 30.58 Deep Pyramid Convolutional Neural Networks for Text Categorization
CNN (Johnson and Zhang, 2016) 32.39 Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings
Char-level CNN (Zhang et al., 2015) 37.95 Character-level Convolutional Networks for Text Classification

Binary classification:

Model Error Paper / Source
ULMFiT (Howard and Ruder, 2018) 2.16 Universal Language Model Fine-tuning for Text Classification
DPCNN (Johnson and Zhang, 2017) 2.64 Deep Pyramid Convolutional Neural Networks for Text Categorization
CNN (Johnson and Zhang, 2016) 2.90 Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings
Char-level CNN (Zhang et al., 2015) 4.88 Character-level Convolutional Networks for Text Classification

Aspect-based sentiment analysis

Sentihood

Sentihood is a dataset for targeted aspect-based sentiment analysis (TABSA), which aims to identify fine-grained polarity towards a specific aspect. The dataset consists of 5,215 sentences, 3,862 of which contain a single target, and the remainder multiple targets. F1 is used as evaluation metric for aspect detection and accuracy as evaluation metric for sentiment analysis.

Model Aspect Sentiment Paper / Source
Liu et al. (2018) 78.5 91.0 Recurrent Entity Networks with Delayed Memory Update for Targeted Aspect-based Sentiment Analysis
SenticLSTM (Ma et al., 2018) 78.2 89.3 Targeted Aspect-Based Sentiment Analysis via Embedding Commonsense Knowledge into an Attentive LSTM
LSTM-LOC (Saeidi et al., 2016) 69.3 81.9 Sentihood: Targeted aspect based sentiment analysis dataset for urban neighbourhoods

Go back to the README