

Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.


Dialogue is notoriously hard to evaluate. Past approaches have used human evaluation.

Dialog state tracking

Dialogue state tracking consists of determining, at each turn of a dialogue, the full representation of what the user wants at that point. This representation contains a goal constraint, a set of requested slots, and the user's dialogue act.
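To make the three parts of the state concrete, here is a minimal Python sketch of a per-turn state container. It assumes goal constraints are slot-value pairs that carry over between turns, while requested slots and the dialogue act are reset every turn. The class `DialogueState` and the function `update_state` are hypothetical names chosen for illustration; they are not an API from DSTC2 or from any of the papers below.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueState:
    """Hypothetical container for the per-turn dialogue state."""

    # Goal constraints: slot -> value the user wants, e.g. {"food": "italian"}.
    goal: dict[str, str] = field(default_factory=dict)
    # Slots the user has asked the system to provide, e.g. {"phone"}.
    requested: set[str] = field(default_factory=set)
    # Dialogue act of the user's latest utterance, e.g. "inform" or "request".
    dialogue_act: str = ""


def update_state(prev: DialogueState, turn_goal: dict[str, str],
                 turn_requested: set[str], act: str) -> DialogueState:
    """Return the state after one user turn (simplifying assumption).

    Goal constraints accumulate across turns, and a newly mentioned value
    overwrites the old one for the same slot. Requested slots and the
    dialogue act are treated as belonging to the current turn only.
    """
    goal = dict(prev.goal)
    goal.update(turn_goal)
    return DialogueState(goal=goal, requested=set(turn_requested), dialogue_act=act)


# Usage: three user turns in a restaurant-search dialogue.
s1 = update_state(DialogueState(), {"food": "italian"}, set(), "inform")
s2 = update_state(s1, {"area": "north"}, set(), "inform")
s3 = update_state(s2, {}, {"phone"}, "request")
print(s3.goal)       # {'food': 'italian', 'area': 'north'}
print(s3.requested)  # {'phone'}
```

Real trackers predict these fields from noisy speech recognition and language understanding output; the sketch only shows what the target representation looks like once predicted.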

Second dialog state tracking challenge

For goal-oriented dialogue, the dataset of the second dialog state tracking challenge (DSTC2) is a common evaluation dataset. DSTC2 focuses on the restaurant search domain. Models are evaluated by accuracy on each individual goal slot (area, food, and price range) and on the joint goal, where a turn counts as correct only if every slot is predicted correctly.
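The difference between individual and joint accuracy can be shown with a short Python sketch. It assumes each turn's goal is a dict over the slots `area`, `food`, and `pricerange`, and that an unmentioned slot compares as `None`. The function `slot_accuracies` is hypothetical and simplified; the official DSTC2 scoring script applies additional rules that this sketch ignores.

```python
from typing import Mapping, Sequence

# Assumption: the three informable goal slots used in the table below.
SLOTS = ("area", "food", "pricerange")


def slot_accuracies(preds: Sequence[Mapping[str, str]],
                    golds: Sequence[Mapping[str, str]],
                    slots: Sequence[str] = SLOTS) -> dict[str, float]:
    """Per-slot accuracy plus joint goal accuracy over a list of turns.

    A missing slot is compared as None, so predicting nothing for a slot
    the user has not constrained counts as correct.
    """
    if len(preds) != len(golds) or not golds:
        raise ValueError("need equal-length, non-empty prediction and gold lists")
    correct = {s: 0 for s in slots}
    joint = 0
    for pred, gold in zip(preds, golds):
        turn_ok = True
        for s in slots:
            if pred.get(s) == gold.get(s):
                correct[s] += 1
            else:
                turn_ok = False  # one wrong slot makes the whole turn wrong
        joint += turn_ok
    n = len(golds)
    scores = {s: correct[s] / n for s in slots}
    scores["joint"] = joint / n
    return scores


golds = [{"food": "italian"},
         {"food": "italian", "area": "north"},
         {"food": "italian", "area": "north", "pricerange": "cheap"}]
preds = [{"food": "italian"},
         {"food": "italian", "area": "south"},
         {"food": "italian", "area": "north", "pricerange": "cheap"}]
scores = slot_accuracies(preds, golds)
print({k: round(v, 3) for k, v in scores.items()})
# {'area': 0.667, 'food': 1.0, 'pricerange': 1.0, 'joint': 0.667}
```

Because a single wrong slot fails the whole turn, joint accuracy can never exceed the lowest per-slot accuracy, which is why the Joint column below is lower than the individual columns.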

All scores are accuracy (%).

Model | Area | Food | Price | Joint | Paper / Source Code
Liu et al. (2018) | 90 | 84 | 92 | 72 | Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems
Neural belief tracker (Mrkšić et al., 2017) | 90 | 84 | 94 | 72 | Neural Belief Tracker: Data-Driven Dialogue State Tracking
RNN (Henderson et al., 2014) | 92 | 86 | 86 | 69 | Robust dialog state tracking using delexicalised recurrent neural networks and unsupervised adaptation

