
NLP-progress

Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

Data-to-Text Generation

Data-to-Text Generation (D2T NLG) can be described as Natural Language Generation from structured input. Other NLG tasks, such as Machine Translation or Question Answering (also referred to as Text-to-Text Generation, or T2T NLG), generate textual output from unstructured textual input. In D2T NLG, by contrast, the textual output is generated from input provided in a structured format such as tables, knowledge graphs, or JSON [1].
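To make the input/output contract concrete, here is a deliberately simple, rule-based sketch: a JSON record goes in and one English sentence comes out. The field names (`name`, `type`, `food`, `area`) and the template are hypothetical. The systems compared in the tables below are trained on data rather than built from one fixed template, so this only illustrates the shape of the task.

```python
import json

def realize(record_json: str) -> str:
    """Toy template realizer: a JSON record in, one sentence out (hypothetical fields)."""
    record = json.loads(record_json)
    words = [f"{record['name']} is a {record['type']}"]
    # Optional fields are verbalized only when present in the input.
    if "food" in record:
        words.append(f"serving {record['food']} food")
    if "area" in record:
        words.append(f"in the {record['area']}")
    return " ".join(words) + "."

print(realize('{"name": "Blue Spice", "type": "restaurant", "food": "French", "area": "city centre"}'))
# → Blue Spice is a restaurant serving French food in the city centre.
```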

RotoWire

The dataset consists of articles summarizing NBA basketball games, paired with their corresponding box- and line-score tables. The summaries are professionally written, of medium length, and targeted at fantasy basketball fans. The writing is colloquial but structured, and targets an audience primarily interested in game statistics [2].
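One common way to present the box-score table to a model is as a flat set of records. Wiseman et al. [2] describe each record by an entity, a type, a value, and a home/away indication, and the sketch below follows that shape. The input layout (one dict per player, a hypothetical `PLAYER_NAME` key, and statistic columns such as `PTS`) is an assumption for illustration, not the dataset's actual JSON schema.

```python
from typing import Dict, Iterable, List, NamedTuple, Set

class Record(NamedTuple):
    entity: str  # player (or team) name
    type: str    # statistic name, e.g. "PTS"
    value: str   # statistic value, stored as a string
    home: bool   # True if the entity plays for the home team

def box_score_to_records(rows: Iterable[Dict[str, object]], home_players: Set[str]) -> List[Record]:
    """Flatten one dict per player into one record per (player, statistic) cell."""
    records = []
    for row in rows:
        name = str(row["PLAYER_NAME"])  # hypothetical key holding the player's name
        for stat, value in row.items():
            if stat != "PLAYER_NAME":
                records.append(Record(name, stat, str(value), name in home_players))
    return records

rows = [{"PLAYER_NAME": "Player A", "PTS": 31, "AST": 7},
        {"PLAYER_NAME": "Player B", "PTS": 12, "REB": 10}]
for rec in box_score_to_records(rows, home_players={"Player A"}):
    print(rec)
```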

Performance is evaluated with two kinds of automatic metrics: BLEU, and a family of Extractive Evaluation (EE) metrics. EE uses an information extraction (IE) model to extract relations from text, and comprises three submetrics, each evaluating a different aspect of the generated text:

  1. Content Selection (CS): precision (P%) and recall (R%) of the unique relations extracted from the generated text that are also extracted from the gold text. This measures how well the generated document matches the gold document in terms of which records it selects to mention.

  2. Relation Generation (RG): precision (P%) and number (#) of unique relations extracted from the generated text that also appear in the structured input. This measures how well the system generates text containing factual (i.e., correct) records.

  3. Content Ordering (CO): normalized Damerau-Levenshtein Distance (DLD%) between the sequence of records extracted from the gold text and the sequence extracted from the generated text. This measures how well the system orders the records it chooses to discuss.
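Once relations have been extracted, the three EE submetrics reduce to set and sequence comparisons. The sketch below assumes an IE step (not shown) has already turned each text into a list of hashable relation tuples such as `(entity, value, type)`, and it computes per-document scores only; aggregation over the test set is not shown. For CO it uses the restricted (optimal string alignment) variant of Damerau-Levenshtein distance, normalized by the longer sequence, and reports 1 minus that value, assuming (as the higher-is-better CO numbers in the table below suggest) that this is the convention behind the DLD% column.

```python
from typing import Hashable, Sequence, Tuple

def content_selection(gen: Sequence[Hashable], gold: Sequence[Hashable]) -> Tuple[float, float]:
    """CS: precision and recall of unique generated relations also found in the gold text."""
    g, r = set(gen), set(gold)
    overlap = len(g & r)
    return (overlap / len(g) if g else 0.0, overlap / len(r) if r else 0.0)

def relation_generation(gen: Sequence[Hashable], table: Sequence[Hashable]) -> Tuple[float, int]:
    """RG: precision and count of unique generated relations supported by the input table."""
    g = set(gen)
    supported = len(g & set(table))
    return (supported / len(g) if g else 0.0, supported)

def dl_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]

def content_ordering(gen: Sequence[Hashable], gold: Sequence[Hashable]) -> float:
    """CO: 1 - DLD / longer length (assumed convention, so higher is better)."""
    longest = max(len(gen), len(gold))
    return 1.0 if longest == 0 else 1.0 - dl_distance(gen, gold) / longest

# Hypothetical relations: the generated "C ... 5 REB" is not supported by the table.
gold = [("A", "31", "PTS"), ("B", "12", "PTS"), ("A", "7", "AST")]
gen = [("B", "12", "PTS"), ("A", "31", "PTS"), ("C", "5", "REB")]
table = gold + [("C", "4", "REB")]
print(content_selection(gen, gold))    # precision = recall = 2/3
print(relation_generation(gen, table)) # precision 2/3, 2 supported relations
print(content_ordering(gen, gold))     # 1 - 2/3, roughly 0.333
```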

| Model | BLEU | CS (P% & R%) | RG (P% & #) | CO (DLD%) | Paper / Source | Code |
| --- | --- | --- | --- | --- | --- | --- |
| Rebuffel, Clément, et al. (2020) [4] | 17.50 | 39.47 & 51.64 | 89.46 & 21.17 | 18.90 | A Hierarchical Model for Data-to-Text Generation | Official |
| Puduppully et al. (2019) [3] | 16.50 | 34.18 & 51.22 | 87.47 & 34.28 | 18.58 | Data-to-text generation with content selection and planning | Official |
| Wiseman et al. (2017) [2] | 14.49 | 22.17 & 27.16 | 71.82 & 12.82 | 8.68 | Challenges in Data-to-Document Generation | Official |

WebNLG

The WebNLG challenge consists in mapping data to text. The training data consists of Data/Text pairs where the data is a set of triples extracted from DBpedia and the text is a verbalisation of these triples. For example, given the three DBpedia triples (as shown in [a]), the aim is to generate a text (as shown in [b]):
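Sequence-to-sequence systems often consume the triple set as a single flattened string. The minimal linearization sketch below illustrates this idea. The `<S>`, `<P>`, `<O>` markers, the underscore-to-space cleanup, and the example triples (written in the DBpedia subject, predicate, object style) are illustrative assumptions, not the input format of any particular system in the table below.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

def linearize(triples: List[Triple]) -> str:
    """Flatten a triple set into one string using hypothetical <S>/<P>/<O> markers."""
    parts = []
    for subj, pred, obj in triples:
        # DBpedia-style entity names use underscores; turn them into spaces.
        parts.append(f"<S> {subj.replace('_', ' ')} <P> {pred} <O> {obj.replace('_', ' ')}")
    return " ".join(parts)

triples = [("John_E_Blaha", "birthPlace", "San_Antonio"),
           ("John_E_Blaha", "occupation", "Fighter_pilot")]
print(linearize(triples))
```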

The performance is evaluated on the basis of BLEU, METEOR and TER scores. The data from WebNLG Challenge 2017 can be downloaded here.
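BLEU, reported in all three tables, combines clipped n-gram precisions (n = 1 to 4) through a geometric mean and multiplies the result by a brevity penalty. The following is a minimal, unsmoothed corpus-level sketch over pre-tokenized text. The official challenge scripts apply their own tokenization and may differ in details, so scores from this sketch are not directly comparable to the numbers reported here.

```python
import math
from collections import Counter
from typing import List, Sequence

def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hyps: Sequence[List[str]], refs: Sequence[Sequence[List[str]]], max_n: int = 4) -> float:
    """Unsmoothed corpus BLEU on a 0-100 scale; refs[k] holds all references for hyps[k]."""
    matches = [0] * max_n  # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n   # hypothesis n-gram counts, summed over the corpus
    hyp_len = ref_len = 0
    for hyp, hyp_refs in zip(hyps, refs):
        hyp_len += len(hyp)
        # Effective reference length: the reference closest in length (shorter wins ties).
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in hyp_refs)[1]
        for n in range(1, max_n + 1):
            max_ref = Counter()
            for r in hyp_refs:
                max_ref |= ngram_counts(r, n)  # per-n-gram maximum count over references
            hyp_counts = ngram_counts(hyp, n)
            matches[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision makes BLEU zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)

hyps = ["john e blaha worked as a fighter pilot".split()]
refs = [["john e blaha worked as a fighter pilot .".split(),
         "john e blaha was a fighter pilot".split()]]
print(round(corpus_bleu(hyps, refs), 2))
```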

| Model | BLEU | METEOR | TER | Paper / Source | Code |
| --- | --- | --- | --- | --- | --- |
| Kale, Mihir (2020) [9] | 57.1 | 0.44 |  | Text-to-Text Pre-Training for Data-to-Text Tasks |  |
| Moryossef et al. (2019) [5] | 47.4 | 0.391 | 0.631 | Step-by-Step: Separating Planning from Realization in Neural Data-to-Text Generation | Official |
| Baseline | 33.24 | 0.235436 | 0.613080 | Baseline system provided during the challenge | Official |

P.S.: The WebNLG test set covers 15 categories in total: 10 (seen) categories also appear in the training data, while 5 (unseen) categories do not. The results reported here are on the full test set, i.e., all 15 categories.

Meaning Representations

The dataset was first provided for the E2E Challenge in 2017. It is a crowd-sourced dataset of 50k instances in the restaurant domain. Each instance consists of a dialogue-act-based meaning representation (MR) and up to 5 references in natural language (NL). For example:
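E2E MRs are written as comma-separated `attribute[value]` slots. Assuming that format, a small parser can turn an MR into a dictionary before any further processing. The regular expression and the example MR below are illustrative, not taken from the official challenge tooling.

```python
import re
from typing import Dict

# One slot: an attribute name (no "[" or ","), then its value in square brackets.
SLOT = re.compile(r"\s*([^\[,]+?)\s*\[([^\]]*)\]")

def parse_mr(mr: str) -> Dict[str, str]:
    """Parse an E2E-style MR like 'name[The Eagle], food[French]' into {attribute: value}."""
    return {attr: value for attr, value in SLOT.findall(mr)}

mr = "name[The Eagle], eatType[coffee shop], food[French], customer rating[5 out of 5]"
print(parse_mr(mr))
```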

Performance is evaluated using BLEU, NIST, METEOR, ROUGE-L and CIDEr scores. The data from the E2E Challenge 2017 can be downloaded here.
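ROUGE-L, one of the five E2E metrics, scores a hypothesis by the longest common subsequence (LCS) it shares with a reference and combines LCS-based precision and recall into an F-measure. This single-reference sketch assumes β = 1.2 (as in the widely used COCO caption evaluation code). The official E2E scripts handle multiple references in their own way, so treat it as illustrative only.

```python
from typing import Sequence

def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, using one rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def rouge_l(hyp: Sequence[str], ref: Sequence[str], beta: float = 1.2) -> float:
    """Sentence-level ROUGE-L F-measure against a single reference."""
    lcs = lcs_length(hyp, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(hyp)
    recall = lcs / len(ref)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)

hyp = "the eagle is a coffee shop".split()
ref = "the eagle is a french coffee shop".split()
print(rouge_l(hyp, ref))
```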

| Model | BLEU | NIST | METEOR | ROUGE-L | CIDEr | Paper / Source | Code |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Shen, Sheng, et al. (2019) [7] | 68.60 | 8.73 | 45.25 | 70.82 | 2.37 | Pragmatically Informative Text Generation | Official |
| Elder, Henry, et al. (2019) [8] | 67.38 | 8.7277 | 45.72 | 71.52 | 2.2995 | Designing a Symbolic Intermediate Representation for Neural Surface Realization |  |
| Gehrmann, Sebastian, et al. (2018) [6] | 66.2 | 8.60 | 45.7 | 70.4 | 2.34 | End-to-End Content and Plan Selection for Data-to-Text Generation | Official |
| Baseline | 65.93 | 8.61 | 44.83 | 68.50 | 2.23 | Baseline system provided during the challenge | Official |

References

[1] Gatt, Albert, and Emiel Krahmer. “Survey of the State of the Art in Natural Language Generation: Core Tasks, Applications and Evaluation.” Journal of Artificial Intelligence Research 61, no. 1 (January 2018): 65–170.

[2] Wiseman, Sam, Stuart M. Shieber, and Alexander M. Rush. “Challenges in Data-to-Document Generation.” Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017.

[3] Puduppully, Ratish, Li Dong, and Mirella Lapata. “Data-to-text generation with content selection and planning.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. 2019.

[4] Rebuffel, Clément, et al. “A Hierarchical Model for Data-to-Text Generation.” European Conference on Information Retrieval. Springer, Cham, 2020.

[5] Moryossef, Amit, Yoav Goldberg, and Ido Dagan. “Step-by-Step: Separating Planning from Realization in Neural Data-to-Text Generation.” Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.

[6] Gehrmann, Sebastian, et al. “End-to-End Content and Plan Selection for Data-to-Text Generation.” Proceedings of the 11th International Conference on Natural Language Generation. 2018.

[7] Shen, Sheng, et al. “Pragmatically Informative Text Generation.” Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.

[8] Elder, Henry, et al. “Designing a Symbolic Intermediate Representation for Neural Surface Realization.” Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation. 2019.

[9] Kale, Mihir. “Text-to-Text Pre-Training for Data-to-Text Tasks.” arXiv preprint arXiv:2005.10433 (2020).
