Data-to-Text Generation (D2T NLG) is Natural Language Generation from structured input. Other NLG tasks, such as Machine Translation or Question Answering (also referred to as Text-to-Text Generation or T2T NLG), generate textual output from unstructured textual input. In D2T NLG, by contrast, the textual output is generated from input provided in a structured format such as tables, knowledge graphs, or JSON.
The RotoWire dataset consists of articles summarizing NBA basketball games, paired with their corresponding box- and line-score tables. The summaries are professionally written, medium-length, and targeted at fantasy basketball fans. The writing is colloquial but structured, and addresses an audience primarily interested in game statistics.
Performance is evaluated with two automated metrics: BLEU and a family of Extractive Evaluation (EE) metrics. EE comprises three submetrics, each evaluating a different aspect of the generated text:
Content Selection (CS): precision (P%) and recall (R%) of unique relations extracted from the generated text that are also extracted from the gold text. This measures how well the generated document matches the gold document in terms of which records it selects.
Relation Generation (RG): precision (P%) and number (#) of unique relations extracted from the generated text that also appear in the provided structured input. This measures how well the system generates text containing factual (i.e., correct) records.
Content Ordering (CO): normalized Damerau-Levenshtein Distance (DLD%) between the sequence of records extracted from the gold text and the sequence extracted from the generated text. This measures how well the system orders the records it chooses to discuss.
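The three EE submetrics can be sketched as follows, assuming records have already been extracted from each text as (entity, type, value) tuples. The extraction step itself is not shown. The function names and sample records are hypothetical, and the distance uses the optimal-string-alignment variant of Damerau-Levenshtein as a simplification. CO is returned here as a similarity (1 minus the normalized distance), on the assumption that higher scores are better; this is a sketch under those assumptions, not the official evaluation script.

```python
def content_selection(generated, gold):
    """CS: precision and recall of unique generated records against gold records."""
    gen_set, gold_set = set(generated), set(gold)
    overlap = len(gen_set & gold_set)
    precision = overlap / len(gen_set) if gen_set else 0.0
    recall = overlap / len(gold_set) if gold_set else 0.0
    return precision, recall


def relation_generation(generated, table_records):
    """RG: precision and count of unique generated records supported by the input table."""
    gen_set = set(generated)
    supported = gen_set & set(table_records)
    precision = len(supported) / len(gen_set) if gen_set else 0.0
    return precision, len(supported)


def osa_distance(a, b):
    """Optimal-string-alignment distance (restricted Damerau-Levenshtein) over sequences."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[n][m]


def content_ordering(generated, gold):
    """CO: 1 minus the length-normalized distance between the two record sequences."""
    longest = max(len(generated), len(gold))
    if longest == 0:
        return 1.0
    return 1.0 - osa_distance(generated, gold) / longest


# Hypothetical records, for illustration only.
r1 = ("Team A", "TEAM-PTS", "105")
r2 = ("Player X", "PLAYER-PTS", "30")
r3 = ("Player X", "PLAYER-REB", "8")
table = [r1, r2, r3, ("Player X", "PLAYER-AST", "7")]
gold = [r1, r2, r3]
generated = [r2, r1, ("Player X", "PLAYER-AST", "9")]  # last record is not in the table

cs_p, cs_r = content_selection(generated, gold)
rg_p, rg_n = relation_generation(generated, table)
co = content_ordering(generated, gold)
```

In this toy case two of the three generated records match the gold records and the table, and the generated sequence differs from the gold one by one transposition and one substitution.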
| Model | BLEU | CS (P% & R%) | RG (P% & #) | CO (DLD%) | Paper / Source | Code |
| --- | --- | --- | --- | --- | --- | --- |
| Rebuffel, Clément, et al. (2020) | 17.50 | 39.47 & 51.64 | 89.46 & 21.17 | 18.90 | A Hierarchical Model for Data-to-Text Generation | Official |
| Puduppully et al. (2019) | 16.50 | 34.18 & 51.22 | 87.47 & 34.28 | 18.58 | Data-to-text generation with content selection and planning | Official |
| Wiseman et al. (2017) | 14.49 | 22.17 & 27.16 | 71.82 & 12.82 | 8.68 | Challenges in Data-to-Document Generation | Official |
The WebNLG challenge consists in mapping data to text. The training data consists of Data/Text pairs where the data is a set of triples extracted from DBpedia and the text is a verbalisation of these triples. For example, given the three DBpedia triples (as shown in [a]), the aim is to generate a text (as shown in [b]):
[a]. (John_E_Blaha birthDate 1942_08_26) (John_E_Blaha birthPlace San_Antonio) (John_E_Blaha occupation Fighter_pilot)
[b]. John E Blaha, born in San Antonio on 1942-08-26, worked as a fighter pilot.
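As a minimal sketch of how input like [a] might be prepared for a sequence model, the snippet below parses the parenthesized triples and flattens them into a single string. The `<S>`, `<P>`, and `<O>` markers and the function names are hypothetical choices for illustration, not a format prescribed by the challenge or by any of the listed systems.

```python
import re

# Matches the contents of each "(subject predicate object)" group.
TRIPLE_PATTERN = re.compile(r"\(([^()]+)\)")


def parse_triples(text):
    """Split a string of parenthesized triples into (subject, predicate, object) tuples."""
    triples = []
    for body in TRIPLE_PATTERN.findall(text):
        subject, predicate, obj = body.split(maxsplit=2)
        triples.append((subject, predicate, obj))
    return triples


def linearize(triples):
    """Flatten triples into one string, replacing underscores in entities with spaces."""
    parts = []
    for subject, predicate, obj in triples:
        parts.append(
            f"<S> {subject.replace('_', ' ')} <P> {predicate} <O> {obj.replace('_', ' ')}"
        )
    return " ".join(parts)


source = (
    "(John_E_Blaha birthDate 1942_08_26) "
    "(John_E_Blaha birthPlace San_Antonio) "
    "(John_E_Blaha occupation Fighter_pilot)"
)
triples = parse_triples(source)
flat = linearize(triples)
```

Note that replacing underscores also turns the date `1942_08_26` into `1942 08 26`; a real pipeline would likely handle literals such as dates separately.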
The performance is evaluated on the basis of BLEU, METEOR and TER scores. The data from WebNLG Challenge 2017 can be downloaded here.
| Model | BLEU | METEOR | TER | Paper / Source | Code |
| --- | --- | --- | --- | --- | --- |
| Kale, Mihir. (2020) | 57.1 | 0.44 | | Text-to-Text Pre-Training for Data-to-Text Tasks | |
| Moryossef et al. (2019) | 47.4 | 0.391 | 0.631 | Step-by-Step: Separating Planning from Realization in Neural Data-to-Text Generation | Official |
| Baseline | 33.24 | 0.235436 | 0.613080 | Baseline system provided during the challenge | Official |
P.S.: The WebNLG test set covers a total of 15 categories, of which 10 (seen) categories are used for training while 5 (unseen) are not. The results reported here are those obtained on the overall test data, i.e., all 15 categories.
The E2E dataset was first provided for the E2E Challenge in 2017. It is a crowd-sourced dataset of 50k instances in the restaurant domain. Each instance consists of a dialogue-act-based meaning representation (MR) and up to 5 references in natural language (NL). For example:
MR: name[The Eagle], eatType[coffee shop], food[French], priceRange[moderate], customerRating[3/5], area[riverside], kidsFriendly[yes], near[Burger King]
NL: “The three star coffee shop, The Eagle, gives families a mid-priced dining experience featuring a variety of wines and cheeses. Find The Eagle near Burger King.”
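An MR in the `slot[value]` format shown above can be read into a dictionary with a short regular expression. The snippet below is a minimal sketch; the function name `parse_mr` is hypothetical, and it assumes that values never contain a closing bracket.

```python
import re

# Each slot looks like "slotName[slot value]".
SLOT_PATTERN = re.compile(r"(\w+)\[([^\]]*)\]")


def parse_mr(mr):
    """Parse an E2E-style meaning representation into a slot -> value dictionary."""
    return {slot: value for slot, value in SLOT_PATTERN.findall(mr)}


mr = (
    "name[The Eagle], eatType[coffee shop], food[French], priceRange[moderate], "
    "customerRating[3/5], area[riverside], kidsFriendly[yes], near[Burger King]"
)
slots = parse_mr(mr)
print(slots["name"], "|", slots["near"])
```

A dictionary keeps slot lookups simple; if the order of slots matters for a model, the list returned by `SLOT_PATTERN.findall` can be used directly instead.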
Performance is evaluated using BLEU, NIST, METEOR, ROUGE-L, and CIDEr scores. The data from E2E Challenge 2017 can be downloaded here.
| Model | BLEU | NIST | METEOR | ROUGE-L | CIDEr | Paper / Source | Code |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Shen, Sheng, et al. (2019) | 68.60 | 8.73 | 45.25 | 70.82 | 2.37 | Pragmatically Informative Text Generation | Official |
| Elder, Henry, et al. (2019) | 67.38 | 8.7277 | 45.72 | 71.52 | 2.2995 | Designing a Symbolic Intermediate Representation for Neural Surface Realization | |
| Gehrmann, Sebastian, et al. (2018) | 66.2 | 8.60 | 45.7 | 70.4 | 2.34 | End-to-End Content and Plan Selection for Data-to-Text Generation | Official |
| Baseline | 65.93 | 8.61 | 44.83 | 68.50 | 2.23 | Baseline system provided during the challenge | Official |
 Gatt, Albert, and Emiel Krahmer. “Survey of the state of the art in natural language generation: core tasks, applications and evaluation.” Journal of Artificial Intelligence Research. Vol. 61. 2018.
 Puduppully, Ratish, Li Dong, and Mirella Lapata. “Data-to-text generation with content selection and planning.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. 2019.
 Moryossef, Amit, Yoav Goldberg, and Ido Dagan. “Step-by-Step: Separating Planning from Realization in Neural Data-to-Text Generation.” Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.
 Gehrmann, Sebastian, et al. “End-to-End Content and Plan Selection for Data-to-Text Generation.” Proceedings of the 11th International Conference on Natural Language Generation. 2018.
 Shen, Sheng, et al. “Pragmatically Informative Text Generation.” Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.
 Elder, Henry, et al. “Designing a Symbolic Intermediate Representation for Neural Surface Realization.” Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation. 2019.