
NLP-progress

Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

Data-to-Text Generation

Data-to-Text Generation (D2T NLG) is Natural Language Generation from structured input. Other NLG tasks such as Machine Translation or Question Answering (also referred to as Text-to-Text Generation, or T2T NLG) generate textual output from unstructured textual input. In D2T NLG, by contrast, the output is generated from input provided in a structured format such as tables, knowledge graphs, or JSON ^[1].

RotoWire

The dataset consists of articles summarizing NBA basketball games, paired with their corresponding box- and line-score tables. The summaries are professionally written, medium-length, and targeted at fantasy basketball fans. The writing is colloquial but structured, and targets an audience primarily interested in game statistics ^[2].

Performance is evaluated with two automated metrics: BLEU and a family of Extractive Evaluations (EE). EE comprises three submetrics, each evaluating a different aspect of the generated text:

Content Selection (CS): precision (P%) and recall (R%) of unique relations extracted from the generated text that are also extracted from the gold text. This measures how well the generated document matches the gold document in terms of selecting which records to generate.
Relation Generation (RG): precision (P%) and number (#) of unique relations extracted from the generated text that also appear in the structured input. This measures how well the system is able to generate text containing factual (i.e., correct) records.
Content Ordering (CO): normalized Damerau-Levenshtein Distance (DLD%) between the sequence of records extracted from the gold text and the sequence extracted from the generated text. This measures how well the system orders the records it chooses to discuss.
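The three submetrics above can be sketched on toy data as follows. This is a minimal illustration, not the official evaluation script: it assumes the records have already been extracted as (entity, value, type) tuples, whereas the real pipeline relies on a trained information-extraction model for that step. The team and player names, the function names, the use of the restricted (optimal string alignment) variant of Damerau-Levenshtein, and the choice to report CO as 1 minus the normalized distance (so that higher is better) are all assumptions made for this example.

```python
def content_selection(predicted, gold):
    """CS: precision and recall of unique predicted records against unique gold records."""
    predicted, gold = set(predicted), set(gold)
    overlap = len(predicted & gold)
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(gold) if gold else 0.0
    return precision, recall


def relation_generation(predicted, table_records):
    """RG: precision of unique predicted records against the input table, plus their count."""
    unique = set(predicted)
    supported = unique & set(table_records)
    precision = len(supported) / len(unique) if unique else 0.0
    return precision, len(supported)


def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance between two sequences."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]


def content_ordering(predicted_seq, gold_seq):
    """CO: 1 - normalized distance between record sequences (assumed convention: higher is better)."""
    longest = max(len(predicted_seq), len(gold_seq))
    if longest == 0:
        return 1.0
    return 1.0 - osa_distance(predicted_seq, gold_seq) / longest


# Hypothetical records in order of mention: (entity, value, record type).
gold = [("Team A", "110", "TEAM-PTS"), ("Player X", "30", "PLAYER-PTS"), ("Team B", "102", "TEAM-PTS")]
generated = [("Player X", "30", "PLAYER-PTS"), ("Team A", "110", "TEAM-PTS"), ("Team B", "99", "TEAM-PTS")]
table = gold + [("Player Y", "12", "PLAYER-REB")]

cs_precision, cs_recall = content_selection(generated, gold)
rg_precision, rg_count = relation_generation(generated, table)
co_score = content_ordering(generated, gold)
print(cs_precision, cs_recall, rg_precision, rg_count, co_score)
```

In this toy case the generated text swaps the first two records and states one wrong score, so two of the three records count for CS and RG, and the ordering score reflects one transposition plus one substitution.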

Model	BLEU	CS (P% & R%)	RG (P% & #)	CO (DLD%)	Paper / Source	Code
Rebuffel, Clément, et al. (2020)^[4]	17.50	39.47 & 51.64	89.46 & 21.17	18.90	A Hierarchical Model for Data-to-Text Generation	Official
Puduppully et al. (2019)^[3]	16.50	34.18 & 51.22	87.47 & 34.28	18.58	Data-to-text generation with content selection and planning	Official
Puduppully and Lapata (2021)^[10]	15.46	34.1 & 57.8	97.6 & 42.1	17.7	Data-to-text generation with macro planning	Official
Wiseman et al. (2017)^[2]	14.49	22.17 & 27.16	71.82 & 12.82	8.68	Challenges in Data-to-Document Generation	Official

WebNLG

The WebNLG challenge consists of mapping data to text. The training data consists of Data/Text pairs, where the data is a set of triples extracted from DBpedia and the text is a verbalisation of these triples. For example, given the three DBpedia triples shown in [a], the aim is to generate a text such as the one shown in [b]:

[a]. (John_E_Blaha birthDate 1942_08_26) (John_E_Blaha birthPlace San_Antonio) (John_E_Blaha occupation Fighter_pilot)
[b]. John E Blaha, born in San Antonio on 1942-08-26, worked as a fighter pilot.
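To make the input format concrete, the sketch below parses the triples from [a] into (subject, property, object) tuples and flattens them into a single string, which is one common way to feed such data to a sequence-to-sequence model. The function names and the `<S>`, `<P>`, `<O>` marker tokens are hypothetical choices for this example, not part of the WebNLG data format or of any system listed here.

```python
import re


def parse_triples(text):
    """Split '(subject property object)' groups into 3-tuples."""
    triples = []
    for group in re.findall(r"\(([^()]*)\)", text):
        subject, prop, obj = group.split(" ", 2)
        triples.append((subject, prop, obj))
    return triples


def linearize(triples):
    """Flatten triples into one string, tagging each field with a marker token."""
    return " ".join(f"<S> {s} <P> {p} <O> {o}" for s, p, o in triples)


raw = ("(John_E_Blaha birthDate 1942_08_26) "
       "(John_E_Blaha birthPlace San_Antonio) "
       "(John_E_Blaha occupation Fighter_pilot)")

triples = parse_triples(raw)
linearized = linearize(triples)
print(triples)
print(linearized)
```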

Performance is evaluated using BLEU, METEOR and TER scores. The data from WebNLG Challenge 2017 can be downloaded here.

Model	BLEU	METEOR	TER	Paper / Source	Code
Kale, Mihir. (2020) ^[9]	57.1	0.44		Text-to-Text Pre-Training for Data-to-Text Tasks
Moryossef et al. (2019) ^[5]	47.4	0.391	0.631	Step-by-Step: Separating Planning from Realization in Neural Data-to-Text Generation	Official
Baseline	33.24	0.235436	0.613080	Baseline system provided during the challenge	Official

P.S.: The WebNLG test set covers 15 categories in total, of which 10 (seen) categories are used for training while 5 (unseen) are not. The results reported here are those obtained on the overall test data, i.e., all 15 categories.

Meaning Representations

The dataset was first provided for the E2E Challenge in 2017. It is a crowd-sourced dataset of 50k instances in the restaurant domain. Each instance consists of a dialogue-act-based meaning representation (MR) and up to 5 references in natural language (NL). For example:

MR: name[The Eagle], eatType[coffee shop], food[French], priceRange[moderate], customerRating[3/5], area[riverside], kidsFriendly[yes], near[Burger King]
NL: “The three star coffee shop, The Eagle, gives families a mid-priced dining experience featuring a variety of wines and cheeses. Find The Eagle near Burger King.”
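An MR in this format is a flat list of `slot[value]` pairs, so it can be read into a dictionary with a single regular expression. The sketch below is only an illustration of the format shown above; the function name `parse_mr` is hypothetical, and it assumes that values never contain a closing square bracket.

```python
import re


def parse_mr(mr):
    """Parse 'slot[value], slot[value], ...' into a dict mapping slot names to values."""
    return dict(re.findall(r"(\w+)\[([^\]]*)\]", mr))


mr = ("name[The Eagle], eatType[coffee shop], food[French], priceRange[moderate], "
      "customerRating[3/5], area[riverside], kidsFriendly[yes], near[Burger King]")

slots = parse_mr(mr)
for slot, value in slots.items():
    print(f"{slot}: {value}")
```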

Performance is evaluated using BLEU, NIST, METEOR, ROUGE-L, and CIDEr scores. The data from E2E Challenge 2017 can be downloaded here.

Model	BLEU	NIST	METEOR	ROUGE-L	CIDEr	Paper / Source	Code
Shen, Sheng, et al. (2019) ^[7]	68.60	8.73	45.25	70.82	2.37	Pragmatically Informative Text Generation	Official
Elder, Henry, et al. (2019) ^[8]	67.38	8.7277	45.72	71.52	2.2995	Designing a Symbolic Intermediate Representation for Neural Surface Realization
Gehrmann, Sebastian, et al. (2018) ^[6]	66.2	8.60	45.7	70.4	2.34	End-to-End Content and Plan Selection for Data-to-Text Generation	Official
Baseline	65.93	8.61	44.83	68.50	2.23	Baseline system provided during the challenge	Official

References

[1] Gatt, Albert, and Emiel Krahmer. “Survey of the State of the Art in Natural Language Generation: Core Tasks, Applications and Evaluation.” Journal of Artificial Intelligence Research 61 (2018): 65–170.

[2] Wiseman, Sam, Stuart M. Shieber, and Alexander M. Rush. “Challenges in Data-to-Document Generation.” Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017.

[3] Puduppully, Ratish, Li Dong, and Mirella Lapata. “Data-to-text generation with content selection and planning.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. 2019.

[4] Rebuffel, Clément, et al. “A Hierarchical Model for Data-to-Text Generation.” European Conference on Information Retrieval. Springer, Cham, 2020.

[5] Moryossef, Amit, Yoav Goldberg, and Ido Dagan. “Step-by-Step: Separating Planning from Realization in Neural Data-to-Text Generation.” Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.

[6] Gehrmann, Sebastian, et al. “End-to-End Content and Plan Selection for Data-to-Text Generation.” Proceedings of the 11th International Conference on Natural Language Generation. 2018.

[7] Shen, Sheng, et al. “Pragmatically Informative Text Generation.” Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.

[8] Elder, Henry, et al. “Designing a Symbolic Intermediate Representation for Neural Surface Realization.” Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation. 2019.

[9] Kale, Mihir. “Text-to-Text Pre-Training for Data-to-Text Tasks.” arXiv preprint arXiv:2005.10433 (2020).

[10] Puduppully, Ratish, and Mirella Lapata. “Data-to-text generation with macro planning.” Transactions of the Association for Computational Linguistics 9 (2021): 510–527.
