Paraphrase Generation
Paraphrase generation is the task of generating an output sentence that preserves the meaning of the input sentence while varying its word choice and grammar. For example:
Input | Output |
---|---|
The need for investors to earn a commercial return may put upward pressure on prices | The need for profit is likely to push up prices |
PARANMT-50M
PARANMT-50M is a dataset for training paraphrastic sentence embeddings. It consists of more than 50 million English-English sentential paraphrase pairs.
Model | BLEU | Paper / Source | Code |
---|---|---|---|
Trigram (baseline) | 47.4 | Wieting and Gimpel, 2018 | Unavailable |
Unsupervised BART w/ Dynamic Blocking | 20.9 | Niu et al., 2020 | Unavailable |
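
The tables on this page report BLEU. As a rough illustration of what that number measures, here is a minimal sketch of a simplified, single-reference, sentence-level BLEU. It assumes whitespace tokenization, lowercasing, and no smoothing, and it is not the evaluation script used by the cited papers, which typically report corpus-level BLEU. The names `ngrams` and `sentence_bleu` are hypothetical helpers introduced for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified single-reference BLEU on whitespace tokens, scaled to 0-100."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: a candidate n-gram counts at most as often
        # as it appears in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if matches == 0 or total == 0:
            return 0.0  # assumption: no smoothing in this sketch
        log_precisions.append(math.log(matches / total))

    # Brevity penalty for candidates shorter than the reference.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled to 0-100.
    return 100 * brevity_penalty * math.exp(sum(log_precisions) / max_n)


# Hypothetical sentences, not drawn from any dataset on this page.
score = sentence_bleu("the cat sat on", "the cat sat on the mat")
```

Because the sketch does no smoothing, any candidate without a matching 4-gram scores zero, which is one reason sentence-level scores are usually smoothed or replaced by corpus-level BLEU in practice.
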
QQP-Pos
QQP-Pos is a paraphrase generation dataset with 400K source-target pairs. Each pair is labelled as negative if the two questions are not duplicates and positive otherwise.
Model | BLEU | Paper / Source | Code |
---|---|---|---|
Unsupervised BART w/ Dynamic Blocking | 26.76 | Niu et al., 2020 | Unavailable |
ParafraGPT-UC | 35.9 | Bui et al., 2020 | Code |
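
The positive/negative labelling described for QQP-Pos can be sketched as a simple filter. This is a minimal sketch under two assumptions: that, as the name suggests, only the positive (duplicate) pairs serve as source-target pairs for generation, and that each record carries two questions and a duplicate flag. The `QuestionPair` class, its field names, and the toy data are hypothetical and do not reflect the dataset's actual file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionPair:
    question1: str
    question2: str
    is_duplicate: bool  # True = positive (duplicates), False = negative


def positive_pairs(pairs):
    """Keep only duplicate pairs and return them as (source, target) tuples."""
    return [(p.question1, p.question2) for p in pairs if p.is_duplicate]


# Hypothetical toy records, not taken from the actual dataset.
toy_pairs = [
    QuestionPair(
        "How can I learn Python quickly?",
        "What is the fastest way to learn Python?",
        True,
    ),
    QuestionPair(
        "How can I learn Python quickly?",
        "What is the capital of France?",
        False,
    ),
]

training_pairs = positive_pairs(toy_pairs)
```

Here `training_pairs` holds only the first pair, since the second is labelled negative.
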