
NLP-progress

Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

Keyphrase Extraction and Generation

Keyphrase extraction is the NLP task of identifying key phrases in a document. It has a wide range of applications, such as information retrieval, question answering, and text summarization. Keyphrases come in two kinds. Some occur verbatim in the document and are termed present keyphrases in the literature. Others do not occur in the document but still serve as appropriate summaries or tags for it; these are termed absent keyphrases.

Traditional NLP research focused on extracting present keyphrases, whereas more recent deep learning approaches also consider absent keyphrases. Thus, Keyphrase Extraction (KPE) is usually framed as a "sequence labeling" problem, while Keyphrase Generation (KPG) is framed as a "sequence to sequence" generation problem. Another dominant approach treats both together, in an integrated way, as a single generation problem.
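To make the two framings concrete, here is a minimal Python sketch under simplifying assumptions: a naive lowercase whitespace tokenizer, exact token matching, and hypothetical helper names (`bio_tags`, `decode_bio`, `one2seq_target`). Extraction is cast as B/I/O tagging of present keyphrases; generation is cast as producing one target string in which all keyphrases are joined by a separator, present ones first in order of appearance and absent ones after. That ordering is one convention used in the literature, and the `" ; "` separator is an assumption, not a fixed standard.

```python
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def find_spans(tokens: List[str], phrase: List[str]) -> List[int]:
    """Start indices where `phrase` occurs as a contiguous run of `tokens`."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def bio_tags(tokens: List[str], keyphrases: List[str]) -> List[str]:
    """KPE as sequence labeling: tag each token B, I or O."""
    tags = ["O"] * len(tokens)
    for kp in keyphrases:
        phrase = tokenize(kp)
        if not phrase:
            continue
        for start in find_spans(tokens, phrase):
            span = range(start, start + len(phrase))
            # Leave an occurrence untagged if it overlaps an already tagged one.
            if all(tags[j] == "O" for j in span):
                tags[start] = "B"
                for j in span[1:]:
                    tags[j] = "I"
    return tags


def decode_bio(tokens: List[str], tags: List[str]) -> List[str]:
    """Turn a predicted B/I/O sequence back into keyphrase strings."""
    phrases: List[str] = []
    current: List[str] = []
    for tok, tag in zip(tokens, tags):
        if tag == "I" and current:
            current.append(tok)
            continue
        if current:
            phrases.append(" ".join(current))
        # A "B" opens a new phrase; "O" or a stray "I" closes any open one.
        current = [tok] if tag == "B" else []
    if current:
        phrases.append(" ".join(current))
    return phrases


def one2seq_target(tokens: List[str], keyphrases: List[str], sep: str = " ; ") -> str:
    """KPG as seq2seq: one target string, present keyphrases first (by first
    occurrence in the text), then absent ones in their given order."""
    present: List[Tuple[int, str]] = []
    absent: List[str] = []
    for kp in keyphrases:
        phrase = tokenize(kp)
        if not phrase:
            continue
        starts = find_spans(tokens, phrase)
        if starts:
            present.append((starts[0], kp))
        else:
            absent.append(kp)
    return sep.join([kp for _, kp in sorted(present)] + absent)


tokens = tokenize("We study neural keyphrase generation with a copy mechanism")
gold = ["copy mechanism", "keyphrase generation", "sequence to sequence"]
tags = bio_tags(tokens, gold)
print(tags)
print(decode_bio(tokens, tags))
print(one2seq_target(tokens, gold))
```

In this toy example, "keyphrase generation" and "copy mechanism" are present and get B/I tags, while "sequence to sequence" is absent: a tagger can never recover it, but it still appears at the end of the generation target.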

Two recent surveys summarize research on this topic:

  1. “A Survey on Recent Advances in Keyphrase Extraction from Pre-trained Language Models”. Song et al., 2023. EACL 2023.
  2. “From statistical methods to deep learning, automatic keyphrase prediction: A survey”. Xie et al., 2023. Information Processing and Management 60(4).

Standard Datasets and Evaluation Measures

There are several open datasets for this task. They generally consist of text instances, each paired with a list of assigned keyphrases. Keyphrases are either annotated manually or, as in some large training sets, extracted automatically from pre-tagged web content. Any given keyphrase may be present in or absent from the text itself.
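The present/absent distinction can be computed directly from each dataset instance. The sketch below assumes a hypothetical `KeyphraseInstance` record (one text plus its gold keyphrase list), a simple regex tokenizer, and exact token-sequence matching. Many evaluation scripts additionally stem tokens (for example with the Porter stemmer) before matching, so counts from this sketch may differ from published dataset statistics.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class KeyphraseInstance:
    """Hypothetical record: one text and its list of gold keyphrases."""
    text: str
    keyphrases: List[str] = field(default_factory=list)


def tokenize(s: str) -> List[str]:
    # Lowercase and keep runs of ASCII letters/digits; no stemming (an assumption).
    return re.findall(r"[a-z0-9]+", s.lower())


def contains(tokens: List[str], phrase: List[str]) -> bool:
    """True if `phrase` occurs as a contiguous token sequence in `tokens`."""
    n = len(phrase)
    return n > 0 and any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def split_present_absent(inst: KeyphraseInstance) -> Tuple[List[str], List[str]]:
    """Partition an instance's gold keyphrases into (present, absent)."""
    tokens = tokenize(inst.text)
    present: List[str] = []
    absent: List[str] = []
    for kp in inst.keyphrases:
        (present if contains(tokens, tokenize(kp)) else absent).append(kp)
    return present, absent


def absent_ratio(instances: List[KeyphraseInstance]) -> float:
    """Fraction of all gold keyphrases that do not occur in their own text."""
    total = n_absent = 0
    for inst in instances:
        _, absent = split_present_absent(inst)
        total += len(inst.keyphrases)
        n_absent += len(absent)
    return n_absent / total if total else 0.0


data = [
    KeyphraseInstance("Deep keyphrase generation with a copy mechanism.",
                      ["keyphrase generation", "copy mechanism", "sequence-to-sequence"]),
    KeyphraseInstance("A survey of keyphrase extraction methods.",
                      ["keyphrase extraction", "survey", "information retrieval"]),
]
print(split_present_absent(data[0]))
print(absent_ratio(data))
```

In this toy data, "sequence-to-sequence" and "information retrieval" are the only absent keyphrases, so `absent_ratio` returns 2/6, or one third.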

Commonly used Datasets

KP20K

This dataset was first described in Meng et al., 2017. It contains the titles, abstracts, and keyphrases of roughly 568,000 computer science articles, extracted automatically; the name refers to its 20,000-article test set. It can be accessed from the Hugging Face Hub.

Inspec

The dataset consists of 2,000 English scientific abstracts from the Inspec database, with keyphrases annotated by professional indexers. It is described in Hulth, 2003 and can be accessed from the Hugging Face Hub.

Krapivin

Krapivin consists of 2,000 full-text English scientific articles from the computer science domain, with keyphrases annotated by the authors and verified by the reviewers. The dataset is described in Krapivin et al., 2010 and can be accessed from the Hugging Face Hub.

NUS

NUS consists of about 200 full-text English scientific publications, with keyphrases annotated by the authors as well as by an independent set of annotators. The dataset is described in Nguyen and Kan, 2007 and can be accessed from the Hugging Face Hub.

SemEval

The SemEval dataset was originally used in SemEval 2017 Task 10 (ScienceIE: Extracting Keyphrases and Relations from Scientific Publications) and consists of 500 English open-access scientific publications from ScienceDirect. Keyphrases were annotated by student volunteers and then annotated a second time by an expert annotator. It can be accessed from the Hugging Face Hub.

Other Datasets

DUC

This dataset (Wan and Xiao, 2008) consists of around 300 English news articles with their keyphrases, and is hosted on the Hugging Face Hub.

KPTimes

KPTimes (Gallina et al., 2019) is a large dataset of 279,923 news articles from The New York Times and 10,000 articles from The Japan Times, with keyphrases curated by editors. It is hosted on the Hugging Face Hub.

OpenKP

OpenKP (Xiong et al., 2019) consists of approximately 150K web documents with manually annotated keyphrases, and is hosted on the Hugging Face Hub.

Evaluation Measures

Precision, recall, and F1 are computed by comparing each document's top-k model predictions with its ground-truth keyphrases, then macro-averaging over documents. F1@k with k = 5 or 10 is the most commonly reported, but variants such as F1@O and F1@M are also used: F1@O sets k to the number of gold keyphrases, and F1@M sets k to the number of predicted keyphrases. For absent keyphrases, some papers also report R@10 and R@50. The tables below report present and absent F1@5 on the five most commonly used datasets: KP20K, Inspec, Krapivin, NUS, and SemEval. Most recent research trains on KP20K and tests on KP20K, NUS, SemEval, Inspec, and Krapivin.
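These measures can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any paper's official evaluation script: matching is exact after lowercasing and whitespace normalization (no stemming), duplicates are removed before truncating to k, macro-averaging is taken as the mean of per-document F1, and the hypothetical `pad` flag switches between two precision conventions seen in the literature (dividing by k, or by the number of predictions actually kept). For the absent columns, predictions and gold keyphrases would typically be restricted to absent keyphrases first.

```python
from typing import List, Sequence, Tuple, Union


def normalize(kp: str) -> str:
    # Lowercase and collapse whitespace. Many published evaluation scripts
    # also stem tokens (e.g. Porter) before matching; omitted here.
    return " ".join(kp.lower().split())


def dedupe(kps: Sequence[str]) -> List[str]:
    """Normalize, then drop empty and duplicate keyphrases, keeping rank order."""
    seen = set()
    out: List[str] = []
    for kp in map(normalize, kps):
        if kp and kp not in seen:
            seen.add(kp)
            out.append(kp)
    return out


def prf_at_k(preds: Sequence[str], gold: Sequence[str], k: int,
             pad: bool = True) -> Tuple[float, float, float]:
    """Precision, recall and F1 of the top-k ranked predictions for one document.

    pad=True divides precision by k even if fewer than k predictions exist
    (missing slots count as wrong); pad=False divides by the number kept.
    """
    top = dedupe(preds)[:k]
    gold_set = set(dedupe(gold))
    hits = sum(1 for p in top if p in gold_set)
    denom = k if pad else len(top)
    precision = hits / denom if denom else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def macro_f1(all_preds: Sequence[Sequence[str]], all_gold: Sequence[Sequence[str]],
             k: Union[int, str], pad: bool = True) -> float:
    """Mean per-document F1@k, where k is an int (F1@5, F1@10), "O" or "M"."""
    scores: List[float] = []
    for preds, gold in zip(all_preds, all_gold):
        if not dedupe(gold):
            continue  # assumption: documents with no gold keyphrases are skipped
        if k == "O":
            kk = len(dedupe(gold))   # F1@O: k = number of gold keyphrases
        elif k == "M":
            kk = len(dedupe(preds))  # F1@M: k = number of predictions
        else:
            kk = int(k)
        scores.append(prf_at_k(preds, gold, kk, pad)[2])
    return sum(scores) / len(scores) if scores else 0.0


def recall_at_k(preds: Sequence[str], gold: Sequence[str], k: int) -> float:
    """R@k, e.g. R@10 or R@50 as sometimes reported for absent keyphrases."""
    return prf_at_k(preds, gold, k)[1]


preds = ["neural networks", "keyphrase generation", "copy mechanism", "attention"]
gold = ["Keyphrase Generation", "copy mechanism", "reinforcement learning"]
print(prf_at_k(preds, gold, 5))        # P, R, F1 at k = 5 for one document
print(macro_f1([preds], [gold], "O"))  # F1@O
print(macro_f1([preds], [gold], "M"))  # F1@M
```

For this single document, two of the four predictions are correct, so with padding P@5 = 2/5 and R@5 = 2/3, giving F1@5 = 0.5 (4/7 without padding). F1@O uses k = 3 and gives 2/3; F1@M uses k = 4 and gives 4/7. The gap between the padded and unpadded numbers is one reason scores from different papers are not always directly comparable.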

Here are a few notes on results:

KP20K

| Model | Present-F1@5 | Absent-F1@5 | Paper / Source | Code |
| --- | :---: | :---: | --- | --- |
| ChatGPT (Martinez et al., 2023) | 0.232 (!) | 0.044 (!) | ChatGPT vs State-of-the-Art Models: A Benchmarking Study in Keyphrase Generation Task | - |
| P-AKG (Wu et al., 2022) | 0.351 (!) | 0.032 (!) | Fast and Constrained Absent Keyphrase Generation by Prompt-Based Learning | - |
| WR-SetTrans (Xie et al., 2022) | 0.370 | 0.050 | WR-One2Set: Towards Well-Calibrated Keyphrase Generation | |
| Beam+KPD-A (Chowdhury et al., 2022) | 0.363 | 0.067 | KPDROP: Improving Absent Keyphrase Generation | |
| SetTrans (Ye et al., 2021) | 0.358 | 0.036 | One2Set: Generating Diverse Keyphrases as a Set | |
| UniKeyphrase (Wu et al., 2021) | 0.408 (!) | 0.047 (!) | UniKeyphrase: A Unified Extraction and Generation Framework for Keyphrase Prediction | |
| ExHiRD-h (Chen et al., 2020) | 0.311 | 0.016 | Exclusive Hierarchical Decoding for Deep Keyphrase Generation | |
| CorrRNN (Chen et al., 2018) | - | - | Keyphrase Generation with Correlation Constraints | |
| CopyRNN (Meng et al., 2017) | 0.333 | - | Deep Keyphrase Generation | |

SemEval

| Model | Present-F1@5 | Absent-F1@5 | Paper / Source | Code |
| --- | :---: | :---: | --- | --- |
| ChatGPT (Martinez et al., 2023) | - | - | ChatGPT vs State-of-the-Art Models: A Benchmarking Study in Keyphrase Generation Task | - |
| P-AKG (Wu et al., 2022) | 0.329 (!) | 0.028 (!) | Fast and Constrained Absent Keyphrase Generation by Prompt-Based Learning | - |
| WR-SetTrans (Xie et al., 2022) | 0.360 | 0.043 | WR-One2Set: Towards Well-Calibrated Keyphrase Generation | |
| Beam+KPD-A (Chowdhury et al., 2022) | 0.343 | 0.053 | KPDROP: Improving Absent Keyphrase Generation | |
| SetTrans (Ye et al., 2021) | 0.331 | 0.026 | One2Set: Generating Diverse Keyphrases as a Set | |
| UniKeyphrase (Wu et al., 2021) | 0.416 (!) | 0.030 (!) | UniKeyphrase: A Unified Extraction and Generation Framework for Keyphrase Prediction | |
| ExHiRD-h (Chen et al., 2020) | 0.284 | 0.017 | Exclusive Hierarchical Decoding for Deep Keyphrase Generation | |
| CorrRNN (Chen et al., 2018) | 0.320 | - | Keyphrase Generation with Correlation Constraints | |
| CopyRNN (Meng et al., 2017) | 0.293 | - | Deep Keyphrase Generation | |

Inspec

| Model | Present-F1@5 | Absent-F1@5 | Paper / Source | Code |
| --- | :---: | :---: | --- | --- |
| ChatGPT (Martinez et al., 2023) | 0.352 (!) | 0.049 (!) | ChatGPT vs State-of-the-Art Models: A Benchmarking Study in Keyphrase Generation Task | - |
| P-AKG (Wu et al., 2022) | 0.26 (!) | 0.017 (!) | Fast and Constrained Absent Keyphrase Generation by Prompt-Based Learning | - |
| WR-SetTrans (Xie et al., 2022) | 0.330 | 0.025 | WR-One2Set: Towards Well-Calibrated Keyphrase Generation | |
| Beam+KPD-A (Chowdhury et al., 2022) | 0.322 | 0.036 | KPDROP: Improving Absent Keyphrase Generation | |
| SetTrans (Ye et al., 2021) | 0.285 | 0.021 | One2Set: Generating Diverse Keyphrases as a Set | |
| UniKeyphrase (Wu et al., 2021) | 0.29 (!) | 0.029 (!) | UniKeyphrase: A Unified Extraction and Generation Framework for Keyphrase Prediction | |
| ExHiRD-h (Chen et al., 2020) | 0.253 | 0.011 | Exclusive Hierarchical Decoding for Deep Keyphrase Generation | |
| CorrRNN (Chen et al., 2018) | - | - | Keyphrase Generation with Correlation Constraints | |
| CopyRNN (Meng et al., 2017) | 0.278 | - | Deep Keyphrase Generation | |

Krapivin

| Model | Present-F1@5 | Absent-F1@5 | Paper / Source | Code |
| --- | :---: | :---: | --- | --- |
| ChatGPT (Martinez et al., 2023) | - | - | ChatGPT vs State-of-the-Art Models: A Benchmarking Study in Keyphrase Generation Task | - |
| P-AKG (Wu et al., 2022) | - | - | Fast and Constrained Absent Keyphrase Generation by Prompt-Based Learning | - |
| WR-SetTrans (Xie et al., 2022) | 0.360 | 0.057 | WR-One2Set: Towards Well-Calibrated Keyphrase Generation | |
| Beam+KPD-A (Chowdhury et al., 2022) | 0.323 | 0.078 | KPDROP: Improving Absent Keyphrase Generation | |
| SetTrans (Ye et al., 2021) | 0.326 | 0.047 | One2Set: Generating Diverse Keyphrases as a Set | |
| UniKeyphrase (Wu et al., 2021) | - | - | UniKeyphrase: A Unified Extraction and Generation Framework for Keyphrase Prediction | |
| ExHiRD-h (Chen et al., 2020) | 0.286 | 0.022 | Exclusive Hierarchical Decoding for Deep Keyphrase Generation | |
| CorrRNN (Chen et al., 2018) | 0.318 | - | Keyphrase Generation with Correlation Constraints | |
| CopyRNN (Meng et al., 2017) | 0.311 | - | Deep Keyphrase Generation | |

NUS

| Model | Present-F1@5 | Absent-F1@5 | Paper / Source | Code |
| --- | :---: | :---: | --- | --- |
| ChatGPT (Martinez et al., 2023) | - | - | ChatGPT vs State-of-the-Art Models: A Benchmarking Study in Keyphrase Generation Task | - |
| P-AKG (Wu et al., 2022) | 0.412 (!) | 0.036 (!) | Fast and Constrained Absent Keyphrase Generation by Prompt-Based Learning | - |
| WR-SetTrans (Xie et al., 2022) | 0.428 | 0.057 | WR-One2Set: Towards Well-Calibrated Keyphrase Generation | |
| Beam+KPD-A (Chowdhury et al., 2022) | 0.418 | 0.079 | KPDROP: Improving Absent Keyphrase Generation | |
| SetTrans (Ye et al., 2021) | 0.406 | 0.042 | One2Set: Generating Diverse Keyphrases as a Set | |
| UniKeyphrase (Wu et al., 2021) | 0.434 (!) | 0.037 (!) | UniKeyphrase: A Unified Extraction and Generation Framework for Keyphrase Prediction | |
| ExHiRD-h (Chen et al., 2020) | - | - | Exclusive Hierarchical Decoding for Deep Keyphrase Generation | |
| CorrRNN (Chen et al., 2018) | 0.358 | - | Keyphrase Generation with Correlation Constraints | |
| CopyRNN (Meng et al., 2017) | 0.334 | - | Deep Keyphrase Generation | |
