
NLP-progress

Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

Relation Prediction

Task

Relation Prediction is the task of recognizing a named relation between two named semantic entities. The common test setup is to hide one entity from the relation triplet, asking the system to recover it based on the other entity and the relation type.

For example, given the triple <Roman Jakobson, born-in-city, ?>, the system is required to replace the question mark with Moscow.

Relation Prediction datasets are typically extracted from two types of resources:

Knowledge Bases: KBs such as Freebase contain hundreds or thousands of relation types over millions of entities, capturing world knowledge obtained automatically or semi-automatically from various resources. These relations include born-in, nationality, is-in (for geographical entities), part-of (for organizations, among others), and more.
Semantic Graphs: SGs such as WordNet are often manually-curated resources of semantic concepts, restricted to more “linguistic” relations compared to free real-world knowledge. The most common semantic relation is hypernym, also known as the is-a relation (example: <cat, hypernym, feline>).

Evaluation

Evaluation in Relation Prediction hinges on the ranked list of candidates that the system returns for each test instance. The metrics below are derived from the positions of the correct candidate(s) in that list.

A common step performed before evaluation on a given list is filtering, where the list is cleaned of entities whose corresponding triples already exist in the knowledge graph, so that other known correct answers do not push down the rank of the entity being tested. Unless specified otherwise, results here are from filtered lists.
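The filtering step can be sketched as follows. This is a minimal illustration for tail prediction only, not the code used by any of the listed systems; the function name `filter_candidates`, the relation `lived-in-city`, and the toy triples are assumptions made for the example.

```python
def filter_candidates(ranked, head, relation, target, known_triples):
    """Drop candidates that form an already-known triple, keeping the target.

    `known_triples` is assumed to hold every (head, relation, tail) triple
    in the knowledge graph (train, validation and test).
    """
    return [
        c for c in ranked
        if c == target or (head, relation, c) not in known_triples
    ]


# Toy knowledge graph (hypothetical relation names and facts).
known = {
    ("Roman Jakobson", "born-in-city", "Moscow"),
    ("Roman Jakobson", "lived-in-city", "Prague"),
    ("Roman Jakobson", "lived-in-city", "Brno"),
}

# Test instance <Roman Jakobson, lived-in-city, ?> with target "Brno".
ranked = ["Prague", "Moscow", "Brno"]
filtered = filter_candidates(
    ranked, "Roman Jakobson", "lived-in-city", "Brno", known
)
print(filtered)
```

In this toy case "Prague" is removed because it is another known correct answer, so the target moves from rank 3 to rank 2 in the filtered list.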

Metrics

Mean Reciprocal Rank (MRR):

The mean of all reciprocal ranks for the true candidates over the test set (1/rank).
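A minimal sketch of the MRR computation, assuming each test instance has exactly one true entity and that a true entity missing from the list contributes 0 (that convention, the function name, and the toy lists are assumptions for this example):

```python
def mean_reciprocal_rank(ranked_lists, targets):
    """Average of 1/rank of the true entity over all test instances."""
    total = 0.0
    for ranked, target in zip(ranked_lists, targets):
        if target in ranked:
            # list.index is 0-based, ranks are 1-based
            total += 1.0 / (ranked.index(target) + 1)
    return total / len(targets)


ranked_lists = [
    ["Moscow", "Prague", "Brno"],          # target at rank 1
    ["Prague", "Moscow", "Brno"],          # target at rank 2
    ["Brno", "Prague", "Paris", "Moscow"], # target at rank 4
]
targets = ["Moscow", "Moscow", "Moscow"]

mrr = mean_reciprocal_rank(ranked_lists, targets)
print(round(mrr, 4))
```

Here the reciprocal ranks are 1, 1/2 and 1/4, so the MRR is their mean, 1.75 / 3.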

Hits at k (H@k):

The rate of correct entities appearing in the top k entries for each instance list. This number may exceed 1.00 if the average k-truncated list contains more than one true entity.
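The following sketch shows one way to compute H@k that is consistent with the description above, counting every true entity found in the top k entries and averaging over instances. The function name and the toy data are assumptions for illustration.

```python
def hits_at_k(ranked_lists, true_sets, k):
    """Average number of true entities among the top-k candidates."""
    total = 0
    for ranked, true_entities in zip(ranked_lists, true_sets):
        total += sum(1 for c in ranked[:k] if c in true_entities)
    return total / len(ranked_lists)


ranked_lists = [
    ["Prague", "Brno", "Moscow"],  # two true entities in the top 2
    ["Moscow", "Paris", "Prague"], # one true entity in the top 2
]
true_sets = [{"Prague", "Brno"}, {"Moscow"}]

h_at_2 = hits_at_k(ranked_lists, true_sets, k=2)
print(h_at_2)
```

The toy value is (2 + 1) / 2 = 1.5, which illustrates how the metric can exceed 1.00 when an instance has more than one true entity. With a single true entity per instance the value stays at or below 1.00.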

Datasets

Freebase-15K-237 (FB15K-237)

The FB15K dataset was introduced in Bordes et al., 2013. It is a subset of Freebase which contains 14,951 entities with 1,345 different relations. Toutanova et al. first found that this dataset suffers from major test leakage through inverse relations: a large number of test triples can be obtained simply by inverting triples in the training set. To create a dataset without this property, they introduced FB15k-237, a subset of FB15k where inverse relations are removed.
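To make the leakage concrete, the sketch below flags test triples whose entity pair appears reversed in the training set. It is a crude illustrative check, not the procedure Toutanova et al. used to build FB15k-237; the function name and the relation names are hypothetical.

```python
def inverse_leaks(train, test):
    """Return test triples (h, r, t) for which some (t, r', h) is in train."""
    train_pairs = {(h, t) for h, _, t in train}
    return [(h, r, t) for h, r, t in test if (t, h) in train_pairs]


# Toy data with a hypothetical pair of inverse relations.
train = [
    ("Roman Jakobson", "born-in-city", "Moscow"),
    ("Moscow", "is-in", "Russia"),
]
test = [
    ("Moscow", "city-of-birth-of", "Roman Jakobson"),  # inverse of a train triple
    ("Roman Jakobson", "nationality", "Russia"),       # no reversed pair in train
]

leaks = inverse_leaks(train, test)
print(leaks)
```

A model can answer the first test triple just by memorizing the training triple and flipping it, which is the kind of shortcut the leakage makes possible.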

WordNet-18-RR (WN18RR)

The WN18 dataset was also introduced in Bordes et al., 2013. It included the full 18 relations scraped from WordNet for roughly 41,000 synsets. Similar to FB15K, this dataset was found by Dettmers et al. (2018) to suffer from test leakage.

As a way to overcome this problem, Dettmers et al. (2018) introduced the WN18RR dataset, derived from WN18, which features only 11 relations, no pair of which is reciprocal. It still includes four internally-symmetric relations such as verb_group, which allows a rule-based system to reach 35 on all three metrics.
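A rule that exploits symmetric relations can be sketched as below: for a query on a symmetric relation, predict every entity seen on the other side of that relation in training. This is an assumed, simplified illustration and not necessarily the rule-based system whose score is quoted above; the function name and the synset identifiers are hypothetical.

```python
from collections import defaultdict


def build_symmetric_rule(train, symmetric_relations):
    """Build a predictor that answers (head, r, ?) from training triples (?, r, head)."""
    index = defaultdict(list)
    for h, r, t in train:
        if r in symmetric_relations:
            index[(t, r)].append(h)

    def predict(head, relation):
        # Returns an empty list when the rule does not apply.
        return list(index.get((head, relation), []))

    return predict


# Toy training data with made-up synset identifiers.
train = [
    ("run.v.01", "verb_group", "run.v.02"),
    ("cat.n.01", "hypernym", "feline.n.01"),
]
predict = build_symmetric_rule(train, symmetric_relations={"verb_group"})

print(predict("run.v.02", "verb_group"))
print(predict("feline.n.01", "hypernym"))
```

The rule only fires for relations declared symmetric, so the hypernym query gets no prediction even though the reversed pair occurs in training.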

Experimental Results

WN18RR

The test set is composed of triplets, each used to create two test instances, one for each entity to be predicted. Since each instance is associated with a single true entity, the maximum value for all metrics is 1.00 (the table below reports values multiplied by 100).
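As a small illustration of this setup, the sketch below turns one triple into its two test instances, hiding the tail in one and the head in the other. The dictionary layout and the function name are assumptions made for the example.

```python
def make_test_instances(triple):
    """Create one instance per hidden entity; None marks the slot to predict."""
    head, relation, tail = triple
    return [
        {"query": (head, relation, None), "target": tail},
        {"query": (None, relation, tail), "target": head},
    ]


instances = make_test_instances(("cat", "hypernym", "feline"))
for instance in instances:
    print(instance)
```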

Model	H@10	H@1	MRR	Paper / Source	Code
Max-Margin Markov Graph Models (Pinter & Eisenstein, 2018)	59.02	45.37	49.83	Predicting Semantic Relations using Global Graph Properties	Official
Concepts of Nearest Neighbors (Ferré, 2020)	51.9	44.4	46.9	Application of Concepts of Neighbors to Knowledge Graph Completion	Official
KBAT (Deepak et al., 2019)	58.1	36.1	44	Learning Attention Based Embeddings for Relation Prediction	Official
TransE (reimplementation by Pinter & Eisenstein, 2018)	55.55	42.26	46.59	Translating Embeddings for Modeling Multi-relational Data.	OpenKE
ConvKB (Nguyen et al., 2018)	52.50	-	24.80	A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network	Official
ConvE (v6; Dettmers et al., 2018)	52.00	40.00	43.00	Convolutional 2D Knowledge Graph Embeddings	Official
ComplEx (Trouillon et al., 2016)	51.00	41.00	44.00	Complex Embeddings for Simple Link Prediction	Official
DistMult (reimplementation by Dettmers et al., 2018)	49.00	40.00	43.00	Embedding Entities and Relations for Learning and Inference in Knowledge Bases.	Link

FB15K-237

Model	H@10	H@1	MRR	Paper / Source	Code
KBAT (Deepak et al., 2019)	62.6	46	51.8	Learning Attention Based Embeddings for Relation Prediction	Official
Concepts of Nearest Neighbors (Ferré, 2020)	44.6	22.2	29.6	Application of Concepts of Neighbors to Knowledge Graph Completion	Official
TransE (reimplementation by Han et al., 2018)	47.09	19.87	29.04	Translating Embeddings for Modeling Multi-relational Data.	OpenKE
TransH (reimplementation by Han et al., 2018)	41.32	5.79	17.66	Knowledge Graph Embedding by Translating on Hyperplanes.	OpenKE
TransR (reimplementation by Han et al., 2018)	40.67	16.35	24.44	Learning Entity and Relation Embeddings for Knowledge Graph Completion.	OpenKE
TransD (reimplementation by Han et al., 2018)	46.05	14.83	25.27	Knowledge Graph Embedding via Dynamic Mapping Matrix.	OpenKE
ConvKB (Nguyen et al., 2018)	51.70	-	39.60	A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network	Official
ConvE (v6; Dettmers et al., 2018)	50.10	23.70	32.50	Convolutional 2D Knowledge Graph Embeddings	Official
ComplEx (reimplementation by Dettmers et al., 2018)	42.80	15.80	24.70	Complex Embeddings for Simple Link Prediction	Official
DistMult (reimplementation by Dettmers et al., 2018)	41.90	15.50	24.10	Embedding Entities and Relations for Learning and Inference in Knowledge Bases.	Link

Resources

OpenKE is an open toolkit for relational learning which provides a standard training and testing framework. Currently, the implemented models in OpenKE include TransE, TransH, TransR, TransD, RESCAL, DistMult, ComplEx and HolE.

KRLPapers is a must-read paper list for relational learning.

datasets-knowledge-embedding is a collection of common datasets used in knowledge embedding.
