Publications

Uncovering the Semantics of Wikipedia Categories

Abstract: The Wikipedia category graph serves as the taxonomic backbone for large-scale knowledge graphs like YAGO or Probase, and has been used extensively for tasks like entity disambiguation or semantic similarity estimation. Wikipedia's categories are a rich source of taxonomic as well as non-taxonomic information. The category German science fiction writers, for example, encodes the type of its resources (Writer), as well as their nationality (German) and genre (Science Fiction). Several approaches in the literature make use of fractions of this encoded information without exploiting its full potential. In this paper, we introduce an approach for the discovery of category axioms that uses information from the category network, category instances, and their lexicalisations. With DBpedia as background knowledge, we discover 703k axioms covering 502k of Wikipedia's categories and populate the DBpedia knowledge graph with additional 4.4 M relation assertions and 3.3 M type assertions at more than 87% and 90% precision, respectively.

Please cite this paper if you use the Cat2Ax approach or the CaLiGraph dataset.

Code

The complete code for the extraction of CaLiGraph is available on GitHub.

Data

The complete dataset is hosted on Zenodo. All files are bzip2-compressed (.bz2) and in N-Triples format. The data is published under the Creative Commons Attribution 4.0 International Public License.
The complete dataset is also available on the DBpedia Databus. Additionally, a version of DBpedia enriched with CaLiGraph is provided as a collection.
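
Because the files listed below carry a .bz2 extension and contain N-Triples, they can be streamed line by line without unpacking them first. The following is a minimal sketch using only the Python standard library; the function names `parse_ntriples_line` and `iter_triples` are illustrative and not part of the CaLiGraph code base, and the regular expression is a simplification that splits terms but does not validate the full N-Triples grammar.

```python
import bz2
import re
from typing import Iterator, Optional, Tuple

Triple = Tuple[str, str, str]

# Subject and predicate never contain whitespace in N-Triples;
# the object is everything up to the terminating " ." of the line.
_TRIPLE_RE = re.compile(r"^(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$")


def parse_ntriples_line(line: str) -> Optional[Triple]:
    """Split one N-Triples line into (subject, predicate, object).

    Returns None for blank lines, comments, and lines that do not
    look like a triple. Terms are returned in their raw serialised
    form, e.g. '<http://...>' or '"label"@en'.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = _TRIPLE_RE.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


def iter_triples(path: str) -> Iterator[Triple]:
    """Stream triples from a bzip2-compressed N-Triples file."""
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            triple = parse_ntriples_line(line)
            if triple is not None:
                yield triple
```

For example, `for s, p, o in iter_triples("caligraph-instances_labels.nt.bz2"): ...` would iterate over the label triples, assuming the file has been downloaded to the working directory. For anything beyond simple scanning, a full RDF library is the safer choice.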

caligraph-metadata.nt.bz2

Metadata about the dataset, described using the VoID vocabulary.

caligraph-ontology.nt.bz2

Class definitions, property definitions, restrictions, and labels of the CaLiGraph ontology.

caligraph-ontology_dbpedia-mapping.nt.bz2

Mapping of classes and properties to the DBpedia ontology.

caligraph-ontology_provenance.nt.bz2

Provenance information about classes (i.e. which Wikipedia category or list page has been used to create this class).

caligraph-instances_types.nt.bz2

Definition of instances and (non-transitive) types.

caligraph-instances_transitive-types.nt.bz2

Transitive types for instances (can also be induced by a reasoner).

caligraph-instances_labels.nt.bz2

Labels for instances.

caligraph-instances_relations.nt.bz2

Relations between instances derived from the class restrictions of the ontology (can also be induced by a reasoner).

caligraph-instances_dbpedia-mapping.nt.bz2

Mapping of instances to respective DBpedia instances.

caligraph-instances_provenance.nt.bz2

Provenance information about instances (e.g. if the instance has been extracted from a Wikipedia list page).

dbpedia_caligraph-instances.nt.bz2

Additional instances of CaLiGraph that are not in DBpedia.
This file is not part of CaLiGraph but should rather be used as an extension to DBpedia.

dbpedia_caligraph-types.nt.bz2

Additional types of CaLiGraph that are not in DBpedia.
This file is not part of CaLiGraph but should rather be used as an extension to DBpedia.

dbpedia_caligraph-relations.nt.bz2

Additional relations of CaLiGraph that are not in DBpedia.
This file is not part of CaLiGraph but should rather be used as an extension to DBpedia.
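
The transitive-types and relations files above are both described as something a reasoner could also induce. The sketch below illustrates what that derivation could look like on plain Python dictionaries. It rests on two assumptions that the text does not confirm: that the class hierarchy is given as subclass edges, and that each class restriction can be read as a fixed (property, value) pair that every instance of the class inherits. All names (`superclass_closure`, `transitive_types`, `derive_relations`, and the example identifiers) are hypothetical; this is not the CaLiGraph extraction code.

```python
from typing import Dict, Set, Tuple

Restriction = Tuple[str, str]        # (property, value), assumed shape
Relation = Tuple[str, str, str]      # (instance, property, value)


def superclass_closure(cls: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return all direct and indirect superclasses of cls."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:   # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def transitive_types(
    direct_types: Dict[str, Set[str]],
    parents: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Extend each instance's direct types with all their superclasses."""
    result: Dict[str, Set[str]] = {}
    for instance, types in direct_types.items():
        closure = set(types)
        for cls in types:
            closure |= superclass_closure(cls, parents)
        result[instance] = closure
    return result


def derive_relations(
    instance_types: Dict[str, Set[str]],
    class_restrictions: Dict[str, Set[Restriction]],
) -> Set[Relation]:
    """Give every instance the (property, value) pairs of its classes."""
    relations: Set[Relation] = set()
    for instance, types in instance_types.items():
        for cls in types:
            for prop, value in class_restrictions.get(cls, ()):
                relations.add((instance, prop, value))
    return relations
```

Passing the output of `transitive_types` into `derive_relations` lets an instance inherit restrictions from its superclasses as well, which is what a reasoner would be expected to do. The result of `transitive_types` here includes the direct types; whether the published transitive-types file repeats them is not stated above.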