CaLiGraph is a knowledge graph that uses the encyclopaedic structure of Wikipedia to derive a fine-grained type ontology and to extract a large number of novel entities. It also provides restrictions for many of the derived types, which further refine the entities they contain.

CaLiGraph Ontology

The type ontology of CaLiGraph is primarily based on Wikipedia categories and list pages. Both are used in Wikipedia to group similar entities, e.g. the category Songs or the list page List of signature songs. We use the existing sub-category relationships in Wikipedia to derive a connected taxonomy. Some categories have to be removed, however, because they group by topic rather than by type (e.g. the category London), and some sub-category relationships have to be removed because they do not represent taxonomic relationships (e.g. Song awards being a sub-category of Songs). Subsequently, we use DBpedia as an upper-level taxonomy and extend its rather general types (e.g. Artist) with more specific types (e.g. Cartoonist, Women cartoonist, Spanish women cartoonist, and so on).
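The pruning step described above can be illustrated with a minimal sketch. It is not the CaLiGraph implementation: the function name, the input format, and the categories Rock songs and Songs about London are assumptions made for illustration, and the sketch takes the set of topical categories and non-taxonomic edges as given instead of showing how they are detected.

```python
from collections import defaultdict


def prune_category_graph(edges, topical_categories, non_taxonomic_edges):
    """Keep only (parent, child) edges that look taxonomic.

    edges: iterable of (parent, child) category pairs.
    topical_categories: categories that group by topic, not by type.
    non_taxonomic_edges: (parent, child) pairs that are not subtype relations.
    All three inputs are hypothetical; how they are obtained is out of scope.
    """
    taxonomy = defaultdict(set)
    for parent, child in edges:
        # Drop every edge that touches a topical category such as London.
        if parent in topical_categories or child in topical_categories:
            continue
        # Drop edges that are not "is a kind of" relations.
        if (parent, child) in non_taxonomic_edges:
            continue
        taxonomy[parent].add(child)
    return dict(taxonomy)


# Illustrative input; only "Songs", "Song awards" and "London" come from the text.
edges = [
    ("Songs", "Song awards"),
    ("Songs", "Rock songs"),
    ("London", "Songs about London"),
]
pruned = prune_category_graph(
    edges,
    topical_categories={"London"},
    non_taxonomic_edges={("Songs", "Song awards")},
)
print(pruned)
```

In this toy input, only the edge from Songs to Rock songs survives, because the other two edges either touch a topical category or are flagged as non-taxonomic.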

For more details, refer to our paper "Entity Extraction from Wikipedia List Pages".

CaLiGraph Restrictions

The categories from which we derive our types are taken from Wikipedia and, due to Wikipedia's strict editing guidelines, are named very consistently. With our Cat2Ax approach, we exploit this fact to generate restrictions for a large number of Wikipedia categories. A restriction defines a property value that holds for all entities of a certain type. For example, the type Spanish cartoonist has the restriction nationality=Spain. Restrictions are explicitly browsable in CaLiGraph: the page of a restriction lists all types that carry it, such as Spanish printer or Spanish navigator. Every entity that has a type with such a restriction automatically has the respective fact (here, the Spanish nationality) attached as well.
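How a restriction on a type turns into facts on its entities can be sketched as follows. This is only an illustration under assumptions: the dictionaries, the function name `materialize_facts`, and the entity "Example Person" are hypothetical and do not reflect how CaLiGraph stores or computes its data.

```python
def materialize_facts(entity_types, type_restrictions):
    """Attach the facts implied by type restrictions to each entity.

    entity_types: maps an entity to the types it has.
    type_restrictions: maps a type to (property, value) pairs.
    Returns a set of (entity, property, value) triples.
    """
    facts = set()
    for entity, types in entity_types.items():
        for type_name in types:
            # Types without restrictions contribute no facts.
            for prop, value in type_restrictions.get(type_name, ()):
                facts.add((entity, prop, value))
    return facts


# The restriction is the example from the text; the entity is made up.
type_restrictions = {"Spanish cartoonist": [("nationality", "Spain")]}
entity_types = {"Example Person": ["Spanish cartoonist", "Cartoonist"]}

facts = materialize_facts(entity_types, type_restrictions)
print(facts)
```

The type Cartoonist has no restriction in this toy data, so the only fact produced is the nationality inherited from Spanish cartoonist.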

For more details, refer to our paper "Uncovering the Semantics of Wikipedia Categories".

CaLiGraph Entities

CaLiGraph discovers novel entities in tables and enumerations all over Wikipedia. Entity discovery focuses on subject entities, i.e. entities that are the main subjects of the respective table or enumeration (e.g. in the List of signature songs, the subject entities are the songs and not the artists). We look for such subject entities not only on list pages but on arbitrary pages (for example, we would discover new songs of Gilby Clarke in the Discography section of his Wikipedia article). The current version 2 of CaLiGraph contains all of these discovered entities but does not yet properly disambiguate them; this is a non-trivial problem that we are currently investigating, and a disambiguated version of CaLiGraph will be available with release 3. We do, however, make sure that the URIs of discovered entities do not collide by prefixing them with the page they were discovered in (if the link would otherwise be ambiguous).
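The collision-avoidance idea for URIs can be sketched roughly as below. The namespace, the `__` separator, the function name, and the song label are all assumptions for illustration; the text does not specify the actual URI scheme, and whether a label is ambiguous is simply passed in as a flag here.

```python
from urllib.parse import quote

# Placeholder namespace, not the real CaLiGraph namespace.
BASE = "http://example.org/resource/"


def entity_uri(label, source_page, is_ambiguous):
    """Build a URI for a discovered entity.

    If the label alone would be ambiguous, prefix it with the page the
    entity was discovered in so that two entities never share a URI.
    """
    name = label.replace(" ", "_")
    if is_ambiguous:
        # Hypothetical separator between page name and entity label.
        name = f"{source_page.replace(' ', '_')}__{name}"
    return BASE + quote(name)


plain = entity_uri("Some Song", "Gilby Clarke", is_ambiguous=False)
prefixed = entity_uri("Some Song", "Gilby Clarke", is_ambiguous=True)
print(plain)
print(prefixed)
```

Prefixing only in the ambiguous case keeps URIs short for entities whose label is already unique, which matches the conditional wording in the text.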

For more details, refer to our paper "Information Extraction from Co-Occurring Similar Entities".