LOD:ontology statistics

From STLab

Setup

This page lists useful statistics about ontologies (vocabularies, schemas, etc.). These statistics can be used to compare ontologies and to decide which corpora to use for which purpose.

Sketch for discovering patterns in LOD

Proposals

If you have proposals for additional statistics or vocabularies to include, please state them here.

Statistics / measures explained

  • #triples: number of triples
  • #classes: number of classes
  • #props: number of unique properties
  • #mappings: alignments with other LOD corpora. Should include the total number of mappings and, preferably, the number of mappings to specific corpora.
  • last update: date of the last update
  • update freq: how often the ontology is updated (e.g. number of updates per year)
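
As a rough illustration of how the counting measures above could be computed from an RDF dump, here is a minimal Python sketch. It is only a sketch under several assumptions that do not come from this page: the dump is in N-Triples, "classes" are counted as the distinct IRIs used as objects of rdf:type, and "mappings" are counted as owl:sameAs links whose target sits on a different host than the subject. The function name ontology_stats and the sample data are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlparse

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Simplified N-Triples pattern (not a full parser): IRI subject, IRI
# predicate, then either an IRI object or any other term (literal, bnode).
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|(.+?))\s*\.\s*$')


def ontology_stats(lines):
    """Count triples, classes, properties and cross-dataset mappings."""
    triples = set()
    for line in lines:
        match = TRIPLE.match(line.strip())
        if match is None:
            continue  # comments, blank lines, blank-node subjects are skipped
        subj, pred, iri_obj, other_obj = match.groups()
        is_iri = iri_obj is not None
        triples.add((subj, pred, iri_obj if is_iri else other_obj, is_iri))

    # Assumption: a "class" is any IRI that appears as the object of rdf:type.
    classes = {o for _, p, o, is_iri in triples if p == RDF_TYPE and is_iri}
    props = {p for _, p, _, _ in triples}
    # Assumption: a "mapping" is an owl:sameAs link to a different host.
    mappings = Counter(
        urlparse(o).netloc
        for s, p, o, is_iri in triples
        if p == OWL_SAME_AS and is_iri and urlparse(o).netloc != urlparse(s).netloc
    )
    return {
        "triples": len(triples),
        "classes": len(classes),
        "props": len(props),
        "mappings": sum(mappings.values()),
        "mappings_by_target": dict(mappings),
    }


# Invented sample data, for illustration only.
SAMPLE = [
    '<http://example.org/p/1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .',
    '<http://example.org/p/1> <http://xmlns.com/foaf/0.1/name> "Alice" .',
    '<http://example.org/p/1> <http://www.w3.org/2002/07/owl#sameAs> <http://dbpedia.org/resource/Alice> .',
    '<http://example.org/p/2> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .',
    '<http://example.org/p/2> <http://www.w3.org/2002/07/owl#sameAs> <http://other.example.net/x> .',
]

if __name__ == "__main__":
    print(ontology_stats(SAMPLE))
```

Grouping mappings by target host gives both the total and the per-corpus figures asked for under #mappings. For real dumps a proper RDF parser would be preferable to the regular expression used here.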

Statistics

World knowledge

ontology #triples #classes #props #mappings last update update freq
dbpedia 1 billion 3.4 million (of which 1.5 million are classified in a consistent Ontology)  ? 4,887,000 2010-4-28 3 times per year
freebase N/A constantly
[opencyc] 56,780 73,132
  • 41039 to DBpedia
  • 20886 to Umbel
  • 11207 to WordNet 2.0
2008

People

ontology #triples #classes #props #mappings last update update freq
foaf
NY-Times people 103,496 4,979 19 6,094
  • 3027 to dbpedia
  • 3066 to freebase
 ?  ?

Places

ontology #triples #classes #props #mappings last update update freq

Lexical

ontology #triples #classes #props #mappings last update update freq
WordNet

Misc ontologies

ontology domain #triples #classes #props #mappings last update update freq
LinkedMDB movies 2,366,572 334,930 110 (linkedMDB properties or in ClioPatria)
  • freebase: 502,988
    • 197,271 are performances, eg [1] and [2]
    • 4,529 are music contributions, eg [3] and [4]
    • 45259 are film cuts, eg [5] and [6]
    • writers, eg [7] and [8]
    • etc ...
  • dbpedia: 91,062
  • yago: 60,708
  • imdb: 18,404 (using false rottentomatoes links)
2008  ?

Knowledge pattern implementations in LOD

The Music ontology (model)

Partly developed by the BBC, it is widely used, in part or in full, by a large number of music- and recording-related Linked Data resources. Specifications are available at [9], RDF code at [10] and [11].

Some elementary knowledge pattern classes can be identified as follows (mo is the prefix for http://purl.org/ontology/mo/):

  • Roles of objects
    • As is most common, these are hardwired in object properties, e.g. mo:interpreter and its subproperties, and also mo:producer, mo:publisher, etc.
    • Some are available as OWL classes, e.g. mo:Listener equivalentTo mo:hasListened some mo:Performance
    • Other role relationships are embedded in event participation models (see below)
  • Event and Participation
    • Specialises the Event Ontology.
    • Participation in Event subclasses is inherently time-indexed and carries a role, e.g. mo:Performance mo:performer foaf:Person. See usage in DBTune for details.
  • About
    • dc:subject and foaf:isPrimaryTopicOf are available from the respective imported ontologies, but not specialised.
  • Aggregation
    • Reuses event:sub_event (formerly event:hasSubEvent) in order to define hierarchies by partitioning complex events by time, space and involvement (see DBTune usage).

DBTune

DBTune embeds several datasets on musical resources, mostly authored by BBC UK, that use the Music Ontology as a reference schema.

John Peel Sessions

Homepage: http://dbtune.org/bbc/peel/ (query interface + RDF dumps. SPARQL endpoint doesn't seem to be working.)

A DBTune RDF catalog of Radio One live performances for the John Peel Show. As this dataset is not concerned with describing musical artists per se, but only their radio performances, we do not expect topic patterns to be instantiated. We do, however, expect a strong emphasis on events as described by the Music Ontology.

  • Linked Data Alignments
    • Only with DBpedia. For RDF dumps, alignments are stored in separate modules that do not import the main dataset (!).
    • DBpedia alignments are owl:sameAs relations that hold for both foaf:Person and mo:MusicArtist individuals (e.g. if a MusicArtist is a single person such as Elton John). Although this is generally bad practice, it is not too dangerous, as long as DBpedia only contains triples that relate to the real person and we do not represent the musical artist as an ensemble. The risk would be to end up with a foaf:Person that is composed of other mo:MusicArtists or foaf:Persons.
  • Event Patterns:
    • mo:MusicArtist mo:performed mo:Performance

(e.g. http://dbtune.org/bbc/peel/artist/280 ("Bratmobile", a http://purl.org/ontology/mo/MusicArtist) http://purl.org/ontology/mo/performed http://dbtune.org/bbc/peel/session/483 ("Performance 483 in Maida Vale 4", a http://purl.org/ontology/mo/Performance))

  • Role patterns:
    • for single musicians, we have participations with role:

http://dbtune.org/bbc/peel/perf_ins/2955cda0c6177f466b6e049686d225ab (Allison Wolfe's performance on vocals in the above show, a http://purl.org/ontology/mo/Performance) mo:performer http://dbtune.org/bbc/peel/artist/2955cda0c6177f466b6e049686d225ab (foaf:name="Allison Wolfe", a foaf:Person) mo:instrument = "Vocals" (note: the usage of this object property is ambiguous, e.g. uses untyped literals for Elton John's performances)

  • Topic patterns:
    • No usage of the dc:subject and foaf:isPrimaryTopicOf properties imported by the music ontology
  • Aggregation patterns
    • Achieved in Performances by means of hierarchies

(e.g. http://dbtune.org/bbc/peel/session/483 (Bratmobile Peel Session 483) event:hasSubEvent http://dbtune.org/bbc/peel/perf_ins/2955cda0c6177f466b6e049686d225ab (Allison Wolfe's vocal performance) mo:performer event:hasSubEvent http://dbtune.org/bbc/peel/perf_work/8323 (performance of "Make me Miss America" at that show) event:usesWork http://dbtune.org/bbc/peel/work/8323 (song "Make me Miss America", serql:directType mo:MusicalWork) )
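
The event, role and aggregation patterns listed above can be explored together by walking the sub-event hierarchy of a session and collecting the role-bearing participations found along the way. The following Python sketch does this over a handful of hand-copied triples. The peel: prefix is shorthand introduced here for http://dbtune.org/bbc/peel/, the helper names are hypothetical, and the reading of the aggregation example (both the vocal performance and the performance of the work being direct sub-events of session 483) is an assumption.

```python
from collections import defaultdict

PERF_INS = "peel:perf_ins/2955cda0c6177f466b6e049686d225ab"
ARTIST = "peel:artist/2955cda0c6177f466b6e049686d225ab"

# Triples copied from the examples above; "peel:" abbreviates
# http://dbtune.org/bbc/peel/ (a shorthand introduced for this sketch).
TRIPLES = [
    ("peel:artist/280", "mo:performed", "peel:session/483"),
    ("peel:session/483", "event:hasSubEvent", PERF_INS),
    (PERF_INS, "mo:performer", ARTIST),
    (PERF_INS, "mo:instrument", "Vocals"),
    # Assumption: the work performance is also a direct sub-event of the session.
    ("peel:session/483", "event:hasSubEvent", "peel:perf_work/8323"),
    ("peel:perf_work/8323", "event:usesWork", "peel:work/8323"),
]


def build_index(triples):
    """Index triples as subject -> predicate -> list of objects."""
    index = defaultdict(lambda: defaultdict(list))
    for subj, pred, obj in triples:
        index[subj][pred].append(obj)
    return index


def sub_events(index, event, seen=None):
    """Return every event reachable via event:hasSubEvent, depth-first."""
    seen = set() if seen is None else seen
    found = []
    for child in index[event]["event:hasSubEvent"]:
        if child not in seen:  # guard against cycles in the data
            seen.add(child)
            found.append(child)
            found.extend(sub_events(index, child, seen))
    return found


def participations(index, event):
    """List (event, performer, instrument) for an event and its sub-events."""
    rows = []
    for ev in [event] + sub_events(index, event):
        instruments = index[ev]["mo:instrument"]
        for performer in index[ev]["mo:performer"]:
            rows.append((ev, performer, instruments[0] if instruments else None))
    return rows


if __name__ == "__main__":
    idx = build_index(TRIPLES)
    for row in participations(idx, "peel:session/483"):
        print(row)
```

Keeping the traversal separate from the role lookup means the same sub_events helper could be reused with event:sub_event, the newer property name mentioned for the Music Ontology, by changing a single string.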

Jamendo

Homepage: http://dbtune.org/jamendo/ (query interface + RDF dumps). The SPARQL endpoint at http://dbtune.org/jamendo/sparql/ seems broken; use http://dbtune.org/jamendo/store/user/query instead.

A DBTune RDF port of independent musical artists and their releases. It can be useful if we wish to exploit either geographical aggregation relations or the implicit semantics of genre-related tags.

  • Linked Data Alignments
    • with MusicBrainz via owl:sameAs (e.g. from mo:MusicArtist). This is generally safe. However, many link targets are resolved into Zitgist, so they mostly work as hyperlinks.
    • with Geonames via foaf:based_near. This should have no side effects.
  • Event Patterns:
    • none. Album recording events are not modelled.
  • Role patterns:
    • none. Artists are never represented as ensembles, and the authorship relation with releases is expressed via foaf:made.
  • Topic patterns:
    • Record tags are ported to RDF via the tag:taggedWithTag property from the Tag ontology at [12]. Tag values usually denote very specific genres (entities in Jamendo).
  • Aggregation patterns
    • Only implicit meronomy inherited from Geonames via foaf:based_near links.

MusicBrainz (plain)

Homepage: http://musicbrainz.org (with search interface). This is NOT the DBTune representation. Dataset access point: http://wiki.musicbrainz.org/RDF (no SPARQL endpoint found)

An open-content and release-centered musical knowledge base. It mainly uses internal vocabularies plus Dublin Core and Amazon (http://www.amazon.com/gp/aws/landing.html) for release info.

  • Linked Data Alignments
    • none found. MusicBrainz is self-contained
  • Event Patterns:
    • Album releases, by anonymous instantiation of mm:ReleaseDate (with dc:date and mm:country values)
  • Role patterns:
    • Authorship-related roles are expressed via ar:Producer and ar:Composer properties.
    • Membership is time-indexed via ar:MemberofBand nodes (if subject is of mm:artistType tmm:TypeGroup) and ar:SupportingMusician and ar:InstrumentalSupportingMusician nodes. Note that artists can be of mm:artistType tmm:TypeGroup and mm:artistType mm:TypePerson
    • Instruments used are plain attribute lists wrapped into ar:attributeList nodes.
  • Topic patterns:
    • None found
  • Aggregation patterns
    • Mainly wraps RDF aggregations (e.g. rdf:Bag) for artist-related lists such as album tracklists, album releases in countries, etc.

Last.fm (RDFize)

Homepage: http://lastfm.rdfize.com (faceted search interface). This is NOT the DBTune representation. No SPARQL endpoint found.

A data source providing an event-centered view of Last.fm data. Vocabularies used include the Music Ontology, the Event Ontology by Yves Raimond, FOAF, Dublin Core, W3C Geo and vCard.

  • Linked Data Alignments
    • TBD
  • Event Patterns:
    • Lots of interesting stuff. TBD
  • Role patterns:
    • TBD
  • Topic patterns:
    • TBD
  • Aggregation patterns
    • TBD

YAGO

endpoint | cross-dataset references | equivalence relations | roles | event and participation | "about" relations | aggregation hierarchies

[13] (no SPARQL)
  • {Wikipedia English page URL} :describes
  •  :hasImdb {Entry code on IMDB}
  • means (weak equivalence)
  • [implicit] hasWonPrize :
    • usage of the same property for identifying the prize and its object
    • NOT contextualized in the actual prize assignment event!
    •  :Roberto_Benigni :hasWonPrize :Life_Is_Beautiful
    •  :Roberto_Benigni :hasWonPrize :Academy_Award_for_Best_Actor
  •  :show subclassOf :social_event
  •  :social_event subclassOf :event
  •  :event subclassOf :psychologicalFeature
  •  :Ghosts_of_Mars rdf:type [all of the above!]
  •  :Italy :establishedOnDate :1861-03-17
No geographical meronomy!
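
The subclassOf chain noted above means that an entity typed with the most specific class can be expected to carry every ancestor class as an rdf:type as well. A minimal Python sketch of that closure follows. It assumes :show is the most specific type of interest and uses only the three subclassOf statements listed above; the helper names are hypothetical.

```python
# subclassOf statements copied from the notes above.
SUBCLASS_OF = {
    ":show": [":social_event"],
    ":social_event": [":event"],
    ":event": [":psychologicalFeature"],
}


def superclasses(cls, subclass_of):
    """Return all ancestors of cls, nearest first, following subclassOf."""
    found = []
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, []):
            if parent not in found:  # avoids duplicates and cycles
                found.append(parent)
                stack.append(parent)
    return found


def inferred_types(direct_types, subclass_of):
    """Expand directly asserted types with all of their ancestors."""
    types = []
    for direct in direct_types:
        for cls in [direct] + superclasses(direct, subclass_of):
            if cls not in types:
                types.append(cls)
    return types


if __name__ == "__main__":
    # Assumption: :show is the most specific type asserted for the entity.
    print(inferred_types([":show"], SUBCLASS_OF))
```

Comparing this closure with the rdf:type values actually asserted for an entity is one way to check whether a dataset materialises its class hierarchy, as appears to be the case for :Ghosts_of_Mars.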

Rough notes

  • Freebase has separate individuals for movies and respective soundtracks, but does not appear to be linking them with RDF.
  • Freebase also has separate individuals for real and fictional characters (may come in useful for avoiding inconsistencies, but be wary of equivalences!).