WikipediaOntology

From STLab


Automatic typing of DBpedia entities

Tipalo is a tool for automatically typing the entities that Wikipedia pages refer to. Its algorithm identifies the most appropriate type for an entity starting from the entity's definition, which is extracted from the abstract of its Wikipedia page. The type is identified by means of a set of graph-pattern-based heuristics and is aligned to two lists of top-level concepts: WordNet supersenses and a subset of DOLCE Ultra Lite classes. The algorithm has been tuned against a gold standard built online by a group of selected users, and evaluated by means of a user study. Tipalo builds on top of FRED, a natural language processing tool that transforms natural language into RDF/OWL. Informally, Tipalo extracts an entity's definition, passes it to FRED, and infers the entity's type from the graph patterns that can be recognized in FRED's output.

The tables below describe, respectively, (i) the graph patterns that Tipalo can recognize together with their associated inferences, and (ii) the frequency of the graph patterns detected in the analyzed sample of Wikipedia entities.

Graph patterns

We have defined 11 graph patterns and implemented a set of heuristics based on them for assigning types to an entity. The patterns are listed below in priority order, which is the order of the rows in the table. Exactly one heuristic is executed: the one associated with the first graph pattern, in priority order, that is satisfied. Assuming that we want to assign types to an entity e:

ID | instance/class | graph pattern | inferred axiom
gp1 | instance | e owl:sameAs x && x domain:aliasOf y && y owl:sameAs z && z rdf:type C | e rdf:type C
gp2 | instance | e rdf:type x && x owl:sameAs y && y domain:aliasOf z && w owl:sameAs z && w rdf:type C | e rdf:type C
gp3 | instance | e owl:sameAs x && x [r] y && y rdf:type z | e rdf:type z
gp4 | instance | e owl:sameAs x && x rdf:type C | e rdf:type C
gp5 | instance | e dul:associatedWith x && x rdf:type C | e rdf:type C
gp6 | instance | (e owl:sameAs x && x [anyP] y && y rdf:type C) || (e [anyP] x && x rdf:type C) | e rdf:type C
gp7 | class | x rdf:type e && x owl:sameAs y && y [r] z && z rdf:type C | e rdfs:subClassOf C
gp8 | class | x rdf:type e && x owl:sameAs y && y rdf:type C | e rdfs:subClassOf C
gp9 | class | x rdf:type e && e dul:associatedWith y && y rdf:type C | e rdfs:subClassOf C
gp10 | class | (x rdf:type e && x owl:sameAs y && y [anyP] z && z rdf:type C) || (x rdf:type e && y [anyP] x && y rdf:type C) | e rdfs:subClassOf C

Legend

  • [r] ∈ R = {wt:speciesOf, wt:nameOf, wt:kindOf, wt:varietyOf, wt:typeOf, wt:qtyOf, wt:genreOf, wt:seriesOf}
  • [anyP] ∈ {*} - R
  • _:bn -> blank node
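
The first-match policy described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Tipalo's actual implementation: the graph is modeled as a plain set of (subject, predicate, object) string triples, only gp3 and gp4 are covered, and the `ex:` names as well as the functions `objects`, `gp3`, `gp4`, and `assign_type` are hypothetical.

```python
from typing import Optional

Triple = tuple[str, str, str]

# The set R from the legend above.
R = {"wt:speciesOf", "wt:nameOf", "wt:kindOf", "wt:varietyOf",
     "wt:typeOf", "wt:qtyOf", "wt:genreOf", "wt:seriesOf"}


def objects(graph: set[Triple], s: str, p: str) -> list[str]:
    """Return every o such that (s, p, o) is in the graph, sorted for determinism."""
    return sorted(o for (s2, p2, o) in graph if s2 == s and p2 == p)


def gp3(graph: set[Triple], e: str) -> Optional[Triple]:
    """e owl:sameAs x && x [r] y && y rdf:type z  =>  e rdf:type z"""
    for x in objects(graph, e, "owl:sameAs"):
        for r in sorted(R):
            for y in objects(graph, x, r):
                for z in objects(graph, y, "rdf:type"):
                    return (e, "rdf:type", z)
    return None


def gp4(graph: set[Triple], e: str) -> Optional[Triple]:
    """e owl:sameAs x && x rdf:type C  =>  e rdf:type C"""
    for x in objects(graph, e, "owl:sameAs"):
        for c in objects(graph, x, "rdf:type"):
            return (e, "rdf:type", c)
    return None


# Heuristics in priority order (only the two patterns sketched here).
HEURISTICS = [("gp3", gp3), ("gp4", gp4)]


def assign_type(graph: set[Triple], e: str) -> Optional[tuple[str, Triple]]:
    """Run only the heuristic of the first pattern that matches."""
    for name, heuristic in HEURISTICS:
        axiom = heuristic(graph, e)
        if axiom is not None:
            return name, axiom
    return None


# Hypothetical graph in which both gp3 and gp4 match; gp3 has priority.
graph_a = {
    ("ex:e", "owl:sameAs", "ex:x"),
    ("ex:x", "rdf:type", "ex:Kind"),
    ("ex:x", "wt:kindOf", "ex:y"),
    ("ex:y", "rdf:type", "ex:Instrument"),
}
result_a = assign_type(graph_a, "ex:e")
# result_a == ("gp3", ("ex:e", "rdf:type", "ex:Instrument"))
```

Keeping the heuristics in an ordered list makes the priority explicit: adding or reordering a pattern only changes that list, not the matching functions.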

Two additional graph patterns and their associated heuristics are listed below: gp3a is executed after gp3 and before gp4, and gp7a is executed after gp7 and before gp8. Introducing gp3a and gp7a modifies gp6 and gp10 as follows:

ID | instance/class | graph pattern | inferred axiom
gp3a | instance | x rdf:type e && x owl:sameAs y && y [p] z && z rdf:type C | e rdf:type _:bn && _:bn rdf:type owl:Restriction && _:bn owl:onProperty [p] && _:bn owl:someValuesFrom C
gp6 | instance | e [anyP] x && x rdf:type C | e rdf:type C
gp7a | class | x rdf:type e && x owl:sameAs y && y [p] z && z rdf:type C | e rdfs:subClassOf _:bn && _:bn rdf:type owl:Restriction && _:bn owl:onProperty [p] && _:bn owl:someValuesFrom C
gp10 | class | x rdf:type e && y [anyP] x && y rdf:type C | e rdfs:subClassOf C

Legend

  • [p] ∈ P = {wt:partOf, wt:portionOf, wt:segmentOf, wt:componentOf, wt:sectionOf, wt:divisionOf, wt:subdivisionOf, wt:constituentOf, wt:pieceOf}
  • [anyP] ∈ {*} - (R ∪ P)
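
The axioms inferred by gp3a and gp7a are existential restrictions anchored on a blank node. The following Python sketch shows the triples such an axiom consists of. It is only an illustration: the function `restriction_axiom`, its parameters, and the `ex:` names are hypothetical and are not taken from Tipalo.

```python
Triple = tuple[str, str, str]


def restriction_axiom(e: str, prop: str, c: str,
                      link: str = "rdfs:subClassOf",
                      bnode: str = "_:bn") -> list[Triple]:
    """Build the triples stating that e is linked to the restriction
    'prop some c'. Use link="rdf:type" for gp3a (instances) and
    link="rdfs:subClassOf" for gp7a (classes)."""
    return [
        (e, link, bnode),
        (bnode, "rdf:type", "owl:Restriction"),
        (bnode, "owl:onProperty", prop),
        (bnode, "owl:someValuesFrom", c),
    ]


# Hypothetical class-level example in the style of gp7a.
axiom = restriction_axiom("ex:Mouthpiece", "wt:partOf", "ex:Instrument")
for triple in axiom:
    print(" ".join(triple))
```

The blank node label is passed in explicitly so that the output is deterministic; a real implementation would need a fresh blank node for each restriction.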

Statistics on detected graph patterns

We have evaluated our algorithm on a sample of 627 Wikipedia entities. In the original data, 424 of them (67.62%) are typed with a YAGO class, 97 (15.47%) are typed with a DBPO type, and 189 (30%) have no type at all. The frequency of the graph patterns detected on this sample is reported in the table below.

Graph pattern | Frequency (%)
gp1 | 0
gp2 | 0.15
gp3 | 3.98
gp3a | 0.31
gp4 | 79.34
gp5 | 0
gp6 | 0
gp7 | 1.11
gp7a | 0.15
gp8 | 11.46
gp9 | 0
gp10 | 3.5
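
The figures above can be cross-checked with a few lines of Python. The snippet only recomputes numbers reported in this section; the variable names are illustrative.

```python
SAMPLE_SIZE = 627

# Counts reported for the original typing of the sample.
counts = {"YAGO class": 424, "DBPO type": 97, "no type": 189}
shares = {k: round(100 * v / SAMPLE_SIZE, 2) for k, v in counts.items()}

# The counts add up to more than the sample size, so the groups overlap.
counts_total = sum(counts.values())

# Frequencies (%) from the table above.
frequencies = {
    "gp1": 0, "gp2": 0.15, "gp3": 3.98, "gp3a": 0.31, "gp4": 79.34,
    "gp5": 0, "gp6": 0, "gp7": 1.11, "gp7a": 0.15, "gp8": 11.46,
    "gp9": 0, "gp10": 3.5,
}
frequency_total = round(sum(frequencies.values()), 2)

print(shares)
print(counts_total, frequency_total)
```

The recomputed shares match the reported 67.62% and 15.47%, and the share of untyped entities comes out at 30.14%, which the text rounds to 30%. The pattern frequencies add up to 100%.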

Tools and resources for evaluation

Gold standard-based evaluation

We have built a gold standard of typed Wikipedia entities for tuning and evaluating our algorithm. The gold standard has been built collaboratively with a web-based application that manages argumentation so that users can reach agreement. The tool is available online, and the gold standard is available in CSV form.

You can watch a demonstration video of how the tool works.

Given a DBpedia entity together with its definition extracted from Wikipedia, e.g.:

  • Wind instrument: A wind instrument is a musical instrument that contains some type of resonator, in which a column of air is set into vibration by the player blowing into a mouthpiece set at the end of the resonator.

users were asked (i) to enter in a text field the type of that entity as it could be understood from the definition, and (ii) to select from two lists the most appropriate types for the entity. The following two tables show the two lists of types available to users (WordNet supersenses and foundational ontology classes, respectively):

WordNet supersense | Gloss | Examples
Act | Nouns denoting intentional acts or actions | Crying, predation, visit, shopping, etc.
Animal | Nouns denoting animals | Dog, cat, snake, etc.
Body part | Nouns denoting body parts | Gingiva, jaw, tissue, arm, etc.
Characteristic | Nouns denoting attributes of people and objects (except feelings and cognitive capabilities) | Character, edibility, flexibility, alkalinity, identity, narcissistic personality, etc.
Cognitive objects | Nouns denoting cognitive processes, content, and goals | Mind, common sense, super-ego, know-how, urge, reason, logorrhea, incentive, etc.
Communication entity | Nouns denoting communicative processes and contents | Message, dissemination, disagreement, prayer, film, press, language, etc.
Economic entity | Nouns denoting possession and transfer of possession | Credit card, loan, insurance, debit, funds, etc.
Feeling | Nouns denoting feelings and emotions | Passion, apathy, desire, nostalgia, etc.
Food | Nouns denoting foods and drinks | Cheese, milk, crab, mango, onion, etc.
Group or organization | Nouns denoting groupings of people or objects | Array, kingdom, biological group, genotype, community, people, company, etc.
Location | Nouns denoting spatial position | Yucatan, Costa Rica, latitude, etc.
Natural Event | Nouns denoting natural events, processes, and phenomena | Avalanche, flash, storm, superconductivity, crystallization, absorption, aging, alluvion, etc.
Object | Nouns denoting man-made as well as natural objects | Accommodation, acoustic guitar, nail polish, raincoat, nebula, quark, stalagmite, universe, etc.
Person | Nouns denoting people | Cezanne, Cassandra, biologist, foreigner, etc.
Plant | Nouns denoting plants | Betula, blueberry, willow, etc.
Quantity | Nouns denoting quantities and units of measure | Nautical mile, British pound, quotient, etc.
Relation | Nouns denoting relations between people or things or ideas | Social relation, legal relation, trigonometric function, causality, motherhood, etc.
Shape | Nouns denoting two and three dimensional shapes | Cartesian plane, angle, circle, curve, etc.
State | Nouns denoting stable states of affairs | Ornamentation, health, friendship, air pollution, etc.
Substance | Nouns denoting substances | Asphalt, chemical, fuel, incense, lipid, etc.
Time | Nouns denoting time and temporal relations | Julian calendar, autumnal equinox, 1960s, Ramadan, etc.


Ontology class | Label | Gloss | Examples
dul:Abstract | Abstract | Anything that cannot be located in space-time. | Vectors, sets, fractals, equations, etc.
d0:Activity | Action, activity or task | Any action or task planned or executed by an agent intentionally causing and participating in it. | Swimming, shopping, knowledge sharing, etc.
dul:Amount | Amount, quantity | Any quantity, independently from how it is measured, computed, etc. | kelvin, angstrom, quarter mile, silver dollar, deadline, etc.
d0:Characteristic | Quality, feature, attribute | An aspect or quality of a thing. | radial symmetry, poker face, alkalinity, attractiveness, darkness, etc.
dul:Collection | Collection or social group | A container or group of things (or agents) that share one or more common properties. | coin collection, checkout line, public library, Milky Way, etc.
d0:CognitiveEntity | Cognitive entity | Attitudes, cognitive abilities, ideologies, psychological phenomena, mind, etc. | discernment, homophobia, precognition, etc.
dul:Description | Conceptualization, description, context | A descriptive context that creates a relational view on a set of data or observations. | hypothesis, danger, Avogadro's law, string theory, utopia, etc.
d0:Event | Any natural event | Any natural event, independently of its possible causes. | avalanche, earthquake, brainwave, bonfire, etc.
dul:Goal | Goal, aim, achievement | The description of a situation that is desired by an agent. | destination, purpose, intention
dul:InformationEntity | Information entity, creative work, knowledge | A piece of information, be it concretely realized or not: linguistic expressions, works of art, knowledge objects. | data, string, message, novel, song, etc.
d0:Location | Place or space | A location in a very generic sense, e.g. geo-political entities or physical objects that are inherently located. | Oslo, Australia, Inner Mongolia, resort area, intergalactic space, tundra, tunnel, etc.
dul:Organism | Organism, animal, plant | A physical object with biological characteristics, typically able to self-reproduce. | Japanese banana, fox, fungus, etc.
dul:Organization | Organization | An internally structured, conventionally created social entity such as enterprises, bands, political parties, etc. | mathematics department, headquarters, yakuza, The Beatles, etc.
dul:Person | Person | Persons in commonsense intuition. | John Doe, Aristotle, Armenian, house guest, etc.
dul:Personification | Fictional or imaginary agent | A social entity with agentive features, invented or conceived through a cultural process. | holy grail, deus ex machina, God, magic wands, etc.
dul:PhysicalObject | Physical object | Any object that has a proper space region and an associated mass: natural bodies, artifacts, substances. | Kleenex, beard, building, etc.
dul:Process | Natural or social process | Any natural process, independently of its possible causes. | absorption, acidification, chemical process, condensation, etc.
dul:Relation | Social, logical, or other relations | Any social, logical, or quantitative relation (usually quite elementary). | part, identity, homonymy, causality, reciprocality, etc.
dul:Role | Role | A concept that classifies some entity: social positions, roles, statuses. | soldier, eminence, legal status, etc.
dul:Situation | Case, condition, circumstance, state, situation | A unified view on a set of entities, e.g. physical or social facts or conditions, configurations, etc. | breaking point, circulatory failure, star topology, inflammation, alienation, etc.
d0:System | System | Physical, social, political systems. | viticulture, non-linear system, democracy, water system, etc.
dul:TimeInterval | Time interval | A time span. | January, Friday, 2011, Modern era, etc.
d0:Topic | Area of knowledge | Any area, discipline, subject of knowledge. | algebra, avionics, ballet, theology, engineering, etc.

ISWC reviewers can access the tool with username iswc2012.reviewer and password reviewer. We suggest that reviewers use an IP tunneling service such as Tor if they are concerned about preserving anonymity.

The gold standard tool manages argumentation among users in order to support discussion and help them reach agreement. So far, 10 users with expertise in ontology design have participated in this task and have reached agreement on 100 entities (of which 29 have no type in DBpedia).

Evaluation: user study

The tool used for assessing the quality of Tipalo's results is available online and can be accessed by reviewers with the account given above. The data resulting from the evaluation can be downloaded in CSV format.

A demonstration video is available online as well. The evaluation has been conducted on a sample of 627 DBpedia resources: 67.62% of them have a YAGO type in DBpedia, 15.47% have a DBPO type, and 30% have no type.

This tool guides users through three evaluation steps:

  • validating the correctness of the types assigned to an entity;
  • validating the soundness of the induced taxonomy of types for an entity;
  • validating the correctness of the meaning of individual types (i.e. word sense disambiguation task).

RDF datasets download

The following resources have been produced
