WikipediaOntology
From STLab
Automatic typing of DBpedia entities
Tipalo is a tool for automatically typing the entities that Wikipedia pages refer to. The algorithm it implements identifies the most appropriate type for an entity starting from its definition, as provided by Wikipedia. An entity's definition is extracted from its Wikipedia abstract, and its type is identified by means of a set of graph-pattern-based heuristics and aligned to two lists of top-level concepts: Wordnet supersenses and a subset of DOLCE Ultra Lite classes. The algorithm has been tuned against a gold standard that was built online by a group of selected users, and evaluated by means of a user study. Tipalo builds on top of FRED, a natural language processing tool that transforms natural language into RDF/OWL. Informally, Tipalo extracts an entity's definition, passes it to FRED, and infers the entity's type from the graph pattern that can be recognized in FRED's output.
The tables below describe, respectively: (i) the graph patterns that Tipalo can recognize and their associated inferences, and (ii) statistics on the frequency of the graph patterns detected in the analyzed sample of Wikipedia entities.
Graph patterns
We have defined 11 graph patterns and have implemented a set of heuristics based on them for assigning types to an entity. The graph patterns are described below in priority order. The execution policy is first-match: only one heuristic is executed, namely the one associated with the first graph pattern that is satisfied according to the priority order, which follows the row order of the table. Assuming that we want to assign types to an entity e:
ID | instance/class | graph pattern | inferred axiom |
---|---|---|---|
gp1 | instance | e owl:sameAs x && x domain:aliasOf y && y owl:sameAs z && z rdf:type C | e rdf:type C |
gp2 | instance | e rdf:type x && x owl:sameAs y && y domain:aliasOf z && w owl:sameAs z && w rdf:type C | e rdf:type C |
gp3 | instance | e owl:sameAs x && x [r] y && y rdf:type z | e rdf:type z |
gp4 | instance | e owl:sameAs x && x rdf:type C | e rdf:type C |
gp5 | instance | e dul:associatedWith x && x rdf:type C | e rdf:type C |
gp6 | instance | (e owl:sameAs x && x [anyP] y && y rdf:type C) || (e [anyP] x && x rdf:type C) | e rdf:type C |
gp7 | class | x rdf:type e && x owl:sameAs y && y [r] z && z rdf:type C | e rdfs:subClassOf C |
gp8 | class | x rdf:type e && x owl:sameAs y && y rdf:type C | e rdfs:subClassOf C |
gp9 | class | x rdf:type e && e dul:associatedWith y && y rdf:type C | e rdfs:subClassOf C |
gp10 | class | (x rdf:type e && x owl:sameAs y && y [anyP] z && z rdf:type C) || (x rdf:type e && y [anyP] x && y rdf:type C) | e rdfs:subClassOf C |
Legend
- [r] ∈ R = {wt:speciesOf, wt:nameOf, wt:kindOf, wt:varietyOf, wt:typeOf, wt:qtyOf, wt:genreOf, wt:seriesOf}
- [anyP] ∈ {*} - R
- _:bn -> blank node
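The first-match policy described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Tipalo's actual implementation: the graph is modelled as a plain set of (subject, predicate, object) string triples, only gp3 and gp4 are implemented, and the `ex:` nodes, the helper `objects`, and the function `assign_types` are hypothetical names introduced for the example.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# The [r] properties listed in the legend.
R = {"wt:speciesOf", "wt:nameOf", "wt:kindOf", "wt:varietyOf",
     "wt:typeOf", "wt:qtyOf", "wt:genreOf", "wt:seriesOf"}


def objects(graph: Set[Triple], s: str, p: str) -> List[str]:
    """Return all o such that (s, p, o) is in the graph, in sorted order."""
    return sorted(o for (s2, p2, o) in graph if s2 == s and p2 == p)


def gp3(graph: Set[Triple], e: str) -> List[Triple]:
    """e owl:sameAs x && x [r] y && y rdf:type z  =>  e rdf:type z"""
    return [(e, "rdf:type", z)
            for x in objects(graph, e, "owl:sameAs")
            for r in sorted(R)
            for y in objects(graph, x, r)
            for z in objects(graph, y, "rdf:type")]


def gp4(graph: Set[Triple], e: str) -> List[Triple]:
    """e owl:sameAs x && x rdf:type C  =>  e rdf:type C"""
    return [(e, "rdf:type", c)
            for x in objects(graph, e, "owl:sameAs")
            for c in objects(graph, x, "rdf:type")]


# Heuristics in priority order (only two of the patterns, for brevity).
PRIORITY = [("gp3", gp3), ("gp4", gp4)]


def assign_types(graph: Set[Triple], e: str) -> Tuple[Optional[str], List[Triple]]:
    """Run only the first heuristic whose graph pattern is satisfied."""
    for name, heuristic in PRIORITY:
        axioms = heuristic(graph, e)
        if axioms:
            return name, axioms
    return None, []


# Hypothetical toy graph in which both gp3 and gp4 would match.
both = {
    ("ex:e", "owl:sameAs", "ex:x"),
    ("ex:x", "rdf:type", "ex:Kind"),
    ("ex:x", "wt:kindOf", "ex:y"),
    ("ex:y", "rdf:type", "ex:Instrument"),
}
# Hypothetical toy graph in which only gp4 matches.
only_gp4 = {
    ("ex:e", "owl:sameAs", "ex:x"),
    ("ex:x", "rdf:type", "ex:Instrument"),
}

print(assign_types(both, "ex:e"))      # gp3 wins because it has higher priority
print(assign_types(only_gp4, "ex:e"))  # gp3 does not match, so gp4 is applied
```

In the first toy graph gp4 would also infer `ex:e rdf:type ex:Kind`, but that axiom is never produced because gp3 is satisfied first.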
Additional graph patterns and associated heuristics: gp3a is executed after gp3 and before gp4, and gp7a is executed after gp7 and before gp8. Introducing gp3a and gp7a modifies gp6 and gp10 as follows:
ID | instance/class | graph pattern | inferred axiom |
---|---|---|---|
gp3a | instance | x rdf:type e && x owl:sameAs y && y [p] z && z rdf:type C | e rdf:type _:bn && _:bn rdf:type owl:Restriction && _:bn owl:onProperty [p] && _:bn owl:someValuesFrom C |
gp6 | instance | e [anyP] x && x rdf:type C | e rdf:type C |
gp7a | class | x rdf:type e && x owl:sameAs y && y [p] z && z rdf:type C | e rdfs:subClassOf _:bn && _:bn rdf:type owl:Restriction && _:bn owl:onProperty [p] && _:bn owl:someValuesFrom C |
gp10 | class | x rdf:type e && y [anyP] x && y rdf:type C | e rdfs:subClassOf C |
Legend
- [p] ∈ P = {wt:partOf, wt:portionOf, wt:segmentOf, wt:componentOf, wt:sectionOf, wt:divisionOf, wt:subdivisionOf, wt:constituentOf, wt:pieceOf}
- [anyP] ∈ {*} - (R ∪ P)
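The axioms inferred by gp3a and gp7a describe an existential restriction (`owl:someValuesFrom`) anchored on a blank node. As a rough sketch of how those four triples fit together, the following Python function builds them as plain string tuples. The function name `existential_restriction`, the blank-node labelling scheme, and the `ex:` names are assumptions made for illustration, not part of Tipalo.

```python
from itertools import count
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Simple generator of fresh blank-node labels (_:bn1, _:bn2, ...).
_bnode_ids = count(1)


def existential_restriction(e: str, prop: str, cls: str,
                            as_class: bool) -> List[Triple]:
    """Build the triples inferred by gp7a (as_class=True) or gp3a (as_class=False).

    e is linked to a fresh blank node that is an owl:Restriction
    on `prop` with owl:someValuesFrom `cls`.
    """
    bn = f"_:bn{next(_bnode_ids)}"
    link = "rdfs:subClassOf" if as_class else "rdf:type"
    return [
        (e, link, bn),
        (bn, "rdf:type", "owl:Restriction"),
        (bn, "owl:onProperty", prop),
        (bn, "owl:someValuesFrom", cls),
    ]


# Hypothetical example: e is a class whose instances are part of some ex:C.
axioms = existential_restriction("ex:e", "wt:partOf", "ex:C", as_class=True)
for triple in axioms:
    print(triple)
```

Each call uses a fresh blank node, so restrictions inferred for different entities do not get merged by accident.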
Statistics on detected graph patterns
We have evaluated our algorithm on a sample of 627 Wikipedia entities. In DBpedia, 424 of them (67.62%) are typed with a YAGO class, 97 (15.47%) with a DBPO type, and 189 (30%) have no type at all. The frequencies of the graph patterns detected on this sample are reported in the table below.
Graph pattern | Frequency (%) |
---|---|
gp1 | 0 |
gp2 | 0.15 |
gp3 | 3.98 |
gp3a | 0.31 |
gp4 | 79.34 |
gp5 | 0 |
gp6 | 0 |
gp7 | 1.11 |
gp7a | 0.15 |
gp8 | 11.46 |
gp9 | 0 |
gp10 | 3.5 |
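The figures above can be cross-checked with a little arithmetic. The sketch below recomputes the sample percentages from the reported counts and sums the pattern frequencies by kind (instance vs. class), using only numbers taken from this section. The variable names are illustrative.

```python
SAMPLE_SIZE = 627

# Counts reported for the sample.
counts = {"YAGO class": 424, "DBPO type": 97, "no type": 189}
shares = {k: round(100 * v / SAMPLE_SIZE, 2) for k, v in counts.items()}

# The three counts add up to more than the sample size,
# which suggests that the groups are not disjoint.
total_count = sum(counts.values())

# Frequencies (%) from the table, split by the kind of pattern.
instance_freq = {"gp1": 0, "gp2": 0.15, "gp3": 3.98, "gp3a": 0.31,
                 "gp4": 79.34, "gp5": 0, "gp6": 0}
class_freq = {"gp7": 1.11, "gp7a": 0.15, "gp8": 11.46, "gp9": 0, "gp10": 3.5}

instance_share = round(sum(instance_freq.values()), 2)
class_share = round(sum(class_freq.values()), 2)

print(shares)
print(total_count)
print(instance_share, class_share, round(instance_share + class_share, 2))
```

The recomputed shares agree with the reported percentages (189 out of 627 is about 30.14%, reported as 30%), and the pattern frequencies sum to 100%, with instance patterns accounting for 83.78% and class patterns for 16.22%.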
Tools and resources for evaluation
Gold standard-based evaluation
We have built a gold standard of typed Wikipedia entities for tuning and evaluating our algorithm. The gold standard has been built collaboratively using a web-based application that manages argumentation to help users reach agreement. The tool is available online, and the gold standard is available in CSV form.
You can watch a demonstration video of how the tool works.
Given a DBpedia entity associated with its definition extracted from Wikipedia, e.g.:
- Wind instrument: A wind instrument is a musical instrument that contains some type of resonator, in which a column of air is set into vibration by the player blowing into a mouthpiece set at the end of the resonator.
users were asked to (i) indicate in a text field the type of that entity as understood from the definition, and (ii) select from two lists the most appropriate types for the entity. The following two tables illustrate the two lists of types available to users (Wordnet supersenses and foundational ontology classes, respectively):
Wordnet supersense | Gloss | Examples |
---|---|---|
Act | Nouns denoting intentional acts or actions | Crying, predation, visit, shopping, etc. |
Animal | Nouns denoting animals | Dog, cat, snake, etc. |
Body part | Nouns denoting body parts | Gingiva, jaw, tissue, arm, etc. |
Characteristic | Nouns denoting attributes of people and objects (except feelings and cognitive capabilities) | Character, edibility, flexibility, alkalinity, identity, narcissistic personality, etc. |
Cognitive objects | Nouns denoting cognitive processes, content, and goals | Mind, common sense, super-ego, know-how, urge, reason, logorrhea, incentive, etc. |
Communication entity | Nouns denoting communicative processes and contents | Message, dissemination, disagreement, prayer, film, press, language, etc. |
Economic entity | Nouns denoting possession and transfer of possession | Credit card, loan, insurance, debit, funds, etc. |
Feeling | Nouns denoting feelings and emotions | Passion, apathy, desire, nostalgia, etc. |
Food | Nouns denoting foods and drinks | Cheese, milk, crab, mango, onion, etc. |
Group or organization | Nouns denoting groupings of people or objects | Array, kingdom, biological group, genotype, community, people, company, etc. |
Location | Nouns denoting spatial position | Yucatan, Costa Rica, latitude, etc. |
Natural Event | Nouns denoting natural events, processes, and phenomena | Avalanche, flash, storm, superconductivity, crystallization, absorption, aging, alluvion, etc. |
Object | Nouns denoting man-made as well as natural objects | Accommodation, acoustic guitar, nail polish, raincoat, nebula, quark, stalagmite, universe, etc. |
Person | Nouns denoting people | Cezanne, Cassandra, biologist, foreigner, etc. |
Plant | Nouns denoting plants | Betula, blueberry, willow, etc. |
Quantity | Nouns denoting quantities and units of measure | Nautical mile, British pound, quotient, etc. |
Relation | Nouns denoting relations between people or things or ideas | Social relation, legal relation, trigonometric function, causality, motherhood, etc. |
Shape | Nouns denoting two and three dimensional shapes | Cartesian plane, angle, circle, curve, etc. |
State | Nouns denoting stable states of affairs | Ornamentation, health, friendship, air pollution, etc. |
Substance | Nouns denoting substances | Asphalt, chemical, fuel, incense, lipid, etc. |
Time | Nouns denoting time and temporal relations | Julian calendar, autumnal equinox, 1960s, Ramadan, etc. |
Ontology class | Label | Gloss | Examples |
---|---|---|---|
dul:Abstract | Abstract | Anything that cannot be located in space-time. | Vectors, sets, fractals, equations, etc. |
d0:Activity | Action, activity or task | Any action or task planned or executed by an agent intentionally causing and participating in it. | Swimming, shopping, knowledge sharing, etc. |
dul:Amount | Amount, quantity | Any quantity, independently from how it is measured, computed, etc. | kelvin, angstrom, quarter mile, silver dollar, deadline, etc. |
d0:Characteristic | Quality, feature, attribute | An aspect or quality of a thing. | radial symmetry, poker face, alkalinity, attractiveness, darkness, etc. |
dul:Collection | Collection or social group | A container or group of things (or agents) that share one or more common properties. | coin collection, checkout line, public library, Milky Way, etc. |
d0:CognitiveEntity | Cognitive entity | Attitudes, cognitive abilities, ideologies, psychological phenomena, mind, etc. | discernment, homophobia, precognition, etc. |
dul:Description | Conceptualization, description, context | A descriptive context that creates a relational view on a set of data or observations. | hypothesis, danger, Avogadro's law, string theory, utopia, etc. |
d0:Event | Any natural event | Any natural event, independently of its possible causes. | avalanche, earthquake, brainwave, bonfire, etc. |
dul:Goal | Goal, aim, achievement | The description of a situation that is desired by an agent. | destination, purpose, intention |
dul:InformationEntity | Information entity, creative work, knowledge | A piece of information, be it concretely realized or not: linguistic expressions, works of art, knowledge objects. | data, string, message, novel, song, etc. |
d0:Location | Place or space | A location, in a very generic sense, e.g. geo-political entities, or physical objects that are inherently located. | Oslo, Australia, Inner Mongolia, resort area, intergalactic space, tundra, tunnel, etc. |
dul:Organism | Organism, animal, plant | A physical object with biological characteristics, typically able to self-reproduce. | Japanese banana, fox, fungus, etc. |
dul:Organization | Organization | An internally structured, conventionally created social entity such as enterprises, bands, political parties, etc. | mathematics department, headquarters, yakuza, The Beatles, etc. |
dul:Person | Person | Persons in commonsense intuition. | John Doe, Aristotle, Armenian, house guest, etc. |
dul:Personification | Fictional or imaginary agent | A social entity with agentive features, invented or conceived through a cultural process. | holy grail, deus ex machina, God, magic wands, etc. |
dul:PhysicalObject | Physical object | Any object that has a proper space region, and an associated mass: natural bodies, artifacts, substances. | Kleenex, beard, building, etc. |
dul:Process | Natural or social process | Any natural process, independently of its possible causes. | absorption, acidification, chemical process, condensation, etc. |
dul:Relation | Social, logical, or other relations | Any social, logical, or quantitative relation (usually quite elementary). | part, identity, homonymy, causality, reciprocality, etc. |
dul:Role | Role | A concept that classifies some entity: social positions, roles, statuses. | soldier, eminence, legal status, etc. |
dul:Situation | Case, condition, circumstance, state, situation | A unified view on a set of entities, e.g. physical or social facts or conditions, configurations, etc. | breaking point, circulatory failure, star topology, inflammation, alienation, etc. |
d0:System | System | Physical, social, political systems. | viticulture, non-linear system, democracy, water system, etc. |
dul:TimeInterval | Time interval | A time span. | January, Friday, 2011, Modern era, etc. |
d0:Topic | Area of knowledge | Any area, discipline, subject of knowledge. | algebra, avionics, ballet, theology, engineering, etc. |
ISWC reviewers can access the tool with username iswc2012.reviewer and password reviewer. We suggest that reviewers use an IP tunneling service such as Tor if they are concerned about preserving anonymity.
The gold standard tool manages argumentation among users in order to support discussion and help them reach agreement. Ten users with expertise in ontology design have participated in this task so far, and have reached agreement on 100 entities (of which 29 have no type in DBpedia).
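How agreement is measured by the tool is not detailed here. Purely as an illustration, one simple way to check an agreement threshold such as the 70% mentioned for the gold standard dataset is to take the share of users who chose the most frequent type. The function `agreement`, the vote list, and the use of a simple majority share are all assumptions of this sketch, not a description of the actual tool.

```python
from collections import Counter
from typing import List, Tuple

# Assumed threshold, taken from the ">70%" mentioned for the gold standard.
THRESHOLD = 0.70


def agreement(votes: List[str]) -> Tuple[str, float]:
    """Return the most frequent type and the share of votes it received."""
    label, n = Counter(votes).most_common(1)[0]
    return label, n / len(votes)


# Hypothetical votes from ten users for a single entity.
votes = ["dul:PhysicalObject"] * 8 + ["dul:InformationEntity"] * 2

label, share = agreement(votes)
accepted = share > THRESHOLD
print(label, share, accepted)
```

With these made-up votes, eight users out of ten agree on the same type, so the entity would pass the assumed threshold.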
Evaluation: user study
The tool used for assessing the quality of Tipalo results is available online and can be accessed by reviewers with the account given above. The data resulting from the evaluation can be downloaded in CSV format.
A demonstration video is available online as well. The evaluation has been conducted on a sample of 627 DBpedia resources: 67.62% of them have a YAGO type in DBpedia, 15.47% have a DBPO type, and 30% have no type.
This tool guides users through three evaluation steps:
- validating the correctness of the types assigned to an entity;
- validating the soundness of the induced taxonomy of types for an entity;
- validating the correctness of the meaning of individual types (i.e. word sense disambiguation task).
RDF datasets download
The following resources have been produced:
- Gold standard of 100 entities manually typed by a selected group of users (with agreement >70%)
- User evaluation data
- Wikipedia instance types (built for a sample of 627 DBpedia entities)
- Wikipedia taxonomy (built for a sample of 627 DBpedia entities)
- OntoWordnet 2012 (Including alignments between Wordnet and DUL)
- D0 (An ontology defining general classes, aligned to DUL)
- DUL plus (An extension to DUL)
- Wordnet3.0 supersenses (An RDF version of WordNet including alignments to Supersense)