LOD:PatternMeetup

From STLab

(Difference between revisions)
(Soundtrack)
Current revision (17:44, 3 February 2011) (view source)
(F2F Minutes)
 
(29 intermediate revisions not shown.)
Line 26: Line 26:
= Research Questions =
= Research Questions =
== Research Questions and Objectives ==
== Research Questions and Objectives ==
-
 
-
Hypothesis: The use of Knowledge Patterns (KPs) improves the user-interaction experience (when searching for relevant content) - the relevance of recommended content increases if its selection is based on KPs
 
-
 
-
Knowledge Patterns embed the most important relations for describing a relevant piece of knowledge in a certain domain. They are - for knowledge representation - the analogue of frames in linguistics and of schemata in cognitive science [http://stlab.istc.cnr.it/documents/papers/KP_SWJ.pdf (cf. Gangemi and Presutti, 2010)]. This hypothesis is based on the assumption that each pattern conveys what a user would expect to find: the most relevant knowledge about a certain entity in a certain context.
 
-
 
-
'''Cognitive relevance of patterns -> the pattern includes the most relevant relations about something -> it allows good explanations to be generated for the recommended content
 
-
'''
 
=== Research questions (to be completed) ===
=== Research questions (to be completed) ===
Line 52: Line 45:
* To improve the experience for end users
* To improve the experience for end users
-
== Hypotheses ==
+
=== Hypotheses ===
 +
Hypothesis: The use of Knowledge Patterns (KPs) improves the user-interaction experience (when searching for relevant content) - the relevance of recommended content increases if its selection is based on KPs
 +
Knowledge Patterns embed the most important relations for describing a relevant piece of knowledge in a certain domain. They are - for knowledge representation - the analogue of frames in linguistics and of schemata in cognitive science [http://stlab.istc.cnr.it/documents/papers/KP_SWJ.pdf (cf. Gangemi and Presutti, 2010)]. This hypothesis is based on the assumption that each pattern conveys what a user would expect to find: the most relevant knowledge about a certain entity in a certain context.
 +
 +
'''Cognitive relevance of patterns -> the pattern includes the most relevant relations about something -> it allows good explanations to be generated for the recommended content
 +
'''
= Overview =
= Overview =
Line 211: Line 209:
== Resources ==
== Resources ==
 +
* [[LOD:KCAP analysis|Analysis of LinkedMDB and Jamendo/DBTunes for KCAP]]
* [[LOD:Scenarios|Scenarios and examples]] of Linked Data usage for recommendation pathfinding. These include examples of knowledge/content pattern extraction for generalising such paths.
* [[LOD:Scenarios|Scenarios and examples]] of Linked Data usage for recommendation pathfinding. These include examples of knowledge/content pattern extraction for generalising such paths.
* [[LOD:ontology statistics|Analysis and stats on Linked Data]]
* [[LOD:ontology statistics|Analysis and stats on Linked Data]]
Line 236: Line 235:
* '''Soundtrack''' pattern
* '''Soundtrack''' pattern
** Top-down OWL model : [http://ontologydesignpatterns.org/cp/owl/Soundtrack.owl Soundtrack.owl] (musical artist details to be added)
** Top-down OWL model : [http://ontologydesignpatterns.org/cp/owl/Soundtrack.owl Soundtrack.owl] (musical artist details to be added)
 +
** Essential OWL model : [http://ontologydesignpatterns.org/cp/owl/SoundtrackMinimal.owl SoundtrackMinimal.owl]
* '''Members of Ensemble''' pattern
* '''Members of Ensemble''' pattern
** TBD
** TBD
Line 306: Line 306:
= Minutes =
= Minutes =
-
== F2F Minutes ==
+
=== Valentina, Lora, Guus, Balthasar20110203 ===
-
=== Guus Schreiber, Rome, 20100923 ===
+
* possible interesting patterns:
 +
** '''groups''': can be detected in patterns as ?x foaf:made ?y, ?y foaf:maker ?x, ?x != ?y
 +
** '''part-of''', eg  _ mo:Record ?x, ?x mo:Track ?y -> ?y is part of ?x
 +
* assign 1 of 3 categories to properties? As in another project:
 +
** biographical: describes what someone did
 +
** material: in this case where the mp3 can be found or what format a song is in (dc:format)
 +
** symbiotic: the meaning of something, eg tags
 +
* some paths can contain duplicate entities / redundancy, eg ?x foaf:made ?y, ?y foaf:maker ?x. We might not want to repeat the SPARQL queries due to time restrictions, but we could make an estimate of the redundancy.
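The redundancy estimate mentioned in the last point could be computed offline on an already downloaded set of triples, without repeating the SPARQL queries. The following is a minimal sketch under that assumption; the function name <code>inverse_redundancy</code> and the <code>ex:</code> sample data are hypothetical and only serve as illustration.

```python
def inverse_redundancy(triples, p, q):
    """Fraction of (s, p, o) triples that are mirrored by an (o, q, s) triple."""
    # Pairs stated with property p, as (subject, object).
    forward = {(s, o) for s, pred, o in triples if pred == p}
    # Pairs stated with property q, flipped so they compare with `forward`.
    backward = {(o, s) for s, pred, o in triples if pred == q}
    if not forward:
        return 0.0
    return len(forward & backward) / len(forward)


# Hypothetical toy data, not taken from any real data-set.
triples = [
    ("ex:artist1", "foaf:made", "ex:record1"),
    ("ex:record1", "foaf:maker", "ex:artist1"),
    ("ex:artist1", "foaf:made", "ex:record2"),
    ("ex:artist2", "foaf:made", "ex:record3"),
    ("ex:record3", "foaf:maker", "ex:artist2"),
]

ratio = inverse_redundancy(triples, "foaf:made", "foaf:maker")
print(f"share of foaf:made statements mirrored by foaf:maker: {ratio:.2f}")
```

A ratio close to 1 would suggest that paths through both properties carry mostly the same information.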
 +
 
 +
 
 +
=== Balthasar, Valentina, Lora 20110113 ===
 +
* going from mappings to patterns:
 +
** link properties in data-set to existing patterns
 +
** manually discover and define new patterns in the process
 +
** use complex mappings to give a homogeneous view on data that can be represented in several ways, eg
 +
?p isActor ?m <=> ?p performance ?z, ?z hasRole actor, ?z inMovie ?m
 +
* see how applicable a pattern is for a data-set by looking at #triples for a kind of property, etc
 +
* link properties back to pattern... possibly improve pattern, make it more applicable to data-set
 +
* related work: http://www.scharffe.fr/pub/phd-thesis/manuscript.pdf
 +
* format of defining mappings
 +
* we must still do step 3b
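The complex mapping in the notes above (<code>?p isActor ?m <=> ?p performance ?z, ?z hasRole actor, ?z inMovie ?m</code>) can be illustrated in one direction, from the reified form to the direct one. This is only a sketch: the property names are the ones used in the example, while the function name and the <code>ex:</code> identifiers are hypothetical.

```python
def derive_is_actor(triples):
    """Derive (?p, isActor, ?m) from ?p performance ?z, ?z hasRole actor, ?z inMovie ?m."""
    performers = {}      # performance node -> persons linked to it
    actor_nodes = set()  # performance nodes whose role is "actor"
    movie_of = {}        # performance node -> movies it belongs to
    for s, p, o in triples:
        if p == "performance":
            performers.setdefault(o, set()).add(s)
        elif p == "hasRole" and o == "actor":
            actor_nodes.add(s)
        elif p == "inMovie":
            movie_of.setdefault(s, set()).add(o)

    derived = set()
    for z in actor_nodes:
        for person in performers.get(z, ()):
            for movie in movie_of.get(z, ()):
                derived.add((person, "isActor", movie))
    return derived


# Hypothetical toy data: one acting performance and one directing performance.
triples = [
    ("ex:person1", "performance", "ex:perf1"),
    ("ex:perf1", "hasRole", "actor"),
    ("ex:perf1", "inMovie", "ex:movie1"),
    ("ex:person2", "performance", "ex:perf2"),
    ("ex:perf2", "hasRole", "director"),
    ("ex:perf2", "inMovie", "ex:movie1"),
]

direct = derive_is_actor(triples)
print(direct)
```

Only the performance with the role "actor" yields a direct statement, so both representations can be queried through the same homogeneous view.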
 +
 
 +
=== Aldo, Alessandro, Balthasar, Valentina 20101222 ===
 +
Setup of meeting: almost F2F with Balthasar on skype and the rest at CNR.
 +
 
 +
New content available at [[LOD:workinprogress]].
 +
 
 +
Notes:
 +
* Notation: use a single prefix that encompasses the entire namespace before the fragment, e.g. lmdb:movie/ -> lmdb_movie:
 +
* If there is partial coverage of the type, e.g. only 64% of the subjects of triples for lmdb_movie:language are of type lmdb_movie:film, then it is important to understand the motivation behind this partial coverage.
 +
** This could imply that different knowledge patterns are constructed for "overlapping" universes.
 +
* Summarize analysis results in a ready-to-use file.
 +
** Aldo: even an Excel file could do for now; an annotation schema in RDF could be created at a later stage, for answering queries like "give me all the properties that have a domain coverage < 100%".
 +
* Partition results using the following rationale for partitions, reflecting their OWL equivalent:
 +
** data properties (range = rdfs:Literal, in turn typed or untyped)
 +
** object properties
 +
** annotation properties
 +
** links to external data-sets
 +
** mixed properties (range = typed + untyped literals)
 +
** rdf:type as a property per se
 +
* Check for the soundness of the universes in this partition. Ranges might be untyped rdf:Resources within the single "original" dataset, but their actual type might belong to the "target" dataset asserting for that resource. This can only be done manually at this stage.
 +
** For example, we see that <http://xmlns.com/foaf/0.1/based_near> has range rdf:resource, but they're all pointing to geonames entities. By loading geonames we could identify a more specific range
 +
* add a column that reports on the integrity of the property value (?)
 +
* untyped values could be a problem when reasoning with datasets, we have to keep this in mind
 +
* list names, locations, links to RDF dumps for each dataset
 +
* Valentina finally remarked that, as we did not make the expected progress within one week, a detailed report is needed by January 5th (partly in a wiki page for reviewers, partly in the paper draft). Furthermore, her suggestion is to also have a telecon on January 6th (I know it's a holiday but I believe it's necessary): it seems to be the only day on which both Alessandro and Balthasar are available (?). I will be available for around one hour during the day to discuss the results.
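The domain coverage discussed in the notes above (the share of subjects of a property that have a given type) could be computed along the following lines. This is a sketch under assumptions: coverage is counted over distinct subjects rather than over triples, the types are read from explicit <code>rdf:type</code> statements only, and all identifiers except the property and class names quoted in the notes are hypothetical.

```python
from collections import Counter

RDF_TYPE = "rdf:type"


def domain_coverage(triples, prop):
    """Share of the distinct subjects of `prop` per asserted rdf:type."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)

    subjects = {s for s, p, o in triples if p == prop}
    counts = Counter()
    for s in subjects:
        # Subjects without any rdf:type are reported under a placeholder label.
        for t in types.get(s, {"(untyped)"}):
            counts[t] += 1

    total = len(subjects)
    return {t: n / total for t, n in counts.items()} if total else {}


# Hypothetical toy data: three typed films and one untyped subject.
triples = [
    ("ex:f1", RDF_TYPE, "lmdb_movie:film"),
    ("ex:f2", RDF_TYPE, "lmdb_movie:film"),
    ("ex:f3", RDF_TYPE, "lmdb_movie:film"),
    ("ex:f1", "lmdb_movie:language", "ex:en"),
    ("ex:f2", "lmdb_movie:language", "ex:it"),
    ("ex:f3", "lmdb_movie:language", "ex:nl"),
    ("ex:x1", "lmdb_movie:language", "ex:en"),
]

coverage = domain_coverage(triples, "lmdb_movie:language")
print(coverage)
```

Properties whose most frequent type covers less than 100% of the subjects are the ones that need the manual inspection described above.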
 +
 
 +
APs
 +
* Valentina: to send out the TeX sources of the paper draft (share via svn?)
 +
* Alessandro, Balthasar
 +
** list in the wiki all data-sets under analysis (associated with a link to a dump of them)
 +
** perform on such data-sets the whole method
 +
** report on the wiki the raw data and details about the executed procedure
 +
** include in the paper the results of, and discussion on, the analysis performed on the data-set according to the 6-step method
 +
 
 +
 
 +
=== Aldo, Alessandro, Balthasar, Lora, Valentina 20101215 ===
 +
Setup of meeting: almost F2F with Alessandro on skype and the rest at the VU
 +
 
 +
Agenda:
 +
* LOD analysis
 +
** to what extent and how patterns are represented
 +
** procedure / methods / criteria / tools
 +
* plan
 +
* action points
 +
 
 +
Notes:
 +
* Alessandro: starting from domain specific data-sets (eg MusicBrainz) seems more useful than starting from generic knowledge data-sets (DBpedia)
 +
** because the domain specific data-sets contain owl:sameAs links to generic data-sets
 +
* general purpose data-sets:
 +
** YAGO
 +
** DBpedia
 +
** Freebase (though less than the others) ---> NOT LINKED
 +
* domain specific datasets:
 +
** DBtune/MusicBrainz --> Jamendo
 +
** LinkedMDB
 +
 
 +
* bottom up approach: list all properties in data-sets and their frequencies
 +
* how about adding authorship? it usually fuses with roles in LOD (e.g. JohnCarpenter mdb:director EscapeFromNewYork)
 +
* interesting: different expression of relationships (with the same semantics)
 +
* Steps for analysis of Linked Data:
 +
* 1. List all properties per data source (frequency, URI)
 +
* 2. List all types used with those properties, i.e. the universe of each property (universe of a property = domain and range)
 +
* 3a. align properties to top-level properties (eg use ODP from DOLCE, DnS, ...)
 +
* 3b. align types to top-level types and group properties according to their universes (also check datatype properties)
 +
* 3c. Compare the result of the two types of grouping in 3a and 3b
 +
* 4. Provide statistics on the triples (frequencies of usage of properties)
 +
* 5. Discover paths of properties
 +
* 6. match to existing knowledge patterns, or discover new patterns
 +
 
 +
* statistics:
 +
** a path is a sequence of relations that includes the property P (i.e. distinguish the paths by their degrees, e.g. degree 0 is the property P itself, degree 1 is a path of length 1 where the property P is at the starting point - in the middle)
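Step 5 (discovering paths of properties) could start from a simple count of two-step property paths that include a given property P. The sketch below makes several assumptions: it only looks at paths of exactly two consecutive triples, it does not model the notion of degree described above, and the data and the <code>ex:hasTrack</code> property are hypothetical.

```python
from collections import Counter


def property_paths(triples, prop):
    """Count two-step property paths (p1, p2) that include `prop`."""
    # Index the outgoing edges of every subject.
    outgoing = {}
    for s, p, o in triples:
        outgoing.setdefault(s, []).append((p, o))

    counts = Counter()
    for _, p1, middle in triples:
        # Continue the path from the object of the first triple.
        for p2, _ in outgoing.get(middle, ()):
            if prop in (p1, p2):
                counts[(p1, p2)] += 1
    return counts


# Hypothetical toy data.
triples = [
    ("ex:artist1", "foaf:made", "ex:record1"),
    ("ex:record1", "ex:hasTrack", "ex:track1"),
    ("ex:record1", "ex:hasTrack", "ex:track2"),
    ("ex:record1", "foaf:maker", "ex:artist1"),
]

paths = property_paths(triples, "foaf:made")
for path, n in paths.most_common():
    print(path, n)
```

Frequent property pairs found this way are candidates for matching against existing knowledge patterns in step 6.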
 +
 
 +
* Aldo: we don't make a new top-level ontology, but an OWL file, and map properties to ODPs where possible
 +
** possible alternative, link to dbpedia by default
 +
 
 +
 
 +
Action points for next telecon (Dec 22)
 +
 
 +
* Create a wiki page for raw data (for reviewers) and use the method steps as structure of the page (it's also the source of what goes in section 5 of the paper)
 +
* Perform the method steps
 +
* Fill the wiki page and section 5 with results and data from the execution of the method
 +
 
 +
 
 +
=== Alessandro, Balthasar, Lora, Valentina, 20101201 ===
 +
 
 +
* We are refining the notion of aggregation patterns to explore in LOD, distinguishing the following subtypes:
 +
*# collections/collectives, generally intransitive, implying membership. The least useful for meronomies, but the most common to be expected;
 +
*# part-whole, transitive. Expected for geographical data and alignment with geonames;
 +
*# componency, nontransitive.
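The practical difference between the transitive part-whole subtype and the intransitive ones is that only the former licenses inferred links. A minimal sketch of that inference, with hypothetical names and toy data, could look as follows.

```python
def transitive_closure(pairs):
    """Return all (part, whole) pairs implied by a transitive relation."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # a part-of b and b part-of d implies a part-of d.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Hypothetical toy data: a city inside a region inside a country.
part_of = {("ex:city", "ex:region"), ("ex:region", "ex:country")}

closure = transitive_closure(part_of)
print(sorted(closure))
```

For membership in collections and for componency no such closure should be computed, since these relations are not transitive according to the subtypes listed above.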
 +
 
 +
* Mid-term objective is to aim towards a full paper submission at KCAP, due mid Feb.
 +
 
 +
[[Image:patternonwhiteboard.jpg]]
 +
 
 +
* What to look for next:
 +
** to what extent and how are these patterns represented in LOD?
 +
** Ale and Balth to describe the approach they are using for analyzing LOD. Should be drafted as well
 +
** on which source do you start? are you doing it manually/automatically?
 +
** in either case, what tools are you using?
 +
 
 +
 
 +
=== Rome, 20100923 ===
sample pattern: location -> subject class (e.g. a fish species) -<=> location linked to similar subject class
sample pattern: location -> subject class (e.g. a fish species) -<=> location linked to similar subject class
Line 328: Line 450:
* Domain-specific types are more specific.?
* Domain-specific types are more specific.?
-
=== Balthasar Schopman, Rome, 20100923 ===
+
=== Rome, 20100923 ===
knowledge pattern = general pattern that models knowledge
knowledge pattern = general pattern that models knowledge
Line 346: Line 468:
Discussion: what kind of restrictions should a pattern contain? EG: both node types & specific properties, or: only node types (being indifferent to the properties)
Discussion: what kind of restrictions should a pattern contain? EG: both node types & specific properties, or: only node types (being indifferent to the properties)
-
=== all, Rome, 20100924 ===
+
=== Rome, 20100924 ===
We can use 'tricks' to exploit the semantics, even within a single corpus. For example, in LinkedMDB there are two roles for Mel Brooks, ie that of [http://data.linkedmdb.org/page/actor/29583 Actor] and [http://data.linkedmdb.org/page/director/8458 Director], which are both linked to the [http://www.freebase.com/view/guid/9202a8c04000641f80000000000289f2 Freebase URI] using the foaf:page property. This Freebase URI can be used to go to other corpora:
We can use 'tricks' to exploit the semantics, even within a single corpus. For example, in LinkedMDB there are two roles for Mel Brooks, ie that of [http://data.linkedmdb.org/page/actor/29583 Actor] and [http://data.linkedmdb.org/page/director/8458 Director], which are both linked to the [http://www.freebase.com/view/guid/9202a8c04000641f80000000000289f2 Freebase URI] using the foaf:page property. This Freebase URI can be used to go to other corpora:
<http://rdf.freebase.com/ns/en.mel_brooks> <http://www.w3.org/2002/07/owl#sameAs> <http://data.nytimes.com/N17739970876672888293> .
<http://rdf.freebase.com/ns/en.mel_brooks> <http://www.w3.org/2002/07/owl#sameAs> <http://data.nytimes.com/N17739970876672888293> .
Line 352: Line 474:
<http://rdf.freebase.com/ns/en.mel_brooks> <http://www.w3.org/2002/07/owl#sameAs> <http://www.bbc.co.uk/music/artists/29a62e70-a15b-4e58-a39d-377f0443eb2c#artist> .
<http://rdf.freebase.com/ns/en.mel_brooks> <http://www.w3.org/2002/07/owl#sameAs> <http://www.bbc.co.uk/music/artists/29a62e70-a15b-4e58-a39d-377f0443eb2c#artist> .
-
=== Lora, Rome, 20100927 ===
+
=== Rome, 20100927 ===
* Action: relate each navigation pattern to a weight, which can express the suitability of the pattern for a given user profile, user context or other context/statistical restrictions
* Action: relate each navigation pattern to a weight, which can express the suitability of the pattern for a given user profile, user context or other context/statistical restrictions
Line 407: Line 529:
(III) target the first study in about 4 months
(III) target the first study in about 4 months
-
=== Balthasar, Rome, 20100927 ===
+
=== Rome, 20100927 ===
Knowledge patterns:
Knowledge patterns:
* composition between entities, eg between People and Companies by relation 'founder'
* composition between entities, eg between People and Companies by relation 'founder'
Line 416: Line 538:
* generalization of this composition: member-relation
* generalization of this composition: member-relation
-
=== all, Rome, 20100928 ===
+
=== Rome, 20100928 ===
* discussing the use case: Anthrax
* discussing the use case: Anthrax
** media item --> music
** media item --> music
Line 438: Line 560:
<!--[[Image:Pattern.png]]-->
<!--[[Image:Pattern.png]]-->
-
=== all, Rome, 20100929 ===
+
=== Rome, 20100929 ===
* we want to derive several versions of content patterns based on the pattern sketched yesterday, eg one with a media entity related to an organization and related to an event, that we can use to instantiate paths in LOD
* we want to derive several versions of content patterns based on the pattern sketched yesterday, eg one with a media entity related to an organization and related to an event, that we can use to instantiate paths in LOD
** (we don't aim to make a pattern that's as generic as possible)
** (we don't aim to make a pattern that's as generic as possible)
-
=== Valentina, Rome, 20100930===
+
=== Rome, 20100930===
* Next steps
* Next steps
** action: balthasar will look at LOD examples that fill the CP in order to fix the CP
** action: balthasar will look at LOD examples that fill the CP in order to fix the CP
Line 460: Line 582:
** user-based validation. experiment design for validating and tuning candidate navigational patterns, and identifying new navigational patterns
** user-based validation. experiment design for validating and tuning candidate navigational patterns, and identifying new navigational patterns
-
=== Alessandro, Balthasar, Lora, virtual, 20101020 ===
+
=== Alessandro, Balthasar, Lora, 20101020 ===
* The Prolog-like representation schema used by Balthasar is fine for the time being, provided that an instance example is provided for each pattern (in # comments).
* The Prolog-like representation schema used by Balthasar is fine for the time being, provided that an instance example is provided for each pattern (in # comments).
Line 481: Line 603:
* BibSonomy group '''lodpatterns''' has been created at [http://www.bibsonomy.org/group/lodpatterns]. This group should mirror the References section of this Wiki as faithfully as possible. For joining, either communicate your BibSonomy username to Alessandro or log in to BibSonomy.org, browse [http://www.bibsonomy.org/groups] for "lodpatterns" and click the Join button (admin approval will be required).
* BibSonomy group '''lodpatterns''' has been created at [http://www.bibsonomy.org/group/lodpatterns]. This group should mirror the References section of this Wiki as faithfully as possible. For joining, either communicate your BibSonomy username to Alessandro or log in to BibSonomy.org, browse [http://www.bibsonomy.org/groups] for "lodpatterns" and click the Join button (admin approval will be required).
-
=== Alessandro, Balthasar, Lora, Valentina, virtual, 20101027 ===
+
=== Alessandro, Balthasar, Lora, Valentina, 20101027 ===
* Recap on the chosen knowledge patterns and comment on their representation
* Recap on the chosen knowledge patterns and comment on their representation
Line 507: Line 629:
-
=== Alessandro, Balthasar, Valentina, virtual, 20101103 ===
+
=== Alessandro, Balthasar, Valentina, 20101103 ===
==== knowledge patterns (KP) ====
==== knowledge patterns (KP) ====
Line 557: Line 679:
** 1 OWL file containing the model and mappings to Ontology Design Patterns
** 1 OWL file containing the model and mappings to Ontology Design Patterns
** 1 Turtle file containing mappings from the model to Linked Data entities (eg Freebase, DBpedia)
** 1 Turtle file containing mappings from the model to Linked Data entities (eg Freebase, DBpedia)
-
 
Planning
Planning
Line 563: Line 684:
* ACTION Balthasar: look into using SPARQL 1.1 for the Pattern
* ACTION Balthasar: look into using SPARQL 1.1 for the Pattern
* ACTION both: (long term) find mappings between OWL model and LOD data-sets
* ACTION both: (long term) find mappings between OWL model and LOD data-sets
 +
 +
 +
=== Alessandro, Balthasar, Lora, Valentina 20101117 ===
 +
 +
Exploration of patterns:
 +
* Guus suggested that the implicit aggregation instantiated in the MoE pattern could be abstracted, thus augmenting the power of the pattern itself wrt recommendation. This abstraction could allow us to explore more meronomies such as those constructed in Geonames.
 +
 +
* We should not rule out that additional relations might be added for increasing recall, or finding alternate paths to follow in disharmonic LOD datasets. One such example is the MoE pattern: as instantiated in Freebase for the "The Social Network" movie, we are also able to reach founder Mark Zuckerberg by the depiction of the "fictional" Mark Zuckerberg.
 +
 +
* Other achievements can depend on substituting the membership relation with others such as participation in events. For the ST pattern, it could be represented by the recording event of the soundtrack. For the MoE pattern, the ensemble could be ported to an event, e.g. being able to recommend WWII movies even when their relationship with WWII is not made explicit but they depict figures who all had an involvement in WWII (e.g. Churchill, Eisenhower, Hitler and De Gaulle).
 +
 +
* Action Point (A, B): explore the presence in LOD of relations that express:
 +
*# role-of-objects
 +
*# participation-to-events
 +
*# about-ness
 +
*# different types of aggregations
 +
 +
** A possible distribution of efforts can be that Balthasar fetches relationships holding between entities of interest and Alessandro aggregates those related to these four patterns and maps them.
 +
 +
** Note about the research method we are following: we analyse LOD to extract patterns and generalize them to encompass different vocabularies (bottom-up), and we take general content patterns (top-down) and check whether they are somehow represented in LOD.
 +
 +
* find variations of possible ways of traversing/instantiating a pattern
 +
 +
* make patterns more generic with different instantiations, eg: an aggregation relation such as the ensemble-person relation in the Ensemble pattern is analogous to the region-location relation in Geonames
= Misc results =
= Misc results =
Line 571: Line 716:
** presentation of interesting facts
** presentation of interesting facts
* the IMDB trivia do require some semantification of the plain text, but that may be worth the effort. Example [http://www.imdb.com/title/tt0084787/trivia The Thing] and its original movie
* the IMDB trivia do require some semantification of the plain text, but that may be worth the effort. Example [http://www.imdb.com/title/tt0084787/trivia The Thing] and its original movie
 +
 +
 +
= Method =
 +
== Method ==
 +
 +
* Steps for analysis of Linked Data:
 +
* 1. List all properties per data source (frequency, URI)
 +
* 2. List all types used with those properties, i.e. the universe of each property (universe of a property = domain and range)
 +
* 3a. align properties to top-level properties (eg use ODP from DOLCE, DnS, ...)
 +
* 3b. align types to top-level types and group properties according to their universes (also check datatype properties)
 +
* 3c. Compare the result of the two types of grouping in 3a and 3b
 +
* 4. Provide statistics on the triples (frequencies of usage of properties)
 +
* 5. Discover paths of properties
 +
* 6. match to existing knowledge patterns, or discover new patterns
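Steps 1, 2 and 4 of the method above can be sketched on an in-memory list of triples. This is only an illustration under assumptions: the universe of a property is approximated by the <code>rdf:type</code> values asserted for its subjects and objects, and the function name and the <code>ex:</code> data are hypothetical.

```python
from collections import Counter


def property_statistics(triples, rdf_type="rdf:type"):
    """Return property frequencies and, per property, its observed (domain, range) types."""
    # Collect the asserted types of every resource.
    types = {}
    for s, p, o in triples:
        if p == rdf_type:
            types.setdefault(s, set()).add(o)

    freq = Counter()
    universe = {}
    for s, p, o in triples:
        if p == rdf_type:
            continue  # rdf:type is handled separately, as a property per se
        freq[p] += 1
        domain, range_ = universe.setdefault(p, (set(), set()))
        domain.update(types.get(s, ()))
        range_.update(types.get(o, ()))  # literals and untyped objects add nothing
    return freq, universe


# Hypothetical toy data.
triples = [
    ("ex:m1", "rdf:type", "ex:Film"),
    ("ex:m2", "rdf:type", "ex:Film"),
    ("ex:d1", "rdf:type", "ex:Director"),
    ("ex:m1", "ex:director", "ex:d1"),
    ("ex:m2", "ex:director", "ex:d1"),
    ("ex:m1", "ex:title", "A title"),
]

freq, universe = property_statistics(triples)
for prop, n in freq.most_common():
    print(prop, n, universe[prop])
```

An empty range set, as for <code>ex:title</code> here, hints at a data property or at untyped resources that need manual checking.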
 +
 +
= Analysis =
 +
== Analysis ==
 +
 +
[[LOD:KCAP_analysis]]
 +
 +
[[LOD:KCAP_analysis_discussion_pictures]]
<headertabs/>
<headertabs/>

Current revision

All experiments, findings and what have you will fill this nice page.

Note: by default, this page and all pages prefixed with the LOD namespace are public.

Pattern finding

Linked Data

Several Linked Data corpora can be used in conjunction. A corpus can have a broad or narrow domain, eg:

  • General knowledge (Freebase, DBpedia)
  • Geographical knowledge (Geonames)
  • Movies (LinkedMDB)
  • People (FOAF, GTAA)
  • Lexicon (WordNet)
  • Subjects (GTAA)
  • Music (dbtune)

Patterns

We study the usage of data in several repositories in order to determine statistically and semantically relevant patterns. Statistically relevant, because we want to use common patterns, as opposed to using properties that are only used once. Semantically relevant, since we aim to find interesting relations between entities.

First we define several patterns manually. Ideally we'll eventually find an automatic method to find interesting patterns, given the huge amount of Linked Data.

It remains unanswered what the archetype of a pattern looks like. For example, does it only specify the type of the nodes, or does it also restrict the relations between the nodes? We will start with making the patterns specific and study the effects of generalizing the patterns.
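The two options for the archetype of a pattern can be compared on a toy example: a specific pattern that restricts both node types and properties, and a generalized one that restricts node types only. The dictionary layout, the function name and the <code>ex:</code> data are hypothetical; this is a sketch, not the representation used in the project.

```python
def matches(pattern, triple, types):
    """Check a triple against a pattern with node types and optional property restrictions."""
    s, p, o = triple
    subject_ok = pattern["subject_type"] in types.get(s, set())
    object_ok = pattern["object_type"] in types.get(o, set())
    allowed = pattern.get("properties")  # None means: any property is accepted
    property_ok = allowed is None or p in allowed
    return subject_ok and object_ok and property_ok


# Hypothetical toy data.
types = {"ex:p1": {"ex:Person"}, "ex:c1": {"ex:Company"}}
triples = [
    ("ex:p1", "ex:founder", "ex:c1"),
    ("ex:p1", "ex:employee", "ex:c1"),
]

specific = {"subject_type": "ex:Person", "object_type": "ex:Company",
            "properties": {"ex:founder"}}
generic = {"subject_type": "ex:Person", "object_type": "ex:Company"}

specific_hits = [t for t in triples if matches(specific, t, types)]
generic_hits = [t for t in triples if matches(generic, t, types)]
print(len(specific_hits), len(generic_hits))
```

Comparing the two hit lists on real data would give a first impression of the effect of generalizing a pattern.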
