LOD:PatternMeetup

From STLab


All experiments, findings and what have you will fill this nice page.

Note: by default, this page and all pages prefixed with the LOD namespace are public.

Introduction
Research Questions
Overview
Terminology
Use cases
Patterns
Resources
Formalization
Minutes
Misc results
Method
Analysis

Pattern finding

Linked Data

Several Linked Data corpora can be used in conjunction. A corpus can have a broad or narrow domain, eg:

General knowledge (Freebase, DBpedia)
Geographical knowledge (Geonames)
Movies (LinkedMDB)
People (FOAF, GTAA)
Lexicon (WordNet)
Subjects (GTAA)
Music (dbtune)

Patterns

We study the usage of data in several repositories in order to determine statistically and semantically relevant patterns. Statistically relevant, because we want to use common patterns, as opposed to properties that are used only once. Semantically relevant, because we aim to find interesting relations between entities.

First we define several patterns manually. Given the huge amount of Linked Data, we would ideally find an automatic method for discovering interesting patterns later on.

It is still an open question what the archetype of a pattern looks like. For example, does it only specify the types of the nodes, or does it also restrict the relations between the nodes? We will start by making the patterns specific and then study the effects of generalizing them.

Knowledge patterns

Soundtrack Pattern

Atomic knowledge patterns

Role KP specializations
- in music-performing entities (bass player)
- in organizations (founder)
- in media item authoring (director)
Other classification KPs
- interest
Contribution KP specializations
- soundtrack (has music by). This may also indirectly result from the application of an authorship pattern to a composition pattern.
Adaptation KP specializations
- in movies (remake of)
Composition KP
- implicit usage of a componency pattern for Soundtrack as a component of audiovisual media.

Members of Ensemble Pattern

Atomic knowledge patterns

Ensemble? (organization). Can be a highly specialized collection pattern.
Role KP specializations
- Constituting role (founder). This can be seen as a combination of authorship and the plain agent/role pattern.
- Participation with a role (credit, role + dc:title combination)
Topic KP specializations
- Generic subject of a work (subject, but also a mention in a New York Times article).

Syntax

I (Balthasar) am using semi-Prolog syntax here, since it seems natural and is easy to port to my implementation. Comments start with a hash (#). Variables start with a capital letter, eg 'Movie' is a variable, while 'movie' is not.
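To make the convention concrete, here is a minimal Python sketch of a reader for this line-based syntax. It assumes one whitespace-separated `Subject Predicate Object` triple per line; the `Var` class and the function names are hypothetical and not part of Balthasar's implementation.

```python
from typing import List, Tuple, Union


class Var(str):
    """A pattern variable (a token that starts with a capital letter)."""


Term = Union[Var, str]
Triple = Tuple[Term, Term, Term]


def parse_term(token: str) -> Term:
    # Capitalised tokens (Film, Person, MPE) are variables;
    # URIs, CURIEs such as owl:sameAs and lowercase names are constants.
    return Var(token) if token[:1].isupper() else token


def parse_pattern(text: str) -> List[Triple]:
    """Parse one 'Subject Predicate Object' triple per line.

    Blank lines and lines starting with '#' are skipped. A '#' inside a URI
    (e.g. ...owl#sameAs) is kept, because only a leading hash marks a comment.
    """
    triples: List[Triple] = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        tokens = line.split()
        if len(tokens) != 3:
            raise ValueError(f"line {number}: expected 3 tokens, got {len(tokens)}")
        s, p, o = (parse_term(t) for t in tokens)
        triples.append((s, p, o))
    return triples
```

Incomplete lines (a predicate without subject and object) raise an error instead of being silently dropped, which makes gaps in a pattern visible.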

Method

Trying to get navigation patterns from existing examples.

Generic Pattern Design

Observations

Some properties specify direct relations between a movie and a person, while others go from movie to performance to person. Both are equally important movie-person patterns, so the implementation must handle both shapes.

Navigation patterns

# most can be found in http://www.freebase.com/view/en/la_haine and http://dbpedia.org/resource/La_Haine
Film http://rdf.freebase.com/ns/film.film.cinematography Person

Film http://rdf.freebase.com/ns/film.film.edited_by Person

http://rdf.freebase.com/ns/film.film.genre

Film http://rdf.freebase.com/ns/film.film.directed_by Person

Film http://dbpedia.org/ontology/director Person

Film http://dbpedia.org/ontology/starring Person

Film http://rdf.freebase.com/ns/film.film.starring Performance
Performance http://rdf.freebase.com/ns/film.performance.film Film
Performance http://rdf.freebase.com/ns/film.performance.actor Person

Film http://rdf.freebase.com/ns/film.film.written_by Person

# MPE = Music Producing Entity
Film http://dbpedia.org/ontology/musicComposer MPE

Film http://dbpedia.org/property/music MPE

MPE http://rdf.freebase.com/ns/film.music_contributor.film Film
# <http://rdf.freebase.com/ns/en.air> <http://rdf.freebase.com/ns/film.music_contributor.film> <http://rdf.freebase.com/ns/en.the_virgin_suicides_2000> .

Navigation pattern on soundtrack contribution across media items

# From video game to soundtrack (not in Freebase; perhaps in DBpedia? Not working now)
VG ?soundtrack_property OST
# <http://rdf.freebase.com/rdf/en.quake> ?soundtrack_property <http://rdf.freebase.com/rdf/m.01hh4gv>

# For recommending the soundtrack itself
OST http://rdf.freebase.com/ns/music.album.primary_release Release
# <http://rdf.freebase.com/rdf/m.01hh4gv> <http://rdf.freebase.com/ns/music.album.primary_release> <http://rdf.freebase.com/rdf/m.03694ry>

OST http://rdf.freebase.com/ns/music.album.artist MPE
# <http://rdf.freebase.com/rdf/m.01hh4gv> <http://rdf.freebase.com/ns/music.album.artist> <http://rdf.freebase.com/ns/en.trent.reznor>

# WARN: no inverse property seems instantiated in Freebase
MPE http://rdf.freebase.com/ns/film.music_contributor.film Film
# <http://rdf.freebase.com/ns/en.trent.reznor> <http://rdf.freebase.com/ns/film.music_contributor.film> <http://rdf.freebase.com/rdf/en.the_social_network>

# Required for "escaping" Freebase at any time (e.g. whenever Freebase provides insufficient information)
Film owl:sameAs Film
# <http://rdf.freebase.com/rdf/en.the_social_network> owl:sameAs <http://dbpedia.org/resource/The_Social_Network>

VG http://rdf.freebase.com/ns/base.ontologies.ontology_instance_mapping Mapping
# <http://rdf.freebase.com/rdf/en.quake> <http://rdf.freebase.com/ns/base.ontologies.ontology_instance_mapping> <http://rdf.freebase.com/ns/m.07nfq9h>
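The multi-hop patterns above (e.g. Film -> Performance -> Person) can be matched with naive backtracking over triples. The following Python sketch assumes an in-memory list of triples and the same capitalised-variable convention; the `ex:` identifiers are made up for illustration and are not real Freebase IDs.

```python
from typing import Dict, Iterator, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

FB = "http://rdf.freebase.com/ns/"


def is_var(term: str) -> bool:
    # Same convention as the pattern syntax: capitalised tokens are variables.
    return term[:1].isupper()


def unify(pattern: Triple, fact: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `fact`, or return None."""
    extended = dict(binding)
    for p_term, f_term in zip(pattern, fact):
        if is_var(p_term):
            # A variable must keep the value it was first bound to.
            if extended.setdefault(p_term, f_term) != f_term:
                return None
        elif p_term != f_term:
            return None
    return extended


def match(patterns: Sequence[Triple], facts: Sequence[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every binding that satisfies all triple patterns (naive backtracking)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for fact in facts:
        extended = unify(first, fact, binding)
        if extended is not None:
            yield from match(rest, facts, extended)


# Toy data (hypothetical identifiers).
facts = [
    ("ex:la_haine", FB + "film.film.starring", "ex:performance1"),
    ("ex:performance1", FB + "film.performance.film", "ex:la_haine"),
    ("ex:performance1", FB + "film.performance.actor", "ex:vincent_cassel"),
    ("ex:other_film", FB + "film.film.starring", "ex:performance2"),
]
starring = [
    ("Film", FB + "film.film.starring", "Performance"),
    ("Performance", FB + "film.performance.film", "Film"),
    ("Performance", FB + "film.performance.actor", "Person"),
]
```

Here `list(match(starring, facts))` yields a single binding for La Haine, because the second film has no matching performance triples. A SPARQL engine would do the same join far more efficiently; the sketch only illustrates the semantics.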

Formalization


Model

Soundtrack pattern
- Top-down OWL model: Soundtrack.owl (musical artist details to be added)
- Essential OWL model : SoundtrackMinimal.owl
Members of Ensemble pattern
- TBD

Mappings

@prefix       : <http://example.org/NoTubeCNR> .
@prefix    owl: <http://www.w3.org/2002/07/owl#> .
@prefix   skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix   foaf: <http://xmlns.com/foaf/0.1/> .
@prefix     fb: <http://rdf.freebase.com/ns/> .
@prefix    dbp: <http://dbpedia.org/ontology/> .
@prefix dbprop: <http://dbpedia.org/property/> .

# issue: not all properties are used in the same way across different data-sets. For example, foaf:page is used in LinkedMDB to link entities to their IMDB URI.
# issue: not all relations can be captured in a single triple, eg Film -> Performance -> Person in Freebase
# note: Freebase properties are bidirectional

:mapped skos:exactMatch skos:exactMatch .

:sameAs :mapped owl:sameAs .
:sameAs :mapped skos:exactMatch .  # -> also skos:closeMatch
:sameAs :mapped foaf:page .
:sameAs :mapped fb:base.ontologies.ontology_instance_mapping .

# Film -> MPE
:hasRole :mapped dbp:musicComposer .
:hasRole :mapped dbprop:music .
:hasRole :mapped fb:film.music_contributor.film .


# note: the following mappings are not used in either of the two existing patterns
:hasRole :mapped fb:film.film.edited_by .
:hasRole :mapped fb:film.film.cinematography .
:hasRole :mapped fb:film.film.directed_by .
:hasRole :mapped dbp:director .
:hasRole :mapped dbp:starring .
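A minimal sketch of how the `:mapped` triples could be applied: every data triple whose property appears as the object of a `:mapped` statement also yields a triple with the model property. This is naive forward rewriting under our own assumptions; it ignores direction differences (dbp:musicComposer runs Film -> MPE, while fb:film.music_contributor.film runs MPE -> Film), which a real mapping would have to handle. Function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
MAPPED = ":mapped"


def load_mappings(mapping_triples: Iterable[Triple]) -> Dict[str, Set[str]]:
    """Index ':modelProperty :mapped datasetProperty' triples by dataset property."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for model_prop, pred, dataset_prop in mapping_triples:
        if pred == MAPPED:
            index[dataset_prop].add(model_prop)
    return index


def rewrite(data: Iterable[Triple], index: Dict[str, Set[str]]) -> List[Triple]:
    """Keep every data triple and add one model-level triple per mapped property."""
    out: List[Triple] = []
    for s, p, o in data:
        out.append((s, p, o))
        for model_prop in sorted(index.get(p, ())):
            out.append((s, model_prop, o))
    return out
```

Keeping the original triples next to the rewritten ones preserves provenance, so a query can still tell which data-set property produced a `:hasRole` fact.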

Patterns

Using the SPARQL 1.1 spec

Soundtrack

PREFIX :     <http://example.org/NoTubeCNR>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT * WHERE {
   ?item   :sameAs* / ( rdf:type / rdfs:subClassOf* ) :avItem ;
           :sameAs* / :hasMusicBy ?mpe .
   ?mpe    ( rdf:type / rdfs:subClassOf* ) :mpe .
   ?person :sameAs* / ( :hasRole | :hasPerformance / :hasRole ) ?mpe ;
           ( rdf:type / rdfs:subClassOf* ) :person ;
           :sameAs* / :referencedIn ?item .
   OPTIONAL { ?item :sameAs* / :remakeOf ?originalItem }
}

"?mpe" = music producing entity
":sameAs*" enables using facts from multiple data-sets (its use has been kept to a minimum)
"?x rdf:type / rdfs:subClassOf* ?t" means: ?x has type ?t, or ?x has a type that is a (direct or indirect) subclass of ?t
in this query, ?item is of type :avItem (audiovisual item), because it has a soundtrack and therefore must be audiovisual
we can use "OPTIONAL { ... }" to find interesting facts even though they're not vital for the pattern
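The intended reading of the type checks, "instance of a class or of any of its subclasses", corresponds to the SPARQL 1.1 path `rdf:type / rdfs:subClassOf*`. The Python sketch below emulates that path for one bound start node; the toy class names mirror the query and the function name is hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def types_of(node: str, triples: Iterable[Triple]) -> Set[str]:
    """All ?t such that `node rdf:type / rdfs:subClassOf* ?t` holds."""
    triples = list(triples)
    # One rdf:type step ...
    direct = {o for s, p, o in triples if s == node and p == RDF_TYPE}
    supers: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == SUBCLASS:
            supers[s].add(o)
    # ... followed by zero or more rdfs:subClassOf steps (breadth-first search).
    seen = set(direct)
    queue = deque(direct)
    while queue:
        cls = queue.popleft()
        for parent in supers[cls]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

With `ex:la_haine rdf:type :film`, `:film rdfs:subClassOf :avItem` and `:avItem rdfs:subClassOf :item`, the item is found to be an `:avItem`, which a plain `rdf:type :avItem` pattern would miss.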

Members of Ensemble

PREFIX :     <http://example.org/NoTubeCNR>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT * WHERE {
   ?item       :sameAs* / ( rdf:type / rdfs:subClassOf* ) :item ;
               :sameAs* / :isAbout ?ensemble .
   ?ensemble   :sameAs* / ( rdf:type / rdfs:subClassOf* ) :ensemble .
   ?person     :sameAs* / ( rdf:type / rdfs:subClassOf* ) :person ;
               :sameAs* / :hasRole ?ensemble .
   OPTIONAL { ?item :sameAs* / :remakeOf ?originalItem }
}

Valentina, Lora, Guus, Balthasar 20110203

possible interesting patterns:
- groups: can be detected in patterns as ?x foaf:made ?y, ?y foaf:maker ?x, ?x != ?y
- part-of, eg _ mo:Record ?x, ?x mo:Track ?y -> ?y is part of ?x
assign 1 of 3 categories to properties? As in another project:
- biographical: describes what someone did
- material: in this case where the mp3 can be found or what format a song is in (dc:format)
- symbiotic: the meaning of something, eg tags
some paths can contain duplicate entities / redundancy, eg ?x foaf:made ?y, ?y foaf:maker ?x. We might not want to repeat the SPARQL queries due to time restrictions, but we could make an estimate of the redundancy.
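A rough way to estimate that redundancy without re-running the queries is to count how many forward triples are mirrored by their inverse. The sketch assumes an in-memory set of triples; the helper name and the toy data are illustrative only.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def inverse_redundancy(triples: Iterable[Triple], prop: str, inverse: str) -> float:
    """Fraction of `prop` triples whose inverse statement is also present.

    With prop='foaf:made' and inverse='foaf:maker', 1.0 means every
    `?x foaf:made ?y` (with ?x != ?y) is mirrored by `?y foaf:maker ?x`.
    """
    triples = set(triples)
    forward = [(s, o) for s, p, o in triples if p == prop and s != o]
    if not forward:
        return 0.0
    mirrored = sum(1 for s, o in forward if (o, inverse, s) in triples)
    return mirrored / len(forward)
```

A value close to 1.0 suggests that following only one direction of the pair loses little information.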

Balthasar, Valentina, Lora 20110113

going from mappings to patterns:
- link properties in data-set to existing patterns
- manually discover and define new patterns in the process
- use complex mappings to give a homogeneous view on data that can be represented in several ways, eg

?p isActor ?m <=> ?p performance ?z, ?z hasRole actor, ?z inMovie ?m

see how applicable a pattern is for a data-set by looking at #triples for a kind of property, etc
link properties back to pattern... possibly improve pattern, make it more applicable to data-set
related work: http://www.scharffe.fr/pub/phd-thesis/manuscript.pdf
format of defining mappings
we must still do step 3b
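The complex mapping `?p isActor ?m <=> ?p performance ?z, ?z hasRole actor, ?z inMovie ?m` can be read as a pair of rewrite rules. This Python sketch implements both directions over plain tuples, minting fresh blank-node labels for the reverse direction; the property names are the ones in the rule above and the data is made up.

```python
import itertools
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def to_direct(triples: Iterable[Triple]) -> Set[Triple]:
    """?p performance ?z . ?z hasRole actor . ?z inMovie ?m  =>  ?p isActor ?m"""
    triples = set(triples)
    performers = {(z, p) for p, pred, z in triples if pred == "performance"}
    actor_nodes = {z for z, pred, role in triples if pred == "hasRole" and role == "actor"}
    in_movie = {(z, m) for z, pred, m in triples if pred == "inMovie"}
    return {
        (p, "isActor", m)
        for z, p in performers if z in actor_nodes
        for z2, m in in_movie if z2 == z
    }


def to_reified(triples: Iterable[Triple]) -> List[Triple]:
    """?p isActor ?m  =>  a fresh performance node ?z carrying the three triples."""
    fresh = itertools.count(1)
    out: List[Triple] = []
    for p, pred, m in sorted(triples):
        if pred == "isActor":
            z = f"_:performance{next(fresh)}"
            out += [(p, "performance", z), (z, "hasRole", "actor"), (z, "inMovie", m)]
    return out
```

Going from direct to reified and back returns the original triples; performances with another role (e.g. director) do not produce `isActor`.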

Aldo, Alessandro, Balthasar, Valentina 20101222

Setup of meeting: almost F2F with Balthasar on skype and the rest at CNR.

New content available at LOD:workinprogress.

Notes:

Notation: use a single prefix that encompasses the entire namespace before the fragment, e.g. lmdb:movie/ -> lmdb_movie:
If there is a partial coverage of the type, e.g. only 64% of the subjects of lmdb_movie:language triples are of type lmdb_movie:film, then it is important to understand the reason for this partial coverage.
- This could imply that different knowledge patterns are constructed for "overlapping" universes.
Summarize analysis results in a ready-to-use file.
- Aldo: even an Excel file would do for now; an annotation schema in RDF could be created at a later stage for answering queries like "give me all the properties that have a domain coverage < 100%".
Partition results using the following rationale for partitions, reflecting their OWL equivalent:
- data properties (range = rdfs:Literal, in turn typed or untyped)
- object properties
- annotation properties
- links to external data-sets
- mixed properties (range = typed + untyped literals)
- rdf:type as a property per se
Check the soundness of the universes in this partition. Ranges might be untyped rdf:Resources within the single "original" data-set, while their actual type might be asserted in the "target" data-set that describes that resource. This can only be done manually at this stage.
- For example, <http://xmlns.com/foaf/0.1/based_near> has range rdfs:Resource, but all its values point to Geonames entities. By loading Geonames we could identify a more specific range.
add a column that reports on the integrity of the property value (?)
untyped values could be a problem when reasoning with datasets, we have to keep this in mind
list names, locations, links to RDF dumps for each dataset
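Two of the analyses above can be sketched in Python: the domain coverage of a property (as in the 64% example) and the partition of properties by the kind of their values. Assumptions: objects use a toy N-Triples-like encoding (quoted strings are literals, `^^` marks a typed literal), "internal" means the object URI starts with the data-set's base namespace, and annotation properties are not detected, because triples alone do not reveal them. Function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"


def domain_coverage(triples: List[Triple], prop: str, cls: str) -> float:
    """Share of the distinct subjects of `prop` explicitly typed as `cls`."""
    subjects = {s for s, p, _ in triples if p == prop}
    if not subjects:
        return 0.0
    typed = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    return len(subjects & typed) / len(subjects)


def object_kind(obj: str, base: str) -> str:
    # Toy encoding: '"abc"' is an untyped literal, '"1"^^xsd:int' a typed one,
    # anything else is a URI (internal or external to the data-set).
    if obj.startswith('"'):
        return "typed" if "^^" in obj else "untyped"
    return "internal" if obj.startswith(base) else "external"


def partition(triples: Iterable[Triple], base: str) -> Dict[str, str]:
    """Assign each property to one of the partitions listed above."""
    kinds: Dict[str, Set[str]] = defaultdict(set)
    for _, p, o in triples:
        kinds[p].add(object_kind(o, base))
    result: Dict[str, str] = {}
    for prop, ks in kinds.items():
        if prop == RDF_TYPE:
            result[prop] = "rdf:type"
        elif ks == {"typed", "untyped"}:
            result[prop] = "mixed"
        elif ks <= {"typed", "untyped"}:
            result[prop] = "data"
        elif ks == {"internal"}:
            result[prop] = "object"
        elif ks == {"external"}:
            result[prop] = "external link"
        else:
            result[prop] = "unclassified"  # e.g. literals and URIs mixed
    return result
```

Coverage counts distinct subjects rather than triples; counting triples instead is an equally valid choice and may give a different percentage.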
Valentina finally remarked that, as we didn't make the expected progress within one week, a detailed report is needed by January 5th (partly in a wiki page for reviewers, partly in the paper draft). Furthermore, she suggests also having a telecon on January 6th (I know it's a holiday but I believe it's necessary): it seems to be the only day that both Alessandro and Balthasar are available (?). I will be available to spend around one hour during the day for discussing the results.

APs

Valentina: to send out the TeX sources of the paper draft (share via svn?)
Alessandro, Balthasar
- list in the wiki all data-sets under analysis (associated with a link to a dump of them)
- perform on such data-sets the whole method
- report on the wiki the raw data and details about the executed procedure
- include in the paper the results of, and discussion on, the analysis performed on the data-set according to the 6-step method

Aldo, Alessandro, Balthasar, Lora, Valentina 20101215

Setup of meeting: almost F2F with Alessandro on skype and the rest at the VU

Agenda:

LOD analysis
- to what extent and how patterns are represented
- procedure / methods / criteria / tools
plan
action points

Notes:

Alessandro: starting from domain specific data-sets (eg MusicBrainz) seems more useful than generic knowledge data-sets (dbpedia)
- because the domain specific data-sets contain owl:sameAs links to generic data-sets
general purpose data-sets:
- YAGO
- DBPedia
- Freebase (though less than the others) ---> NOT LINKED
domain specific datasets:
- DBtune/MusicBrainz --> Jamendo
- LinkedMDB

bottom up approach: list all properties in data-sets and their frequencies
how about adding authorship? it usually fuses with roles in LOD (e.g. JohnCarpenter mdb:director EscapeFromNewYork)
interesting: different expression of relationships (with the same semantics)
Steps for analysis of Linked Data:
1. List all properties per data source (frequency, URI)
2. List all types used with those properties (i.e. the universe of properties (universe of a property = domain and range))
3a. align properties to top-level properties (eg use ODP from DOLCE, DnS, ...)
3b. align types to top-level types and group properties according to their universes (also check datatype properties)
3c. Compare the result of the two types of grouping in 3a and 3b
4. Provide statistics on the triples (frequencies of usage of properties)
5. Discover paths of properties
6. Match to existing knowledge patterns, or discover new patterns
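Steps 1 and 2 can be sketched as a single pass over the triples: count each property and record which types occur at its subject (domain) and object (range) ends. This Python sketch only uses explicit rdf:type assertions, with no subclass reasoning, and marks untyped nodes as `(untyped)`; names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
UNTYPED = "(untyped)"


def property_profile(triples: Iterable[Triple]) -> Tuple[Counter, Dict[str, Tuple[Counter, Counter]]]:
    """Step 1: property frequencies. Step 2: types observed at each end of a property."""
    triples = list(triples)
    freq = Counter(p for _, p, _ in triples)
    types: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)
    universe: Dict[str, Tuple[Counter, Counter]] = {}
    for s, p, o in triples:
        domain, range_ = universe.setdefault(p, (Counter(), Counter()))
        domain.update(types.get(s, {UNTYPED}))
        range_.update(types.get(o, {UNTYPED}))
    return freq, universe
```

The `(untyped)` counts directly show partial coverage of a universe, which feeds the soundness checks discussed for step 3b.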

statistics:
- a path is a sequence of relations that includes the property P (i.e. distinguish the paths by their degree, e.g. degree 0 is the property P itself, degree 1 is a path of length 1 where P is at the starting point or in the middle)
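One possible reading of these path statistics, restricted to paths that start with P, is sketched below: degree 0 counts P itself and each extra degree follows one more outgoing property from the node reached so far. The function name and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def path_counts(triples: Iterable[Triple], prop: str, degree: int) -> Counter:
    """Count property sequences that start with `prop` and take `degree` extra steps."""
    triples = list(triples)
    outgoing: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for s, p, o in triples:
        outgoing[s].append((p, o))
    # Each frontier entry is (properties walked so far, node reached).
    frontier = [((p,), o) for _, p, o in triples if p == prop]
    for _ in range(degree):
        frontier = [(seq + (p,), nxt)
                    for seq, node in frontier
                    for p, nxt in outgoing.get(node, [])]
    return Counter(seq for seq, _ in frontier)
```

Paths with P in the middle could be counted by also walking incoming edges backwards from P's subject.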

Aldo: we don't make a new top-level ontology, but an OWL file, and map properties to ODPs where possible
- possible alternative, link to dbpedia by default

Action points for next telecon (Dec 22)

Create a wiki page for raw data (for reviewers) and use the method steps as structure of the page (it's also the source of what goes in section 5 of the paper)
Perform the method steps
Fill the wiki page and section 5 with results and data from the execution of the method

Alessandro, Balthasar, Lora, Valentina, 20101201

We are refining the notion of aggregation patterns in order to explore the LOD for the following subtypes:
1. collections/collectives, generally intransitive, implying membership. The least useful for meronomies, but the most common to be expected;
2. part-whole, transitive. Expected for geographical data and alignment with geonames;
3. componency, nontransitive.
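Whether a relation behaves transitively, intransitively or nontransitively in a data-set can be checked empirically on its two-step chains. The sketch below is an illustration under our own assumptions: it only inspects the asserted pairs, so under the open-world assumption a missing a-R-c triple is evidence, not proof. The function name is hypothetical.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]


def transitivity(pairs: Iterable[Pair]) -> str:
    """Classify a relation by how its two-step chains behave in the data.

    transitive:    every a-R-b-R-c chain also has a-R-c
    intransitive:  no chain has a-R-c
    nontransitive: some chains do, some don't
    """
    rel: Set[Pair] = set(pairs)
    chains = [(a, c) for a, b in rel for b2, c in rel if b == b2]
    if not chains:
        return "undetermined"  # no a-R-b-R-c chains to test
    closed = sum(1 for chain in chains if chain in rel)
    if closed == len(chains):
        return "transitive"
    if closed == 0:
        return "intransitive"
    return "nontransitive"
```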

Mid-term objective is to aim towards a full paper submission at KCAP, due mid Feb.

What to look for next:
- to what extent and how are these patterns represented in LOD?
- Ale and Balth to describe the approach they are using for analyzing LOD. This should be drafted as well
- which source do you start from? are you doing it manually or automatically?
- in either case, what tools are you using?

Rome, 20100923

sample pattern: location -> subject class (e.g. a fish species) <=> location linked to a similar subject class

pattern from E-culture: two artists working in the same style or a related style

dimensions for these patterns: space, time, people and subject

knowledge vs. navigational pattern

interesting results from patterns that link different knowledge structures, e.g. the part-of hierarchy of locations linked to the subclass hierarchy of subject types

influence of the level of generality in the hierarchy: at a certain level the generality gets too high (or the specificity gets too high: then it is only interesting for subject geeks)

a navigational pattern is a (partial) path through a knowledge pattern/graph [if I understood it correctly]

We will take examples useful for the News and SocialWeb use cases in NoTube. The weight of relations is present in navigational patterns, not in knowledge patterns.

navigational patterns are local, but can we generalize over these as well? This is a background hypothesis to explore. Can we use the level in the hierarchy, say low = long tail? Only if we have meta-information about the structure of the hierarchy, see the analysis by Brockmuller 2003:

Grouping terms supply attributes or classification dimensions to the terms grouped in the subhierarchy below them, but not to other grouping terms;
Natural categories are at the basic level of search and in principle divide general from specific terms;
Abstract classes are more general than the basic level, and
Domain-specific types are more specific.

Rome, 20100923

knowledge pattern = general pattern that models knowledge

navigation pattern = prototype path in LOD

types of patterns: between...

places (geonames)
people (IMDB people)
animals (geospecies)
topics (genres in TVA, IMDB)

The user profile (UP) can be used as a heuristic in pathfinding, eg if the user has a biological background, the path towards geospecies is chosen, but when the user has an IT background another direction may be chosen.

Roles can also be important. In general, people are interested in 'actors', but the user profile might specify the user is interested in correlations in the 'director' role.

Discussion: what kind of restrictions should a pattern contain? Eg both node types & specific properties, or only node types (being indifferent to the properties)

Rome, 20100924

We can use 'tricks' to exploit the semantics, even within a single corpus. For example, in LinkedMDB there are two roles for Mel Brooks, ie that of Actor and Director, which are both linked to the Freebase URI using the foaf:page property. This Freebase URI can be used to go to other corpora:

<http://rdf.freebase.com/ns/en.mel_brooks> <http://www.w3.org/2002/07/owl#sameAs> <http://data.nytimes.com/N17739970876672888293> .
<http://rdf.freebase.com/ns/en.mel_brooks> <http://www.w3.org/2002/07/owl#sameAs> <http://dbpedia.org/resource/Melvin_Kaminsky> .
<http://rdf.freebase.com/ns/en.mel_brooks> <http://www.w3.org/2002/07/owl#sameAs> <http://www.bbc.co.uk/music/artists/29a62e70-a15b-4e58-a39d-377f0443eb2c#artist> .

Rome, 20100927

Action: relate each navigation pattern to a weight, which can express the suitability of the pattern for a given user profile, user context or other context/statistical restrictions

top part of the current instance of the pattern --> relates to the specific knowledge

bottom part of this pattern --> relates to the media metadata

search scenario will be: I need to find a media item (dc:work) with given properties
Action: include more property types (such as dc:subject) to describe the media types

Action: introduce a property as a subproperty of dc:subject --> to describe the currently implicit 'role'

consider whether there is enough property regularity

some properties might be just of provenance nature, which might not be important

but other properties, e.g. roles are important

so, you can define a number of sub-properties to distinguish between important and unimportant ones

Action: give a name of the current pattern (e.g. compositional)

Action (Balthasar): organize a meeting with Jan, Michiel, Guus and himself to discuss the implementation of the search prototype of such a pattern

Action: about generalization of the pattern

--> organization of the people

--> think/try out how general is the pattern; is that the right level of the generalization

--> some level of specificity might make the pattern more useful, but it would be less applicable

--> find the right trade-off between specificity and generality of the pattern

--> use the User Profile, the Context and Statistical restrictions in order to filter the suitability of the pattern

--> use also the type of results you would like to achieve in order to determine the suitability of the pattern

(I) planning

identify two patterns

implement a search strategy for these patterns

do an evaluation study

based on the results of this first study you would have to decide (a) whether those two patterns are enough to cover the target results of the use cases; (b) whether you need to identify (add) another knowledge pattern; or (c) whether it is enough to just define additional navigation patterns

afterwards adjust the implementation

perform a large-scale study

(II) time frame of this planning --> 7-8 months from now on

(III) target the first study in about 4 months

Rome, 20100927

Knowledge patterns:

composition between entities, eg between People and Companies by relation 'founder'
it's useful to define subproperties, that model interesting patterns, eg. the role pattern in the BBC EPG metadata

Composition pattern & generalization

composition = member-bunch or member-partnership, eg Person-Organisation
generalization of this composition: member-relation

Rome, 20100928

discussing the use case: Anthrax
- media item --> music
- media item --> person (+ role) --> location
- person --> media item + location
look at statistics on instances in order to make hypotheses on recall and precision
weights of patterns, eg 'creator' might be equal on a Content Pattern lvl, but on a Navigation pattern you might want to differentiate based on the application, user profile, etc.
we draw the two examples, extract the schema, and compare the two CPs
pattern 1: Movie --> subject --> Organization (Topic) --> founder --> Person --> sameAs --> NYT; --> sameAs --> dbpedia; --> sameAs --> BBC Programs --> retrieve other media items, e.g. Articles (NYT), Articles (dbpedia), Broadcast (BBC PO)
- Questions to still explore in this pattern:
  - (1) what other entities exist in Freebase (with a corresponding vocabulary to discover them) at the same level of 'Organization', e.g. 'Events'
  - (2) how is the role of the person determined depending on whether it is 'Organization' or another entity such as 'Events'?
  - (3) which sources you would like to use in this pattern, e.g. NYT, dbpedia, EPG data, others?
- in the pattern can be the alignment, e.g. sameAs, exactMatch, which leads to a specific source, e.g. NYT, Broadcast
pattern 2: soundtrack-centered --> Media Item --> hasSoundtrackBy --> Artist/Band --> founder --> Person; Media Item --> hasGenre; Media Item --> hasDirector --> Person; Artist --> hasGenre
Questions about the two patterns:
- when and how the user can switch from pattern 1 to pattern 2 and vice versa
Read the paper that Valentina sent per email today
let's start collecting bibliographic references. We might want to use bibsonomy.org

Rome, 20100929

we want to derive several versions of content patterns based on the pattern sketched yesterday, eg one with a media entity related to an organization and related to an event, that we can use to instantiate paths in LOD
- (we don't aim to make a pattern that's as generic as possible)

Rome, 20100930

Next steps
- action: balthasar will look at LOD examples that fill the CP in order to fix the CP
- we fix the CP, its structure and vocabulary, based on the actual data available on LOD
- we will have the first skype on October 27th 3.30pm, maybe one on Oct 22nd, with Ale, Balth, and Lora
- Alessandro will look at more specific patterns, the ones you can use when going out from the "starting CP", which is the more general one for Media Item recommendation
- both of them will look at LOD-compliant CPs, meaning they are instantiated in LOD
- define the CPs and identify candidate navigational patterns
- navigational patterns have a characterization
- implementation will be prototyped by reusing/extending NoTube-IKS frameworks
- let's keep updating the wiki
- we start collecting related work: lora, guus, val, aldo send pointers to ale and balth
- ale and balth will also look for related work and collect them
- action: ale creates a bibsonomy group for initially collecting references
- action (val): better characterize the news use case with Alex
- action (val): collect from Alex an example schema and source for the news example
- user-based validation: experiment design for validating and tuning candidate navigational patterns, and for identifying new navigational patterns

Alessandro, Balthasar, Lora, 20101020

The Prolog-like representation schema used by Balthasar is fine for the time being, provided that an instance example is provided for each pattern (in # comments).

Action (A,B) include the two patterns on the wiki (e.g. the two patterns that we decided on in Rome)
- with examples (instance-based)
- with the abstracted knowledge patterns

Naming: the names so far used for relations, knowledge and navigation patterns are ambiguous and deceiving (eg. mirroring of the Topic CP into the "topic" segment of the Film NP).
- Action (A,B, agreed) give a name of each of the patterns, e.g. in terms of goal of the pattern.
- Also, more suitable names for entities and variables e.g. Music Producing Entity

Implementation issues, such as representing patterns for reasoning and consumption, are frozen until the knowledge patterns are extracted and formalized.

Usage of additional patterns (such as the one originally sketched for geospecies/geonames-based documentary recommendation) is also frozen, but such patterns could later be plugged into the more generic sub-patterns of the existing ones.

Alex from TXT has provided some example taxonomies and annotated content from their client news providers such as Rai and Mediaset. While the schemas themselves are custom and not very reusable, their content can point to relevant relationships to discover
- Action (A) to email Alex about permission to publish these data on the Wiki, or else to use a private space.

BibSonomy group lodpatterns has been created at [1]. This group should mirror the References section of this Wiki as faithfully as possible. For joining, either communicate your BibSonomy username to Alessandro or log in to BibSonomy.org, browse [2] for "lodpatterns" and click the Join button (admin approval will be required).

Alessandro, Balthasar, Lora, Valentina, 20101027

Recap on the chosen knowledge patterns and comment on their representation
1. Pattern for linking media item with people via soundtrack (formerly Soundtrack/Media contribution)
  - Should be defined more generically, given an intuitive name, instantiated with an example and mapped to one or more actual navigation patterns.
2. Pattern for linking media item with people via organization (formerly Founders/Organization membership)
  - Also should be defined more generically, given an intuitive name, instantiated with an example and mapped to one or more actual navigation patterns.

By this telco the two knowledge patterns had been merged, or rather abstracted, to a single macroscopic knowledge pattern whose parts were mapped to instances that indicated two separate navigation pattern groups. This can be deceiving, so it's probably better to keep the two knowledge patterns separated.

Refinement/enrichment of the two knowledge patterns is not ruled out, but this depends on the outcome of analysing the related retrieved media items.
- e.g. including role for people, or including role for organization, or include typing of topics, or role of the topics, etc.

The IKS-related news use case will be dropped in favor of a travel recommendation use case, which however is isomorphic to the news one for most of its part.

Knowledge patterns should have an even more intuitive name, so that you can immediately understand from the name what their purpose is. Keep the KP representation distinct from the representation of its associated navigation patterns. Also, the notation used for the diagrams is not entirely clear and would require a legend.

Next objectives
- Recommendations 1: what types of related media items can be retrieved using each navigation pattern
- Recommendations 2: see whether there are possible combinations of navigation patterns extracted from the two navigation patterns, so as to retrieve more interesting content.

Next two telcos: Alessandro is unable to attend at 3pm due to courses.
- Nov 3 moved forward to 11:30am
- Nov 10 TBD

Alessandro, Balthasar, Valentina, 20101103

knowledge patterns (KP)

arrows in KPs are implicitly bidirectional
- this should be explained in the text or in the legend
- ACTION Balthasar: add to legend
ACTION Balthasar: refine figures of KPs (directions of arrows)
we decide to work out the examples
- ACTION Alessandro,Balthasar: search for actual data that instantiate the two examples of KPs

expressing KPs

KPs will be expressed in OWL
- ACTION Alessandro: represent the KPs in OWL (long term action point)
- the OWL specification of the KPs is in our own namespace (eg http://ontologydesignpatterns.org/cp/owl/<pattern-name>)
- these properties/classes in our own namespace will be linked to existing LOD properties
  - kp:directedBy owl:equivalentProperty <http://rdf.freebase.com/ns/film.film.directed_by>
  - or
  - kp:ensemble rdfs:subClassOf <http://dbpedia.org/resource/Metallica>
  - we have to decide on which equivalence properties to use, which is a minor but important issue: owl:equivalentProperty VS rdfs:subPropertyOf
- requirement: complex alignments, eg "Person ReferencedIn MediaItem" should be linked to "Person freebase:hasPerformance Performance" + "Performance freebase:performanceIn MediaItem"
  - possible solutions:
    - owl2 property chains
    - SWRL
    - Prolog
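For the OWL 2 option, the alignment would be an axiom such as `SubObjectPropertyOf(ObjectPropertyChain(freebase:hasPerformance freebase:performanceIn) kp:referencedIn)`. The Python sketch below shows what such a chain entails on a small set of triples; property names follow the note above rather than actual Freebase identifiers, and the data and function names are made up.

```python
from typing import Iterable, Sequence, Set, Tuple

Pair = Tuple[str, str]
Triple = Tuple[str, str, str]


def chain(triples: Iterable[Triple], props: Sequence[str]) -> Set[Pair]:
    """Pairs (x, y) connected by the property path props[0] / props[1] / ..."""
    triples = list(triples)
    pairs = {(s, o) for s, p, o in triples if p == props[0]}
    for prop in props[1:]:
        step = {(s, o) for s, p, o in triples if p == prop}
        # Join the current end point y with the subject of the next step.
        pairs = {(x, z) for x, y in pairs for y2, z in step if y == y2}
    return pairs


def infer_chain(triples: Iterable[Triple], props: Sequence[str], super_prop: str) -> Set[Triple]:
    """Triples added by SubObjectPropertyOf(ObjectPropertyChain(props) super_prop)."""
    return {(x, super_prop, y) for x, y in chain(triples, props)}
```

An OWL 2 reasoner would derive the same `kp:referencedIn` triples; SWRL or Prolog rules would express the identical join as a rule body.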

KP model

we start referring to the knowledge patterns model:
the content patterns fill the logical + conceptual facades (see KP pictures) in the paper on knowledge patterns
the navigation patterns fill the interaction facade
in this way, we embed in a single model all aspects of a pattern: the owl, the graphical representation, the vocabulary, the navigation...
and we can move from one to the other
for example, we define possible entry points and weights of navigation paths in the navigation patterns
while we define alignments between entities in the OWL definition
however, both belong to the same knowledge pattern

Alessandro, Balthasar 20101110

representing the Knowledge Pattern:

SPARQL 1.1's expressiveness seems very appropriate to represent Knowledge Patterns, eg
- { ?x !(rdf:type|^rdf:type) ?y } means relation between x and y, which is not "x rdf:type y" and also not "y rdf:type x"
- { ?x rdf:type/rdfs:subClassOf* ?type } means ?x is related to ?type via rdf:type followed by zero or more rdfs:subClassOf steps, ie ?type is a type of ?x or one of its superclasses
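The negated property set in the first example is the union of a forward step and an inverse step that both skip rdf:type. This Python sketch spells that out over plain tuples; the function name is hypothetical.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def negated_property_pairs(triples: Iterable[Triple],
                           excluded=frozenset({"rdf:type"})) -> Set[Tuple[str, str]]:
    """Pairs (x, y) that satisfy `?x !(rdf:type|^rdf:type) ?y`.

    In SPARQL 1.1 this negated property set matches a forward step
    (x p y, p not excluded) or an inverse step (y p x, p not excluded).
    """
    pairs: Set[Tuple[str, str]] = set()
    for s, p, o in triples:
        if p in excluded:
            continue
        pairs.add((s, o))  # forward: ?x = s, ?y = o
        pairs.add((o, s))  # inverse: ?x = o, ?y = s
    return pairs
```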

OWL model

the SPARQL query will use concepts and properties of an OWL model in our own namespace
Alessandro started on drafting this OWL model
discussion: keep the model as simple as possible with links to Ontology Design Patterns, which specify the exact position of the entities in a detailed hierarchy
modular approach:
- 1 OWL file containing the model and mappings to Ontology Design Patterns
- 1 Turtle file containing mappings from the model to Linked Data entities (eg Freebase, DBpedia)

Planning

ACTION Alessandro: continue working on OWL model
ACTION Balthasar: look into using SPARQL 1.1 for the Pattern
ACTION both: (long term) find mappings between OWL model and LOD data-sets

Alessandro, Balthasar, Lora, Valentina 20101117

Exploration of patterns:

Guus suggested that the implicit aggregation instantiated in the MoE pattern could be abstracted, thus augmenting the power of the pattern itself wrt recommendation. This abstraction could allow us to explore more meronomies such as those constructed in Geonames.

We should not rule out that additional relations might be added to increase recall or to find alternate paths to follow in disharmonic LOD datasets. One such example is the MoE pattern: as instantiated in Freebase for the movie "The Social Network", we can also reach the founder Mark Zuckerberg through the depiction of the "fictional" Mark Zuckerberg.

Further gains may come from substituting the membership relation with others, such as participation in events. For the ST pattern, this could be the recording event of the soundtrack. For the MoE pattern, the ensemble could be ported to an event, e.g. making it possible to recommend WWII movies even when their relationship with WWII is not explicit, because they depict figures who were all involved in WWII (e.g. Churchill, Eisenhower, Hitler and De Gaulle).

Action Point (A, B): explore the presence in LOD of relations that express:
1. role-of-objects
2. participation-to-events
3. about-ness
4. different types of aggregations

- A possible distribution of efforts can be that Balthasar fetches relationships holding between entities of interest and Alessandro aggregates those related to these four patterns and maps them.
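A minimal sketch of the proposed division of effort, assuming the fetched relationships arrive as (subject, property, object) triples and that the alignment of properties to the four families is kept as a simple lookup table. The property names and their family assignments are hypothetical; in practice the alignment would be done manually.

```python
from collections import Counter

# Hypothetical alignment of fetched dataset properties to the four relation
# families in the action point; the property names are illustrative only.
FAMILY_OF = {
    "ex:director": "role-of-objects",
    "ex:bassPlayer": "role-of-objects",
    "ex:recordedAt": "participation-to-events",
    "ex:subject": "about-ness",
    "ex:memberOf": "aggregation",
    "ex:partOf": "aggregation",
}


def tally_families(triples):
    """Count fetched triples per relation family; properties without an
    alignment are counted under 'unmapped' for manual review."""
    return Counter(FAMILY_OF.get(p, "unmapped") for _, p, _ in triples)


fetched = [
    ("ex:film1", "ex:director", "ex:person1"),
    ("ex:person2", "ex:memberOf", "ex:band1"),
    ("ex:city1", "ex:partOf", "ex:region1"),
    ("ex:film1", "ex:language", "ex:lang1"),
]
print(tally_families(fetched))
```

Keeping an explicit "unmapped" bucket makes it easy to see which fetched properties still need a manual decision.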

- Note about the research method we are following: we analyse LOD to extract patterns and generalize them to encompass different vocabularies (bottom-up), and we take general content patterns (top-down) and check whether they are represented in LOD.

find variations of possible ways of traversing/instantiating a pattern

make patterns more generic with different instantiations, eg: an aggregation relation such as the ensemble-person relation in the Ensemble pattern is analogous to the region-location relation in Geonames
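The generalization can be sketched as a single function parameterized by the membership property: the same code finds co-members of an ensemble and co-located places in a region. The ex: names are placeholders (not the actual Freebase or Geonames terms), and only direct membership is handled, so transitive part-whole chains such as those in Geonames would need an additional closure step.

```python
# Minimal sketch of one generic aggregation pattern instantiated with two
# vocabularies: only the membership property changes. Names are placeholders,
# not the actual Freebase or Geonames vocabulary; only direct (one-hop)
# membership is handled, so transitive part-whole chains are not followed.
TRIPLES = [
    ("ex:PersonA", "ex:memberOf", "ex:Ensemble1"),
    ("ex:PersonB", "ex:memberOf", "ex:Ensemble1"),
    ("ex:PersonC", "ex:memberOf", "ex:Ensemble2"),
    ("ex:Florence", "ex:locatedIn", "ex:Tuscany"),
    ("ex:Siena", "ex:locatedIn", "ex:Tuscany"),
]


def co_members(triples, member_prop, entity):
    """Entities that share at least one aggregate with `entity` via member_prop."""
    wholes = {o for s, p, o in triples if s == entity and p == member_prop}
    return {s for s, p, o in triples
            if p == member_prop and o in wholes and s != entity}


print(co_members(TRIPLES, "ex:memberOf", "ex:PersonA"))    # ensemble-person
print(co_members(TRIPLES, "ex:locatedIn", "ex:Florence"))  # region-location
```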

@@ Line 26: / Line 26: @@
 = Research Questions =
 == Research Questions and Objectives ==
-Hypothesis: The use of Knolwedge Patterns (KPs) improves the user-interaction experience (when searching for relevant content) - the relevance of recommended content increases if its selection is based on KPs
-Knowledge Patterns embed the most important relations for describing a relevant piece of knowledge in a certain domain. They are - for knowledge representation - the analogous of frames in linguistics, and schemata in cognitive science [http://stlab.istc.cnr.it/documents/papers/KP_SWJ.pdf (cf. Gangemi and Presutti, 2010)]. This hypothesis is based on the assumption that each pattern conveys what a user would expect to find, the most relevant knowledge about a certain entity and in a certain context.
-'''Cognitive relevance of patterns -> the pattern includes the most relevant relations about something -> it allows to generate good explanations associated with the recommended content
-'''
 === Research questions (to be completed) ===
@@ Line 52: / Line 45: @@
 * To improve the experience for end users
-== Hypotheses ==
+=== Hypotheses ===
+Hypothesis: The use of Knowledge Patterns (KPs) improves the user-interaction experience (when searching for relevant content) - the relevance of recommended content increases if its selection is based on KPs
+Knowledge Patterns embed the most important relations for describing a relevant piece of knowledge in a certain domain. For knowledge representation, they are the analogue of frames in linguistics and of schemata in cognitive science [http://stlab.istc.cnr.it/documents/papers/KP_SWJ.pdf (cf. Gangemi and Presutti, 2010)]. This hypothesis is based on the assumption that each pattern conveys what a user would expect to find, i.e. the most relevant knowledge about a certain entity in a certain context.
+'''Cognitive relevance of patterns -> the pattern includes the most relevant relations about something -> it allows us to generate good explanations associated with the recommended content
+'''
 = Overview =
@@ Line 211: / Line 209: @@
 == Resources ==
+* [[LOD:KCAP analysis|Analysis of LinkedMDB and Jamendo/DBTunes for KCAP]]
 * [[LOD:Scenarios|Scenarios and examples]] of Linked Data usage for recommendation pathfinding. These include examples of knowledge/content pattern extraction for generalising such paths.
 * [[LOD:ontology statistics|Analysis and stats on Linked Data]]
@@ Line 236: / Line 235: @@
 * '''Soundtrack''' pattern
 ** Top-down OWL model : [http://ontologydesignpatterns.org/cp/owl/Soundtrack.owl Soundtrack.owl] (musical artist details to be added)
+** Essential OWL model : [http://ontologydesignpatterns.org/cp/owl/SoundtrackMinimal.owl SoundtrackMinimal.owl]
 * '''Members of Ensemble''' pattern
 ** TBD
@@ Line 306: / Line 306: @@
 = Minutes =
-== F2F Minutes ==
+=== Valentina, Lora, Guus, Balthasar 20110203 ===
-=== Guus Schreiber, Rome, 20100923 ===
+* possible interesting patterns:
+** '''groups''': can be detected in patterns as ?x foaf:made ?y, ?y foaf:maker ?x, ?x != ?y
+** '''part-of''', eg  _ mo:Record ?x, ?x mo:Track ?y -> ?y is part of ?x
+* assign 1 of 3 categories to properties? As in another project:
+** biographical: describes what someone did
+** material: in this case where the mp3 can be found or what format a song is in (dc:format)
+** symbiotic: the meaning of something, eg tags
+* some paths can contain duplicate entities / redundancy, eg ?x foaf:made ?y, ?y foaf:maker ?x. We might not want to repeat the SPARQL queries due to time restrictions, but we could make an estimate of the redundancy.
+=== Balthasar, Valentina, Lora 20110113 ===
+* going from mappings to patterns:
+** link properties in data-set to existing patterns
+** manually discover and define new patterns in the process
+** use complex mappings to give a homogeneous view on data that can be represented in several ways, eg
+ ?p isActor ?m <=> ?p performance ?z, ?z hasRole actor, ?z inMovie ?m
+* see how applicable a pattern is for a data-set by looking at #triples for a kind of property, etc
+* link properties back to pattern... possibly improve pattern, make it more applicable to data-set
+* related work: http://www.scharffe.fr/pub/phd-thesis/manuscript.pdf
+* format of defining mappings
+* we must still do step 3b
+=== Aldo, Alessandro, Balthasar, Valentina 20101222 ===
+Setup of meeting: almost F2F with Balthasar on skype and the rest at CNR.
+New content available at [[LOD:workinprogress]].
+Notes:
+* Notation: use a single prefix that encompasses the entire namespace before the fragment, e.g. lmdb:movie/ -> lmdb_movie:
+* If a type only partially covers the subjects of a property, e.g. only 64% of the subjects of lmdb_movie:language triples are of type lmdb_movie:film, then it is important to understand the reason for this partial coverage.
+** This could imply that different knowledge patterns are constructed for "overlapping" universes.
+* Summarize analysis results in a ready-to-use file.
+** Aldo: even an Excel file would do for now; an annotation schema in RDF could be created at a later stage for answering queries like "give me all the properties that have a domain coverage < 100%".
+* Partition results using the following rationale for partitions, reflecting their OWL equivalent:
+** data properties (range = rdfs:Literal, in turn typed or untyped)
+** object properties
+** annotation properties
+** links to external data-sets
+** mixed properties (range = typed + untyped literals)
+** rdf:type as a property per se
+* Check the soundness of the universes in this partition. Ranges might be untyped resources (rdfs:Resource) within the single "original" dataset, while their actual type is asserted in the "target" dataset that describes that resource. This can only be done manually at this stage.
+** For example, we see that <http://xmlns.com/foaf/0.1/based_near> has range rdfs:Resource, but all its values point to Geonames entities. By loading Geonames we could identify a more specific range
+* add a column that reports on the integrity of the property value (?)
+* untyped values could be a problem when reasoning with datasets, we have to keep this in mind
+* list names, locations, links to RDF dumps for each dataset
+* Valentina finally remarked that, since we did not make the expected progress within one week, a detailed report is needed by January 5th (partly in a wiki page for reviewers, partly in the paper draft). Furthermore, she suggests also holding a telecon on January 6th (I know it's a holiday but I believe it's necessary): it seems to be the only day on which both Alessandro and Balthasar are available (?). I will be available to spend around one hour during the day for discussing the results.
+APs
+* Valentina: to send out the TeX sources of the paper draft (share via svn?)
+* Alessandro, Balthasar
+** list in the wiki all data-sets under analysis (associated with a link to a dump of them)
+** perform on such data-sets the whole method
+** report on the wiki the raw data and details about the executed procedure
+** include in the paper the results of, and discussion on, the analysis performed on the data-set according to the 6-step method
+=== Aldo, Alessandro, Balthasar, Lora, Valentina 20101215 ===
+Setup of meeting: almost F2F with Alessandro on skype and the rest at the VU
+Agenda:
+* LOD analysis
+** to what extent and how patterns are represented
+** procedure \ methods \ criteria \ tools
+* plan
+* action points
+Notes:
+* Alessandro: starting from domain specific data-sets (eg MusicBrainz) seems more useful than generic knowledge data-sets (dbpedia)
+** because the domain specific data-sets contain owl:sameAs links to generic data-sets
+* general purpose data-sets:
+** YAGO
+** DBPedia
+** Freebase (though less than the others) ---> NOT LINKED
+* domain specific datasets:
+** DBtune/MusicBrainz --> Jamendo
+** LinkedMDB
+* bottom up approach: list all properties in data-sets and their frequencies
+* how about adding authorship? it usually fuses with roles in LOD (e.g. JohnCarpenter mdb:director EscapeFromNewYork)
+* interesting: different expression of relationships (with the same semantics)
+* Steps for analysis of Linked Data:
+* 1. List all properties per data source (frequency, URI)
+* 2. List all types used with those properties (i.e. the universe of properties (universe of a property = domain and range))
+* 3a. align properties to top-level properties (eg use ODP from DOLCE, DnS, ...)
+* 3b. align types to top-level types and group properties according to their universes (also check datatype properties)
+* 3c. Compare the result of the two types of grouping in 3a and 3b
+* 4. Provide statistics on the triples (frequencies of usage of properties)
+* 5. Discover paths of properties
+* 6. match to existing knowledge patterns, or discover new patterns
+* statistics:
+** a path is a sequence of relations that includes the property P (i.e. distinguish the paths by their degrees, e.g. degree 0 is the property P itself, degree 1 is a path of length 1 where the property P is in the starting point - in the middle)
+* Aldo: we don't make a new top-level ontology, but an OWL file, and map properties to ODPs if possible
+** possible alternative, link to dbpedia by default
+Action points for next telecon (Dec 22)
+* Create a wiki page for raw data (for reviewers) and use the method steps as structure of the page (it's also the source of what goes in section 5 of the paper)
+* Perform the method steps
+* Fill the wiki page and section 5 with results and data from the execution of the method
+=== Alessandro, Balthasar, Lora, Valentina, 20101201 ===
+* We are refining the notion of aggregation patterns to explore the LOD for the following subtypes:
+*# collections/collectives, generally intransitive, implying membership. The least useful for meronomies, but the most common to be expected;
+*# part-whole, transitive. Expected for geographical data and alignment with geonames;
+*# componency, nontransitive.
+* Mid-term objective is to aim towards a full paper submission at KCAP, due mid Feb.
+[[Image:patternonwhiteboard.jpg]]
+* What to look for next:
+** to what extent and how are these patterns represented in LOD?
+** Ale and Balth to describe the approach they are using for analyzing LOD. Should be drafted as well
+** on which source do you start? are you doing it manually/automatically?
+** in either case, what tools are you using?
+=== Rome, 20100923 ===
 sample pattern: location -> subject class (e.g. a fish species) -<=> location linked to similar subject class
@@ Line 328: / Line 450: @@
 * Domain-specific types are more specific.?
-=== Balthasar Schopman, Rome, 20100923 ===
+=== Rome, 20100923 ===
 knowledge pattern = general pattern that models knowledge
@@ Line 346: / Line 468: @@
 Discussion: what kind of restrictions should a pattern contain? E.g. both node types and specific properties, or only node types (being indifferent to the properties)
-=== all, Rome, 20100924 ===
+=== Rome, 20100924 ===
 We can use 'tricks' to exploit the semantics, even within a single corpus. For example, in LinkedMDB there are two roles for Mel Brooks, ie that of [http://data.linkedmdb.org/page/actor/29583 Actor] and [http://data.linkedmdb.org/page/director/8458 Director], which are both linked to the [http://www.freebase.com/view/guid/9202a8c04000641f80000000000289f2 Freebase URI] using the foaf:page property. This Freebase URI can be used to go to other corpora:
 <http://rdf.freebase.com/ns/en.mel_brooks> <http://www.w3.org/2002/07/owl#sameAs> <http://data.nytimes.com/N17739970876672888293> .
@@ Line 352: / Line 474: @@
 <http://rdf.freebase.com/ns/en.mel_brooks> <http://www.w3.org/2002/07/owl#sameAs> <http://www.bbc.co.uk/music/artists/29a62e70-a15b-4e58-a39d-377f0443eb2c#artist> .
-=== Lora, Rome, 20100927 ===
+=== Rome, 20100927 ===
 * Action: relate each navigation pattern to a weight, which can express the suitability of the pattern for a given user profile, user context or other context/statistical restrictions
@@ Line 407: / Line 529: @@
 (III) target the first study in about 4 months
-=== Balthasar, Rome, 20100927 ===
+=== Rome, 20100927 ===
 Knowledge patterns:
 * composition between entities, eg between People and Companies by relation 'founder'
@@ Line 416: / Line 538: @@
 * generalization of this composition: member-relation
-=== all, Rome, 20100928 ===
+=== Rome, 20100928 ===
 * discussing the use case: Anthrax
 ** media item --> music
@@ Line 438: / Line 560: @@
 <!--[[Image:Pattern.png]]-->
-=== all, Rome, 20100929 ===
+=== Rome, 20100929 ===
 * we want to derive several versions of content patterns based on the pattern sketched yesterday, eg one with a media entity related to an organization and related to an event, that we can use to instantiate paths in LOD
 ** (we don't aim to make a pattern that's as generic as possible)
-=== Valentina, Rome, 20100930===
+=== Rome, 20100930===
 * Next steps
 ** action: balthasar will look at LOD examples that fill the CP in order to fix the CP
@@ Line 460: / Line 582: @@
 ** user-based validation: experiment design for validating and tuning candidate navigational patterns, and for identifying new navigational patterns
-=== Alessandro, Balthasar, Lora, virtual, 20101020 ===
+=== Alessandro, Balthasar, Lora, 20101020 ===
 * The Prolog-like representation schema used by Balthasar is fine for the time being, provided that an instance example is provided for each pattern (in # comments).
@@ Line 481: / Line 603: @@
 * BibSonomy group '''lodpatterns''' has been created at [http://www.bibsonomy.org/group/lodpatterns]. This group should mirror the References section of this Wiki as faithfully as possible. To join, either send your BibSonomy username to Alessandro or log in to BibSonomy.org, browse [http://www.bibsonomy.org/groups] for "lodpatterns" and click the Join button (admin approval is required).
-=== Alessandro, Balthasar, Lora, Valentina, virtual, 20101027 ===
+=== Alessandro, Balthasar, Lora, Valentina, 20101027 ===
 * Recap on the chosen knowledge patterns and comment on their representation
@@ Line 507: / Line 629: @@
-=== Alessandro, Balthasar, Valentina, virtual, 20101103 ===
+=== Alessandro, Balthasar, Valentina, 20101103 ===
 ==== knowledge patterns (KP) ====
@@ Line 557: / Line 679: @@
 ** 1 OWL file containing the model and mappings to Ontology Design Patterns
 ** 1 Turtle file containing mappings from the model to Linked Data entities (eg Freebase, DBpedia)
 Planning
@@ Line 563: / Line 684: @@
 * ACTION Balthasar: look into using SPARQL 1.1 for the Pattern
 * ACTION both: (long term) find mappings between OWL model and LOD data-sets
+=== Alessandro, Balthasar, Lora, Valentina 20101117 ===
+Exploration of patterns:
+* Guus suggested that the implicit aggregation instantiated in the MoE pattern could be abstracted, thus augmenting the power of the pattern itself wrt recommendation. This abstraction could allow us to explore more meronomies such as those constructed in Geonames.
+* We should not rule out that additional relations might be added to increase recall or to find alternate paths to follow in disharmonic LOD datasets. One such example is the MoE pattern: as instantiated in Freebase for the movie "The Social Network", we can also reach the founder Mark Zuckerberg through the depiction of the "fictional" Mark Zuckerberg.
+* Further gains may come from substituting the membership relation with others, such as participation in events. For the ST pattern, this could be the recording event of the soundtrack. For the MoE pattern, the ensemble could be ported to an event, e.g. making it possible to recommend WWII movies even when their relationship with WWII is not explicit, because they depict figures who were all involved in WWII (e.g. Churchill, Eisenhower, Hitler and De Gaulle).
+* Action Point (A, B): explore the presence in LOD of relations that express:
+*# role-of-objects
+*# participation-to-events
+*# about-ness
+*# different types of aggregations
+** A possible distribution of efforts can be that Balthasar fetches relationships holding between entities of interest and Alessandro aggregates those related to these four patterns and maps them.
+** Note about the research method we are following: we analyse LOD to extract patterns and generalize them to encompass different vocabularies (bottom-up), and we take general content patterns (top-down) and check whether they are represented in LOD.
+* find variations of possible ways of traversing/instantiating a pattern
+* make patterns more generic with different instantiations, eg: an aggregation relation such as the ensemble-person relation in the Ensemble pattern is analogous to the region-location relation in Geonames
 = Misc results =
@@ Line 571: / Line 716: @@
 ** presentation of interesting facts
 * the IMDB trivia do require some semantification of the plain text, but that may be worth the effort. Example [http://www.imdb.com/title/tt0084787/trivia The Thing] and its original movie
+= Method =
+== Method ==
+* Steps for analysis of Linked Data:
+* 1. List all properties per data source (frequency, URI)
+* 2. List all types used with those properties (i.e. the universe of properties (universe of a property = domain and range))
+* 3a. align properties to top-level properties (eg use ODP from DOLCE, DnS, ...)
+* 3b. align types to top-level types and group properties according to their universes (also check datatype properties)
+* 3c. Compare the result of the two types of grouping in 3a and 3b
+* 4. Provide statistics on the triples (frequencies of usage of properties)
+* 5. Discover paths of properties
+* 6. match to existing knowledge patterns, or discover new patterns
+= Analysis =
+== Analysis ==
+[[LOD:KCAP_analysis]]
+[[LOD:KCAP_analysis_discussion_pictures]]
 <headertabs/>
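Steps 1, 2 and 4 of the analysis method (property frequencies and the universe of each property) can be sketched in Python over an in-memory list of triples; on a real data-set the same figures would come from SPARQL aggregate queries or from processing a dump. Only the domain side of the universe is computed, and the data below is hypothetical. The coverage value is the kind of figure discussed in the 20101222 minutes (e.g. only 64% of the subjects of lmdb_movie:language being lmdb_movie:film).

```python
from collections import Counter, defaultdict

RDF_TYPE = "rdf:type"


def property_stats(triples):
    """Steps 1 and 4: usage frequency of every property (rdf:type included).
    Step 2, domain side only: for each other property, the share of its
    distinct subjects that carry each rdf:type value."""
    freq = Counter(p for _, p, _ in triples)
    types = defaultdict(set)
    subjects = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
        else:
            subjects[p].add(s)
    coverage = {}
    for prop, subs in subjects.items():
        counts = Counter(t for s in subs for t in types[s])
        coverage[prop] = {t: n / len(subs) for t, n in counts.items()}
    return freq, coverage


# Toy data-set (hypothetical): two of the four subjects of ex:language are typed.
triples = [
    ("ex:m1", RDF_TYPE, "ex:Film"),
    ("ex:m2", RDF_TYPE, "ex:Film"),
    ("ex:m1", "ex:language", "ex:English"),
    ("ex:m2", "ex:language", "ex:Italian"),
    ("ex:s1", "ex:language", "ex:French"),
    ("ex:s2", "ex:language", "ex:English"),
]
freq, coverage = property_stats(triples)
print(freq.most_common())
print(coverage)  # {'ex:language': {'ex:Film': 0.5}}
```

A coverage below 1.0, as here, is the signal that the untyped subjects should be inspected manually before the property is grouped under a single universe.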


