LOD:PatternMeetup
From STLab
All experiments, findings and what have you will fill this nice page.
Note: by default, this page and all pages prefixed with the LOD namespace are public.
- Introduction
- Research Questions
- Overview
- Terminology
- Use cases
- Patterns
- Resources
- Formalization
- Minutes
- Misc results
- Method
- Analysis
Pattern finding
Linked Data
Several Linked Data corpora can be used in conjunction. A corpus can have a broad or narrow domain, eg:
- General knowledge (Freebase, DBpedia)
- Geographical knowledge (Geonames)
- Movies (LinkedMDB)
- People (FOAF, GTAA)
- Lexicon (WordNet)
- Subjects (GTAA)
- Music (DBTune)
Patterns
We study the usage of data in several repositories in order to determine statistically and semantically relevant patterns. Statistically relevant, because we want to use common patterns, as opposed to properties that are only used once. Semantically relevant, because we aim to find interesting relations between entities.
First we define several patterns manually. Ideally we'll eventually find an automatic method to find interesting patterns, given the huge amount of Linked Data.
It remains unanswered what the archetype of a pattern looks like. For example, does it only specify the type of the nodes, or does it also restrict the relations between the nodes? We will start with making the patterns specific and study the effects of generalizing the patterns.
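The difference between a specific and a generalized pattern can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `Edge`, `PatternStep` and `matches`, and the short property names, are hypothetical and not part of any implementation discussed here. A step that restricts only the node types matches more edges than one that also fixes the property.

```python
from typing import NamedTuple, Optional


class Edge(NamedTuple):
    """A typed edge observed in the data (hypothetical representation)."""
    subject_type: str
    prop: str
    object_type: str


class PatternStep(NamedTuple):
    """One step of a pattern; prop=None means 'any property' (generalized)."""
    subject_type: str
    prop: Optional[str]
    object_type: str


def matches(step: PatternStep, edge: Edge) -> bool:
    """A step matches if the node types agree and the property, if given, agrees."""
    return (
        step.subject_type == edge.subject_type
        and step.object_type == edge.object_type
        and (step.prop is None or step.prop == edge.prop)
    )


# Illustrative edges only; the property names are shortened for readability.
edges = [
    Edge("Film", "directed_by", "Person"),
    Edge("Film", "starring", "Person"),
    Edge("Film", "genre", "Genre"),
]

specific = PatternStep("Film", "directed_by", "Person")  # types + property
general = PatternStep("Film", None, "Person")            # types only

specific_hits = [e for e in edges if matches(specific, e)]
general_hits = [e for e in edges if matches(general, e)]
```

Comparing the sizes of `specific_hits` and `general_hits` over a real corpus would be one way to study the effect of generalizing a pattern.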
Research Questions and Objectives
Research questions (to be completed)
- What criteria do we use for selecting the patterns?
- What criteria do we use for identifying the navigation patterns?
- How do we define 'relevance'?
- Do we assign weights to the single relations, to a single whole navigation pattern, or both?
- What is the relation between the weight of a navigation pattern and the weights of its composing relations?
- How do we compute weights?
- What are the goals for each of the patterns -> to find related entities
- Evaluation scheme for the patterns: how do we evaluate that they are good for that goal?
- Can we use cognitive principles together with experimental observations for evaluating the patterns?
- How do we generate explanations?
Objectives (to be completed)
- To increase the relevance value of results to end users
- To identify criteria for selecting the best patterns for searching (exploring) a certain domain
- To measure the cognitive relevance of patterns
- To improve the experience for end users
Hypotheses
Hypothesis: The use of Knowledge Patterns (KPs) improves the user-interaction experience (when searching for relevant content) - the relevance of recommended content increases if its selection is based on KPs.
Knowledge Patterns embed the most important relations for describing a relevant piece of knowledge in a certain domain. They are - for knowledge representation - the analogue of frames in linguistics, and schemata in cognitive science (cf. Gangemi and Presutti, 2010). This hypothesis is based on the assumption that each pattern conveys what a user would expect to find: the most relevant knowledge about a certain entity in a certain context.
Cognitive relevance of patterns -> the pattern includes the most relevant relations about something -> it allows good explanations to be generated for the recommended content
Knowledge Patterns
We designed two knowledge patterns by brainstorming and looking at existing LOD entities. We haven't been able to find all semantic relations needed for the first example in the Linked Data cloud, but these are valid patterns for which other instantiations may exist.
Soundtrack Pattern
Named as such, because it links a media entity through its soundtrack to interests of the user.
Members of Ensemble Pattern
Named as such, because it links to user interests through individuals that are members of an ensemble. The term ensemble is used here, because it is more generic than organization and band, and can be used for both.
Legend
Patterns
- knowledge pattern = general pattern that models knowledge, aka content pattern.
- navigation pattern = archetype of path in LOD, a class of recurring paths that can be (potentially) followed within one or more knowledge patterns.
Unless stated otherwise, by pattern we mean navigation pattern.
Data
- corpus : the term here is used for referring to Linked Data sets. Text corpora will explicitly be referred to as such.
Use case : TV Recommendation
Scope
The goal is to find interesting patterns in Linked Data between a User Profile (UP) and a number of TV broadcasts. The UP and TV broadcast metadata are considered a given, the focus is on the process of path finding in Linked Data.
The UP is a set of weighted interests, ie topics of interest (URIs) which are assigned a weight between 0 and 1. A higher weight implies a more relevant interest.
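A UP of this shape can be sketched as a plain mapping from interest URI to weight. The function names and the example weights below are assumptions for illustration only; the two URIs are DBpedia resources that appear elsewhere on this page.

```python
def make_profile(interests):
    """Validate a mapping of interest URI -> weight, where 0 <= weight <= 1."""
    for uri, weight in interests.items():
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight for {uri} must be between 0 and 1, got {weight}")
    return dict(interests)


def ranked_interests(profile):
    """Interest URIs, most relevant first (ties broken alphabetically)."""
    return sorted(profile, key=lambda uri: (-profile[uri], uri))


# Hypothetical weights, chosen only to show the ordering.
profile = make_profile({
    "http://dbpedia.org/resource/The_Social_Network": 0.4,
    "http://dbpedia.org/resource/La_Haine": 0.9,
})
top_interest = ranked_interests(profile)[0]
```

The ranked list could serve as the order in which start nodes are tried during path finding, but that is a design choice and not something decided here.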
The TV broadcast metadata is a TV-Anytime description of a TV broadcast, which provides useful information about the broadcast. For example, it identifies the subject(s) of the broadcast, people involved and their roles, and links to semantic repositories that give more information on the broadcast, eg dbpedia.
Recommendation
Content-based recommendations are expected to be good to recommend items from the long tail. The widely used collaborative filtering recommendation strategy is very efficient and effective in finding recommendations of items that many people consume, ie of which sufficient taste data is available. Of items in the long tail there is not much taste data available, rendering content-based recommendation a more viable strategy to recommend those items.
Another advantage of content-based recommendation, especially using Linked Data, is the ability to explain the recommendation. A semantic path that led to the recommendation can be presented to the user, which serves several purposes:
- the user possibly learns something interesting
- the explicit reason of the recommendation can have a positive influence on the usage of the system. It is known that when a user doesn't understand a recommendation and disapproves of it, the user often abandons the system. Explanations of recommendations can prevent this negative influence.
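How a semantic path could be turned into an explanation can be sketched as follows. This is a minimal illustration, assuming a path is a list of (subject, relation, object) hops with human-readable labels; the function name `explain` and the relation labels are hypothetical.

```python
def explain(path):
    """Render a connected path of (subject, relation, object) hops as one sentence."""
    if not path:
        return ""
    # Each hop must start where the previous hop ended.
    for (_, _, obj), (subj, _, _) in zip(path, path[1:]):
        if obj != subj:
            raise ValueError("path is not connected")
    parts = [f"{s} {r} {o}" for s, r, o in path]
    return "Recommended because " + ", and ".join(parts) + "."


# Labels are illustrative; the entities are the ones used in the Freebase examples on this page.
path = [
    ("Quake", "has soundtrack by", "Trent Reznor"),
    ("Trent Reznor", "contributed music to", "The Social Network"),
]
explanation = explain(path)
```

A real explanation generator would need proper labels for properties and entities, which is part of the open question on generating explanations.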
Use case : News Recommendation
A second use case broadens the range of (rendered) content item types available for recommendation, whilst narrowing that of their intrinsic nature, or genre.
Scope
As with the TV recommendation use case, the goal is LOD-based, user-centered pattern discovery for content recommendation.
The range of acceptable content types in this use case encompasses pieces of news, regardless whether they are rendered as TV or radio broadcasts or text articles. The User Profile (UP) is modelled in the same fashion as in the previous use case.
The annotation schema and metadata vocabularies for news are as yet unidentified. Retrieving an existing optimal schema is one of the tasks for this use case, although some degree of overlap with the TV-Anytime schema can reasonably be expected.
Recommendation
As opposed to the TV use case, we cannot completely rely upon long-tail items for news items, which are mostly designed for large-scale consumption. However, information content about niche topics and themes can be treated as a subset of recommendation with a greater weight.
Relaxing the restriction on TV broadcasts remarkably increases the range of "good candidates" for recommendation. However, restricting to news content is not expected to hamper precision significantly, since we can reasonably expect news content to be appropriately classified as such. Therefore, the first assumption is expected to hold at the expense of recall.
Knowledge patterns
Soundtrack Pattern
Atomic knowledge patterns
- Role KP specializations
- in music-performing entities (bass player)
- in organizations (founder)
- in media item authoring (director)
- other classification KPs
- interest
- Contribution KP specializations
- soundtrack (has music by). This may also indirectly result from the application of an authorship pattern to a composition pattern.
- Adaptation KP specializations
- in movies (remake of)
- Composition KP
- implicit usage of a componency pattern for Soundtrack as a component of audiovisual media.
Members of Ensemble Pattern
Atomic knowledge patterns
- Ensemble? (organization). Can be a highly specialized collection pattern.
- Role KP specializations
- Constituting role (founder). This can be seen as a combination of authorship and the plain agent/role pattern.
- Participation with a role (credit,role+dc:title combination)
- Topic KP specializations
- Generic subject of a work (subject, but also a mention in a New York Times article).
Syntax
I (Balthasar) am using semi-Prolog syntax here, since it seems natural and easy to port to my implementation. Comments on a line start with a hash (#). Variables start with a capital, eg 'Movie' is a variable, movie is not a variable.
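A reader for this notation could look roughly like the sketch below. It is not Balthasar's implementation; the function names are hypothetical, and it assumes that a pattern line has exactly three whitespace-separated tokens and that a comment occupies a whole line (a URI may itself contain a hash, so nothing after the first token is treated as a comment).

```python
def is_variable(token):
    """Variables start with a capital letter, eg 'Movie'; 'movie' is a constant."""
    return token[:1].isupper()


def parse_pattern_line(line):
    """Return (subject, property, object), or None for blank and comment lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    tokens = stripped.split()
    if len(tokens) != 3:
        raise ValueError(f"expected 3 tokens, got {len(tokens)}: {line!r}")
    return tuple(tokens)


def variables(triple):
    """The variable tokens of a parsed pattern line, in order."""
    return [t for t in triple if is_variable(t)]


parsed = parse_pattern_line("Film http://dbpedia.org/ontology/director Person")
```

Tokens such as `?soundtrack_property` do not start with a capital and are therefore not recognised as variables by this sketch; they would need their own rule.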
Method
Trying to get navigation patterns from existing examples.
Generic Pattern Design
Observations
- Some properties specify direct relations between a movie and person, while others go from movie to performance to person. However, these are equally important movie-person patterns. We should be conscious of this for the implementation.
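The observation above can be made concrete with a small sketch that collects movie-person pairs from both kinds of paths. The property URIs are the ones listed on this page; the function name and the `ex:` identifiers are hypothetical placeholders.

```python
FB = "http://rdf.freebase.com/ns/"

# Properties that link a film directly to a person.
DIRECT = {FB + "film.film.directed_by", "http://dbpedia.org/ontology/starring"}
# Two-step path: film -> performance -> person.
FILM_TO_PERFORMANCE = FB + "film.film.starring"
PERFORMANCE_TO_PERSON = FB + "film.performance.actor"


def film_person_pairs(triples):
    """Collect (film, person) pairs from direct and performance-mediated paths."""
    pairs = set()
    films_of_performance = {}
    for s, p, o in triples:
        if p in DIRECT:
            pairs.add((s, o))
        elif p == FILM_TO_PERFORMANCE:
            films_of_performance.setdefault(o, set()).add(s)
    for s, p, o in triples:
        if p == PERFORMANCE_TO_PERSON:
            for film in films_of_performance.get(s, ()):
                pairs.add((film, o))
    return pairs


# Placeholder identifiers, for illustration only.
triples = [
    ("ex:la_haine", FB + "film.film.directed_by", "ex:kassovitz"),
    ("ex:la_haine", FB + "film.film.starring", "ex:performance_1"),
    ("ex:performance_1", FB + "film.performance.actor", "ex:cassel"),
]
pairs = film_person_pairs(triples)
```

Both pairs end up in the same set, so later steps can treat them as the same kind of movie-person relation regardless of how the source data expresses it.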
Navigation patterns
# most can be found in http://www.freebase.com/view/en/la_haine and http://dbpedia.org/resource/La_Haine
Film http://rdf.freebase.com/ns/film.film.cinematography Person
Film http://rdf.freebase.com/ns/film.film.edited_by Person
http://rdf.freebase.com/ns/film.film.genre
Film http://rdf.freebase.com/ns/film.film.directed_by Person
Film http://dbpedia.org/ontology/director Person
Film http://dbpedia.org/ontology/starring Person
Film http://rdf.freebase.com/ns/film.film.starring Performance
Performance http://rdf.freebase.com/ns/film.performance.film Film
Performance http://rdf.freebase.com/ns/film.performance.actor Person
Film http://rdf.freebase.com/ns/film.film.written_by Person
# MPE = Music Producing Entity
Film http://dbpedia.org/ontology/musicComposer MPE
Film http://dbpedia.org/property/music MPE
MPE http://rdf.freebase.com/ns/film.music_contributor.film Film
# <http://rdf.freebase.com/ns/en.air> <http://rdf.freebase.com/ns/film.music_contributor.film> <http://rdf.freebase.com/ns/en.the_virgin_suicides_2000> .
Navigation pattern on soundtrack contribution across media items
# From video game to soundtrack (not in Freebase, DBPedia perhaps? Not working now)
VG ?soundtrack_property OST
# <http://rdf.freebase.com/rdf/en.quake> ?soundtrack_property <http://rdf.freebase.com/rdf/m.01hh4gv>
# For recommending the soundtrack itself
OST http://rdf.freebase.com/ns/music.album.primary_release Release
# <http://rdf.freebase.com/rdf/m.01hh4gv> http://rdf.freebase.com/ns/music.album.primary_release <http://rdf.freebase.com/rdf/m.03694ry>
OST http://rdf.freebase.com/ns/music.album.artist MPE
# <http://rdf.freebase.com/rdf/m.01hh4gv> http://rdf.freebase.com/ns/music.album.artist <http://rdf.freebase.com/ns/en.trent.reznor>
# WARN: no inverse property seems instantiated in Freebase
MPE http://rdf.freebase.com/ns/film.music_contributor.film Film
# <http://rdf.freebase.com/ns/en.trent.reznor> <http://rdf.freebase.com/ns/film.music_contributor.film> <http://rdf.freebase.com/rdf/en.the_social_network>
# Required for "escaping" Freebase at any time (e.g. whenever Freebase provides insufficient information)
Film owl:sameAs Film
# <http://rdf.freebase.com/rdf/en.the_social_network> owl:sameAs <http://dbpedia.org/resource/The_Social_Network>
VG http://rdf.freebase.com/ns/base.ontologies.ontology_instance_mapping Mapping
# <http://rdf.freebase.com/rdf/en.quake> http://rdf.freebase.com/ns/base.ontologies.ontology_instance_mapping Mapping <http://rdf.freebase.com/ns/m.07nfq9h>
Resources
- Analysis of LinkedMDB and Jamendo/DBTunes for KCAP
- Scenarios and examples of Linked Data usage for recommendation pathfinding. These include examples of knowledge/content pattern extraction for generalising such paths.
- Analysis and stats on Linked Data
General remarks about resources
- linkedMDB has many links between movies and people, but no 'fun facts', such as the trivia in IMDB (eg the reference of The Thing to its original)
- the best resources of media are user sourced, ie DBpedia and Freebase.
Freebase vs. DBpedia:
- Freebase (FB) can be considered superior to DBpedia (DBp), since
- FB is constantly updated by users, DBp only 4 times per year
- FB has a better hierarchy, eg the main DBp ontology has on the same level the concepts Place, Planet, Protein, Sales, Species and SupremeCourtOfTheUnitedStatesCase. Also, Rob_Zombie has both 'is dbpedia-owl:director of' and 'is dbpprop:director of' relations to movies
- BUT: FB has a flattened property hierarchy, eg Mark Zuckerberg is both en.founder_and_ceo and organizations_founded of Facebook (both properties are on the same level)
- BUT: FB does not adopt SW standards as much as other ontologies. FB seems to have a different culture, eg it doesn't adopt foaf, skos or any other generally applicable ontologies, unlike DBp which uses both skos and foaf
Bibliography
- Aldo Gangemi and Valentina Presutti, Ontology Design Patterns, in Handbook on Ontologies, 2nd edition, Springer, 2009
- Aldo Gangemi and Valentina Presutti, Towards a Pattern Science for the Semantic Web, 2010
Formalization
(increased the level of indentation, so we can edit this whole tab at once)
Model
- Soundtrack pattern
- Top-down OWL model : Soundtrack.owl (musical artist details to be added)
- Essential OWL model : SoundtrackMinimal.owl
- Members of Ensemble pattern
- TBD
Mappings
@prefix : <http://example.org/NoTubeCNR> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix fb: <http://rdf.freebase.com/ns/> .
@prefix dbp: <http://dbpedia.org/ontology/> .
# issue: not all properties are used in the same way across different data-sets. For example, foaf:page is used in LinkedMDB to link the entities to the IMDB URI.
# issue: not all relations can be captured in a single triple, eg
# note: freebase properties are bidirectional
:mapped skos:exactMatch skos:exactMatch .
:sameAs :mapped owl:sameAs .
:sameAs :mapped skos:exactMatch . #-> also skos:closeMatch
:sameAs :mapped foaf:page .
:sameAs :mapped fb:base.ontologies.ontology_instance_mapping .
# Film -> MPE
:hasRole :mapped dbp:musicComposer .
:hasRole :mapped dbp:music .
:hasRole :mapped fb:film.music_contributor.film .
# note: the following mappings are not used in one of the two existing patterns
:hasRole :mapped fb:film.film.edited_by .
:hasRole :mapped fb:film.film.cinematography .
:hasRole :mapped fb:directed_by .
:hasRole :mapped dbp:director .
:hasRole :mapped dbp:starring .
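One possible way to apply such mappings is to rewrite source properties to the abstract property they are mapped to. The sketch below assumes prefixed names as plain strings and covers only a subset of the mappings above; the names `MAPPINGS` and `normalise` and the `ex:` identifiers are hypothetical. It ignores the direction of a property, which matters for mappings such as fb:film.music_contributor.film (MPE to Film) versus dbp:musicComposer (Film to MPE).

```python
# Source property -> abstract property (subset of the mappings listed above).
MAPPINGS = {
    "owl:sameAs": ":sameAs",
    "skos:exactMatch": ":sameAs",
    "foaf:page": ":sameAs",
    "fb:base.ontologies.ontology_instance_mapping": ":sameAs",
    "dbp:musicComposer": ":hasRole",
    "dbp:music": ":hasRole",
    "fb:film.music_contributor.film": ":hasRole",
}


def normalise(triples, mappings=MAPPINGS):
    """Rewrite mapped properties to their abstract property; drop unmapped triples."""
    return [(s, mappings[p], o) for s, p, o in triples if p in mappings]


# Placeholder identifiers, for illustration only.
triples = [
    ("ex:film", "dbp:musicComposer", "ex:mpe"),
    ("ex:film", "owl:sameAs", "ex:film_in_other_corpus"),
    ("ex:song", "dc:format", "mp3"),
]
normalised = normalise(triples)
```

After normalisation, a pattern only has to mention :hasRole and :sameAs instead of every data-set-specific property.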
Patterns
Using the SPARQL 1.1 spec
Soundtrack
PREFIX : <http://example.org/NoTubeCNR>
SELECT DISTINCT * WHERE {
  ?item :sameAs* / ( (^rdfs:subClassOf)* / rdf:type ) :avItem ;
        :sameAs* / :hasMusicBy ?mpe .
  ?mpe ( (^rdfs:subClassOf)* / rdf:type ) :mpe .
  ?person :sameAs* / ( :hasRole | :hasPerformance/:hasRole ) ?mpe ;
          ( (^rdfs:subClassOf)* / rdf:type ) :person ;
          :sameAs* / :referencedIn ?item .
  OPTIONAL { ?item :sameAs* / :remakeOf ?originalItem }
}
- "?mpe" = music producing entity
- ":sameAs*" to enable using facts from multiple data-set (tried to reduce it to minimum)
- "?x (^rdfs:subClassOf)* / rdf:type ?t" means: ?x has type ?t, or any of the superclasses of ?x has type ?t
- in this query, ?item is of type :avItem (audiovisual item), because it has a soundtrack and therefore must be audiovisual
- we can use "OPTIONAL { ... }" to find interesting facts even though they're not vital for the pattern
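The effect of ":sameAs*" across data-sets can be approximated outside SPARQL by grouping URIs into equivalence classes. The sketch below uses a small union-find; it treats sameAs links as symmetric, which is an assumption that goes beyond the directed property path in the query. The function name is hypothetical; the URIs are taken from examples on this page.

```python
def same_as_classes(links):
    """Group URIs connected by sameAs links into sets (links treated as symmetric)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    classes = {}
    for node in list(parent):
        classes.setdefault(find(node), set()).add(node)
    return list(classes.values())


links = [
    ("http://rdf.freebase.com/ns/en.mel_brooks",
     "http://data.nytimes.com/N17739970876672888293"),
    ("http://rdf.freebase.com/ns/en.mel_brooks",
     "http://dbpedia.org/resource/Melvin_Kaminsky"),
    ("http://rdf.freebase.com/rdf/en.the_social_network",
     "http://dbpedia.org/resource/The_Social_Network"),
]
classes = same_as_classes(links)
```

Facts about any member of a class can then be looked up as if they were stated about every member, which is what the query relies on when it combines data-sets.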
Members of Ensemble
PREFIX : <http://example.org/NoTubeCNR>
SELECT DISTINCT * WHERE {
  ?item :sameAs* / ( (^rdfs:subClassOf)* / rdf:type ) :item ;
        :sameAs* / :isAbout ?ensemble .
  ?ensemble :sameAs* / ( (^rdfs:subClassOf)* / rdf:type ) :ensemble .
  ?person :sameAs* / ( (^rdfs:subClassOf)* / rdf:type ) :person ;
          :sameAs* / :hasRole ?ensemble .
  OPTIONAL { ?item :sameAs* / :remakeOf ?originalItem }
}
Valentina, Lora, Guus, Balthasar 20110203
- possible interesting patterns:
- groups: can be detected in patterns as ?x foaf:made ?y, ?y foaf:maker ?x, ?x != ?y
- part-of, eg _ mo:Record ?x, ?x mo:Track ?y -> ?y is part of ?x
- assign 1 of 3 categories to properties? As in another project:
- biographical: describes what someone did
- material: in this case where the mp3 can be found or what format a song is in (dc:format)
- symbiotic: the meaning of something, eg tags
- some paths can contain duplicate entities / redundancy, eg ?x foaf:made ?y, ?y foaf:maker ?x. We might not want to repeat the SPARQL queries due to time restrictions, but we could make an estimate of the redundancy.
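An estimate of this redundancy could be computed offline on a sample of triples, without repeating SPARQL queries. The sketch below assumes triples as (subject, property, object) tuples with prefixed names; the names `INVERSE_PAIRS` and `redundancy` and the sample identifiers are hypothetical.

```python
# Pairs of properties that state the same fact in opposite directions.
INVERSE_PAIRS = {("foaf:made", "foaf:maker")}


def redundancy(triples, inverse_pairs=INVERSE_PAIRS):
    """Fraction of distinct triples that only restate another triple via an inverse property."""
    inverse_of = {}
    for p, q in inverse_pairs:
        inverse_of[p] = q
        inverse_of[q] = p
    present = set(triples)
    counted = set()
    for s, p, o in present:
        q = inverse_of.get(p)
        # Count one triple per inverse pair, not both.
        if q and (o, q, s) in present and (o, q, s) not in counted:
            counted.add((s, p, o))
    return len(counted) / len(present) if present else 0.0


sample = [
    ("ex:artist", "foaf:made", "ex:record_1"),
    ("ex:record_1", "foaf:maker", "ex:artist"),
    ("ex:artist", "foaf:made", "ex:record_2"),
]
share = redundancy(sample)
```

In this sample one of the three triples is redundant, so the estimate is one third.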
Balthasar, Valentina, Lora 20110113
- going from mappings to patterns:
- link properties in data-set to existing patterns
- manually discover and define new patterns in the process
- use complex mappings to give a homogeneous view on data that can be represented in several ways, eg
?p isActor ?m <=> ?p performance ?z, ?z hasRole actor, ?z inMovie ?m
- see how applicable a pattern is for a data-set by looking at #triples for a kind of property, etc
- link properties back to pattern... possibly improve pattern, make it more applicable to data-set
- related work: http://www.scharffe.fr/pub/phd-thesis/manuscript.pdf
- format of defining mappings
- we must still do step 3b
Aldo, Alessandro, Balthasar, Valentina 20101222
Setup of meeting: almost F2F with Balthasar on skype and the rest at CNR.
New content available at LOD:workinprogress.
Notes:
- Notation: use a single prefix that encompasses the entire namespace before the fragment, e.g. lmdb:movie/ -> lmdb_movie:
- If there is a partial coverage of the type e.g. only 64% of subjects of triples for lmdb_movie:language is lmdb_movie:film, then it is important to understand the motivation behind this partial coverage.
- This could imply that different knowledge patterns are constructed for "overlapping" universes.
- Summarize analysis results in a ready-to-use file.
- Aldo: even an Excel file could do; an annotation schema in RDF could be created at a later stage, for answering queries like "give me all the properties that have a domain coverage < 100%".
- Partition results using the following rationale for partitions, reflecting their OWL equivalent:
- data properties (range = rdfs:Literal, in turn typed or untyped)
- object properties
- annotation properties
- links to external data-sets
- mixed properties (range = typed + untyped literals)
- rdf:type as a property per se
- Check for the soundness of the universes in this partition. Ranges might be untyped rdf:Resources within the single "original" dataset, but their actual type might belong to the "target" dataset asserting for that resource. This can only be done manually at this stage.
- For example, we see that <http://xmlns.com/foaf/0.1/based_near> has range rdf:resource, but they're all pointing to geonames entities. By loading geonames we could identify a more specific range
- add a column that reports on the integrity of the property value (?)
- untyped values could be a problem when reasoning with datasets, we have to keep this in mind
- list names, locations, links to RDF dumps for each dataset
- Valentina finally remarked that, as we didn't have the expected advancements within one week, it is needed to have a detailed report by January 5th (partly in a wiki page for reviewers, partly in the paper draft). Furthermore, her suggestion is to also have a telecon on January 6th (I know it's a holiday but I believe it's necessary): it seems to be the only day that both Alessandro and Balthasar are available (?). I will be available to spend around one hour during the day for discussing the results.
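The partial-coverage check discussed in the notes above (eg only 64% of the subjects of lmdb_movie:language being of type lmdb_movie:film) can be sketched as follows. The function name and the sample data are hypothetical, and the sketch assumes that the types of each resource are available as a mapping from resource to a set of types.

```python
def domain_coverage(triples, types, prop, expected_type):
    """Share of distinct subjects of `prop` that are typed as `expected_type`."""
    subjects = {s for s, p, _ in triples if p == prop}
    if not subjects:
        return 0.0
    typed = {s for s in subjects if expected_type in types.get(s, ())}
    return len(typed) / len(subjects)


# Placeholder resources, for illustration only.
triples = [
    ("ex:film_1", "lmdb_movie:language", "ex:french"),
    ("ex:film_2", "lmdb_movie:language", "ex:english"),
    ("ex:untyped_thing", "lmdb_movie:language", "ex:english"),
]
types = {
    "ex:film_1": {"lmdb_movie:film"},
    "ex:film_2": {"lmdb_movie:film"},
}
coverage = domain_coverage(triples, types, "lmdb_movie:language", "lmdb_movie:film")
```

Properties with a coverage below 1.0 are the ones whose motivation for partial coverage would need to be understood.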
APs
- Valentina: to send out the TeX sources of the paper draft (share via svn?)
- Alessandro, Balthasar
- list in the wiki all data-sets under analysis (associated with a link to a dump of them)
- perform on such data-sets the whole method
- report on the wiki the raw data and details about the executed procedure
- include in the paper the results of, and discussion on, the analysis performed on the data-set according to the 6-step method
Aldo, Alessandro, Balthasar, Lora, Valentina 20101215
Setup of meeting: almost F2F with Alessandro on skype and the rest at the VU
Agenda:
- LOD analysis
- to what extent and how patterns are represented
- procedure \ methods \ criteria \ tools
- plan
- action points
Notes:
- Alessandro: starting from domain specific data-sets (eg MusicBrainz) seems more useful than generic knowledge data-sets (dbpedia)
- because the domain specific data-sets contain owl:sameAs links to generic data-sets
- general purpose data-sets:
- YAGO
- DBPedia
- Freebase (though less than the others) ---> NOT LINKED
- domain specific datasets:
- DBtune/MusicBrainz --> Jamendo
- LinkedMDB
- bottom up approach: list all properties in data-sets and their frequencies
- how about adding authorship? it usually fuses with roles in LOD (e.g. JohnCarpenter mdb:director EscapeFromNewYork)
- interesting: different expression of relationships (with the same semantics)
- Steps for analysis of Linked Data:
- 1. List all properties per data source (frequency, URI)
- 2. List all types used with those properties (i.e. the universe of properties (universe of a property = domain and range))
- 3a. align properties to top-level properties (eg use ODP from DOLCE, DnS, ...)
- 3b. align types to top-level types and group properties according to their universes (also check datatype properties)
- 3c. Compare the result of the two types of grouping in 3a and 3b
- 4. Provide statistics on the triples (frequencies of usage of properties)
- 5. Discover paths of properties
- 6. match to existing knowledge patterns, or discover new patterns
- statistics:
- a path is a sequence of relations that includes the property P (i.e. distinguish the paths by their degrees, e.g. degree 0 is the property P itself, degree 1 is a path of length 1 where the property P is in the starting point - in the middle)
- Aldo: we don't make a new top-level ontology, but an OWL file, and map properties to ODPs if possible
- possible alternative, link to dbpedia by default
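Steps 1, 2 and 4 of the analysis above (property frequencies and the universe of each property) can be sketched over an in-memory list of triples. This is an illustration only: the function name, the "untyped" marker and the mdb:genre property are assumptions, and a real run would read from the data-set dumps instead.

```python
from collections import Counter, defaultdict


def property_stats(triples, types):
    """Return (frequency per property, universe per property).

    The universe of a property is the set of (domain type, range type) pairs
    observed for it; resources without a known type count as "untyped".
    """
    freq = Counter(p for _, p, _ in triples)
    universes = defaultdict(set)
    for s, p, o in triples:
        for domain_type in types.get(s, {"untyped"}):
            for range_type in types.get(o, {"untyped"}):
                universes[p].add((domain_type, range_type))
    return freq, dict(universes)


triples = [
    ("JohnCarpenter", "mdb:director", "EscapeFromNewYork"),
    ("JohnCarpenter", "mdb:director", "TheThing"),
    ("EscapeFromNewYork", "mdb:genre", "scifi"),  # hypothetical property and value
]
types = {
    "JohnCarpenter": {"Person"},
    "EscapeFromNewYork": {"Film"},
    "TheThing": {"Film"},
}
freq, universes = property_stats(triples, types)
```

Grouping properties by their universes (step 3b) could then start from the `universes` mapping.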
Action points for next telecon (Dec 22)
- Create a wiki page for raw data (for reviewers) and use the method steps as structure of the page (it's also the source of what goes in section 5 of the paper)
- Perform the method steps
- Fill the wiki page and section 5 with results and data from the execution of the method
Alessandro, Balthasar, Lora, Valentina, 20101201
- We are refining the notion of aggregation patterns to explore in the LOD, with the following subtypes:
- collections/collectives, generally intransitive, implying membership. The least useful for meronomies, but the most common to be expected;
- part-whole, transitive. Expected for geographical data and alignment with geonames;
- componency, nontransitive.
- Mid-term objective is to aim towards a full paper submission at KCAP, due mid Feb.
- What to look for next:
- to what extent and how are these patterns represented in LOD?
- Ale and Balth to describe the approach they are using for analyzing LOD. Should be drafted as well
- on which source do you start? are you doing it manually/automatically?
- in either case, what tools are you using?
Rome, 20100923
sample pattern: location -> subject class (e.g. a fish species) <=> location linked to similar subject class
pattern from E-culture: two artists working in the same style or a related style
dimensions for these patterns: space, time, people and subject
knowledge vs. navigational pattern
interesting results from patterns that link different knowledge structures? e.g. part-of hierarchy of locations linked to subclass hierarchy of subject types
influence of level of generality in hierarchy: at a certain level the generality gets too high (or specificity may be too low: only interesting for subject geeks)
navigational pattern is a (partial) path through a knowledge pattern/graph [if I understood it correctly]
We will take examples useful for the News and SocialWeb use cases in NoTube
weight of relations is present in navigational patterns, not in knowledge patterns
navigational patterns are local, but can we generalize over these as well? Well, this is a background hypothesis to explore
can we use the level in the hierarchy: say low = long tail? only if we have meta-info about the structure of the hierarchy, see analysis by Brockmuller 2003
- Grouping terms supply attributes or classification dimensions to the terms grouped in the subhierarchy below them, but not to other grouping terms;
- Natural categories are at the basic level of search and in principle divide general from specific terms;
- Abstract classes are more general than the basic level, and
- Domain-specific types are more specific.?
Rome, 20100923
knowledge pattern = general pattern that models knowledge
navigation pattern = prototype path in LOD
types of patterns: between...
- places (geonames)
- people (IMDB people)
- animals (geospecies)
- topics (genres in TVA, IMDB)
The UP can be used as heuristics in pathfinding... eg if the user has a biological background, the path towards geospecies is chosen, but when the user has IT background another direction may be chosen.
Roles can also be important. In general, people are interested in 'actors', but the user profile might specify the user is interested in correlations in the 'director' role.
Discussion: what kind of restrictions should a pattern contain? Eg: both node types & specific properties, or: only node types (being indifferent to the properties)
Rome, 20100924
We can use 'tricks' to exploit the semantics, even within a single corpus. For example, in LinkedMDB there are two roles for Mel Brooks, ie that of Actor and Director, which are both linked to the Freebase URI using the foaf:page property. This Freebase URI can be used to go to other corpora:
<http://rdf.freebase.com/ns/en.mel_brooks> <http://www.w3.org/2002/07/owl#sameAs> <http://data.nytimes.com/N17739970876672888293> .
<http://rdf.freebase.com/ns/en.mel_brooks> <http://www.w3.org/2002/07/owl#sameAs> <http://dbpedia.org/resource/Melvin_Kaminsky> .
<http://rdf.freebase.com/ns/en.mel_brooks> <http://www.w3.org/2002/07/owl#sameAs> <http://www.bbc.co.uk/music/artists/29a62e70-a15b-4e58-a39d-377f0443eb2c#artist> .
Rome, 20100927
- Action: relate each navigation pattern to a weight, which can express the suitability of the pattern for a given user profile, user context or other context/statistical restrictions
- top part of the current instance of the pattern --> relates to the specific knowledge
- bottom part of this pattern --> relates to the media metadata
- search scenario will be: I need to find a media item (dc:work) with given properties
- Action: include more property types (as dc:subject) to describe the media types
- Action: introduce a property as a subproperty of dc:subject --> to describe the currently implicit 'role'
- consider whether there is enough property regularity
- some properties might be just of provenance nature, which might not be important
- but other properties, e.g. roles are important
- so, you can define a number of sub-properties to distinguish between important and unimportant ones
- Action: give a name of the current pattern (e.g. compositional)
- Action (Balthasar): organize a meeting with Jan, Michiel, Guus and himself to discuss the implementation of the search prototype of such a pattern
- Action: about generalization of the pattern
--> organization of the people
--> think/try out how general is the pattern; is that the right level of the generalization
--> some level of specificity might make the pattern more useful, but it would be less applicable
--> find the right trade-off between specificity and generality of the pattern
--> use the User Profile, the Context and Statistical restrictions in order to filter the suitability of the pattern
--> use also the type of results you would like to achieve in order to determine the suitability of the pattern
(I) planning
- identify two patterns
- implement a search strategy for these patterns
- do an evaluation study
- based on the results of this first study you would have to decide - (a) are those two patterns enough to cover the target result of the use cases; (b) do you need to identify (add) another knowledge pattern; or (c) is it going to be enough to just define additional navigation patterns
- afterwards adjust the implementation
- perform a large-scale study
(II) time frame of this planning --> 7-8 months from now on
(III) target the first study in about 4 months
Rome, 20100927
Knowledge patterns:
- composition between entities, eg between People and Companies by relation 'founder'
- it's useful to define subproperties, that model interesting patterns, eg. the role pattern in the BBC EPG metadata
Composition pattern & generalization
- composition = member-bunch or member-partnership, eg Person-Organisation
- generalization of this composition: member-relation
Rome, 20100928
- discussing the use case: Anthrax
- media item --> music
- media item --> person (+ role) --> location
- person --> media item + location
- look at statistics on instances in order to make hypotheses on recall and precision
- weights of patterns, eg 'creator' might be equal on a Content Pattern level, but on a Navigation pattern you might want to differentiate based on the application, user profile, etc.
- we draw the two examples, extract the schema, and compare the two CPs
- pattern 1: Movie --> subject --> Organization (Topic) --> founder --> Person --> sameAs --> NYT; --> sameAs --> dbpedia; --> sameAs --> BBC Programs --> retrieve other media items, e.g. Articles (NYT), Articles (dbpedia), Broadcast (BBC PO)
- Questions to still explore in this pattern:
- (1) what other entities exist in Freebase (with a corresponding vocabulary to discover them) at the same level of 'Organization', e.g. 'Events'
- (2) how is the role of the person determined depending on whether it is 'Organization' or another entity, e.g. 'Events'?
- (3) which sources you would like to use in this pattern, e.g. NYT, dbpedia, EPG data, others?
- in the pattern can be the alignment, e.g. sameAs, exactMatch, which leads to a specific source, e.g. NYT, Broadcast
- pattern 2: soundtrack-centered --> Media Item --> has soundtrackBy --> Artist/Band --> founder --> Person; Media Item --> hasGenre; Media Item --> hasDirector --> Person; Artist --> hasGenre
- Questions about the two patterns:
- when and how the user can switch from pattern 1 to pattern 2 and vice versa
- Read the paper that Valentina sent per email today
- let's start collecting bibliographic references. We might want to use bibsonomy.org
Rome, 20100929
- we want to derive several versions of content patterns based on the pattern sketched yesterday, eg one with a media entity related to an organization and related to an event, that we can use to instantiate paths in LOD
- (we don't aim to make a pattern that's as generic as possible)
Rome, 20100930
- Next steps
- action: balthasar will look at LOD examples that fill the CP in order to fix the CP
- we fix the CP, its structure and vocabulary, based on the actual data available on LOD
- we will have the first skype on October 27th 3.30pm, maybe one on Oct 22nd, with Ale, Balth, and Lora
- Alessandro will look at more specific patterns, the ones you can use when going out from the "starting CP", which is the more general one for Media Item recommendation
- both of them will look at LOD-compliant CPs, meaning they are instantiated in LOD
- define the CPs and identify candidate navigational patterns
- navigational patterns have a characterization
- implementation will be prototyped by reusing/extending NoTube-IKS frameworks
- let's keep updating the wiki
- we start collecting related work: lora, guus, val, aldo send pointers to ale and balth
- ale and balth will also look for related work and collect them
- action: ale creates a bibsonomy group for initially collecting references
- action (val): better characterize the news use case with Alex
- action (val): collect from Alex an example schema and source for the news example
- user-based validation: experiment design for validating and tuning candidate navigational patterns, and identifying new navigational patterns
Alessandro, Balthasar, Lora, 20101020
- The Prolog-like representation schema used by Balthasar is fine for the time being, provided that an instance example is provided for each pattern (in # comments).
- Action (A,B) include the two patterns on the wiki (e.g. the two patterns that we decided on in Rome)
- with examples (instance-based)
- with the abstracted knowledge patterns
- Naming: the names so far used for relations, knowledge and navigation patterns are ambiguous and deceiving (eg. mirroring of the Topic CP into the "topic" segment of the Film NP).
- Action (A,B, agreed) give a name to each of the patterns, e.g. in terms of the goal of the pattern.
- Also, more suitable names for entities and variables e.g. Music Producing Entity
- Implementation issues, such as representing patterns for reasoning and consumption, are frozen until the knowledge patterns are extracted and formalized.
- Usage of additional patterns (such as the one originally sketched for geospecies/geonames-based documentary recommendation) is also frozen, but they could also be plugged into the more generic sub-patterns of the existing ones.
- Alex from TXT has provided some example taxonomies and annotated content from their client news providers such as Rai and Mediaset. While the schemas themselves are custom and not very reusable, their content can help identify relevant relationships to discover
- Action (A) to email Alex about permission to publish these data on the Wiki, or else to use a private space.
- BibSonomy group lodpatterns has been created at [1]. This group should mirror the References section of this Wiki as faithfully as possible. To join, either send your BibSonomy username to Alessandro or log in to BibSonomy.org, browse [2] for "lodpatterns" and click the Join button (admin approval will be required).
Alessandro, Balthasar, Lora, Valentina, 20101027
- Recap on the chosen knowledge patterns and comment on their representation
- Pattern for linking media item with people via soundtrack (formerly Soundtrack/Media contribution)
- Should be defined more generically, given an intuitive name, instantiated with an example and mapped to one or more actual navigation patterns.
- Pattern for linking media item with people via organization (formerly Founders/Organization membership)
- Also should be defined more generically, given an intuitive name, instantiated with an example and mapped to one or more actual navigation patterns.
- Pattern for linking media item with people via soundtrack (formerly Soundtrack/Media contribution)
- By this telco the two knowledge patterns had been merged, or rather abstracted, to a single macroscopic knowledge pattern whose parts were mapped to instances that indicated two separate navigation pattern groups. This can be deceiving, so it's probably better to keep the two knowledge patterns separated.
- Refinement/enrichment of the two knowledge patterns is not ruled out, but this depends on the outcome of analysing the related retrieved media items.
- e.g. including role for people, or including role for organization, or include typing of topics, or role of the topics, etc.
- The IKS-related news use case will be dropped in favor of a travel recommendation use case, which however is isomorphic to the news one for most of its part.
- Knowledge patterns should have an even more intuitive name, so that their purpose is immediately clear from the name. Keep the KP representation distinct from the representation of its associated navigation patterns. Also, the notation used for the diagrams is not entirely clear and would require a legend.
- Next objectives
- Recommendations 1: what types of related media items can be retrieved using each navigation pattern
- Recommendations 2: see whether there are possible combinations of navigation patterns extracted from the two navigation patterns, so as to retrieve more interesting content.
- Next two telcos: Alessandro is unable to attend at 3pm due to courses.
- Nov 3 moved forward to 11:30am
- Nov 10 TBD
Alessandro, Balthasar, Valentina, 20101103
knowledge patterns (KP)
- arrows in KPs are implicitly bidirectional
- this should be explained in the text or in a legend
- ACTION Balthasar: add to legend
- ACTION Balthasar: refine figures of KPs (directions of arrows)
- we decide to work out the examples
- ACTION Alessandro,Balthasar: search for actual data that instantiate the two examples of KPs
expressing KPs
- KPs will be expressed in OWL
- ACTION Alessandro: represent the KPs in OWL (long term action point)
- the OWL specification of the KPs is in our own namespace (eg http://ontologydesignpatterns.org/cp/owl/<pattern-name>)
- these properties/classes in our own namespace will be linked to existing LOD properties
- kp:directedBy owl:equivalentProperty <http://rdf.freebase.com/ns/film.film.directed_by>
- or
- kp:ensemble rdfs:subClassOf <http://dbpedia.org/resource/Metallica>
- we have to decide on which equivalence properties to use, which is a minor but important issue: owl:equivalentProperty VS rdfs:subPropertyOf
- requirement: complex alignments, eg "Person ReferencedIn MediaItem" should be linked to "Person freebase:hasPerformance Performance" + "Performance freebase:performanceIn MediaItem"
- possible solutions:
- owl2 property chains
- SWRL
- Prolog
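The complex alignment requirement can be illustrated with a minimal sketch in plain Python. It only mimics what a property chain would derive and is not a proposal for the actual implementation. The property names freebase:hasPerformance and freebase:performanceIn are taken from the requirement above; kp:referencedIn, the ex: identifiers and the function apply_chain are hypothetical names introduced for illustration.

```python
# Triples are (subject, predicate, object) tuples. All instance data is made up.
TRIPLES = {
    ("ex:actor1", "freebase:hasPerformance", "ex:performance1"),
    ("ex:performance1", "freebase:performanceIn", "ex:film1"),
    ("ex:actor2", "freebase:hasPerformance", "ex:performance2"),  # no film attached
}


def apply_chain(triples, chain, derived):
    """Derive (s, derived, o) for every s that reaches o by following `chain` in order."""
    first, *rest = chain
    # Pairs (start, current node) after following the first property of the chain.
    frontier = {(s, o) for s, p, o in triples if p == first}
    for prop in rest:
        frontier = {
            (start, o)
            for start, node in frontier
            for s, p, o in triples
            if p == prop and s == node
        }
    return {(start, derived, end) for start, end in frontier}


# "Person ReferencedIn MediaItem" as the composition of the two Freebase properties.
inferred = apply_chain(
    TRIPLES,
    ["freebase:hasPerformance", "freebase:performanceIn"],
    "kp:referencedIn",  # hypothetical KP property name
)
print(inferred)
```

Only ex:actor1 is linked to ex:film1, because ex:actor2 has a performance that is not connected to any media item. Whichever solution is chosen (OWL 2 property chains, SWRL or Prolog) would have to produce the same derived relation.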
KP model
- we start referring to the knowledge patterns model:
- the content patterns fill the logical + conceptual facades (see KP pictures) in the paper on knowledge patterns
- the navigation patterns fill the interaction facade
- in this way, we embed in a single model all aspects of a pattern: the owl, the graphical representation, the vocabulary, the navigation...
- and we can move from one to the other
- for example, we define possible entry points and weights of navigation paths in the navigation patterns
- while we define alignments between entities in the owl definition
- however, both belong to the same knowledge pattern
Alessandro, Balthasar 20101110
representing the Knowledge Pattern:
- SPARQL 1.1's expressiveness seems very appropriate to represent Knowledge Patterns, eg
- { ?x !(rdf:type|^rdf:type) ?y } means relation between x and y, which is not "x rdf:type y" and also not "y rdf:type x"
- { ?x rdf:type/rdfs:subClassOf* ?type } means x is related to type via rdf:type and then 0 or more times rdfs:subClassOf
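To make the meaning of the two path expressions concrete, here is a small sketch that evaluates them by hand over an in-memory set of triples. It uses plain Python instead of a SPARQL engine. The ex: identifiers and both function names are assumptions made for this example. The negated property set is read here as "any single edge between x and y, in either direction, whose predicate is not rdf:type".

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Illustrative data only.
TRIPLES = {
    ("ex:film1", RDF_TYPE, "ex:Film"),
    ("ex:Film", SUBCLASS, "ex:MediaItem"),
    ("ex:MediaItem", SUBCLASS, "ex:Work"),
    ("ex:film1", "ex:directedBy", "ex:person1"),
}


def types_of(triples, x):
    """All ?type matching { x rdf:type/rdfs:subClassOf* ?type }."""
    found = set()
    # One rdf:type step first, then zero or more rdfs:subClassOf steps.
    todo = [o for s, p, o in triples if s == x and p == RDF_TYPE]
    while todo:
        cls = todo.pop()
        if cls in found:
            continue  # already visited, also guards against subclass cycles
        found.add(cls)
        todo.extend(o for s, p, o in triples if s == cls and p == SUBCLASS)
    return found


def non_type_links(triples, x):
    """All ?y matching { x !(rdf:type|^rdf:type) ?y }."""
    forward = {o for s, p, o in triples if s == x and p != RDF_TYPE}
    backward = {s for s, p, o in triples if o == x and p != RDF_TYPE}
    return forward | backward


print(types_of(TRIPLES, "ex:film1"))
print(non_type_links(TRIPLES, "ex:film1"))
```

The first call returns the direct type of ex:film1 plus all its superclasses. The second returns only ex:person1, since the rdf:type edge to ex:Film is excluded.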
OWL model
- the SPARQL query will use concepts and properties of an OWL model in our own namespace
- Alessandro started on drafting this OWL model
- discussion: keep the model as simple as possible with links to Ontology Design Patterns, which specify the exact position of the entities in a detailed hierarchy
- modular approach:
- 1 OWL file containing the model and mappings to Ontology Design Patterns
- 1 Turtle file containing mappings from the model to Linked Data entities (eg Freebase, DBpedia)
Planning
- ACTION Alessandro: continue working on OWL model
- ACTION Balthasar: look into using SPARQL 1.1 for the Pattern
- ACTION both: (long term) find mappings between OWL model and LOD data-sets
Alessandro, Balthasar, Lora, Valentina 20101117
Exploration of patterns:
- Guus suggested that the implicit aggregation instantiated in the MoE pattern could be abstracted, thus augmenting the power of the pattern itself wrt recommendation. This abstraction could allow us to explore more meronomies such as those constructed in Geonames.
- We should not rule out that additional relations might be added for increasing recall, or finding alternate paths to follow in disharmonic LOD datasets. One such example is the MoE pattern: as instantiated in Freebase for the "The Social Network" movie, we are also able to reach founder Mark Zuckerberg by the depiction of the "fictional" Mark Zuckerberg.
- Other achievements can depend on substituting the membership relation with others such as participation in events. For the ST pattern, it could be represented by the recording event of the soundtrack. For the MoE pattern, the ensemble could be ported to an event, e.g. being able to recommend WWII movies even when their relationship with WWII is not made explicit but they depict figures who all had an involvement in WWII (e.g. Churchill, Eisenhower, Hitler and De Gaulle).
- Action Point (A, B): explore the presence in LOD of relations that express:
- role-of-objects
- participation-to-events
- about-ness
- different types of aggregations
- A possible distribution of efforts can be that Balthasar fetches relationships holding between entities of interest and Alessandro aggregates those related to these four patterns and maps them.
- Note about the research method we are following: bottom-up, we analyse LOD to extract patterns and generalize them to encompass different vocabularies; top-down, we take general content patterns and see whether they are represented in LOD in some way.
- find variations of possible ways of traversing/instantiating a pattern
- make patterns more generic with different instantiations, eg: an aggregation relation such as the ensemble-person relation in the Ensemble pattern is analogous to the region-location relation in Geonames
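The WWII example discussed in these notes (replacing the ensemble with an event) could be sketched as follows. This is only one possible reading: a movie is linked to an event when every figure it depicts participated in that event. The dictionaries, the ex: identifiers and the function implied_events are hypothetical and the data is invented for illustration.

```python
# Which figures each movie depicts (illustrative data).
DEPICTS = {
    "ex:movie1": {"ex:Churchill", "ex:Eisenhower", "ex:DeGaulle"},
    "ex:movie2": {"ex:Churchill", "ex:person9"},
}

# Which events each figure participated in (illustrative data).
PARTICIPATED_IN = {
    "ex:Churchill": {"ex:WWII", "ex:event2"},
    "ex:Eisenhower": {"ex:WWII"},
    "ex:DeGaulle": {"ex:WWII"},
    "ex:person9": {"ex:event3"},
}


def implied_events(movie):
    """Events shared by all figures depicted in `movie` (empty set if none)."""
    people = DEPICTS.get(movie, set())
    if not people:
        return set()
    event_sets = [PARTICIPATED_IN.get(person, set()) for person in people]
    # An event is implied only if every depicted figure took part in it.
    return set.intersection(*event_sets)


print(implied_events("ex:movie1"))
print(implied_events("ex:movie2"))
```

Requiring all figures to share the event is a strict choice. A looser variant (e.g. a majority of the figures) would increase recall at the cost of precision, which is the kind of trade-off the evaluation of the patterns would have to settle.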
IMDB trivia
- IMDB trivia can be useful for:
- recommendations
- explanations of recommendations
- presentation of interesting facts
- the IMDB trivia do require some semantification of the plain text, but that may be worth the effort. Example: The Thing and its original movie
Method
- Steps for analysis of Linked Data:
- 1. List all properties per data source (frequency, URI)
- 2. List all types used with those properties, i.e. the universe of each property (universe of a property = its domain and range)
- 3a. align properties to top-level properties (eg use ODP from DOLCE, DnS, ...)
- 3b. align types to top-level types and group properties according to their universes (also check datatype properties)
- 3c. Compare the result of the two types of grouping in 3a and 3b
- 4. Provide statistics on the triples (frequencies of usage of properties)
- 5. Discover paths of properties
- 6. match to existing knowledge patterns, or discover new patterns
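Steps 1, 2, 4 and 5 of the method above can be sketched on a toy triple set as follows. This is a minimal illustration in plain Python, assuming the data fits in memory. The function name analyse and the ex: identifiers are hypothetical. The properties ex:soundtrackBy and ex:founder merely echo the soundtrack pattern from the Rome minutes. The alignment steps (3a to 3c) and the matching step (6) are not covered.

```python
from collections import Counter, defaultdict

# Illustrative data only.
TRIPLES = [
    ("ex:film1", "rdf:type", "ex:Film"),
    ("ex:film2", "rdf:type", "ex:Film"),
    ("ex:band1", "rdf:type", "ex:Band"),
    ("ex:person1", "rdf:type", "ex:Person"),
    ("ex:film1", "ex:soundtrackBy", "ex:band1"),
    ("ex:film2", "ex:soundtrackBy", "ex:band1"),
    ("ex:band1", "ex:founder", "ex:person1"),
]


def analyse(triples):
    # Index the declared types of every node.
    types = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)
    links = [(s, p, o) for s, p, o in triples if p != "rdf:type"]

    # Steps 1 and 4: how often each property is used.
    freq = Counter(p for _, p, _ in links)

    # Step 2: observed universe of each property as (subject type, object type) pairs.
    # Untyped nodes are recorded with the placeholder "?".
    universe = defaultdict(set)
    for s, p, o in links:
        for domain in types.get(s, {"?"}):
            for range_ in types.get(o, {"?"}):
                universe[p].add((domain, range_))

    # Step 5: paths of two consecutive properties and how often they occur.
    outgoing = defaultdict(list)
    for s, p, _ in links:
        outgoing[s].append(p)
    paths = Counter((p1, p2) for _, p1, o in links for p2 in outgoing.get(o, []))
    return freq, universe, paths


freq, universe, paths = analyse(TRIPLES)
print(freq.most_common())
print(dict(universe))
print(paths.most_common())
```

On real LOD sources the same counts would more likely be obtained with aggregate queries against each endpoint. The sketch only shows which quantities the steps ask for.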