Summary of week six for the course Knowledge Engineering with Semantic Web Technologies 2015 by Harald Sack at OpenHPI.
Linked Data Engineering
In general it is difficult to get data, because it is distributed across different databases and you need a different API for each of them. The result is a set of data islands. Semantic web technologies give you a standardized interface to access this data, which makes it easier to reuse and share.
Tim Berners-Lee: “Value of data increases if data is connected to other sources.”
There are four principles for linked data:
- Use URIs as names (not only web pages, but also real objects, abstract concepts and so on)
- Use HTTP URIs, so that both people and machines can look up the names
- Provide useful information using RDF and SPARQL
- Include links to other URIs, so people can discover more things
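The second and third principles mean that a machine can ask for RDF instead of an HTML page when it looks up a name. The following is a minimal sketch of such a lookup request using only the Python standard library. The DBpedia URI and the `text/turtle` format are illustrative choices, and the sketch assumes the server supports content negotiation. The helper name `build_lookup_request` is hypothetical.

```python
from urllib.request import Request

def build_lookup_request(uri: str, rdf_format: str = "text/turtle") -> Request:
    """Build an HTTP GET request that asks for RDF rather than HTML."""
    # The Accept header tells the server which representation we prefer.
    return Request(uri, headers={"Accept": rdf_format})

# Illustrative URI; any HTTP URI that names a thing would work the same way.
request = build_lookup_request("http://dbpedia.org/resource/Neil_Armstrong")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))

# To actually send it (needs network access):
# from urllib.request import urlopen
# with urlopen(request) as response:
#     print(response.read().decode("utf-8")[:500])
```

The request is only built here, not sent, so the sketch runs without a network connection.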
If you want to create linked open data, your data should:
- be available under an open licence
- be in a machine-readable format
- be in a non-proprietary format
- use open standards from the W3C
- link to other data sources
Tour through the linked data cloud (Uni Mannheim):
- Government data (data.gov.uk)
- media data
- user-generated content (semanticweb.org)
- linguistic data
- bibliographic data (bibsonomy.org)
- life sciences
- cross-domain (dbpedia.org, w3.org, lexvo.org)
- social networking (quitter.se)
- geographic (geonames.org)
All these sources are held together by ontologies. Examples of such ontologies:
- OWL (owl:sameAs or owl:equivalentClass)
- SKOS (Simple Knowledge Organization System), used for definitions and mappings of vocabularies and ontologies. It lets you express relations between concepts such as narrower, broader and related, as well as matches between concepts from different vocabularies.
- UMBEL (Upper Mapping and Binding Exchange Layer), which maps into DBpedia, GeoNames and Wikipedia
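To illustrate how a property like owl:sameAs holds datasets together, here is a small sketch that collects every URI reachable through sameAs links. The triples and the `example.org` URIs are made up for illustration; only the owl:sameAs property URI is real. Treating sameAs as symmetric and transitive follows its OWL semantics.

```python
from collections import deque

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical triples (subject, predicate, object) from three datasets.
TRIPLES = [
    ("http://example.org/a/Berlin", OWL_SAME_AS, "http://example.org/b/Berlin"),
    ("http://example.org/b/Berlin", OWL_SAME_AS, "http://example.org/c/city/42"),
    ("http://example.org/a/Paris", OWL_SAME_AS, "http://example.org/b/Paris"),
]

def same_as_closure(start, triples):
    """Return all URIs that denote the same thing as `start`."""
    # Build an undirected adjacency map, because sameAs is symmetric.
    neighbours = {}
    for subject, predicate, obj in triples:
        if predicate == OWL_SAME_AS:
            neighbours.setdefault(subject, set()).add(obj)
            neighbours.setdefault(obj, set()).add(subject)
    # Breadth-first traversal follows the links transitively.
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbours.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

berlin_aliases = same_as_closure("http://example.org/a/Berlin", TRIPLES)
print(sorted(berlin_aliases))
```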
Linked Data Programming
How do you publish data for the Semantic Web? The best way is a SPARQL endpoint, provided for example by OpenLink Virtuoso, Sesame or Fuseki. These endpoints are RESTful web services that you can query and that return results as JSON, XML and so on. Another way is via Linked Data endpoints (Pubby, Jetty). These are overlays on top of a SPARQL endpoint. A third way is via D2R servers, which translate data from non-RDF databases into RDF data. A place to find available datasets is datahub.io.
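Querying a SPARQL endpoint as a web service could look roughly like the sketch below. It builds a GET request according to the SPARQL protocol, with the query in a `query` parameter and an `Accept` header that asks for JSON results. The endpoint URL (the public DBpedia endpoint), the query itself and the helper name `build_sparql_request` are assumptions chosen for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the public DBpedia endpoint; any SPARQL endpoint URL would do.
ENDPOINT = "https://dbpedia.org/sparql"

QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Neil_Armstrong> rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
"""

def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)

# To actually run the query (needs network access):
# import json
# from urllib.request import urlopen
# with urlopen(request) as response:
#     for row in json.load(response)["results"]["bindings"]:
#         print(row["label"]["value"])
```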
Metadata and Semantic Annotation
Semantic annotation: you attach semantic data to your source. Formally, an annotation consists of:
- the subject of the annotation (for example a book, represented by its ISBN)
- the object of the annotation (for example the author)
- a predicate that defines the type of relationship (here, that the author is the author of the book)
- the context in which the annotation is made (who made the annotation and when?)
One vocabulary for this is the Open Annotation Ontology (developed by the W3C).
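The four parts of an annotation can be captured in a small data structure. This is only a sketch of the formal model described above, not the Open Annotation data model itself. The class name, the field names and the placeholder ISBN are hypothetical; the Dublin Core `creator` property and the DBpedia URI are real identifiers used as examples.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Annotation:
    subject: str       # what is annotated, e.g. a book identified by its ISBN
    predicate: str     # type of relationship between subject and object
    obj: str           # what the subject is related to, e.g. the author
    annotator: str     # context: who made the annotation
    created: datetime  # context: when the annotation was made

    def as_triple(self):
        """Return the statement itself, without its context."""
        return (self.subject, self.predicate, self.obj)

annotation = Annotation(
    subject="urn:isbn:0000000000",  # placeholder ISBN, not a real book
    predicate="http://purl.org/dc/terms/creator",
    obj="http://dbpedia.org/resource/George_Orwell",
    annotator="http://example.org/users/alice",  # hypothetical annotator
    created=datetime(2015, 12, 1, tzinfo=timezone.utc),
)
print(annotation.as_triple())
```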
Named Entity Resolution
When we do semantic annotation we want to get the meaning of a string, including additional information (you annotate Neil Armstrong and get more information about him). The main problem is ambiguity: if you enter “Armstrong” in a search engine, you also get pictures of Lance Armstrong and Louis Armstrong. Context helps us to narrow the search and overcome this problem.
Resolution: mapping the word to a knowledge base in order to solve ambiguity
Recognition: locating and classifying entities into predefined categories like names, persons, organizations
Example: Armstrong landed the eagle on the moon
For each of these words you collect the candidate entities and then try every combination of candidates. If the candidates in a combination co-occur in texts, you can find the best match. Another way is to look at DBpedia, where you can see which of the candidates are connected to each other.
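The idea of trying every combination and preferring candidates that are connected to each other can be sketched as follows. The candidate lists and the links are a hand-made toy knowledge base, not real DBpedia data, and counting linked pairs is just one simple scoring choice among many.

```python
from itertools import combinations, product

# Toy candidate entities for each mention in
# "Armstrong landed the eagle on the moon" (hypothetical data).
CANDIDATES = {
    "Armstrong": ["Neil_Armstrong", "Lance_Armstrong", "Louis_Armstrong"],
    "eagle": ["Eagle_(bird)", "Lunar_Module_Eagle"],
    "moon": ["Moon"],
}

# Toy links between entities in the knowledge base (hypothetical data).
LINKS = {
    frozenset(("Neil_Armstrong", "Lunar_Module_Eagle")),
    frozenset(("Neil_Armstrong", "Moon")),
    frozenset(("Lunar_Module_Eagle", "Moon")),
}

def coherence(entities):
    """Count how many pairs of the chosen entities are linked."""
    return sum(frozenset(pair) in LINKS for pair in combinations(entities, 2))

def disambiguate(candidates):
    """Pick the combination of candidates with the most links among them."""
    mentions = list(candidates)
    options = product(*(candidates[mention] for mention in mentions))
    best = max(options, key=coherence)
    return dict(zip(mentions, best))

result = disambiguate(CANDIDATES)
print(result)
```

With this toy data the combination Neil Armstrong, lunar module Eagle and Moon wins, because all three pairs are linked. Trying all combinations grows quickly with the number of mentions, so a real system would need to prune the candidates.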
When you use a search engine, you will also find ambiguous results. With semantically annotated texts, you can overcome this ambiguity. Based on the annotations you can do entity-based information retrieval (IR), which is language independent because it works on entities instead of words. You could also include information from the underlying knowledge base or use content-based navigation and filtering (filter pictures vs. drawings).
You can use it for:
- query string refinement (like auto-completion, query enrichment)
- cross-referencing (additional information for the user taken from the knowledge base)
- fuzzy search (give nearby results, helpful if you have very few results)
- exploratory search (visualize and navigate the search space)
- reasoning (complement search results with implicitly given information)
Another example is entity-based search, which comes in several forms:
- simple entity matching: you match a query against semantically annotated documents
- similarity-based entity matching: you also find similar entities, like Buzz Aldrin for Neil Armstrong
- relationship-based entity matching: you have the entities astronaut and Apollo 11, and there are also relationships between astronaut, Apollo 11 and Neil Armstrong
These results can complement your search!
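Simple and similarity-based entity matching can be sketched with a few annotated toy documents. The document IDs, their entity annotations and the similarity table are hypothetical; in a real system the similar entities would come from the knowledge base.

```python
# Toy documents, each annotated with a set of entities (hypothetical data).
DOCUMENTS = {
    "doc1": {"Neil_Armstrong", "Apollo_11"},
    "doc2": {"Buzz_Aldrin"},
    "doc3": {"Lance_Armstrong"},
}

# Toy similarity table: entities considered similar to a given entity.
SIMILAR = {"Neil_Armstrong": {"Buzz_Aldrin"}}

def simple_match(entity, documents):
    """Return the documents annotated with exactly this entity."""
    return sorted(doc for doc, entities in documents.items() if entity in entities)

def similarity_match(entity, documents, similar):
    """Also return documents annotated with entities similar to the query entity."""
    wanted = {entity} | similar.get(entity, set())
    return sorted(doc for doc, entities in documents.items() if entities & wanted)

print(simple_match("Neil_Armstrong", DOCUMENTS))
print(similarity_match("Neil_Armstrong", DOCUMENTS, SIMILAR))
```

Because matching happens on entities and not on strings, the document about Lance Armstrong is not returned for a query about Neil Armstrong.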
Another approach is to select named entities directly, so you click on the entity that you want. Examples are the articles at blog.yovisto.com
Extension of traditional search and Semantic Search
- Retrieval: You look for something specific (like a book) and know how to specify it
- Exploration: You have already read “1984” and want to read a book that is similar to it. In a library you would ask the librarian, who would tell you what to read next. We want to have this in our search system as well! In a traditional library you can also browse the shelves and may find another similar book.
For whom is it made?
- People that are unfamiliar with the domain
- People who are unsure about the ways to achieve their goals
- People who are unsure about their goals: they want to find something, but cannot specify it
You can build graphs from the semantic information you have in order to give the user more information about the original result (more books by one author). You could also get broader results (you read a book by Jules Verne and get as a recommendation books by H.G. Wells, who was influenced by Jules Verne). Another example: you start with Neil Armstrong -> you find Apollo 11 and the other crew members -> Apollo 11 is part of the Apollo program, so you find other Apollo missions -> you find Apollo 13 and learn that there was an accident -> you find the crash of the space shuttle “Challenger”.
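Such an exploration can be seen as walking a graph of related entities. The sketch below uses breadth-first search to find the shortest chain of links between two entities. The graph is a hand-made toy version of the Neil Armstrong example, and the node names are illustrative, not real knowledge base identifiers.

```python
from collections import deque

# Toy graph of related entities (hypothetical data).
GRAPH = {
    "Neil_Armstrong": ["Apollo_11"],
    "Apollo_11": ["Neil_Armstrong", "Buzz_Aldrin", "Michael_Collins", "Apollo_program"],
    "Buzz_Aldrin": ["Apollo_11"],
    "Michael_Collins": ["Apollo_11"],
    "Apollo_program": ["Apollo_11", "Apollo_13"],
    "Apollo_13": ["Apollo_program", "Spaceflight_accidents"],
    "Spaceflight_accidents": ["Apollo_13", "Challenger_disaster"],
    "Challenger_disaster": ["Spaceflight_accidents"],
}

def exploration_path(graph, start, goal):
    """Return the shortest chain of links from start to goal, or None."""
    previous = {start: None}  # remembers how each node was reached
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None

path = exploration_path(GRAPH, "Neil_Armstrong", "Challenger_disaster")
print(" -> ".join(path))
```

A search interface could show the neighbours of the current node at each step and let the user decide where to go next, instead of computing a fixed path.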