Semantic MediaWiki Conference (SMWCon) Fall 2017

Last week the SMWCon, the European conference on Semantic MediaWiki, took place in Rotterdam. Our venue was right in the zoo, in the middle of the aquarium, so we could watch sharks and turtles during the talks!

But there were also very interesting talks. You can find the information on most of the talks here. I will describe some of the talks that were interesting to me because they dealt with things I might use in the future.

The keynote on the first day dealt with firefighters and their problem with information overload. Firefighters face the familiar problem: there is a lot of information, it is hard to find, and it comes in different formats (GPS, information systems, paper copies). But firefighters only have the limited time until they reach the burning building; then they have to act and cannot lose even more time reading documentation. So they need the right information in time, which is quite difficult.

He also stressed that machine learning and reasoning over knowledge are nice, but firefighters in particular keep running into completely new cases: the world and its technology change, so new obstacles will always appear. An example is a burning car with an electric engine.

Karsten then introduced the new features being developed for SMW 3.0, which will be a major release. He also stressed that the software needs better documentation, something I also noticed when I tried to introduce new people to SMW. But this is a problem in lots of open source projects: people like to code, but not to write documentation. This also shows that open source projects do not only need coders ;))

Tobias introduced annotation tools for images, text and videos. These were developed as part of our projects, and we would be excited to see use cases and, of course, get feedback on the extensions.

The keynote on Friday was about a project called SlideWiki. This is basically a wiki that allows you to create and re-use presentation slides, annotate them, link them to topics and so on. It is really cool because other platforms like SlideShare do not allow this, nor do they allow forking of slides.

The second talk was by Cindy Cicalese, who works for the Wikimedia Foundation. She explained that she will be advocating more for third-party developers in the project management of MediaWiki. You can go to their site to see what they want to do. In short:

  • They want to introduce multi-content revisions, i.e. more than one content slot on a wiki page, a functionality that right now you can only get with SMW and PageForms
  • They also want to tackle the installation, updating and maintaining of wikis. This is a very important topic that basically everyone in the community faces: we normally do not run one wiki but many, and setting up and updating every single one is cumbersome.
  • They want to introduce a roadmap to make the development of MediaWiki more predictable. This would also help third parties, because we could tell our customers whether a certain feature will be implemented soon or not

After that, Alexander Gesinn introduced a pre-configured virtual machine. This might be nice for people who only want to try out SMW; for production use you still face the maintenance problems. He named three things every enterprise wiki has to have (and I agree with him):

  • Semantics (SMW + PageForms)
  • VisualEditor to not torture users with Wikisyntax
  • a responsive skin to have a nice-looking wiki on mobile devices

On Thursday, Remco gave a talk on a similar topic, which he called wiki product lines. A product line works like in industry, where you have different TVs that are all basically the same and only the screen size changes between products. He explained, from a somewhat more theoretical standpoint, where he sees potential. To me this looks like a problem that will be tackled, and we might get some (hopefully) completely free and documented solutions for it.

At the end of the day there was also a workshop organized by my colleague Lia and me. We agreed to set up a page on the SMW wiki where we collect projects and ideas on how we might use their results in the future.

Overall, it was a great conference and I got to know many nice people. Thanks also to the organizers 😉

Semantic Web Technologies – RDFS and SPARQL

Summary of week two for the course Knowledge Engineering with Semantic Web Technologies 2015 by Harald Sack at OpenHPI.

Reification allows you to refer to statements: a statement itself gets a URI. This allows you to make statements about statements and assumptions about assumptions (e.g. Sherlock Holmes thinks that the gardener murdered the butler).
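
As a small illustration (not from the lecture itself), here is the Sherlock Holmes example built with Python's rdflib library, using the classic rdf:Statement reification vocabulary; the ex: namespace and resource names are made up:

```python
# pip install rdflib
from rdflib import Graph, Namespace, RDF

EX = Namespace("http://example.org/")  # invented namespace
g = Graph()
g.bind("ex", EX)

# The reified statement "the gardener murdered the butler" gets its own URI
stmt = EX.statement1
g.add((stmt, RDF.type, RDF.Statement))
g.add((stmt, RDF.subject, EX.Gardener))
g.add((stmt, RDF.predicate, EX.murdered))
g.add((stmt, RDF.object, EX.Butler))

# Now we can make a statement about that statement
g.add((EX.SherlockHolmes, EX.thinks, stmt))

print(g.serialize(format="turtle"))
```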

RDFS or RDF Schema takes this further. It adds more semantic expressivity, so you can get more knowledge out of the graph. It is also the simplest of the modelling languages (OWL is another, but will be covered later) and describes vocabularies for RDF. What can we do with it? We can build classes and properties to model structures: Planet and Satellite are classes, ArtificialSatellite is a subclass of Satellite; Earth is a planet, the Moon is a satellite of Earth, and Sputnik is an artificial satellite of Earth. From this we can infer new information: an artificial satellite is also a satellite, so Sputnik, being an artificial satellite of Earth, is also a satellite of Earth!
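
Here is a minimal sketch of that inference in Python, assuming rdflib for the graph and the owlrl package to materialize the RDFS entailments; the ex: names are hypothetical:

```python
# pip install rdflib owlrl
from rdflib import Graph, Namespace, RDF, RDFS
import owlrl

EX = Namespace("http://example.org/")
g = Graph()
g.bind("ex", EX)

# Class and property hierarchies from the example
g.add((EX.ArtificialSatellite, RDFS.subClassOf, EX.Satellite))
g.add((EX.artificialSatelliteOf, RDFS.subPropertyOf, EX.satelliteOf))

# The asserted facts about Sputnik
g.add((EX.Sputnik, RDF.type, EX.ArtificialSatellite))
g.add((EX.Sputnik, EX.artificialSatelliteOf, EX.Earth))

# Materialize the RDFS entailments (inferred triples are added to g)
owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)

print((EX.Sputnik, RDF.type, EX.Satellite) in g)    # True: it is a satellite
print((EX.Sputnik, EX.satelliteOf, EX.Earth) in g)  # True: a satellite of Earth
```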

The rest of the lecture focused on SPARQL. This is the language for querying knowledge bases stored in RDF. It is syntactically similar to SQL but works somewhat differently, because in RDF we are dealing with graphs. You use it via an endpoint, which is an RDF database with a SPARQL protocol layer on top that returns results over HTTP. It offers you:

  • extraction of data
  • exploration
  • transformation
  • construction of new graphs
  • updating of graphs
  • logical entailment (inference)

Queries are written as triple patterns in Turtle syntax plus variables; you essentially define a subgraph that the query has to match. There is a query example at DBpedia. As you can see, the syntax is close to SQL, and it also offers filters to reduce the number of results. If you want to try out SPARQL, OpenHPI recommends using Fuseki or Wikidata.
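
As a hedged illustration, here is how such a query can be sent to the public DBpedia endpoint from Python with the SPARQLWrapper package; the class and property names (dbo:Planet, rdfs:label) come from the DBpedia ontology and may change over time:

```python
# pip install SPARQLWrapper
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?planet ?label WHERE {
  ?planet a dbo:Planet ;          # triple pattern: ?planet is a planet
          rdfs:label ?label .     # triple pattern: with some label
  FILTER (lang(?label) = "en")    # filter: keep only English labels
}
LIMIT 10
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["planet"]["value"], "-", row["label"]["value"])
```

The FILTER line is the SQL-like part: it prunes the matched subgraphs down to English labels.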

Semantic Web Technologies – Knowledge Engineering

Summary of week six for the course Knowledge Engineering with Semantic Web Technologies 2015 by Harald Sack at OpenHPI.

Linked Data Engineering

In general it is difficult to get data, because it is distributed across different databases and you need different APIs to get at it -> data islands. Applying semantic web technologies gives you a standardized interface to access this data, which allows easier reuse and sharing of data. Tim Berners-Lee: “Value of data increases if data is connected to other sources.” There are four principles for linked data (a small code sketch follows the list):

  1. Use URIs as names (not only for web pages, but also for real objects, abstract concepts and so on)
  2. Use HTTP URIs, so that people, but also machines, can look up the names
  3. Provide useful information using RDF and SPARQL
  4. Include links to other URIs, so people can discover more things
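
To make the principles a bit more concrete, here is a small sketch with rdflib that dereferences an HTTP URI and follows outgoing links; it assumes the DBpedia server is reachable and answers with RDF via content negotiation:

```python
# pip install rdflib
from rdflib import Graph

g = Graph()
# Principles 1-3: Pluto is named by an HTTP URI that anyone (human or
# machine) can look up; DBpedia answers with RDF via content negotiation.
g.parse("http://dbpedia.org/resource/Pluto")
print(len(g), "triples about Pluto")

# Principle 4: follow links (here owl:sameAs) to discover more sources
for s, p, o in g:
    if str(p).endswith("sameAs"):
        print("linked to:", o)
```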

If you want to create linked open data, you should have:

  • data available under an open licence
  • a machine-readable format
  • a non-proprietary format
  • open standards from the W3C
  • links to other data sources

The lecture included a tour through the linked data cloud (Uni Mannheim).

All these sources are held together by ontologies. Examples of ontologies:

  • OWL (owl:sameAs or owl:equivalentClass)
  • SKOS (Simple Knowledge Organization System), applied for definitions and mappings of vocabularies and ontologies. It lets you express relations such as narrower and broader, as well as matches.
  • UMBEL (Upper Mapping and Binding Exchange Layer), which maps into DBpedia, GeoNames and Wikipedia

Linked Data Programming

How do you publish data for the Semantic Web? The best way is via a SPARQL endpoint, e.g. OpenLink Virtuoso, Sesame or Fuseki. These endpoints are RESTful web services that you can query and that return results as JSON, XML and so on. Another way is via linked data endpoints (Pubby, Jetty); these are overlays over a SPARQL endpoint. Yet another way is via D2R servers, which translate data from non-RDF databases into RDF. A source for available datasets is datahub.io.

Metadata and Semantic Annotation

Semantic annotation: you attach semantic data to your source. Formally, an annotation consists of:

  • the subject of the annotation (e.g. a book, represented by its ISBN)
  • the object of the annotation (the author)
  • the predicate that defines the type of relationship (that the author is the author of the book)
  • the context in which the annotation is made (who made the annotation, and when?)

An example is the Open Annotation Ontology (developed by the W3C).
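
As a loose sketch (not the lecture's code, and simpler than the real model), the four parts can be expressed with rdflib and the W3C Web/Open Annotation vocabulary; all ex: resources are invented for illustration:

```python
# pip install rdflib
from rdflib import Graph, Namespace, Literal, RDF
from rdflib.namespace import DCTERMS

OA = Namespace("http://www.w3.org/ns/oa#")  # Web/Open Annotation vocabulary
EX = Namespace("http://example.org/")       # invented namespace
g = Graph()
g.bind("oa", OA)
g.bind("ex", EX)

ann = EX.annotation1                     # the annotation is itself a resource
g.add((ann, RDF.type, OA.Annotation))
g.add((ann, OA.hasTarget, EX.book_9780451524935))  # subject: the book (ISBN)
g.add((ann, OA.hasBody, EX.GeorgeOrwell))          # object: the author
g.add((ann, EX.relation, EX.isAuthorOf))           # predicate: invented helper
g.add((ann, DCTERMS.creator, EX.Alice))            # context: who annotated
g.add((ann, DCTERMS.created, Literal("2015-06-01")))  # context: when

print(g.serialize(format="turtle"))
```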

Named Entity Resolution

When we do semantic annotation we want to capture the meaning of a string and attach additional information (you annotate Neil Armstrong and get more information about him). The main problem is ambiguity: if you enter “Armstrong” in a search engine, you also get pictures of Lance Armstrong and Louis Armstrong. Context helps us to narrow the search and overcome this problem.

Resolution: mapping the word to a knowledge base in order to resolve the ambiguity
Recognition: locating and classifying entities into predefined categories like names, persons, organizations

Example: Armstrong landed the Eagle on the moon.
From that you build every possible combination of candidate entities, and if there are co-occurrences in texts you can find the best matches. Another way is to look at DBpedia and see which of the possible candidates have connections to each other.
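
A toy version of the co-occurrence idea in Python; all candidate lists and counts here are invented, and a real system would harvest them from a corpus or from DBpedia links:

```python
from itertools import product

# Candidate entities for each surface form (invented lists)
candidates = {
    "Armstrong": ["Neil_Armstrong", "Lance_Armstrong", "Louis_Armstrong"],
    "Eagle": ["Eagle_(lunar_module)", "Eagle_(bird)"],
    "moon": ["Moon"],
}

# Invented pairwise co-occurrence counts (from texts or DBpedia links)
cooccur = {
    ("Neil_Armstrong", "Eagle_(lunar_module)"): 42,
    ("Neil_Armstrong", "Moon"): 55,
    ("Eagle_(lunar_module)", "Moon"): 40,
    ("Louis_Armstrong", "Eagle_(bird)"): 1,
}

def score(combo):
    # Sum co-occurrence over all entity pairs in the combination
    return sum(cooccur.get((a, b), 0) + cooccur.get((b, a), 0)
               for i, a in enumerate(combo) for b in combo[i + 1:])

# Try every combination of candidates and keep the best-scoring one
best = max(product(*candidates.values()), key=score)
print(best)  # ('Neil_Armstrong', 'Eagle_(lunar_module)', 'Moon')
```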

Semantic Search

When you use a search engine, you will also get ambiguous results. With semantically annotated texts you can overcome this ambiguity. Based on this you can do entity-based IR, which is language independent. You could also include information from the underlying knowledge base, or use content-based navigation and filtering (e.g. filter pictures vs. drawings).

You can use it for:

  • query string refinement (like auto-completion, query enrichment)
  • cross-referencing (additional information for the user, taken from the knowledge base)
  • fuzzy search (give nearby results, helpful if you have very few results)
  • exploratory search (visualize and navigate the search space)
  • reasoning (complement search results with implicitly given information)

Another example is entity-based search. You match a query against semantically annotated documents (simple entity matching). You can also exploit similarities, like between Buzz Aldrin and Neil Armstrong (similarity-based entity matching). In relationship-based entity matching you have the entities astronaut and Apollo 11, and there are also relationships between astronaut, Apollo 11 and Neil Armstrong.
—> these results can complement your search!
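
A toy sketch of simple entity matching, assuming documents have already been annotated with entity IDs (everything here is invented data):

```python
# Toy entity-based retrieval: documents are sets of entity IDs (invented),
# so matching is language independent - we compare IDs, not words.
docs = {
    "doc1": {"Neil_Armstrong", "Apollo_11", "Moon"},
    "doc2": {"Buzz_Aldrin", "Apollo_11"},
    "doc3": {"Louis_Armstrong", "Jazz"},
}

def entity_match(query_entities, docs):
    # Rank documents by how many query entities they mention
    scored = ((len(query_entities & entities), doc)
              for doc, entities in docs.items())
    return sorted(scored, reverse=True)

print(entity_match({"Neil_Armstrong", "Apollo_11"}, docs))
# [(2, 'doc1'), (1, 'doc2'), (0, 'doc3')]
```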

Another approach is selecting named entities directly, so you click on exactly the entity that you want. Examples are the articles at blog.yovisto.com.

Exploratory Search

It is an extension of traditional search and of semantic search.

  • Retrieval: you look for something specific (like a book) and know how to specify it
  • Exploration: you already read “1984” and want to read a book that is close to it. In a library you would ask the librarian, who would tell you what to read next, and we want to have this in our search system as well! In a traditional library you can also look at the shelves and maybe find another book that is similar.

For whom is it made?

  • People who are unfamiliar with the domain
  • People who are unsure about the ways to achieve their goals
  • People who are unsure about their goals: you want to find something, but you cannot specify it

You can build graphs with the semantic information you have in order to give the user more information about the original result (more books by one author). You could also get broader results (you read a book by Jules Verne and get, as a recommendation, books by H. G. Wells, who was influenced by Jules Verne). Another example: start with Neil Armstrong -> Apollo 11 and the other crew members -> Apollo 11 is part of the Apollo program, so you find other Apollo missions -> you find Apollo 13 and learn that there was an accident -> you end up at the crash of the Space Shuttle Challenger.
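
One way to picture this is a breadth-first walk over the knowledge graph, where each hop broadens the result set; the following sketch uses an invented mini-graph of exactly this Apollo chain:

```python
from collections import deque

# Invented mini knowledge graph mirroring the Apollo example
graph = {
    "Neil_Armstrong": ["Apollo_11"],
    "Apollo_11": ["Buzz_Aldrin", "Apollo_program"],
    "Apollo_program": ["Apollo_13"],
    "Apollo_13": ["Apollo_13_accident"],
}

def explore(start, depth):
    # Breadth-first walk: each hop broadens the result set a bit further
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, d = queue.popleft()
        if d < depth:
            for neighbour in graph.get(node, []):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, d + 1))
    return seen

print(explore("Neil_Armstrong", depth=4))
# reaches Buzz Aldrin, the Apollo program, Apollo 13 and its accident
```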

Semantic Web Technologies – Ontological Engineering

Summary of week five for the course Knowledge Engineering with Semantic Web Technologies 2015 by Harald Sack at OpenHPI.

Ontology Engineering

Pyramid for knowledge management:

  • Data: raw data, facts about events
  • Information: a message to change the receiver’s perception
  • Knowledge: experience, context, information + semantics
  • Wisdom: application of knowledge in context

In general it makes sense to follow a methodology, because creating an ontology is quite complex.

Ontology Learning

Can we create ontologies automatically? Ways to do this:

  • via text mining from texts
  • via linked data mining, e.g. from RDF graphs
  • via concept learning in description logics and OWL (related to linked data mining)
  • via crowdsourcing, e.g. Amazon Mechanical Turk or games with a purpose

In short, there are three steps: term extraction – conceptualization – evaluation. Current challenges in ontology learning:

  • Heterogeneity
  • Uncertainty: the quality is low; you cannot be sure whether the information is right or not
  • Consistency: you need it, because otherwise you cannot do reasoning
  • Scalability: make sure the approach scales
  • Quality: you need to evaluate it and make sure it is right
  • Interactivity: you need to involve users to help you improve the ontologies

Ontology Alignment

What is it? You try to find similarities between ontologies in order to combine them. But an ontology only models reality; it is NOT reality. The problems are similar to natural language: you run into ambiguities. You can also have problems with different conventions (time in seconds vs. time points), different granularities and different points of view.

You have differences on the syntactic, terminological, semantic or semiotic (pragmatic) level.
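
As a toy illustration of alignment on the terminological level only (real aligners also use structure and semantics), one could compare class labels by string similarity; the two ontologies here are invented:

```python
from difflib import SequenceMatcher

# Two invented class lists from two ontologies to be aligned
onto_a = ["Person", "Automobile", "ZipCode"]
onto_b = ["Human", "Car", "PostalCode", "Person"]

def similarity(a, b):
    # String similarity of the labels (a crude terminological signal)
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

for a in onto_a:
    best = max(onto_b, key=lambda b: similarity(a, b))
    print(f"{a:10} -> {best:10} ({similarity(a, best):.2f})")
```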

Ontology Evaluation

This is the quality of an ontology with respect to a particular criterion. There are two basic principles:

  • Verification: are the encoding and implementation correct? (more the formal side)
  • Validation: how good is the model and how well does it match reality?

Criteria for validation:

  • correctness:
    • accuracy (precision and recall)
    • completeness
    • consistency
  • quality:
    • adaptability
    • clarity
    • computational efficiency
  • organizational fitness (how well does it integrate into my software/organization?)
  • conciseness

Semantic Web Technologies – RDF

Summary of week one for the course Knowledge Engineering with Semantic Web Technologies 2015 by Harald Sack at OpenHPI.

The first week covers the basic principles of the technologies for the semantic web, especially RDF, which is one of the languages you can use for encoding information semantically. The basic principle behind the technologies is the triple, which consists of subject-predicate-object. You encode all your knowledge in that way, for example: Pluto – discovered – 1930.

One problem is that, because of the syntax, the expressions tend to be very long, so you can use abbreviations with namespaces, as in XML, or Turtle, which also helps you to shorten the syntax.
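
A minimal sketch of the Pluto triple with rdflib, showing how a bound namespace prefix keeps the Turtle serialization short; the ex: namespace is made up:

```python
# pip install rdflib
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import XSD

EX = Namespace("http://example.org/")  # invented namespace
g = Graph()
g.bind("ex", EX)  # the bound prefix is what keeps the Turtle output short

# The triple from the text: Pluto - discovered - 1930
g.add((EX.Pluto, EX.discovered, Literal("1930", datatype=XSD.gYear)))

# Prints roughly:  @prefix ex: <http://example.org/> .
#                  ex:Pluto ex:discovered "1930"^^xsd:gYear .
print(g.serialize(format="turtle"))
```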

Mr. Sack claims that with semantic web technologies you can go one step further towards a web of data, because it becomes very easy to create data that is machine-readable. He gives a lot of examples using DBpedia. This site also provides a good interface where you can download data in different machine-readable formats like XML or JSON. There is an example page for Pluto.