Semantic MediaWiki Conference (SMWCon) 2018

Semantic MediaWiki Conference (SMWCon) 2018 took place in Regensburg. In this post, I highlight some of the talks I liked or want to share my opinion on. Sorry if I did not mention yours; this is no critique, just due to my limited time ;)

The first day was all about business applications. It seemed to me that there were a lot of efforts to somehow standardize solutions for project management, technical documentation and other things. One remarkable project was zoGewoon, where the company put a lot of effort into the design and usability of the system. It turned out to be very interactive and easy to use, which makes sense for the target group: people with disabilities looking for a place to live. Another cool thing was the extension VEForAll, which makes the VisualEditor work within forms. This was not possible before and is a great advantage for usability, because Page Forms as well as the VisualEditor help a lot to make editing wiki pages easier.

The keynote was about how language shapes perception. Marc van Hoof linked this from Orwell's dystopia of Newspeak to the way we organize knowledge in ontologies. He also argued for a user-centred way of naming and creating these ontologies, in order to make it easier for users to perceive information and link it to their everyday lives. This also leads to the concept of folksonomies; my impression was that folksonomies are past their hype, but maybe they will come back…

On the second day my favorite talk was the presentation of the new features of Semantic MediaWiki 3.0. There were several cool things, like the improvements to the list format and the data tables format. You can also now enter semantic queries directly in the search field. Karsten also visited the Wikimedia technical conference and said that MediaWiki core will become more open to the wishes of third parties, which is remarkable.
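For readers who have never written a semantic query: the same `#ask` queries can also be sent to a wiki from outside via the API's `ask` module. A minimal sketch using only Python's standard library; the wiki URL, category and property names here are invented examples, not a real wiki:

```python
from urllib.parse import urlencode

def build_ask_url(base_url, conditions, printouts):
    # An #ask query is the conditions plus |?Property printout requests.
    query = conditions + "".join(f"|?{p}" for p in printouts)
    params = urlencode({"action": "ask", "query": query, "format": "json"})
    return f"{base_url}/api.php?{params}"

# Hypothetical wiki and properties, just to show the shape of the request.
url = build_ask_url("https://wiki.example.org",
                    "[[Category:City]]",
                    ["Population", "Country"])
print(url)
```

Fetching that URL (e.g. with `urllib.request`) returns the query results as JSON, which is handy for scripts that live outside the wiki.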

During the breaks the big topic was of course SMW 3.0, especially migration and the new features. Another topic (maybe because there were many people from companies and not so many from research) was how to tell people about SMW, and also how to sell it. It was kind of a consensus that people tend not to talk about wikis anymore, but about knowledge management systems. First, because people tend to think of Wikipedia when it comes to openness, but also because there are many ways of tweaking input (thanks to forms) and style (thanks to Tweeki and Chameleon), so you can customize the system a lot and be much freer than just providing a clone of Wikipedia.

Viktor Schelling talked about WSForm, which might replace Page Forms and does some things very well, like providing templates on every page and not only on template pages. I am very excited to see its release and try it out. Talking about graphs, Sebastian Schmid presented an improvement to the result formats that uses the library Mermaid, which can display graphs, the basic principle of knowledge organisation in SMW. I am happy to see this applied, because there is a lot of graph data stored in SMWs out there.

In the end, these were two nice days in the beautiful city of Regensburg. The conference dinner in the old city centre was really nice. Thanks also to the people at gesinn.it and TechBase Regensburg for organizing the conference and providing the venue.

Installing Nextcloud on a Raspberry Pi using snap

Nextcloud offers some great functions when it comes to sharing data on your local network. I mainly needed a tool to sync a calendar on my network with my devices. I was also thinking about lighter software like Baikal, but the installation of most of the alternatives seemed more difficult to me than Nextcloud, so I chose it.

I used the snap package of Nextcloud, so I did not need to do much server configuration and it was ready to start right away. Have a look at the snap installation.
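For reference, the installation really is a single command (assuming snapd is already set up on the Pi):

```shell
# Install the Nextcloud snap; the package pulls in its own web server
# and database, which is why almost no manual configuration is needed.
sudo snap install nextcloud
```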

Connecting calendar clients

I connected an Android client, Evolution and an iOS client. Connecting the Android device and Evolution is pretty easy: it just works if you create a new calendar using the provided link. On Android you have to use the app DAVx5 to establish the connection.

On iOS the connection only works with HTTPS enabled (I did not find another way to set it up; if you know one, please share). Therefore you need to set up HTTPS using snap (if you run into problems on your Pi, maybe have a look here). After this you can simply create a new CalDAV connection on your iOS device. In my case it also worked directly with the base URL (e.g. https://yourdomain.com/nextcloud) without the other addresses the manual wants you to use.
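For the HTTPS step, the Nextcloud snap ships a helper command; at the time of writing it was called `nextcloud.enable-https` (check `snap info nextcloud` if the name has changed):

```shell
# With a publicly reachable domain, use a Let's Encrypt certificate:
sudo nextcloud.enable-https lets-encrypt

# On a purely local network, a self-signed certificate also works;
# iOS will then ask you to trust the certificate manually:
sudo nextcloud.enable-https self-signed
```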

My Audio setup

In this article I will give a little bit of background on my music system and why I use certain components.

First, the speakers: I use speakers that accept input via USB. This has the great advantage that I can use whatever computer I want to push the music signal to the speakers. (Side note: I am using speakers by Nubert, which also have digital and analogue inputs for other sources.) My music comes directly from a Raspberry Pi with Mopidy and Raspotify. These two tools allow me to play basically all the music I have on disk or to stream it from the internet. I also wrote some small software that shows the title and artist on a small display, and I added some buttons so I can pause the track and turn the Pi off.
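The display software is not published here, but its core is simple. A sketch of the kind of helper such a setup needs, scrolling "Artist - Title" across a small character display; the 16-character width, the ticker behaviour and the function name are my own illustrative choices, not the actual code:

```python
def ticker_text(artist, title, width=16, offset=0):
    # Render a fixed-width window of "Artist - Title" for a character display.
    text = f"{artist} - {title}"
    if len(text) <= width:
        return text.ljust(width)      # fits: pad to the display width
    padded = text + "   "             # gap before the text repeats
    offset %= len(padded)
    doubled = padded + padded         # lets the window wrap around the end
    return doubled[offset:offset + width]

# Scrolling is done by calling this with an increasing offset,
# e.g. once per second, and writing the result to the display.
print(ticker_text("Some Artist", "Some Long Track Title", offset=3))
```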

The USB connection is actually great since it reduces the number of cables a lot. Before this I used an external sound card, which also worked well but needed more space. Another option would be a HiFiBerry, but I have no experience with its sound quality.

Why did I use this setup and not directly buy speakers from Sonos or similar systems? I like to keep my system as easy to repair and change as possible. My speakers will probably outlast my Raspberry Pi, so I want to be able to change the way the music gets to the speakers. I also want to be able to change my streaming service. All this might or might not be possible with out-of-the-box systems.

And finally, the Raspberry Pi and Mopidy are open systems, where you can add your own code to improve your setup.

Raspotify – Turn your pi into a Spotify server

As some of you know, I really like to use a Raspberry Pi as a music server. Therefore I want to introduce a nice tool: Raspotify. This little program turns your Raspberry Pi into a Spotify Connect device, just like the one you get when listening on your computer or on systems like Sonos. The really cool thing is: it works out of the box and you can use your phone as a remote control. Right now I actually prefer it over Mopidy-Spotify, which has had problems since Spotify is locking down its API more and more. For instance, it is no longer possible to load playlists.

I do not know whether this will improve in the future or whether Spotify is nudging us towards its own clients. I would prefer a system that works directly with Mopidy, which still bundles everything, but we will see.

The installation of Raspotify is actually as easy as it gets: you only need to type one command and it does everything. But pay attention to changing the settings (see the Configuration section on the page) to get higher-quality playback.
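As an example of the quality setting: at the time of writing, the Raspotify package read its options from a file under `/etc/default`; a hedged sketch of the relevant line (check the project's Configuration section, as the file name and option syntax may have changed):

```shell
# /etc/default/raspotify -- options passed through to the player.
# --bitrate 320 raises playback quality from the default 160 kbit/s.
OPTIONS="--bitrate 320"
```

After editing the file, restart the service (e.g. `sudo systemctl restart raspotify`) for the change to take effect.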

Digital Humanities at Hochschule Darmstadt

In the summer semester, Professor Rittberger and I will be giving a class about Digital Humanities in the information science program at Hochschule Darmstadt. Here is our syllabus; I will try to upload the slides as well (they are in German).

We want to give a broad overview of what Digital Humanities means. There are also other classes dealing with text mining, so we do not focus on it so much (there are four sessions about it, though).

  1. Introduction to Digital Humanities: What are DH, what can we do with digital methods
  2. Research methods: Qualitative and quantitative methods in social sciences, hermeneutics, virtual research environments
  3. Law, ethics: Basic understanding what law means and what problems it can cause. This leads to data management and open data
  4. XML: Basics about XML, why DTDs are useful, standards like TEI (2 sessions), XML regarding ontologies
  5. Editions and digitization: What are editions, and how can we create them digitally? How do we digitize content?
  6. Basics of Text analysis: Distant Reading, Google n-grams, how new methods in text analysis can help in research
  7. Named-Entity-Recognition: We chose this problem of NLP to give an overview of what can be done using new technology and also to compare approaches from computer science like machine learning with approaches from information science and semantic web
  8. Topic Modelling: Basic introduction and practical usage with R
  9. Network analysis: Basics of network analysis and how to use it for instance for plays. Tool: Gephi
  10. Geoinformation: How can we code geographical data, how can we use it in DH?
  11. 3D-Modelling: What new approaches are there using 3D-Modelling, how can we use it in DH? Tool: Blender
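To give a flavour of item 7: the simplest approach from the information science side of NER is a dictionary (gazetteer) lookup, as opposed to a trained statistical model from machine learning. A minimal sketch; the entity list is of course made up for illustration:

```python
# Naive gazetteer-based named-entity recognition: scan the token stream
# and mark any known entity, preferring the longest match. Purely
# illustrative; real systems combine such lexicons with learned models.
GAZETTEER = {
    ("Hochschule", "Darmstadt"): "ORG",
    ("Virginia", "Woolf"): "PER",
    ("Regensburg",): "LOC",
}

def tag_entities(tokens):
    entities, i = [], 0
    max_len = max(len(k) for k in GAZETTEER)
    while i < len(tokens):
        for n in range(max_len, 0, -1):      # try longest spans first
            cand = tuple(tokens[i:i + n])
            if cand in GAZETTEER:
                entities.append((" ".join(cand), GAZETTEER[cand]))
                i += n
                break
        else:
            i += 1
    return entities

print(tag_entities("I study at Hochschule Darmstadt near Regensburg".split()))
```

The obvious weakness, and the reason this makes a good teaching example, is that a lookup cannot handle unseen names or ambiguity, which is exactly where the machine learning approaches come in.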


Semantic MediaWiki Conference (SMWCon) Fall 2017

Last week the SMWCon, the European conference on Semantic MediaWiki, took place in Rotterdam. Our venue was right in the zoo, in the middle of the aquarium, so we could watch sharks and turtles during the talks!

But there were also very interesting talks. The information on most of them can be found here. I will describe some of the talks that were interesting to me because they dealt with things I might use in the future.

The keynote on the first day dealt with firefighters and their problem of information overload. Firefighters face the familiar situation: there is a lot of information, it is hard to find, and it comes in different formats (GPS, information systems, paper copies). But firefighters only have the limited time until they reach the burning building, have to act then, and cannot lose even more time reading documentation. So they need the right information in time, which is quite difficult.

He also stressed that machine learning and reasoning over knowledge are nice, but sometimes, especially as a firefighter, you face completely new cases; the world and technology keep changing, so you will always run into new obstacles. An example is a burning car with an electric engine.

Karsten then introduced the new things they are developing for SMW 3.0, which will be a major release. He also stressed that the software needs better documentation, something I also encountered when I tried to introduce new people to SMW. But this is a problem in lots of open source projects: people like to code, but not to write documentation. It also shows that open source projects do not only need coders ;)

Tobias introduced annotation tools for images, text and videos. These were developed as part of our projects, and we would be excited to see use cases and of course get feedback on the extensions.

The keynote on Friday was about a project called SlideWiki. This is basically a wiki that allows you to create and re-use presentation slides, annotate them, link them to topics and so on. It is really cool because other platforms like SlideShare do not allow this, and also do not allow forking.

The second talk was by Cindy Cicalese, who works for the Wikimedia Foundation. She will be advocating more for third-party developers in the project management of MediaWiki. You can go to their site to see what they want to do. In short:

  • They want to introduce multi-content revisions, i.e. more than one slot on a wiki page, a functionality that right now you can only get with SMW and Page Forms
  • They also want to tackle the installation, updating and maintenance of wikis. This is actually a very important topic that basically everyone in the community faces: we normally do not have one wiki, we have many more, and updating every single one, and also setting them up, is cumbersome.
  • They want to introduce a roadmap to make the development of MediaWiki more predictable. This would also help third parties, because we could tell our customers whether a certain feature will be implemented soon or not

After that, Alexander Gesinn introduced a pre-configured virtual machine. This might be nice for people who just want to try out SMW; for productive use you still face the maintenance problems. He named three things every enterprise wiki has to have (and I agree with him):

  • Semantics (SMW + PageForms)
  • VisualEditor to not torture users with Wikisyntax
  • a responsive skin to have a nice-looking wiki on mobile devices

On Thursday, Remco also gave a talk on a similar topic, which he called wiki product lines. A product line is similar to industry, where you have different TVs that are basically all the same and only the screen size changes between products. He explained, from a somewhat more theoretical standpoint, where he sees potential. To me this looks like a problem that will be tackled, and we might get some (hopefully) completely free and documented solutions for it.

At the end of the day there was also a workshop organized by my colleague Lia and me. We agreed to set up a page on the SMW wiki where we collect projects and ideas on how we might use them in the future.

Overall, it was a nice conference and I got to know many nice people. Also thanks to the organizers 😉

Algorithmic Criticism

Today I want to present a paper that made me think about Digital Humanities. It is called "Algorithmic Criticism" by Stephen Ramsay.

Unlike most other papers, which only focus on new algorithms and new data, this one also focuses on methods for combining the two parts of the Digital Humanities. Ramsay wants to develop a criticism (normally a method of the humanities) that is based on algorithms.

He argues that even in literary research it could be possible to have approaches that are a lot more empirical, meaning you have an experiment and quantitative measurements to prove your claims. Another important point he makes is that computers might not be ready for that kind of analysis (the paper is from 2005, though), but they may be in the future, so he believes these methods will become available.

One of his central points is that every critic reads a text using their own assumptions and "sees an aspect" (Wittgenstein) in the text. So the feminist reader sees a feminist aspect of the text, and likewise the "algorithmic" reader can see the aspect of the computer, or can read the text transformed by a computer. At the end, the paper presents some research computing tf-idf measures on the novel The Waves by Virginia Woolf.

I really like this idea of a certain way of reading a text by letting a machine do it, and considering the machine similar to a human reader, who is also not completely objective and free of bias. This is also good for researchers in NLP, because it lets you admit that the judgement the computer gives is not free of bias either, for instance when you change the parameters of your algorithm.
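The tf-idf reading Ramsay applies to The Waves can be illustrated with a toy computation. The tiny "corpus" below is invented, and I use the plain logarithmic idf formula; actual studies vary the weighting:

```python
import math

def tf_idf(term, doc, corpus):
    tf = doc.count(term) / len(doc)            # term frequency in this document
    df = sum(1 for d in corpus if term in d)   # number of documents with the term
    idf = math.log(len(corpus) / df)           # inverse document frequency
    return tf * idf

docs = [
    "the waves broke on the shore".split(),
    "the sun rose over the garden".split(),
    "waves of light swept the garden".split(),
]

# "the" occurs in every document, so its idf (and score) is zero;
# "waves" occurs in only two of three, so it gets a positive score.
print(tf_idf("waves", docs[0], docs))
print(tf_idf("the", docs[0], docs))
```

This is the sense in which the algorithm "sees an aspect": it foregrounds the words that distinguish one text from the rest of the corpus, and changing the weighting changes what it sees.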


How to be a modern scientist, Google Tensorflow

In this post I want to share a few things that came to me over the last couple of weeks and that I think are worth sharing:

There is a new episode of Open Science Radio, a German podcast about Open Science and related topics. One of the things they talk about is Jeffrey Leek, a researcher in (bio)statistics who wrote a book about being a modern scientist, which you can get for free or for a donation. He also teaches Data Science classes via Coursera. I can also highly recommend episode 59 of Open Science Radio about OpenML, which I think is a very cool project.

Google has open-sourced a tool for visualizing high-dimensional data, the Embedding Projector for TensorFlow. The standard visualization shows word vectors. In my opinion this visualization is a little tricky, because things that appear close in the three-dimensional view are not necessarily close in the real vector space with a couple of hundred dimensions. But it is still a nice tool for exploring how word vectors behave on a very large dataset that you do not even have to train yourself. You can also use the tool for plotting other high-dimensional data.
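The caveat about closeness can be made concrete: two vectors that coincide in the first few dimensions can point in very different directions once the remaining dimensions are taken into account. A small hand-made example in pure Python, with invented five-dimensional "word vectors" standing in for the real hundreds of dimensions:

```python
import math

def cosine(u, v):
    # Cosine similarity: 1.0 means same direction, negative means opposed.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Identical in the first three dimensions, opposite in the fourth.
u = [1.0, 2.0, 3.0,  10.0, 0.0]
v = [1.0, 2.0, 3.0, -10.0, 0.0]

print(cosine(u[:3], v[:3]))  # in the 3-D "projected" view they coincide
print(cosine(u, v))          # in the full space they point apart
```

So two points sitting on top of each other in the projector may still be dissimilar words; the projection is a map, not the territory.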

Semantic Web Technologies – OWL, Rules and Reasoning

Summary of week four for the course Knowledge Engineering with Semantic Web Technologies 2015 by Harald Sack at OpenHPI.

RDFS semantics: we need this because there was no formal description of the semantics, so the same queries could give back different results. Adding formal semantics fixes this. Every triple encoded in RDF is a statement, and an RDF graph as a whole is also a statement.
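Since everything in RDF boils down to subject-predicate-object statements, a graph can be modelled simply as a set of triples. A toy sketch to make the lecture's point tangible; the example data and the `match` helper are invented for illustration, not part of the course:

```python
# An RDF graph as a set of (subject, predicate, object) triples.
graph = {
    ("Regensburg", "type", "City"),
    ("Regensburg", "locatedIn", "Germany"),
    ("SMWCon2018", "heldIn", "Regensburg"),
}

def match(graph, s=None, p=None, o=None):
    # None acts as a wildcard, like a variable in a triple-pattern query.
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

print(match(graph, s="Regensburg"))  # every statement about Regensburg
print(match(graph, p="heldIn"))      # who held what where
```

Real query languages like SPARQL are essentially a much richer version of this pattern matching, plus the entailments that the formal semantics add on top.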

OWL: based on a description logic, it consists of classes, properties and individuals (instances of classes).

OWL 2 has different flavors: OWL DL (based on description logic) and OWL Full (DL is decidable, Full is not), plus three profiles (EL, RL, QL). There are different syntaxes for writing ontologies; the most important (and shortest) are the Manchester syntax and Turtle.

Classes, properties and individuals in OWL are comparable with the ones in RDFS.

OWL contains NamedIndividuals, which can be introduced directly.

An editor for ontologies is Protégé, which can be used as a web or desktop application; there are also short courses on their homepage.

Deeper knowledge about OWL is covered in the extra lectures.