
Reflections on my PhD

In this post I want to write down what I learned during my PhD and what might help you too. These are my personal opinions, but I used all of the techniques I describe and they helped me. I split this up into four parts: before you start, finding a supervisor, writing, and finishing.

Why start a PhD?

When I thought about starting a PhD, my main motivation was that I liked scientific work. I had done internships and worked as a student in companies, but I wanted to dive deeper into research and build on what I had learned at university.

I also had in mind that a PhD is normally not a bad thing for your career. However, at least in the beginning, you might earn more money in a well-paid industry job than during your PhD. And if you really only do it for the title, the time doing it might be really hard for you.

All this depends a lot on the field. Take a look at people you know in the field who achieved things you want to achieve and see what they did to get there. For me it made sense, since I was interested in gaining deeper knowledge of the field and could imagine becoming a professor.

Finding a supervisor

In Germany, the relationship with your supervisor is really close; in German you even call them your doctoral mother (Doktormutter) or doctoral father (Doktorvater). But this also shows one issue you should be aware of: your PhD depends a lot on these people. So choose them wisely!

So, how do you find a supervisor? In my case, I knew some of my supervisors before; in fact, I wrote my master's thesis in their department. My third supervisor I did not know before. Since the relationship with these people is really strong and you depend on them, I would encourage you to choose supervisors you get along with. Three (or four, five, six) years can be a hard time if you hate your supervisor or vice versa. Also, it might make sense not to join the most prestigious university but to get better support instead (although you need to balance this, since your degree might help you find a better job afterwards).

Another thing you should keep in mind is that your supervisor does not know everything either. I did my PhD in the very interdisciplinary field of open science, information science, ontology engineering and human-centered design, so I constantly needed to get feedback from other people, and I was very lucky to meet a lot of helpful people. I was also grateful for workshops and courses about scientific methods, like interviews, that I did not know much about before. I got really valuable feedback from the folks at the University of Washington as well as from unexpected places like the peer review of some of my articles. Conferences with PhD sessions are also a great place for this, although I did not attend many of them. I also used the PhD programme at the University of Frankfurt a lot, since it provides many helpful workshops about scientific methods, project management and other topics you might need for your dissertation.

Writing

Write drafts

Since I had several supervisors who do not all work at the same institution, I had to coordinate a lot between them. What helped me here was to write a summary of the next steps in my process and send it to them before our meetings. This of course takes some effort, but it lets your supervisors know what you are doing, and your agreements are more precise if you send them some text before the meeting instead of just presenting your content (I tried the latter at first, but it did not work out). The other great advantage is that once you have written down your next steps, you are already done with some parts of your thesis!

Start early

Based on this, I also started writing up my analyses as soon as I had done them. Since I used a multi-level design process, this was crucial anyway for getting to the next step. The disadvantage was that I had to do a lot of reworking at the end.

Start with something easy

You won't write your thesis from beginning to end. Therefore, start with an easy part and go from there. My order was (with a lot of going back and forth): method – literature – results – discussion – introduction – conclusion. You might start with some other part, but this is your work, so make it comfortable for yourself.

Make a plan of each chapter

What also helped me a lot was to make a plan of what I wanted to say in each chapter. This helps you keep a common thread running through your work. The plan can change during the process, but most of it will stay the same and will help you navigate through the work. Keep in mind that your thesis will most likely be longer than 150 pages, so it is impossible (at least it was impossible for me) to keep an easy overview. The plan also helps you keep track of your progress and find the chapters you need to focus on more.

Software

I used LaTeX to write my thesis, because I had already used it for my master's thesis. The advantages are that you do not have to worry about a document crashing and becoming unreadable, backups are very easy, LaTeX takes care of your citations, and the final product looks quite nice. A disadvantage is that it does not provide a nice track-changes function like Word. My proofreaders wanted to work in Word, so I had to copy their changes back from a Word document into my LaTeX files, which took a lot of time. But keep in mind that even if your proofreaders use track changes, you still have to check the whole document again, which might be very time-consuming.
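As a minimal sketch of what I mean by LaTeX taking care of your citations (the file name references.bib and the entry key smith2020 are just placeholders), a thesis skeleton with biblatex could look like this:

  \documentclass{book}
  % biblatex manages citations and the reference list; biber is the backend
  \usepackage[backend=biber, style=authoryear]{biblatex}
  \addbibresource{references.bib} % placeholder: one .bib file with all entries

  \begin{document}
  \chapter{Introduction}
  As \textcite{smith2020} argue, open practices change research. % placeholder citation key
  \printbibliography % the bibliography is generated and formatted automatically
  \end{document}

Compile with pdflatex, then biber, then pdflatex again, and the reference list stays consistent no matter how often you move chapters around.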

Finishing

Finishing might be the hardest part. You are really stressed, and you see that things do not make as much sense as you thought at the beginning. There are also a lot of distractions out there, and your funding might run out. My last phase of the PhD was not normal, since I did it during the Covid-19 pandemic.

Stop other stuff

The good thing in science is that many people you work with went through a PhD themselves. So they will know that you are really busy during your finishing phase and will hopefully leave you alone in that time. I was really shocked when I realized how much more productive I was on days without any meetings. One good approach is to move all your meetings to a single day: you won't be very productive that day, but you will be on all the others.

Get a routine

As soon as you manage to free yourself from other day-to-day work, you might want to enjoy your free time, which of course makes it harder. During Covid-19 I learned to be much more organized, so this was relatively easy for me. Also, a lot of distractions were simply not possible, so I focused on the thesis. It also helps to have a plan and some milestones, so you can keep track of what you have finished and what is left to do.

Schedule breaks

If you have created a plan, do not be too hard on yourself. You need breaks. For a certain time (this might be shorter than you think) you can work day and night without breaks, but if you are sick afterwards, your productivity drops. My approach was therefore to make a strict plan from the beginning that eased off towards the end and left some time for unexpected events. So I had a fairly relaxed time after finishing my first draft; however, I was really stressed during the four weeks before that in order to get the first draft done for some friends to read over.


The end of the Information Age?

Last week there was a really interesting article at Heise online, a German tech magazine. It declares the end of the modern internet and a move into a postmodern internet. This also touches on the value of information and its decline in a postmodern age.

Traditionally, in information science we talk a lot about the governance of information. The idea behind this is, in my opinion, based on the library: in a library, the goal is to have as much information (books) available as possible in order to give it to people. They read the books and become smarter. At the same time, books are still kind of expensive, so the ideal is that all books should be free for everyone to read. Then everyone can get smarter and everything will be better.

With the internet, this happened. The only thing that did not happen was that people went to this new library and got the most relevant books. At that point, we as information scientists stepped in and said: "You have to make sure only people who are well respected can write books in your library, and we can teach you how to recognize this and educate you." The issue was that it is not so easy to find out who counts as a credible source. It takes a lot of time to learn this, and simple heuristics do not work. In science, we use peer review to make sure we only publish what makes sense, but this also has its flaws.

After this, another phase started: the age of Wikipedia. The promises came true, and everyone was able to edit it (although you were more likely to do so if you were a privileged nerd). You had fewer gatekeepers, but cheap information at a very high level. This idea was so successful that social networks came up and drove it even further: now really everyone with an account was able to publish things, get followers and so on. At the same time, the incentives for keeping people on the platform rose in order to bombard them with ads. This had the advantage that people were able to make a living from YouTube, but it also meant people got highly rewarded for posting conspiracy theories and hate comments.

Another thing happened: information got so cheap that even if I do not find someone who shares information I like, I can just introduce my own source of information, and a "market" will make sure that the best source wins. The end of this was that the platforms that won were the ones best at selling our attention to ad companies. And the best way to get our attention seems to be lies (please stop calling them fake news).

In my opinion, the question of how to fix this is extremely relevant to information science. I do not think we can fix it with better automatic moderation or censorship; there is way too much information out there for that. We also saw that when Twitter blocked Donald Trump's account, he just started using the next platform.

I see two main barriers here: first, we do not want censorship by old gatekeepers, be it a nation-state, a company, or some weird guy at Wikipedia. Second, I also do not see this happening, since all of it costs a lot of money. If you have people checking, you have to pay them; if an algorithm does it, you have to take care of its errors; if volunteers do it, you create the same hierarchies of gatekeepers you wanted to avoid in the first place.

Therefore I am really curious what comes next. I also think this is very interesting to look at from a theoretical perspective, and maybe a starting point for creating approaches that actually work.


Fellowship Free Knowledge

I am happy to announce that I got accepted as a fellow for free knowledge and open science (Fellow-Programm Freies Wissen), which is sponsored by the Wikimedia Foundation. The program includes some money as well as mentoring and opportunities to network with other people who are enthusiastic about open science.

Due to corona, the event was held online. On the first day we had a nice presentation by Judith Simon about ethics in computer science, especially in artificial intelligence and machine learning. The second day was all about our projects: we talked to our mentors, and the people from Wikimedia gave us an overview of the program and of open science in general.

You can find my project on the project page. It will be about improving our project Schularchive (school archives). Our focus is threefold:

  1. We want to promote the platform more, to attract more users from the history-of-education research community as well as archivists at schools.
  2. We want to improve the platform based on user feedback.
  3. We want to dive deeper into the question of how data stored in our wiki can be connected with Wikidata and, ultimately, which data should live in our wiki and which in Wikidata.

If you want to connect or use the platform for your teaching, contact us via the platform or via our Twitter account: @Schularchive.

We also did a little networking, and other fellows recommended some interesting pages I want to share.


Virtual Open Sym 2020

Open Sym this year was held online. My colleagues from the University of Washington, Seattle and I submitted a paper about an evaluation of the ontology I developed during my PhD. The conference was organized differently than last time: there were no parallel sessions, and the presentations were rather short, about ten minutes. Each session consisted of two to three presentations of the respective papers. Through this format it was possible to see all presentations, and the discussions were quite nice.

The social event was held via Mozilla Hubs. I think this was a good idea, but it is not the same as meeting people in person. It was also not that easy to have a real private conversation in the virtual room, since everyone was able to hear what the others were talking about, although it got quieter the further away you were from people.

All in all, I think we should try to get back to face-to-face conferences soon. The discussions of the articles were rather interesting, but most of the time it is way more interesting to talk with people over a coffee or at a social event, and all of this was not possible. On the other hand, we might use corona to reconsider whether conferences need to be held at the most exotic places in order to attract people, or whether less exotic places would do and save travel costs and time.


Free as in freedom

This week Mozilla laid off one third of its workforce. This (sadly) supports the claim that tech is currently not really capable of maintaining non-profit structures for its main services. Free software works quite well for the backend, but not really for the frontend.

I just stumbled upon an interesting piece by a guy who used to work for Mozilla. He characterized Mozilla, with its focus on Firefox, as comparable to Bell Labs or Xerox PARC: large research labs funded by a basically endless stream of money coming in from other sources. If we think of infinite money nowadays, we think of GAFA (Google, Amazon, Facebook, Apple). They fund a lot of research and open source software, and actually develop a lot of open source software themselves. The issue is that they of course do not fund their competitors. And yet Google, the company developing Chrome, is the main funder of Mozilla!

So, what is going on? We do not see a lot of well-funded open source software that is distributed directly to end users. If you want to develop open source software, you had best develop a product that GAFA also needs, where it is cheaper for them to fund you than to develop it themselves.

Second, we, the users, are not used to paying for this kind of software. We have come a long way towards paying for entertainment on the internet, but we still refuse to pay for end-user software. Mozilla tried to get away from this, but failed. This is really bad considering that Firefox is the only real alternative browser, and the browser is the program most of us use most of the time on our computers. One of the few good examples of a foundation financing a website is Wikipedia, but if you look at the numbers, developing MediaWiki and hosting the servers is ten times cheaper than running a cutting-edge web browser on several platforms, which probably involves more complexity than an operating system.

So, what do we need? We need people to pay for and support free software. Free software means freedom, not free beer. We as users should think about this and support these projects, because in the end they are all we have against a monopolized web. Last week there was also an article about state funding for Mozilla. I think this might solve the problem in the short term, but not in the long term: if you have funding from a state, you still rely on one massive source of income. I think the way to go is to have more income streams and be less dependent on big donors; however, making users pay for Firefox is probably also a bad idea. It remains a tough issue, but we should make sure that Firefox survives.


Collaborative open analysis in a qualitative research environment

Together with some colleagues I recently published a paper about the use of a virtual research environment for teaching the qualitative method of objective hermeneutics. It is a follow-up to the paper SMW Based VRE for Addressing Multi-Layered Data Analysis my colleagues published in 2017, in which they presented the virtual research environment (VRE) and anticipated use cases. This time we evaluate the actual usage of the VRE, which we did using questionnaires for the students working with it. We see the main potential in guiding students through the research process as well as in making the research traceable, which also connects to principles of open science. The paper also discusses the pedagogical boundaries of this work, since students mentioned being more distracted when working from home than when meeting in person. The analysis was done pre-corona, so this might have changed by now.

I also think this research is quite interesting when considering that a lot of teaching is done online now. If you want to try out the VRE, please contact me.


User friendly

Robert Fabricant describes user interaction designers as "highly trained tinkerers, with a robust set of prototyping skills that make up for our lack of formal credentials. We find ways to identify user needs, rapidly develop and test solutions, and gather user feedback while relying on the principles found within this book."

The book he is talking about is called User Friendly, written by himself and Cliff Kuang. It is one of the first books about user-friendly design aimed at non-experts. In it, they describe the last 150 years of industrial design with an emphasis on the paradigm of user-friendliness and how it evolved. The book is split into two parts: the first is called Easy to Use, the other Easy to Want. Every chapter is named after one principle of user-friendly design, like error or trust.

This way of structuring made the book a little hard to read for me. You often get the same stories from a different angle, which is a little annoying. But I liked the general way the authors look at the problem. Coming from science, it was interesting to see practitioners looking at the topic.
I also liked the second part much more than the first, mainly because the first part talks about typical usability flaws, like the ridiculous ways in which nuclear power plants were designed, making it hard for the engineers working there not to blow everything up. The second part is about the design of products, not only interfaces, that put people first. I think these topics are more interesting in general; they are also what I deal with in my research.

I liked the book, especially the last part, very much. If you are new to user-friendly design, it gives a good overview of where the field comes from and why it is still important to create user-friendly products.


Sustainable Software

When we talk about free software, the point is often that it is more sustainable than proprietary software, because everyone can edit the code, and even if your company goes bankrupt, someone else can take over and keep coding. In reality, in many open source projects there is only one person doing most of the development work, and there is also the risk of abandonware: software that used to be maintained, but whose maintainer has moved on and now does other things. Still, no one else takes over the code, for several reasons such as bad documentation, complex code or lacking skills. So in the end the whole program is written from scratch in the next project (I see this quite a bit in science). Luckily there are institutions like the Software Sustainability Institute tackling some of these mostly technical issues, but I want to put more emphasis on the social issues.

So the question we actually have to ask is: how do we make software more sustainable? I see one crucial point that is true for software as well as for any other volunteer work (be it sports clubs or cultural/political groups): how easily do you attract people, and how easy is it to participate in your project? Often, only one person is working on a project; if this person stops, the whole project goes down. So what should happen? I think there are three levels, more social than technical, on which many projects could improve:

  1. Community building: it should be easy to join your project. Connect and network with others, and show that the atmosphere in which you do things is nice. Treat people who report bugs kindly, talk to people, and show that you are a person or group of people it is fun to work with. Remember, people often do this in their free time.
  2. Documentation: make it easy on the technical side to join your project. It should not be easier to rewrite the whole software than to work on the existing code. If you are a political group or similar, also document what you do, what you did and why you did it. If you write code, do the same. Also track decisions; you do not want other people to make the same mistakes again.
  3. Financing: yes, financing. How do you expect people to work on things when they still have to pay rent? It is therefore important to put your project on stable ground if you want to keep it running. This does not mean that you sell out or try to get rich by ripping off your users, but it does mean thinking about whether you want to spend a reasonable amount of your time on the project, or support someone else in spending a reasonable amount of their time on it, and about putting some money into this. In software this also touches on licenses (another boring topic, I know, but there is help for that too).

Summing up, I think we need to talk more about these things when developing free software. I also know that these are not the tasks most programmers are good at, but they might be skills to acquire in the future, or a reason to attract people who have these skills to our projects. I also want to show that if you are not a programmer, you can still do very important work in the background.

And even if you do not want to become active in open source software development, there are a lot of clubs, sports teams and political groups that will be happy to use your input and expertise.


Personal state of Linux

In this post, I want to give an overview of the Linux distributions I am using right now and why I use them.

For quite some time now, I have been using Xubuntu on every computer that is a little older or has limited resources. Xubuntu is fast, easy to install and uses little resources; in my opinion, it is the perfect OS for old computers. I switched from Lubuntu to Xubuntu since I liked XFCE more and it seemed to run more smoothly at that time (2014).

On my work machine, a desktop from 2012, I am currently using Ubuntu Budgie. This was due to some problems I had installing Manjaro and Fedora. Ubuntu Budgie runs smoothly; I sometimes have small graphics problems, but this might also be due to my old graphics card. I also like the interface, which is a good mixture between the more mobile-oriented GNOME and a more desktop-oriented approach. I am using the LTS version because I do not want to upgrade the complete OS very often. At one point I had switched from Ubuntu to Fedora because I wanted the latest software, but so far I have not missed any newer software on my Ubuntu machine.

On my notebooks, I am using Manjaro right now. I switched from Fedora because, first, I wanted to try out something new and, second, I liked the concept of a rolling release and wanted to try it out. After using it for a month, I really like it. You get a lot of updates, but they all work smoothly, so there is not much to worry about. One particular thing I like about Manjaro is the layout switcher for GNOME, which lets your desktop look a number of different ways without any configuration or installing another desktop environment.

In the end, you really need to think about how stable your system should be. If you want a very stable system you do not have to worry about (and maybe you are a beginner with Linux as well), use Ubuntu or one of its flavors, or maybe Debian, preferably an LTS version. If you want to try out new software, but can live with the occasional issue, better use Manjaro or Fedora.

On the software side, I get along fine with all the different distributions. Sure, the way packages are delivered differs, and so does the philosophy, but most of the larger distributions offer a lot of software, and you can do most things with any of them. Snap makes installing specialized or even closed-source software really easy, so you do not have to worry about your favorite programs no longer running when you change distributions.


Open Science Barcamp 2020

This year I again attended the Barcamp Open Science in Berlin. Due to corona, there were fewer people than last year, but the experience was still really cool. It is always nice to meet people and chat about open science. In all sessions there were pads where people could add their notes, and there are also interviews on Open Science Radio.

The day started with the ignition talk by Birgit Schmidt, who works at the Göttingen State and University Library; you can also get the slides. She summarized the current state of open access publishing, put it into the bigger picture, and connected the topic with issues around funding as well as open peer review.

I attended four sessions: one about the findability of research software, one about diamond open access and two about the digital humanities. It seemed to me that this year the barcamp was more focused on certain topics, either because there were fewer participants due to the beginning of the corona crisis or because the people attending were more focused on their topics.

The findability of research software is, in my opinion, a very interesting topic. For an information scientist, software is not findable just because it is on GitHub: there are no identifiers, no keywords, and often it is not clear whether the software is still maintained or works in a current environment. Therefore I can easily relate to the summary we wrote in the pad: research software is often not formally published at all (even though it is available, e.g. via GitHub), or it is published in specific software journals (which are not common in all disciplines). This is a problem on two levels:

  1. Existing software cannot be adequately found, and people work on the same issues without being able to build on pre-existing work.
  2. It is difficult to get proper credit for your research software and to link it to the existing reputation system (which is very much focused on formal publication in a journal).

Diamond open access was new to me. Basically it means that you try to keep the rights to the articles in your own hands and run the whole publishing process within the community, in order to get rid of the big publishers. The only infrastructure you then have to provide externally is a publication system. For this there is one system in particular: OJS (Open Journal Systems), which is free software and runs on a server. I really liked this approach because it tackles some problems that still exist with open access today, like publication fees and the fact that publishers take your intellectual property away from you. The downside, of course, is the cost of the infrastructure: I do not have a clear number, but some effort has to be put into hosting and providing the system, so you also need (public) money or great effort from within the community to run these systems. There are actually some projects, even at DIPF, doing this, and I think it will be interesting to see what happens to them in the future.

The sessions about the digital humanities were sometimes a little challenging. We started with several discussions about what the problems are when it comes to open research in the digital humanities, and we also had to acknowledge that other fields (especially the natural sciences) are ahead of the humanities. This led to interesting discussions, but also to the problem that most of the people attending the open science barcamp have a background in the natural sciences or engineering, where open science is far more established than in the humanities.

I think two things are actually happening in the digital humanities: first, a lot of people want to make their research more open (I can see this when I talk to people during my dissertation work). Second, we are also in the middle of the digitization of the field, so a lot of things are being tried out as well as researched. I would also argue it is not true that there is little open science going on in DH; just think about all the projects to digitize old writings or the corpora created in linguistics. We see a lot of these processes, and I think it is really interesting to be part of them now, to see what will and will not be possible in the future.

Summing up, it was a great event like last year, and thanks a lot to the organizers.