Categories
General Science

Reflections on my PhD

In this post I want to write down what I learned during my PhD and what might help you too. It is only my opinion, but I used all of the techniques I describe here and they helped me. I split this up into four parts: before you start, finding a supervisor, writing, and finishing.

Why start a PhD?

When I thought about starting a PhD, my motivation was that I liked scientific work. I had done internships and worked as a student in companies, but I wanted to dive deeper into research and learn more about the things I had studied at university.

I also had in mind that a PhD is normally not a bad thing for your career. However, in a well-paid industry job you might earn more money than during your PhD, at least at the beginning. And if you really only do it for the title, the time you spend on it might be really hard for you.

All this depends a lot on the field. Take a look at people you know in the field who achieved things you want to achieve and see what they did to get there. For me it made sense, since I was interested in gaining deeper knowledge about the field and could imagine becoming a professor.

Finding a supervisor

In Germany the relationship to your supervisor is really close; in German you even call them your doctoral mother or doctoral father. But this also shows one issue you should be aware of: your PhD depends a lot on these people. So choose them wisely!

So, how do you find a supervisor? In my case, I knew some of my supervisors before; in fact, I wrote my master's thesis in their department. My third supervisor I did not know before. Since the relationship to these people is really strong and you depend on them, I would encourage you to choose supervisors you get along with. Three (or four, five, six) years can be a hard time if you hate your supervisor or vice versa. Also, it might make sense not to join the most prestigious university if you get better support elsewhere (although you need some balance here, since your degree might help you find a better job afterwards).

Another thing you should keep in mind is that your supervisor does not know everything either. I did my PhD in the very interdisciplinary field of open science, information science, ontology engineering and human-centered design, so I constantly needed to get feedback from other people, and I was very lucky to meet a lot of helpful people. I was also glad to attend some workshops and courses about scientific methods like interviews, which I did not know much about before. I also got really valuable feedback from the folks at the University of Washington as well as from unexpected places like the peer review of some of my articles. Conferences with PhD sessions are also a great place for this, although I did not attend many of these. I also used the PhD programme at the University of Frankfurt a lot, since it provides many helpful workshops about scientific methods, project management and other topics you might need for your dissertation.

Writing

Write drafts

Since I had several supervisors who do not all work at the same institution, I had to do a lot of communication between them. What helped me here was to write a summary of the next steps I would take in my process and send it to them before the meetings. Of course this takes some effort, but it allows your supervisors to know what you are doing, and your agreements are more precise if you send them some text before the meeting than if you just present your content in the meeting. I tried only presenting at first, but it did not work out. The other great advantage is that once you have written down your next steps, you are already done with some parts of your thesis!

Start early

Based on this, I also started writing up my analysis as soon as I did it. Since I used a multi-level design process, this was crucial anyway to get to the next step. The disadvantage was that I had to do a lot of reworking at the end.

Start with something easy

You won't write your thesis from the beginning to the end. Therefore, start with an easy part and go from there. My order was (with a lot of going back and forth): method, literature, results, discussion, introduction, conclusion. You might start with some other part, but this is your work, so make it comfortable for you.

Make a plan of each chapter

What also helped me a lot was to make a plan of what I wanted to say in each chapter. This helps you to keep a common thread running through your work. The plan can change during the process, but most of it will stay the same and help you to navigate through the work. Keep in mind that your thesis will most likely be longer than 150 pages, so it is impossible (at least it was impossible for me) to keep an easy overview. The plan can also help you to keep track of your progress and to find chapters that need more of your attention.

Software

I used LaTeX to write my thesis, because I had already used it for my master's thesis. The advantages are that you do not have to worry about a document crashing and becoming unreadable, backups are very easy, LaTeX takes care of your citations and the final product looks quite nice. A disadvantage is that it does not provide a nice track changes function like Word. My proofreaders wanted to work in Word, so I had to copy their changes back from a Word document to my LaTeX files, which took a lot of time. But you also have to keep in mind that even if your proofreaders use track changes, you have to check the whole document again, which might be very time-consuming.

Finishing

Finishing might be the hardest part. You are really stressed and see that things do not make as much sense as you thought at the beginning. There are also a lot of distractions out there and your funding might run out. My last phase of the PhD was not normal, since it fell into the Covid-19 pandemic.

Stop other stuff

The good thing in science is that many people you work with went through a PhD themselves. So they will know that you are really busy in your finishing phase and hopefully leave you alone during that time. I was really shocked when I realized how much more productive I was when I did not have any meetings on a given day. One good way is to move all your meetings to one day: you won't be very productive that day, but you will be on all the other days.

Get a routine

As soon as you have managed to free yourself from other day-to-day work, you might be tempted to enjoy your free time. Of course, this makes finishing harder. During Covid-19, I learned to be way more organized, so this was relatively easy for me. Also, a lot of distractions were not possible, so I focused on the thesis. It also helps to have a plan and some milestones, so you keep track of what you have finished and what is left to do.

Schedule breaks

If you have created a plan, do not be too hard on yourself. You need breaks. For a certain time (this might be shorter than you think) you can work day and night without breaks. But if you are sick afterwards, your productivity drops. My approach therefore was to have a strict plan from the beginning that I could relax later, with some time at the end for unexpected events. So I had a relaxed time finishing my first draft. However, I was really stressed four weeks before, trying to get the first draft done so that some friends could read over it.


The end of the Information Age?

Last week there was a really interesting article at Heise online, a German tech magazine. It calls for the end of the modern internet, replacing it with a postmodern internet. It also tackles the value of information and its decline in a postmodern age.

Traditionally, we talk a lot in information science about the governance of information. The idea behind this is, in my opinion, modelled on a library: in a library, the goal is to have as much information (books) available as possible in order to give it to people. They read the books and become smarter. At the same time, you know that books are still kind of expensive, so your ideal is that all books should be free for everyone to read. Then everyone can get smarter and everything will be better.

With the internet, this happened. The only thing that did not happen was that people went to this new library and got the most relevant books. At that point, we as information scientists stepped in and said: "You have to make sure only people who are well-respected can write books in your library, and we can teach you how to recognize them and educate you." The issue here was that it was not so easy to find out who such a credible source is. It takes a lot of time to learn this and simple heuristics do not work. In science, we use peer review to make sure we only publish what makes sense, but this also has some flaws.

After this, another phase started: the age of Wikipedia. The promises came true, everyone was able to edit it (although you were more likely to do so if you were a privileged nerd). You had fewer gatekeepers, but cheap information at a very high level. This idea was so successful that social networks came up, driving it even further: now really everyone with an account was able to publish stuff, get followers and so on. At the same time, the incentives for keeping people on the platform rose in order to bombard them with ads. This had the advantage that people were able to make a living from YouTube, but it also meant that people were highly rewarded for posting conspiracy theories and hate comments.

Another thing happened: information got so cheap that if I do not find someone who shares information I like, I can just introduce my own source of information, and a "market" will make sure that the best source wins. The end result was that the platforms that were best at selling our attention to ad companies won. And the best way to get our attention seems to be lies (please stop calling it fake news).

In my opinion, the question of how to fix this is extremely relevant to information science. I do not think we can fix this with better automatic moderation or censoring. There is way too much information out there to do this. We also saw that when Twitter blocked the account of Donald Trump, he simply started using another platform.

I see two main barriers here: first, we do not want censorship by old gatekeepers, be it a nation-state, a company or some kind of weird guy at Wikipedia. Second, I do not see this happening, since all of it costs a lot of money. If you have people checking content, you have to pay them; if an algorithm does it, you have to take care of the errors; if volunteers do it, you create the same hierarchies of gatekeepers you wanted to avoid in the first place.

Therefore I am really curious what comes next. I also think this is very interesting to look at from a theoretical perspective, and maybe to go further and create approaches that work.


Fellowship Free Knowledge

I am happy to announce that I was accepted as a fellow for free knowledge and open science (Fellow-Programm Freies Wissen), which is sponsored by Wikimedia Foundation. The program includes some money as well as mentoring and opportunities to network with other people who are enthusiastic about open science.

Due to corona, the event was online. On the first day we had a nice presentation by Judith Simon about ethics in computer science, especially in artificial intelligence and machine learning. The second day was all about our projects. We talked to our mentors, and the people from Wikimedia gave us an overview of the program and of open science in general.

You can find my project at the project page. It will be about improving our project Schularchive (school archives). Our focus is threefold:

  1. We want to promote the platform more in order to attract more users from the history of education research community as well as archivists at schools.
  2. We want to improve the platform using feedback from users.
  3. We want to dive deeper into the question of how data stored in our wiki can be connected with Wikidata and, ultimately, which data should be in our wiki and which data should be in Wikidata.

If you want to connect or use the platform for your teaching, contact us via the platform or via our Twitter account: @Schularchive.

We also did a little networking and other fellows recommended interesting pages I want to share:


Virtual Open Sym 2020

Open Sym this year was held online. My colleagues from the University of Washington, Seattle and I submitted a paper about an evaluation of the ontology I developed during my PhD. The conference was organized differently from last time. There were no parallel sessions and the presentations were rather short, about ten minutes. Each session consisted of two to three presentations of the respective papers. With this format, it was possible to attend all presentations, and the discussions were quite nice.

The social event was held via Mozilla Hubs. I think this was a good idea, but it is not the same as meeting people in person. It was also not that easy to have a really private talk within the virtual room, since everyone was able to hear what the others were talking about, although it got quieter the further away you were from the people.

All in all, I think we should soon try to get back to face-to-face conferences. The discussions of the articles were rather interesting, but most of the time it is way more interesting to talk with people over a coffee or at a social event. None of this was possible. On the other hand, we might use corona to think about whether it is necessary to hold conferences at the most exotic places to attract people, or whether we should rather hold them at less exotic places and save travel costs and time.


Collaborative open analysis in a qualitative research environment

Together with some colleagues I recently published a paper about the use of a virtual research environment for teaching the qualitative method objective hermeneutics. It is a follow-up to the paper SMW Based VRE for Addressing Multi-Layered Data Analysis, which my colleagues published in 2017 and in which they presented the virtual research environment (VRE) and anticipated use cases. This time we evaluate the usage of the VRE. We did this using questionnaires for the students working with the VRE. We see the main potential in the guidance of students through the research process as well as in the tracing of the research, which also connects to principles of open science. The paper also discusses the pedagogical boundaries of this work, since students mentioned being more distracted while working from home than when meeting in person. The analysis was done pre-corona, so this might have changed now.

I also think this research is quite interesting when considering that a lot of teaching is done online now. If you want to try out the VRE, please contact me.


User friendly

Robert Fabricant describes user interaction designers as highly trained tinkerers, with a robust set of prototyping skills that make up for our lack of formal credentials. We find ways to identify user needs, rapidly develop and test solutions, and gather user feedback while relying on the principles found within this book.

The book he talks about is called User Friendly and is written by himself and Cliff Kuang. It is one of the first books about user-friendly design aimed at non-experts. In the book they describe the last 150 years of industrial design with an emphasis on the paradigm of user-friendliness and how it evolved. The book is split into two parts: the first is called Easy to Use, the other Easy to Want. Every chapter is named after one principle of user-friendly design, like error or trust.

This way of structuring made it a little hard to read for me. You often get the same stories from a different angle, which is a little annoying. But I liked the general way the authors look at the problem. Coming from science, it was interesting to see practitioners looking at the topic.

I also liked the second part way more than the first, mainly because the first part talks about typical usability flaws, like the ridiculous ways in which nuclear power plants were designed, which made it hard for the engineers working there not to blow everything up. The second part is about the design of products, not only interfaces, that put people first. I think these topics are more interesting in general, and they are also what I am dealing with in my research.

I liked the book very much, especially the last part. If you are new to user-friendly design, it gives a good overview of where the field comes from and also why it is still important to create user-friendly products.


Open Science Barcamp 2020

This year, I again attended the Open Science Barcamp in Berlin. Due to corona, there were fewer people than last year, but the experience was still really cool. It is always nice to meet people and chat about open science. In all sessions there were pads where people could add their notes. There are also interviews on Open Science Radio.

The day started with the ignition talk by Birgit Schmidt, who works at the State and University Library of the University of Göttingen; you can also get the slides. She summarized the current state of open access publishing in science, with an emphasis on putting it into the bigger picture, and connected the topic with issues of funding as well as open peer review.

I attended four sessions: one about findability of research software, one about diamond open access and two about digital humanities. It seemed to me that this year the barcamp was more focused on certain topics, which was either because there were fewer participants due to the beginning of the corona crisis or because the people attending were more focused on their topics.

Findability of research software is in my opinion a very interesting topic. For an information scientist, software is not findable just because it is on GitHub. On GitHub there are no identifiers, no keywords, and often it is also not clear whether the software is still maintained or works in a current environment. Therefore I can easily relate to the summary we found in the pad: Research software is often not formally published at all (even though it is available, e.g. via GitHub), or published in specific Software journals (which are not common in all disciplines). This is a problem on two levels:

  1. Existing software cannot be adequately found and people work on the same issues without being able to build on pre-existing work.
  2. It is difficult to get proper credit for your research software and link it to the existing reputation system (that is very much focussed on reputation by formal publication in a journal).

Diamond open access was new to me. Basically it means that you try to keep the licenses of the articles in your own hands and try to run the whole publishing process within the community in order to get rid of big journals. So the only infrastructure you have to provide is a publication system. For this, there is one system in particular: OJS (Open Journal Systems), which is free software and runs on a server. I really liked this approach because it tackles some problems that still exist with open access nowadays, like publication fees and the fact that publishers take your intellectual property away from you. The downside, of course, is the cost of the infrastructure: I do not have a clear number, but some effort needs to be put into hosting and providing the system, so you also need (public) money or great efforts from within the community in order to run these systems. There are actually some projects doing this, even at DIPF, and I think it will be interesting to see what happens to them in the future.

The workshops about digital humanities were sometimes a little bit challenging. We started with several discussions about what the problems might be when it comes to open research in digital humanities, and we also have to acknowledge that other fields (especially in the natural sciences) are ahead of the humanities. This led to interesting discussions in the workshops, along with the persistent problem that most of the people attending the Open Science Barcamp have a background in the natural sciences or engineering, where open science is way more established than in the humanities.

I think in digital humanities there are actually two things happening: first, a lot of people want to make their research more open (I can see this when I talk to people for my dissertation). Second, we are also in the middle of the digitization of the field, so a lot of things are being tried out as well as researched. I would also argue it is not true that there is not much open science going on in DH. Just think about all the projects to digitize old writings or the corpora created in linguistics. We see a lot of these processes, and I think it is really interesting to be part of them now and to see what will and will not be possible in the future.

Summing up, it was a great event like last year, and thanks a lot to the organizers.


DHD 2019

Last week DHD 2019, the German digital humanities conference, took place in Frankfurt.

One remarkable discussion I heard was from a panel about 3D-modelling and the reconstruction of buildings. People in this panel were talking about the problems their field has and one was the lack of standards. As we all can imagine it is really hard to reconstruct old objects and buildings.

Some buildings have been rebuilt or destroyed, or have never been built at all. This creates many uncertainties when it comes to questions like: How did the building originally look? Was it built as the architect intended it to be built? Looking at these questions, we see the importance of standards. With a standard, one could see exactly what another researcher wanted to show with their modelling, and exchanging and sharing work would also be easier.

Another interesting talk was the keynote by Jana Diesner. She talked about her research in computational social science at the University of Illinois. She first urged better collaboration between computational social sciences and digital humanities. I also think this is really important, and there are certain fields that are quite close. Actually, I think that some of my research maybe falls more into the field of computational social sciences than digital humanities, because my institute is still focused on social sciences and the experts in my field are also doing qualitative social science research. The other thing I found remarkable was her stories about the iSchool she is working at. In the US, there are many iSchools now. The concept (as I understood it) is to bring together researchers from different fields like social sciences, information science, computer science and psychology to do research about information in the broader sense. This can be a very fruitful combination because it also brings together new methods and ideas, which always helps to open our minds to difficult questions.

I did not attend much of the conference because it was right next to my office and I had other stuff to do, but a really cool thing was the poster slam and the poster session itself. It is just nice to look at posters and be able to discuss research directly with the researchers in a private way, and it is also a nicer form of communication than just journal articles and presentations.


FAIR Software

Some of you might have heard about the FAIR principles for data. Since the paper was published in 2016, they have become state of the art in data sharing. But data is not all that is needed to make research more transparent. Software is another very important part.

Tackling this topic, the German National Library of Science and Technology hosted a workshop on making software more FAIR as well. There have been various posts about it, and you can also see the complete sessions and the exercises online.

I actually liked the workshop a lot and it is worth having a look at the sessions. It also showed that there are still certain boundaries. For instance, there are no real repositories for scientific software with a search interface that can be narrowed down by scientific criteria. I also know that people are working on knowledge graphs, but right now there is often no good way to link data, software and published results. I liked the approach of Zenodo, which provides an easy way to reference software and get a DOI for it, but there is not much metadata available about the software.

The workshop involved a lot of hands-on sessions. The overall principle was based on The Carpentries, especially Library Carpentry, a workshop format that is completely open, so everyone can work with it and use it for their own workshops.

I learned a lot and thanks very much to the organizers.