Data curation

Today I took part in a lab on data curation, held over Zoom as part of the course Distant Reading and led by Karl Berglund.

In the data curation lab the students will learn how to automatically manage and manipulate digital texts in different ways.

We will start from examples of Python code in Jupyter Notebook and, among other things, use the spaCy library to perform core tasks in Natural Language Processing, such as tokenization, lemmatisation, and part-of-speech tagging.
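To give a feel for what tokenization and lemmatisation mean in practice, here is a minimal sketch that uses only the Python standard library. It is not how spaCy works: spaCy relies on trained language models, whereas the regular-expression tokenizer and the tiny `LEMMAS` lookup table below are hypothetical stand-ins made up for illustration.

```python
import re

# Hypothetical lemma table, for illustration only. A real lemmatiser
# (such as the one in spaCy) uses trained models and far larger resources.
LEMMAS = {
    "was": "be",
    "were": "be",
    "is": "be",
    "reading": "read",
    "novels": "novel",
    "books": "book",
}


def tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def lemmatise(tokens):
    """Lower-case each token and replace it with its lemma if the table has one."""
    return [LEMMAS.get(token.lower(), token.lower()) for token in tokens]


tokens = tokenize("She was reading novels.")
lemmas = lemmatise(tokens)

print(tokens)  # ['She', 'was', 'reading', 'novels', '.']
print(lemmas)  # ['she', 'be', 'read', 'novel', '.']
```

Part-of-speech tagging is left out here on purpose, since it cannot be done sensibly without a trained model or a much larger dictionary.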

If you want to read more about the course Distant Reading, you can read this previous blog post. If you want to learn more about Karl Berglund and distant reading, you can read this blog post from 2019.

Distant Reading

Yesterday the last course of the semester started for the students of the Digital Humanities program. It is called Distant Reading and is led by researcher and librarian Karl Berglund.


At its core, the course introduces and discusses tools and methods for what is known as distant reading, i.e., computer-supported and quantitative analyses of digital text material. It does so by contextualising the term distant reading and theorising on how quantitative and statistical methods differ from more traditional humanistic approaches.

The primary point of departure is ready-made software; however, some basic scripting languages will also be introduced and used during the course to increase understanding of how computer-supported text processing works in practice.

During the coming months, more posts about the contents of this course will appear on this blog.

Karl Berglund: Reading From a Distance

Hi, I split my time between being a researcher in literature (currently in the project “From Close Reading to Distant Reading”) and a digital scholarship librarian, where I support researchers who want to deploy digital methods in their research.
My own research has from the start been focused on large-scale patterns and systematic studies of (Swedish) literature. My thesis quantitatively mapped the boom in contemporary Swedish crime fiction in the 21st century, covering publishing patterns, marketing, and literary content. Since my dissertation I have moved towards computational approaches to literary analysis.
This makes me an odd bird in my own discipline, where most people are engaged in close readings and qualitative studies of different kinds. But with the rapidly growing digitised (and born-digital) literary text collections, the methodological monoculture is slowly starting to be challenged. Digital methods make other kinds of patterns visible, new kinds of analysis possible, and different kinds of research questions relevant to pose.
My course in the program is dedicated to exactly this: distant reading, to use the influential term coined by Franco Moretti. I will try to show you and critically discuss – both theoretically/conceptually and methodically/practically – how one can engage in computational literary analysis (and also text mining within the humanities more broadly). The course starts from both ready-made software and basic programming, and will cover topics such as pre-processing, concordances, collocations, and topic modeling. The ambition is to provide you with some quite hands-on skills and tools for further explorations in this vibrant area of the humanities.
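To make two of these topics a little more concrete, here is a minimal sketch of a concordance (keyword in context) and a simple collocate count in plain Python. It is an illustration under stated assumptions, not the course material itself: the function names, the sample sentence, and the window size of two tokens are all made up for this example, and the pre-processing is reduced to lower-casing and stripping punctuation.

```python
import re
from collections import Counter


def tokenize(text):
    """Very basic pre-processing: lower-case the text and keep only word tokens."""
    return re.findall(r"\w+", text.lower())


def concordance(tokens, keyword, width=2):
    """Return (left context, keyword, right context) for each occurrence of keyword."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


def collocates(tokens, keyword, window=2):
    """Count the words that occur within `window` tokens of keyword."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == keyword:
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts


# Made-up sample text, for illustration only.
text = "The detective found the body. The detective called the police."
tokens = tokenize(text)

for left, word, right in concordance(tokens, "detective"):
    print(f"{left:>10} [{word}] {right}")

print(collocates(tokens, "detective").most_common(3))
```

Raw counts like these mostly surface very common words such as "the", which is one reason collocation analysis usually combines them with stop-word filtering or a statistical association measure.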
Karl Berglund
Researcher at the Department of Literature
Digital Scholarship Librarian at the University Library