Creating digital platforms to explore computational approaches in the humanities (and beyond)
With quantitative approaches ever more present in the humanities and the social sciences, it is crucial that today’s students in these areas be introduced to a range of concepts and tools from data science and computational modelling. On the one hand, data science requires, among other things, an understanding of the various types of data, of possible sampling procedures, and of pitfalls during data analysis (such as those surrounding correlation and causation). On the other hand, computational modelling rests on a specific epistemology, based on the construction of an abstraction of a real situation. This often involves exploring a parameter space and studying the relationships between parameter values and simulation outputs, with phenomena such as self-organization, emergence, or phase transitions.
These are usually little-charted territories for students in the arts or the social sciences, who may at first worry about their ability to cope. It has been shown, nevertheless, that hands-on approaches can facilitate learning: students will, for instance, more easily understand the impact of varying parameter values if they can experiment freely with different configurations, observe various outputs, and draw similarities and contrasts by themselves. Likewise, analyzing real, noisy, ‘imperfect’ data opens the door to a form of situated learning more effective than abstract elaborations.
While some excellent resources are now available for discovering data science (e.g., Jamovi, JASP, Tableau, kaggle.com), there are so many fields and options when it comes to computational modelling that it can make sense for a teacher to develop their own tailored learning tools. Such tools can provide the precious hands-on experience mentioned above and target the right level of thinking. They can, however, create technical difficulties and delays when they require students to deal with code, install and run programs, etc. Having faced some of these challenges myself, I made it the objective of my TDG to develop tailored online digital tools readily accessible to my students in linguistics.
Together with two research assistants, I developed a dozen simple online interfaces. Some offer a range of computational models, for instance to study the spread of language innovations in social networks or the emergence of a shared lexicon among artificial speakers through simple linguistic interactions. The others let users experiment with methods from natural language processing. In particular, the most developed interface offers students a range of text corpora – from Jules Verne’s novels to movie plot summaries and BBC news articles – and tools to conduct sentiment analysis, build collocation networks, extract topics, etc.
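To give a flavour of the second kind of model, here is a minimal sketch of a naming game, one standard way of modelling how a shared lexicon can emerge from simple pairwise interactions. The function name and parameter values are my own illustration, not the code of the interfaces actually developed:

```python
import random

def naming_game(n_agents=10, max_steps=10000, seed=42):
    """Minimal naming game sketch (illustrative, not the actual tool).

    Agents start with empty vocabularies. At each step a random speaker
    utters a word (inventing one if needed); on success both speaker and
    hearer collapse their vocabularies to that word, on failure the
    hearer learns it. Returns (steps used, shared word or None).
    """
    rng = random.Random(seed)
    vocab = [set() for _ in range(n_agents)]
    invented = 0
    for step in range(max_steps):
        speaker, hearer = rng.sample(range(n_agents), 2)
        if not vocab[speaker]:                 # invent a new word if needed
            invented += 1
            vocab[speaker].add(f"w{invented}")
        word = rng.choice(sorted(vocab[speaker]))
        if word in vocab[hearer]:              # success: both keep only that word
            vocab[speaker] = {word}
            vocab[hearer] = {word}
        else:                                  # failure: hearer learns the word
            vocab[hearer].add(word)
        if all(len(v) == 1 and v == vocab[0] for v in vocab):
            return step + 1, next(iter(vocab[0]))
    return max_steps, None
```

Even a sketch this small lets students see self-organization at work: no agent is in charge, yet the population converges on a single word.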
I found it important, and not always easy, to communicate clearly with the research assistants about the teaching and learning context that was essential for the software being developed – such context can be downplayed by someone coming from a pure computer science background and focusing on technical issues. I had many suggestions for design, yet working with someone more familiar with UX (user experience) would likely have enhanced the final products.
The impact on my teaching was positive, as suggested by students’ comments and attitude in class. The flipped-classroom format likely played a role, as it freed class time for me to work with the students and provide assistance. I found in particular that assessments integrating some of these tools could be really rich and creative: I was impressed by some students’ insights and finesse when discussing the outputs of NLP tools applied to Bruce Springsteen’s songs or to summaries of horror movies from different parts of the world.
- First, the software can always be improved by taking students’ experience into account. For instance, a first version of a model of a phenomenon like linguistic diffusion may offer too many parameters to characterize the situation, leaving students a bit confused, with too much to explore (see Figure 1, with the parameters in the left panel). One can then build a second version that focuses on some of the parameters only, setting fixed values for the others, to reduce the overall complexity of the simulation. When a range of options is offered, e.g., text corpora to be analyzed, students may also show strong preferences or request other sources; this, too, can be accounted for in later versions.
- Second, tools which maximize agency and offer more freedom to students enhance interest and motivation. As an illustration, offering 10 varied corpora for students to apply NLP tools to is good, but letting students choose the text(s) they want to study among the 60,000 works made available online by Project Gutenberg is better – and makes grading a more surprising and interesting task! I am therefore now developing another tool to easily select and pre-process documents from gutenberg.org before analysis. Another option is to create tools which allow students to collect their own data and analyze them. A first attempt, centered on semantic variability across various groups of speakers in Hong Kong, led students to choose very diverse fields, from city neighborhoods and ball sports to ice-cream flavours and local festivals (see Figures 2 and 3).
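One recurring pre-processing step for such a Gutenberg tool is stripping the licence boilerplate: plain-text files from Project Gutenberg typically wrap the work itself between `*** START OF ... ***` and `*** END OF ... ***` marker lines, whose exact wording varies across files. A minimal sketch under that assumption (not the actual tool’s code, which would need a few more marker patterns):

```python
def strip_gutenberg_boilerplate(text: str) -> str:
    """Return only the body of a Project Gutenberg plain-text file.

    Assumes the common '*** START OF ...' / '*** END OF ...' markers;
    real files vary slightly, so a robust tool would match more variants.
    """
    lines = text.splitlines()
    start, end = 0, len(lines)
    for i, line in enumerate(lines):
        if line.startswith("*** START OF"):
            start = i + 1          # body begins after the START marker
        elif line.startswith("*** END OF"):
            end = i                # body ends before the END marker
            break
    return "\n".join(lines[start:end]).strip()
```

With the boilerplate removed, the remaining text can go straight into the NLP tools described above (sentiment analysis, collocation networks, topic extraction).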