Argumentative Game Genie: Rhetoric in the digital humanities

I find that a central challenge of presenting digital humanities work is the need to speak across multiple languages: technical, humanistic, visual, algorithmic. To borrow Susan Brown’s phrase, DH work usually involves working in the gaps between disciplines, and these gaps come with all kinds of linguistic and communication difficulties. How do we communicate concepts that we know are important to humanistic inquiry when those concepts rely on analyses of algorithms, data visualizations, geographic information systems, and similarly abstruse technical concepts and tools? Most DH’ers are familiar with the furrowed brows of our colleagues when we explain that their field of study could be aided with some bizarre and opaque computational apparatus. “Machine learning!?!?!” they say in the drawn-out, pained tone usually reserved for doctors describing incurable diseases.

Given this need to speak across the gap between hermeneutic and technical forms of knowledge (setting aside the fact that the two are not really all that separate), communication becomes a critical tool in digital humanities work. Indeed, communicating across disciplines and discourses is no small feat. If we have enough trouble communicating with one another in our ‘home’ disciplines, one need not work hard to imagine the difficulty of trying to communicate in a language that is intellectually rigorous, technically savvy, and even the slightest bit compelling and engaging.

The excellent work of the Stanford Literary Lab has demonstrated the need to adapt scholarly forms of communication to suit digital humanities research. Their work is compelling both in its content — as it pushes DH work in new directions and poses timely questions that challenge the tenets of the ‘discipline’ — and in its form, using the ‘pamphlet’ as an appropriate genre for communicating this research. Not a peer-reviewed article in the traditional sense, the pamphlet enables the researchers to be more casual, more compelling, more polemic. All in all, a cooler form.

Yet in offering one of the first instances of an adapted form of scholarly research, the Lab’s pamphlets also provide grounds for critique and reconsideration of our methods of argumentation. As DH’ers we must be cautious about our rhetorical and argumentative uses of visual aids, algorithms, and technical language.

At times the struggle to speak in the gap between humanities and the digital results in our borrowing concepts or tools from other disciplines without fully explaining those concepts to our readers. This can result in those concepts becoming a kind of rhetorical black box where texts go in, ‘data’ comes out and arguments are built upon that data. Algorithms and software become a kind of argumentative Game Genie that hacks a regular humanities article, adding cool new features and exciting possibilities of interpretation. Are these methods sound or are they cheat codes that help researchers get past tricky logical problems or uncertainties in their reasoning?

For an example of this I turn to the rhetorical use of algorithms and DH tools in Stanford Lab’s recent publication “On Paragraphs. Scale, Themes, and Narrative Forms.” In this fascinating piece the authors propose that the paragraph remains a relatively unexamined terrain for analysis of structure, form and thematic content. Amongst other compelling analyses, they employ topic modelling methods to compare the general ‘topicality’ of paragraphs with non-paragraph chunks of writing. Based on this analysis they are able to assert that paragraphs are, by and large, more thematically organized than non-paragraphed selections of text. The authors assert that “Paragraphs allows us to ‘see’ themes, because themes fully ‘exist’ only at the scale of the paragraph” (21). By and large I find their arguments convincing and their conclusions sound.

Yet what specifically interests me here is the authors’ rhetorical use of software and algorithms in order to further their argument. The writers employ a kind of algorithmic black box that doesn’t allow for scrutiny; this makes it difficult to assess their findings. On page 8, for example, they discuss the differing levels of thematic focus between paragraphs, 82-word textual chunks, and 200-word textual chunks:

[Figure: excerpt from page 8 of the pamphlet comparing thematic concentration across paragraphs, 82-word chunks, and 200-word chunks]

The authors use “Gini’s index of wealth inequality” and “Herfindahl’s measure of market concentration” in order to assess how “the number of words in a given paragraph” are “distributed among … different topics present in the corpus.” Their argument presumes, at the very least, that these economic measures can be wrested from their original applications and transposed onto literary texts. Is this really true? Can words, as the kind-of currency of topics, really be imagined as the same kinds of distributable resources that Gini and Herfindahl’s measures describe? Is mapping paragraph topical space onto these measures a trivial task, or are there ontological questions at stake in the very construction of the algorithms themselves? Most readers won’t know the answer to these questions, and the writers offer no discussion of this crucial dimension of their argument. Instead these formulae appear to do some kind of work that advances their argument, but that work is inscrutable beyond its results.
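The pamphlet does not publish its code, so to make the two measures concrete here is a minimal sketch with invented numbers — not the authors’ method. It assumes a paragraph’s topic weights can be treated as “shares,” which is precisely the transposition in question:

```python
# Toy topic distribution for one paragraph: the share of its words
# assigned to each of five topics (invented numbers; shares sum to 1).
shares = [0.55, 0.20, 0.15, 0.07, 0.03]

def herfindahl(shares):
    """Sum of squared shares: 1/K for a uniform spread over K topics,
    1.0 when one topic holds everything. Higher = more concentrated."""
    return sum(s * s for s in shares)

def gini(shares):
    """Normalized mean absolute difference between shares: 0 for perfect
    equality, approaching 1 as a single topic dominates."""
    n = len(shares)
    mean = sum(shares) / n
    diff_sum = sum(abs(a - b) for a in shares for b in shares)
    return diff_sum / (2 * n * n * mean)

print(herfindahl(shares))  # ≈ 0.3708
print(gini(shares))        # ≈ 0.468
```

Note that in both cases a *higher* score means a more thematically concentrated paragraph — a detail the pamphlet leaves for the reader to work out.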

The wealth that Gini measures and the market share that Herfindahl describes can only belong to a single actor, yet this is not the case for a word or a paragraph. According to the logic of topic modelling, a word in a given document can belong to a number of topics — no word is in the ‘possession’ of a single topic in the same manner that we think of money, wealth, or market share. In this sense it is not at all clear whether these methods are indeed applicable to their new purpose.
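This fractional ‘ownership’ can be made concrete with a toy word–topic table (the words, topics, and probabilities are all invented for illustration). A single word’s probability mass is split across topics, so no topic possesses it the way an actor possesses wealth:

```python
# Hypothetical posterior topic assignments for one word across a corpus:
# each topic holds only a fraction of the word's occurrences.
word_topic_share = {
    "bank": {"finance": 0.6, "rivers": 0.3, "politics": 0.1},
}

shares = word_topic_share["bank"]
# Unlike wealth or market share, the "resource" is jointly held:
# every topic owns some fraction of the word.
assert abs(sum(shares.values()) - 1.0) < 1e-9
dominant = max(shares, key=shares.get)
print(dominant, shares[dominant])  # finance 0.6
```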

The authors also don’t explain their process of “Combining the two measures”; rather, the reader is left to assume that this process of combination is an apparently trivial matter. Is it so trivial? We don’t know, because the methods of combination are left unstated. As such, the philosophical appropriateness of employing Gini and Herfindahl’s methods to “semantic space” (a concept that, itself, requires further clarification) and the methodological combination of the two forms of measurement are both unclear.
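To see why the unstated combination matters, consider two equally plausible schemes — both invented here for illustration, neither endorsed by the pamphlet. On the same toy scores they rank the paragraphs differently, so ‘combining the two measures’ is not a neutral step:

```python
# Two toy paragraphs with invented (Gini, Herfindahl) score pairs.
paragraphs = {"p1": (0.8, 0.2), "p2": (0.4, 0.5)}

def combine_mean(g, h):
    # One plausible combination: arithmetic mean of the two scores.
    return (g + h) / 2

def combine_product(g, h):
    # Another plausible combination: product of the two scores.
    return g * h

top_by_mean = max(paragraphs, key=lambda p: combine_mean(*paragraphs[p]))
top_by_product = max(paragraphs, key=lambda p: combine_product(*paragraphs[p]))
print(top_by_mean, top_by_product)  # p1 p2 -- the schemes disagree
```

Which paragraph counts as ‘more thematic’ depends entirely on a methodological choice the pamphlet never states.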

After explaining their use of Gini and Herfindahl, the authors then assert that “as figure 3.3 shows, thematic concentration turned out to be indeed significantly higher in paragraphs than in segments of equivalent length.” Figure 3.3 is:

[Figure 3.3: box plot of Herfindahl scores for paragraphs, 82-word slices, and 200-word slices]

Figure 3.3 does a great deal of rhetorical work but is quite difficult to interpret. Indeed, a red flag that a given figure may be somewhat impenetrable is the use of scare quotes to describe components of the figure (“boxes” and “whiskers”).

The figure’s description identifies a “line bisecting the three ‘boxes’” which presumably refers to the lines between the light and dark grey components of the three rectangles (are these the aforementioned boxes?). This apparently marks the “median” value for each group — but the median value of what?

The scale beneath the three lines is marked “Herfindahl” and ranges from 0.045 to 0.09. Presumably we are meant to interpret this diagram as demonstrating a higher level of topical consistency amongst paragraph documents than amongst the 82-Word Slice and 200-Word Slice documents. Unfortunately the diagram provides no information about what the Herfindahl scores indicate or how they indicate topical consistency. Is a higher score better or worse? The quantities and the meaning of the scale are unstated and left uninterpreted for the reader.

Furthermore, the description of the figure mentions “two central qualities,” as well as “upper and lower quartiles” and “whiskers.” These terms are undefined, and it is left to the reader to work out what they mean and how we should interpret them in relation to the pamphlet’s broader argument.
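These terms do have standard statistical meanings, which the pamphlet could have stated in a few lines. A minimal sketch of the usual Tukey box-plot conventions, using invented Herfindahl scores (not the pamphlet’s data): the box spans the lower and upper quartiles, the line bisecting it is the median, and the whiskers conventionally extend to the most extreme data points within 1.5 times the interquartile range:

```python
import statistics

# Invented Herfindahl scores for a set of paragraphs.
scores = [0.046, 0.051, 0.055, 0.058, 0.060,
          0.063, 0.068, 0.074, 0.081, 0.090]

median = statistics.median(scores)               # line bisecting the box
q1, _, q3 = statistics.quantiles(scores, n=4)    # lower / upper quartiles
iqr = q3 - q1                                    # interquartile range

# Whiskers: most extreme data points within 1.5 * IQR of the box.
lower_whisker = min(s for s in scores if s >= q1 - 1.5 * iqr)
upper_whisker = max(s for s in scores if s <= q3 + 1.5 * iqr)
print(median, q1, q3, lower_whisker, upper_whisker)
```

None of this is exotic, which is exactly the point: a sentence of definition would have made the figure legible.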

In addition to this, there is no evidence in this diagram of the authors’ original statement that they combined Herfindahl and Gini measures in order to assess topical consistency — instead we only have a Herfindahl measurement. What happened to their use of Gini to assess topicality within a paragraph? Should we have more or less faith in the data communicated by this figure given that the Gini index has disappeared from the analysis?

I would also add, having engaged in topic modelling work myself, that the choice of 50 topics is somewhat arbitrary and demands further justification. How do their Herfindahl measurements change as the number of topics is increased and decreased? Changing the number of topics will transform the overall measure of ‘topicality’ of a given paragraph, so it would be useful to compare the 50-topic model to larger and smaller models.
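This sensitivity is easy to demonstrate on a degenerate case: a paragraph whose words are spread evenly over K topics has a Herfindahl score of exactly 1/K, so raw scores are only meaningful relative to the chosen model size. A toy sketch, not the authors’ pipeline:

```python
def herfindahl(shares):
    # Sum of squared topic shares; equals 1/K for a uniform
    # spread over K topics.
    return sum(s * s for s in shares)

# A maximally unfocused paragraph under models of different sizes:
for k in (10, 50, 100):
    uniform = [1.0 / k] * k
    print(k, herfindahl(uniform))  # ≈ 0.1, 0.02, 0.01
```

Under the pamphlet’s 50-topic model the uniform baseline is 0.02, so the reported 0.045–0.09 range sits above chance; under a different topic count, the same numbers would mean something different — which is why the 50-topic choice needs justification.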

This use of topic models raises a whole set of other questions (beyond the scope of this blog post) concerning whether topic models “infer the underlying structure” (Blei) of a given corpus, or whether they “assert” a model based on a particular interpretation of the text. In other words are the topics actually “there,” in the text, or are they just particular forms of reading the text? This question is, in my opinion, completely unsettled and Blei’s language of “infer” brings into sharp relief the controversial technical, hermeneutical and exegetical dimensions of topic modelling work.

The opacity of figure 3.3 contrasts with the relative clarity of the following figure (6.4 in the Pamphlet). Figure 6.4 attempts to visualize the measures of focus and discontinuity between paragraphs in texts from the labelled genres.

[Figure 6.4: visualization of focus and discontinuity across labelled genres]

In this figure the measurements used to determine discontinuity and focus are clearly explained, such that the reader can understand and assess the logic of the argument. This graphic functions well as a component of the argument while also being methodologically transparent.

I don’t offer this critique to undermine the work of the authors, nor to dispute their argument. Their reasoning appears sound, and their discussion of the paragraph as a semantically meaningful – and largely unexamined – space of interpretation is extremely important. And I could have found similar examples of the rhetorical use of algorithms or data visualizations elsewhere.

The point of my critique is to return us to thinking about the gap between what we might have once thought of as our ‘home’ disciplines and the new spaces of the digital humanities. Working in the gap means thinking critically about the methods of communication that are best suited to bringing together disparate discourses. How do we strike the right balance between technical information and argument? How do we ensure we provide the necessary technical background for our work such that it can be understood by our colleagues in various disciplines? Indeed, my own strange background in English and Computer Science has meant that my close reading was always kind of algorithmic and my coding work always feels like a form of exegesis.

In some of my teaching work in Engineering Communication we often refer students back to Northrop Frye’s statement that “you don’t learn to think wholly from one language: you learn to think better from linguistic conflict, from bouncing one language off another.” In the digital humanities we need to stress the value of that linguistic and representative conflict. We work in a space of conflict, both institutionally and in terms of our fields of knowledge, and I suggest we embrace the conflict between digital and humanities as a useful and generative force for understanding both.

In place of using algorithms and data visualization rhetorically or as an argumentative black box, we need to think of the computational and the visual as further terrain for productive conflict. I think we need to present our arguments with the conflicts between various concepts foregrounded. How well do our algorithms fit our models? In what ways do our algorithms obscure the very textuality of our objects of study? Do our algorithms detect a presence in a corpus or assert one? Do our data visualizations present evidence, or do they attempt to gloss over argumentative weakness with beautiful graphics? Each question suggests a productive conflict between disciplines that can only make our DH work more interesting.

As such, I think we need to think critically about our use of visualization and algorithm while also using our humanities training as the grounds for a kind of contrapuntal analysis of our work and our forms of communication. In doing so we will bring our traditions of close reading and critique to bear on the mechanistic and seemingly inscrutable dimensions of DH work. Against the seamless and sutured arguments that we strive for, we should value productive conflict as the basis for cooler interventions.