Can we imagine a Statcheck for the arts and humanities?

Wednesday, February 08, 2017

Here's a wondering for a Wednesday. Can we imagine having software tools in the arts and humanities that do some of the dirty work of fact- and data-checking ahead of peer review?

The inspiration for this comes from the stir that has been created recently in the sciences - especially experimental psychology - by a tool called Statcheck. Experimental psychology often depends upon applying p-value assessments to data, to determine whether findings are statistically significant or could plausibly be the product of chance variation (background noise). Statcheck is a program devised at Tilburg University, which automatically scanned tens of thousands of published papers, recalculated the p-values reported in them from the accompanying test statistics (some 250,000 p-values in all), and checked whether the researchers had made errors in their original calculations.
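
Statcheck's core move is simple to state: read a test statistic as reported in APA style (with, for most tests, its degrees of freedom), recompute the p-value from it, and compare the result with the p-value the authors actually printed. Statcheck itself is an R package that handles several kinds of test. What follows is only a minimal Python sketch of the same idea under a simplifying assumption: it recognises nothing but z-tests written like `z = 2.10, p = .036`, because their two-tailed p-value can be computed with the standard library alone. The pattern `Z_REPORT`, the function names and the rounding rule are hypothetical and much cruder than a real tool would need.

```python
import math
import re

# Hypothetical, simplified pattern for APA-style z-test reports such as "z = 2.10, p = .036".
Z_REPORT = re.compile(
    r"\bz\s*=\s*(?P<z>-?\d*\.?\d+)\s*,\s*p\s*(?P<rel>[<>=])\s*(?P<p>\d*\.\d+)"
)


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value for a standard normal statistic: P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2))


def check_z_reports(text: str) -> list[tuple[str, float, bool]]:
    """Recompute each reported p-value and say whether it is consistent with z."""
    results = []
    for m in Z_REPORT.finditer(text):
        computed = two_tailed_p(float(m.group("z")))
        reported = float(m.group("p"))
        rel = m.group("rel")
        if rel == "=":
            # Consistent if the recomputed p rounds to the printed value
            # at the number of decimals the authors used.
            decimals = len(m.group("p").split(".")[1])
            consistent = round(computed, decimals) == reported
        elif rel == "<":
            consistent = computed < reported
        else:  # rel == ">"
            consistent = computed > reported
        results.append((m.group(0), computed, consistent))
    return results
```

For example, `check_z_reports("z = 1.50, p = .03")` would flag the report as inconsistent, since a z of 1.50 corresponds to a two-tailed p of about .13. A report like that is not proof of a wrong finding, only a cue for a reviewer to look more closely.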

The finding was that around half of the papers checked contained at least one calculation error. That's not to say that half of all published papers were fundamentally wrong, such that their findings have to be thrown out of the window entirely. Nevertheless, it does highlight significant deficiencies in the peer review and editorial process, where such errors should be picked up. And while one miscalculation in a series may not in itself be significant, several miscalculations might raise suspicion about the credibility of the findings more generally. Miscalculation also offers a glimpse into the mindset of the paper's author(s) and the processes that went into its production: were the calculations produced by one author alone, or by two authors working independently to cross-check? Were they done with statistical software or by hand? And, most seriously, do miscalculations point to attempts to manipulate data to support a preconceived outcome?

In a time-pressured academic world, peer reviewers often take shortcuts. One of the many reasons peer review is flawed as a gate-keeping mechanism for excellence is that, even though reviews are technically blind, reviewers often rely on an implicit sense of the unknown author's overall trustworthiness rather than scrutinising every feature of the individual article in detail. Beyond exposing problems with the articles themselves, this is what Statcheck may reveal about peer review. In the arts and humanities, peer review should ideally be based on an assessment of the clarity and reliability with which an author advances his or her claims, rather than on whether we agree with the claims themselves. To make an analogy with philosophical logic, we're looking for validity, not soundness. One of the basic functions of peer review is to judge whether the author's argument rests on legitimate reasoning, even if its conclusion is not one with which we concur. In assessing this, deficiencies in basic details may point to deeper structural or logical flaws in the author's thought processes.

The existence of Statcheck got me thinking about whether in the arts and humanities, and English in particular, our published papers depend upon basic mechanisms similar to the p-value test and, if they do, whether the author's accuracy in using those mechanisms could be checked automatically as a prelude to peer review. Of course, even in the age of the digital humanities, the arts and humanities still don't tend to deal in statistical data but rather in 'soft' rhetoric and argumentation. Still, are there any rough equivalents? And if so, could we envisage software capable of running papers through pre-publication tests (just as Statcheck now does) to get a general sense of the care authors have paid to the 'data' on which their argument depends, which might then cue peer reviewers or editors to pay closer attention to some of the deeper assumptions and the article's overall credibility?

Here are some very hypothetical, testing-the-waters suggestions about the sorts of quantifiable signals it might be useful to pick up programmatically (all of which we would like to think peer reviewers would notice anyway - but the lesson of Statcheck in experimental psychology suggests otherwise):

Quotation forms the bedrock of argumentation in the arts and humanities. As I constantly tell my students, if you have not quoted a primary or secondary text with absolute precision, how am I supposed to trust your arguments that depend upon that quotation? If someone is trying to persuade me about their reading of the sprung meter of a Gerard Manley Hopkins poem, but they have mistyped a key word in such a way that the meter is 'broken' in the quotation, this hardly looks good. A software tool that automatically checks the accuracy of quotations within papers and highlights errors would in many ways be an inversion of plagiarism-detection software: a plagiarism checker flags a match with an unacknowledged source as a problem, whereas here we would actively be looking for a match between the quotation and its acknowledged source, and flagging its absence.
Similarly, the spelling of the titles of texts and of authors' names.
Referencing and citation are clearly important, and checking whether references - even or especially in a first draft - have been accurately compiled may highlight flaws in the author's record keeping.
Historical dates may provide another clue as to the author's own processes for writing and his or her rigour in self-verification. In presenting a date in a paper, we may often be making a case for literary lineage, tradition, or the links between a text and its contexts. It matters that we get dates precisely right. An author who has not double-checked every date (for example, because they think they know it off the top of their head) has missed a key step in the process. Erroneous dates may be a clue to problems in arguments that depend upon historical contingency.
If we're looking at novels in particular, there are key markers of place and character, and of the relations between them, which need to be rendered precisely. To describe Isabella Linton as the mother of Cathy Linton in Wuthering Heights, or to write Thrushcross Grange when meaning the Heights, might be easy mistakes. But these may also be symptomatic of an issue with the author's close (re)reading of the text. It should in principle be possible to apply computational stylistics to verify that an author really means the character or place they name in the context of their writing.
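
The quotation check in the first of these suggestions could be prototyped with nothing more than the Python standard library. The sketch below assumes that the paper and the primary text are both available as plain strings. It pulls out double-quoted passages from the paper, slides a window of the same number of words across the source, and scores each candidate passage with `difflib.SequenceMatcher`. The function names and the 0.8 threshold for a 'near miss' are hypothetical choices, and as a deliberate simplification punctuation and line breaks are ignored, so a misplaced comma or a mangled line ending would slip through; a tool meant to police quotations of verse would need to be stricter.

```python
import difflib
import re

# Words, keeping internal apostrophes and hyphens (e.g. "couple-colour");
# other punctuation is discarded, which is a deliberate simplification.
WORD = re.compile(r"[\w'’-]+")


def normalise(text: str) -> list[str]:
    """Split text into words, dropping punctuation but keeping case."""
    return WORD.findall(text)


def extract_quotations(paper: str, min_words: int = 3) -> list[str]:
    """Return double-quoted passages (straight or curly quotes) of at least min_words words."""
    passages = re.findall(r'[“"]([^”"]+)[”"]', paper)
    return [p for p in passages if len(normalise(p)) >= min_words]


def best_match(quote: str, source: str) -> tuple[str, float]:
    """Slide a window as long as the quotation (in words) across the source,
    returning the closest passage and its similarity ratio (0.0 to 1.0)."""
    quote_words = normalise(quote)
    target = " ".join(quote_words)
    words = normalise(source)
    n = len(quote_words)
    best_passage, best_ratio = "", 0.0
    for start in range(max(1, len(words) - n + 1)):
        window = " ".join(words[start:start + n])
        ratio = difflib.SequenceMatcher(None, target, window).ratio()
        if ratio > best_ratio:
            best_passage, best_ratio = window, ratio
    return best_passage, best_ratio


def check_quotations(paper: str, source: str, threshold: float = 0.8):
    """Classify each quotation in the paper as 'exact', 'near miss' or 'not found'."""
    report = []
    for quote in extract_quotations(paper):
        passage, ratio = best_match(quote, source)
        if ratio == 1.0:
            status = "exact"
        elif ratio >= threshold:
            status = "near miss"  # closest passage differs slightly: a likely misquotation
        else:
            status = "not found"  # nothing in the source resembles the quotation closely
        report.append((quote, passage, round(ratio, 3), status))
    return report
```

Run against the opening of Hopkins's 'Pied Beauty', a paper quoting "For skies of couple-color as a brindled cow" would come back as a near miss alongside the source's "couple-colour" and "brinded", which is exactly the kind of slip that should send a reviewer back to the text. Unlike plagiarism detection, the interesting results here are the imperfect matches.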

I'm sure that there are more possibilities to add to this list - but I'm not sure that, even if (and it's a big if, for a host of technical reasons) we could devise programs to parse papers automatically for accuracy in areas like this, it would ultimately be beneficial. Nevertheless, if peer review is a legacy mechanism from a pre-digital age, what harm in a little futuristic speculation now and again?

And, since I'm feeling cheeky, imagine if we could do a Statcheck on a whole mass of arts and humanities articles. Wouldn't it be deliciously gossipy to see just how many big-name scholars make basic errors?

Labels: academic excellence, academic publishing, data, digital humanities, facts, peer review, reliability, Statcheck, University Life

Posted by Alistair at 8:03 pm
