This is the full version of an informal talk I delivered in compressed form at the Penn State Center for American Literary Studies Seventh Annual Symposium “Institutionally Speaking: The Object(s) of American Literary and Cultural Criticism” on March 19, 2018, one day preceding the fifteenth anniversary of the U.S. invasion of Iraq (which went otherwise unmentioned, at least explicitly; though it’s also true that much else that needed to be said, was said).
For the occasion, we were asked to specify a literary or aesthetic object and an institution, and to present a critique of the latter by way of the former.
The object I’ll be discussing is a linguistic and textual object, but not a literary object, though it does have a literary history (which I won’t have time to go into).
This object lives in a disciplinary contact zone including language and writing studies, communications studies, and both applied mathematics and the study of computing.
It is, at least for the moment, still fundamental to what Karl de Leeuw has called an “unprecedented civilian deployment of security tools and technologies” in our historical present — something de Leeuw suggests is “insufficiently weighted in current accounts of the impact of new information and communications technologies.”
As such, this object is an object that all of us encounter — indeed, use — many times a day, entirely routinely, in both our personal and our professional lives.
This object is the password, a still handy but now quite misleading term for a sequence of alphanumeric or other symbolic written characters.
I say “handy” because you know more or less exactly what this word, “password,” means, at least conventionally. You possess passwords and you use them.
And I say “misleading,” because as you probably also know, today a password performs its intended function best when it least resembles an actual word.
By “actual word,” in this context, I mean the linguistic and textual object one finds in a dictionary.
To explain why we are almost never permitted to use single dictionary words as passwords anymore, we can begin with the observation that there are only two things (if they are indeed two separate things) that a computer can do that a human being cannot do.
A computer can count at a speed, and on a scale, that both exceed what all but a few of us can manage in our human heads. (Some of those of us who cannot are more easily impressed by this than others.)
What we call words, in a particular written human language, are marked — one might even say scored — by unit frequency patterns that computers can tabulate at great speed and which they can extract, through tabulation, from great masses of digitized text.
Whether those units are individual letters or characters, digraphs or trigraphs, words, or sequences of words, their relative frequencies of occurrence in written languages are trivial to compute and to re-compute, in any mass of text data.
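To make the counting concrete, here is a minimal sketch in Python of the kind of tabulation I mean. The function names and the sample sentence are my own, chosen only for illustration, and the sketch assumes unaccented lowercase English letters and counts sequences within words rather than across them.

```python
import re
from collections import Counter

def ngrams(text, n):
    """Yield every run of n consecutive letters found inside each word."""
    for word in re.findall(r"[a-z]+", text.lower()):
        for i in range(len(word) - n + 1):
            yield word[i:i + n]

def relative_frequencies(text, n=1):
    """Map each n-letter unit to its share of all such units in the text."""
    counts = Counter(ngrams(text, n))
    total = sum(counts.values())
    return {unit: count / total for unit, count in counts.items()}

# Hypothetical sample text; any mass of digitized text would do.
sample = "the quick brown fox jumps over the lazy dog and the cat"
letter_freq = relative_frequencies(sample, 1)   # single letters
digraph_freq = relative_frequencies(sample, 2)  # two-letter sequences
```

Setting `n=3` gives trigraphs, and the same shape of loop, run over words instead of letters, gives word and phrase frequencies.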
Written words are, from this point of view, isolated but distinctly patterned sequences of signs. The dictionary in which you look up a word, seeking its meaning, is also a compilation of such patterns.
While no one would take the time to do so manually, it is trivial for a human using a computer to compile all the words in a dictionary and use that compiled dictionary to maliciously probe an authentication system such as the ones many of us use many times a day to, so to speak, “log in” or “log on” to some electronic system or otherwise gain access to a virtual space.
Computer security professionals call such a tactic a “dictionary attack.”
Though I’m simplifying things for the occasion, it’s accurate enough to say that if your username is already known and you’ve chosen a dictionary word as a password, you’re in special trouble, because your password is the easiest kind to (so to speak) “guess” using such a procedure.
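For those who want to see the shape of the procedure, here is a toy sketch in Python, run entirely against a value held in memory. It rests on assumptions of my own: that the system stores an unsalted SHA-256 digest of the password (real systems differ, and should do better than this), and that the five-word list stands in for a full dictionary. The names `digest` and `dictionary_attack` are hypothetical.

```python
import hashlib

def digest(password):
    # Assumption for illustration only: an unsalted SHA-256 digest
    # stands in for however a real system protects stored passwords.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def dictionary_attack(stored_digest, wordlist):
    """Try each word in turn; return the match and the number of guesses."""
    for attempts, word in enumerate(wordlist, start=1):
        if digest(word) == stored_digest:
            return word, attempts
    return None, len(wordlist)

# A five-word stand-in for a compiled dictionary.
wordlist = ["apple", "goodbye", "hello", "password", "zebra"]

# A dictionary word is found as soon as the list reaches it.
found, attempts = dictionary_attack(digest("hello"), wordlist)

# A string that is in no dictionary is never found this way.
not_found, _ = dictionary_attack(digest("q6g8b37!fp*&z"), wordlist)
```

The point of the sketch is only that the cost of the attack is bounded by the length of the list, not by the ingenuity of the attacker.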
The relative frequency analysis of written natural language has not one but two genealogies. The first begins with what we call philology, a pastime that has, virtually since antiquity, included counting among its modes of analysis. From the eighth century CE onward, Arab Muslim Quranic scholars debated the existence and provenance of non-Arabic words in the Quran.
Measuring the relative frequency of occurrence of letters and letter sequences in a written text promised one way to distinguish words from different languages within a text written in a single alphabet.
The other genealogy of the relative frequency analysis of text begins with cryptology and more specifically with cryptanalysis, the direct or interpretive deciphering of enciphered natural-language text.
Let’s say that you select the English word “hello” as a password. (Again, almost no one would allow you to do this today, but that wasn’t always the case.)
Of course, your password isn’t stored in the clear, as “hello.” It’s enciphered, meaning that for each letter in the cleartext, some other symbol has been substituted, in an effort to make it illegible to anyone who intercepts it.
But if it can be assumed (in this case correctly) that a password is an actual word, that is, a dictionary word, in a known written human language (in this case, English), then the known relative frequency of letters or other units of written English can be used to, so to speak, crack the code, by inferring patterns in the cleartext from patterns in the ciphertext symbols substituted for them.
Can we assume that the letter “e” will occur more frequently than any other letter in most written English? Yes, we can. Can we count on certain letter combinations, such as “q” followed by a letter other than “u,” almost never occurring? Yes, we can. Is it then safe to assume that in enciphered text, the cipher symbol substituted for “e” will be the most commonly occurring symbol in the ciphertext? And that the symbol substituted for “q” will almost never be followed by any symbol other than the one substituted for “u”? While, again, I’m simplifying things for the occasion, the answer is yes, again yes, and so on.
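The inference about “e” can be sketched in a few lines of Python. I am simplifying further here, by assuming the simplest possible substitution, a uniform shift of the alphabet (a Caesar cipher), and by choosing a sample sentence of my own that is long enough, and rich enough in “e,” for the count to be decisive. On a very short text, such as a single five-letter word, this heuristic alone would not have enough to go on.

```python
import string
from collections import Counter

ALPHABET = string.ascii_lowercase

def shift_text(text, key):
    """Substitute each lowercase letter with the one `key` places along."""
    shifted = []
    for char in text:
        if char in ALPHABET:
            shifted.append(ALPHABET[(ALPHABET.index(char) + key) % 26])
        else:
            shifted.append(char)  # leave spaces and punctuation alone
    return "".join(shifted)

def guess_key(ciphertext):
    """Assume the most frequent cipher symbol stands for the letter 'e'."""
    counts = Counter(c for c in ciphertext if c in ALPHABET)
    most_common_symbol = counts.most_common(1)[0][0]
    return (ALPHABET.index(most_common_symbol) - ALPHABET.index("e")) % 26

# Hypothetical cleartext, enciphered with a key the analyst does not know.
cleartext = "we seek the secret sentence hidden between these letters"
ciphertext = shift_text(cleartext, 7)

# The analyst sees only the ciphertext and counts its symbols.
key = guess_key(ciphertext)
recovered = shift_text(ciphertext, -key)
```

A general substitution cipher takes more work than one count, but the additional work is more counting: digraphs, trigraphs, and the rest.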
Such inference is fundamental to both telecommunications engineering in general, where the goal is efficient transmission of data, and (the practical domain of) cryptology in particular, where the goals are the concealment of one’s own data and the exposure of someone else’s.
That is why the gatekeepers who regulate our selection of passwords often prohibit us from selecting individual natural-language words or commonly occurring phrases like “thank you” or “happy birthday,” and why they encourage the selection of a less predictable, because less commonly occurring, sequence of symbols like “q6g8b37!fp*&z” (which, although short in length, which is another issue, is still a much better password than “hello”).
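A rough calculation shows the size of the difference. The figures below are assumptions of mine, not measurements: a dictionary of 200,000 words, an alphabet of 94 printable keyboard symbols, and a thirteen-character string whose characters were each chosen uniformly at random (something I cannot claim for my own example above).

```python
import math

# Assumed figures, for illustration only.
DICTIONARY_SIZE = 200_000             # hypothetical size of an attacker's wordlist
SYMBOLS = 94                          # printable ASCII characters, excluding space
LENGTH = len("q6g8b37!fp*&z")         # thirteen characters

# Bits of guessing work: log2 of the number of equally likely candidates.
word_bits = math.log2(DICTIONARY_SIZE)
random_bits = LENGTH * math.log2(SYMBOLS)

# How many times larger the second search space is than the first.
ratio = SYMBOLS ** LENGTH // DICTIONARY_SIZE

print(f"dictionary word: about {word_bits:.1f} bits")
print(f"random string:   about {random_bits:.1f} bits")
```

Under these assumptions the dictionary word costs fewer than eighteen bits of guessing, the random string more than eighty-five, and every additional character adds about six and a half more.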
The idea is to remove the many shortcuts the code-breaker can derive from our vast and expanding universe of digitized machine-readable natural-language text.
We were asked to specify both an object and an institution today. The institution I have in mind is the university, but not the university in general; rather, it is specifically the university at war.
Let me remind you, especially those of you who are under forty, that our time is wartime.
United States Public Law 107-40, the Authorization for the Use of Military Force passed by Congress on September 14, 2001, remains in effect, having been used by three successive U.S. presidents to justify the longest war in U.S. history (the sixteen years in Afghanistan), the occupation of Iraq from 2003 to 2011, troop deployments to at least eight other countries, and the continuing operation of the military prison at Guantanamo Bay Naval Base, among other things.
If you know this much, you may also know that Representative Barbara Lee of California was the only member of Congress to vote against the AUMF in 2001, and that she has unsuccessfully proposed an amendment to repeal it every year since then.
In June 2017, for the first time, the Appropriations Committee of the U.S. House of Representatives approved the amendment on a nearly unanimous vote, though it was later removed from the Defense Appropriations Bill.
So the United States is still at war, very much officially. The American university since 2001 has been the university at war, very much officially. The crises and the reforms of the university since 2001 have been the crises and the reforms of the university at war, very much officially.
The intellectual “turns,” if that’s what they should be called, taken by the knowledge disciplines of the university have been turns taken by the university at war, very much officially.
I mentioned earlier that the relative frequency analysis of written natural language has two genealogies. One, I suggested, begins with philology, which for my purpose here we can take as an old-fashioned word for what most people in this room do professionally: that is, analyze relatively small quantities of written documents as representative cultural forms.
The other begins with cryptology, which today is the domain of our colleagues in computer science, and related technical sciences, and of course of the university’s institutional neighbors, the intelligence agencies.
But these two genealogies are entwined, and historically, their entwinement has been stimulated by war, a condition in which everyone is enjoined to contribute to, or at minimum reflect the priorities of, some common cause.
“Not only scientists,” Carol Gruber has observed of American college and university faculty upon U.S. entry into the First World War, “but humanists and social scientists as well sensed in the war situation an opportunity to win confidence in their disciplines, to stimulate interest in them, and to accomplish necessary reorganization and reform.” (That’s from Gruber’s 1975 book Mars and Minerva: World War I and the Uses of the Higher Learning in America.)
It’s fair to say that the period of United States history since September 2001 has been marked by a rapid and aggressive expansion of fundamentally cryptanalytic practices of electronic data collection and creation. Though that expansion was led by the security agencies, it created many opportunities in two other kinds of institutions in particular: tech companies and universities.
That doesn’t exclude us, here in this room, in the domain of the literary humanities. Most of us in this room prefer not to think of what we do as decoding secret messages, much less performing automated or semi-automated surveillance on our research objects. And I’m not here to tell you otherwise, if the cryptanalytic turn is something you’ve resisted.
The fact is that to use a computer to analyze text is to count elements of that text. There is literally nothing else a computer can do for you, no matter how you dress it up. The computational analysis of text — any text, of any kind — is inconceivable without the comparative counting technique we call relative frequency analysis.
And while, as I’ve mentioned, that technique has its own roots in philology, it has been nothing less than thoroughly weaponized by seventeen years of continuous war (that is, seventeen and counting), accompanied by extensive economic violence and politically violent efforts to reform the university.
I think the ostensibly new applications of computing in the literary humanities that we’ve heard so much about for the last decade deserve to be called a “cryptophilology,” to mark both their fundamental dependence on relative frequency analysis and their emergence after 2001 amid a surge of national security legislation and cryptanalytic institution-building.
I’d like to see us make more progress in historicizing this element of our recent past and historical present, and in drawing some normative conclusions about it, because like so much else in our current moment, I think its unwinding will be our burden for years to come.