<h5>“He has never been known to use a word that might send a reader to the dictionary.” – Faulkner on Hemingway
“Poor Faulkner. Does he really think big emotions come from big words?” – Hemingway on Faulkner</h5>
<h3>Faulkner Versus Hemingway Stylistic Analysis</h3>
I began this project because I was (and still am) skeptical about the usefulness of computational reading. I therefore decided to test it on what I feel may be the most promising use of the technology: determining writing style. The tools afforded by Voyant seem as if they could provide a top-down view of what separates two writers and could be used in, say, determining whether Shakespeare or Francis Bacon wrote a particular play. With this in mind, I chose to compare Faulkner and Hemingway, in part inspired by the quotes above but also because I intuitively feel that the two writers have very distinct writing styles.
First, I took a look at the general summary information.
<iframe style='width: 100%; height: 800px' src='https://voyant-tools.org/?stopList=keywords-c39ca59429d90507379e6e21d07a56c0&corpus=8c5c904b05948ce461de1dd3454293a7&view=Summary'></iframe>
Here you can see that I have a much larger corpus for Faulkner than for Hemingway, simply because his works were more accessible and required significantly less “scrubbing.” As such, any information gleaned from this project must be qualified by the knowledge that we are not comparing equally sized corpora. Nonetheless, some of my intuitive knowledge of the writers seems to hold up, for example the not-at-all-surprising finding that Faulkner’s sentences are almost twice as long as Hemingway’s. This fits the common perception that Faulkner is “difficult” to read. Further, I tried to scrub all the proper nouns from the distinctive/frequent words list to see what popped, but unfortunately I eventually reached the stop list limit. Still, we can see that Faulkner obviously works a number of Southern-isms into his writing. He had far more proper nouns that had to be removed (probably a result of the corpus disparity), but he also uses terms such as ain’t, niggers, and reckon. That two of Hemingway’s words are concerned with beauty and love (darling and lovely), and that two are concerned with official hierarchy (priest and tenente, as in lieutenant), is likewise not surprising to me: my preconceptions are that Faulkner is rooted in the deep South and is concerned with notions of race and family, while Hemingway is concerned with pure emotions and masculinity. I ignored vocabulary density, because without equally sized corpora these numbers are meaningless (generally speaking, the smaller the sample, the denser the vocabulary).
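As a rough illustration of the two summary measures discussed above, here is a minimal Python sketch of average sentence length and of vocabulary density computed over a fixed-size sample, so that a large corpus and a small one are compared on equal footing. This is not how Voyant computes its figures; the function names, the tokenizing rule, and the naive sentence splitting are all assumptions made for the example.

```python
import re

def tokenize(text):
    """Lowercase word tokens; internal apostrophes are kept so "ain't" stays one word."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())

def mean_sentence_length(text):
    """Average words per sentence, splitting naively on '.', '!' and '?'."""
    # Keep only chunks that actually contain words.
    sentences = [s for s in re.split(r"[.!?]+", text) if tokenize(s)]
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)

def vocabulary_density(text, sample_size):
    """Unique words divided by total words, over the first sample_size tokens only."""
    tokens = tokenize(text)[:sample_size]
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# Toy text standing in for a real corpus (hypothetical data).
sample = "The sun rose. The sun set again."
print(mean_sentence_length(sample))   # (3 + 4) / 2 words per sentence
print(vocabulary_density(sample, 4))  # "the sun rose the": 3 unique out of 4
```

To compare two authors, one could pass the same `sample_size` (no larger than the smaller corpus) to both calls. Note that the sentence splitter will miscount around abbreviations such as “Mr.”, so the sentence lengths are only approximate.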
I continued my search by creating a white list of what I’m calling “simple” emotional words, such as happy, sad, angry, etc., basically any word that might be readily employed by a child to describe how they are feeling.
<iframe style='width: 100%; height: 800px' src='https://voyant-tools.org/?query=afraid*&query=brave*&query=angry*&query=happy*&query=sad*&bins=2&corpus=8c5c904b05948ce461de1dd3454293a7&view=Trends'></iframe>
As expected, Hemingway, despite having a corpus only half the size of Faulkner’s, has significantly more instances of such words. The one exception was the word “sad,” which, if you’ve ever read Faulkner, is probably not too surprising (although this could again be attributed to the disparity in corpus sizes). I then flipped the idea and searched for “big words” that describe emotions, such as satisfied, regret, eager, etc.
<iframe style='width: 100%; height: 800px' src='https://voyant-tools.org/?query=anticipat*&query=regret*&query=confuse*&query=eager*&query=satisfied*&query=helpless*&bins=2&corpus=8c5c904b05948ce461de1dd3454293a7&view=Trends'></iframe>
Again, the findings seem to line up with the above quotes by the authors; Faulkner tended to use these “big words” more frequently than Hemingway, with the exception of “confuse” and, possibly, “regret” (once more, the disparity in corpus size comes into play).
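Since the size disparity keeps coming up, one way to take it out of the picture is to compare rates rather than raw counts, for instance occurrences per 10,000 words. The sketch below assumes that a query such as happy* matches any word beginning with “happy”; the function name and the toy token lists are hypothetical and are not drawn from the actual corpus.

```python
def rate_per_10k(tokens, prefix):
    """Occurrences of words starting with prefix, per 10,000 tokens."""
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t.startswith(prefix))
    return 10_000 * hits / len(tokens)

# Toy data: the larger "corpus" has more raw hits but a lower rate.
larger = ["sad"] * 3 + ["filler"] * 1997   # 3 hits in 2,000 tokens
smaller = ["sad"] * 2 + ["filler"] * 998   # 2 hits in 1,000 tokens

print(rate_per_10k(larger, "sad"))   # 15.0
print(rate_per_10k(smaller, "sad"))  # 20.0
```

Here the raw counts alone (3 versus 2) would point in the opposite direction from the rates, which is exactly the kind of distortion an unequal pair of corpora can produce.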
Thinking about the word “sad,” I then wondered if perhaps there was a tonal difference between the works that could be pinpointed. The results are below:
<iframe style='width: 100%; height: 800px' src='https://voyant-tools.org/?query=lust*&query=cheer*&query=happy*&query=hate*&query=love*&query=sad*&bins=2&corpus=8c5c904b05948ce461de1dd3454293a7&view=Trends'></iframe>
This one was trickier. There is a tentative relationship between tone and author: on the whole, Hemingway seems more willing to write about positive things such as happiness and love, while Faulkner generally writes more negatively. Hemingway does not shy away from the negative either, though; he has instances of hate and sadness in his writing. There are two things to note about this. First, Faulkner almost entirely avoids “positive” words. So while we couldn’t estimate the probability that a text was written by either author from the presence of “negative” words, if a work has a significant number of “positive” words, there is a good chance that Hemingway produced it. Second, and perhaps more tellingly, the way the authors describe love and sex shows an even sharper difference. Faulkner uses the term “lust” extremely often, while Hemingway uses it virtually not at all; in contrast, Hemingway is willing to describe love and lovely objects, while Faulkner tends to shy away from them. Hemingway uses love (and variations thereof) more than twice as often as Faulkner despite the difference in corpus size.
One last thing:
<iframe style='width: 100%; height: 800px' src='https://voyant-tools.org/?stopList=keywords-77978f8ca6ac758aa912a775fbcb97b8&query=said&corpus=8c5c904b05948ce461de1dd3454293a7&view=Contexts'></iframe>
I noticed in context that one of the reasons Faulkner had such a high count for “lust” was that, because I had used The Sound and the Fury as one of my texts, the query also matched a character named Luster, who appears frequently. Thus, I had to amend my findings by stop-listing “Luster”; despite the change, Faulkner still used the term far more than Hemingway.
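The Luster problem is a general hazard of prefix queries: a pattern like lust* will happily sweep up names and unrelated words. A minimal sketch of the fix, assuming a simple exclusion set applied before counting (the function name and the toy word list are hypothetical):

```python
def count_prefix(tokens, prefix, exclude=()):
    """Count tokens starting with prefix, skipping any word in exclude (case-insensitive)."""
    excluded = {word.lower() for word in exclude}
    return sum(
        1
        for token in tokens
        if token.lower().startswith(prefix) and token.lower() not in excluded
    )

# Toy word list (hypothetical data).
words = ["Luster", "lust", "lustful", "Luster", "lustre"]

print(count_prefix(words, "lust"))                      # 5, names included
print(count_prefix(words, "lust", exclude={"Luster"}))  # 3, names removed
```

An exact-match exclusion like this would still let a possessive such as “Luster’s” slip through, so reading the matches in context remains the real safeguard.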
While this search was obviously experimental and even flawed in its execution because of the disparity in corpus sizes, the results are still intriguing. I feel that this method of data mining could potentially allow for a somewhat accurate assessment of writing style and of whether a particular writer composed a specific work. That said, I find it striking that one must employ scientific terms when discussing the possibility of authorship. It seems that one would need to report something like a confidence level, à la a published scientific journal article, in order to make a claim. Authors can certainly write anomalous works that are unusual for their writing style.
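One way to attach a number of that kind to a word-frequency difference is the log-likelihood (G²) statistic often used in corpus linguistics to compare a word’s frequency across two corpora of different sizes. This is a technique I am swapping in for illustration, not something Voyant reported for this project, and the counts below are invented. As a rule of thumb, a G² above about 3.84 corresponds to p < 0.05 with one degree of freedom.

```python
import math

def log_likelihood(count_a, size_a, count_b, size_b):
    """G2 statistic for one word's counts in two corpora of the given sizes."""
    total = count_a + count_b
    # Counts we would expect if both authors used the word at the same rate.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2

# Hypothetical counts: 40 hits in 1,000 words versus 20 hits in 2,000 words.
print(log_likelihood(40, 1000, 20, 2000))  # well above 3.84

# Identical rates (10 per 1,000 in both) give a statistic of zero.
print(log_likelihood(10, 1000, 20, 2000))
```

A large value only says that the difference is unlikely to be a sampling accident; it says nothing about why the difference exists, and a single anomalous book could still produce it.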
Further, I purposely selected two authors who I intuitively felt wrote in very specific, disparate styles. It would be interesting to test this methodology on two writers whose styles might seem more similar, for example Ray Bradbury versus George Orwell or Mark Twain versus O. Henry. Could computers find patterns that we didn’t even know existed? And if they do exist and we, as writers, are subconsciously creating these patterns, what does that say about us? It certainly could have interesting implications for a free will argument or, at least, provide some evidence for psychoanalytic ideas.
I’m still not completely sold on the usefulness of distant, computer-assisted reading, but now that I’ve experimented with it, I can at least acknowledge that it could have its uses. I suppose that, as with any tool, what matters is how the craftsman wields it; you can give a chisel to anyone, but only the sculptor will be able to bring life from the rock.