Computational Analysis of Text

Textual Analysis

An Introduction Inspired by Our Reading(s):

Of the many questions raised by the readings and projects in this unit, those that deal with the concept of meaning-formation as it relates to the accessibility, manipulation, and evaluation of bodies of textual information strike me as most intriguing. What does our ability to analyze corpora using computers change about the process and outcome of study? Where is the line drawn between benefit and potential detriment? More to the point, what is being benefited or put at risk? In an attempt to defend the weight of those questions, I offer the notion that we aren’t quite able to speak definitively about an experiment-of-study that is still breathing its way to the forefront of scholarship. We can, however, discuss the ongoing metamorphosis of scholarship in the digital age, so long as we accept that to bemoan change is, at least in this case, to ignore great potential. This change, this growth in the public’s access to new and evolving technologies, is one that, according to Dr. Houston during her in-class visit, can be likened to the technological shift that took place in the 19th century. A heavy and appropriate statement, indeed, and one that raises the question: what do we do with what we can do now? To begin building hypotheses about this question, we should first take into account what we’ve been doing for centuries: close reading.

Close reading, as far as I’m concerned, can be loosely defined as a process of deep concentration on one text or a small body of related (or seemingly related) texts. By considering content after deep and repeated readings, we attempt to establish any number of conclusions about the work(s) in question. This process is a natural one in that we, in all of our experiences, analyze, and often do so with reverence. By attempting to understand the underpinnings of a moment, excerpt, or entire work, we evaluate and interpret in order to understand. At times we wish to conclude that this work or that belongs, or does not belong, within a genre, canon, or area of study. We do this to understand the broad nature of thematic connection so that we can focus our attention on specific works that fall into specific categories. In order to analyze, or to analyze with the intent of forming authoritative meaning, we need to isolate in order to build worthwhile conclusions. Our conclusions are, however, often very personal. As scholars our intent is to analyze with little bias, or at least with as little personal bias as possible; however, this isolation of self from analysis is difficult. That is to say, despite our ability to understand the humanity within a body of work, we are so mired in our own experiences that we often can’t help but focus on what we find important, a natural bias that leads us to overlook what might be pertinent. This issue, as well as our inability to read as much as a computer can, might well be why incorporating computational analysis into traditional humanities scholarship is so tantalizing and important to discuss.

Distant reading is described as a process of searching within, playing with, and viewing a body of textual data by means of computational analysis. This form of reading, which isn’t quite reading, can, when added to traditional close-reading research, solve a few of the problems mentioned above. Or maybe it can’t. Though distant reading allows one (or many) to cut through, as opposed to wade through, large bodies of text, and in doing so offers the potential discovery of otherwise overlooked thematic or canonical connections, it carries a few issues similar to those inherent in close reading. Because the analysis, the brunt of the real analysis, is still handled by a human being, the potential for bias finding its way into the evaluation is just as present. That being said, the benefits of digital methods of research like distant reading are far from overshadowed by their inherent issues. Consider Dr. Houston’s work involving Victorian-era poetry. By digitizing this broad corpus, one can raise questions that were previously unanswerable (a throwback to our readings of Weinberger). According to Houston in her article on VisualPage, “As humanists, we tend to make broad arguments based on a limited number of specific examples.” Here she speaks to the issues of close reading and interpretation as they relate to establishing authoritative evidence. She continues by expressing that, with the advent and implementation of digitization, we are now able to establish new kinds of evidence using newly (sometimes personally) developed methods. The potential of this shift in methodological approach affects not only our understanding of the information, but also the final product of our research and, perhaps most importantly, the readers of our work.

And so the above touches, albeit briefly, on the questions surrounding access, manipulation, and evaluation, and (hopefully) offers a bit of insight into the questions regarding potential benefit or detriment to the product of our research, but perhaps it doesn’t quite get to the core of the matter: what exactly is being affected. I still don’t quite have an answer to any of these questions, certainly not the last one, but I have a few ideas, ideas that, had I not screwed around a bit with some of these distant reading tools, I would never have had. Hopefully, in describing my experiences as a sort of conclusion, I will speak to the spirit of screwing around. Perhaps in some way this will be my personal answer to the question “What do we do with what we can do now?”

Lab work, Benefits of Screwing Around, and Conclusion

            Prompted by Dr. P’s assignment, each of us in the class chose either a database containing a collection of Darwin’s letters or a database of letters, diaries, and news articles collected from Augusta County, Virginia, and Franklin County, Pennsylvania, during the era of the Civil War, provided by the University of Virginia. I chose the latter, in part because I knew what I wanted to look for. My goal was to use this database, which allows one to navigate using keyword searches, to dig up some information about the drought in Kansas during the mid-1800s. What I thought I’d find was some information about servicemen traveling from the north to the south or vice versa, information that I assumed I could find by simply typing “drought” or keywords involving heat and thirst. Surprisingly, I found nothing. Or rather, I didn’t find what I was looking for. What I did find was a single poem, which led me to search for others within the database. I found very few in the section devoted to letters; however, I found several instances in news articles. Furthermore, I began to see patterns emerge in the poems, distinct patterns that pointed to a regional emphasis on specific topics as well as differences in dialect. The poems that came from the south tended to be romantic in both form and content; those from the north tended to be more political. What began as a search for drought-related letters became screwing around with words related to poetry. This quickly evolved into what looks a lot like humanities research. By making use of the database and my computer’s ability to navigate through it, and then using the close-reading skills I’ve developed through years of scholarship, I was able to draw a few tentative conclusions about the people writing the letters, their state of mind, and their regional location and its influence on the content of their writing.

To conclude quickly, I’d like to say that the concept of computers aiding me in close reading never crossed my mind before this class. In fact, I’ve always been a bit intimidated by computers, never wanting to fully rely on them in my work as a scholar. Despite my initial hesitance, I can say with some confidence that I’m now a sort of convert: from the dark side of humanities research I’ve placed one leg into the ostensibly darker side of computational analysis. And I posit that this is beneficial, not the stepping so much as the message behind it: the potential for collaboration and interdisciplinary approaches. Though there may be drawbacks associated with both close and distant reading, the combination, with a healthy sense of focus thrown in, strikes me as too powerful not to be the future of humanities scholarship. And, who knows, this could be the beginning of the lines blurring between fields of study, a potential outcome I can get behind.

A Second Conclusion

            Sorry, I will go over the word count here. As promised, I’d like to describe the project I’ll be working on for the next couple of years, assuming it will take that long to complete. Prompted by my work with the Valley of the Shadow database and by Dr. Houston’s writings and visit, I began thinking about poems from the 1800s and 1900s that are now in the public domain. This corpus exists in a sort of limbo: much of the work is obscured by time and simply sits in the ether untouched, unread, and unremarked upon. The fact that it is in limbo does not (necessarily) mean that it isn’t worth looking at, maybe even archiving, or perhaps screwing with. This project would attempt to do all that and then some.

The process would begin with collecting the works that fit within the above-mentioned time period. A team of writers, including myself, would then use text-analysis software like Voyant to establish categorical connections between poems. The now-curated work would be divided up among the writers depending on what they feel like reading that day. During their close reading, the writers would begin rewriting the works, shifting the vernacular into a more modern one, all the while reinterpreting the works’ content and combining works at their discretion. This process of rewriting and remixing would go on until we have a new body of work, a reanimated corpus of sorts. Each new work would be tagged with a number corresponding to the original works, both to keep track of the poems and for purposes of reference and review. The goal here is to archive the source work so that it can be housed on a website with a search function involving either keywords or the reference numbers we’ve placed on the remixed works. If published, the remixed poems would have that reference number somewhere on the page so that readers could navigate to the website and enjoy the source material, perhaps feeling encouraged to remix a bit themselves, or at the very least breaking up the linearity of their reading process.
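The tagging scheme described above could be sketched as a simple mapping from reference numbers to archived source poems, with each remix recording the numbers it draws on. This is a minimal, hypothetical illustration only; the ID format (“VP-001”) and all titles are invented here, not part of the actual project.

```python
# Hypothetical sketch of the reference-number scheme: each remixed poem
# carries the ID(s) of the archived source poem(s) it was built from.

# Illustrative archive of source poems (titles and regions are invented).
archive = {
    "VP-001": {"title": "On a Summer Field", "region": "south"},
    "VP-002": {"title": "The Union Line", "region": "north"},
}

# Each remix lists the reference numbers of its sources; a remix that
# combines two originals simply lists both.
remixes = [
    {"title": "Summer Field, Remixed", "sources": ["VP-001"]},
    {"title": "Lines Combined", "sources": ["VP-001", "VP-002"]},
]

def sources_for(remix):
    """Look up the archived source entries a remix points back to."""
    return [archive[ref] for ref in remix["sources"]]

# A reader who finds a reference number on a published remix could use
# the same lookup to reach the source material on the website.
for remix in remixes:
    titles = [s["title"] for s in sources_for(remix)]
    print(remix["title"], "->", titles)
```

The point of the sketch is only that the linkage runs both ways: the archive stays whole and searchable, while every remix remains traceable to its originals.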

After the work has been completed, and before attempting to get it published, I’d like to assemble yet another team to read both the source material and the remixed works for the purpose of visually representing the content through their personal creative lenses. This second remix would be placed on the website with the original works and would pop up alongside a visitor’s search results depending on the reference number they’ve typed in. The intended effect is a less linear reading experience, as well as an excuse to screw around with the work for the purposes of understanding and applying what is understood. From text on page, to text on screen, to visualization and back again. I’m still figuring out some of the particulars, and the idea is subject to change, but I hope that from what I’ve provided here you see the influence of this wonderful class you’ve both put together for us. This is my example of DG at work: screwing around with the intent to create and inspire creation and study. And now I have to turn the paper in, something I’m somewhat saddened by.









