I am not the most technologically savvy person; I’ll be the first to admit that. So when I was told I would be using a programming language to analyze a corpus of texts, I was initially very intimidated. Novel phrases like “Word Vector Analysis” and “RStudio” went right over my head. Fortunately, as we went over how to prepare corpora, train models, and run code as a class, I became more confident in my ability to investigate a research question using these new techniques.
A Brief Introduction to Word2Vec
According to its Wikipedia page, Word2Vec is a group of related models that takes a corpus as input and creates a “vector space” in which each word in the corpus is assigned its own vector. From my beginner-level understanding of word embedding models, this means that words commonly used in similar contexts will be placed close to one another in that space, and this proximity is what signals a relationship between them.
Essentially, we can use this tool to discern relationships between certain words: for example, by determining “word clusters”, or by choosing a specific word and finding a list of other words commonly used in the same contexts.
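To make the idea of “closeness” a little more concrete, here is a minimal sketch in Python (not the R code we ran in class). The three-dimensional vectors and the word list are invented purely for illustration; a trained model learns vectors with many more dimensions from the corpus itself. The sketch assumes that closeness is measured with cosine similarity, which is a common choice for word vectors.

```python
import math

# Hypothetical 3-dimensional vectors, invented for illustration only.
# A real model learns its vectors from the corpus.
toy_vectors = {
    "freedom": [0.9, 0.8, 0.1],
    "liberty": [0.85, 0.75, 0.2],
    "struggle": [0.6, 0.9, 0.3],
    "harvest": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_words(word, vectors, n=3):
    """Rank every other word by its cosine similarity to `word`, highest first."""
    target = vectors[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]


for other, score in closest_words("freedom", toy_vectors):
    print(other, round(score, 3))
```

With these made-up numbers, “liberty” ranks closest to “freedom” and “harvest” ranks furthest, simply because that is how the toy vectors were chosen.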
Developing a Research Question
After figuring out vector analysis with word2vec, the next step was thinking of a research question and forming my own corpus of texts to find evidence to answer that question. While exploring different literary corpora, I grew interested in the digital publishing initiative “Documenting the American South”, abbreviated as “DocSouth”. This collection of literary works, compiled by UNC Chapel Hill, contains a broad range of texts, from narratives written by slaves to the literature of famous Confederate leaders. All of the authors lived within the particular culture of the South, and the texts were written before or during the Civil War (early to mid 1800s). One thing that caught my attention was that both slaves and free white people wrote poetry.
Poetry, a creative expression of ideas, can be very telling of the main themes shared by a group. I wanted to know whether, by looking only at poetry, I would be able to see a significant difference in the way these two groups viewed important concepts, mainly freedom and the power dynamics between masters and slaves. This became the structure of the research question I chose to investigate: To what extent did black slaves and free white people of the South view the concept of freedom differently, as expressed in the 19th-century poetry written by each group?
How I Gathered Texts for My Corpora
Since I was comparing the poetry written by two different groups of people (black slaves and free white people), I needed to compile two separate corpora.
I started by browsing the different collections on DocSouth, including “Library of Southern Literature” and “The Church in the Southern Black Community”, picking only texts written in verse form. This included not only texts strictly labeled as “poetry” but also songs, since they are essentially poems as well.
Because of the form and genre of texts I chose to work with, it was challenging to accumulate a big enough corpus, since poems are typically short. Even though I scoured DocSouth for all signs of poetry, I was still short of the word count I needed to make reliable claims about my research question. So I sought out other resources, and ended up drawing several more collections of poetry from a website published by Brycchan Carey, a Professor of English at Northumbria University.
Once I had identified the texts I wanted to include in my corpora, I converted them into text files by copying and pasting them from the HTML document into Oxygen, removing the metadata at the beginning and end of each text, and saving the results as text files. These were then compressed together and uploaded to RStudio as one file.
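I did all of this by hand, but the last step (gathering the cleaned text files into one and checking the word count) could also be scripted. The following is only a sketch in Python, not what I actually ran: the function name, the folder layout, and the way words are counted are all assumptions made for illustration.

```python
import re
from pathlib import Path


def combine_text_files(folder, output_path):
    """Join every .txt file in `folder` into one file and return the word count.

    Hypothetical helper: it assumes the metadata has already been removed
    from each file, and that `output_path` is outside `folder` so the
    combined file is not picked up again on a second run.
    """
    texts = []
    for path in sorted(Path(folder).glob("*.txt")):
        texts.append(path.read_text(encoding="utf-8").strip())
    combined = "\n\n".join(texts)
    Path(output_path).write_text(combined, encoding="utf-8")
    # Count runs of letters or digits, keeping apostrophes inside words
    # such as "art's" so they count once.
    return len(re.findall(r"\w+(?:['’]\w+)*", combined))
```

Calling something like `combine_text_files("white_poetry", "white_poetry_corpus.txt")` (both names are placeholders) would write the combined file and return a rough word count, which is a quick way to check whether a corpus is large enough yet.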
Some Initial Queries
Before I had even finished compiling my corpora (I was at about 200,000 words each at this point), I ran some initial test queries to see what kinds of results I would get and whether I was on the right track. Luckily, the results of my first attempts at training a model and querying the word “freedom” were very promising.
On one hand, results for the corpus of slave poetry linked “freedom” to words such as “struggle”, “vigor”, “deed” and “blood”, suggesting that freedom was something that needed to be fought for.
On the other hand, the top two words linked to “freedom” in the corpus of texts from white authors were “art’s” and “leisure”. There were still words associated with violence, such as “battle” and “armed”; however, those first two suggest that freedom was something already granted to them, which they used to enjoy their free time.
This one initial and very simple query already showed a distinct difference between the two groups’ perspectives. Freedom wasn’t given to everyone, which I already knew. However, it was very interesting to see how clearly the words each group used in relation to freedom were able to show this.
Before I went further into my queries and finished building my corpora, I did some research of my own about slave poetry, specifically the slave songs. Deemed one of the most important bodies of folk song born on American soil, slave songs were passed down from as early as the first slaves brought to North America, and new ones were still being created until slavery was abolished. Early observers described them as “repetitive”, “naive”, “childlike”, “primitive” and even “barbaric”. Interestingly enough, the diction, imagery and structure of the slaves’ own folk songs and poems were usually based on white revival hymns, evidence that both the culture and the religion of white southerners had an impact on their black slaves (L. Ramey).
However, while the two kinds of poetry might have shared the same kinds of language, the ideas expressed with that vocabulary were very different. What I took from this research is that although slave songs may have been heavily influenced by the religion and literature of the masters, the poetry and songs written by slaves show that the cultures and perspectives of slaves and their masters remained very different.
A Further Investigation
Prompted by this research, and more curious about my texts, I continued to test out different queries on my corpora. As my corpora grew in size, I retrained my models to perform the rest of the queries I had wanted to investigate.
First I retried looking at “freedom”, getting slightly different results than I had before with the smaller corpora.
The results of a word2vec query for “freedom” in the corpus authored by white poets were very straightforward, containing synonyms and semantically related terms like “liberty” and “free”, as well as positively connoted words like “flourish” and “glory”.
The results for the corpus of slave poetry, on the other hand, were very interesting. Many of the words that appeared were related to the act of buying freedom. Words like “payment”, “sold”, and the various forms of “purchase” suggest that slaves saw freedom not only as something that must be fought for, as I had drawn from my initial query, but also as something that could be bought with money. It was not a granted human right as it was for the white people of the South; nothing in the results from white poets points to this notion of buying freedom.
Curious, I also wanted to see what would happen if I changed my query slightly and added another word’s vector. By searching for what was closest to “freedom + slave”, I got some very interesting results. For the slave poetry, the list of words wasn’t too different, but the corpus by white poets produced very different results. Many of these words were negatively connoted, such as “punish”, “defy”, “scourge” and “dispute”. These words indicate that the association between slaves and freedom was discouraged, and that the two ideas together did not result in a positive outcome (at least for white slaveowners).
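For anyone wondering what “adding a vector” means in practice, here is a small Python sketch of one common way such a combined query can work: add the two word vectors together, then look for the words whose vectors point in the most similar direction. The vectors below are invented for illustration and are not taken from my models, and the tool I used may combine the vectors somewhat differently.

```python
import math

# Hypothetical 3-dimensional vectors, invented for illustration only.
toy_vectors = {
    "freedom": [0.9, 0.1, 0.2],
    "slave": [0.1, 0.9, 0.2],
    "liberty": [0.95, 0.05, 0.25],
    "bondage": [0.05, 0.95, 0.1],
    "escape": [0.6, 0.6, 0.1],
}


def add_vectors(a, b):
    """Add two vectors element by element."""
    return [x + y for x, y in zip(a, b)]


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_to_vector(target, vectors, exclude=(), n=3):
    """Rank words by similarity to `target`, skipping the query words themselves."""
    scores = [
        (word, cosine_similarity(target, vec))
        for word, vec in vectors.items()
        if word not in exclude
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]


combined = add_vectors(toy_vectors["freedom"], toy_vectors["slave"])
for word, score in closest_to_vector(combined, toy_vectors, exclude=("freedom", "slave")):
    print(word, round(score, 3))
```

In this toy example, “escape” comes out on top because its made-up vector sits between “freedom” and “slave”, while “liberty” and “bondage” each lean toward only one of the two query words.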
Although I started this assignment with assumptions about the differences between the views of black slaves and free white southerners, getting to see actual evidence of these differences by digitally analyzing large corpora of texts was reaffirming.
Overall, I was able to see that slaves felt that freedom could only be achieved through some kind of struggle (either fighting or escaping) or by somehow buying their way out. It was not an automatically assumed right as it was for white citizens.
I was able to find evidence that confirmed my assumptions: the poetry of these two opposed groups in the 19th-century American South expressed very different views of freedom. Even with the influence of white literature on black folk songs and poetry, the ideas expressed remained very much independent, reflecting two very distinct cultures and perspectives within the American South.
“Library of Southern Literature.” Documenting the American South, University of North Carolina at Chapel Hill, docsouth.unc.edu/southlit/.
“Slavery Poems.” Edited by Brycchan Carey, Slavery, Emancipation, and Abolition, Brycchan Carey, July 2002, www.brycchancarey.com/slavery/poetry.htm.
Ramey, L. “Slave Songs and the Birth of African American Poetry.” Google Books, Springer, 2008, books.google.com/books?hl=en&lr=&id=9WXIAAAAQBAJ&oi=fnd&pg=PP1&dq=slave%2Bpoetry&ots=4kkkXgx3CX&sig=CDanGUUt4L3MYC0Jgsl-xOGFZLo#v=onepage&q=freedom&f=false.