Introduction
For my first project using word vectors, I chose to look into the works of James Joyce. Now you may be asking: why James Joyce? Did he even write enough words to get reliable answers? What kind of meaningful data can you even get from just one author? I promise the answers are coming up ahead. Starting simple, I took inspiration for this project from my dad. He’s from Ballycastle, Northern Ireland, and he studied English at university. Despite my siblings and me all being born in England and our mom being English, my dad has always maintained that we are one hundred percent Irish, even our dogs. My dad has always had a fascination with Irish literature, so when I needed a research topic that I cared about, this came to mind. My research question started out centering on transitional binaries in the works of Joyce, and on the difference between the way transitional binaries and static binaries are used. However, upon tinkering with the model, I found it difficult to get any interesting or meaningful data out of searches of that nature, so I had to try something else. This led me to a new research topic: the correlation between life and death and their connection with the soul. I used the model to look into how Joyce presents the human soul in life versus how he presents the soul in death. In other words, I wanted to see how Joyce perceives the value of the soul in life, what it means, and what it brings to the table, and how that changes in death.
The Model
A vector space model essentially takes all of the words in a file and maps them into a multi-dimensional space (usually with far more than three dimensions). Each word gets its own vector, and the model can then tell which words sit closest together. My vector space model is trained on all of James Joyce’s published works. I did not include his posthumous works simply because they were not available to me as plain text files or otherwise. Thus, my corpus consists of Ulysses, A Portrait of the Artist as a Young Man, Dubliners, Exiles, Pomes Penyeach, Chamber Music, and Finnegans Wake. It is a total of 579,120 words, with 9,951 unique words. As a rule of thumb, when creating a vector space model you should aim for over one million words; 500,000 words is the bare minimum if you want any meaningful data at all. I suppose I was fortunate to find an author with works as gigantic as Ulysses (219,281 words) and Finnegans Wake (207,903 words), because otherwise I would have had to find something else to ramble on about 🙂
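For readers curious how corpus figures like "total words" and "unique words" can be obtained, here is a minimal base R sketch. The sentence, the tokenising rule, and the variable names are all illustrative assumptions; this is not necessarily how the counts above were produced, and a real model may also drop rare words before counting its vocabulary.

```r
# A made-up sentence standing in for a corpus (not a quotation from Joyce).
text <- "The soul of the city and the life of the city"

# Lowercase the text and split on anything that is not a letter or apostrophe.
tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))

# Drop any empty strings left over from the split.
tokens <- tokens[nzchar(tokens)]

total_words  <- length(tokens)          # every token counts
unique_words <- length(unique(tokens))  # each distinct word counts once

cat("Total words: ", total_words, "\n")
cat("Unique words:", unique_words, "\n")
```

On a real corpus the same two counts would be taken over the combined text of every file.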
“Such a confusion is inevitable in a culture that has pretty much lost sight of the distinction between body and soul, matter and spirit, garment and man; because the core of Joyce’s values is the simple conservative, essentially Christian idea that the spiritual, rather than the material is a man’s real important condition.” (196, Mason)

One of the most interesting binaries in Joyce’s works is the comparison between the opposite concepts of life and death. To Joyce death is not the end. As Mathew Gallman said in “Life and Death in Joyce’s Dubliners”, a thesis presented to the Graduate School of Clemson University, “Indeed, Joyce does not treat death as the end; instead, the dead tend to linger in the minds of the living and constantly interrupt life while exposing and instilling a paralysis in the living.” (2, Gallman) In my opinion, that lingering after death has to do with the lingering of a soul, the true essence of a person that remains even after their body has perished. My journey into this topic began with two separate “nearest to” searches, one for the word life and one for the word death. A “nearest to” search prompts the model to return the words closest to the given word in the vector space, each with a similarity score. A score of one means the word is identical to the search term, and scores closer to zero mean the words are less related.
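To make the idea of a “nearest to” search more concrete, here is a small base R sketch that ranks words by cosine similarity, a common way of measuring closeness in word vector models. Everything in it is an assumption for illustration: the three-dimensional vectors are made up (real models use many more dimensions), and `cosine_similarity` and `nearest_to` are hypothetical helpers, not functions from the package used for the searches in this post.

```r
# Toy 3-dimensional vectors; these numbers are invented for illustration
# and are NOT taken from the Joyce model.
vectors <- rbind(
  life  = c(0.9, 0.1, 0.3),
  soul  = c(0.8, 0.2, 0.4),
  death = c(0.1, 0.9, 0.2),
  tram  = c(-0.2, 0.1, -0.9)
)

# Cosine similarity: the dot product divided by the product of the lengths.
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Hypothetical helper: score every word against `target`, highest first.
nearest_to <- function(vectors, target, n = 3) {
  scores <- apply(vectors, 1, cosine_similarity, b = vectors[target, ])
  head(sort(scores, decreasing = TRUE), n)
}

result <- nearest_to(vectors, "life")
print(result)
```

With these toy numbers the search word itself comes first with a score of one, which is why life tops its own list in the real output too.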
w2vModel %>% closest_to('life', 30)
    word            similarity to "life"
1   life            1.0000000
2   soul            0.5997167
3   own             0.5420109
4   death           0.5404661
5   cares           0.5356172
6   sentenced       0.5264777
7   understanding   0.5185928
8   spirit          0.5122515
9   awakened        0.5111644
10  revolted        0.5077303
11  denied          0.5065702
12  substance       0.5046731
13  grievous        0.4900404
14  human           0.4897041
15  theories        0.4866965
w2vModel %>% closest_to('death', 30)
    word            similarity to "death"
1   death           1.0000000
2   life            0.5404661
3   judgement       0.5097822
4   birth           0.4857920
5   illness         0.4785076
6   soul            0.4752783
7   agenbite        0.4627504
8   neglected       0.4567509
9   died            0.4433012
10  sorrow          0.4388093
11  ghost           0.4373190
12  mother’s        0.4352769
13  suicide         0.4335094
14  awakened        0.4168518
15  spirit          0.4149271
What I found most interesting about these searches was how high soul appears in both of the lists. It is the word most similar to life (apart from life itself), as you might expect. It is number six for death, which I found intriguing. When you think about death you don’t necessarily think about its connection to the soul; instead it brings to mind the ending. From even this first search, one can start to see that Joyce doesn’t necessarily view death as the absolute end, and that for him it is still connected to the human soul.

Before we continue, I find it important to provide a note about the data. Normally, words are considered strongly correlated if their similarity is above 0.50. However, in the case of death I have chosen to look at rankings instead, because there is less data on the word: in his works Joyce generally chooses to reference death through euphemism rather than naming it directly. He seems to prefer to focus on the effect of death on the living (note that life is the top match for death) and on the way the living experience a death, and that revolves around euphemism. But as soul scores about 0.47, which places it sixth among the words nearest to death, I chose to look further into it and give it a chance!

After my initial realization about how life and death correlate to the soul, I wanted to see how those correlations are both similar and different, and what that could reveal about James Joyce. At first I looked at the space between life and soul and the space between death and soul in the vector space model, but the results were essentially uninteresting and meaningless, so I had to shift gears, hopefully better than when my dad tried to teach me to drive stick shift. In my next attempt to compare the soul through life and death, I decided to use the closest-to-two-things feature in RStudio. This means that I asked the vector space model to find the words closest to two words at the same time. This provided more meaningful and interesting data, which I chose to analyze.
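One way to picture a search for the words closest to two words at once is to add the two word vectors together and then compare every word against that sum. That is my working assumption about what happens under the hood, not something I have confirmed. The base R sketch below uses invented three-dimensional vectors and a hypothetical `cosine_similarity` helper.

```r
# Toy vectors, invented for illustration (NOT values from the Joyce model).
vectors <- rbind(
  life  = c(0.9, 0.1, 0.3),
  soul  = c(0.8, 0.2, 0.4),
  death = c(0.1, 0.9, 0.2),
  body  = c(0.7, 0.3, 0.5)
)

# Cosine similarity: the dot product divided by the product of the lengths.
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Assumption: "closest to two things" compares each word to the SUM
# of the two search vectors.
combined <- vectors["life", ] + vectors["soul", ]

scores <- sort(apply(vectors, 1, cosine_similarity, b = combined),
               decreasing = TRUE)
print(scores)
```

Under this reading, neither search word matches the combined vector perfectly, so both score a little below one while still landing at the top of the list.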
w2vModel %>% closest_to(~'life'+'soul',20)
    word            similarity to "life" + "soul"
1   soul            0.9270663
2   life            0.8559748
3   body            0.6319674
4   consciousness   0.6123463
5   grievous        0.5959201
6   loveless        0.5890365
7   understanding   0.5747402
8   spirit          0.5736897
9   swoon           0.5729068
10  incertitude     0.5718540
11  own             0.5702436
12  death           0.5602845
13  heart           0.5565311
14  ecstasy         0.5517841
15  sinful          0.5398547
16  cares           0.5375182
17  reality         0.5327690
18  holiness        0.5304537
19  revolted        0.5267839
20  mysterious      0.5207918
w2vModel %>% closest_to(~'death'+'soul',20)
    word            similarity to "death" + "soul"
1   soul            0.8619716
2   death           0.8557148
3   life            0.6641184
4   loveless        0.6090423
5   judgement       0.5559238
6   incertitude     0.5530433
7   body            0.5468722
8   spirit          0.5428332
9   loneliness      0.5301504
10  swoon           0.5276921
11  consciousness   0.5208418
12  ecstasy         0.5144898
13  agony           0.5129551
14  grievous        0.5095165
15  heart           0.5072009
16  understanding   0.5011298
17  neglected       0.4988037
18  sloth           0.4906147
19  sorrow          0.4892728
20  awakened        0.4878881

The first thing to notice about this data is that the words surrounding life and the soul and those surrounding death and the soul are not as different and opposite as might have been expected. Notably, both lists include the words loveless, consciousness, grievous, understanding, and body. Some of these words are easy to understand in their connection to life, death, and the soul. For example, body makes sense in that the soul inhabits the body in life and leaves the body in death. Additionally, the word consciousness seems natural, as the soul is often thought of as a person’s consciousness, which is essentially a person’s mind and personality. The more compelling similarities are loveless and grievous. At first glance, second glance even, they don’t seem to make sense in either of the two results. However, upon reflection it is possible to see how they’re connected to both pairings. You can live loveless or grievous, and you can die loveless or grievous. I find it interesting that these words are connected to the soul. Together they represent the soul as something that is affected by the conditions of your life, and when you die those conditions come with you. The soul seems to be something that is molded by the situations a person finds themselves in and is then carried with them after the end.
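Spotting the overlap by eye is easy to get wrong, so here is a short base R sketch that compares the two result lists directly. The word lists are copied from the two outputs above; the variable names are my own, chosen for illustration.

```r
# Top 20 words for "life" + "soul", copied from the output above.
life_soul <- c("soul", "life", "body", "consciousness", "grievous",
               "loveless", "understanding", "spirit", "swoon", "incertitude",
               "own", "death", "heart", "ecstasy", "sinful",
               "cares", "reality", "holiness", "revolted", "mysterious")

# Top 20 words for "death" + "soul", copied from the output above.
death_soul <- c("soul", "death", "life", "loveless", "judgement",
                "incertitude", "body", "spirit", "loneliness", "swoon",
                "consciousness", "ecstasy", "agony", "grievous", "heart",
                "understanding", "neglected", "sloth", "sorrow", "awakened")

shared     <- intersect(life_soul, death_soul)  # words in both lists
only_life  <- setdiff(life_soul, death_soul)    # words only near life + soul
only_death <- setdiff(death_soul, life_soul)    # words only near death + soul

cat("Shared:    ", shared, "\n")
cat("Only life: ", only_life, "\n")
cat("Only death:", only_death, "\n")
```

Thirteen of the twenty words turn up in both lists, which backs up the point that the two pairings are less opposite than expected. The remaining seven words on each side are the ones that set life and death apart.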
Although there are many intriguing similarities that I could talk about all day, there are still differences to be discussed. The soul in life comes through as emotional, with high-ranking words such as understanding, swoon, incertitude, and (the most telling word) heart, which reinforces the idea of the soul as the personality in life. Words connected to death and the soul, however, seem more final and more about the ending of life, as would be expected. But what I found interesting was how the words were mostly negative. I was raised Catholic, as was James Joyce, and the thing the Catholic Church really hangs on to is that when you die, if you’re a good person, your soul goes to heaven. So it’s interesting that in this context James Joyce rebels against the church, choosing instead to describe the soul in death as being in an uncertain, unhappy, and unpleasant state. This is illustrated through such words as incertitude, agony, loneliness, and neglected.

What I take away from this research is how Joyce uses the soul in his works and, in terms of the bigger picture, what he views as the role of the soul in both life and death. This research illustrates that Joyce uses the soul in the context of how it feels in life, and in terms of it lingering and hurting in death. He does not view death as the absolute end; instead there is a transition into death in which the dead are still very much affecting and looming over the living. Joyce uses the soul as the essence of human emotion and personality. It is closely connected to the body and life, but in death it has no certain path to follow.