Implementation of Theme – Deion Smith

My Corpus

My corpus consists of the twenty most downloaded and viewed texts on Project Gutenberg over the last thirty days. The list is made up of texts that would be considered classic literature, including Frankenstein, Pride and Prejudice, and others just as widely known.

Training My Model

When training my model, I kept the original default parameters, because based on the queries I expected to make, nothing would particularly interfere with or skew the data. The only change I made was to delete the table of contents and other extra information at the beginning of each document, as that accounted for a decent number of words that might have affected the results.
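One way this kind of cleanup could be automated is sketched below in Python. It is only an illustration, not the procedure used for this corpus: the function name strip_front_matter and the sample text are hypothetical, and the sketch assumes each file contains a line beginning with "*** START OF", as Project Gutenberg plain-text files generally do. It removes only what precedes that marker, so a table of contents that appears after the marker would still need to be deleted separately.

```python
def strip_front_matter(text, start_marker="*** START OF"):
    """Return the text that follows the first line containing start_marker.

    Assumption: the marker line separates the extra information from the book.
    If the marker is absent, the text is returned unchanged.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if start_marker in line:
            # Keep everything after the marker line, dropping leading blank lines.
            return "\n".join(lines[i + 1:]).lstrip("\n")
    return text


# Hypothetical miniature document used only to exercise the function.
sample = (
    "Title: Example\n"
    "Contents\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "\n"
    "Chapter 1\n"
    "It was a dark night.\n"
)
cleaned = strip_front_matter(sample)
print(cleaned)
```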

Research Question

I am interested in how themes implement one another, for example romance and death. These are typically viewed as opposites, yet both are beautiful in literature. With one you gain the passion and happiness that come with adding someone you care about very much to your life, and with the other you lose someone you care about very much. Because the two are so far apart, one would expect the diction associated with romance and the diction associated with death and sadness to be mutually exclusive. I intend to use word2vec and its ‘closest to’ function to determine whether different themes are at all intertwined. To do this I will look at words such as “love” and “pain” and the words most closely associated with them. From these queries I may then conclude whether it is common among some of the most popular classic texts for romance and death to be used to express one another. I felt that my corpus and this question fit well together because the corpus is a mix of different themes and genres, which gives my queries some diversity. I focus on the themes of romance and death so that I have a defined set of queries that can be compared.
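To make the idea of a ‘closest to’ query concrete, here is a minimal Python sketch of how such a ranking can work, assuming the neighbors are ranked by cosine similarity between word vectors (the usual choice for word2vec). The two-dimensional vectors and the function name closest_words are hypothetical and chosen only for illustration; they are not taken from my trained model, which has many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_words(vectors, query, n=3):
    """Rank every word in `vectors` by similarity to `query`, highest first."""
    target = vectors[query]
    scored = [(word, cosine_similarity(target, vec)) for word, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical toy vectors, NOT values from the trained model.
toy_vectors = {
    "love": [1.0, 0.0],
    "loved": [0.9, 0.1],
    "mourn": [0.6, 0.8],
    "efficiency": [0.0, 1.0],
}
neighbors = closest_words(toy_vectors, "love", n=3)
for word, score in neighbors:
    print(word, round(score, 4))
```

As in the query results in this report, the queried word itself comes back first with a similarity of 1, since every vector is identical to itself.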


Words Surrounding the Theme of Romance

(Bolded are queries I found interesting)

    word         similarity to “love”
1   love         1.0000000
2   devotedly    0.6692666
3   loved        0.6486187
4   adored       0.6438752
5   consistent   0.6263078
6   loves        0.6229203
7   mourn        0.6166894
8   adorn        0.6119925
9   despise      0.6119083
10  dearly       0.6056409


    word         similarity to “tenderness”
1   tenderness   1.0000000
2   sorrow       0.6442990
3   sympathy     0.6225607
4   innate       0.6061488
5   blight       0.6014306
6   compassion   0.5941850
7   audacity     0.5831740
8   fervour      0.5830718
9   penitence    0.5827857
10  sense        0.5769508


    word         similarity to “fondness”
1   fondness     1.0000000
2   atone        0.6594858
3   devotion     0.6205520
4   stimulated   0.6118879
5   avowal       0.5951034
6   threats      0.5932011
7   labors       0.5919635
8   nightmares   0.5896481
9   efficiency   0.5877966
10  lust         0.5722211


These queries returned nothing that was surprising. There were words like “sorrow,” “mourn,” and “despise,” as well as words like “lust” and “devotion.” So the results were either words supporting the idea that the theme of romance implements another, darker theme, or words nearly synonymous with the ones queried. There were even past-tense words such as “adored” and “loved,” which also imply a darker theme: to have cared for something so deeply and apparently not anymore.

Queries Surrounding the Theme of Death

(Bolded are queries I found interesting)

    word            similarity to “pain”
1   pain            1.0000000
2   reproach        0.6462628
3   despair         0.6454567
4   ire             0.6345604
5   avowal          0.6339717
6   pleasure        0.6308229
7   excess          0.6272061
8   antipathy       0.6248724
9   inexpressible   0.6244705
10  passion         0.6208383


    word        similarity to “suffer”
1   suffer      1.0000000
2   e’en        0.6311585
3   commands    0.6093823
4   governed    0.6010654
5   acquit      0.5831211
6   teasing     0.5805825
7   avenged     0.5777616
8   disobey     0.5741206
9   tease       0.5683390
10  befitting   0.5678391


    word            similarity to “agony”
1   agony           1.0000000
2   excess          0.6310823
3   languor         0.6223186
4   despair         0.6191462
5   inconceivable   0.6131747
6   anguish         0.6011885
7   turmoil         0.5914349
8   pulses          0.5830113
9   stimulated      0.5782920
10  impotent        0.5715420


    word            similarity to “sadness”
1   sadness         1.0000000
2   despondency     0.6724228
3   ire             0.6337179
4   turmoil         0.6266054
5   bitterness      0.6227777
6   horror          0.6182509
7   melody          0.6173900
8   inexpressible   0.6005863
9   thrilled        0.5913425
10  anguish         0.5768453


The queries surrounding death and pain returned much of the same, except that the words “pleasure” and “passion” are apparently related to the word “pain,” which once again supports the idea of multiple themes. The word “sadness” is even closely related to the words “horror” and “thrilled,” which is another interesting combination of themes. Like the themes of romance and death, “sadness” and “thrilled” may be considered polar opposites, so it is quite interesting to think about the contexts in which they are used that would give them some type of correlation.


In my first query, for the word “love,” words 7 and 9 stand out to me very much. “Mourn” and “despise” are not words that would typically be associated with romance, especially “mourn,” which is typically used to express a character's sadness after someone's passing. Another oddity I noticed is the words “loved” and “adored,” which are both past tense and are typically used in a way that suggests either that the person or thing they describe no longer exists or that some event caused those feelings to become past tense. On the opposite side of the spectrum, querying the word “pain” also returns words associated with romance, “pleasure” and “passion,” which once again supports the conclusion that among these texts the themes of death and romance are used to support one another. Despite this, my other queries turned up little to nothing. The words listed are mostly those that would be expected, such as “anguish” with “agony,” or are other mismatches such as “suffer” and “avenged.” According to a scholarly article on theme hierarchy, the phenomenon I am researching seems to be relatively commonplace among all types of literature. One thing I noticed about my queries, without intending to, is that romance seems to implement more death and sadness than vice versa. This is something I could research further by creating separate corpora for the different genres and making queries similar to the ones I made on this corpus.
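One simple way to compare the two sets of queries is to check which neighbor words appear under both themes. The Python sketch below does this with the neighbor lists reported above (with the query words themselves left out). The variable names and the helper all_words are hypothetical, and the check only finds exact word matches, so it is a rough illustration rather than a measure of how strongly the themes are related.

```python
# Neighbor words copied from the query results above, query words excluded.
romance_neighbors = {
    "love": ["devotedly", "loved", "adored", "consistent", "loves",
             "mourn", "adorn", "despise", "dearly"],
    "tenderness": ["sorrow", "sympathy", "innate", "blight", "compassion",
                   "audacity", "fervour", "penitence", "sense"],
    "fondness": ["atone", "devotion", "stimulated", "avowal", "threats",
                 "labors", "nightmares", "efficiency", "lust"],
}
death_neighbors = {
    "pain": ["reproach", "despair", "ire", "avowal", "pleasure",
             "excess", "antipathy", "inexpressible", "passion"],
    "suffer": ["e'en", "commands", "governed", "acquit", "teasing",
               "avenged", "disobey", "tease", "befitting"],
    "agony": ["excess", "languor", "despair", "inconceivable", "anguish",
              "turmoil", "pulses", "stimulated", "impotent"],
    "sadness": ["despondency", "ire", "turmoil", "bitterness", "horror",
                "melody", "inexpressible", "thrilled", "anguish"],
}


def all_words(neighbor_lists):
    """Collect every neighbor word for one theme into a single set."""
    return {word for words in neighbor_lists.values() for word in words}


# Words returned by at least one romance query AND at least one death query.
shared = sorted(all_words(romance_neighbors) & all_words(death_neighbors))
print(shared)  # ['avowal', 'stimulated']
```

Only two words are shared outright, which fits the observation that most of the overlap between the themes comes from related words such as “mourn” or “passion” rather than from identical ones.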

Generally a book as a whole will have one overarching theme but also one or more other themes implemented throughout. Judging by my queries, I find this to be true. Although not every word my queries returned belonged to a different theme, there were enough to support the idea that themes such as romance and death can implement one another. The two themes are not expected to be consistently one and the same, so even the five to ten words I identified show that there is some kind of correlation between them and, apparently, some benefit to implementing more than one theme.