Introduction
Film criticism emerged soon after film was introduced in the late nineteenth century. The first papers to include film criticism were The Optical Lantern and Cinematograph Journal, followed by the Bioscope in 1908. Today, much film criticism is found on websites such as IMDb, the Internet Movie Database.
For this project, I was looking through Kaggle and found a dataset of fifty thousand movie reviews, half of them positive and the other half negative. When I saw the title of the dataset, I immediately downloaded it because it seemed likely to yield useful insight into reactions to movies. The sentiment labels were not immediately apparent from the Kaggle post, so I was excited to see that I could sort the reviews into positive and negative ones. This let me create two separate text files and do a comparative analysis by training a different model on each. Unfortunately, the original dataset was too large to upload to RStudio and train a model on, so instead I chose about fifteen thousand positive reviews and fifteen thousand negative ones. I cannot tell whether this subsetting has skewed the results, but that is something I might test in the future. Before discussing the results from training and comparing these two models, it is important to explain what “training the model” actually means and what it helps us find out.
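As a rough illustration of the splitting and subsetting step described above, here is a minimal Python sketch (not the R workflow used in the project). The column names "review" and "sentiment", the helper name split_and_sample, and the tiny in-memory CSV are assumptions made for the example.

```python
import csv
import io
import random

def split_and_sample(rows, per_class, seed=0):
    """Group reviews by sentiment label, then draw an equal-sized random sample per label."""
    by_label = {}
    for row in rows:
        by_label.setdefault(row["sentiment"], []).append(row["review"])
    rng = random.Random(seed)  # fixed seed so the same subset can be drawn again
    return {
        label: rng.sample(texts, min(per_class, len(texts)))
        for label, texts in by_label.items()
    }

# Tiny stand-in for the Kaggle file; the column names are assumed, not taken from the source.
demo_csv = io.StringIO(
    "review,sentiment\n"
    "An entirely wonderful film,positive\n"
    "Extremely boring and unengaging,negative\n"
    "Incredibly heartwarming,positive\n"
    "Shockingly bad,negative\n"
)

samples = split_and_sample(csv.DictReader(demo_csv), per_class=1)
for label, texts in samples.items():
    print(label, len(texts))
    # Each sample could then be written to its own corpus file, for example:
    # with open(f"{label}_reviews.txt", "w", encoding="utf-8") as f:
    #     f.write("\n".join(texts))
```

Drawing the subset at random with a fixed seed, rather than taking the first rows, is one way to make it easier to check later whether the smaller sample skewed the results.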
Word Vector Model Training with the R Programming Language
RStudio lets us write code in the R programming language and load any libraries made for R. Libraries are packages that add extra capabilities, helping the user achieve their goal with less code and without doing the math by hand. Training a word vector model assigns each word in the corpus a numeric vector, so that words used in similar contexts end up with similar vectors. The code that I needed was already included in the assignment files provided by the professors; what I needed to do was upload the files containing the corpus of texts and reference them in the code. Using my corpus of movie reviews, I first placed the positive reviews in one .txt file and the negative ones in another. Once a corpus is uploaded, you can train the model by running a couple of lines of code and then use queries to analyze the corpus, for example by seeing which words are most similar to a given word within the corpus. I did this process twice for this project, since my goal was to compare the positive and negative reviews.
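To give a sense of what a "most similar words" query does, here is a minimal Python sketch that ranks words by cosine similarity. It is not the assignment's R code: the function name closest_to and the three-dimensional toy vectors are made up for illustration, whereas a trained model would supply vectors with many more dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def closest_to(vectors, word, n=3):
    """Return the n words whose vectors are most similar to the vector of `word`."""
    target = vectors[word]
    scores = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]

# Hypothetical vectors, invented for this example only.
toy_vectors = {
    "entirely": [0.9, 0.1, 0.0],
    "utterly": [0.8, 0.2, 0.1],
    "needlessly": [0.7, 0.1, 0.3],
    "popcorn": [0.0, 0.1, 0.9],
}

for word, score in closest_to(toy_vectors, "entirely"):
    print(f"{word}: {score:.3f}")
```

A higher score means the two words point in a more similar direction in the vector space, which is what the similarity scores in the result tables express.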
Research Question
As the title suggests, my research question for this project is “How is the Intensity of Language in Positive and Negative Movie Reviews Shaped Differently, especially in Different Movie Genres?”. After I found the dataset I wanted to work with, my first draft of a research question was “How do positive and negative movie reviews differ in terms of intensity in their language?”, but after querying both datasets and visiting office hours I decided to include genres in my queries, since they offer great insight into why a review has the intensity it has.
Result of Positive vs Negative Queries
First, I limited my queries to non-genre-specific ones to get a general overview of the difference between positive and negative reviews. I did this by seeing which words were most similar to the following adverbs: “entirely”, “extremely”, “incredibly”, “largely”, and “shockingly”.
Entirely
[Tables: words most similar to “entirely” in the negative reviews and in the positive reviews]
For the adverb “entirely”, the associated words differ markedly between positive and negative reviews. In negative reviews, “entirely” is associated with other adverbs that are mostly negative, such as “utterly”, “terribly” and “needlessly”. Using a text editor, I checked how many times the word “entirely” occurs in each corpus: 330 times in the negative reviews and 310 times in the positive reviews. A difference of twenty occurrences does not tell us as much as we would like. Relying solely on which words are similar to “entirely”, we can deduce that negative reviews associate it with many negative words, such as the adverbs mentioned above and “unnecessary”, while the positive reviews seem more moderate in tone.
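The occurrence counts above came from a text editor's search, but the same check can be scripted. The following is a minimal Python sketch; the function name count_word and the sample sentence are made up for illustration. It counts whole-word matches only and ignores case.

```python
import re

def count_word(text, word):
    """Count case-insensitive whole-word occurrences of `word` in `text`."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

# Hypothetical snippet standing in for one of the review corpora.
sample = "Entirely predictable. The plot was entirely unnecessary."
print(count_word(sample, "entirely"))
```

Matching on word boundaries keeps a search for a short word from also counting longer words that contain it, which a plain substring search in an editor may do.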
Extremely
[Tables: words most similar to “extremely” in the negative reviews and in the positive reviews]
Unlike the results for “entirely”, the results for “extremely” showed a much clearer contrast. First, the words associated with “extremely” in the negative reviews had a much more negative tone, and this time they were not limited to adverbs but included words commonly associated with bad movies, such as “boring” and “unengaging”. Also, the words associated with “extremely” in positive reviews had a much more positive tone that matches the intensity of the words in the negative reviews. These words include, but are not limited to, “amazingly”, “exceptionally”, “exceedingly”, “imaginative”, “intelligent” and “entertaining”. It is worth noting that “amazingly” and “exceedingly” also appear for the negative reviews, but with a lower similarity score than in the positive reviews, indicating that they are less closely associated with “extremely” there. As with “entirely”, the difference in how often the word occurs is negligible: 660 times in negative reviews and 656 times in positive reviews.
Incredibly
[Tables: words most similar to “incredibly” in the negative reviews and in the positive reviews]
I found the results for “incredibly” to be somewhat similar to those for “extremely”, except that the intensity in both kinds of reviews is even greater. The top words associated with “incredibly” in negative reviews, such as “insufferably”, “excruciatingly”, “irredeemably” and “uninspired”, are very strong and reflect deeply intense language, as do the top words in positive reviews, such as “amazingly”, “heartwarming” and “breathtakingly”. Unlike with the previous adverbs, the word count of “incredibly” in positive versus negative reviews can actually help us understand the intensity of the language. The word occurs 440 times in negative reviews and 305 times in positive reviews, a difference of 135. This difference, coupled with the intensity of the words in the negative review table above, suggests that negative reviews are on average more intense than positive reviews in their use of “incredibly”.
Largely
[Tables: words most similar to “largely” in the negative reviews and in the positive reviews]
In both sets of reviews, the word “largely” was used less to critique the movie itself than to describe its contents. This is especially true of the positive reviews, as shown by the mentions of churches and a guillotine. The negative reviews relate more to an actual critique of the movie, with words like “undistinguished”, “unimpressive” and “regrettable”. Occurrences of “largely” in positive and negative reviews differ by only 20, which does not tell us much about the differences between them.
Shockingly
[Tables: words most similar to “shockingly” in the negative reviews and in the positive reviews]
Finally, I found the results for “shockingly” to be very similar to the results for “incredibly”. Both have very intense words for negative and positive reviews, and there is no mix-up between them: positive reviews do not have any negative words and negative reviews do not have any positive words, although the positive reviews have less intense words discussing the actual movies. The word “shockingly” occurs 39 times in negative reviews and 20 times in positive reviews, resulting in a negligible difference of 19.
Result of Positive vs Negative Queries with Genres
To query genres in relation to intensity of language, I queried the words “extremely”, “very”, “shockingly”, “largely” and “incredibly” together with the specified genre. The genres I chose to focus on are comedy, action and romance, along with the social issues racism and sexism. The results were the following:
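[Tables: words most similar to the combined intensity and genre queries, for the negative reviews and for the positive reviews]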
To me, the most important results from both models are the genre-specific ones displayed above for negative and positive reviews. Surprisingly, including the genre in the query makes the analysis of the dataset much clearer and more informative. As you would expect, each genre has its own adjectives: comedy has “humorous”, romance has “heartwarming”, action has “tense”, and the social issues have adjectives of their own. One thing I found very interesting is that generic genres like comedy, action and romance are described on a holistic basis, with negative reviews mentioning “boring” and “unengaging” and positive reviews mentioning “entertaining” and “touching”. Movies focused on social issues, by contrast, are described based on how the issue is handled within the story; the words mentioned include “insincere” and “offensive” for negative reviews and “resilience” and “disturbing” for positive reviews. In terms of intensity, both the negative and positive reviews use very intense words, with opposing examples such as “atrociously” and “remarkably”.
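The combined queries in this section put several words into one query. One common way to do that, sketched below in Python, is to average the word vectors and then look for the words closest to that average. This is an assumption about how such a query can work, not a description of the assignment's R code; the function names and the toy vectors are invented for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def mean_vector(vectors, words):
    """Element-wise average of the vectors for the given words."""
    dims = len(vectors[words[0]])
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dims)]

def closest_to_query(vectors, query_words, n=2):
    """Rank words by similarity to the averaged query, skipping the query words themselves."""
    query = mean_vector(vectors, query_words)
    scores = [
        (word, cosine(query, vec))
        for word, vec in vectors.items()
        if word not in query_words
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]

# Hypothetical three-dimensional vectors, invented for this example only.
toy_vectors = {
    "extremely": [1.0, 0.0, 0.0],
    "comedy": [0.0, 1.0, 0.0],
    "humorous": [0.6, 0.8, 0.1],
    "tense": [0.6, 0.0, 0.8],
    "guillotine": [0.0, 0.1, 1.0],
}

for word, score in closest_to_query(toy_vectors, ["extremely", "comedy"]):
    print(f"{word}: {score:.3f}")
```

Averaging pulls the query toward words that are close to both the intensity adverb and the genre term, which is why genre-specific adjectives can surface in the results.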
Conclusion
In conclusion, I originally thought that the intensity of negative movie reviews would vastly outweigh that of positive reviews, but by using word vector analysis in the R programming language I have learned much more about the subject. Although with some queries the negative reviews seem to have more intense language, the results are not conclusive, as the difference is not large. The data that did turn out to be useful came from the genre queries, which showed how intensity was shaped in each genre. The three generic genres chosen, comedy, action and romance, had very predictable results. Comedy’s intensity focused on the humor and the quality of its presentation. Action’s intensity focused on how engaging, tense and well paced the movie is. Finally, romance’s intensity focused mainly on emotion and whether the movie was “touching” and “heartbreaking”. The two genres relating to social issues, on the other hand, share the same characteristics in their intensity of language but differ from the three generic genres. Reviews of movies tackling social issues mostly focus on how well the issue is presented in the movie, not on how entertaining the movie is. A word present in both is “resilience”, which refers to victims of racism and sexism respectively. Thus, in a general overview, positive and negative movie reviews have a very similar level of intensity in their language, but after diving deep into the genres of the movies it becomes evident that the intensity is shaped differently in each genre.
Sources
Battaglia, James. “Everyone’s a Critic: Film Criticism Through History and Into the Digital Age.” Digital Commons @Brockport, digitalcommons.brockport.edu/honors/32/.