As a New Yorker, being a student in Boston often means compromising on many sports loyalties. Luckily I’m not a big enough sports fan to really mind, and after the World Series, I decided to join in on the fun and attend the Red Sox World Series Parade. I quickly noticed that all the Red Sox fans seemed to have one unifying mantra: ‘F**k the Yankees!’ I was surprised because 1) the Red Sox had just beaten the Dodgers, not the Yankees, and 2) Yankees fans don’t take the rivalry nearly as seriously. If anything, the Red Sox just feel less relevant in New York than the Yankees feel in Boston. I decided to back up this claim by analyzing the subreddits r/NYYankees and r/redsox to answer the question: which fans mention the other team more often, and what is the discourse surrounding these mentions? I also addressed an additional question: do Red Sox or Yankees fans reference past players more often? I ask this with regard to a stereotype that Yankees fans depend on the past glory of the team rather than its present achievements. I also ask, what is the discourse surrounding discussions of Babe Ruth? Ruth played for the Red Sox before he played for the Yankees and brought “The Curse of the Bambino” upon the Red Sox, so there should be some difference in the way he is discussed. I thought Reddit forums would be the best place to do this form of analysis, because comments are usually longer, and Reddit discourse tends to be extremely opinionated due to the anonymous nature of the site. I used RStudio to analyze word vectors that show up in these forums, which I will explain below.
How I Got My Corpuses
I used redditextractor code to pull enough text from the r/NYYankees and r/redsox threads. The Yankees corpus is about 3 million words from 700 days, or about 2 years. The Red Sox corpus is about 6.4 million words from 1000 days, or a little over 2 ½ years. The Yankees corpus has fewer words because the server was particularly slow, so I drew fewer days and only 300 comments per thread (as opposed to 500 for the Red Sox threads) to make sure the code would run. Additionally, there is a second Yankees subreddit, r/yankees, that draws a good number of fans away from the official r/NYYankees subreddit. This also contributes to the disparity in corpus size.
A Brief Explanation of Word Vectors
I analyzed the corpuses for word vectors by using word2vec through wordVectors, an R package created by Northeastern University professor Ben Schmidt. This tool analyzes large corpuses, the smallest workable size being around 800 thousand words, for commonly occurring words and the words that appear in their context. It can find ‘clusters,’ or groups of words that are used often and found together. This is enormously helpful when examining common trends and themes that run throughout an entire corpus. It can also be used to focus on specific words and dig deeper, examining the relationship between two or more words and the word neighborhood of a specific word.
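To make the idea of a "word neighborhood" concrete, here is a minimal sketch in Python (not the R tooling used for this project). It assumes each word has already been turned into a vector and ranks neighbors by cosine similarity. The three-dimensional vectors and the names `TOY_VECTORS`, `cosine_similarity`, and `closest_to` are invented purely for illustration; real word2vec vectors are learned from the corpus and have far more dimensions.

```python
import math

# Hand-made toy vectors (an assumption for illustration, not corpus output).
TOY_VECTORS = {
    "ruth":    [0.9, 0.1, 0.0],
    "mantle":  [0.8, 0.2, 0.1],
    "sausage": [0.0, 0.1, 0.9],
    "chowder": [0.1, 0.0, 0.8],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def closest_to(word, vectors, n=3):
    """Return the n words whose vectors point most nearly the same way as `word`."""
    target = vectors[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word  # a word is trivially closest to itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

for neighbor, score in closest_to("ruth", TOY_VECTORS):
    print(f"{neighbor}: {score:.3f}")
```

In this toy space the player names sit near each other and the food words sit near each other, which is the same kind of grouping a 'cluster' describes.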
The first thing I did was use a simple word count tool to find the number of times ‘Yankee’ appears in the Red Sox corpus and the number of times ‘Red Sox’ appears in the Yankees corpus. Yankees fans mention ‘sox’ 4,288 times, ‘red sock’ 17 times, and ‘sawk’ 8 times, for a total of 4,313 mentions. Red Sox fans mention ‘yankee’ 11,390 times and ‘yanks’ 932 times, for a total of 12,322 mentions. When I accounted for the disparity in size of the corpuses (the Red Sox corpus is a little over twice as large as the Yankees corpus), Red Sox fans mentioned the Yankees about 1.36 times as often as Yankees fans mentioned the Red Sox. This preliminary search seems to indicate that Red Sox fans care more about the Yankees than Yankees fans do about the Red Sox. Next, I began to analyze common word clusters within each corpus.
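The size adjustment can be sketched as a mentions-per-million-words comparison. This Python snippet uses the mention counts reported above, but only the rounded corpus sizes ("about 3 million" and "about 6.4 million" words), so it lands near 1.34 rather than exactly on the 1.36 reported. The exact word counts are not given, so that gap is presumably rounding. The variable and function names are my own.

```python
# Rounded corpus sizes from the text (assumption: the exact sizes differ slightly).
YANKEES_CORPUS_WORDS = 3_000_000
RED_SOX_CORPUS_WORDS = 6_400_000

# Mention counts as reported above.
sox_mentions_in_yankees = 4288 + 17 + 8      # 'sox' + 'red sock' + 'sawk'
yankee_mentions_in_red_sox = 11390 + 932     # 'yankee' + 'yanks'

def per_million(count, corpus_words):
    """Normalize a raw count to mentions per million words."""
    return count / corpus_words * 1_000_000

yankees_rate = per_million(sox_mentions_in_yankees, YANKEES_CORPUS_WORDS)
red_sox_rate = per_million(yankee_mentions_in_red_sox, RED_SOX_CORPUS_WORDS)
ratio = red_sox_rate / yankees_rate

print(f"r/NYYankees mentions of the Red Sox per million words: {yankees_rate:.0f}")
print(f"r/redsox mentions of the Yankees per million words: {red_sox_rate:.0f}")
print(f"Red Sox fans mention the rival {ratio:.2f} times as often")
```

Comparing rates rather than raw totals matters here: the raw Red Sox total is nearly three times the Yankees total, but most of that gap comes from the Red Sox corpus simply being larger.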
Common Word Clusters
Now that my preliminary question had been answered, I decided to look into word clusters that pop up within each corpus. I started off with r/NYYankees and found several expected clusters. These discussed injuries, steroid use, and coaches and managers paired with experience. I was not surprised to find these groups or others similar to them, but a few more interesting clusters showed up, as well.
The first cluster is entirely made up of the names of retired Yankees players (headed by ‘ruth’ and ‘babe’), which means there must be a common thread of discourse around former players as expected.
Two of the clusters also contained the word ‘racism,’ with one including ‘rich’ and ‘assholes.’ I thought this could be an interesting thread to follow and proceeded with it.
In Depth Research
The Terms ‘Boston’ and ‘Sox’
I investigated words similar to ‘Boston’ and some interesting results were ‘beating,’ ‘underdogs,’ ‘massholes,’ and ‘cheaters.’ I then investigated ‘sox’ and found some light similarity to ‘traitor,’ ‘f**k,’ and ‘massholes.’
I then decided to look up the analogy between ‘Boston’ and ‘masshole’ and see what Yankees fans would similarly correlate with ‘york’ or ‘yankee’ (just ‘york’ because using two words messes up the function).
I got these results:
I then looked up which words are closest to ‘york’ and not to ‘boston’ with the space between two things function:
These results were fairly interesting, because they suggest that to Yankees fans, ‘masshole’ is to ‘boston’ what ‘majestic’ is to ‘york.’ Looking up words that are similar to ‘york’ and not ‘boston’ also turned up ‘majestic’ and ‘authentic’ again. I found this pretty funny because it appears to be in line with a stereotypical New York attitude of superiority. If you look up ‘new yorker’ on urbandictionary.com, the top definition is ‘According to them they are better than you, me, and anyone else who is not from New York City.’ While this is not true of all New Yorkers, it is certainly a prevailing stereotype that this particular analogy humorously supports. I was also interested to find that ‘trolls’ was listed among the words that are similar to ‘boston’ and not to ‘york.’ ‘Trolls’ are people who post provocative or offensive messages to spur heated debate on internet message boards.
I then looked up ‘masshole’ to ‘boston’ with ‘yankee.’ Apparently ‘banana’ is one of the top results, which made me doubt how accurate this tool is, but gave me another good laugh.
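For readers curious how an analogy or a "close to this word but not that one" search can work, here is a rough Python sketch of the usual vector arithmetic: subtract one word's vector, add another's, and look for the nearest remaining word. It is not the R function I called, and the vectors below are hand-built so that the toy analogy resolves to 'majestic' and the contrast searches echo the results described above. They are not derived from either corpus, and the function names are hypothetical.

```python
import math

# Hand-built toy vectors (assumption): [New York-ness, Boston-ness, descriptor-ness]
TOY_VECTORS = {
    "york":      [1.0, 0.0, 0.0],
    "boston":    [0.0, 1.0, 0.0],
    "masshole":  [0.0, 1.0, 1.0],
    "majestic":  [1.0, 0.0, 1.0],
    "authentic": [0.9, 0.0, 0.4],
    "trolls":    [0.0, 0.9, 0.3],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def nearest(target, vectors, exclude, n):
    """Rank every word not in `exclude` by cosine similarity to `target`."""
    scored = [(w, cosine(target, v)) for w, v in vectors.items() if w not in exclude]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

def analogy(a, b, c, vectors, n=1):
    """'a is to b as ? is to c': search near vec(a) - vec(b) + vec(c)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    return nearest(target, vectors, {a, b, c}, n)

def closer_to_than(word, other, vectors, n=2):
    """Words near `word` but far from `other`: search near vec(word) - vec(other)."""
    target = [x - y for x, y in zip(vectors[word], vectors[other])]
    return nearest(target, vectors, {word, other}, n)

print(analogy("masshole", "boston", "york", TOY_VECTORS))
print(closer_to_than("york", "boston", TOY_VECTORS))
print(closer_to_than("boston", "york", TOY_VECTORS))
```

The input words are excluded from the candidates because they would otherwise tend to dominate the results. With real, noisy vectors the top hits can be odd, which is one plausible reason something like 'banana' can surface.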
I then moved on to looking up words similar to ‘racism,’ and found that ‘boston’ was included in the list. I looked up words closest to ‘boston’ + ‘racism’ and found ‘massholes,’ ‘cheaters,’ and ‘asians,’ which also showed up in the racism search. I decided to go directly into my corpus and found that when racism is mentioned, it often involves accusing Boston of racism. Here are a few quotes I found:
|“Cool. Only the bind defenders diagree yup You know you’re right. He’s had so many headlines accusing people of being racist and denying racism at Fenway I ended up just associating him with racism.”|
|“F**K THE RACISTS AND THE RED SOX Massholes”|
|“Seriously our r/baseball roast thread was about championships and legendary players. The Red Sox roast was Pablo Sandoval and the fact that their city and fans are racist Massholes.”|
|“You claim it’s a place for racism and when I say show me some should be easy considering how it’s all that we talk about according to you you can’t.”|
|“Doesn’t he HATE Boston and the people. Haven’t they been extremely racist towards him?”|
Many instances linked Boston with racism or involved defending New York from racism charges. Part of the fun of hating another team is painting its players and fans negatively, so it makes sense that fans of one team will accuse fans of the other of moral failings such as racism, particularly because Boston has a long-standing reputation for racism. Some Red Sox players are actively trying to combat this stereotype (https://www.bostonmagazine.com/news/2018/03/20/david-ortiz-boston-racist/), but it is a difficult reputation to shake.
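Going "directly into the corpus" amounts to a keyword search over the comments. A minimal Python sketch of that step is below, assuming the corpus is available as a list of comment strings. The function name `find_mentions` and the sample list are my own; two of the sample comments are quotes shown above and the third is a made-up filler line.

```python
import re

def find_mentions(comments, stem):
    """Return comments containing a word that starts with `stem`, ignoring case.

    Searching on a stem such as 'racis' catches 'racism', 'racist', and 'racists'.
    """
    pattern = re.compile(r"\b" + re.escape(stem) + r"\w*", re.IGNORECASE)
    return [comment for comment in comments if pattern.search(comment)]

sample_comments = [
    "F**K THE RACISTS AND THE RED SOX Massholes",
    "Doesn't he HATE Boston and the people. Haven't they been extremely racist towards him?",
    "Great game last night",  # invented filler comment with no match
]

for hit in find_mentions(sample_comments, "racis"):
    print(hit)
```

Reading the matching comments in full, rather than trusting the word vectors alone, is what shows who is being accused and who is defending.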
References to Famous Past Players
I then moved on to investigating mentions of famous past players, starting by counting the number of times a few famous past players were mentioned.
This does not seem like all that many mentions out of about 3 million words, but I continued with my analysis.
I started with words closest to ‘ruth.’ I found the years ‘1961’ and ‘1927’ mentioned, two of the years the Yankees won the World Series. Many other past players were mentioned as well, along with the terms ‘reincarnation’ and ‘clout.’ ‘Reincarnation’ is interesting because it suggests that there is a theme of wistfulness for the past and for retired legendary players, and ‘clout’ has been repurposed as an internet slang term referring to power, fame, and influence.
|“Bring back the Bambino!”|
|“Yes F**k the Red Sox! Sorry no way to make you look like an idiot here What prominent player would you absolutely not want on your fantasy team? yea exactly. drew was a gamble and a decent one at 5m… Who is literally the best player of all time? The sultan of swat! The king of crash! The colossus of clout! The colossus of clout! BABE RUTH!”|
|“Trout is basically the reincarnation of Mickey Mantle.”|
|“BRING THE YABKEES INTO THE LAKE OF REINCARNATION”|
|“we so obviously need to sign David Freese and hope he magically turns into Babe Ruth again in the postseason”|
When I looked up ‘reincarnation,’ there were actually only 5 mentions within the corpus, but these quotes seemed to reflect a wish for a return to older days. Some of these comments compared new players to old players or simply wished that the old roster could be reincarnated. The second comment uses Babe Ruth as an example of the Yankees’ greatness. It first insults the Red Sox and one of its players, then backs up this claim by referencing Babe Ruth and his past clout, a stereotypical defense used by Yankees fans.
Common Word Clusters
Turning to r/redsox, the first thing I noticed was several word clusters related to the Yankees and their players, often with negative terms such as ‘hate,’ ‘annoying,’ and ‘suck.’ I also noticed a cluster concerning racism, and wondered if this cluster would be related to the Yankees and New York similarly to the way it was related to Boston in r/NYYankees. Lastly, I saw a prevalence of clusters related to food items, which I wasn’t necessarily expecting and didn’t find in r/NYYankees.
In Depth Research
The Terms ‘York’ and ‘Yankee’
I searched words closest to ‘york’ and didn’t find anything negative, mostly references to other regions and to ‘transplants,’ which is often used to refer to people who have moved from one city to another. When I looked up ‘yankees,’ the results quickly became more negative. The results included ‘booing,’ ‘fairweather,’ ‘bandwagon,’ ‘obnoxious,’ ‘classless,’ ‘cheapie,’ ‘insufferable,’ ‘gloating,’ and ‘invading.’ These results are drastically more negative than those returned when I looked up ‘sox’ in the Yankees corpus. This could point to a heightened perception of rivalry among Red Sox fans.
I then looked up words closest to ‘yankee’ but not to ‘sox.’ This turned up more negative results, some highlights being ‘douche,’ ‘dumbasses,’ ‘spineless,’ and ‘entitlement.’ I tried to flip the search, but unfortunately neither ‘sox’ nor ‘red sox’ turned up logical data (‘sox’ is too nondescript, while ‘red sox’ is two words).
I looked up the analogy of ‘f**k’ to ‘yankee’ with ‘sox,’ and found one of the terms listed was ‘leggo.’ So the equivalent of “f**k Yankees” is “leggo Sox.”
I looked up words closest to ‘racism’ and found many words that were in the r/NYYankees search, such as ‘minorities,’ ‘attitude,’ and other words commonly paired with racism. I then looked up ‘racist,’ and one of the first results was ‘stereotype.’ This seemed to point towards a self-awareness of the stereotype of racism that Yankees fans point out, so I searched the corpus for ‘stereotype’ and looked at the surrounding entries:
|“why you tryna exemplify the racist red sox meme stereotype by invoking racist catchphrases against a player that’s been called the N word at fenway.”|
|“It should be noted that CAH [Cards Against Humanity] is a comedy game. People are going to make jokes and if you want to make a joke about a baseball team not liking minorities the Red Sox are generally going to be the brunt of that joke for integrating last. Nice”|
|“There is no doubt about the Red Sox racist history and how those old wounds were opened by shitty fans this year but that doesn’t represent the fan base and definitely doesn’t represent the game day experience. We loved Papi like we love Pedey we loved Mo Vaughn like we loved Wakefield. It’s a loud select few. I’ve never seen anything racist in my 26 years of going to Fenway. Don’t let stereotypes of a few shitty fans as there are everywhere misconstrue your idea of Sox nation”|
These quotes involve fans either reprimanding others for racist comments or attempting to counter the stereotype with acceptance. The majority of commenters seem to be aware of the stereotype and upset at its longevity. One of the comments mentions that the Red Sox were the last team to integrate (in 1959), which is part of the history that has led to this stereotype, along with the more recent actions of some of their fans. The last quote argues that this is not representative of the fanbase.
References to Famous Past Players
I next looked up the names of famous Red Sox players to see how often they were mentioned. Surprisingly, Yankees fans mentioned their past players only about 13% more often than Red Sox fans mentioned theirs (accounting for the difference in corpus size).
Additionally, r/redsox mentioned Babe Ruth 152 times, which is less than half as often as Yankees fans mentioned him (as expected). They mentioned ‘bambino,’ as in The Curse of the Bambino, 28 times.
When looking up words similar to ‘ruth,’ I found a mix of Red Sox and Yankees players returned. The high correlation with Ted Williams, the Red Sox’s best hitter of all time, was an interesting find. When looking up words similar to ‘bambino’ and ‘yankee,’ many emotionally charged words were returned, such as ‘childhood,’ ‘agony,’ ‘heart,’ and ‘heartbreak.’ References to the longevity of the curse were included as well, with ‘1918,’ ‘generations,’ and ‘grandfather.’
I looked up the equivalent of ‘ruth’ to ‘yankee’ for ‘sox.’ The highest results, as I suspected, were ‘williams’ and ‘ted.’ This seems to indicate that Red Sox fans talk about Ted Williams essentially as their team’s Babe Ruth.
I decided to go directly into the corpus to find mentions of the Curse of the Bambino.
|“I am going to assume 1919 when we sold our best contracts to NYY and the Curse of the Bambino began. Frazee fucked our team for quite awhile with that.”|
|“Ortiz meanwhile has become practically a sainted figure in Boston not only for his role in helping the Red Sox break the Curse of the Bambino and help win three championships”|
|“F**k the Bambino and everyone who looks like the Bambino we don’t need him no more.”|
Obviously there is still some resentment over the sale of Babe Ruth and the 86-year drought of World Series wins that ensued, but with the Red Sox’s recent success, many fans feel that the Curse is finally broken and that they no longer need Ruth.
Bonus Finding: Food References
I was interested to find that clusters about food items repeatedly popped up within r/redsox. Apparently Red Sox fans are particularly passionate about their food – I decided to do some fieldwork and texted a Red Sox fan friend to get his input:
Sure enough, ‘sausage’ is one of the listed food items in a food cluster. Fenway is known for its Italian sausages, which have captured the hearts and stomachs of many Red Sox fans.
|Hope you had a good one! go early soak in the atmosphere around the park before the game while you eat a sausage you bought|
|Outside Fenway park by gate C is a little stand called the sausage guy guaranteed best sausages around.|
|My go to Fenway diet is a sausage with peppers when I get there clam chowder in the 3rd and a papa Gino’s pizza in the 7th. You can throw in whatever drinks or snacks you like|
I found many supporting comments within r/redsox, where fans raved about the sausages. This indicates that local food is a strong part of Red Sox fan culture.
Looking into the Yankees/Red Sox rivalry, one of the most famous rivalries in sports history, led me to wonder where the animosity and aggression between sports fans comes from. I found a study by Harvard University scientists, “When outgroup negativity trumps ingroup positivity: Fans of the Boston Red Sox and New York Yankees place greater value on rival losses than own-team gains.” This study asserts that “pleasure from a powerful rival’s losses can outstrip that from gains of one’s own group (Studies 1–2), and these patterns extend into domains not immediately relevant to the competition (Studies 3–4)…Indeed, fans of the rival teams frequently valued outgroup losses more than ingroup gains, and this effect was particularly strong when one’s own team was behind in the rivalry.” Essentially, a Red Sox fan may be happier when the Yankees lose a game than when the Red Sox win a game, taking pleasure from the other team’s pain. This concept is called “schadenfreude,” and is found most often in “diehard” fans. This is what fuels rivalries; it gives fans a chance to take pleasure both in their team winning and in the other team losing. Winning a rivalry game is an intensely pleasurable experience for fans, which helps explain the longevity of the roughly century-old rivalry between the Yankees and Red Sox. This leads to the kinds of negative and sometimes crude discourse found on these Reddit forums and between fans in real life, because it’s fun to hate the other team.
So, do the Red Sox mention the Yankees more often than the Yankees mention the Red Sox? Yes, Red Sox fans do mention the Yankees more often, but not by as wide a margin as I might have thought: about 1.36 times as often once corpus size is accounted for. At first I thought the difference was much higher because I searched ‘red sox’ within r/NYYankees, but that left out the times people referred to them simply as ‘sox.’ I then looked up ‘sox,’ which ended up giving a more accurate number of mentions. I also included spellings such as ‘sawks’ and ‘yanks’ to be as accurate as possible.
What about the discourse surrounding these mentions?
I found that Red Sox fans are far more vitriolic in the way they speak about the Yankees and their fans than the other way around. When I searched words closest to “yankees” in r/redsox, I found nearly a dozen negative words, whereas when I searched “sox” in r/NYYankees, I only found about three negative words. The Red Sox went 86 years without a championship, a drought that fans tied to the sale of Babe Ruth to the Yankees and “The Curse of the Bambino.” Yankees fans used to chant “1918” during games, the year of the Red Sox’s last World Series win until the Curse was finally broken in 2004. This kind of history explains why Red Sox fans seem to hate the Yankees so much: the Yankees seemed to be directly responsible for an unsuccessful stretch that lasted more than eight decades.
Do the Yankees mention past players more often than the Red Sox do?
Yankees fans are often accused of riding on the coattails of their past success because they haven’t been doing as well in recent years. However, when I searched the names of five famous past players from both teams, Yankees fans only mentioned their players about 13% more often than Red Sox fans did. While this is still a notable difference, it is fairly small. This could simply be an indication that fans who frequent Reddit take pride in their team history and their own knowledge of it, and will discuss this history at similar levels amongst themselves.
How does each team discuss Babe Ruth?
Yankees fans discuss Babe Ruth with reverence and use his legacy as a defense of the Yankees’ greatness as a team. He’s discussed amongst other past players and remembered fondly and wistfully. Red Sox fans discuss Ruth with regard to the Curse of the Bambino, and discuss Ted Williams, their best hitter of all time, similarly to the way Babe Ruth is discussed in the context of the Yankees.
I also found that Yankees fans will often point out a reputation for racism amongst Red Sox fans and the city of Boston, and that Red Sox fans will often defend themselves within their own forum against these claims. Red Sox fans also seem to be avid foodies, and discuss local food more frequently than Yankees fans do.
A Few Words
The Red Sox and Yankees have one of the oldest and best-known rivalries in sports history, and it is an integral part of both teams’ fan cultures. As a New Yorker living in Boston, it’s been enlightening for me to experience and appreciate Red Sox culture and how it is intertwined with Boston culture and history. The Red Sox may mention the Yankees more often than the Yankees mention the Red Sox, but it is because they are finally experiencing success over the Yankees after many years of living in their shadow. The Yankees may mention past players more often, but that is because they have had the most successful team in the history of baseball, and that is something to be proud of. Both teams have rich histories and loyal fanbases, and I’m happy to have had the unique opportunity to witness both.
“New Yorker.” Urban Dictionary
Lehr, Steven A., et al. “When Outgroup Negativity Trumps Ingroup Positivity: Fans of the Boston Red Sox and New York Yankees Place Greater Value on Rival Losses than Own-Team Gains.” Group Processes & Intergroup Relations, 19 July 2017, journals.sagepub.com/doi/abs/10.1177/1368430217712834.
“Baseball Pitcher on the Mound in a Park.” The Effect of Athletic Participation on the Academic Achievement of High School Students – DRS, repository.library.northeastern.edu/files/neu:m040tn06r.
“Free Public Domain Images.” Free Images – Millions of Public Domain/cc0 Photos and Clipart, free-images.com/display/chris_carter_pawtucket_red.html.