Introduction and Question
Whether we notice it or not, whether we pay attention to it or not, the United States Congress has a huge impact on the day-to-day life of every American. Not only that, but because of the global position and power of the U.S., decisions made by Congress can also have enormous effects on international communities.
When I found the abundance of congressional hearings available at govinfo.gov, I knew that there was a lot of potential to explore those documents with digital text analysis, and I felt like it would be especially interesting to compare the same committees at different times. Congressional committees stay relatively similar in terms of membership even as a new Congress convenes, so hearings from one Congress and from another several years later would likely have many of the same people speaking or involved. To me, this meant that any differences in the language between the two Congresses would be due to some external circumstance rather than a change in individual voice. Similarly, most members of Congress are likely to use professional language, so changes in membership might not be relevant between the Congresses anyway.
What major external circumstance greatly impacted Congress in somewhat recent years? Of course there are many ways to answer this question, but the obvious one is the terrorist attacks on September 11th. This event had such a huge impact on Congress that it spurred the establishment of an entirely new committee: the Committee on Homeland Security. Not to mention the tremendous impacts this event had on privacy, security, and national identity in the United States overall.
I expected that 9/11 was a dramatic enough event to cause a change in the language used, perhaps across all policies and hearings as a whole, but I decided to focus on something more relevant to the event, and perhaps more important to more people: international relations, for which there is a specific committee in Congress. I wanted to explore how language changed in congressional hearings in regard to international relations before and after 9/11. My overall question is how the language changes, but some specific things I wanted to consider were: How does the language about terrorism change? How does language about the Middle East change? What about attitudes towards the Middle East and Islam?
As I mentioned, the September 11th attacks had huge impacts on the United States, including on acts of Congress. Other than the creation of the Committee on Homeland Security and the Department of Homeland Security, the passing of the USA PATRIOT Act was also a direct result of these attacks. This act essentially allowed law enforcement to use available tools and technologies to investigate crimes (this part is notable for its privacy concerns), created better information sharing between government agencies, and increased punishments for terrorist crimes. Sacrificing privacy for national security is definitely one of the lasting impacts of 9/11.
There are also many scholarly articles and studies that take a look at the various consequences of the attacks on 9/11. For example, in “Deference to the Executive in the United States After September 11: Congress, the Courts, and the Office of Legal Counsel,” Eric A. Posner takes a look at the theory of deference to the executive specifically after 9/11. The theory refers to the lowering of restrictions for executive power in times of crisis, which is what happened in response to 9/11, and in a way is what continues to happen in terms of situations like sacrificing privacy.
Authors Hutcheson, Domke, Billeaudeaux, and Garland actually also examined language in the aftermath of 9/11 in “U.S. National Identity, Political Elites, and a Patriotic Press Following September 11.” They studied the language of government and military officials as well as the press and their findings suggested that there was a heavy emphasis on American values and ideals while also creating an “enemy” of the U.S. Since these attitudes are reflected outwardly to the public after 9/11, I expect to find similar results in my text analysis.
In “Post-traumatic stress disorder following the September 11, 2001, terrorist attacks: A review of the literature among highly exposed populations,” authors Neria, DiGrande, and Adams review research on Post-Traumatic Stress Disorder in communities near the 9/11 attacks. Their findings indicate that there were higher rates of PTSD in the nearby areas, further showing one of the consequences of 9/11, a consequence that could potentially extend to politicians and members of Congress who were in the D.C. area during the attack on the Pentagon.
It’s clear that 9/11 had a tremendous impact on the United States, including the U.S. government, with the passing of various legislation and an increased presence in the Middle East, but it’s less obvious how that change occurred exactly and how newer mindsets are reflected in the language of Congress. I’m hoping that by taking a look at the change in language, something can be revealed about the international mindsets in Congress following 9/11. At the very least, I think that the language will verify what some people, including myself, expect of Congress in terms of this event: a harsher view on the Middle East and terrorism.
Corpus and Model
Another reason I decided to focus on the Committee on International Relations is that, although it might be more relevant to look at the Committee on Homeland Security, that committee didn’t exist before 9/11, so there would be no accurate basis of comparison. Therefore, I chose to look at the hearings from the Committee on International Relations during the 106th Congress (1999-2000) and the 109th Congress (2005-2006). I couldn’t choose to look at the 107th Congress (2001-2002) because many of those hearings would have occurred before 9/11, skewing the results. Unfortunately, I also could not look at the 108th Congress (2003-2004) because this committee either did not convene during that time or the hearings weren’t available through the website I was using.
Logistically, I went through an exceptional amount of copying and pasting to obtain my bodies of text. There were around 145 hearings (totaling approximately 2.5 million words) for the International Relations Committee in the 106th Congress and about 45 (approximately 1.5 million words) for the 109th Congress. The 106th hearings had plain text versions, but they were not downloadable, so I copied them manually into plain text files and removed excess information at the beginnings and ends of the documents (member names, “image not available in this format,” etc.). The 109th hearings were unfortunately only available as PDFs, which made the copying and pasting process a little less smooth. I removed the same information from those documents as well.
Another change I made in order to prep my documents for Word2Vec text analysis was to turn the two-word phrases that I wanted to analyze into essentially one word each so that they could be entered into the code. I did this for “Middle East” and “United States,” just adding an underscore between the words. It was actually during this process that I made my first observation: “Middle East” appeared significantly more in the 109th hearings despite this set having 100 fewer documents than the 106th hearings.
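The underscore step can be sketched in a few lines. This is only an illustration in Python, not the R code from the class; the function name `join_phrases` and the sample sentence are made up, and lowercasing the joined token is an assumption based on the query forms (united_states, middle_east) used later.

```python
import re

# The two phrases named in the essay; extend the list as needed.
PHRASES = ["Middle East", "United States"]

def join_phrases(text, phrases=PHRASES):
    """Replace each multi-word phrase with one underscore-joined, lowercased token."""
    for phrase in phrases:
        words = phrase.split()
        # Allow any whitespace (including a line break) between the words.
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
        token = "_".join(w.lower() for w in words)
        text = re.sub(pattern, token, text, flags=re.IGNORECASE)
    return text

# Hypothetical sample text with a line break inside one phrase.
sample = "Peace in the Middle East matters to the United\nStates."
joined = join_phrases(sample)
print(joined)
```

Matching on any run of whitespace matters for text copied out of PDFs, where a phrase can be split across lines.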
After that, I deemed my corpora ready to be trained. Using the R code provided through the Literature and Digital Diversity class, the files are read, combined into one file, cleaned up so they can be processed easily, and then “trained” by the code. I’m not a computer scientist, but from what I understand, the code looks through the cleaned file and assigns a vector to each word based on how the word is used in context, what the surrounding words are, etc. Then, when working with this model, the vectors for the words are compared to each other. So, when the user enters a word, or a combination of words, the code will output the closest words.
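To make “closest words” a bit more concrete, here is a minimal Python sketch of the lookup step only, assuming the usual cosine-similarity comparison between vectors. It does not train anything and is not the class’s R code; the three-dimensional vectors and the names `VECTORS`, `cosine`, and `closest` are invented for illustration, whereas a real model learns vectors with many more dimensions from the text.

```python
import math

# Toy vectors, invented for illustration only.
VECTORS = {
    "war": [0.9, 0.1, 0.0],
    "conflict": [0.8, 0.2, 0.1],
    "peace": [0.1, 0.9, 0.1],
    "treaty": [0.2, 0.8, 0.2],
    "budget": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity: near 1.0 for vectors pointing the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def closest(word, vectors=VECTORS, n=3):
    """Return the n words whose vectors are most similar to the query word."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

nearest = closest("war")
for other, score in nearest:
    print(other, round(score, 3))
```

With these made-up numbers, “conflict” comes out as the nearest word to “war” because their vectors point in almost the same direction.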
The queries I tried in order to determine associated words were: Islam, Muslim, terrorism, terrorist, terror, war, conflict, attack, United States (united_states), Middle East (middle_east), Iraq, Arab, aid, protect, security, kill, peace, violence, tragedy, war – cold, war – ii, war + Middle East, and pray – Christianity + Islam. The last query, a set of three words, is meant to get at an analogy: Christianity is to pray as Islam is to blank. I wanted to determine what Congress perceived as the most common activity of the religion. Because these are a lot of queries and not all of them yielded significant or interesting findings, I’ll just touch on the ones that did.
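The combined queries work by adding and subtracting word vectors before looking for the closest words. Here is a small Python sketch of that idea for “war – cold,” under the same caveats as before: the two-dimensional vectors are invented, the helper names `combine` and `closest_to` are hypothetical, and this is not the R code from the class.

```python
import math

# Toy vectors, invented for illustration only.
# Axis 0 loosely stands for "armed fighting", axis 1 for "Cold War era".
VECTORS = {
    "war": [1.0, 1.0],
    "cold": [0.0, 1.0],
    "soviet": [0.1, 0.9],
    "combat": [1.0, 0.1],
    "troops": [0.9, 0.2],
}

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def combine(vectors, plus, minus=()):
    """Add the vectors of the `plus` words and subtract those of the `minus` words."""
    dims = len(next(iter(vectors.values())))
    total = [0.0] * dims
    for word in plus:
        total = [t + v for t, v in zip(total, vectors[word])]
    for word in minus:
        total = [t - v for t, v in zip(total, vectors[word])]
    return total

def closest_to(vectors, target, exclude=(), n=3):
    """Rank words by similarity to an arbitrary target vector."""
    scored = [
        (word, cosine(target, vec))
        for word, vec in vectors.items()
        if word not in exclude
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

# "war - cold": move away from the Cold War sense of "war".
query = combine(VECTORS, plus=["war"], minus=["cold"])
results = closest_to(VECTORS, query, exclude={"war", "cold"})
print([word for word, _ in results])
```

An analogy query takes the same shape, for example `combine(vectors, plus=["pray", "islam"], minus=["christianity"])`, with the query words themselves excluded from the results.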
Note: I did try to run my text through the Word Counter tool because I wanted to see if Middle East truly did appear much more in the 109th hearings, along with any other possible differences, but unfortunately I kept getting an error on the page even when my file was under 10MB.
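A count like this could also be done with a few lines of code instead of a web tool. The Python sketch below is only an illustration: the helper names and the two sample strings are made up, and it assumes the phrases have already been joined with underscores. Because the two sets of hearings differ in size (roughly 2.5 million versus 1.5 million words), it reports a rate per million words rather than a raw count.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z_']+", text.lower())

def per_million(text, term):
    """Occurrences of `term` per million tokens of `text`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return Counter(tokens)[term] * 1_000_000 / len(tokens)

# Hypothetical stand-ins for the two cleaned corpus files.
corpus_a = "the middle_east was discussed once in this long hearing about trade"
corpus_b = "the middle_east and the middle_east again"

rate_a = per_million(corpus_a, "middle_east")
rate_b = per_million(corpus_b, "middle_east")
print(round(rate_a), round(rate_b))
```

Comparing rates rather than raw counts keeps the larger corpus from looking like it mentions a term more often simply because it has more words.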
If you look above, you can see the comparison of the 15 closest words to “Islam” in the two sets of hearings. These actually weren’t very different; words like “fundamentalist” and “extremism” appear in both, though there are more variations of “fundamentalism” in the list from the 109th hearings. The two most notable words for me were “stigmatized” in the 106th list and “murderous” in the 109th list. “Stigmatized” is interesting because it suggests that even in 1999, people had an understanding that Islam was unfairly misunderstood, even before 9/11 and the Islamophobia that followed. On the other hand, “murderous” appearing in the 109th list indicates that Congress lost some of that understanding.
The next interesting query was “terrorism,” shown above. The main thing to note here is that the 106th list features several drug-related words that don’t appear in the 109th list, perhaps indicating that drug-related terrorism was more of a concern before 9/11, whereas “Iran” and “nuclear” make it onto the 109th list. Another noteworthy addition to the 109th list is “transnational,” showing that international terrorism is more of a concern of Congress following 9/11, which makes sense.
Similarly, “terrorist” indicated somewhat of a shift in focus. While “osama” and “hezbollah” appear in the 106th list, the 109th list features three different spellings of “hezbollah” and has “jihadist” as the closest word. Other notable additions to the 109th list are “attacks,” “Alqaeda,” “Islamist,” and “Hamas.”
I don’t want to spend too long on these queries, but I just wanted to point out that for “terror” the 109th list continues to feature more words focused on Islamic extremism, in contrast to the 106th list which has more generic words.
The only thing I want to point out for the “attack” query is that “towers” and “terrorist” appear in the 109th list, likely evidence that the 9/11 attacks were still being referenced in 2005 and 2006.
My “war – cold” and “war – ii” queries didn’t turn up very interesting results, but something quick to point out in the “war + Middle East” combination is that, for the 109th list, “Iraq” is the closest word after the words in the query themselves, and the words “terror” and “terrorism” are included in that list as well.
I had expected queries like “United States” and “Middle East” to have interesting results, but the lists were pretty generic and similar between the two sets of text. My expectation was that after 9/11, the U.S. would be more associated with patriotism and security while the Middle East would have more negative associations, but they either didn’t or the closest words contextually are not the ones that would indicate that.
Unfortunately, my query of “pray – Christianity + Islam” outputted seemingly random lists for both sets of hearings so I don’t think I was able to get at the analogy I was interested in.
Overall, I think my queries did show that there was at least more of a focus on terrorism and the Middle East after the 9/11 attacks. I think there’s also some evidence for a harsher attitude towards the Middle East and Islam in addition to the shift in focus, particularly thinking about the word “murderous” appearing in the 109th hearings list for “Islam.”
I was definitely limited in not having something like the Committee on Homeland Security to compare to before the 9/11 attacks, as well as not having the Committee on International Relations hearings available for the years immediately following. However, it might be interesting to look at just Homeland Security and ask questions about the language used there, or maybe compare how that language has changed over time.
I think the Word2Vec process is a really cool and helpful tool for text analysis and it definitely has a lot of potential in the field of politics as a whole.