(Codebook is the third sheet on the bottom of the file.)
Opening Remarks
This critical introduction functions as a preliminary document on the goals, aims, and potentials (both possibilities and challenges) associated with my research dataset, tentatively titled “Binge-Watching, Audience(s), & The Body.” In this initial section I will provide a descriptive overview of, and commentary on, the current state of the dataset and its corresponding data and metadata. Further, I will provide an overview of my data collection process and the advantages and disadvantages of using Twitter without the API functionality. Before I delve into these areas, I will provide a brief set of comments on my interest in television studies and audience participation analyses and on the motivations undergirding the dataset.
Comments on Field of Interest
Television and seriality studies are generally neglected within the field of Literary Studies. Often these areas of inquiry are relegated to the domains of communication studies or film and media scholarship more broadly. There is, I argue, a productive yet untapped overlap that can be discerned between these seemingly disparate fields through the use of literary and digital humanities methodological practices in the study of serial formalism and audience habituation. This dataset is specifically oriented around an under-discussed and generally untheorized element of literary and television scholarship: binge-watching/reading. In this dataset, I seek to address a potential domain of interest for future scholars of interdisciplinary serial formalism through the collection of tweets that discuss bingeing in relationship to popular contemporary serial programs. This work is, as it stands, in its preliminary stages and is but a proof-of-concept for future research on binge-watching/reading.
Section I
Dataset Description & Collection Procedures
In this section I outline the dataset and its various metadata fields. Each metadata field is given its own entry with brief remarks on its overall functionality and utility in the dataset. I will also provide notes on my use of Twitter’s in-browser UI and search function for research purposes. As a provisional comment: during my time working on this project, I did not have access to Twitter’s API and therefore acquired data via in-website search functions. Although this limits the scope of this project and the number of tweets that can be collected, I also believe it shows that this data does not need to be collected through the API. That is, the data can be acquired by hand in an effective (albeit quantitatively limited) manner. An upshot of this “by-hand” collection method is that the researcher (myself, in this instance) is required to read the tweets and can thereby generate insights through the active process of reading and collecting.
The dataset is a collection of 111 Tweets collected using Twitter’s search functionality. The parameters used for this collection process are relatively simple but provided me with ample data. To establish the principles of inclusion and exclusion for the dataset I organized the content of the Twitter search feature around two elements: key terms and hashtags. For the dataset, I isolated two terms that would be input into the search bar on Twitter’s UI. The two terms were: binge and bingeing. Several of my colleagues voiced their concern over the use of the terms binge and bingeing to isolate the specific phenomenon of “binge watching”. Specifically, how would the use of these two terms prevent the inclusion of non-television-related discussions of binge(ing) occurring on Twitter; that is, bingeing in relationship to, for example, excess food consumption? This is an excellent question and the guiding logic undergirding the second element of the search function: hashtags. To isolate television series generating ample conversation on social media, I used the review aggregation website Rotten Tomatoes to determine the most popular programs across cable and streaming services. Using the most popular programs as a baseline, I then used a random number generator to isolate the four series (and their corresponding hashtags) for the search function. The four selected programs were Peacemaker, Bridgerton, Euphoria, and Yellowjackets. Peacemaker and Euphoria are weekly programs released on HBO and HBO Max. That is, the two series were simultaneously released every week on HBO’s premium cable service and its online streaming platform. Viewers thereby had the option to watch the programs weekly (as they were released) or to wait until the season had ended to “binge” the program. Yellowjackets, released on Showtime and its corresponding streaming platform, offered a release and viewing cycle similar to that of Peacemaker and Euphoria.
Bridgerton differs from the three other programs as it was released through Netflix and its entire season was “dropped” (or released) all at once. The four hashtags used, then, were #Peacemaker, #Euphoria, #Bridgerton, and #Yellowjackets. The two key terms and the four hashtags allowed for several variations in searchability. An example of the language used in the Twitter search function can now be broken into these different variations.
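The variations can also be enumerated mechanically. The following is a minimal sketch that pairs each key term with each hashtag. The exact string typed into the search bar (the term followed by a space and the hashtag) is an assumption made for illustration, and the function name build_queries is hypothetical; it is not a record of the queries as I entered them.

```python
from itertools import product

# The two key terms and four hashtags named in this section.
KEY_TERMS = ["binge", "bingeing"]
HASHTAGS = ["#Peacemaker", "#Euphoria", "#Bridgerton", "#Yellowjackets"]


def build_queries(terms, hashtags):
    """Return every term/hashtag pairing as one search string.

    Assumption: a query is written as "<term> <hashtag>".
    """
    return [f"{term} {tag}" for term, tag in product(terms, hashtags)]


queries = build_queries(KEY_TERMS, HASHTAGS)
for query in queries:
    print(query)
```

Two terms and four hashtags yield eight pairings, which gives a future researcher a fixed checklist of searches to repeat.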
After each corresponding value was input into Twitter’s search bar, I sorted the results using the “Latest” tab as opposed to the “Top” tab. The decision between “Latest” and “Top” was an attempt to use a clearly reproducible option for future researchers. As it stands, the analytics, metrics, and data that determine what makes a tweet “Top” are deeply unclear. The “Latest” tab is more easily reproducible as its undergirding logic is clearly explicable: newer tweets that meet the outlined search criteria are located towards the top of the screen and, as one scrolls down the list of tweets, a researcher will encounter progressively older tweets. Although a researcher using the search criteria outlined above will first encounter newer tweets, simply due to the temporal delay between their project and my own, the researcher will still be able to find the tweets that I used in the same order that I used them. There is, then, a modicum of reproducibility and accountability in the methodological practices detailed above. In the following section, I will detail the metadata fields used in this dataset and the corresponding data housed within.
Metadata
Assigned Number: Assigned Number indicates the number assigned to a particular tweet. This number is given to the tweet as it is input into the dataset. Number 33, as an example, is the 33rd tweet collected in my process. Researchers looking to share data may easily select a large portion of tweets based on the numbers associated with them. If researchers wanted to use a random number generator to randomly select tweets from the dataset for a sentiment analysis, researchers are easily able to accomplish this task using the number field.
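As a minimal sketch of that random selection, the example below draws a sample of assigned numbers without repetition. The sample size of 20 and the seed value are arbitrary choices for illustration, and the function name sample_assigned_numbers is hypothetical.

```python
import random

DATASET_SIZE = 111  # number of tweets reported for this dataset


def sample_assigned_numbers(sample_size, seed=None):
    """Draw distinct assigned numbers from 1..DATASET_SIZE.

    Passing a seed makes the draw repeatable, so that another
    researcher can recover the same subset of tweets.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, DATASET_SIZE + 1), sample_size))


# Assumption: 20 tweets and seed 2022 are illustrative values only.
selected = sample_assigned_numbers(20, seed=2022)
print(selected)
```

Recording the seed alongside the sample is one way to keep the subset reproducible when the data is shared.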
Tweet: The tweet field contains the given tweet selected using the parameters described in the previous section. These tweets contain either the term “binge” or “bingeing” and use the hashtag associated with one of the selected programs. A limitation of this dataset more broadly is that, if a Tweet is quoting or responding to a prior tweet, this medium-specific nuance is lost.
Series: As discussed in more detail above, the series metadata field is populated by one of four programs depending on the associated Tweet: Peacemaker, Bridgerton, Yellowjackets, or Euphoria. Researchers can isolate specific programs and their corresponding Tweets or, for example, compare sentiment in Tweets across two HBO programs, such as Peacemaker and Euphoria.
Hashtags: The “hashtags” metadata field includes a value for every hashtag in a corresponding tweet that is not the hashtag for the particular series. For example, if a tweet includes a series of hashtags such as #Peacemaker #JohnCena #JamesGunn #Awesome, the “hashtags” value would identify three unique hashtags within the tweet. #Peacemaker is the identifier hashtag whereas the other three hashtags are additional information. Researchers can use the information in this field to isolate the series that garner the most audience participation and that include the most unique hashtags within the tweet. The hashtags used in certain tweets provide a great deal of information regarding audience interest.
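The logic of this field can be sketched in a few lines. The example below is an illustration only: the values in the dataset were entered by hand, the function name additional_hashtags is hypothetical, the case-insensitive comparison is my assumption, and the sentence wrapped around the hashtags is invented for the example.

```python
import re

# A hashtag is treated here as "#" followed by letters, digits, or underscores.
HASHTAG_PATTERN = re.compile(r"#\w+")


def additional_hashtags(tweet_text, series_hashtag):
    """Return the unique hashtags in a tweet, excluding the series hashtag.

    Assumption: hashtags are compared case-insensitively, so that
    "#peacemaker" and "#Peacemaker" count as the same identifier.
    """
    unique = []
    seen = {series_hashtag.lower()}
    for tag in HASHTAG_PATTERN.findall(tweet_text):
        if tag.lower() not in seen:
            seen.add(tag.lower())
            unique.append(tag)
    return unique


# Invented example sentence; the hashtags are those named above.
example = "Time to binge #Peacemaker #JohnCena #JamesGunn #Awesome"
extra = additional_hashtags(example, "#Peacemaker")
print(len(extra), extra)
```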
Date: The date for the given tweet.
Time: The time at which the given tweet was posted on Twitter. There are quite a few interesting avenues for future research that can be opened by using this value. For example, are viewers in the throes of the binge more likely to post at exceptionally late hours? Are those in a binge willing to forgo sleep for narrative closure? What is the relationship between sleep, sickness, and binge watching? These are but a few questions that this type of information, when isolated against tweets and series, can provide.
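A first pass at the late-hours question might look like the sketch below. It rests on several assumptions: that times are stored as 24-hour “HH:MM” strings, that “exceptionally late” means midnight up to 5 a.m., and that the sample times are invented for illustration rather than drawn from the dataset. Times read from the browser may also reflect the collector’s time zone rather than the poster’s, which would need to be checked before drawing conclusions.

```python
from datetime import datetime

# Assumption: "late" is defined as 00:00 up to (but not including) 05:00.
LATE_START_HOUR = 0
LATE_END_HOUR = 5


def is_late_night(time_string):
    """Return True if a 24-hour "HH:MM" time falls in the late window."""
    posted = datetime.strptime(time_string, "%H:%M")
    return LATE_START_HOUR <= posted.hour < LATE_END_HOUR


def late_night_share(time_strings):
    """Return the fraction of times that fall in the late window."""
    if not time_strings:
        return 0.0
    late = sum(1 for t in time_strings if is_late_night(t))
    return late / len(time_strings)


# Hypothetical posting times, not taken from the dataset.
sample_times = ["02:15", "13:40", "23:59", "04:59"]
share = late_night_share(sample_times)
print(share)
```

Grouping the same calculation by the Series field would allow the late-hours share to be compared across programs.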
Twitter User Handle: As of right now, this metadata field provides the Twitter user handle for the associated tweet. If this dataset were to be made publicly accessible, there is a question as to whether I would anonymize the data. My immediate response is yes, the data should be made anonymous. But one of my metadata fields, Source Link, is a link to the tweet, and that link includes the user handle. This is still an open question, but I may anonymize the data for use on the spreadsheet.
ReTweet Count and “Like” Count: The ReTweet count field and the “Like” count field provide popularity metrics. Some questions that these metadata fields may provoke: How often were particular tweets retweeted, and which series were generating the most conversation?
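One possible approach, offered only as a sketch, is to replace each handle with a keyed hash so that tweets by the same user remain linkable without exposing the handle. The key value and the function name pseudonymize_handle are hypothetical. This is pseudonymization rather than full anonymization: the tweet text and the Source Link field could still lead back to the user, so those fields would need separate treatment.

```python
import hashlib
import hmac

# Assumption: a private key kept out of the shared spreadsheet.
SECRET_KEY = b"replace-with-a-private-key"


def pseudonymize_handle(handle, key=SECRET_KEY):
    """Map a Twitter handle to a stable pseudonym.

    The handle is normalized (leading "@" removed, lowercased) so that
    "@Example" and "example" receive the same pseudonym.
    """
    normalized = handle.lstrip("@").lower().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    return "user_" + digest[:12]


# Hypothetical handle, not taken from the dataset.
print(pseudonymize_handle("@ExampleViewer"))
```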
Media Type (GIF, Video, Photo): The media type metadata field allows for three possibilities: GIF, Video, or Photo. If the tweet includes any media, the media type field will provide the corresponding type of media.
Media Link: This metadata field includes a link to the media type identified in the previous Media Type metadata field.
Source Link: This metadata field includes a link to the corresponding Tweet.
Section II
Curatorial Process, Affordances, Limitations, and Questions
The dataset was initially conceptualized as an inquiry into the relationship between bingeing practices and audiences’ relationships to their bodies (prior to, in the midst of, and after a binge). While these sets of questions motivate my own research, I felt that these lines of inquiry would limit the practical feasibility of the dataset and, further, would limit the types of questions that future researchers would be able to ask. I decided, then, to broaden the scope of the dataset so that it could address my questions and also questions that I could not possibly foresee. The broadscale metadata fields within the dataset are mostly lifted directly from the Twitter API. If this project were to be scaled up and include some form of automation, the Twitter API would be the groundwork for this later dataset. I hewed as closely to the Twitter API as was feasible without having direct access to it. As I mentioned above, this project was conducted by hand, without access to the API or to software such as Python. The metadata fields Tweet, Date, Time, User Handle, ReTweet, and Like are all included in the Twitter API scraping process. Alongside these metadata fields, as I have already sketched above, I included: Series, Hashtags, Media Type and Link, and Source. These metadata fields provide researchers with additional information specific to television viewing.
The dataset was designed precisely because there were no other academic datasets of its type that addressed binge watching and television practices. This dataset is intended to be used by humanities researchers and, in my case, literary scholars. Although other datasets use Twitter’s API information, those datasets were not designed around any of the particular questions that motivate this project, specifically binge watching, television, and the language and discourse of the body. I hope, even in its initial stages, that this dataset provides potential lines of inquiry for literary scholars and digital humanists interested in new media studies. Ideally, as this dataset increases in scope and complexity, I hope to add letters, tweets, and additional documentation that move beyond the scope of the American television landscape and incorporate books, early print serials, and multinational televisual programs. This data, I argue, can and should be studied to analyze audience participation trends across media types and, in the scope of my own research, the formal elements and devices that contribute to binge watching/reading.
The data within this dataset is in conversation with a variety of fields including, but not limited to: television studies, media studies, participation studies, sentiment analysis, literary formalist analysis, and seriality studies. These are but some of the different fields that this dataset touches upon, but it is not inconceivable to imagine this dataset in conversation with researchers in communication studies.
I have addressed elsewhere in this document the affordances of this dataset but for clarity I will reiterate some of those points here. This dataset allows researchers to ask striking questions on why viewers binge watch programs. What are the mechanisms and formal elements that lead viewers to binge? How do audiences participate with serial programming before, during, and after a binge? Are binges singular or do they occur in multiples (that is, do most viewers binge a single program or multiple programs)? What is the relationship between bingeing and sickness?
That final question, on the relationship between binge watching habits and sickness, was one of the more fascinating directions of inquiry that revealed itself during the collection process. During the data collection process, I read over one hundred tweets and filtered and sorted them into the various metadata fields discussed above and one of the recurring conversational points was binge watching and illness. Many tweets discussed a peculiar enjoyment in binge watching that seemed to nullify, transcend, or distract from chronic illnesses, COVID, and so on. And yet, a growing body of research details a fatigue associated with binge watching practices. There is an untapped and undiscussed relationship between illness and bodily fatigue that this dataset and specifically its collection process made evident.
Above, I have provided two Voyant visualizations using TermsBerry and Cirrus. TermsBerry shows the relations between frequently used terms within the dataset. If one were to hover over “Binge”, for example, one would see the terms that most readily appear alongside it. Cirrus is a word cloud that graphically displays word frequency within the dataset. Finally, the bottom image shows an example of correlations between terms. The most interesting correlation appears between season and time. The two terms are negatively correlated with a high degree of significance (0.0009). In a future post I may further explore this negative relationship as it is of theoretical interest to my ongoing projects.
The majority of datasets that deal with audience repletion, binge watching habits, and Twitter are found on non-academic data sharing websites. Below I have offered examples excerpted from my previous Abstract post:
Twitter Dataset - #AvengersEndgame
The Avengers Endgame Twitter dataset focuses on a collection of tweets that were posted immediately after the release of Marvel’s Avengers Endgame. Tweets that contained the hashtag #AvengersEndgame were scraped and added to the dataset. The dataset is time-stamped and contains the number of retweets and favorites.
The Apple Twitter Sentiment database follows much of the same basic structure as the Avengers Endgame tweet dataset but adds a sentiment metric. The researchers have not indicated the valuation of the 1-3 labels, nor have they provided a codebook to explain how sentiment (positive, neutral, or negative) is determined. Yet, the attempt to assign a sentiment value is interesting. I believe, however, that for my dataset it would be redundant to assign sentiments to the binge-watching phenomenon, as most Twitter users discussing bingeing-as-habit do so because they actively binge-watch programs.
Finally, the Game of Thrones S8 twitter dataset is a compilation of tweets produced immediately after the airing of individual episodes during the release of Game of Thrones Season 8. Again, this is a rather simple dataset. It is primarily an aggregate of Tweets regarding a particular topic.
While these datasets are interesting in their own rights, they do not exactly address the questions that I am looking to explore. I consider this dataset unique because of its research-oriented approach to a specific audience practice and formal aesthetic. This is also why my dataset is not, unlike those discussed above, oriented around a single program but around multiple programs. This may produce particular limits insofar as the data being collected is a narrow slice of the broader conversation around the programs of interest, but this is a necessary limitation. While I in no way want to argue that this dataset is ideal, it is a preliminary step for academic engagement with new media formalisms and audience practices. Further, it provides an initial trajectory for questions on narratology, the body, and the binge.
Section III
Reflections
The dataset, tentatively titled “Binge-Watching, Audience(s), & The Body”, is conceptualized as one of the first data-driven projects on binge watching as an audience practice and narrative formalism. The concept of binge watching as an audience practice is generally clear and accepted but binge watching as a narrative formalism is less clear. For the purposes of this critical introduction, binge watching as a narrative formalism can be conceptualized as a narrative constructed around gaps and lacunae in narrative closure that propel audiences to search in the next installment for the closure they seek. This is a tentative definition with several flaws, but it is ultimately a move to portray the binge as not simply something that audiences do, but something that narrative compels audiences to do through its very structuration.
The more clearly delineated goal of this dataset is to compile a collection of tweets from Twitter users prior to, in the midst of, or coming out of a binge of televisual programs. An immediate difficulty for the dataset outlined above is the attempt to link audience participation to narrative formalism. I contend that it is through the impact of binge watching on the body that we may begin to outline these potential crossroads. Yet, it is the goal of future research to explore these relationships.
Two immediate difficulties with this dataset are the use of Twitter and the specific programs that are represented therein. First, Twitter is a limited platform, but it is the most practical option available to researchers because of its accessible API. Twitter data is also readily amenable to favored programming tools such as Python. Yet, conversations around television and reading are occurring across a number of platforms (Reddit, TikTok, Facebook, etc.). Twitter is simply not the only or even the best platform to access the type of information that I am interested in collecting, but it is one of the easiest. In scaling up this project, I would be interested in looking to other social media platforms for information on binge watching and audience participation.
Second, the programs I have selected using Rotten Tomatoes and a random number generator represent only a small portion of cable and streaming television offerings, and they are predominantly American- and European-centric series. Further, these programs are offered on subscription-based platforms (HBO, Showtime, and Netflix). There are a number of cable and streaming programs that have been neglected in this selection process. The programs in this dataset were selected due to their popularity. Yet, this popularity was determined based on Rotten Tomatoes, a review aggregation site, which is in no way representative of actual popularity. Other metrics must be used to determine popularity, but streaming platforms and premium cable services are often opaque in doling out their numbers. This is an open difficulty for those interested in new media.
I want to briefly return to a point I made above. The programs that I have selected are predominantly American and European television series. Ideally, in scaling this project up, I hope to embrace the fifth principle of Data Feminism. As Catherine D’Ignazio and Lauren Klein state, “Data feminism insists that the most complete knowledge comes from synthesizing multiple perspectives, with priority given to local, indigenous, and experiential ways of knowing.” I agree with this sentiment and would like to add two comments: the emphasis on the body that has driven part of this research comes from an attempt to embrace “experiential ways of knowing” and, further, an incredible limitation of this research is the lack of non-American/European programs. Let me address the first comment: The specific knowing motivating this research, simply put, is narrative. How and why does the body know narrative? Why is the body imbricated in the movements of the serial dialectic of narrative openness and closure? On the second comment: I hope to expand this research and work through audience habits and narrative formalism that stretch beyond the American and European framework. This requires further resources and an expansion of those working on, and interested in, pursuing this type of research. The labor required in simply reading Tweets, Reddit posts, and so on, as well as watching non-American/European programs, is immense. Ideally, this would be accomplished with a team of committed researchers but ultimately this stretches beyond the current scope of this project.
Works Cited
“Apple Twitter Sentiment - Dataset by Crowdflower.” Data.world, 21 Nov. 2016, https://data.world/crowdflower/apple-twitter-sentiment.
D’Ignazio, Catherine, and Lauren Klein. “5. Unicorns, Janitors, Ninjas, Wizards, and Rock Stars · Data Feminism.” Data Feminism, PubPub, 16 Mar. 2020, https://data-feminism.mitpress.mit.edu/pub/2wu7aft8/release/3.
Lima, Francisco de Abreu e. “Game of Thrones S8 (Twitter).” Kaggle, 13 June 2019, https://www.kaggle.com/datasets/monogenea/game-of-thrones-twitter.
Lolayekar, Kavita. “Twitter Dataset- #Avengersendgame.” Kaggle, 23 Apr. 2019, https://www.kaggle.com/datasets/kavita5/twitter-dataset-avengersendgame.