PRAXIS: Analyzing Text in the Novel Klara and the Sun

Last semester, my colleagues and I read the novel Klara and the Sun by Kazuo Ishiguro. The novel is set in a world different from ours. Children have artificial friends (AFs), and most children go through a process called being ‘lifted,’ in which their intelligence is artificially engineered. The novel is narrated by Klara, an AF who is always learning so that she can be the best AF for her child, Josie.

For this praxis, I chose the tool Voyant. Choosing a text for this assignment took some thought, but I was thankful for how simple the tool is to use. I jumped right in and viewed the tutorial as needed. At first glance, I was surprised when I read the summary to see that words such as “I’m” and “said” are considered among the most frequent. Additionally, if someone were to analyze the text based only on frequent words, they would think the novel is about Josie and the mother. While the novel is partly about Josie, it is not about her relationship with her mom.
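To see why words like “I’m” and “said” rise to the top of a frequency summary, it can help to count words by hand. The following is a minimal Python sketch, not Voyant’s actual implementation; the function name top_words, the tokenizing pattern, and the sample sentence are my own assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical helper: counts word frequencies in a text, optionally
# skipping a caller-supplied set of stopwords. This is an illustration,
# not how Voyant computes its summary.
def top_words(text, stopwords=frozenset(), n=5):
    # Lowercase, then keep runs of letters, allowing one internal
    # apostrophe (straight or curly) so that "I'm" stays a single token.
    tokens = re.findall(r"[a-z]+(?:['’][a-z]+)?", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)

if __name__ == "__main__":
    sample = "Josie said I'm fine. Josie said yes."  # made-up sentence
    print(top_words(sample, n=3))
    # With dialogue words treated as stopwords, content words move up.
    print(top_words(sample, stopwords={"said", "i'm", "yes"}, n=3))
```

In a first-person novel with a lot of dialogue, tokens such as “said” and “I’m” occur on nearly every page, so a raw count will rank them highly unless they are filtered out.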

There are many themes identified for this novel. One of the book reviews I read identified the themes as life, love, and mortality. Based on notes and conversations with my colleagues, some themes we identified are curiosity, empathy, loneliness, courage, and religion. After reflecting on the themes, I settled on focusing on life, love, and mortality, as I felt they still connected to some of the other themes. As seen in the trend graph, I selected the words that I felt connected best with these themes. I was stunned that, given the themes and some of our major discussions, some of these words weren’t mentioned as much as others.

Klara is solar-powered, and was therefore taught that the sun provides nourishment, which is her form of life. Rick, Klara, and Josie’s mother have one thing in common: the love they have for Josie. As mentioned earlier, it is out of love, and in determining what is best for their children, that parents lifted their kids and bought them an AF.

Between the AFs and being lifted, it seems that the idea of mortality is almost nonexistent. Towards the end of the novel, Klara reflects and states, “Mr Capaldi believed there was nothing special inside Josie that couldn’t be continued. He told the Mother he’d searched and searched and found nothing like that. But I believe now he was searching in the wrong place. There was something very special, but it wasn’t inside Josie. It was inside those who loved her. That’s why I think now Mr Capaldi was wrong and I wouldn’t have succeeded.” The mother’s love is what leads her to not give up. It gave her the idea of trying to make Josie (her sick child) live longer, even if it wasn’t as a mortal.

To conclude, I understand why it has been expressed in our course that these tools do not always capture meaning. The most common words in this text do not fully reflect the story or themes of the novel. Using the Cirrus (word cloud) as a visual, you would suspect the novel is primarily about Josie and the mother, and barely about Klara. I do see the benefits of using text analysis to compare readings, although I would have some concerns about misinterpreting the information if I hadn’t already read the text. In terms of the tool, as mentioned, it was user-friendly. However, I admit I have more to learn, as I’m still trying to understand how best to gather certain details within the corpus. There were other tools, like TermsBerry and Links, that I had planned on using, though upon writing I realized they wouldn’t properly reflect what I wanted.

Topic Modeling Bernard Stiegler’s ‘The Neganthropocene’

For my praxis assignment, I chose to use Mallet to perform topic modeling on some texts by the philosopher Bernard Stiegler. The texts I used were essays from his 2018 collection The Neganthropocene. I chose these essays because, according to editor and translator Daniel Ross, they mark a shift in the philosopher’s thinking, and were published together before any full-length book during this period of his thought. Therefore, they are posited as representative of the state of his conceptual framework by the time of his death in August 2020. The question I wanted to pose was whether or not the topics modeled could provide us with something like a constellation of concepts important for understanding his philosophical project.

I only ran the topic models on two of the thirteen essays contained in the volume because I wanted to use these specific texts and they do not already exist as data anywhere on the internet. Because of the time-consuming nature of turning the essays into something Mallet could work with, I was only able to get to two. This means that the project was not particularly fruitful and was not able to produce something actually meaningful, but it was an excellent foray into text analysis and is something I would like to continue working on in the future. To make the essays into files that Mallet could read, I copied the text from the PDF versions of the essays in the book into a .txt file in TextEdit. This was easy enough in itself, but the practice that took the most time was dealing with the way the text was formatted as a PDF. There were 60+ instances in each essay where words were broken up by a line-break, and so I had to go through each essay and correct them. This was important because Mallet would read “memory” and “mem-ory” as two different words and would thus throw off the model. One important lesson from this was just how tedious and time-consuming the process of text mining can be.
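The manual correction of words split across line breaks could, in principle, be partly automated. Below is a minimal Python sketch under the assumption that the breaks appear in the copied text as a hyphen followed by a newline; the function name join_linebreak_hyphens is hypothetical. It would also wrongly merge genuine compounds that happen to break at their hyphen (for example “already-there”), so the output would still need a read-through.

```python
import re

# Matches a hyphen that ends a line in the middle of a word:
# a word character before it, optional spaces, a newline, optional
# indentation, then another word character.
_BROKEN_WORD = re.compile(r"(?<=\w)-[ \t]*\r?\n[ \t]*(?=\w)")

def join_linebreak_hyphens(text: str) -> str:
    """Rejoin words that a PDF copy split across lines with a hyphen.

    Assumption: every hyphen at the end of a line marks a broken word.
    Hyphens inside a line (e.g. "well-known") are left untouched.
    """
    return _BROKEN_WORD.sub("", text)

if __name__ == "__main__":
    sample = "the question of mem-\nory and well-known retentions"
    print(join_linebreak_hyphens(sample))
```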

Mallet is “a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.” I chose to use Mallet for my topic model because of its ease of use and because, given that I already have some experience with Python, I wanted to try something new. Mallet does not have its own GUI, so once you download the software, you have to use it from within your Terminal or Command Prompt. Once I had my .txt files prepared, I saved them to a folder within my Mallet directory so Mallet would be able to draw on them. One benefit of Mallet is that it has many capabilities built in that can do powerful things with your texts. I was able to turn my small corpus into something Mallet could work with by formatting it in the same sequence as the original texts while removing punctuation and stopwords, such as articles. Upon viewing the stopwords that Mallet removes by default while writing this blog, I realized there are some problems with the list. It removes certain words such as “becoming” and “already”, which are actually important concepts for Bernard Stiegler (for whom, following Heidegger, being is becoming and Dasein is structured by the “already-there” of its historical past, and following both Heidegger and Derrida, many things are “always already” many other things). In doing future work with Mallet, this list of stopwords is something I would have to address.
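One way to address the stopword problem would be to build a custom list that leaves out the terms that matter for Stiegler. The Python sketch below is only an illustration: the function name build_stoplist and the short sample list are my own assumptions, and the real default list would have to be read from the Mallet distribution. To my knowledge, Mallet’s import commands accept a --stoplist-file option for supplying such a list, but this should be confirmed against the help output of the installed version.

```python
from typing import Iterable, List

def build_stoplist(default_words: Iterable[str], keep: Iterable[str]) -> List[str]:
    """Return the default stopwords minus the words we want to keep.

    `default_words` is assumed to hold one stopword per item;
    `keep` lists the concepts that should survive stopword removal.
    """
    protected = {w.strip().lower() for w in keep}
    result = []
    for word in default_words:
        cleaned = word.strip()
        # Skip blank lines and any protected concept.
        if cleaned and cleaned.lower() not in protected:
            result.append(cleaned)
    return result

if __name__ == "__main__":
    # Made-up stand-in for the default list, for illustration only.
    sample_default = ["the", "of", "becoming", "already", "and"]
    custom = build_stoplist(sample_default, keep=["becoming", "already"])
    # One word per line, the usual layout for a stoplist file.
    print("\n".join(custom))
```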

Because of my very small number of texts and my lack of time to do a close reading of the texts with the topic(s) in mind, I did not produce anything that may actually be meaningful, but I do have some thoughts about topic modeling moving forward. It seems that one of the more difficult parts of topic modeling is deciding on the right number of topics for the data you are working with (I would also like to experiment with how many tokens make it into a topic, but I did not know how to adjust that parameter in Mallet). It seemed to me that working with such a small number of texts (either one or two) required the number of topics to be small to get anything meaningful. I used Mallet to produce two documents for myself. One was the list of keywords showing what the topics actually were, and the other was the composition file, a table of the likelihood of each topic coming up in a given text in the corpus. When looking for a larger number of topics with one or two texts (I started off with 10), the likelihood of each topic coming up in the texts was very small. As I incrementally lowered the number of topics, the likelihood of appearance in the texts went up. However, I still struggled to optimize my topics for the two texts. Even when I only searched for two topics, every time I ran the model, there was basically one topic that was more related to one of the texts and another that was more related to the other. I might have been better off reading with a highlighter. I know that, in the future, I will have to do two things. I will have to expand my corpus to at least the 13 texts in the book. I will also have to learn more about adjusting the parameters in Mallet in order to optimize my topics. When I ran the model for one topic, these were my results:

  • question
  • fsmilli
  • knowledge
  • organological
  • anthropocene
  • protentions
  • life
  • form
  • collective
  • noetic
  • negentropy
  • retentions
  • technical
  • l\’e
  • future
  • psychic
  • fact
  • digital
  • time
  • vi-strauss

This group of terms does provide an interesting point of departure for a potential future close reading of these essays. The essays are concerned with an approach to all theoretical and academic work which would situate it in the context of the Anthropocene from the standpoint of digital organology, which produces new collective retentions and protentions, understood as the possibility of negentropy. It would be interesting to expand the number of texts to include the whole collection. In his newest book Psychopolitical Anaphylaxis, Daniel Ross claims a systemic explication of Stiegler’s thought has yet to be done and is highly necessary. Perhaps some text analysis work could be part of this project.

Another obvious obstacle that I faced was the appearance of nonsensical words in my topics. The ones listed above (“fsmilli”, “l\’e”, “vi-strauss”) were found in the topics no matter how many times I ran the model and no matter how many topics I trained on the texts. I combed both my texts to make sure there were no mistakes. The anthropologist Claude Lévi-Strauss was mentioned many times, but none of these words themselves existed as tokens in my corpus. Upon doing a concatenation of the Mallet file I created, I realized this was the source of the mistakes. For some reason, running the following command created these nonsense tokens:

./bin/mallet import-dir --input TheNeganthropoceneData --output neganthropocene.mallet --keep-sequence --remove-stopwords

Somewhere between importing the directory containing my texts and plugging it into Mallet, these tokens were created. This is another point where better knowledge of how Mallet commands work would come in handy. If I had more time, I would have troubleshot this further.
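One possible explanation, offered here only as a guess that the post itself does not confirm: TextEdit saves documents as Rich Text Format unless they are converted to plain text, and RTF markup contains control words such as \fsmilli (a font-size setting) and hex escapes such as \'e9 for “é”. If the files were RTF under the hood, a tokenizer could plausibly turn “L\'e9vi-Strauss” into “l\'e” and “vi-strauss”. The Python sketch below checks a file’s contents for such markup; the function name looks_like_rtf and the list of hints are my own assumptions.

```python
import re

# Hints that a supposedly plain-text file is really RTF:
# the "{\rtf" header, the "\fsmilli" control word, or a hex
# escape such as \'e9 standing in for an accented character.
_RTF_HINTS = re.compile(r"^\{\\rtf|\\fsmilli|\\'[0-9a-fA-F]{2}")

def looks_like_rtf(text: str) -> bool:
    """Return True if the text contains typical RTF markup."""
    return bool(_RTF_HINTS.search(text))

if __name__ == "__main__":
    rtf_sample = "{\\rtf1\\ansi \\fs24 \\fsmilli12000 L\\'e9vi-Strauss}"
    plain_sample = "Lévi-Strauss wrote about collective memory."
    print(looks_like_rtf(rtf_sample))
    print(looks_like_rtf(plain_sample))
```

If a check like this came back positive, converting the files to plain text before the import step would be the first thing to try.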

Other than this small hiccup, the project was a lot of fun and, as already mentioned, something I look forward to continuing to work on in the future. Stiegler himself is a profound thinker of technology and a passionate advocate for taking up digital tools as new powers of reading and writing. He teaches us that digital tools can create new revolutions in science, politics, and art, so long as their capacities for automation are not limited to mere analytic operations of calculation but also lend themselves to the synthesis of new knowledge, which always requires a human mind to take such leaps. I believe that something like topic modeling could individuate such knowledge, provided that we think through exactly what it can do and what that tells us.

Text Analysis Praxis-Analysis of the Words Behind the Animated Disney Movies–by Lu

I think that most of us, at some point in our lives, have had the opportunity to watch an animated Disney movie. In fact, these animated films have become timeless classics and favorites of many generations, not only because of the peculiarity of their characters but also because of the messages that they convey. But are these messages truly positive? Are there words that the Disney script writers use more frequently than others? If so, what is the connection between these words?
For this text mining assignment, I therefore decided to analyze the words of some of the most popular animated Disney movies. I selected about 100 emotional and inspiring quotes from various Disney animated films. In my list, I included quotes from both the golden oldies and more modern Disney films, such as Cinderella, Snow White, Sleeping Beauty, Beauty and the Beast, Bambi, Pinocchio, The Lion King, Ratatouille, Finding Nemo, Dumbo, Winnie the Pooh, Toy Story, Lady and the Tramp, Alice in Wonderland, Up, The Little Mermaid, Aladdin, The Hunchback of Notre Dame, and Frozen. I found these quotes published on two different portals, Kidadl and Good Housekeeping, dedicated to educating users and publishing trusted and curated content. In order to perform this text analysis, I used Voyant Tools and Google Ngram, which are free, web-based applications (Voyant Tools is also open-source). I was eager to complete this project since I believe that, because of their popularity, Disney characters have a tremendous influence on the behavior of children. At the same time, being a fan of Disney movies, I think that in general Disney films have the power to inspire people to develop empathy and be more caring, kind, and appreciative of the value of family and friendship.

Analysis:

After I uploaded my corpus to Voyant Tools, I obtained the following results:

The image shown above was made using the Cirrus tool, which allows us to visualize the most frequent words of the corpus or document. It shows a cloud of words, with the centrally positioned and largest-sized words representing the most frequent words of the Disney movie quotes I analyzed. Among these words we have “love,” “dream,” “day,” “it’s,” “come,” “true,” “heart,” and “believe.” This result shows us that the majority of the quotes from the Disney animated films used for this analysis convey a positive message related to love and believing in dreams that can come true.

The next image shown below corresponds to the network-links graph. This network graph shows the links between the keywords I selected (shown in blue) and other frequent words that appear in the Disney quotes used for this analysis (shown in orange). This graph confirms that there is a connection between the words “dream” and “wish.” It also shows a connection between the words “believe,” “stronger,” and “braver.” In addition, we can observe a connection between the words “love,” “beautiful,” and “wish.”

The third graph was made using the trend tool. It shows a curved line graph of the most frequent words used in my document. Each series in the graph is colored according to the word it represents. As we can see, the blue line is the highest line, and it represents the word “love,” which is the most frequent word. It is followed by a green line that represents the word “dream.” Next, we have a pink line, which represents the word “believe.” Other frequent words observed are “life,” represented by the light-blue line, and “heart,” represented by the purple line.

In this graph, we can see how the word “love,” which is represented by the blue bar, appears more frequently in quotes collected from movies such as Winnie the Pooh, Pinocchio, Bambi, The Lion King, and Up. At the same time, I think it is important to mention that the graph shows that in the quotes from the movie Cinderella the words “love,” “dream,” “believe,” “heart,” and “life” seem to appear more frequently than in the rest of the movies. I’m not surprised by this result because, after all, the message of the movie Cinderella is about kindness and learning to forgive others, as well as believing with all your heart that through the power of love all dreams come true.

These results motivated me to make a quick analysis of the words “dream,” “believe,” “love,” and “Cinderella” using Google’s Ngram tool in order to see how popular these words have been in books independently of their frequency in the Disney films.

As we can see, the Ngram for the words “love,” “believe,” and “dream” shows that “love” and “believe” have always been common, especially between 1800 and 1900. The graph shows an increase in the frequency of the use of these words during the 2000s. In addition, it seems that the word “love” overtook the word “believe” in recent years (around 2018). On the other hand, the word “dream” has been used consistently over the years, but with less frequency than “love” and “believe.” This gives us an idea of how appealing these words are to the public, and why Disney writers may have decided to include them in the scripts of their animated movies.

Moreover, the Ngram below shows that the word “Cinderella” has been used in books consistently through the years. As we can see, the word “Cinderella” increased in popularity around 1950, when Disney released its animated film based on the original and popular Cinderella fairy tale. The graph also shows that the frequency of this word reached a peak around 2015, the year in which Disney released a new adaptation of the Cinderella story.

My thoughts about the text analysis applications:

Overall, I really enjoyed doing this analysis. I think that Voyant Tools and Google’s Ngram are very useful applications for analyzing all types of texts. However, when using Voyant Tools, it took me some time to find out how to remove stopwords from the corpus, that is, words that do not add much meaning to the analysis, such as “an,” “and,” “or,” and “but.” Thus, I would like to explore Voyant Tools in more depth in order to apply it to other text analyses.

Conclusion:

This text analysis shows that classic Disney animated movies contain messages that inspire viewers to believe in the magic of love and in dreams coming true. The words that appear most frequently, regardless of the name of the film, are “dream,” “love,” “come,” “true,” “believe,” “heart,” and “day.” At the same time, the results show that these words are interconnected, and that they were used by Disney writers to create appealing messages about love, believing, and dreams. In addition, this text analysis reveals that, for many years, the story of Cinderella, in all its versions and different adaptations, has been one of people’s favorites of all time.

Visual Analysis Praxis–Analyzing the Factors that Contribute to the Success of Horror Movies–by Lu

Introduction
One of my favorite activities to do during my spare time is to watch movies, especially horror movies. So, I was curious about why some horror movies are more popular than others. Thus, some questions came to my mind. For instance, I wanted to find out if there is a relationship between the budget of a film and its popularity among viewers, or if such a relationship exists between box office earnings and the number of user votes of these films on sites like IMDb.

I tried to find online datasets that could help me answer my questions. However, I was not able to find anything with updated information related specifically to horror movies. Therefore, for this visualization assignment, I decided to collect information from IMDb (Internet Movie Database) in order to create my own dataset of the 30 best horror movies of all time. According to Wikipedia, IMDb is an online database that provides information related to films and television programs. The dataset I created contains the titles of the 30 horror films with the most votes from IMDb users, the number of votes for each movie on the list, the budget of each movie and its corresponding global box office earnings, release date, and running time. This data was collected by IMDb between 2011 and 2020.

Analyzing my dataset using Excel, TableauPublic, and RStudio
I created my graphs using Excel. At first, I tried to use TableauPublic and RStudio, which gave me results similar to the ones I got using Excel. However, I could not figure out how to show my data numbers as thousands (K) and millions (M). So, I decided to use Excel, as I am more familiar with the Excel tools available to change settings and labels on the graphs. I created 4 different graphs to show and analyze the relationship between IMDb users’ vote numbers, movie budgets, and box office earnings. I also calculated Pearson’s correlation coefficient to measure how strong the relationship between these variables is. For Pearson’s coefficient (r), a value of 1 indicates a perfect positive linear relationship, while a value of -1 indicates a perfect negative linear relationship. A result of zero indicates no linear relationship at all. My results show no significant relationship between the variables I analyzed.
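For readers who want to see what the coefficient measures, here is a minimal Python sketch of Pearson’s r computed from its definition (the covariance of the two variables divided by the product of their spreads). The function name pearson_r and the toy numbers are my own and are not the values from my dataset; Excel’s CORREL function computes the same quantity.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator) and sums of squared
    # deviations for each variable (denominator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)

if __name__ == "__main__":
    # Toy values for illustration only (not the IMDb data).
    budgets = [1, 19, 35, 70]
    votes = [600, 1300, 500, 350]
    print(round(pearson_r(budgets, votes), 3))
```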

Data Analysis Results:

Graph 1: IMDb Thirty Best Movies of All Time
Graph 1 shows the number of IMDb user votes for each of 30 horror movies released between 1973 and 2018. We can see that the movie with the most votes among IMDb users was The Silence of the Lambs, with over 1 million votes, while the movie with the fewest votes was Hereditary, with only 273,051 votes.
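The labeling problem mentioned above, showing values as thousands (K) and millions (M), is easy to handle once the numbers are in a script. The following Python sketch is one assumed approach; the function name abbreviate and the one-decimal rounding are my own choices rather than anything taken from Excel, Tableau, or RStudio.

```python
def abbreviate(value: float) -> str:
    """Format a number with a K (thousands) or M (millions) suffix."""
    if abs(value) >= 1_000_000:
        return f"{value / 1_000_000:.1f}M"
    if abs(value) >= 1_000:
        return f"{value / 1_000:.1f}K"
    # Small values are shown as whole numbers without a suffix.
    return f"{value:.0f}"

if __name__ == "__main__":
    for amount in (273_051, 70_000_000, 950):
        print(amount, "->", abbreviate(amount))
```

Values just under one million can round up to a label like “1000.0K”, so a production version would need an extra check at that boundary.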

Graph 2: Relationship between IMDb users’ vote numbers and movie budgets:
Graph 2 shows the relationship between IMDb users’ vote numbers and the budget in dollars of each one of the 30 most voted IMDb horror movies between the years 2011 and 2020. In order to visualize this relationship, I chose a bar graph with a trend line. My results show a value of -0.155 for Pearson’s coefficient (r), which indicates no significant relationship between the number of votes and the budget of each of the 30 most popular horror movies on the IMDb list. In other words, a large budget doesn’t necessarily translate into a high number of votes. For instance, according to IMDb, the movie The Silence of the Lambs, with a budget of $19 million, obtained over 1 million votes among IMDb users and became the most voted horror movie on IMDb. On the other hand, the movie Sleepy Hollow, with a budget of $70 million, obtained only 345,504 votes. Furthermore, the movie Psycho turned out to be the fourth most voted movie on the IMDb list with a budget of only about $1 million.

Graph 3: Relationship between IMDb users’ vote numbers and box office earnings

Graph 3 shows the relationship between IMDb users’ vote numbers and the box office earnings in dollars of each one of the 30 most voted IMDb horror movies between the years 2011 and 2020. In order to visualize this relationship, I chose a bar graph with a trend line. Again, according to the calculated Pearson’s coefficient (r = 0.097), I found that there is no significant relationship between the IMDb viewers’ vote numbers and the box office earnings of the most popular horror movies on the IMDb list. For instance, the movie The Silence of the Lambs, which is the most voted movie on the IMDb list, had box office earnings of $273 million. However, movies such as Jaws, It, and The Exorcist obtained fewer votes on IMDb but reported higher box office earnings.

Graph 4: Relationship Between Movie Budget and Box Office Earnings
Graph 4 shows the relationship between movie budgets and box office earnings for the 30 most voted horror movies on IMDb between the years 2011 and 2020. A Pearson’s coefficient (r) equal to 0.233 shows that there is no significant relationship between the budget of the movies and their box office earnings. As we can observe in the graph, the majority of the horror movies on the IMDb list had a budget of about $30 million and box office earnings of up to $500 million. Two particular cases caught my attention: It and Sleepy Hollow. It had a budget of $35 million and accumulated box office earnings of $701 million. Sleepy Hollow, on the other hand, had a budget twice as big as that of It ($70 million). However, this movie performed poorly at the box office, accumulating only about $207 million, which is less than one third of the box office earnings of It.

Conclusion:

I believe that data analysis and visualization are very important for understanding information about any topic. Also, I think that learning basic statistical concepts and how to use software such as Excel, Tableau, RStudio, and Python has become increasingly important for all academic fields.

With respect to the topic I chose for this particular assignment, I found no significant relationship between the variables that I analyzed. In other words, my data do not show a clear linear relationship between the budget of each one of the 30 most voted horror movies on IMDb, their box office earnings, and how popular they are among IMDb users. Both the graphs and the Pearson’s coefficient (r) results support this.
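One common way to back up a statement like “no significant relationship” is to convert r into a t statistic and compare it with a critical value. The Python sketch below assumes a sample of 30 movies and a two-tailed test at the 5% level, for which a standard t table gives a critical value of roughly 2.05 with 28 degrees of freedom; the function name t_statistic is my own, and this check was not part of my original Excel analysis.

```python
from math import sqrt

def t_statistic(r: float, n: int) -> float:
    """t statistic for testing whether a Pearson r differs from zero."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    # t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom.
    return r * sqrt(n - 2) / sqrt(1 - r * r)

if __name__ == "__main__":
    CRITICAL_T = 2.048  # assumed: two-tailed, alpha = 0.05, df = 28
    for r in (-0.155, 0.097, 0.233):  # coefficients reported in this post
        t = t_statistic(r, 30)
        verdict = "significant" if abs(t) > CRITICAL_T else "not significant"
        print(f"r = {r:+.3f}  t = {t:+.2f}  {verdict}")
```

A result that is not significant means the data are compatible with no linear relationship; it does not prove that no relationship of any kind exists.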

In addition, from my experience completing this assignment, I find Excel to be the most accessible tool for users of all backgrounds. Still, the value of Tableau, RStudio, and Python as freely available tools is huge, especially for researchers, because they offer the possibility of analyzing large datasets and creating graphs to visualize and understand information. RStudio and Python, however, require coding skills and more practice than Excel and Tableau, which can make them more challenging and time-consuming for people with less experience in this area.

On Digital Pedagogy

In “How Not to Teach Digital Humanities,” Ryan Cordell makes DH take a hard look in the mirror. Or rather, he accuses DH of too much self-contemplation, of asking one too many times “What is DH?” Cordell ventures that students aren’t really interested in these “meta-academic” questions, and that professors would do better to stick to more direct theory and practice. I agree: I think that DH, as an emerging field, defines itself better through practice than through self-questioning.

Cordell honestly reflects on mistakes he made in designing his own DH curriculum, and shows sympathy to critics of the field. On digitality, he remarks that many students suffer from digital exhaustion and gravitate to the humanities to read and think deeply, away from more “practical” and “technical” fields. From this perspective, rather than appealing to attention-deficit, internet-addicted teens, DH will only repel them.

I agreed with Cordell that DH should try to temper its perhaps unseemly digitality by placing itself on a long timeline of evolving media (integration), and that DH would do well to introduce itself in small doses into humanities courses.

How to leverage the impact of digital humanities projects? A review of Posner’s reverse engineering in digital projects

The introduction to the Digital Memory Project Reviews, a compilation of reviews of digital memory projects developed by the students of the course Digital Memories: Theory and Practice, presents an interesting reference for analyzing Digital Humanities projects: Miriam Posner’s “How Did They Make That?” video. In this post, I want to briefly present Posner’s reverse-engineering of digital projects and suggest how it could be improved to leverage the impact of digital humanities projects.

In the video, Posner presents a method that helps to unveil how digital humanities projects are built (what she also calls “black boxes”). She then defines three patterns that are present in any digital project: 

1) Sources: a collection of data (e.g., files, images, texts, numbers, videos, sounds, documents, artifacts, etc.);

2) Processes: treating data to make it machine-readable (e.g., organizing, editing, correcting, digitizing, quantifying, etc.);

3) Presentation: making data human-viewable (e.g., making it searchable, interactive, web-accessible, mapped, etc.).

In her view, these patterns can be identified in most DH projects and can therefore inspire students to create their own projects. However, in the same video, when she interviews scholars and analyzes their projects based on her reverse-engineering framework, an issue emerges regarding how people access their digital projects and what they should do to make them more accessible. This aspect becomes very clear in Rachel Deblinger’s comments, when she talks about how surprised she was by the way people navigate her Memories/Motifs project, a website that presents memories of Holocaust survivors in postwar America.

Based on that, I started to think that it might be helpful to add another element to Posner’s reverse-engineering method: audience. This way, students would practice thinking not only about the resources and formats they can choose to build a digital project but also about who they want to benefit by building it. In my view, covering this aspect would leverage the impact of Digital Humanities projects, bringing insights into how DH projects take into account the demands of different audiences, especially the ones outside the academy.

Pedagogy and the Digital Humanities, Felicity Howlett

Ryan Cordell’s account, “How Not to Teach Digital Humanities,” is an exceptionally refreshing perspective on how digital technology can be integrated into humanities courses in an undergraduate academic setting. His practical approach to bringing technological applications to students, interlacing innovation and historical context, emerged after his initial, general proposal for a Digital Humanities (DH) course failed to gain acceptance. Rethinking his agenda, he peeled away the cloudy, theoretical layers enclosing his topic—definitions of the field, its role in academic protocol, its responsibilities—and focused on incorporating applications from some core areas into academic inquiry. By relinquishing the responsibility to “cover the field,” he succeeded in placing materials and methods together on the table for a more hands-on approach. Along the way, his course titles have shifted from, for example, “Doing Digital Humanities” to “Texts, Maps, Networks: Digital Literary Studies.” Clarity comes from demonstration and engagement in meaningful digital skills and innovations.

Near the end of his discussion, Cordell considers a prediction that “digital humanities” may not be around at some future point. His response: “If it falls away because DH methodologies have become widely accepted as possible ways (among many) to study literature, history, and other humanities subjects, this seems to me a fine outcome.” That would seem to be a defining goal for the field—to become so integrated into the body specific that there would not have to be a distinction. Implementation of digital technologies, increased visualizations, sonic exploration, the ability to measure and think beyond the binary, the unearthing of material that has upended old ways of looking at things—all of these and more should be part of the working tools for scholarship in the humanities. As he points out, such an integration has not yet taken place.

The immensity of this pool of information—of data, interests, goals, implicit possibility, and the burgeoning applications (and their ever-expanding potential)—is demonstrated by the website, “Digital Pedagogy in the Humanities.” It is fed by so many rivulets that even keyword definitions are difficult to contain. In the abundance of possible interpretations, the relative youth of DH is particularly notable, and that seems to correspond with the promise that the field holds. As digital technologies are instrumental in unlocking previously hidden, or overlooked caches of historical information, there is immediacy and intensity in making these connections. Likewise, relatively younger, evolving areas (disability, feminism, sex, gender, marginalization of indigenous communities, to name a few), are replete with interested parties lobbying for space and acknowledgment. All this, while DH and its related areas are evolving so quickly that new attempts at self-definition appear every few years.

Within the above website, for example, the first keyword is Access. Perhaps I anticipated a discussion of how to make DH topics accessible to students. Rather, the first article opens with the report that 15% of the world’s population lives with some form of disability and that lack of digital access presents a significant barrier (Williams, Access: Curatorial Statement). Surely more than 15% of the world’s population lacks access to digital materials before the issue of disability even presents itself! This point is not meant to create an argument but simply to emphasize the range of issues that are brought into the forum for discussion and how difficult it is to cover them all while retaining a sense of focus and structure. In the following article, “Suggested Practices for Syllabus Accessibility,” Tara Wood and Shannon Madden encourage instructors to create their own (more personal) statements to students about accessibility. Examples are provided, including personalized remarks such as “Your success in this class is important to me.” Such a remark may be made with sincerity in a classroom, but in this context, it seems an accommodation to a prevailing political academic atmosphere that goes beyond what might reasonably be included within the term “access” in a DH syllabus. In contrast, a “WordPress Accessibility Plugin” also appears in this keyword collection, as does “An Accessibility Audit Assignment,” a project asking students to assess the accessibility of given campus spaces.

With time, some topics that represent particularly sensitive issues may fade from these areas, not because they should be forgotten but because they will become foundational in more general academic considerations rather than paperclipped onto the DH field. And as the field develops and matures, it will take on a more definable shape. For the moment, I have a lot of sympathy for Ryan Cordell’s efforts to home in on methods of working with his students in ways that helped them connect and integrate the past, the present, and the future!

References:

Cordell, Ryan. 2016. “How Not to Teach Digital Humanities.” In Debates in the Digital Humanities 2016, edited by Matthew K. Gold and Lauren F. Klein. University of Minnesota Press.

Digital Pedagogy in the Humanities: Concepts, Models, and Experiments. A peer-reviewed, scholarly collection of pedagogical artifacts. Website edited by Rebecca Frost Davis, Matthew K. Gold, Katherine D. Harris, and Jentery Sayers.

Educational Engagement

Within the past two weeks I have had the opportunity to attend two workshops. One was on “Intro to Educational Game Design” and the other on “Fostering Engagement and Participation Using Multi-modal Learning”. While I am not teaching, I do train mentors at LaGuardia Community College, and I felt that these workshops could help support my work. Both workshops gave me new terms for some of what I was already doing and provided me with new ideas.

For starters, in “Intro to Educational Game Design”, I learned two new terms: gamification and game-based learning (GBL). While both can be used as approaches to educational game design, the choice between them depends on the outcome and assessment you want. Additionally, I was introduced to some game principles: identity, risk taking, well-ordered problems, pleasant frustration, and situated meanings.

With this knowledge I realized that I was already incorporating educational games into my training. One educational game I always include is role play. I give my mentors scenarios in which the objective is for them to respond as if they are already mentors. This would qualify as the identity principle and fall under GBL. Another educational game I sometimes use is Kahoot, to test their knowledge of topics discussed during training. Based on my understanding of the two approaches, this would be gamification, because it relies on extrinsic motivation: trying to place in the top three. In terms of the principle, I would identify this as risk taking, since the game is designed as multiple choice with a time limit. Instinctively, players select a choice before time runs out because they want the points.

As with “Intro to Educational Game Design”, the second workshop made me realize that I was already fostering engagement and participation through multi-modal learning, particularly within the last year. The idea behind multimodal pedagogy is to encourage creativity by including a combination of text, images, motion, or audio. Multimodal learning should also challenge students, allow them to engage more, use existing skills, create their own meaning, and more.

A primary tool used at our college is ePortfolio. One ePortfolio assignment that we assign towards the end of training is called Peer Identity. For this assignment, the mentors write about why they are a Peer Advisor, the strengths they bring to the role, etc. Within this assignment they are encouraged to include an image. Some choose a picture of themselves, while others choose an image that represents the department they are placed to support.

Given our work online within this past year, I have used many engagement tools, such as Jamboard and Padlet. Both tools have been used to allow peers to collaborate with one another, for icebreakers, or to dissect readings or discussions we had during training. I came away from the workshop with a new idea for how I can incorporate Padlet into another part of my training.

While these two workshops were unique, they had a common goal, which is to have students, or in my case mentors, learn in an engaging way. My takeaway from both workshops was a clearer understanding of what I am currently doing. Having that clear understanding has allowed me to think of other ways I can engage my mentors, either through educational games or multimodal pedagogy. It also provided me with ideas for how I can enhance areas of my training by incorporating new tools or by adapting topics and using tools I already know.

Text Analysis Praxis – Sara Teasdale: The Early Years – by Caitlin Cacciatore

A word cloud of Sara Teasdale’s most-used words

Introduction:

Sara Teasdale is a name many both in the literary world and in the larger sphere of academia would recognize. I have long been fascinated by her works, and as a youth, I had “Barter” printed out on computer paper and pasted on my bedroom wall.

Teasdale was born in 1884, and would go on to write several books of poetry for which she would achieve international acclaim. She won the first-ever Columbia Poetry Prize, now known as the Pulitzer Prize for Poetry, in 1918 for the collection of poetry entitled Love Songs.

“Although later critics and scholars have marginalized or excluded Teasdale from canons of early 20th century American verse, she was popular in her lifetime with both the public and critics.”

The Poetry Foundation

I was able to find four of the seven books of poetry she published in her life on Project Gutenberg:

  • Helen of Troy and Other Poems (1911)
  • Rivers to the Sea (1915)
  • Love Songs (1917)
  • Flame and Shadow (1920)

Her later books were not available, as the copyright has yet to expire on Dark of the Moon (1926) and Stars To-Night (1930). (On January 1st of 2021, all books from 1925 entered the public domain, so one need just wait a scant few months for the copyright to expire on Dark of the Moon!)

“In the twenty-first century Teasdale has received attention from scholars such as Melissa Girard, who argues that aspects of Teasdale’s poetry have been neglected or overlooked, including her anti-war poetry from World War I.”

The Poetry Foundation

Teasdale’s life was tragically cut short by suicide in 1933. She was just 48 years old. Strange Victory was the last of Teasdale’s books, published posthumously later that year.

For this project, I assembled the four books of her poetry that were available at Project Gutenberg. The final document was nearly 200 pages long.

This exploration into the early years of Sara Teasdale does not document Teasdale’s troubled years, plagued by mental illness at a time when great stigma was attached to such ailments. We will see, in our analysis, a young, vibrant poetess, with great potential and greater passion. While this analysis is, by nature, incomplete, and does not reveal the same nuances a close reading of her works might, it highlights over-arching themes and brings the text to light and to life in a way that would be impossible without Voyant.

That is, perhaps, the incredible power of text-mining and text analysis – when you cannot see the forest for the trees, Voyant and similar tools can provide you with a trail guide – a way to navigate, and a different way of seeing.

Analysis:

‘Love’ is Teasdale’s primary focus, with 349 occurrences throughout her work. Love is described in various turns of phrase, such as:

  • “After love…” which occurs 8 times
  • “Buried love…” which occurs 4 times
  • “Hidden/hiding love…” which occurs 3 times
  • “New love…” which occurs 6 times

Second on Teasdale’s list of literary focuses is the ‘night,’ which occurs 193 times. This contrast – the light of love, and the dark of night – is a poignant one. The strength and power of love dwarf night, which is described on various occasions in phrases such as:

  • “Winter night…” which occurs 7 times
  • “June night…” which occurs twice
  • “Blue night…” which occurs twice

The ‘sea’ is a word that occurs 164 times, described variously as a “burnished sea,” “darkened sea,” “dreaming sea,” “living sea,” “molten sea,” “predestined sea,” “shallow sea,” and “starlike sea,” among others. The shifting moods of the sea are well-documented; the sea is, at some points, “sweet” and at other times “bitter.”

The word ‘like’ appears often as well, clocking in at 166 usages. This, upon further inspection, is due to Teasdale’s extensive use of simile, which she uses to describe “music like a curve of gold,” in my personal favorite poem of hers, “Barter.”

Rounding out the top five is the word ‘heart.’ You can see the first five of 150 usages in the screenshot below, which gives the context surrounding each usage of ‘heart.’

Heart context

Also popular were:

  • Oh (144 usages)
  • Shall (130 usages)
  • Wind (115 usages)
  • Eyes (111 usages)
  • Come (99 usages)
  • Song (89 usages)
  • Stars (87 usages)
  • Light (78 usages)
  • Soul (65 usages)
  • White (64 usages)
  • Rain (60 usages)
  • Death (53 usages)
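For readers curious about what sits behind counts like these, here is a minimal Python sketch of a word-frequency tally. It is not Voyant's own code: the tiny stop-word set and the sample sentence are made up for illustration, and Voyant applies its own, much longer stop-word list.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; Voyant ships its own.
STOPWORDS = {"the", "and", "of", "a", "in", "to", "i", "is", "my", "me", "you", "that", "it", "with"}

def top_words(text, n=5, stopwords=STOPWORDS):
    """Return the n most frequent words in text, skipping stop words."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)

# Made-up sample sentence, not a line from Teasdale.
sample = "Love is in the night and love is in the sea"
print(top_words(sample, n=3))
```

Run against the combined text of the four books, a function like this should give a list in the same spirit as the one above, though the exact numbers will depend on how words are split and which stop words are removed.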

Further Exploration:

I decided to play with Voyant’s suite of tools, and made the following graph of the four most popular terms in Teasdale’s early poetry. The horizontal x-axis tracks the document’s segments. For clarity’s sake, I changed the default setting of 10 segments to 12, after experimenting with splitting the document into its original four books, which didn’t quite shed the same light on the progression of Teasdale’s thoughts and ideas throughout her early poetry. In the graph below, each of the four books is represented by three distinct segments:

Graph of the top four words used by Teasdale

One can see the presence of love in Teasdale’s writing often spiked or dropped, and it would be interesting (though potentially intrusive) to map this onto a history of her personal relationships.

One can see other correlations in the data as well, and it is evident that love is most often Teasdale’s focus, eclipsing all other words in all but a few of the segments.  
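The idea behind a segment-based trend graph can be sketched in a few lines of Python. This is an approximation under my own assumptions, not Voyant's implementation: it cuts the text into equal-sized runs of words and reports raw counts, whereas Voyant may plot relative frequencies. The function name `segment_counts` is hypothetical.

```python
import re

def segment_counts(text, term, segments=12):
    """Split text into roughly equal word segments and count term in each."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    counts = []
    for i in range(segments):
        # Integer arithmetic keeps the segments contiguous and near-equal.
        start = i * total // segments
        end = (i + 1) * total // segments
        counts.append(words[start:end].count(term.lower()))
    return counts

# Made-up toy text: twelve words split into four segments of three words.
toy = "love night " * 6
print(segment_counts(toy, "love", segments=4))
```

Plotting the returned list for 'love', 'night', 'sea', and 'like' on the same axes would give a rough equivalent of the graph above.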

Another interesting tool is TermsBerry. When one hovers over a word, such as ‘love,’ it appears in green and all related words that appear in conjunction with that word are highlighted in shades of pink, depending on how often they appear in concert with the selected word.
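One way to think about what TermsBerry is showing is a count of the words that fall near a chosen keyword. The sketch below is my own simplification, not Voyant's algorithm: the window of five words on either side and the function name `collocates` are assumptions made for illustration.

```python
import re
from collections import Counter

def collocates(text, keyword, window=5):
    """Count words appearing within `window` words of each keyword occurrence."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, word in enumerate(words):
        if word == keyword:
            # Gather neighbors on both sides, leaving out the keyword itself.
            before = words[max(0, i - window):i]
            after = words[i + 1:i + 1 + window]
            counts.update(before + after)
    return counts

# Made-up phrase for illustration, not a quotation.
print(collocates("new love and old love", "love", window=1).most_common())
```

The higher a neighbor's count, the darker it would be shaded in a TermsBerry-style display.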

Concluding Remarks:

Exploring Voyant was an interesting voyage. I began with the most basic functionalities, then worked my way to fiddling with the higher-level settings of more advanced tools.

One aspect of textual analysis I feel compelled to note is that using Voyant for text-mining is not necessarily a replacement for reading the text itself. A lot is lost through distant reading – though there is still a lot to be gained through the process. However, we will never be able to feel what Teasdale meant for us to feel if we use Voyant or a similar tool instead of reading her work in its original form, perhaps even aloud, the way poetry was meant to be read.

Yes, certain themes become more evident by employing text-mining, but even though we can safely classify Teasdale’s work as ‘love poetry,’ just knowing that love appears 349 times is not enough. It’s an impersonal metric. For a fuller, more complete, more nuanced understanding of the work, Teasdale should be read, either silently or aloud, in conjunction with text-mining.

Only when the tools within us and the tools we create are used in tandem can we fully begin to understand a text we are analyzing.

I will leave you with a poem. It’s an old favorite of mine, and I hope it’ll speak to you too.

Blog Posting #5: Praxis Assignment – The Trends of Seven Major Felonies in NYC from 2000 to 2024

When we decide whether a region is habitable, we consider many factors: infrastructure, environment, education, transportation, market, etc. Crime rate is one of these indices because it is closely related to our feeling of safety. As a criminal justice major, I am interested in whether New York City is becoming a safer place to live. By visualizing the trends of seven major felonies from 2000 to 2020, I found that the city has been struggling less with crime. In addition, I found that the forecast trends from 2021 to 2024 are similar to those of the past two decades.

I selected Tableau Public as my tool for data visualization because it could provide an easy way to visualize the data through a simple drag-and-drop, and it was free for public use. I collected the data from two sources: 1) the NYPD and 2) the US Census. First, I could access the historical NYC crime data on the NYPD web page, including the citywide seven major felony offenses from 2000 to 2020. The seven major felony offenses consist of 1) murder, 2) rape, 3) robbery, 4) burglary, 5) felony assault, 6) grand larceny, and 7) grand larceny of motor vehicles. Second, because the numbers of felony incidents alone cannot indicate the situation precisely, I included the information on the population of NYC and constructed the crime rate data. On the US Census webpage, I could acquire the NYC population in 2000, 2010, and 2020. Then, I divided the number of each felony crime by the population: the 2000 to 2009 crime data were divided by the population in 2000; the 2010 to 2019 crime data by the population in 2010; and the 2020 crime data by the population in 2020.
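The rate construction described above can be sketched in Python as follows. The population and incident figures here are round placeholders, not the actual Census or NYPD numbers, and the function names are hypothetical; the rate is expressed per 10,000 residents.

```python
def census_year(year):
    """Map a year to the decennial census at the start of its decade."""
    return (year // 10) * 10

def rate_per_10k(incidents, year, population_by_census):
    """Crime rate per 10,000 residents, using that decade's census count."""
    population = population_by_census[census_year(year)]
    return incidents * 10_000 / population

# Placeholder populations for illustration only (not the real Census figures).
populations = {2000: 8_000_000, 2010: 8_200_000, 2020: 8_800_000}

# A made-up count of 4,000 incidents in 2005 is divided by the 2000 population.
print(rate_per_10k(4_000, 2005, populations))
```

One consequence of this design is that the denominator only changes once per decade, so a year such as 2019 is divided by a population figure that is nine years old.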

After constructing eight line graphs, covering the total and the seven individual felony rates, I added trend lines and forecasts to each graph. As shown in the graphs below, all felony rates, as well as the total rate, declined over the two decades, although to different extents. Based solely on these visualized data, NYC seems to be becoming less affected by the major felony crimes. Specifically, the total rate per 10,000 population decreased by more than 50 percent during the period (from 230.6 to 108.6). The rates of robbery (from 40.66 to 14.89), burglary (from 47.89 to 17.58), and grand larceny of motor vehicles (from 44.26 to 10.26) were also reduced by more than 50 percent. Although not as dramatically as those three major felony crimes, the rates of murder (from 0.84 to 0.53), rape (from 2.58 to 1.62), felony assault (from 32.37 to 23.37), and grand larceny (from 61.97 to 40.33) also decreased significantly.
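The percentage drops quoted above can be checked with a short calculation. The rates below are the 2000 and 2020 figures stated in this post; the helper name `percent_change` is my own.

```python
def percent_change(start, end):
    """Percentage change from start to end (negative means a decrease)."""
    return (end - start) / start * 100

# Rates per 10,000 population in 2000 and 2020, as quoted in this post.
rates = {
    "total": (230.6, 108.6),
    "murder": (0.84, 0.53),
    "rape": (2.58, 1.62),
    "robbery": (40.66, 14.89),
    "felony assault": (32.37, 23.37),
    "burglary": (47.89, 17.58),
    "grand larceny": (61.97, 40.33),
    "grand larceny of motor vehicles": (44.26, 10.26),
}

for name, (start, end) in rates.items():
    print(f"{name}: {percent_change(start, end):.1f}%")
```

The total, robbery, burglary, and grand larceny of motor vehicles all come out below minus 50 percent, which matches the "more than 50 percent" claim, while the other four categories show smaller declines.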

The trend lines and forecasts were also consistent with the changes of the last two decades. The areas shaded in light red present the forecasts from 2021 to 2024. Specifically, Tableau Public forecasted that the rates of total felony crimes (from 108.6 to 97.4), murder (from 0.53 to 0.42), robbery (from 14.89 to 10.74), burglary (from 17.58 to 13.23), and grand larceny of motor vehicles (from 10.26 to 6.60) would decrease. In contrast, it predicted that the rates of rape (from 1.62 to 1.84), felony assault (from 23.37 to 24.20), and grand larceny (from 40.33 to 46.71) would increase slightly. However, the general trend is still declining, as shown in the total felony crime rate. In other words, if the forecast holds, NYC will have fewer problems with serious crimes.

I feel that Tableau Public is an easy-to-use and robust instrument for visualizing data. For example, with just a few drag-and-drop operations, I could create nice-looking graphs, including trend lines and predictions for the future. I believe that such tools can bridge the gap between academia and the general public by making data easier to read.

Link to the graphs:

https://public.tableau.com/app/profile/jinuk.jeong/viz/TheTrendsofSevenMajorFeloniesinNYCfrom2000to2024/Story#1

Data Sources

1) NYPD: https://www1.nyc.gov/site/nypd/stats/crime-statistics/historical.page

2) US Census: https://www.census.gov/quickfacts/newyorkcitynewyork