Background
In 2012, along with DH software developers Alejandro Peña and Francisco Onielfa, I started to work on a digital oral history archive that gathers, preserves and provides access to the testimonies of Spanish women who became adults and mothers during the Francoist dictatorship (1939-1975). Their daughters, who came of age during the Spanish transition to democracy and its subsequent democratic governments, interview them about their recollections of the pre-democracy years and the socio-cultural differences they perceive between the two generations. The interviews are recorded on video. The archive, Mothers and Daughters of the Spanish Transition to Democracy, has collected 51 interviews to date, and we continue expanding it.
Corpus
For this praxis assignment, I have used the first 20 interviews of our oral history archive.
Brief description of the sources:
- The participating mothers were born between 1921 and 1942.
- Their daughters were born between 1944 and 1977.
- The interviews were conducted between January and June of 2012.
- The interviews followed a semi-structured, open-ended format.
- The average interview length is 92 minutes.
- The 20 interviews have been combined in one document for a total of 246,659 words.
Tools
All the interviews in the archive are processed with Dédalo. After being transcribed, they undergo a form of text analysis: we “index” them by linking different interview segments to the thesaurus descriptors that we have created for this specific project.

For this praxis activity, however, I have not used our thesaurus descriptors, and I have worked with the unindexed texts.
After a superficial exploration of Voyant, the tool that I decided to use, I was under the impression that it did not have multilingual functionality, so I consulted with Filipa Calado, our Digital Fellow, who used Python to clean my text.
Here’s the code that Filipa wrote to eliminate words that were irrelevant for my analysis:

I made Filipa go through a good deal of unnecessary work because, upon a more thorough investigation of Voyant, I found that the tool is, indeed, multilingual, and that it provides interesting options to users, such as the possibility to edit the stopword list, which I took advantage of.


After I applied the new stopword list to my text, the count went down to 165,215 words.
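To illustrate how a stopword list changes a word count, here is a minimal Python sketch. It is neither Filipa’s script nor Voyant’s implementation: the tiny stopword set, the function names and the sample sentence are all made up for the example.

```python
import re

# Hypothetical mini stopword list; the real list was edited inside Voyant.
STOPWORDS = {"de", "la", "que", "el", "en", "y", "a", "los", "se", "no"}

def tokenize(text):
    """Lowercase the text and split it into word tokens (accents preserved)."""
    return re.findall(r"[^\W\d_]+", text.lower())

def remove_stopwords(tokens, stopwords):
    """Keep only the tokens that are not in the stopword set."""
    return [t for t in tokens if t not in stopwords]

# Invented sample sentence, not taken from the interviews.
sample = "Yo no sé qué decir de la guerra y de la escuela"
tokens = tokenize(sample)
kept = remove_stopwords(tokens, STOPWORDS)
print(len(tokens), len(kept))
```

Note that the match is accent-sensitive: with this list, “se” would be removed while “sé” (“I know”) stays, which matters for the kind of decisions discussed in the next section.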
Process
Making decisions about which words should stay or leave was not easy. For example, after applying my first modified list to the text, the analysis showed that the most frequent word was “no.” I wondered: Was the presence and frequency of this adverb saying something about the project participants’ experience of repression under the dictatorship? I began playing with the stopword list to try different scenarios, and decided that the analysis was richer when “no” was absent.
Similar questions arose with words such as “bueno,” which in Spanish can be used as a filler that marks a moment of reflection or hesitation (“Well…”) or as the adjective “good,” in opposition to “bad.” Eliminating all the “bueno” words might hide important information. I began to see how digital text analysis needs a good amount of linguistic tweaking in order to guide interpretation in a reliable way.
After playing with the stopword list for some time, I decided to keep this Cirrus visualization for the time being:

“Yo” (“I” in English; 1348 occurrences) and “madre” (“mother”; 1148 occurrences) are the highest frequency words. One could formulate some preliminary interpretations based on this data. For instance, subject pronouns are generally implied in Spanish. Speakers do not need to insert the subject pronoun in every sentence because verb conjugations already indicate who or what the subject of the sentence is. The excessive presence of subject pronouns is redundant, unless it is used for clarification or reinforcement. Thus, the fact that “yo” is the most frequent word in the interviews might denote self-assertiveness: the mothers are asserting themselves as the protagonists of the interviews. If confirmed, this would be a positive outcome, as many of them expressed fears and insecurities before participating in the project. They often said that their lives were “normal and uninteresting,” and that they didn’t think they had anything to share with the larger public.
The Links tool of Voyant shows the occurrence of “yo” in connection with “creo” (“I think/believe”; 198 simultaneous occurrences) and “sé” (“I know”; 127 simultaneous occurrences), which would support the idea of the interview as a space for self-definition and self-determination.

However, because I had eliminated the word “no” from the analysis, I do not know whether the verbs “creo” and “sé” might have been used, at least in some instances, in the context of negative statements, as in “I don’t know.” An analysis of both scenarios should be made before arriving at conclusions.
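One way to check the negative-statement scenario outside Voyant would be to count how often each verb is directly preceded by “no” in the unfiltered text. The following is a minimal sketch under that assumption; the function name and the sample text are hypothetical, and the sample is not taken from the interviews.

```python
import re

def count_negated(text, verb):
    """Return (total, negated) counts for `verb`, where "negated" means
    that the token immediately before the verb is "no"."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    total = negated = 0
    for i, tok in enumerate(tokens):
        if tok == verb:
            total += 1
            if i > 0 and tokens[i - 1] == "no":
                negated += 1
    return total, negated

# Invented sample text for illustration only.
sample = "Yo creo que sí. No sé, yo no creo eso. Yo sé lo que pasó."
print(count_negated(sample, "creo"))
print(count_negated(sample, "sé"))
```

This simple check only catches “no” immediately before the verb, so it would miss constructions such as “no lo sé,” where a pronoun sits between the two words.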
There is another caveat to the “assertiveness” interpretation: the corpus contains the daughters’ questions too, and, in all probability, they have used “yo”. This distortion could be easily avoided by eliminating the daughters’ questions from the corpus before uploading it to Voyant.
The same caveat applies to my entire text analysis, which focuses on the mothers but has, nevertheless, included the daughters’ questions in the corpus to be analyzed. However, considering that the mothers’ narrative is a lot more extensive than their daughters’ interventions, my improvised interpretations might not be completely invalid.
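Separating the two voices could be done with a few lines of Python before uploading the text to Voyant. The sketch below assumes, hypothetically, that each turn in the transcript begins with a speaker label such as “MADRE:” or “HIJA:”; our actual transcription format may differ, and the sample dialogue is invented.

```python
def split_by_speaker(transcript, labels=("MADRE", "HIJA")):
    """Group transcript lines by speaker label.

    Assumes (hypothetically) that each turn starts with "LABEL:" and that
    unlabeled lines continue the previous speaker's turn.
    """
    turns = {label: [] for label in labels}
    current = None
    for line in transcript.splitlines():
        line = line.strip()
        if not line:
            continue
        head, sep, rest = line.partition(":")
        if sep and head.strip().upper() in turns:
            # A new turn starts: switch speaker and drop the label.
            current = head.strip().upper()
            line = rest.strip()
        if current is not None and line:
            turns[current].append(line)
    return {label: " ".join(parts) for label, parts in turns.items()}

# Invented sample dialogue for illustration only.
sample = """HIJA: ¿Cómo era la escuela?
MADRE: Yo fui poco a la escuela.
Mi madre me necesitaba en casa.
HIJA: ¿Y después?
MADRE: Después me casé."""

parts = split_by_speaker(sample)
print(parts["MADRE"])
```

The text under `parts["MADRE"]` could then be saved as its own file and analyzed separately from the daughters’ questions.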
The high presence of the word “mother” is intriguing. You might say that it is not a surprise: after all, the project is “all about mothers” (wink to Almodóvar). But are they speaking about their own maternal role, or are they referring to their mothers? I am inclined to think that they are speaking about their own mothers, which would show the presence of a matrilineal focus in the interviews.
The Links tool (see above) did not provide me with information about the presence of the mothers’ mothers in the interview, but did reveal that the term “mother” is collocated with “father,” which might indicate that the interviewee is, indeed, speaking about her parents when the term “mother” appears. The Links visualization also shows the terms “daughter” (“hija”) and “granddaughter” (“nieta”) in connection to “mother,” which could support the hypothesis of the matrilineal angle. Again, one could say that the project itself is matrilineal by design, but the interviewers and interviewees were not asked to focus on the grandmother-daughter-granddaughter line. If anything, the semi-structured interview guide includes questions about family and children in general.
Returning to Voyant’s word-list options, I’d like to highlight the “White List” function (I wish they had called it something less racialized), which allows users to observe the behavior of terms of interest to them. In order to use the “White List” options, it is important to set the “Stopword” list to “None”:

I chose to look at the historical and political terms of the periods that the interviews cover: republic, war (“guerra”), dictatorship, democracy. I also inserted some terms frequently associated with them: repression, Church (“Iglesia”), sin (“pecado”), freedom (“libertad”), free (“libre”).

“War” is the highest frequency term, which shows its robust presence in the collective memory of the project participants, a stronger presence than that of the 40-year dictatorship. A possible interpretation is that the questionable and imperfect nature of Spain’s democratic transition has failed to facilitate an unambiguous condemnation of the dictatorship, which might lead the participants to address the term indirectly, use euphemisms or avoid it altogether. By contrast, the “horrors of the civil war” narrative does not carry any ambivalence in Spanish collective memory, which might account for the strong presence of the term “guerra.” Of interest, too, is that the “república,” the democratic period immediately preceding the war, has a minimal presence, which might corroborate the ineffectiveness of the Spanish democratic transition to rehabilitate the memory of its pre-war democratic precedent: the much-demonized, very progressive, and short-lived Spanish Second Republic (1931-1936).
There are many other interesting observations based on this quick analysis. For example, the participants might have codified the term “repression” as “sin” and “Church,” judging from the disparate presence of those three terms in the Cirrus visualization. The terms “libertad” and “libre” are more frequent than “dictadura,” and about as frequent as “democracia,” perhaps signaling a more defined and stable presence in the collective memory of the participants.
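The white-list comparison can also be approximated outside Voyant by counting only the terms of interest. This is a minimal sketch, not Voyant’s implementation: the function name and the sample text are hypothetical, and “represión” is my assumed Spanish form of “repression.”

```python
import re
from collections import Counter

# Terms of interest from the list discussed above
# ("represión" is an assumed Spanish form of "repression").
WHITE_LIST = ["república", "guerra", "dictadura", "democracia",
              "represión", "iglesia", "pecado", "libertad", "libre"]

def white_list_counts(text, terms):
    """Count exact, case-insensitive matches of each term in the text."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(tokens)
    return {term: counts[term] for term in terms}

# Invented sample text for illustration only.
sample = ("Después de la guerra no había libertad. "
          "La guerra lo cambió todo; la Iglesia mandaba.")
result = white_list_counts(sample, WHITE_LIST)
print(result)
```

Because the match is exact, plural or inflected forms such as “guerras” or “libres” are not counted; each variant would have to be added to the list.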
Possibilities
Voyant is a versatile tool that offers multiple possibilities for a project like mine. I could, for instance, separate the interviews to compare age and term frequency; I could analyze daughters and mothers separately; I could compare my interviews to other memory projects covering the same period; etc.
It is important to note, though, that variable control and a careful design of the analysis are necessary steps if we are to rely on Voyant’s data. For instance, we must be sure of the accuracy and homogeneity of the interview transcriptions (e.g., you cannot compare terms referring to time if dates have not been transcribed homogeneously). The stopword list is also of paramount importance because it has a direct impact on the type of information the analysis will yield. Additionally, in a project like the one I am working with, the data collection process must be taken into account as well: project design, interview format and questions, participants’ profiles, how interviews have been processed, etc.





