Claire Richie

Critical Introduction to the Early Modern Care Project

2022-05-07T00:00:00+00:00

Technical Documentation

Early Modern Care is a dataset of 1022 data points, each presented as a row in a Google Sheet. Our data points are bibliographic information on Early Modern print texts (from roughly 1500-1700). The Early Modern print texts that we have chosen for this dataset are catalogued in the English Short Title Catalogue (ESTC) housed by the British Library (BL) and are related to our definition of Early Modern Care.

Our definition of care and this dataset are inspired by a previous digital humanities project that we created, Early Modern Maternity and Caretaking (EMMC). Over the course of five months, from August to December 2021, we curated by hand for EMMC (through bibliography mining and other research methods) a list of 20 early print texts on the subject of caretaking written by women. This preliminary, hand-curated dataset is housed under the “Print Text CSV” tab of EMMC. In this preliminary CSV, under the column “STC,” we provided links to each print edition’s catalogue entry in the ESTC. To fully understand the boundaries of our definition of care for the final project dataset for ENG 612, we explored the keywords that BL librarians assigned to each print text in EMMC and chose 10 keywords to use as the basis for our definition of care.

We envision Early Modern Care for our scholarly research as positive acts of caring for the whole person grounded in the body and materiality. In light of this definition, we chose the following keywords to identify texts for the Early Modern Care dataset. Our keywords are: Cookery; Home Economics; Canning and Preserving; Etiquette; Gynecology; Obstetrics; Midwifery; Midwives; Parent and Child; Childcare. Each of these ten keywords in the ESTC calls up fewer than 200 texts. We chose to focus on these keywords because they were manageable to scrape from the ESTC by one person in one sitting.

The scraping process included using the add-on extension of Zotero Research Assistant on the Chrome web browser. The extension on Zotero would obtain all of the bibliographic information for each text including all the keywords BL librarians attribute to it. These BL keywords are important information for us to retain as we hope to turn Early Modern Care into a functional and searchable database. These BL keywords are a starting point for that future phase of the project.

Once we had scraped all the texts under a keyword, for instance the keyword “Gynecology,” the information was stored in the Zotero Research Assistant desktop application. The scraping process involved scraping ten texts at a time as they appear in the ESTC database. To trace how we arrived at the keyword “Gynecology-Early works to 1800,” we looked at Jane Sharp’s 1671 edition of The Midwives Book, which is catalogued in the Print Text CSV of EMMC. We located the keyword “Gynecology-Early works to 1800,” clicked on it in the ESTC, and then clicked on the “Find other documents in the catalogue” button. We found that this keyword is scrapable in one sitting (fewer than 200 entries) and continued by scraping 10 entries at a time.
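The feasibility check and the ten-at-a-time batching described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the scraping itself was done by hand with the Zotero browser extension, and the function name `plan_scrape` and both constants are hypothetical labels for the thresholds mentioned in the text.

```python
BATCH_SIZE = 10    # entries captured per pass, as described above
MAX_ENTRIES = 200  # a keyword must return fewer than this to be scraped in one sitting


def plan_scrape(hit_count):
    """Return (first, last) result positions per batch, or None if the keyword is too large."""
    if hit_count >= MAX_ENTRIES:
        return None
    return [
        (start, min(start + BATCH_SIZE - 1, hit_count))
        for start in range(1, hit_count + 1, BATCH_SIZE)
    ]


# The 67 results for "Gynecology-Early works to 1800" fall into seven batches,
# the last one covering results 61 to 67.
batches = plan_scrape(67)
print(len(batches), batches[-1])
```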

For instance, once all 67 texts under the keyword “Gynecology-Early works to 1800” were scraped using Zotero and stored in the desktop app, we exported the Zotero folder to a CSV file on our local computers. Then, we uploaded the CSV file to Google Drive and converted it into a Google Sheet. The raw data for each keyword was initially messy, with repeats of each edition. First, we went through by hand and deleted rows that repeated the same information. Then we deleted empty columns and metadata columns that are unhelpful to us, such as the date on which we, the researchers, accessed the record. Finally, we added column G and column H. In column G, we hand-assigned gender under three categories: male, female, and unknown. For any text whose author’s gender we could not ascertain, we defaulted to unknown rather than to male. For texts by multiple authors with at least one female author, we defaulted to female. In column H, as the last step in our cleaning process, we added the exact keyword from the ESTC that we used to scrape, in this example “Gynecology-Early works to 1800.” This is to make our process transparent and reproducible for other researchers.
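For researchers who would rather script these steps than repeat them by hand, the following is a minimal Python sketch of the same sequence. It is not the procedure we used (our cleaning was done manually in Google Sheets). The header names `Author` and `Date Accessed`, the output column labels, and the `genders` lookup are assumptions made for illustration, and "repeat" is interpreted narrowly here as a row identical in every field.

```python
import csv

# Hypothetical header names; a real Zotero CSV export uses its own labels.
AUTHOR = "Author"
UNHELPFUL_COLUMNS = {"Date Accessed"}


def load_rows(path):
    """Read one exported keyword CSV into a list of dictionaries."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def clean_rows(rows, keyword, genders=None):
    """Apply the cleaning steps to the rows scraped for one ESTC keyword."""
    genders = genders or {}

    # Step 1: delete rows that repeat exactly the same information.
    seen, unique = set(), []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)

    # Step 2: drop columns that are empty everywhere or listed as unhelpful.
    columns = list(unique[0]) if unique else []
    keep = [
        col for col in columns
        if col not in UNHELPFUL_COLUMNS
        and any((row.get(col) or "").strip() for row in unique)
    ]

    # Step 3: add the hand-assigned gender and the exact search keyword.
    cleaned = []
    for row in unique:
        record = {col: row.get(col, "") for col in keep}
        # Gender comes from a hand-made lookup and defaults to "unknown".
        record["Gender of Author"] = genders.get(row.get(AUTHOR, ""), "unknown")
        record["Keyword Search"] = keyword
        cleaned.append(record)
    return cleaned
```

The gender assignment stays a human decision in this sketch: the `genders` dictionary would be filled in by the researcher, and any author missing from it falls back to "unknown" rather than to male, matching the rule stated above.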

For the sake of documentation, transparency and reproducibility, we retained each original CSV from Zotero and each cleaned CSV according to keyword. Ultimately, we compiled all the keyword-based cleaned data into one master sheet, sorted it in ascending order by author name, and created a codebook for our metadata fields, which are listed below. This sheet is called “Early Modern Care Final Dataset,” with the “Codebook” as a second tab in the Google Sheet.
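The compilation step can be sketched as follows, assuming each cleaned keyword sheet has already been read into a list of dictionaries. The function name `compile_master` and the `Author` key are hypothetical. The sketch deliberately does not remove texts that appear under more than one keyword, since the process described above does not mention doing so.

```python
def compile_master(sheets, author_key="Author"):
    """Concatenate cleaned keyword sheets and sort ascending by author name."""
    merged = [row for sheet in sheets for row in sheet]
    # casefold() keeps the sort from separating upper- and lower-case names.
    return sorted(merged, key=lambda row: (row.get(author_key) or "").casefold())


gynecology = [{"Author": "Sharp, Jane", "Keyword Search": "Gynecology-Early works to 1800"}]
other = [{"Author": "", "Keyword Search": "Cookery"}]  # a row with no known author
master = compile_master([gynecology, other])
print([row["Author"] for row in master])
```

Rows with no author name sort to the top in this sketch, which is how a spreadsheet's ascending sort typically treats blank cells when they are stored as empty text.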

Our metadata fields are as follows: A. ESTC identification number; B. Materiality of the text; C. Author name; D. Full Title; E. Publication Date; F. Page numbers; G. Gender of Author; H. Keyword Search from ESTC Subject headings; I. Shortened Title; J. Printer; K. Publication Place; L. URL of ESTC; M. Subject keywords as defined by ESTC. We acknowledge that Column M, Subject keywords as defined by the BL librarians at the ESTC, will need to be hand cleaned and organized using regular expressions in the future.
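As a starting point for that future cleaning of Column M, here is a minimal Python sketch using regular expressions. It rests on two assumptions that would need to be checked against the actual sheet: that the subject headings in a cell are separated by semicolons, and that subdivisions are marked by one or more hyphens (as in “Gynecology-Early works to 1800”). The function name `split_subjects` is hypothetical.

```python
import re


def split_subjects(cell):
    """Split one Column M cell into a list of regularized subject headings."""
    headings = []
    for part in re.split(r"\s*;\s*", cell.strip()):
        # Unify "A-B", "A -- B" and "A - B" into "A - B".
        # Caution: this would also split a genuinely hyphenated word.
        part = re.sub(r"\s*-+\s*", " - ", part)
        # Collapse runs of whitespace and trim stray spaces and full stops.
        part = re.sub(r"\s+", " ", part).strip(" .")
        if part and part not in headings:
            headings.append(part)
    return headings


cell = "Gynecology-Early works to 1800.;  Midwifery -- Early works to 1800"
print(split_subjects(cell))
```

Keeping each heading as a separate list item, rather than one long string, is what would later make the dataset searchable by keyword.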

The most striking part of our process of creating the dataset Early Modern Care for the ENG 612 final project is the reproducibility of our work. Early in this discussion of our process or, as others might think of it, our research methods, I mention the word curation. Julia Flanders and Trevor Muñoz aim to define data curation in their introduction to Digital Humanities Data Curation. They begin by defining curation in the vein of early print and manuscript studies and the care of a text. This rightly has great overlap with the kind of caretaking we hope to highlight with our dataset Early Modern Care. It also speaks to the kind of ethos we hope to build as we document our process of data collection, an activity Flanders and Muñoz describe as key: “[Data curation] carries with it the burden of capturing and preserving not only the data itself, but information about the methods by which it was produced.”¹ We hope that this portion of our critical introduction carries on the tradition that Flanders and Muñoz seek to create.

We would be remiss to only touch upon the ethos of documenting our project. Another important aspect of understanding our process is understanding our data cleaning. Katie Rawson and Trevor Muñoz discuss the lack of transparency around data cleaning and what it entails. In their chapter “Against Cleaning,” they write: “In reality, data cleaning is a consequential step in the research process that we often make opaque by the way we talk about it. That we employ obscuring language like ‘data cleaning’ should be a strong invitation to scrutinize, perhaps reimagine, and almost certainly rename this part of our practice.”² This portion of our critical introduction pushes back against this opacity. Yes, we clean data like most digital humanities researchers. But we attempt here to explain our process in detail and also to acknowledge that our data, such as the BL keywords, is still messy, and that is okay.

Curating and cleaning are practices that tie into the notion of radical acts of care: the care of data. Through our exploration of early modern caretaking, we hope to make a connection to a 21st-century kind of care that is different from the kind of early modern care we seek to circumscribe. Data care is ephemeral, abstract, sometimes singular, and not grounded in the body or the material. Yet it is no more or less important than early modern caretaking.

Conceptual Framework

The Early Modern Care dataset provides a comprehensive corpus of Early Modern printed texts on caregiving. Defined by the concept of cura personalis, or “care of the whole person,” this dataset encompasses materials on caregiving according to Early Modern ideas of care. The time period this dataset covers is that of the English Short Title Catalogue (ESTC), 1473-1800. This time period covers the Early Modern period and slightly beyond, accounting for the immense boom of printed texts at the end of the seventeenth century during which Early Modern ideas and values can still be traced in printed texts. Care is not necessarily easily defined, and thus creating a database of these texts is not an altogether intuitive process. The exclusion metrics were equally difficult to define. For the current dataset, all of the texts are positive rather than punitive; the dataset chooses to focus on positive actions that can be taken to enact care, rather than instructions on how to avoid punishment or read signs. The keywords that have been scraped thus far are grounded in the body, and the instructions are actionable by all people. For instance, astrological pamphlets and almanacs have been excluded because they instruct their readers on how to interpret celestial signs, rather than on actions they can take themselves to best interact with celestial forces. The texts are relevant across classes, and not exclusive to the elite. Grounded in bodily actions and how people move within their environment, these texts are focused on positive action and personal autonomy. The Early Modern Care project seeks to aggregate texts on care practices from throughout the Early Modern period.

This dataset addresses issues surrounding access to Early Modern texts that are often gendered in their construction. Caregiving has been traditionally gendered as a female activity and thus has not received much attention in scholarly literature, especially considering the multifaceted nature of caregiving in the Early Modern period. This dataset is an attempt to define caregiving via texts collected through a digital humanities methodology. Especially for pedagogical purposes, this dataset will allow for a definition of caregiving with tangible examples of how that care would have been enacted through textual evidence. An aggregated resource for this gendered activity connects physical objects with a more nebulous theory of caregiving practices. That women were primarily involved in these caregiving actions contributes to the vague definition and lack of concrete knowledge of these practices. Writing on actual tasks that take place in a primarily domestic environment helps to lift the shroud of mystery that is often so present surrounding women’s household labor. The open access nature of this dataset allows it to be accessed for purposes of scholarly research and pedagogy, which will help foreground these practices in scholarship and the classroom. The aggregation of these sources cuts down the research legwork that is so often a barrier to practically implementing this scholarship. The dataset allows researchers to answer questions about gendered caregiving practices, the popular market for texts on caregiving, authorship, and definitions of care. Only about 5% of texts in the dataset have female authors, but many of the titles suggest that these texts were marketed primarily to women. This dataset will help answer questions about the differences between who was regarded as an authority on caregiving versus who was buying these texts versus who was profiting off of them. 
Early Modern Care can also further the examinations of how authors created authority for themselves and how that authority functions in the book trade. Ultimately, how did the public book trade marketplace reflect the domestic realities of caregiving in the home?

The dataset will eventually also be a point of comparison for other similar projects, such as the Reading Early Medicine project (REM). REM “built a robust bibliographic database of all works on health and healing published in English from the dawn of print until 1700,” which contains over 2500 titles.³ Early Modern Care will share some of the same texts with REM, but the different parameters will also put this dataset into conversation with the other project. How does medicine overlap with caregiving? Where does domestic practice overlap with professional physicians? In the context of other digital humanities scholarship, the Early Modern care dataset can be informed by and enlighten other projects. This dataset is in conversation with other projects that seek to expand access to and knowledge on domestic practices, such as the Making and Knowing Project at Columbia University, and the Manuscript Cookbooks Survey. The Making and Knowing Project takes a practical and tactile approach to exploring domestic practice, actually making the recipes from an Early Modern manuscript receipt book. The Early Modern Care dataset will explore how instructions for these physical practices were disseminated through print. The Manuscript Cookbooks Survey is a practical resource for researchers, as it identifies repositories of unique manuscript items. The Early Modern Care dataset will allow researchers to put print and manuscript materials into conversations on caregiving, as manuscripts circulated alongside print materials, especially in domestic settings. Ultimately, this dataset aims to contribute to interdisciplinary conversations about gender, bibliography, labor, domestic culture, and economy across the Early Modern period.

Personal Reflection - Kate

The process of working on Early Modern Care helped me think more intentionally about the process of collaboration not only in DH but also in the humanities at large.

Almost no output in academia happens in a vacuum. Yet some people’s labor is acknowledged more than others’ (and acknowledged in the reward streams of academia). The relationship between graduate student research assistant and PI on a DH project was something that struck me intimately in Rachel Mann’s “Paid to Do but Not to Think: Reevaluating the Role of Graduate Student Collaborators” from Debates in the Digital Humanities. It has been shown in studies that graduate teaching assistants often outperform faculty in the classroom, and I can confirm this anecdotally from listening to colleagues recount their course evaluations. What concerns me most is what we are incentivizing in the academy. I feel like we are incentivizing a place of competition where the single-authored monograph is the gold standard of intellectual activity. When we read about data papers as we examined humanities journals such as The Journal of Cultural Analytics, I was thrilled to see datasets published and professionalized online. This helped me envision co-authoring a data paper on Early Modern Care with my colleague, Claire Richie, to publish on a similar platform. Yet, from our class discussion, I was shocked by how collaborative papers are counted in terms of tenure review. This makes me think about how I value my work on this dataset and my role as a graduate student:

Graduate students in the humanities also need to be trained to write and publish critical, interpretive work based on DH projects.⁴

Working on this dataset has shown me that just because something may not be valued by traditional streams of reward does not mean that it is not still worthwhile. Honestly, collecting this data was almost effortless for me. It was a task that I did for fun to destress rather than a class assignment. The reason that the dataset is only 1000 points is that we conceptually ran out of keywords to scrape. I would have kept scraping, but Claire and I need to conceptually regroup before doing so. The value of this data-work for me blends into the personal and is not simply motivated by academic reward.

Throughout ENG 612, we have discussed the labor and intellectual work that goes into creating a dataset. However, through collaboration and strong advising, this dataset has felt far from laborious. The most laborious part of the dataset is the documentation of our curatorial choices for reproducibility and transparency. This has helped me think about how I make choices in life. Perhaps this is too meta or beyond the scope of this critical introduction, but it feels like one can go through life making a series of choices each day and not think about why they made those choices. I am certainly guilty of that. However, working on this dataset has made me understand the choice behind more “scientific” work or perhaps “big dick data.” This attention to choice-making makes me a more ethical scholar and a more ethical human.

The ethics of data strikes me. This dataset that we are working on has helped me put a number on the percentage of women publishing in early modern England. I have learned to assert to others that women were in fact writing and publishing during the time of Shakespeare. I thought the patriarchal bias of the canon was to blame for these voices getting excluded, but that is only one of the factors. A major reason that we rarely read and teach early modern women’s writing is that, in comparison, women’s writing is only a small fraction of what exists in the archives (I estimate only 5 percent). This is why this database matters. For me, it foregrounds women’s writing and situates it in the context of the male authors publishing on similar subjects in a way that no other early modern DH project has. Unlike the Pulter Project, which focuses on one text by Hester Pulter, or Women Writers Online, which creates an archive of texts by women, Early Modern Care contextualizes women’s writing on caretaking. It does not only foreground the “treasures” of early modern print on caretaking but rather reimagines the archive. Andrew Prescott and Lorna Hughes give good advice in their article “Why Do We Digitize? The Case for Slow Digitization” when they write, “There is a risk that digitization programs, by focusing on making “treasures” more widely available, will reinforce existing cultural stereotypes and canonicities.”⁵ We at Early Modern Care are trying to rebuild the canon to highlight the work of women in holistic embodied care. This is what fascinates me most about the work that we are trying to do.

Personal Reflection - Claire

As we developed our dataset, there were several considerations that were at the forefront of my work on this project. First and foremost, both Kate and I placed value on the transparency and the replicability of our data collection processes. While scraping the data for our initial dataset was a fairly straightforward process once we determined the most efficient method, it was important to us to be able to justify and document these choices. From choosing the metadata fields to strategically selecting our initial keyword searches, we prioritized having a record of each of the curation choices we made. Which of the metadata fields scraped by Zotero were worth keeping? And what fields would we have to add manually? What seemed intuitive to me as a researcher was not intuitive from a technical perspective or from a DH perspective. Parsing the fields to determine which metadata was there because it was available for our program to take and what would be important to include for researchers was difficult. What if we were eliminating something important? Ultimately, asking ourselves ‘why?’ for every decision we made was the most effective way to counteract this worry. If we could not come up with a reason, we could reconsider the decision. It was also important that I keep in mind that the decisions we made when curating our metadata, no matter how logical they seemed, could not be neutral. As pointed out in Chapter 6 of Data Feminism, no dataset is free of bias. “Rather than seeing knowledge artifacts, like datasets, as raw input that can be simply fed into a statistical analysis or data visualization, a feminist approach insists on connecting data back to the context in which they were produced.”⁶ It is important for us to consider the perspective we ourselves are projecting on the dataset, as well as the context from which the data came. We must consider the affordances and limitations of our database sources, namely the ESTC, EEBO, and Women Writers Online. 
Their data collection processes will affect our data collection processes. As much as we would like to think that our data is ‘raw,’ even the metadata fields these catalogs include ‘cook’ the data. There are hundreds of years of historical bias to consider in the case of Early Modern Care as well. The very nature of the dataset limits it. It will be a dataset that contains most of the known texts that have survived long enough to be cataloged by various projects. It will never accurately communicate the whole story of Early Modern caregiving, and the conclusions researchers may reach through using it will be reasonable assumptions at best. This is why I have found working through the context and documenting our decision-making processes to be so important. As articulated by D’Ignazio and Klein: “This context allows us, as data scientists, to better understand any functional limitations of the data and any associated ethical obligations, as well as how the power and privilege that contributed to their making may be obscuring the truth.”⁷ Our aim is to bring caregiving practices to light, and to do that we need to prioritize our own limitations and biases, as well as those of the data we have to work with.

Personally, having the model of the Data-Sitters Club website was an immense help. The way the project was documented provided a guideline for transparent practices while also providing practical help and guidance for our future work on the blog that will accompany the database. One of the barriers I faced when beginning work on this dataset was a lack of technical knowledge that made the scraping process seem daunting and the metadata selection process seem intimidating. I found DSC #3, “The Truth About Digital Humanities Collaborations (and Textual Variants!),” enlightening on this point. Maria’s journey to realizing her own importance in the DSC project, without as extensive a technical background as her team, mirrors my own. In our partnership, I was the one without a significant DH background, and I felt that made me the less important collaborator. But as Maria explains in her post, “It’s important for digital humanities teams to foreground this “both-and” (“yes, and”?) approach, from forming research groups that meaningfully include both digital and disciplinary experts to making sure that each member knows their contributions are essential to the project.”⁸ What I did bring to the project was a conceptual background in Early Modern studies, and I came to understand that this expertise was important, even if it was not yet accompanied by great technical skill. Even though we are not yet at the stage of performing analysis on our corpus, I was able to consider what aspects of the dataset would be most helpful to researchers who would use it in the future and where it sits now in the available scholarship on the topic of Early Modern caregiving. Even with the necessary caveats outlined above, I still have reasonable confidence that our dataset is thorough, replicable, and helpful to audiences in Early Modern studies and beyond.

1. Julia Flanders and Trevor Muñoz, “An Introduction to Humanities Data Curation,” Digital Humanities Data Curation, https://archive.mith.umd.edu/dhcuration-guide/guide.dhcuration.org/glossary/intro/index.html
2. Katie Rawson and Trevor Muñoz, “Against Cleaning,” in Debates in the Digital Humanities 2019 (Minneapolis: University of Minnesota Press, 2019), https://dhdebates.gc.cuny.edu/read/untitled-f2acf72c-a469-49d8-be35-67f9ac1e3a60/section/07154de9-4903-428e-9c61-7a92a6f22e51#ch23
3. Mary Fissell and Elaine Leong, “Reading Early Medicine (beta),” accessed May 4, 2022, https://reademed.mpiwg-berlin.mpg.de/
4. Rachel Mann, “Paid to Do but Not to Think: Reevaluating the Role of Graduate Student Collaborators,” Debates in the Digital Humanities (Minneapolis: University of Minnesota Press, 2019), https://dhdebates.gc.cuny.edu/read/untitled-f2acf72c-a469-49d8-be35-67f9ac1e3a60/section/ea501a60-dd3c-4c22-a942-3d890c3a1e72
5. Andrew Prescott and Lorna Hughes, “Why Do We Digitize? The Case for Slow Digitization,” Archive Journal (September 2018), http://www.archivejournal.net/essays/why-do-we-digitize-the-case-for-slow-digitization/
6. Catherine D’Ignazio and Lauren F. Klein, Data Feminism (Cambridge: The MIT Press, 2020), 152-153.
7. D’Ignazio and Klein, Data Feminism, 153.
8. Maria Sachiko Cecire, “DSC #3: The Truth About Digital Humanities Collaborations (and Textual Variants!),” The Data-Sitters Club, January 10, 2020, https://datasittersclub.github.io/site/dsc3.html

ENG 612 Final Project Abstract

2022-04-05T00:00:00+00:00

Team Members: Claire Richie and Kate Albrecht

Title of Dataset: Early Modern Care

Description of Dataset: Our data will contain bibliographic information from a carefully curated conceptual collection of entries from the English Short Title Catalogue (ESTC) related to the act of caregiving. We will be using the date range of the ESTC (1473-1800) as the date range for our own dataset. We have begun collecting our data points by scraping the ESTC (ten entries at a time) using Zotero. Using a small corpus of women’s early print texts from the 1500s-1700s, we have collected pivotal keywords the ESTC creators have attributed to this women’s writing. Then, we narrowed our keywords to a select list that contributes to our definition of Early Modern Care. For the purpose of this smaller, prototype dataset, we have chosen 11 keywords that are scrapable in one sitting (fewer than 200 datapoints each). Overall, we aim to scrape around 1000 texts under these 11 keywords for the purpose of this assignment. Choosing to scrape data already categorized by the ESTC makes our results more reproducible, as these categories are pre-defined by the catalog.

Why we have chosen these keywords as “Early Modern Care” is a little more nuanced. Cura personalis, care of the whole person (body, mind, and spirit), gives us boundaries for what we include in our conception of care. Moving beyond traditional professions and actions of caretaking, we have chosen keywords that focus on the emotions, networks, and relationships of care. Ultimately, we would like to think about the household as a body and a body full of networks of care. In the early modern period, the body was intimately linked to its environment, its community, divinity, and its nourishment. By focusing on extended networks of caregiving and its practices, we can avoid imposing a modern, anachronistic definition of care on these early modern texts. As we continue to refine this interlocking definition of care, we will create a comprehensive list of categories for which we will scrape the ESTC.

Metadata Fields:

A. ESTC identification number: An identification number attached to each text when scraped from the ESTC.

B. Materiality of the text: The format of the text (most often “book”).

C. Author name: Name of the author(s), if applicable or known.

D. Full Title: The unabbreviated title of the original print text.

E. Publication Date: Date of publication (usually the year)

F. Page numbers: The number of pages in a complete copy of the specific edition

G. Gender of Author: Male, female, or unknown–gender of the author(s)

H. Keyword Search from ESTC Subject headings: The main keyword used to scrape the datapoint

I. Shortened Title: The ESTC’s abbreviated title

J. Printer: The printer of the edition

K. Publication Place: Place of publication, most often London

L. URL of ESTC: ESTC URL

M. Subject Keywords as defined by ESTC: Sum of all keywords attached to the datapoint by ESTC creators (will need to be cleaned and regularized by regular expressions at a future date)

Further metadata fields will be added for the number of extant copies and the repositories at which these extant copies are housed. This will most likely need to be done manually depending on whether we can determine a scraping method (Python or other script) that would do this for us.

Audience(s) for Dataset:

Early Modern scholars of book history
Scholars of caretaking or the medical humanities more broadly
Historians of the premodern periods
Interested people who don’t know a lot about early print texts and want a smaller, more manageable dataset to explore (who may be intimidated by the ESTC and its sometimes fiddly interface) as well as a popular audience who may be casually interested in this topic.
Educators and scholars of digital humanities who may want to work with a complete dataset
Kate and Claire

Questions for the Dataset

Roughly what percentage of surviving and cataloged texts from Early Modern England (1500-1800) were authored by women?
What were women writing about and/or allowed to print about?
How many editions of a particular text on caretaking survive and are cataloged?
How are keywords overlapping across texts? How can a certain text/edition/datapoint touch on different types of care?
How did different definitions of caregiving interact and overlap with each other?
What printers were publishing books on care?
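Several of these questions reduce to simple tallies over the metadata columns. The Python sketch below shows one possible way to compute the share of each author gender and the number of texts per printer, assuming the dataset has been loaded as a list of dictionaries. The column labels `Gender of Author` and `Printer` follow our codebook, but the function names are hypothetical, and the sample rows are invented purely to show the calculation.

```python
from collections import Counter


def gender_share(rows):
    """Percentage of rows per author gender, rounded to one decimal place."""
    counts = Counter(row.get("Gender of Author") or "unknown" for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gender: round(100 * n / total, 1) for gender, n in counts.items()}


def printer_counts(rows):
    """Number of rows attributed to each printer, ignoring blank cells."""
    return Counter(row["Printer"] for row in rows if row.get("Printer"))


# Invented sample rows, not taken from the dataset.
sample = [
    {"Gender of Author": "female", "Printer": "Printer A"},
    {"Gender of Author": "male", "Printer": "Printer A"},
    {"Gender of Author": "male", "Printer": "Printer B"},
    {"Gender of Author": "male", "Printer": ""},
]
print(gender_share(sample))
print(printer_counts(sample).most_common(1))
```

Because multi-author texts with at least one female author are coded as female, any percentage computed this way reflects our coding rule as well as the underlying catalogue records.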

Related Projects:

Reading Early Medicine

The Making and Knowing Project

Manuscript Cookbooks Survey

The Reading Early Medicine project is in some ways very similar to ours, especially in methodology. It also utilized the ESTC to create a database of texts on a certain topic, in this case Early Modern medicine. We have modeled some of our methodology on this project, with some deviations. Our scraping approach is more standardized, as Reading Early Medicine relied on keyword searches rather than utilizing the premade categories of the ESTC. This will hopefully make our project more reproducible. The content also differs. While some texts will overlap with Reading Early Medicine, we are not subscribing to a strictly medical definition of care. Not all of Reading Early Medicine’s texts will appear in our dataset, and we provide an expanded definition of medical and caregiving practice that also includes household and spiritual texts, among others.

The two other projects related to ours I have selected are projects involving manuscripts, and they interact with our project in two different ways. The Making and Knowing Project is a project out of Columbia University that seeks to understand the material and practical reality of the recipes in one particular manuscript. Their resource has documented the detailed process as their team works through the practical application of each recipe, providing teaching resources, illustrations, documentation of the process, and a digital edition of the manuscript in question. While this differs from our methods, focusing on one text rather than taking a wide view of these kinds of texts, this project is related to ours through the practical measures of understanding caregiving. Essentially the other side of our project’s coin, the Making and Knowing Project explores how texts of care might have actually been implemented on a personal level, while ours examines a larger culture surrounding networks of care.

The Manuscript Cookbooks Survey is an exciting project that attempts something similar to our dataset, except with manuscript recipe books. Their database contains the manuscript cookbooks produced prior to 1865 that are held at US repositories. It is searchable by institution, period, and keyword, allowing scholars to find exactly where these resources are held and understand the nature of these collections. This project has aims similar to ours in that it provides knowledge of different texts and where they are held, so scholars can utilize these resources and understand a more complex culture of manuscript recipe writing. Our project differs in that we have cast a wider net with genre and are primarily focusing on print texts for the time being, but this database is also intimately connected with the culture of care we hope to capture with our dataset.

ENG 612 Lab Report 6

2022-03-08T00:00:00+00:00

Dataset Analysis

Link to The Reading Early Medicine Description

I selected a dataset from the Reading Early Medicine Project to analyze for this lab. The Reading Early Medicine project has gathered records from the English Short Title Catalogue (ESTC) of texts that have to do with health, healing, and medicine printed before 1700. The project has made the records available in a searchable database, as well as available for download in various CSV files. I have selected the ‘Remedies, Multiple’ dataset to use as my primary dataset for this lab since it contains the largest number of texts, but the datasets for other genres and topics are formatted in the same way, so my analysis can apply to the project as a whole.

One of the strengths of this dataset is simply that it exists. That all of these texts are now located in one central database, when finding them would otherwise take hours of searching by different keywords and authors, is incredibly helpful for researchers in a variety of disciplines. The thorough nature of the search and dataset building process is one of its great strengths. The project’s website describes their search process as including a variety of keywords that have been truncated to allow for multiple spellings and versions, such as “medic*.” Some of these keywords might be intuitive even for a non-scholarly researcher, but many of them require period-specific and genre-specific knowledge. The inclusion of works by classical authors like Hippocrates and Galen that had been reprinted in the early modern period was also helpful, as ancient works reprinted may not share many of the keywords that were part of the original search terms. The project directors were incredibly thorough as they built the database. I am reminded of the Data-Sitters Club article that discussed that members of DH projects with background knowledge are just as important to the success of projects as those with technical skills.¹ The knowledge of early printed texts and the language of medicine is crucial to building a resource like this one. The website lists a separate IT team that built the infrastructure of the site and database, while the topic-specific work has been left to those with more topical knowledge.

The structure of the multiple datasets available for download enables the user to select the specific genre or topic that might be relevant to them. One minor oversight is that the entire corpus of texts is not available for download; only the datasets organized by particular genres or topics are available. It is difficult to get a sense of the entire corpus this way, as it would take downloading the CSV for each genre to have access to the entire corpus of texts. Apart from this technical oversight, there is a lot of value in the genre/topic model under which the dataset operates. Both genre and topic categories are defined on the website for any researchers who may not be familiar with genres from the early modern period and to clarify the topic labels imposed by the project directors. Genres were fairly defined terms within the early modern period, but one genre might cover multiple topics. Researchers who may seek to get a sense of the genre traditions in early medical texts can fairly easily seek out certain genres and cross reference the topics that appear in them. The datasets are presented in a way that is easily used in conjunction with visualization tools like Voyant that would, in a preliminary search, give a broad view of a certain genre or topic. This could point researchers to specific texts for close readings and deeper research on their own using Early English Books Online (EEBO) or in-person archival work. The dataset I am focusing on is the topic of “remedies, multiple” and this topic appears alongside other topics in books categorized by genre. For instance, the text The Plagues Approved Physitian is listed under the “plague tract” genre, but it contains the topics of “remedies, multiple,” “disease, single,” and “plague.” Listings like this one could help answer research questions about how many plague tracts contained remedies for the plague versus solely information about the plague.

The database is also searchable by printer and stationer street names, which gives researchers even more opportunity to track down specific data relevant to their topic of interest. However, while the database contains this field, the CSVs do not. Including this information on the downloadable datasets would increase what researchers can accomplish with the datasets they are given. It is possible to browse the location labels and printer names on the website itself, but having that data in hand would allow researchers to utilize that information for visualizations.

While most of the data has an excellent editorial rationale on the website, there are a few elements of the dataset that I have questions about regarding the decision-making process of inclusion and exclusion. The date range is the most pressing of those in my mind. Mary Fissell’s original date range extended from 1640-1800, the date range covered in her monograph. However, when the concept became a database, Leong and Fissell moved the range to 1480-1700 to encompass ‘early medicine.’ The early modern period has long had fuzzy boundaries that occasionally bleed into the period of the restoration and the long eighteenth century. The tradition of early medicine extends into eighteenth century texts, so I would appreciate more documentation about this decision. More texts could easily be added, as the ESTC’s range extends to 1800.

I am excited to see the next stages of this project, as the website lists that the next description, author’s occupation, is currently under construction. Integrating more information about the authors into the database would be immensely helpful when considering questions of authority in early medical writings. This is particularly relevant for books authored by women, for which there is already a separate CSV available. The author occupation category can guide researchers towards further close readings of paratexts to determine how authors of early medical texts figured their authority in relationship to their occupation. Syllabi incorporating the REM project are also ‘coming soon’ and there are already multiple suggested teaching activities that would integrate the database into the university classroom. While the existence of this information combined in a searchable database is in itself an immensely helpful resource, the downloadable CSVs, pedagogical materials, and fairly extensive rationale make the project useful across multiple contexts.

DSC #3: The Truth About Digital Humanities Collaborations (and Textual Variants!) https://datasittersclub.github.io/site/dsc3.html ↩

ENG 612 Lab Report 5

2022-02-27T00:00:00+00:00

Answers to Lab 5 Questions

Question 1: This function has split the individual words in the string and separated them by commas, but it did the same thing to the “s” that was part of the contraction “let’s.” “\W” matches any character which is not a word character, which includes the apostrophe in contractions. “+” matches one or more repetitions of the expression that precedes it. So, as well as splitting the words separated by commas and spaces in the string, the regular expression also splits the contraction and treats it as two words.
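The behaviour described in this answer can be reproduced in a few lines of Python. This is only a minimal sketch: the sample sentence is made up for illustration and is not the string from the lab.

```python
import re

# Hypothetical sample string (not the one used in the lab).
sample = "let's go, now."

# \W+ matches one or more non-word characters, so the apostrophe
# splits "let's" just as the comma and the space split the other words.
tokens = re.split(r"\W+", sample)
print(tokens)
```

Because the sample ends with a non-word character (the full stop), the resulting list also ends with an empty string.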

Question 2: I used the chorus of Bruce Springsteen’s “Dancing in the Dark” as my text:

You can’t start a fire You can’t start a fire without a spark. This gun’s for hire Even if we’re just dancin’ in the dark.

And the function returned the following: [‘you’, ‘can’, ‘t’, ‘start’, ‘a’, ‘fire’, ‘you’, ‘can’, ‘t’, ‘start’, ‘a’, ‘fire’, ‘without’, ‘a’, ‘spark’, ‘this’, ‘gun’, ‘s’, ‘for’, ‘hire’, ‘even’, ‘if’, ‘we’, ‘re’, ‘just’, ‘dancin’, ‘in’, ‘the’, ‘dark’, ‘’] I intentionally chose a string with a lot of non-word characters to see how it would break down, and it pretty much did what I expected. I now want to use my regular expression knowledge to write a function that would effectively preserve the syntax of the lyrics!
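One possible way to keep contractions together, sketched under my own assumptions rather than taken from the lab: instead of splitting on non-word characters, find runs of word characters that may contain (or end in) an apostrophe. The names `APOSTROPHES`, `TOKEN`, and `tokenize_keep_contractions` are hypothetical.

```python
import re

# Straight and curly apostrophes, since lyrics are often typed with either.
APOSTROPHES = "'\u2019"

# Word characters, optionally followed by apostrophe + more word characters
# (can't, we're), and optionally ending in an apostrophe (dancin').
TOKEN = re.compile(rf"\w+(?:[{APOSTROPHES}]\w+)*[{APOSTROPHES}]?")

def tokenize_keep_contractions(text):
    """Return lowercase tokens without splitting on apostrophes."""
    return TOKEN.findall(text.lower())

lyric = "You can't start a fire without a spark. Even if we're just dancin' in the dark."
tokens = tokenize_keep_contractions(lyric)
print(tokens)
```

A known limitation of this sketch is that a closing single quotation mark directly after a word would be kept as if it were an apostrophe.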

Question 3: The dataframe shows some pretty clear patterns that are also reflected in the graph of the entire dataset we looked at earlier. All but three of the texts that use “virtu*” date from before 1650, and the results show that pretty overwhelmingly “virtu” and “vertu” are not used together, aside from the Erasmus text. Based on what we discussed in class, it makes sense that that text might contain Latin.

Question 4: The caret in the regular expression package matches the start of a string. So, it looks like this regular expression is looking for dates that begin with “20,” which would match years of publication in the 2000s. It looks like if the first date value lists a date from the 2000s (too modern to be on EEBO), this expression is telling the computer to use the second date value instead, which is hopefully the correct one.
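My reading of that logic can be sketched as follows. The lab's actual code is not reproduced here, so the function name `pick_date` and its two arguments are assumptions made for illustration.

```python
import re

def pick_date(first, second):
    """Return `second` when `first` starts with "20", otherwise `first`."""
    # ^ anchors the pattern to the start of the string, so only dates
    # that begin with "20" (for example "2003") are treated as too modern.
    if re.search(r"^20", first):
        return second
    return first

print(pick_date("2003", "1641"))
print(pick_date("1620", "2003"))
```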

A Reflection on Working with Python

I admittedly came into this lab a little deflated from the bursting of my aspirational-Python-bubble after reading DSC #12: The DSC and the New Programming Language. I had realized I was not going to be able to Duolingo my way to Python literacy with 15 minutes of practice each day, even with a digital humanist’s version of a helpful green owl (i.e. a textbook.) I wanted to learn Python in case I eventually had a research question about a dataset that I could use Python to answer, but the DSC made it abundantly clear that learning bits of code to just have in my back pocket is not terribly feasible, especially given the time constraints of academic work. Just looking at the function and all of the tiny, specific, details of it made me feel intimidated, not because I did not think I could do it, but because I realized just how much time and brainpower it would take if I wanted to get to a level of comprehension of it.

That being said, I did get through it. It took reading it through a couple of times to understand what exactly each command was doing, and there was plenty of syntax I had to put in the “Google later” folder in my mind, but I felt confident that I basically understood how the function got from Point A to Point B in the most rudimentary fashion. Benjamin Schmidt’s “Do Humanists Need to Understand Algorithms?” was helpful for perspective on this. While I did not fully understand the nuances of the code that was being run, I understood the transformations it was causing as it processed the data. Schmidt’s argument that “a transformation expresses a coherent goal that can be understood independently of the algorithm that produces it” makes a lot of sense in this context.¹ The chart that was produced at the end of the lab makes sense without knowing exactly how we got there.

In theory, I know that as a humanities scholar I could probably ‘get away with’ not knowing every piece of Python known to man and machine. And that is no longer my goal. But I do want to learn enough that I am familiar enough with the process that I can look at a problem and visualize how to get to a solution via coding. Like I mentioned in my first post for this class, I want to have enough DH literacy that I can understand what went on behind my favorite projects. I want to understand what Python can do enough that I can look at my own projects and have an idea how to answer my research questions via Python. Schmidt’s argument that no, humanists do not need to understand algorithms, makes sense in the context of this lab. However, I believe that humanists cannot go wrong by understanding the construction of codes, functions, and programs that produce the transformations we desire. So, I still want to learn Python. I just need a research question, a dataset, and some free time.

Lab 6 Dataset

The dataset I plan to work with for Lab 6 is from the Reading Early Medicine project. This project searched the English Short Title Catalogue (ESTC) for texts about health and healing and coded each entry with information about title, genre, and author. Their data is available for download as CSV files. I chose the category “Remedies, multiple” because it was the largest dataset, containing over seven hundred entries. The project actually encourages others to download and work with their data and even includes links to data visualization tools like Voyant so users can create visualizations of their own with the REM data.

Schmidt, Benjamin. “Do Humanists Need to Understand Algorithms?” Debates in the Digital Humanities 2016, https://dhdebates.gc.cuny.edu/read/untitled/section/557c453b-4abb-48ce-8c38-a77e24d3f0bd#ch48 ↩

ENG 612 Lab Report 4

2022-02-20T00:00:00+00:00

Answers to Lab 4 Questions

Question 1: Including stop words in the word cloud ultimately clogs up the visualization with words common across most writing in English. Words that aren’t necessarily “important” or that have particular specific meaning replace other unique words. Some of the most common words that weren’t considered stop words still appear, but are much smaller. We get less of a sense of the actual content of the articles.

Question 2: Raw frequency doesn’t take into account length of articles where certain words may be repeated. The rate at which a word is used is more telling than just noting that the word “students” is used more in a 2000-word article than in a 200-word article. Relative frequency displays not just the total number of occurrences, but where words are being used consistently in one context. We can count clusters of words rather than terms distributed through documents. It can show us where words are important, not just their appearance across an entire corpus.
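A small worked example may make the difference concrete. The counts below are invented for illustration, and the function name `relative_frequency` is hypothetical; Voyant's own calculation may be scaled differently.

```python
def relative_frequency(count, total_words, per=1000):
    """Occurrences of a word per `per` words of text."""
    return count * per / total_words

# Invented numbers: the long article has the higher raw count...
long_article = relative_frequency(10, 2000)
# ...but the short article uses the word at a much higher rate.
short_article = relative_frequency(4, 200)

print(long_article, short_article)
```

Here the raw counts (10 versus 4) favor the longer article, while the rates per thousand words favor the shorter one.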

Question 3: The document is an article of complaint, an open letter about cuts to an honors program at the Morrissey College of Arts and Sciences, abbreviated MCAS. All of the appearances of “mcas” are located in this singular document. Most of them accompany the names of future graduates of the college in a list format that contributes to the petition of the open letter. The number of appearances of “mcas,” read in context, tells us that for the questions we may want to answer about how the humanities are written about in news articles, this term is not as significant as the statistics may suggest. While these metrics might not always show us answers to our questions, they can help us eliminate what is not relevant in context and lead us to new research questions. (For instance, Rachel suggested in class that this might lead us to investigate the genre of complaint letters in the humanities.)

Question 4: Some of the terms that are more associated with the sciences than the humanities are perhaps obvious. “Engineering” is more common in the science corpus, as is “technology.” These are terms that are more commonly associated with the scientific discipline. “New” is also more common to the science corpus, which seems to reflect the idea that the sciences are perhaps more innovative than the humanities. (This idea should perhaps come under a bit more scrutiny.) The humanities corpus reflects some discipline-specific vocabulary as well: “history,” “arts,” and “English” are all terms that are more related to the humanities content. The terms also seem to reflect a more institution-focused theme in the humanities content. “Majors,” “class,” “academic,” and “school” are all more common across the humanities corpus. Interestingly, or perhaps infuriatingly, terms like “study” and “research” are more common across the science corpus. This would seem to suggest that institutional concerns are more common in the humanities articles (perhaps related to the trend of funding cuts from the previous question) while issues of actual work, research, and study in the sciences are at the forefront of that corpus.

Question 5: I played around with a couple of different tools including Bubbles, TextualArc, and finally WordTree. I found Bubbles and TextualArc, both intended to visualize keywords and their relationships to the text, to be engaging as they were both animated tools. However, I was put off, as following their visualizations was not necessarily intuitive to me. The WordTree tool was very clear and created a simple diagram that showed a keyword and its associations to other words. It is essentially a breakdown of the Contexts tool, but it shows multiple sentence structures. What I had hoped to see was the most common words associated with the keyword (I chose “humanities”) and the ways sentences were structured around that word. However, the Help page for the WordTree tool does include a disclaimer that “the branches shown are not necessarily based on frequency.” The available information is interesting, and could potentially be fruitful in a more linguistic analysis of the corpus. However, the lack of specificity in how the WordTree terms are being selected might not give a full picture of how the terms are being used.

A Reflection on Exploratory Data Analysis

One of the things I have been enjoying about the Data-Sitters Club is their exploratory approach to their project. It’s the approach I try to take with most of my work, both in DH and literary studies, but I did not really have a label for it until now. In “DSC #6: Voyant’s Big Day,” the question that Katherine Bowers asks is “what can Voyant tell us about the BSC slang?” This question really lets the tool and the data do the talking. Of course, this comes with challenges because when you let the tool do the talking, the tool can create its own biases. However, rather than hinging her research question on her own memory or reading of the slang in the Baby-Sitters Club books, Bowers allows the results of this analysis to lead her to each next question. Taking a similar approach with Voyant for this lab allowed me to both get to know the tool better and ask more specific questions of the data based on the tools we used for each tab. I keep coming back to the trial and error theme. For each new tool and task, I have tried and failed multiple times before getting to a satisfying end goal. I have been required to reframe my goals and ask new questions for each step of the lab. Not taking results at face value and continuing to question and try again is really essential to the exploratory work we have been doing.

As mentioned above, letting the tools lead the way can be extraordinarily beneficial for an exploratory approach, but it also feels dangerously close to “the numbers speak for themselves,” which Chapter 6 of Data Feminism demonstrated is not at all true.¹ Oftentimes, the numbers actually say more about their context than the realities they are meant to represent. I learned a lesson about making assumptions in data analysis in my exploration of the WordTree tool. Before I read a more detailed description of the tool, I made the assumption that the visualization would naturally include the most common associated words in the tree for each keyword. This was not the case, and the algorithm or system by which the associated words were decided is not given. I made an assumption that could have seriously threatened my analysis had I been doing significant work with this particular tool. While not as high stakes as the examples discussed by D’Ignazio and Klein in their book, this was a clear example of both the need to gather context and how the presentation of the data skews how it is read.

6. The Numbers Don’t Speak for Themselves. (2020). In Data Feminism. Retrieved from https://data-feminism.mitpress.mit.edu/pub/czq9dfs5 ↩

ENG 612 Lab Report 3

2022-02-14T00:00:00+00:00

Data Cleaning: Before and After

The dataset we tackled for this lab was, on the surface, relatively uniform. Unfortunately, the more I looked at it, the more irregularities I noticed. I was hoping to construct a regular expression that would tackle most of my needs in one go, but I quickly realized that was not at all feasible. Below I have tracked the regular expressions and substitutions that I used to reach the clean dataset at the end. On the whole, this took me over two hours with many trials and errors.

Beginning Dataset:

The Epoch Times, New York ed.; New York (NY)

“La Voz Bilingüe”; Denver, Colo.

Jewish Advocate; Boston

Washington Informer; Washington, [D.C.]

News from Indian Country; Hayward, WI.?

Afro - American, 5 Star edition; Baltimore, Md.

Diverse Issues in Higher Education; Fairfax Virginia

The Gay & Lesbian Review Worldwide; Boston, MA

“The Hispanic Outlook in Higher Education; [Paramus N.J

Regex: “

Result: Removed all quotation marks (as shown in lab instructions)

Regex: [(),\?.\[\]]

Result: Removed all special characters in the expression from the dataset

The Epoch Times New York ed; New York NY

La Voz Bilingüe; Denver Colo

Jewish Advocate; Boston

Washington Informer; Washington DC

News from Indian Country; Hayward WI

Afro - American 5 Star edition; Baltimore Md

Diverse Issues in Higher Education; Fairfax Virginia

The Gay & Lesbian Review Worldwide; Boston MA

The Hispanic Outlook in Higher Education; Paramus NJ

Regex: (Denver [A-Za-z]{2,8})

Substitution: Denver CO

Result: Substituted all words after “Denver” between 2-8 letters (CO-Colorado) with “Denver CO”

The Epoch Times New York ed; New York NY

La Voz Bilingüe; Denver CO

Jewish Advocate; Boston

Washington Informer; Washington DC

News from Indian Country; Hayward WI

Afro - American 5 Star edition; Baltimore Md

Diverse Issues in Higher Education; Fairfax Virginia

The Gay & Lesbian Review Worldwide; Boston MA

The Hispanic Outlook in Higher Education; Paramus NJ

Regex: (Baltimore [A-Za-z]{2,8})

Substitution: Baltimore MD

Result: Substituted all words after “Baltimore” between 2-8 letters (Md-Maryland) with “Baltimore MD”

The Epoch Times New York ed; New York NY

La Voz Bilingüe; Denver CO

Jewish Advocate; Boston

Washington Informer; Washington DC

News from Indian Country; Hayward WI

Afro - American 5 Star edition; Baltimore MD

Diverse Issues in Higher Education; Fairfax Virginia

The Gay & Lesbian Review Worldwide; Boston MA

The Hispanic Outlook in Higher Education; Paramus NJ

Regex: (Fairfax [A-Za-z]{2,8})

Substitution: Fairfax VA

Result: Substituted all words after “Fairfax” between 2-8 letters (VA-Virginia) with “Fairfax VA”

The Epoch Times New York ed; New York NY

La Voz Bilingüe; Denver CO

Jewish Advocate; Boston

Washington Informer; Washington DC

News from Indian Country; Hayward WI

Afro - American 5 Star edition; Baltimore MD

Diverse Issues in Higher Education; Fairfax VA

The Gay & Lesbian Review Worldwide; Boston MA

The Hispanic Outlook in Higher Education; Paramus NJ

Regex: (Boston\n)

Substitution: Boston MA\n

Result: Substituted all instances of “Boston” followed by a new line with “Boston MA” followed by a new line.

The Epoch Times New York ed; New York NY

La Voz Bilingüe; Denver CO

Jewish Advocate; Boston MA

Washington Informer; Washington DC

News from Indian Country; Hayward WI

Afro - American 5 Star edition; Baltimore MD

Diverse Issues in Higher Education; Fairfax VA

The Gay & Lesbian Review Worldwide; Boston MA

The Hispanic Outlook in Higher Education; Paramus NJ

Regex: (&amp;)

Substitution: &

Result: HTML coding for ampersand replaced with ampersand character

The Epoch Times New York ed; New York NY

La Voz Bilingüe; Denver CO

Jewish Advocate; Boston MA

Washington Informer; Washington DC

News from Indian Country; Hayward WI

Afro - American 5 Star edition; Baltimore MD

Diverse Issues in Higher Education; Fairfax VA

The Gay & Lesbian Review Worldwide; Boston MA

The Hispanic Outlook in Higher Education; Paramus NJ

Regex: ;

Substitution: ,

Result: Semicolons replaced with commas

The Epoch Times New York ed, New York NY

La Voz Bilingüe, Denver CO

Jewish Advocate, Boston MA

Washington Informer, Washington DC

News from Indian Country, Hayward WI

Afro - American 5 Star edition, Baltimore MD

Diverse Issues in Higher Education, Fairfax VA

The Gay & Lesbian Review Worldwide, Boston MA

The Hispanic Outlook in Higher Education, Paramus NJ
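The whole sequence of substitutions above can also be sketched as a short Python script. This is a reconstruction under a few stated assumptions, not the tool used in the lab: it assumes each record sits on its own line, that the raw data contained the HTML entity `&amp;` before the ampersand step, that both opening and closing curly quotation marks are removed in the first step, and that the square brackets are escaped inside the character class. The names `raw`, `STEPS`, and `clean` are hypothetical.

```python
import re

# Assumed raw data: one record per line, with the HTML entity still present.
raw = """The Epoch Times, New York ed.; New York (NY)
“La Voz Bilingüe”; Denver, Colo.
Jewish Advocate; Boston
Washington Informer; Washington, [D.C.]
News from Indian Country; Hayward, WI.?
Afro - American, 5 Star edition; Baltimore, Md.
Diverse Issues in Higher Education; Fairfax Virginia
The Gay &amp; Lesbian Review Worldwide; Boston, MA
“The Hispanic Outlook in Higher Education; [Paramus N.J"""

# (pattern, replacement) pairs, applied in the order listed in the post.
STEPS = [
    (r"[“”]", ""),                                # quotation marks
    (r"[(),\?.\[\]]", ""),                        # special characters
    (r"Denver [A-Za-z]{2,8}", "Denver CO"),
    (r"Baltimore [A-Za-z]{2,8}", "Baltimore MD"),
    (r"Fairfax [A-Za-z]{2,8}", "Fairfax VA"),
    (r"Boston\n", "Boston MA\n"),                 # Boston with no state
    (r"&amp;", "&"),                              # must run before the ; step
    (r";", ","),
]

def clean(text):
    """Apply every substitution in STEPS, in order."""
    for pattern, replacement in STEPS:
        text = re.sub(pattern, replacement, text)
    return text

cleaned = clean(raw)
print(cleaned)
```

The order matters in one place: the entity `&amp;` contains a semicolon, so it has to be replaced before semicolons are turned into commas.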

A Reflection on Regular Expressions

The term “cleaning” was on my mind as I worked on this dataset. Katie Rawson and Trevor Muñoz argue effectively against the term in “Against Cleaning,” and one particular statement in the article really captured my previous attitudes towards digital humanities: “When humanities scholars recoil at data-driven research, they are often responding to the reductiveness inherent in this form of scholarship”¹. I admit that when I first started in DH I was highly skeptical of the value of some of these practices. I thought the entirety of the field was in the vein of what Rawson and Muñoz caution against in their article. I was skeptical of the notion of handing intricate humanities work over to unnuanced computers. That data could possibly even be ‘dirty’ did not sit well with me. Is not every single scrap of information we can gain from humanities data valuable to someone, in some context? This experience “cleaning” data emphasized that while it is a painstaking process, it is possible to maintain some of the humanity while working with data. Making sure to capture the nuance and to not write expressions that jeopardized information that might be useful to researchers seems to be possible. It took a lot of time and a lot of familiarity with the dataset to do so, but that extra effort is indeed worth it. Of course, even with our best efforts this is no guarantee. There were some unique aspects of the data that gave me pause (that wayward question mark that seemed to call into doubt the place of publication of Hayward, Wisconsin comes to mind), but ultimately I felt confident that though it took time, I was able to make the data processable but maintain its integrity. I actually credit some of my archival experience and learning to value original order for giving me a sort of instinct to not over-clean, so to speak.

The transparency of being able to work through the cleaning process also kept me accountable in a sense. I wanted to make sure I could explain my choices to anyone who asked and that my decisions were well thought out. I thought about implementing regular expressions that could perhaps be used on a larger dataset in a similar vein. The expressions I wrote to normalize the city and state names could potentially be used in a larger dataset, taking into account various forms that information could take. While a larger dataset will likely contain even more variations than I could imagine in my wildest dreams, I aimed to think of a model that could be used by those that work with the same data in the future. I thought of the Data-Sitters Club as my model. Besides being an exciting (and delightful) exercise in nostalgia for me, that project is exemplary of an ideal collaborative, transparent project in the humanities. Down to sharing their email correspondence in “DSC #2: Katia and the Phantom Corpus,” their transparency in their decision-making is really admirable. Though the reach of this particular post may not extend beyond ENG 612 at the University of Miami, documenting each step and each decision can help open up honest and fruitful conversations as we all learn these skills together to take into a larger DH sphere.
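One way such a reusable model might look, offered only as a sketch and using a different technique from the one-city-at-a-time expressions above: pair a single regular expression with a lookup table of state-name variants. The table `STATE_VARIANTS` is hypothetical and deliberately tiny, and the function name `normalize_state` is my own.

```python
import re

# Hypothetical lookup table: lowercase variant -> postal abbreviation.
STATE_VARIANTS = {
    "colo": "CO", "colorado": "CO", "co": "CO",
    "md": "MD", "maryland": "MD",
    "va": "VA", "virginia": "VA",
}

# Capture the last word on the line as a candidate state name.
LAST_WORD = re.compile(r"([A-Za-z]+)$")

def normalize_state(line):
    """Replace a trailing state variant with its postal abbreviation."""
    def swap(match):
        word = match.group(1)
        # Unknown words (for example a bare city name) are left untouched.
        return STATE_VARIANTS.get(word.lower(), word)
    return LAST_WORD.sub(swap, line)

print(normalize_state("La Voz Bilingüe; Denver Colo"))
print(normalize_state("Jewish Advocate; Boston"))
```

This sketch only looks at the final word, so multi-word state names and cities listed without any state would still need their own handling.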

Katie Rawson and Trevor Muñoz, “Against Cleaning,” from Debates in the Digital Humanities 2019 (2019) ↩

ENG 612 Lab Report 2

2022-02-07T00:00:00+00:00

Slave Trade Database Dataset

Post45 HathiTrust Dataset

Working with the HathiTrust dataset was initially overwhelming; the metadata categories were unfamiliar and not all of them were intuitive. Even after reading through the metadata descriptions, deciding what questions I could ask of the data and how effectively those questions could be answered took a lot of time. Like building the website last week, getting to the visualizations I ended up with took a lot of trial and error. Ultimately, I landed on two questions:

How many texts in this dataset were published in the United Kingdom and Republic of Ireland?
How many unique authors were published in each year covered by the dataset?

Initially, I tried to use the metadata category “imprint” and filter it by country and city to make this chart, which was hugely ineffective. Using the “place” field was much more effective. It took some time to track down the MARC country codes, but I was finally able to make what seems to be a fairly comprehensive chart. This visualization was the most fiddly, as after I figured out how to implement COUNTUNIQUE for my other pivot table, I was able to fairly quickly land on a visualization that would effectively display the data. I imagine this process would be at least a bit easier with data that I was more familiar with or had collected myself. This exercise really highlighted the necessity of intentionality of data collection for me. Being attuned to the questions that the data might be used to answer will help the longevity of the project.
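For readers working outside Google Sheets, the COUNTUNIQUE pivot for the second question could be approximated in Python. This is a sketch only: the rows below are invented placeholders, not records from the HathiTrust dataset, and the variable names are hypothetical.

```python
from collections import defaultdict

# Invented (year, author) rows standing in for the real dataset.
rows = [
    (1950, "Author A"),
    (1950, "Author A"),  # a second title by the same author
    (1950, "Author B"),
    (1951, "Author C"),
]

# A set keeps each author once per year, which is what COUNTUNIQUE does.
authors_by_year = defaultdict(set)
for year, author in rows:
    authors_by_year[year].add(author)

unique_counts = {year: len(names) for year, names in sorted(authors_by_year.items())}
print(unique_counts)
```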

Turning to the Trans-Atlantic Slave Trade Database data, the pivot table that troubled me most in light of our discussions and readings was the one charting African resistance. The stark category of “African resistance” marking whether or not a slave ship’s journey was disrupted by the captives or their communities removes the nuance of other kinds of resistance undertaken by enslaved people. As Chapter 4 of Data Feminism reminds us, “What Gets Counted, Counts.” In large scale records like those tracking slave voyages, individual acts of resistance are not likely to be noted. Larger-scale acts that significantly impacted the voyage itself are what get counted, eliding instances of individual resistance from the historic record. Because the emphasis of the original record keepers was on profit margins and tracking the “cargo” of human beings, it was the events that damaged the bottom line that were counted. This is where some of the issues that Johnson tackles in “Markup Bodies” come into play. Reading these numbers in the tradition of cliometrics might suggest that there were relatively few acts of African resistance during the slave trade (a mere 576 recorded instances among thousands of recorded voyages.) Johnson says of the cliometrician’s work: “Statistics on their own, enticing in their seeming neutrality, failed to address or unpack black life hidden behind the archetypes, caricatures, and nameless numbered registers of human property slave owners had left behind”¹. This visualization suffers in similar ways. Context is needed to understand the full scope of different kinds of resistance available to captives, and to avoid reducing the potential for individual actions to the recorded possibilities. Perhaps a change in the wording of the metadata category can partially accomplish this.

The aesthetics of these data visualizations also troubled me. When I made the pivot table and the pie chart visualization, the chart was first generated in bright colors. I adjusted the colors to seem less cheerful, but the problem of how to effectively visualize data like this still remains. Bright colors may be more engaging, but they may have the effect of minimizing the affective import of this data. Of course, minimizing human lives to data points is problematic in itself, and reproducing acts of agency by enslaved people as a pie chart is likely not the most effective way to communicate the importance of these actions. Of course, we are working in Google Sheets for now, and there are other options to visualize this data. Data visceralization may be an alternative to the route of straightforward charts and graphs, but retraumatizing participants in such an experience is also a risk. The Trans-Atlantic Slave Trade Database itself has done an excellent job of adding dimension to their data and adding context that does not reduce Black life to data points, but there are still many conversations to be had once that data is taken out of the context of the original site.

Works Consulted

D’Ignazio, Catherine, and Lauren F. Klein. Data Feminism. MIT Press, 2020.

Johnson, Jessica Marie. “Markup Bodies: Black [Life] Studies and Slavery [Death] Studies at the Digital Crossroads.” Social Text 36, no. 4 (137) (December 2018): 57-79.

¹ Jessica Marie Johnson, “Markup Bodies: Black [Life] Studies and Slavery [Death] Studies at the Digital Crossroads,” Social Text 36, no. 4 (137) (December 2018): 61.

ENG 612 Lab Report 1

2022-01-31T00:00:00+00:00

Reflection on Lab 1

I found this process to be relatively straightforward, albeit frustrating at times when I could not do exactly what I wanted on my site. I’ve always found creating a website somewhat intimidating, but the GitHub interface and the explicit instructions made it very approachable. The most frustration I encountered was in forking the theme that I wanted. Ultimately, my first-, second-, and third-choice themes simply were not working for me, and I had to choose something that I was not particularly excited about. Not having the technical skills to troubleshoot the issues with those themes, or even to adjust the things I don’t like about the one I have gone with, is frustrating, but I hope to gain those skills in time to make my website more personalized. I have been able to adjust some things: I’ve added my social media links to the footer, changed the color scheme, and adjusted the post settings to show excerpts on the main page (with limited success, as all it shows at the moment is the header). As I continue to play around with the settings, I am sure I will pick up on how to adjust other aspects of the site as well.

After going through the steps to set up my site, I did encounter some issues as I attempted my first post. I worked on this outside of class after forking a new theme, and some of the tiny details that are essential to successfully posting escaped me while I was on my own. Not only did I forget the .md at the end of my post name, but I also entered the date incorrectly. At first the post simply did not show up, and then I got a very aggressive-looking error message. Luckily, the error messages on GitHub are quite detailed and let you know exactly where in the process the post went wrong. I was able to fix the date on the post and have it successfully show up on my actual site. The final step in this process, successfully writing in Markdown, has been an interesting experiment. It has been quite a bit of trial and error, adjusting formatting to see what the effect will be. (An earlier version of this post had the text in a strange scrolling box, and I still don’t quite know how that came about.)

Downloading GitHub Desktop and Atom was surprisingly easy, and I actually found this portion the most exciting. I feel like a proper coder with my own program with which I can edit my Markdown files. (A software engineer friend of mine jokes that for every programming language you learn, you should acquire another monitor. I’ll stick with Atom for now.) I like the color scheme on Atom, and the prompts for formatting will definitely help ease me in as I learn the language of Markdown. I expect that using this site, as well as the rest of the tools we learn this semester, will involve a lot of trial and error, as this process has, but I hope I can hang on to the feeling of excitement from seeing the changes I make on the back end of my site show up on the front end!

My Goals for This Semester

I have a few goals for this course, but the primary one is to become comfortable and literate in the tech we are working with. I want to be able to transfer the skills I gain in this course to other projects, and to do so I need to understand how to use these tools outside of the context of the classroom. I really want to get to know these skills so I can properly have them in my arsenal going forward, rather than only understand how to work with them in a controlled environment. I also want to be able to use this knowledge of coding and data to better understand other projects that I use. There has been a surge of DH projects in the field of Early Modern studies, and having this background knowledge will help me to understand the work that has already been done. I would like to be able to look at a project like RECIRC and understand some of the digital methodology behind it.

Apart from a bit of digital literacy, I would like to make some serious headway on the DH project on which I am currently collaborating with Kate Albrecht and Katie Sanford. We have a pretty decent WordPress site with integrated CSVs currently up, but I would like to take that further both aesthetically and practically. Our dataset, early modern print and manuscript works by women, is extensive and still incomplete. I want to continue adding to that dataset and start seriously thinking about everything our data can do. I want to start thinking about visualizations, processing, and otherwise putting our data to work for us. I think this course can really help us answer not only what our data can do for us, but what we can do for our data. One of my primary motivations in working on this project was increasing access to information like this. Incredible resources like Early English Books Online (EEBO) and digitized manuscript collections at institutions like the Folger Shakespeare Library have already increased our knowledge about early modern women’s writing, but there are not as many tools that actually make the raw information accessible all in one place. Something I have been thinking about since the Digital Humanities Research Institute is audience. We have envisioned a primarily scholarly audience with a secondary focus on the general public, but we have not really talked through the differing needs of those audiences. To what extent will each of those audiences be using our data, and for what purpose? How can we best set up our data to meet those needs? Providing access to this information is certainly a priority, but the organization of the data is equally important for the varying audiences we have.

Finally, pedagogy is something I have been thinking more about lately, being surrounded by the really passionate educators in our PhD program. My personal style has a tendency to be stuck in the past, but I have been pressed to update my technology usage by my undergraduate students, even if that just looks like using more digital tools in the classroom instead of the whiteboard. Not only would I eventually like to be able to impart some basic skills to my students, or at least offer them the option to work with digital humanities tools, but I am also considering how students might get involved in DH projects. Using resources like The Pulter Project and EMROC as examples, I want to consider ways that undergraduate and graduate students might contribute to our project. Being active participants in a project that is making real headway in the field might just help students engage with older material as much as the modern.