"Black GPT": Interview with Harriett Jernigan and Adam Banks
Dr. Harriett Jernigan and Faculty Director Professor Adam Banks of the Program in Writing and Rhetoric received a grant from the Stanford Accelerator for Learning (SAL) to build an African American Englishes dataset and large language model (LLM). The project studies whether ethical, community-affirming practices can produce better generative artificial intelligence (GenAI) that serves the communities contributing their language expertise and life experiences.
With an all-Black team of computer science students, linguists, and rhetoricians, Harriett and Adam's project endeavors to develop a model that respects the cultural nuances of Black Englishes and prioritizes educational and research purposes over monetization. This work aspires to contribute novel insights to computer science, encourage Black students to pursue careers in AI, and explore new avenues in cultural rhetorics. As students test the model, they will study the impacts of GenAI on education, representation, and social justice.
Interview with Harriett Jernigan and Adam Banks
Ruth Starkman: What are you building and what inspired the project?
Harriett Jernigan: “College Writing with the Black Englishes Corpus for Generative Models” (aka BlackGPT) is a response to Adam Banks’ anticipation of an era of generative models and how they might affect student learning for Black English speakers. Banks calls for “bold, creative, innovative use of technologies” while also fashioning an “African American Rhetoric 2.0” which “must build a strong focus on studying and changing the relationships that endure between race, ethnicity, culture, rhetoric, and technologized spaces.”
The outputs generated by LLMs like ChatGPT are homogeneous and often biased, if not outright discriminatory, against marginalized and underrepresented folk. Most current LLMs lack cross-cultural competence, particularly when addressing matters or topics that center Black people. This has to do not only with the data on which those LLMs run but, more fundamentally, with who is building those LLMs.
Our BlackGPT takes a first step in remedying this problem. Our team includes Black engineers, data collectors and participants. We decided very early on that it is important for us to operate under the principles of data sovereignty, which means every participant has the right to control their data, to request it not be used in the LLM, and to have a direct say in how their data is used, rather than it being scraped/stolen under cover of night, so to speak, from the internet.
We also have no intention of monetizing our LLM, which is another distinguishing feature and speaks to disciplines such as cultural rhetorics and Black feminism. The LLM is for purely research purposes at this point. We are interested first in what an LLM designed and built with Black language, and from the insights and perspectives provided by Black people, will look like: what such a team will look like, what kind of output the model will generate, and how that will differ from conventional LLMs currently on the market. Right now, our engineering team is creating the embeddings and getting ready to fine-tune the model. I will start testing the LLM with students in the fall.
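The step Jernigan describes here, mapping corpus text to embeddings before fine-tuning, can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the sample sentences, the whitespace tokenizer, and the small randomly initialized vectors stand in for the team's actual data and tooling.

```python
import numpy as np

# Illustrative corpus entries; not actual project data.
corpus = [
    "we been knew the answer",
    "she stay ready for class",
]

# Build a vocabulary from whitespace tokens.
vocab = {}
for sentence in corpus:
    for token in sentence.split():
        vocab.setdefault(token, len(vocab))

# One embedding vector per vocabulary entry (randomly
# initialized here; a real model learns these weights
# during training and fine-tuning).
rng = np.random.default_rng(seed=0)
embedding_dim = 8
embeddings = rng.normal(size=(len(vocab), embedding_dim))

def embed(sentence: str) -> np.ndarray:
    """Map a sentence to its sequence of embedding vectors."""
    ids = [vocab[tok] for tok in sentence.split()]
    return embeddings[ids]

vectors = embed("she stay ready")
print(vectors.shape)  # one row of 8 numbers per token: (3, 8)
```

Fine-tuning then adjusts weights like these so that the model's predictions reflect the contributed corpus rather than generic web text.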
RS: What is the potential impact of your work?
HJ: I think we could have a range of impacts on a number of things:
- There’s the technical aspect of contributing something novel to the field of computer science and generative AI.
- There’s the educational aspect of encouraging more Black students–and students of color in general–to contemplate careers in computer science broadly and in generative AI specifically.
- I think this can also open up some new avenues in the field of cultural rhetorics.
Adam Banks: The study of rhetoric–even at the undergraduate level–has to take up questions of how we communicate with and against digital tools and the programs and algorithms from which they’re built. One reason we’re taking a rhetorical approach here is that while AI tools are aiming to get closer to natural language processing, we’re a long way from being able to engage questions of the purposes behind the language that people use–the language that is being used to build LLMs. Beginning to take up rhetorical considerations like audience, situation, and purpose might eventually help us build tools that reduce some of the harms that accompany so many AI tools. And studying rhetorical practices can also help students and writers work more in concert with AI tools, more fully cognizant of both their possibilities and their limitations.

There are educational possibilities as well. As we discover more ways to incorporate LLMs into our classroom settings, we will see how something like BlackGPT can be used to complement creativity, scholarship, research, and communication.

One of the things I think is most important is the centering of Black Englishes, Black rhetorics, Black rhetorical practices, and culture. I think this is the kind of representation our students need, our society needs. This is the kind of work that can instill a sense of pride among Black people who have been told most of their lives that their language is not on par with the dominant culture’s standard dialects. But it could one day go even further, helping right wrongs in the justice system. The problem of court reporters mis-transcribing African American Vernacular English comes to mind for me.
RS: Where is the project going next?
HJ: There are many different varieties of Black English. I have recently started thinking that the next step of such a project would involve adding more regional dialects from different communities and expanding the model’s vocabulary and parameters so that we can get even more robust responses. That would involve much broader outreach to a wider variety of Black folk nationwide. Addressing the regional variety in dialects will also mean more collaborations with linguists.
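One way such regionally varied contributions might be organized, keeping the data-sovereignty principle from earlier in the interview attached to each entry, is sketched below. The field names, region labels, and sample phrases are hypothetical, not the project's actual schema.

```python
from dataclasses import dataclass

@dataclass
class CorpusEntry:
    """A single contributed utterance with provenance and consent."""
    text: str
    region: str          # regional dialect label, e.g. "Gulf South"
    contributor_id: str  # pseudonymous ID, never raw identity
    consent: bool        # contributors can revoke consent at any time

entries = [
    CorpusEntry("finna head out", "Gulf South", "c-014", True),
    CorpusEntry("mad brick outside", "Northeast", "c-027", True),
    CorpusEntry("hella tired today", "West Coast", "c-031", False),
]

# Only consented entries ever reach training; a revoked entry
# simply drops out of the next training set.
training_set = [e for e in entries if e.consent]
print(len(training_set))  # 2
```

Tagging each entry with a region would also let the team measure how well the model handles each dialect separately, rather than averaging them together.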
What I enjoy daydreaming about is creating a program in which we can invite young Black CS students to work at a summer institute dedicated solely to building on and expanding this first baby version of BlackGPT. I’d love to see a group of Black engineers and linguists working in community, learning from each other, as they build and improve a program that could help so many other people accomplish a variety of tasks and transmit knowledge to their family members, colleagues, and friends. I’d love to see a BlackGPT that one day will be devoted to attending to and resolving social justice issues.
Of our three Stanford computer scientists involved in the project, Christian Davis, Kenaj Washington, and Elliot Rodgers, Christian is graduating in June and has the idea that we could build out our dataset as a non-profit, data-secure mental health app for Black people. There are some funders from the Black community interested. So, we’ll cautiously consider this part of the project as well.