"My interest in 'data as medium' is partly motivated by the horror of that, the misery of the being watched, being gathered, with no real opportunity to revoke consent. And also, in truth, by the closeness of that — the touching of me against the whole world." [...] "I want queerness not as a condition under which we are labeled and are made to suffer, or even as a condition under which we are labeled and find individual joy, but as one (of many) gifts for the world."
Priya22/project-dialogism-novel-corpus: The official repository for the Project Dialogism Novel Corpus, a dataset of annotated quotations in full-length English novels.
(via data is plural): "every quotation from 22 novels, plus who speaks each line, who they’re addressing, the characters they mention, and more. With 35,000+ quotations, the corpus 'is by an order of magnitude the largest dataset of annotated quotations for literary texts in English.'"
"a unique catalog of oral traditions spanning approximately 1,000 societies"
AI Data Laundering: How Academic and Nonprofit Researchers Shield Tech Companies from Accountability - Waxy.org
andy does a great job of summarizing the economic and ethical aspects of this issue, and pointing at useful related resources
open source version. I've done this by hand a thousand times haha
interesting product and workflow
wiktionary word frequency lists
"...my ethics of care says that we should be working for a radical data science: a data science that is not controlling, eliminationist, assimilatory. A data science premised on enabling autonomous control of data, on enabling plural ways of being. A data science that preserves context and does not punish those who do not participate in the system."
"a visually forensic tool to detect text that was automatically generated from large language models"
dan jurafsky intro lecture