Public Domain Image Archive
a "hand-picked collection of 10,046 out-of-copyright works, free for all to browse, download, and reuse" from the makers of the Public Domain Review
"It is in the financial interest of streaming services to discourage a critical audio culture among users, to continue eroding connections between artists and listeners, so as to more easily slip discounted stock music through the cracks, improving their profit margins in the process"
good thread on the use of "public" social media posts in ML datasets
these are kinda fun
"We present data in interesting, exciting, and revolutionary new ways, stopping people in their tracks and making them want to learn more"
practical, straightforward advice
"Despite its name, Red (the album) does not have the most Red (the color) references. That honor goes to Midnights." via data is plural
"Ethics rule of thumb: If you start to sound like The Zuck, stop."
"My interest in 'data as medium' is partly motivated by the horror of that, the misery of the being watched, being gathered, with no real opportunity to revoke consent. And also, in truth, by the closeness of that — the touching of me against the whole world." [...] "I want queerness not as a condition under which we are labeled and are made to suffer, or even as a condition under which we are labeled and find individual joy, but as one (of many) gifts for the world."
(via data is plural): "every quotation from 22 novels, plus who speaks each line, who they’re addressing, the characters they mention, and more. With 35,000+ quotations, the corpus 'is by an order of magnitude the largest dataset of annotated quotations for literary texts in English.'"
"a unique catalog of oral traditions spanning approximately 1,000 societies"
Andy does a great job of summarizing the economic and ethical aspects of this issue and pointing to useful related resources
open source version. I've done this by hand a thousand times haha
interesting product and workflow
wiktionary word frequency lists
"...my ethics of care says that we should be working for a radical data science: a data science that is not controlling, eliminationist, assimilatory. A data science premised on enabling autonomous control of data, on enabling plural ways of being. A data science that preserves context and does not punish those who do not participate in the system."
"a visually forensic tool to detect text that was automatically generated from large language models"
Dan Jurafsky intro lecture