this show is... actually kind of amazing
Daniel Temkin on Code Poetry
"a game about going to the museum to get away for a bit and to think about feelings you like to feel"
intuitive and fun blackout poetry interface, using snippets from project gutenberg
The LLMentalist Effect: how chat-based Large Language Models replicate the mechanisms of a psychic's con
"The chatbot’s answers sound extremely specific to the current context but are in fact statistically generic. The mathematical model behind the chatbot delivers a statistically plausible response to the question. The marks that find this convincing get pulled in." this is really good but i wish it approached the topic of psychics with a bit less bro-ey skepticism
"It matters that the first staticky voices we’ve dialed in with our massive, multi-billion-parameter arrays are dreamers, confabulators, and improvisers. It matters that Chess and Go, the sites where we first encountered their older, more serious siblings, are artworks. Artworks carved out of instrumental reason. Artworks that, long before computers existed, were spinning beautiful webs of logic and attention. Art is not a precious treasure in need of protection. Art is a fearsome wellspring of human power from which we will draw the weapons we need to storm the gates of the reality studio and secure the future."
"Snelson plays a wandering bard – misusing the system to produce the most unlikely of scrawls: small poems scattered across the game’s landscape. The book is a documentation of that performance in a prosody marked by the poetics of fandom. They are recorded here as movie captures, static images, and poetic texts, arranged in four parts spelling a newly coherent object."
field report of the tools in the hands of working writers
computer-generated books ("a non-human reading")
How a flawed idea is teaching millions of kids to be poor readers | At a Loss for Words | APM Reports
if you think horses and ponies are the same thing, and are content for children to remain ignorant of this fact, you live in a world devoid of wonder and joy
incredibly insightful review. "Their exact words, not just their paraphraseable meaning but their precise choices of phrasing, become full of comprehensible information about character, and this gives the characters themselves an unusual reality and presence. As in all good poetry, it is the language itself, and not just the plot and worldbuilding, that makes us care."
wonderful list of resources relating to early forms of exquisite corpse
generating from a markov model by hand
"Alphabetical approaches upended accepted hierarchies"
annagarbier/simple_dialogues: Simple dialogues converted into ridiculously detailed phonetic descriptions.
replaces dialogue with detailed phonetic descriptions of the dialogue. very cool
"a website that produces event scores for performance. The material objects, locations and activities within each score are based on the performance archives of Nathan Walker between 2009-2014 and work towards shuffling and redistributing the archival record to create an anarchive."
zach whalen's notes! very thoughtful!
'I started to ask myself the question – how long will it take before we start seeing “documentary photojournalism” that has no other basis in reality than the photographer’s fantasy and a powerful computer graphics card? Will we be able to tell the difference? How hard is it to do? How skilled will our own community of photographers and editors be in sniffing out what are deep fakes and what is real?'
"While these languages are obviously not in common use today, we find it fascinating to think about the world that might have been. Even more surprisingly, it happens that many of these other options include features which developers would love to see appear in CSS even today."
"BLABRECS is a rules modification for the wordgame SCRABBLE that swaps out the dictionary of real-if-obscure English words for a capricious artificial intelligence. In BLABRECS, real English words aren't allowed! Instead, you have to play nonsense words that sound like English to the AI. These nonsense words are called – you guessed it – BLABRECS."
alphacep/vosk-api: Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
"Vosk is an offline open source speech recognition toolkit. [...] Vosk models are small (50 Mb) but provide continuous large vocabulary transcription, zero-latency response with streaming API, reconfigurable vocabulary and speaker identification." Bindings for various languages, "scales from small devices like Raspberry Pi or Android smartphone to big clusters."
"...the method here is quite different. DALL-E is trained end-to-end for the sole purpose of producing high quality images directly from language, whereas this CLIP method is more like a beautifully hacked together trick for using language to steer existing unconditional image generating models." good history of the emergence of CLIP art
"At the top of the machine, an LED panel endlessly regurgitates its own new neoist verses into the eyes of the audience, equally brainwashing humans, cyborgs, robots, and other technobiological systems. Anyone can directly hack into the system's artificial neural synapses by unplugging, replugging, and criss-crossing jack cables directly on the machine, thus deconstructing, reconstructing, and even destroying the generative capabilities of the system in real-time."
"Application for the sonification of text which can be transformed according to various triggers and parameters to facilitate the learning and analysis of literacoustics, reading by listening."
"Voluntarily provocative, The Hater Box transforms the principle of old split flap displays into a random generator of contestations, cold and impersonal." 2018 – Wood, motor, cardstock, print, 3D print
"Rainbow Zero is a... toy? widget? thingy? that allows you to explore a part of the space defined by the GloVe word vectors."
some text to image stuff
similarity of images based on semantic similarity between automatic captions
"a recurrent neural network that generates little stories about images"
"TextOCR provides ~1M high quality word annotations on TextVQA images allowing application of end-to-end reasoning on downstream tasks such as visual question answering or image captioning."
okay I had no idea this existed: an interactive CD-ROM of various Oulipo texts
char-rnn trained on ansi artwork
🌷🤡💋🌷🤡💋Prof. Grace Lavery💋🤡🌷💋🤡🌷 on Twitter
"Trans narratology teaches us that neither a singular narrative of becoming, nor the laying out of life as a causal sequence, will do justice to the complexity of trans identification. Trans lives slip and slide, forward and backward in time."
"a noisy but fascinating collection of documents which can be studied through the lens of natural language processing, information retrieval, and linguistics"
allennlp's version of the c4 dataset
"Icons are not primitive or rudimentary attempts to duplicate the physical world; they are nuanced and complex attempts to embody the spiritual world."
"How many words can we form by making folds in the straw-paper slogan? I could not have answered that question in 1967. I couldn’t have even asked it. But times change. Enumerating all the foldable messages now strikes me as an obvious thing to do when presented with the straw wrapper. Furthermore, I have the computational means to do it—although the project was not quite as easy as I expected."
"Despite impressive performance on standard benchmarks, deep neural networks often fail when deployed to real-world systems, due to distribution shifts, training artifacts, and noisy data. To address these vulnerabilities, we introduce Robustness Gym: a simple and extensible toolkit for robustness testing that supports the entire spectrum of evaluation methodologies, from adversarial attacks to rule-based data augmentations."
"graphs the usage of words (whether in description or dialogue) over time, distinguishing that usage both by the gender of the fictional characters the terms are associated with, and by the gender of the authors who used them"
"detects toxic, disruptive, or otherwise problematic speech in real-time, and gives you the option to have our software respond immediately, or to escalate to your moderation team"
"Alt-text is an essential part of web accessibility. It is often overlooked or understood through the lens of compliance, as an unwelcome burden to be met with minimum effort. How can we instead approach alt-text thoughtfully and creatively?" (presented at wordhack dec 2020)
"Karrik is an open source typeface designed by Jean-Baptiste Morizot and Lucas Le Bihan. This font was originally commissioned by ‘Cercle’ magazine for their 2020 issue—dedicated to the topic of ghosts. The design started in March 2019 and ended in October of the same year. [...] Karrik is rooted in vernacular typography. The weight disadjustments, the lack of optical corrections, the uneven width of the letters are some of the features of early sans serif typefaces that inspired us..." (hey plus it's open source!)
"a collaborative effort to improve how NLP handles complex morphology in the world’s languages. The goal of UniMorph is to annotate morphological data in a universal schema that allows an inflected word from any language to be defined by its lexical meaning, typically carried by the lemma, and by a rendering of its inflectional form in terms of a bundle of morphological features from our schema."
Julian "fake-deep deepfake" Jarboe on Twitter: "@BigEchoSF I also came here to say @_vajra lol. But also @METROPOLARITY -- if you haven't read their (collective) works you're missing out on some of the best experimental and very very radical SFF."
"Short form SF published over the last ten years that aggressively challenged form/language/expectation? Avant-garde-ish, wildly exuberant, austere to the point of impoverishment. Whatever crazy shit somehow slipped past the gatekeepers and caught you by surprise. Suggestions?"
generates (via markov chain) and speaks made-up words from various corpora
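A minimal sketch of the markov-chain idea in the note above, assuming a character-level model built from a small word list. The names `build_model` and `make_word`, the `^`/`$` padding markers, and the toy corpus are all illustrative assumptions; the linked project may work quite differently, and the speaking part is left out.

```python
import random
from collections import defaultdict


def build_model(words, order=2):
    """Map each `order`-character context to the characters seen after it."""
    model = defaultdict(list)
    for word in words:
        # "^" pads the start of the word, "$" marks its end (assumed markers).
        padded = "^" * order + word + "$"
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model


def make_word(model, order=2, rng=random, max_len=20):
    """Walk the chain from the start context until "$" or max_len letters."""
    context = "^" * order
    letters = []
    while len(letters) < max_len:
        nxt = rng.choice(model[context])
        if nxt == "$":
            break
        letters.append(nxt)
        # Slide the context window forward by one character.
        context = context[1:] + nxt
    return "".join(letters)


if __name__ == "__main__":
    # Hypothetical toy corpus; any list of lowercase words would do.
    corpus = ["lantern", "lattice", "letter", "antler", "latent", "rattle"]
    model = build_model(corpus)
    for _ in range(5):
        print(make_word(model))
```

Repeated letters in the lists stand in for probabilities: a character that follows a context more often gets picked more often. A larger `order` makes the output stick closer to real words from the corpus.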
"StereoSet is a dataset that measures stereotype bias in language models. StereoSet consists of 17,000 sentences that measures model preferences across gender, race, religion, and profession."
"The exhibition OPEN SCORES brought together a series of practices through which artists articulate their specific forms of digital commons. From online archives to digital tools/ infrastructure and educational formats, the projects envision a (post-)digital culture in which notions of collaboration, free access to knowledge, sustainable use of shared resources, and data privacy are central. For the exhibition, each of the projects created a unique score to present their practice."
"[a] dictionary and graphical data for over 9000 of the most common simplified and traditional Chinese characters. Among other things, this data includes stroke-order vector graphics for all these characters." (via gábor ugray's !!con 2020 talk)
"freely available for research with the condition that the research be used for the benefit of children"
should make sure I know all this stuff
Chinese WeChat Users Are Sharing A Censored Post About COVID-19 By Filling It With Emojis And Writing It In Other Languages
"[T]o avoid the censorship, people have converted parts of the interview into Morse code, filled it up with emojis, or translated it into fictional languages like Sindarin from The Lord of the Rings or Klingon from Star Trek. In one particularly creative example, someone inserted it into the iconic opening crawl of Star Wars."
sam and tega. very good
'...very boring stories that did not even satisfy my youngest children... I tried these stories on my very small children but after some minutes they grew very irritable, because nothing actually happened. This shows that even small children of three can measure entropy'
"Face lets you edit both the text and the font it is rendered in. In text mode you can type and edit text normally. Press escape to enter font mode, where you can select a character to edit. Any changes to a character are visible immediately."
"CCMatrix is the largest data set of high-quality, web-based bitexts for training translation models. With more than 4.5 billion parallel sentences in 576 language pairs pulled from snapshots of the CommonCrawl public data set, CCMatrix is more than 50 times larger than the WikiMatrix corpus that we shared last year."
"Conclusion: We hypothesized that radical swings in affective posture would make the writer more emotionally flexible. Likewise, we hypothesized that attempting to discern the emotional valences of a machine learning model derived from achingly sensitive Tumblr posts would make the writer more empathetic. Unfortunately, no conclusions could be drawn from a single poem."
python-based mush/moo thing! could be fun to play around with
"Even the fancier controllers of Valve’s Index kits don’t let you separate your fingers to produce the Ws or Vs necessary for some words. [...] It’s a lovely avenue of human connection, but I can also imagine linguists frothing over VR sign language. There’s a great example in Syrmor’s video where a currently learning interpreter called Quentin explains that because the W restriction means they can’t use the normal word for ‘world’, they instead mimic the appearance of a portal opening up in VR. They’ve also got different ways of signing words depending on your gear, which is both fascinating and mildly concerning." that must feel weird
sesame street on phonaesthetics
"an esoteric programming language that closely follows the grammar and tone of classical Chinese literature. Moreover, the alphabet of wenyan contains only traditional Chinese characters and 「」 quotes, so it is guaranteed to be readable by ancient Chinese people." (from one of Golan Levin's students)
mhagiwara/github-typo-corpus: GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors
"a large-scale dataset of misspellings and grammatical errors along with their corrections harvested from GitHub. It contains more than 350k edits and 65M characters in more than 15 languages, making it the largest dataset of misspellings to date."
"I run datasets of iconic feminist texts through a simple textRNN, generating new feminists texts in the legendary words of bell hooks, Simone De Beauvoir, Betty Friedan and Audre Lorde. Some are funny. Some are poetic. Some make no sense at all and some are way too real. Information about the model and settings can be found under each post."
another tutorial from emnlp-19
overview + materials for emnlp-19 workshop
"In the movie-oriented CCPE dataset, individuals posing as a user speak into a microphone and the audio is played directly to the person posing as a digital assistant. The “assistant” types out their response, which is in turn played to the user via text-to-speech. [...] The Taskmaster-1 dataset makes use of both the methodology described above as well as a one-person, written technique to increase the corpus size and speaker diversity—about 7.7k written “self-dialog” entries and ~5.5k 2-person, spoken dialogs. For written dialogs, we engaged people to create the full conversation themselves based on scenarios outlined for each task, thereby playing roles of both the user and assistant."
"On October 19th of 1955, Pulitzer Prize-winning poet, Marianne Moore, was approached by a Mr. Robert Young of the Ford Motor Company and asked to assist them in naming a new series of cars."
"ways to make huge models like BERT smaller and faster": quantization, pruning, distillation
Joel Simon on Twitter: "New work in my Dimension of Dialogue series :) Two neural nets learn to communicate through their own emergent visual language. The resulting alphabet is a product of their adversarial and cooperative relationship. Here set in clay
"Two neural nets learn to communicate through their own emergent visual language."
gpt-2 on svg for generated emoji and letterforms
a "talk about the history and environment of Hershey’s creation, and touch on the current state of resurrection."
"Universal Dependencies (UD) is a framework for consistent annotation of grammar (parts of speech, morphological features, and syntactic dependencies) across different human languages. UD is an open community effort with over 200 contributors producing more than 100 treebanks in over 70 languages."
"The father/daughter team of Trevor F. Smith and Sparks Webb have gathered all available documentation about the memex and carefully fabricated the Memex #001 to match Dr. Bush's specifications."
I assume this is the full text of chapter one of the landow book?
"Recommendation engines like the ones powering the endless feeds on Twitter, Facebook and YouTube, are designed to maximize ad revenue, and therefore to keep you online for as long as possible. In doing so they promote the most reactionary content on their platforms. Yet, these recommendation systems are nothing more than sorting mechanisms. Other Orders provides an alternate set of sorts, optimized for other outcomes."
"Artists aim differently than sharpshooters. They are not typically trying to take something out, but to draw something out. The mark Holzer hits in this case is the mark in the most cave-drawing sense: the effort to leave (or find) a trace of something that is not an opinion, but a register of some kind, certifying a lived experience. There may be no such thing as a permanent record, but the fact that the Washington Post contributor found Holzer’s work dangerous is a sign in and of itself that it has achieved one of its goals: it has carved a deep enough mark to leave a strong impression (for that writer, a menacing one). That’s the most any language or other kind of mark-making can hope to accomplish."
well this looks like a dream come true?
"a collection of pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) and trained on Wikipedia. Its intended use is as input for neural models in natural language processing"
"The Transformer is nothing more than an architecture where the core functional unit is attention. You stack attention layers on top of attention layers, just like you would do with CNN or RNN layers."
wiktionary word frequency lists
You elected them to write new laws. They’re letting corporations do it instead. – Center for Public Integrity
uses my poetry corpus! though points out some shortcomings.
"Yulia Tsvetkov's research group at Language Technologies Institute of Carnegie Mellon University. Our work focuses on natural language processing, particularly cross-lingual approaches, low-resource settings, and social good."
"...the fiction that speech casts visible shadows. [...] converts speech into whimsically animated letters and shapes that appear to float upwards from the shadow of the speaker's head. Visitors can also manipulate these forms directly, using the shadow of their own body. When a phoneme is recognized by the software with sufficient confidence, it is spelled out on the installation's display."
"In this interactive installation participants enter the first word that comes to their mind in one of two input terminals in any language. These words are then the seed of a generative process that develops a poem, bifurcating and mutating, merging languages, poetic styles, sense and nonsense. Poems overlap and degrade over time, eventually fading away. Phonetics are remapped to a new alphabet of sound referencing the body and incidental noises, creating a unique expression for each word and making literal the arbitrariness of the language. This installation was projected on a massive scale covering the walls and ceiling and filling the hall of the old imperial castle in Poznan, Poland. This video shows a demonstration of the generated poetry."
kyle mcdonald's take
"I love reading postmortems. They're educational, but unlike most educational docs, they tell an entertaining story."