Allison's bookmarks (tagged machinelearning)

Most recent

New Words

"a speculative research project exploring the use of machine learning for the evolution of language. Large language models (LLM's) are fantastic at capturing our language as it currently is - but language is constantly evolving and adapting. Can machine learning help us create something truly new and unbounded by its training data?"

Generative AI Takes Stereotypes and Bias From Bad to Worse

"In the US, women are underrepresented in high-paying occupations, but data shows that gender representation across most industries has improved significantly over time. Stable Diffusion depicts a different scenario, where hardly any women have lucrative jobs or occupy positions of power. Women made up a tiny fraction of the images generated for the keyword “judge” — about 3% — when in reality 34% of US judges are women, according to the National Association of Women Judges and the Federal Judicial Center. In the Stable Diffusion results, women were not only underrepresented in high-paying occupations, they were also overrepresented in low-paying ones."

Proud Employees

"The models are just an acceleration of processes that are already in place. The ideal outcome of a machine-learning driven society is overfitting: a world where, whether all our jobs are automated or we've just trained ourselves to work with the machine, no new information will need to be processed, all will be determined and predicted."

Thoughts on Stable Diffusion and Free Culture

"These reactions, I think, help us to imagine what a public culture might be after free culture’s demise. What remains, however, is to recapture that productive engine of Kelty’s recursive publics that drive cryptocurrency and web3 today in building other hyper-capitalist futures. If commons-based peer production is a joke. If free culture is a ruse. What ways of doing with technology remain? Just as Jackie Wang has productively re-read theories of control through racial capitalism, I see a continued challenge of reimagining democracy, addressing its historic exclusions, and colonial underpinnings."

"BLABRECS is a rules modification for the wordgame SCRABBLE that swaps out the dictionary of real-if-obscure English words for a capricious artificial intelligence. In BLABRECS, real English words aren't allowed! Instead, you have to play nonsense words that sound like English to the AI. These nonsense words are called – you guessed it – BLABRECS."

Alien Dreams: An Emerging Art Scene - ML@B Blog

"...the method here is quite different. DALL-E is trained end-to-end for the sole purpose of producing high quality images directly from language, whereas this CLIP method is more like a beautifully hacked together trick for using language to steer existing unconditional image generating models." good history of the emergence of CLIP art

The Sense of Neoism?! | Sofian Audry

"At the top of the machine, an LED panel endlessly regurgitates its own new neoist verses into the eyes of the audience, equally brainwashing humans, cyborgs, robots, and other technobiological systems. Anyone can directly hack into the system's artificial neural synapses by unplugging, replugging, and criss-crossing jack cables directly on the machine, thus deconstructing, reconstructing, and even destroying the generative capabilities of the system in real-time."

"TextOCR provides ~1M high quality word annotations on TextVQA images allowing application of end-to-end reasoning on downstream tasks such as visual question answering or image captioning."

Layout Parser

could be fun to play with. "With the help of state-of-the-art deep learning models, Layout Parser enables extracting complicated document structures using only several lines of code. This method is also more robust and generalizable as no sophisticated rules are involved in this process."

Robustness Gym

"Despite impressive performance on standard benchmarks, deep neural networks often fail when deployed to real-world systems, due to distribution shifts, training artifacts, and noisy data. To address these vulnerabilities, we introduce Robustness Gym: a simple and extensible toolkit for robustness testing that supports the entire spectrum of evaluation methodologies, from adversarial attacks to rule-based data augmentations."

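To make "rule-based data augmentations" concrete, here is a minimal sketch of the idea in plain Python. It does not use Robustness Gym's actual API. The toy classifier, the uppercase rule, and the four examples are all hypothetical stand-ins chosen for illustration: apply a simple rule to every input, then compare accuracy before and after.

```python
def keyword_classifier(text):
    """Toy sentiment model (hypothetical): predicts 1 if the exact token 'good' appears."""
    return 1 if "good" in text.split() else 0


def uppercase_rule(text):
    """Rule-based perturbation: a label-preserving change in surface form."""
    return text.upper()


def accuracy(model, examples):
    """Fraction of (text, label) pairs the model gets right."""
    correct = sum(model(text) == label for text, label in examples)
    return correct / len(examples)


def robustness_report(model, examples, rule):
    """Compare accuracy on the original inputs and on the rule-perturbed inputs."""
    perturbed = [(rule(text), label) for text, label in examples]
    return {
        "original": accuracy(model, examples),
        "perturbed": accuracy(model, perturbed),
    }


# Made-up evaluation set, for illustration only.
examples = [
    ("a good movie", 1),
    ("a dull movie", 0),
    ("good fun", 1),
    ("not for me", 0),
]

report = robustness_report(keyword_classifier, examples, uppercase_rule)
print(report)
```

The gap between the two numbers is the point: a model can look perfect on the benchmark split and still break under a trivial, meaning-preserving change.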
Hateful Memes Challenge winners

"Hate speech can come in many forms, including memes that combine text and images. This kind of multimodal content can be particularly challenging for AI to detect because it requires a holistic understanding of the meme." that is not the reason that hate speech is difficult to detect, and it's actually harmful that you think it's the reason, sorry

"StereoSet is a dataset that measures stereotype bias in language models. StereoSet consists of 17,000 sentences that measures model preferences across gender, race, religion, and profession."

From Panic to Profit - Data & Society: Points

"As panic around AI-generated fake news and videos have shown, new technologies described as overwhelmingly advanced, conceptually inscrutable, and deeply conspiratorial make for headlines that draw attention. As AI-supported disinformation technologies advance, it is possible we will see panic around these technologies wielded to justify technological closure in the name of “the public interest.” While caution and care is warranted, we should not accept fast and seemingly easy technological closures for these problems without pushing for social, cultural, legal, and historical explanations."

Kyle Booten — Tentacular

"Conclusion: We hypothesized that radical swings in affective posture would make the writer more emotionally flexible. Likewise, we hypothesized that attempting to discern the emotional valences of a machine learning model derived from achingly sensitive Tumblr posts would make the writer more empathetic. Unfortunately, no conclusions could be drawn from a single poem."

How Big Tech Manipulates Academia to Avoid Regulation

"To be fair, some of the research is useful and nuanced, especially in the humanities and social sciences. But the majority of well-funded work on 'ethical AI' is aligned with the tech lobby’s agenda: to voluntarily or moderately adjust, rather than legally restrict, the deployment of controversial technologies. [...] It is strange that Ito, with no formal training, became positioned as an 'expert' on AI ethics, a field that barely existed before 2017. But it is even stranger that two years later, respected scholars in established disciplines have to demonstrate their relevance to a field conjured by a corporate lobby."

Neuraxio/Neuraxle: Build neat pipelines with the right abstractions to do AutoML. Let your pipeline steps have hyperparameter spaces. Enable checkpoints to cut duplicate calculations. Go from research to production environment easily.

"a Machine Learning (ML) library for building neat pipelines, providing the right abstractions to both ease research, development, and deployment of your ML applications. [...] [T]he optimizer is a model itself that maps features of datasets and features of the hyperparameter space to a guessed performance score to predict the best hyperparameters."

Teaching AI Feminism and Making Art

"I run datasets of iconic feminist texts through a simple textRNN, generating new feminists texts in the legendary words of bell hooks, Simone De Beauvoir, Betty Friedan and Audre Lorde. Some are funny. Some are poetic. Some make no sense at all and some are way too real. Information about the model and settings can be found under each post."

How you’re feeling when machine learning might help - Quartz AI Studio

"We’ll never be able to read all of these documents. What’s unique about this text compared to all the rest? My eyes sting from searching these images for the same thing. We need to find more records like these in a huge pile of data. I could really use a heads-up before this happens again. (Post to come.)" I *reeeeeally* appreciate approaches to ml like this that start with problems to be solved (instead of just taking for granted that ai/ml is useful)

"a collection of pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) and trained on Wikipedia. Its intended use is as input for neural models in natural language processing"

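For reference, the core of Byte-Pair Encoding is small enough to sketch. This is a simplified illustration of the general merge-learning procedure, not the code or vocabulary behind the embeddings described above. The function names (`learn_bpe`, `merge_pair`, `segment`) and the tiny word list are assumptions made up for the example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the joined symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words."""
    # Each word starts out as a tuple of single characters.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Merge the most frequent pair (ties go to the pair seen first).
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split a word into subwords by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy corpus, for illustration only.
merges = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
print(merges)
print(segment("lowly", merges))
```

An unseen word still gets a segmentation, because anything not covered by a merge falls back to single characters. That fallback is what makes subword units practical as input to neural models across many languages.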
"As machine learning algorithms are commoditized, those who can work along the entirety of the applied machine learning arc will be the most valuable."

Tsvetshop: Home

"Yulia Tsvetkov's research group at Language Technologies Institute of Carnegie Mellon University. Our work focuses on natural language processing, particularly cross-lingual approaches, low-resource settings, and social good."

