Allison's bookmarks (tagged corpora)

dell-research-harvard/AmericanStories · Datasets at Hugging Face

"a collection of full article texts extracted from historical U.S. newspaper images [that] includes nearly 20 million scans from the public domain"

datasets corpora language text history

Saved 2023-09-13T18:51:55Z

Priya22/project-dialogism-novel-corpus: The official repository for The Project Dialogism Novel Corpus, a dataset of annotated quotations in full-length English novels.

(via Data Is Plural): "every quotation from 22 novels, plus who speaks each line, who they’re addressing, the characters they mention, and more. With 35,000+ quotations, the corpus 'is by an order of magnitude the largest dataset of annotated quotations for literary texts in English.'"

data datasets corpora text

Saved 2023-02-01T20:16:10Z

Leveraging Machine Learning to Fuel New Discoveries with the arXiv Dataset | arXiv.org blog

text poetics datasets corpora

Saved 2020-08-19T13:45:05Z

htrc/htrc-feature-reader: Tools for working with HTRC Feature Extraction files

Python interface for the HTRC Extracted Features dataset

python programming text corpora datasets

Saved 2020-06-24T19:27:27Z

whipson/PoKi-Poems-by-Kids: PoKi: A Large Dataset of Poems by Children

"freely available for research with the condition that the research be used for the benefit of children"

text datasets corpora poetry poetics

Saved 2020-04-29T15:37:13Z