dell-research-harvard/AmericanStories · Datasets at Hugging Face
"a collection of full article texts extracted from historical U.S. newspaper images [that] includes nearly 20 million scans from the public domain"
"a collection of full article texts extracted from historical U.S. newspaper images [that] includes nearly 20 million scans from the public domain"
(via Data Is Plural): "every quotation from 22 novels, plus who speaks each line, who they’re addressing, the characters they mention, and more. With 35,000+ quotations, the corpus 'is by an order of magnitude the largest dataset of annotated quotations for literary texts in English.'"
Python interface for the HTRC Extracted Features dataset
"freely available for research with the condition that the research be used for the benefit of children"