Search results
Results From The WOW.Com Content Network
OpenML: [493] Web platform with Python, R, Java, and other APIs for downloading hundreds of machine learning datasets, evaluating algorithms on datasets, and benchmarking algorithm performance against dozens of other algorithms. PMLB: [494] A large, curated repository of benchmark datasets for evaluating supervised machine learning algorithms ...
Wikipedia-based Image Text Dataset: 37.5 million image-text examples with 11.5 million unique images across 108 Wikipedia languages. Instances: 11,500,000. Format: image, caption. Task: pretraining, image captioning. Year: 2021. [7] Srinivasan et al., Google Research.
Visual Genome: Images and their description. Instances: 108,000. Format: images, text. Task: image captioning. Year: 2016. [8] R. Krishna et al.
The Pile is an 886.03 GB diverse, open-source dataset of English text created as a training dataset for large language models (LLMs). It was constructed by EleutherAI in 2020 and publicly released on December 31 of that year. [1] [2] It is composed of 22 smaller datasets, including 14 new ones. [1]
The dataset consists of around 985 million words, and the books that comprise it span a range of genres, including romance, science fiction, and fantasy. [3] The corpus was introduced in a 2015 paper by researchers from the University of Toronto and MIT titled "Aligning Books and Movies: Towards Story-like Visual Explanations by Watching ...