Search results

  1. List of datasets for machine-learning research - Wikipedia

    en.wikipedia.org/wiki/List_of_datasets_for...

    Information about this dataset's format is available in the Hugging Face dataset card and on the project's website, where the dataset and the rejected data can be downloaded. FLAN: a re-preprocessed version of the FLAN dataset, with updates since the original FLAN release, is available on Hugging Face (test data).
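
    As a hedged sketch of the download step the snippet describes: with the `datasets` library, a Hub-hosted split can be pulled as below. The repository id is hypothetical; the real id is on the dataset card mentioned above.

    ```python
    # Minimal sketch: load a dataset split from the Hugging Face Hub.
    from datasets import load_dataset

    # "example-org/flan-reprocessed" is a placeholder id; substitute the id
    # from the dataset card. Streaming avoids downloading the whole corpus.
    ds = load_dataset("example-org/flan-reprocessed", split="test", streaming=True)

    for example in ds.take(3):  # peek at a few records
        print(example)
    ```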

  2. List of datasets in computer vision and image processing

    en.wikipedia.org/wiki/List_of_datasets_in...

    The dataset is labeled with semantic labels for 32 semantic classes; it contains over 700 images (object recognition and classification, 2008 [56][57][58], Gabriel J. Brostow, Jamie Shotton, Julien Fauqueur, Roberto Cipolla). RailSem19 is a dataset for understanding scenes for vision systems on railways. The dataset is labeled semantically and ...

  3. BLOOM (language model) - Wikipedia

    en.wikipedia.org/wiki/BLOOM_(language_model)

    BigScience Large Open-science Open-access Multilingual Language Model (BLOOM) [1] [2] is a 176-billion-parameter transformer-based autoregressive large language model (LLM). The model, as well as the code base and the data used to train it, are distributed under free licences. [3]
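
    A minimal sketch of querying a BLOOM checkpoint with the `transformers` library; the 560M-parameter variant stands in here because the full 176B model needs multi-GPU hardware.

    ```python
    # Minimal sketch: text generation with a small BLOOM checkpoint.
    from transformers import pipeline

    generator = pipeline("text-generation", model="bigscience/bloom-560m")
    out = generator("BLOOM is a multilingual language model that", max_new_tokens=30)
    print(out[0]["generated_text"])
    ```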

  4. GPT-2 - Wikipedia

    en.wikipedia.org/wiki/GPT-2

    GPT-2 was pre-trained on a dataset of 8 million web pages. [2] It was partially released in February 2019, followed by the full release of the 1.5-billion-parameter model on November 5, 2019. [3][4][5] GPT-2 was created as a "direct scale-up" of GPT-1, [6] with a ten-fold increase in both its parameter count and the size of its training dataset. [5]
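
    The 1.5-billion-parameter figure can be sanity-checked by counting the weights of the released gpt2-xl checkpoint; a minimal sketch assuming `transformers` and PyTorch (the download is several gigabytes):

    ```python
    # Minimal sketch: count the parameters of the full GPT-2 release.
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("gpt2-xl")  # the 1.5B checkpoint
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{n_params / 1e9:.2f} billion parameters")  # roughly 1.5-1.6B
    ```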

  5. Hugging Face - Wikipedia

    en.wikipedia.org/wiki/Hugging_Face

    The company received a $2 billion valuation. In February 2023, the company announced a partnership with Amazon Web Services (AWS) that would allow AWS customers access to Hugging Face's products. The company also said the next generation of BLOOM would be run on Trainium, a proprietary machine-learning chip created by AWS.

  6. MMLU - Wikipedia

    en.wikipedia.org/wiki/MMLU

    The MMLU was released by Dan Hendrycks and a team of researchers in 2020 [3] and was designed to be more challenging than then-existing benchmarks such as General Language Understanding Evaluation (GLUE), on which new language models were achieving better-than-human accuracy.
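
    A minimal sketch of inspecting MMLU items, assuming the commonly used cais/mmlu mirror on the Hugging Face Hub (the id, config name, and field names should be checked against the dataset card):

    ```python
    # Minimal sketch: look at one MMLU multiple-choice question.
    from datasets import load_dataset

    mmlu = load_dataset("cais/mmlu", "high_school_physics", split="test")
    item = mmlu[0]
    print(item["question"])
    for label, choice in zip("ABCD", item["choices"]):
        print(f"  ({label}) {choice}")
    print("answer index:", item["answer"])  # index into `choices`
    ```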

  7. List of large language models - Wikipedia

    en.wikipedia.org/wiki/List_of_large_language_models

    Jurassic-2 [69]: March 2023, AI21 Labs; parameters unknown; corpus unknown; proprietary; multilingual. [70]
    PaLM 2 (Pathways Language Model 2): May 2023, Google; 340 billion parameters; [71] 3.6 trillion tokens; [71] training cost 85,000; [57] proprietary; was used in the Bard chatbot. [72]
    Llama 2: July 2023, Meta AI; 70 billion parameters; [73] 2 trillion tokens; [73] training cost 21,000; Llama ...
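
    One way to read the rows above is tokens seen per parameter, a ratio often compared against the roughly 20:1 guideline from the Chinchilla scaling work; a quick back-of-the-envelope check:

    ```python
    # Back-of-the-envelope: training tokens per parameter for two rows above.
    models = {
        # name: (parameter count, training tokens), figures from the table
        "PaLM 2": (340e9, 3.6e12),
        "Llama 2": (70e9, 2e12),
    }
    for name, (params, tokens) in models.items():
        print(f"{name}: {tokens / params:.1f} tokens per parameter")
    # PaLM 2: ~10.6, Llama 2: ~28.6
    ```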

  8. BookCorpus - Wikipedia

    en.wikipedia.org/wiki/BookCorpus

    The dataset consists of around 985 million words, and the books that comprise it span a range of genres, including romance, science fiction, and fantasy. [3] The corpus was introduced in a 2015 paper by researchers from the University of Toronto and MIT titled "Aligning Books and Movies: Towards Story-like Visual Explanations by Watching ...
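
    A minimal sketch of sampling the corpus, assuming the long-standing bookcorpus mirror on the Hugging Face Hub (availability and licensing caveats apply, so check the dataset card first):

    ```python
    # Minimal sketch: stream a couple of BookCorpus records.
    from datasets import load_dataset

    books = load_dataset("bookcorpus", split="train", streaming=True)
    for ex in books.take(2):
        print(ex["text"][:80])  # `text` is the field name on the mirror
    ```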