When.com Web Search

Search results

  1. Results From The WOW.Com Content Network
  2. List of text corpora - Wikipedia

    en.wikipedia.org/wiki/List_of_text_corpora

    Text corpora (singular: text corpus) are large and structured sets of texts, which have been systematically collected. Text corpora are used by corpus linguists and within other branches of linguistics for statistical analysis, hypothesis testing, finding patterns of language use, investigating language change and variation, and teaching language proficiency.

  3. Lexical lists - Wikipedia

    en.wikipedia.org/wiki/Lexical_lists

    Plants, archaic word-list; proto-Aa, bilingual version of Proto-Ea with a number of Akkadian translations for each of the Sumerian values (Old-Babylonian) proto-Ea, the designation for two different texts, a syllabary and a vocabulary, a format with, and one without glosses, expounding polyvalency (Old-Babylonian) [6]: 620

  4. Ancient text corpora - Wikipedia

    en.wikipedia.org/wiki/Ancient_text_corpora

    Ancient text corpora are the entire collection of texts from the period of ancient history, defined in this article as the period from the beginning of writing up to 300 AD. These corpora are important for the study of literature, history, linguistics, and other fields, and are a fundamental component of the world's cultural heritage.

  5. Text corpus - Wikipedia

    en.wikipedia.org/wiki/Text_corpus

    There are two main types of parallel corpora which contain texts in two languages. In a translation corpus, the texts in one language are translations of texts in the other language. In a comparable corpus, the texts are of the same kind and cover the same content, but they are not translations of each other. [2]

  6. Corpus linguistics - Wikipedia

    en.wikipedia.org/wiki/Corpus_linguistics

    Corpus linguistics is an empirical method for the study of language by way of a text corpus (plural corpora). [1] Corpora are balanced, often stratified collections of authentic, "real world", text of speech or writing that aim to represent a given linguistic variety. [1] Today, corpora are generally machine-readable data collections.

  7. List of florilegia and botanical codices - Wikipedia

    en.wikipedia.org/wiki/List_of_florilegia_and...

    1827-38 London Edinburgh The Birds of America John James Audubon (1785-1851) (botanical and ornithological plates) 1828 Japan Honzō zufu (Illustrated Manual of Medicinal Plants) [61] Kan'en Iwasaki/Iwasaki Tsunemasa (1786–1842) 1828 London The Pomological Magazine John Lindley (1799–1865) 1828–32 Paris Flore medicale François-Pierre ...

  8. TenTen Corpus Family - Wikipedia

    en.wikipedia.org/wiki/TenTen_Corpus_Family

    The TenTen Corpus Family (also called TenTen corpora) is a set of comparable web text corpora, i.e. collections of texts that have been crawled from the World Wide Web and processed to match the same standards. These corpora are made available through the Sketch Engine corpus manager. There are TenTen corpora for more than 35 languages.

  9. List of datasets for machine-learning research - Wikipedia

    en.wikipedia.org/wiki/List_of_datasets_for...

    Images, text Classification, clustering 2015 [313] [314] T. Munisami et al. Oxford Flower Dataset 17 category dataset of flowers. Train/test splits, labeled images, 1360 Images, text Classification 2006 [315] [316] M-E Nilsback et al. Plant Seedlings Dataset 12 category dataset of plant seedlings. Labelled images, segmented images, 5544 Images