When.com Web Search

Search results

  1. Results From The WOW.Com Content Network
  2. List of datasets for machine-learning research - Wikipedia

    en.wikipedia.org/wiki/List_of_datasets_for...

    Ling-Spam Dataset Corpus containing both legitimate and spam emails. Four version of the corpus involving whether or not a lemmatiser or stop-list was enabled. 2,412 Ham 481 Spam Text Classification 2000 [38] [39] Androutsopoulos, J. et al. SMS Spam Collection Dataset Collected SMS spam messages. None. 5,574 Text Classification 2011 [40] [41]

  3. Enron Corpus - Wikipedia

    en.wikipedia.org/wiki/Enron_Corpus

    A visualization of the email network in the Enron Corpus, with coloring representing eight communities. The corpus is valued as one of the few publicly available mass collections of real emails easily available for study; such collections are typically bound by numerous privacy and legal restrictions which render them prohibitively difficult to access, such as non-disclosure agreements and ...

  4. Kaggle - Wikipedia

    en.wikipedia.org/wiki/Kaggle

    Kaggle is a data science competition platform and online community for data scientists and machine learning practitioners under Google LLC.Kaggle enables users to find and publish datasets, explore and build models in a web-based data science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.

  5. Naive Bayes spam filtering - Wikipedia

    en.wikipedia.org/wiki/Naive_Bayes_spam_filtering

    Naive Bayes spam filtering is a baseline technique for dealing with spam that can tailor itself to the email needs of individual users and give low false positive spam detection rates that are generally acceptable to users. It is one of the oldest ways of doing spam filtering, with roots in the 1990s.

  6. The Spamhaus Project - Wikipedia

    en.wikipedia.org/wiki/The_Spamhaus_Project

    The Combined Spam Sources (CSS) [12] is an automatically produced dataset of IP addresses that are involved in sending low-reputation email. Listings can be based on HELO greetings without an A record, generic looking rDNS or use of fake domains, which could indicate spambots or server misconfiguration.

  7. Email spam - Wikipedia

    en.wikipedia.org/wiki/Email_spam

    An email box folder filled with spam messages.. Email spam, also referred to as junk email, spam mail, or simply spam, is unsolicited messages sent in bulk by email ().The name comes from a Monty Python sketch in which the name of the canned pork product Spam is ubiquitous, unavoidable, and repetitive. [1]

  8. Common Crawl - Wikipedia

    en.wikipedia.org/wiki/Common_Crawl

    The donated data helped Common Crawl "improve its crawl while avoiding spam, porn and the influence of excessive SEO." [11] In 2013, Common Crawl began using the Apache Software Foundation's Nutch webcrawler instead of a custom crawler. [12] Common Crawl switched from using .arc files to .warc files with its November 2013 crawl. [13]

  9. List poisoning - Wikipedia

    en.wikipedia.org/wiki/List_Poisoning

    Syntactically invalid email addresses used to poison a mailing list could be easily filtered out by the spammers, while using email addresses that are syntactically correct could cause problems for the mail server responsible for the email address. [1] Implementations of spam poisoning systems can be avoided, if spammers learn of their location ...