When.com Web Search

Search results

  1. Results From The WOW.Com Content Network
  2. List of datasets for machine-learning research - Wikipedia

    en.wikipedia.org/wiki/List_of_datasets_for...

    Ling-Spam Dataset: Corpus containing both legitimate and spam emails. Four versions of the corpus, depending on whether or not a lemmatiser or stop-list was enabled. Instances: 2,412 ham, 481 spam. Format: text. Task: classification. Year: 2000. [38] [39] Androutsopoulos, J. et al. SMS Spam Collection Dataset: Collected SMS spam messages. Preprocessing: none. Instances: 5,574. Format: text. Task: classification. Year: 2011. [40] [41]

  3. Naive Bayes spam filtering - Wikipedia

    en.wikipedia.org/wiki/Naive_Bayes_spam_filtering

    Naive Bayes spam filtering is a baseline technique for dealing with spam that can tailor itself to the email needs of individual users and give low false positive spam detection rates that are generally acceptable to users. It is one of the oldest ways of doing spam filtering, with roots in the 1990s.

  4. Bogofilter - Wikipedia

    en.wikipedia.org/wiki/Bogofilter

    Bogofilter examines tokens in the message body and header, and refers to wordlists stored by BerkeleyDB, SQLite or QDBM to calculate a probability score that a new message is spam. Bogofilter provides processing for plain text and HTML and supports reading multi-part MIME messages, including base64, quoted-printable, and uuencoded text or HTML.

  5. Spam and Open Relay Blocking System - Wikipedia

    en.wikipedia.org/wiki/Spam_and_Open_Relay...

    The list consisted of 78,000 proxy relays and rapidly grew to over 3,000,000 alleged compromised spam relays. [1] In November 2009 SORBS was acquired by GFI Software, to enhance their mail filtering solutions. [2] In July 2011 SORBS was re-sold to Proofpoint, Inc. [3] On June 5, 2024 SORBS was shut down and no longer available. [4]

  6. Enron Corpus - Wikipedia

    en.wikipedia.org/wiki/Enron_Corpus

    The Enron Corpus is a database of over 600,000 emails generated by 158 employees [1] of the Enron Corporation in the years leading up to the company's collapse in December 2001. The corpus was generated from Enron email servers by the Federal Energy Regulatory Commission (FERC) during its subsequent investigation. [2]

  7. Common Crawl - Wikipedia

    en.wikipedia.org/wiki/Common_Crawl

    Amazon Web Services began hosting Common Crawl's archive through its Public Data Sets program in 2012. [9] The organization began releasing metadata files and the text output of the crawlers alongside .arc files in July 2012. [10]

  8. Apache SpamAssassin - Wikipedia

    en.wikipedia.org/wiki/Apache_SpamAssassin

    Apache SpamAssassin is a Perl-based application (Mail::SpamAssassin in CPAN) which is usually used to filter all incoming mail for one or several users. It can be run as a standalone application or as a subprogram of another application (such as a Milter, SA-Exim, Exiscan, MailScanner, MIMEDefang, Amavis) or as a client (spamc) that communicates with a daemon (spamd).

  9. Email spam - Wikipedia

    en.wikipedia.org/wiki/Email_spam

    Email spam, also referred to as junk email, spam mail, or simply spam, is unsolicited messages sent in bulk by email. The name comes from a Monty Python sketch in which the name of the canned pork product Spam is ubiquitous, unavoidable, and repetitive. [1]
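
The token-based scoring described in the Naive Bayes spam filtering and Bogofilter results above can be illustrated with a small sketch. This is a generic multinomial Naive Bayes with add-one smoothing, not Bogofilter's or any listed tool's actual algorithm; the class name, the `tokenize` helper, and the in-memory counters (standing in for an on-disk wordlist) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (hypothetical, simplistic tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Illustrative token-count spam scorer; not any real tool's implementation."""

    def __init__(self):
        # Per-class token counts play the role of the stored wordlists.
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def spam_probability(self, text):
        vocab_size = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        spam_total = sum(self.counts["spam"].values())
        ham_total = sum(self.counts["ham"].values())
        total_docs = self.docs["spam"] + self.docs["ham"]

        # Start from the smoothed class priors, expressed as log-odds.
        log_odds = math.log((self.docs["spam"] + 1) / (total_docs + 2))
        log_odds -= math.log((self.docs["ham"] + 1) / (total_docs + 2))

        # Add each token's smoothed log-likelihood ratio (add-one smoothing,
        # with one extra slot reserved for unseen tokens).
        for tok in tokenize(text):
            p_spam = (self.counts["spam"][tok] + 1) / (spam_total + vocab_size + 1)
            p_ham = (self.counts["ham"][tok] + 1) / (ham_total + vocab_size + 1)
            log_odds += math.log(p_spam) - math.log(p_ham)

        # Clamp to avoid overflow in exp(), then map log-odds to a probability.
        log_odds = max(-700.0, min(700.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


f = NaiveBayesSpamFilter()
f.train("win free money now", "spam")
f.train("free prize claim now", "spam")
f.train("meeting agenda for tomorrow", "ham")
f.train("lunch tomorrow at noon", "ham")

print(f.spam_probability("win free money"))
print(f.spam_probability("meeting agenda tomorrow"))
```

Because each user trains the counters on their own mail, the scores adapt to that user's notion of spam, which is the tailoring the Naive Bayes result mentions. Working in log-odds rather than multiplying raw probabilities keeps the arithmetic stable for long messages.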