Spam/Junk Mail Filter CSE Mini Project Report

The filter implemented in this project blocks spam, also known as unsolicited email. It uses a statistical approach known as Bayesian filtering. The program must first be properly trained on sets of spam and non-spam mail. The results of training are stored in a database, and the more training the filter receives, the better it performs.
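The training step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `tokenize` and `train`, the regular expression used to split words, and the in-memory dictionary standing in for the database are all assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Split a message into lowercase word tokens (an assumed, simple scheme)."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


def train(spam_messages, ham_messages):
    """Build a word-count 'database' from labelled spam and non-spam mail.

    Each word is counted at most once per message, so a count is the
    number of messages of that class containing the word.
    """
    spam_counts, ham_counts = Counter(), Counter()
    for message in spam_messages:
        spam_counts.update(set(tokenize(message)))
    for message in ham_messages:
        ham_counts.update(set(tokenize(message)))
    return {
        "spam": spam_counts,
        "ham": ham_counts,
        "n_spam": len(spam_messages),
        "n_ham": len(ham_messages),
    }


# Hypothetical toy data; a real training set would be far larger.
db = train(["Buy cheap pills now", "cheap pills"], ["Meeting at noon"])
```

Calling `train` again with a larger mail set simply produces larger counts, which is why more training data gives the filter better word statistics.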

When a new mail arrives, it is tokenized and the probability of each word is looked up in the database. The word probabilities are then combined into an overall probability, and if this exceeds 0.9 the mail is classified as spam. A properly trained filter is reported to block 99% of spam mails with 0 false positives. Spam is a common problem that is growing with time.
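One way the classification step might look is sketched below. Only the 0.9 threshold comes from the text; the way word probabilities are estimated and combined (the usual naive Bayes combination, computed in log space), the default of 0.4 for unseen words, the clamping to [0.01, 0.99], and the layout of the hypothetical `db` dictionary are assumptions made for illustration.

```python
import math
import re

SPAM_THRESHOLD = 0.9  # cutoff stated in the text


def word_spam_probability(word, db, unknown=0.4):
    """Estimate P(spam | word) from per-class message counts.

    Words never seen in training get the assumed default `unknown`.
    """
    spam_hits = db["spam"].get(word, 0)
    ham_hits = db["ham"].get(word, 0)
    if spam_hits + ham_hits == 0:
        return unknown
    spam_rate = spam_hits / db["n_spam"]
    ham_rate = ham_hits / db["n_ham"]
    p = spam_rate / (spam_rate + ham_rate)
    return min(0.99, max(0.01, p))  # clamp so no single word decides alone


def spam_probability(text, db):
    """Combine the per-word probabilities into one score for the message."""
    words = set(re.findall(r"[a-z0-9$'-]+", text.lower()))
    probs = [word_spam_probability(w, db) for w in words]
    log_spam = sum(math.log(p) for p in probs)
    log_ham = sum(math.log(1.0 - p) for p in probs)
    diff = max(-700.0, min(700.0, log_ham - log_spam))  # avoid exp overflow
    return 1.0 / (1.0 + math.exp(diff))


def is_spam(text, db):
    """Classify as spam when the combined probability exceeds the threshold."""
    return spam_probability(text, db) > SPAM_THRESHOLD


# Hypothetical database contents, as if produced by an earlier training run.
db = {
    "spam": {"cheap": 8, "pills": 9, "meeting": 1},
    "ham": {"meeting": 9, "noon": 7, "cheap": 1},
    "n_spam": 10,
    "n_ham": 10,
}
print(is_spam("Cheap pills", db))      # words seen mostly in spam
print(is_spam("Meeting at noon", db))  # words seen mostly in non-spam
```

Working in log space keeps the product of many small probabilities from underflowing on long messages.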

Various solutions have been proposed for email users. The most familiar is not to accept messages from unknown people, as doing so increases the chances of spam to a large extent. Spam filtering is thus a great way to reduce the spam problem.

As soon as you have enough spam and non-spam messages, you can start using a Bayesian filter. Before you start, the messages must be correctly classified. Make sure that you have 100 or more messages of each kind, and that the two sets do not differ in some unintended feature. For example, if the non-spam messages are all more than 6 months old and the spam messages are recent, the algorithm may determine that messages with old dates are non-spam while messages with new dates are spam. Do not pad the numbers with duplicate messages, as this will overtrain the filter on the features of those messages.
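The size and duplicate checks above could be automated with a small helper like the one below. This is only a sketch: the name `prepare_training_set`, the treatment of duplicates as exact matches after stripping whitespace, and the default minimum of 100 (taken from the guideline above) are assumptions.

```python
def prepare_training_set(messages, minimum=100):
    """Drop exact duplicate messages and check that enough remain.

    Returns the de-duplicated list (first occurrences, original order)
    and a flag saying whether it meets the minimum size.
    """
    unique = list(dict.fromkeys(message.strip() for message in messages))
    return unique, len(unique) >= minimum


# Hypothetical example: 150 copies of one message do not count as 150 samples.
messages, enough = prepare_training_set(["Win a prize!"] * 150)
print(len(messages), enough)  # 1 False
```

Running the helper once on the spam set and once on the non-spam set shows whether either class still falls short after duplicates are removed.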
