Sunday, March 16, 2008

Spam and Anti-Spam [4] Email Anti-Spam

Anti-spam refers to the set of techniques and steps that reduce a spammer's ability to send spam to a large number of recipients. The term recipient here covers not only email recipients, although email is the main medium for spam, but also victims of blog, forum, or mobile phone spam. Various techniques have been introduced to counter these spam mechanisms. Some are reactive, that is, they take action once the spam has been received. Others aim to prevent the spammer from sending spam in the first place.

In this section we discuss the various techniques for implementing anti-spam. The emphasis is on email anti-spam techniques, since email is the main channel for spreading spam and the topic of our report. We describe the types of techniques employed and, later, briefly look at ongoing research in anti-spam techniques.

Email Anti Spam

As discussed in the previous section, email is the most common, easiest, and most effective way of sending out spam. One of the primary reasons is that it requires minimal effort and almost no cost on the sender's side: the cost is borne by the recipient and the carrier (usually the ISP) rather than the sender. A spammer can therefore run a bot or script that sends spam to millions of email addresses with very little effort and cost. These addresses are initially collected by special scrapers or scripts that harvest valid email addresses from various sources. To summarize, the sending of email spam depends on the following factors:

o Cost of Harvesting or collecting email addresses.
o Monetary Cost of sending spam email.
o Computational Cost of sending spam email.
o Legal Cost of sending spam email.
o Anonymity of the sender.

The following section describes techniques that target one or more of these areas, either by raising the associated cost or by blocking spam outright.
The brief description accompanying each technique gives some insight into how it could be made to work in a system that is prone to spamming.

We can divide Email anti spam techniques into 3 broad categories:
1. End User Techniques.
2. Automated Techniques for Email Administrators.
3. Automated Techniques for Email Senders.


1. End User Techniques

These are steps that a potential recipient of spam can take to restrict the availability of his or her email address and so prevent spam.

o Address Munging

This refers to a set of techniques that prevent email harvesters or bots from scraping valid email addresses from web pages for later use as spam targets. The first approach is to post anonymously or to place dummy addresses on web pages. Another is to write the email address in a way that tricks the harvester into thinking it is not a valid email address. For example:

myname@email.com could be munged to look like:

“myname at email dot com”

Another way is to use transparent address munging. This lets users see the actual address but hides it from automated email harvesters, with methods such as displaying all or part of the email address on a web page as an image, as a text logo shrunken to normal size using in-line CSS, or as jumbled text with the order of characters restored using CSS.

The disadvantage is that legitimate users might have trouble reading the email address and thus might not be able to send messages.
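
The munging shown above can be sketched in a few lines of Python. This is only an illustration: the function names and the harvester pattern are assumptions made for this example, and a real harvester may well be smarter than a single regular expression.

```python
import re

def munge(address):
    """Rewrite 'myname@email.com' as 'myname at email dot com'."""
    local, domain = address.split("@", 1)
    return f"{local} at {domain.replace('.', ' dot ')}"

# A deliberately naive harvester pattern (an assumption for this sketch).
HARVESTER_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def harvest(page_text):
    """Return every string on the page that looks like an email address."""
    return HARVESTER_PATTERN.findall(page_text)

page_plain = "Contact: myname@email.com"
page_munged = "Contact: " + munge("myname@email.com")

print(harvest(page_plain))   # the plain address is found
print(harvest(page_munged))  # the munged address is not
```

A human reader can still reconstruct the address from the munged form, which is the whole point, but so can a harvester that has been taught the "at"/"dot" convention.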

o Disable HTML in e-mail

Mail programs today can display HTML that is part of an email body. This can expose the user to offensive images as well as to malicious JavaScript code that takes advantage of vulnerabilities in the component that renders the HTML content. Such content may send the recipient's email address back to the spammer's server to confirm that the address is valid, install spyware or malware, or direct users to advertised pages. To avoid this, many email clients do not display HTML content by default. The user can turn it on if the email is trusted.

o Reporting spam

Another popular approach is to track down a spammer's ISP and report the offense. This could lead to the spammer's service being suspended. However, it can be a difficult task to track down this information, since spammers might use netblocks to avoid this kind of tracking. There are tools for this purpose, such as SpamCop and the Network Abuse Clearinghouse. They are semi-automated tools that can report spam to ISPs. One disadvantage of this method is that it depends on the ISP to block the spammer. This might not happen immediately. In addition, the machines that actually send out the spam might only be part of a botnet, i.e. they might be zombies, in which case the ISP will only be able to block the zombie, not the spammer, and most probably not the whole botnet either.

o Disposable Email addresses

In this technique the user gives a disposable email address to a web site that requires one. This disposable email address is temporarily set up to forward emails to a valid email address for a certain period of time after which the disposable email address becomes invalid. This prevents a malicious site from collecting valid email addresses from the user.
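
A minimal sketch of the forwarding logic, assuming an in-memory table and a made-up alias domain (alias.example); the class and method names are hypothetical and not taken from any real service:

```python
import secrets
import time

class DisposableAliases:
    """Maps temporary aliases to a real address until they expire."""

    def __init__(self, domain="alias.example"):  # hypothetical domain
        self.domain = domain
        self._aliases = {}  # alias -> (real_address, expiry_timestamp)

    def create(self, real_address, lifetime_seconds, now=None):
        """Create a random alias that forwards for lifetime_seconds."""
        now = time.time() if now is None else now
        alias = f"{secrets.token_hex(4)}@{self.domain}"
        self._aliases[alias] = (real_address, now + lifetime_seconds)
        return alias

    def resolve(self, alias, now=None):
        """Return the real address, or None if the alias is unknown or expired."""
        now = time.time() if now is None else now
        entry = self._aliases.get(alias)
        if entry is None or now >= entry[1]:
            return None
        return entry[0]

aliases = DisposableAliases()
alias = aliases.create("me@real.example", lifetime_seconds=3600, now=0)
print(aliases.resolve(alias, now=10))    # still forwards to the real address
print(aliases.resolve(alias, now=3600))  # expired, so nothing is forwarded
```

The web site only ever sees the alias, so once the alias expires any address list it was sold on becomes useless for reaching the user.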

o No Response to Spam

Another way to avoid spam is simply not to respond to potential spam emails. Any response tells the spammer that the address is valid, so it can be used with confidence for future spamming. Note also that some spam emails contain links that claim to unsubscribe the user from the spam list. The best approach is to avoid even clicking on these links, since they might introduce malware and will not remove the recipient from the spam list.

o Aggressive Response to Spam

Aggressively responding to spam is another technique. It is controversial, however, because some experts advocate not responding to spam at all, as mentioned above. The basic idea is to "spam the spammer", which increases the cost the spammer incurs for sending out spam. One way to achieve this is to send the spam email back to the machine that sent it. The machine can be identified from the IP address in the email headers. In this way a machine that is sending millions of spam emails will become overloaded. However, as mentioned before, the machine might be a zombie that is only part of a botnet, which defeats this method. Some spam emails contain links to advertised pages that contain forms. Automated tools can be used to fill in and submit these forms for each spam email received. For a large number of spam emails this could bring down the spammer's web site.


2. Automated Techniques for Email Administrators

The techniques described in this section can be used by email administrators to keep spam email from coming in. The two most significant ones are described first, followed by a comparison between them. After that, other techniques employed by email administrators are discussed.

o Rule Based Systems

As the name implies, rule based systems use a set of rules to determine whether an incoming email is spam or not. These systems usually parse the email for specific keywords or content and then apply rules to decide. Rule based systems are usually complemented by special purpose algorithms that use a distributed community approach. In this approach the users of the email server mark emails they believe to be spam and let the email server know. A database of such emails is kept, from which the rules can be inferred. This is a more reactive approach, since the first few spam emails will be delivered to the recipients' inboxes. Once a significant number of users have marked these emails as spam, any similar incoming spam emails will be filtered. Spam emails are also marked with a specific flag or value indicating that the email is spam. This allows mail clients to place such emails in the Junk or Spam folder.
Even though the constant updating of the database through the community has its advantages, there are some drawbacks as well. One is that such systems can sometimes classify valid emails as spam. As a result the user has to monitor his or her spam folder continuously to make sure that no valid emails have ended up there.
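
The keyword parsing and flagging described above can be illustrated roughly as follows. The rules, weights, threshold, and header names here are invented for the example and are not taken from any particular filter; in a real system the rules would come from the community database.

```python
# Hypothetical rules: (phrase, weight). Invented for this sketch.
RULES = [
    ("viagra", 3.0),
    ("free money", 2.5),
    ("click here", 1.5),
]
THRESHOLD = 3.0  # assumed cut-off score

def spam_score(subject, body):
    """Sum the weights of every rule phrase found in the message."""
    text = f"{subject}\n{body}".lower()
    return sum(weight for phrase, weight in RULES if phrase in text)

def classify(headers, body):
    """Return a copy of the headers with a spam flag and score added."""
    score = spam_score(headers.get("Subject", ""), body)
    tagged = dict(headers)
    tagged["X-Spam-Flag"] = "YES" if score >= THRESHOLD else "NO"
    tagged["X-Spam-Score"] = f"{score:.1f}"
    return tagged

print(classify({"Subject": "Free money"}, "Click here now"))
print(classify({"Subject": "Meeting"}, "See you at noon"))
```

A mail client can then file any message flagged "YES" into the Junk folder. The false positive problem is easy to see here: a legitimate email that happens to say "click here" twice over in combination with another phrase would be flagged just the same.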

o Challenge-Response Systems

Challenge-response systems are another popular anti-spam method. They exploit the fact that the spammer usually needs anonymity and thus does not give a valid reply-to address.
The system introduces the concept of a ‘white list’ and a ‘black list’, each of which contains a list of possible email senders. When an email is received from someone on the white list, the email is delivered; if the sender is on the black list, the email is rejected and deleted. If the sender is on neither list, a challenge message is sent to the sender. If the unknown sender responds to the challenge in the appropriate manner, usually by replying to the challenge message, the sender is added to the white list and the original message is delivered. If no appropriate response is received, the original email is not delivered and is considered spam. Since most spammers do not provide valid reply addresses, no response is received and the spam mail is rejected.
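
The decision flow just described might look like the following sketch. The class and method names are hypothetical, and sending the actual challenge email is left out.

```python
class ChallengeResponseFilter:
    """White list / black list filter that holds mail from unknown senders."""

    def __init__(self, whitelist=(), blacklist=()):
        self.whitelist = set(whitelist)
        self.blacklist = set(blacklist)
        self.pending = {}  # sender -> messages held until the challenge is answered

    def receive(self, sender, message):
        """Decide what to do with an incoming message."""
        if sender in self.whitelist:
            return "deliver"
        if sender in self.blacklist:
            return "reject"
        # Unknown sender: hold the message and issue a challenge.
        self.pending.setdefault(sender, []).append(message)
        return "challenge"

    def challenge_answered(self, sender):
        """Whitelist the sender and release any held messages."""
        self.whitelist.add(sender)
        return self.pending.pop(sender, [])

f = ChallengeResponseFilter(whitelist={"friend@example.com"},
                            blacklist={"spammer@example.net"})
print(f.receive("friend@example.com", "hello"))      # deliver
print(f.receive("spammer@example.net", "buy now"))   # reject
print(f.receive("stranger@example.org", "hi there")) # challenge
print(f.challenge_answered("stranger@example.org"))  # held message is released
```

Messages from senders who never answer simply stay in the pending table; a real system would presumably expire them after some time.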

Comparison of Rule Based and Challenge-Response Systems

Both rule based and challenge-response methods have their pros and cons. The challenge-response mechanism for fighting spam is gaining popularity and has some clear advantages over rule based systems, such as the fact that no spam folders need to be maintained.

The following table compares the advantages and disadvantages of the two techniques just discussed:

                              Advantages                Disadvantages

Rule Based Systems            No email "challenge"      Rules must be updated
                              No "lists" to maintain    Must maintain "spam" folders
                                                        False positives possible

Challenge-Response Systems    No "rule" database        Emails "challenged"
                              No "spam" folders         Uses "white"/"black" lists
                              No false positives



o Authentication and Reputation

One approach to countering spam is to assign a reputation and authentication rating to particular sending servers. This indicates to the receiving server that the sender is legitimate and does not send spam. Instead of keeping lists of addresses or domains to block, the system uses a reputation system for valid and legitimate addresses and domains. Any email coming from such domains is considered valid. Such systems cannot detect spam, nor is that their purpose. They can, however, be used in conjunction with spam filtering systems so that computationally expensive filtering is not applied to emails coming from such legitimate domains.


o Checksum-based filtering

This technique relies on the fact that spam emails sent to a large number of recipients are inevitably going to be very similar to each other. A filter can therefore take each email, remove the sections that would vary, and calculate a checksum of the remaining part. It also maintains a database of checksums of known spam emails. It compares the computed checksum of the current email to the ones in the database, and if there is a match it considers the email to be spam.
The database is built from emails that users have received and marked as spam. Checksums of such emails are kept in the database for future use. Since users participate in the anti-spam system, it can be extremely effective. However, spammers have come up with strategies to counter this technique. They insert unique garbage data into each email message that acts as a ‘hashbuster’ and results in a unique hash for each message.
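
A rough sketch of such a filter follows. The normalization step is an assumption made for this example (it strips greeting lines and collapses whitespace); real checksum filters use their own, more robust schemes, and the function names are hypothetical.

```python
import hashlib
import re

def normalize(body):
    """Strip the parts assumed to vary between copies of the same spam."""
    text = body.lower()
    # Greeting lines usually carry the recipient's name, so drop them.
    text = re.sub(r"^(dear|hi|hello)\b.*$", "", text, flags=re.MULTILINE)
    # Collapse runs of whitespace so spacing differences do not matter.
    return re.sub(r"\s+", " ", text).strip()

def checksum(body):
    """Checksum of the normalized message body."""
    return hashlib.sha256(normalize(body).encode("utf-8")).hexdigest()

known_spam = set()  # database of checksums reported by users

def report_spam(body):
    known_spam.add(checksum(body))

def is_spam(body):
    return checksum(body) in known_spam

report_spam("Dear Alice,\nBuy cheap watches now!")
print(is_spam("Dear Bob,\nBuy   cheap watches NOW!"))       # same spam, different recipient
print(is_spam("Dear Bob,\nBuy cheap watches now! xk29q"))   # hashbuster appended
```

The last call shows the weakness mentioned above: a few characters of random garbage are enough to change the checksum, so the message no longer matches the database.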


o DNS Based Blackhole Lists

DNS-based Blackhole Lists, or DNSBLs, are used for heuristic filtering and blocking of spam. A site publishes lists (usually of IP addresses) through the Domain Name System (DNS) in such a way that mail servers can be set up to reject mail from those sources. There are scores of such DNSBLs, each with a different policy: some list sites known to send spam, others list open mail relays or proxies, and others list ISPs known to support spam. This lets an administrator choose the lists whose policies suit the site.
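
As an illustration of how a mail server consults such a list: by convention the octets of the sender's IPv4 address are reversed and prepended to the list's DNS zone, and the resulting name is looked up. The zone name used below (dnsbl.example.org) is a placeholder, not a real list, and the function names are hypothetical.

```python
import ipaddress
import socket

def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up, e.g. 192.0.2.99 -> 99.2.0.192.<zone>."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip, zone):
    """Return True if the DNSBL zone has an address record for this IP.

    Simplification: any lookup failure, including a timeout, is treated
    as "not listed".
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False

print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

A server that checks several lists would call is_listed once per zone and reject the connection, or raise the spam score, if any of them answers.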

o Enforcing RFC standards

The RFC standards that define the Simple Mail Transfer Protocol (SMTP) can also be used to block spam effectively, by enforcing the technical requirements they place on senders. This method works because spammers usually send email with software that does not fully comply with the SMTP standards. Also, if spam is being sent from a zombie machine in a botnet, the spammer has limited control over the machine and may only be able to send email that does not comply with the standards. The receiving administrator can therefore enforce conformance checks to prevent spam.
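
One conformance check of this kind, chosen here purely as an illustration, is to look at the name a client announces in its HELO/EHLO greeting, which is expected to be a fully qualified domain name or a bracketed address literal. The sketch below is a simplified syntax check with a hypothetical function name; it does not handle IPv6 literals and is not a full implementation of the SMTP grammar.

```python
import ipaddress
import re

# One DNS label: letters, digits, and inner hyphens, up to 63 characters.
LABEL = re.compile(r"^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$", re.IGNORECASE)

def helo_looks_valid(argument):
    """Return True if the HELO/EHLO argument is plausibly well formed."""
    # Address literal such as [192.0.2.1]
    if argument.startswith("[") and argument.endswith("]"):
        try:
            ipaddress.IPv4Address(argument[1:-1])
        except ValueError:
            return False
        return True
    # A bare IP address without brackets is not accepted.
    try:
        ipaddress.ip_address(argument)
        return False
    except ValueError:
        pass
    # Otherwise require at least two well-formed dot-separated labels.
    labels = argument.split(".")
    return len(labels) >= 2 and all(LABEL.match(label) for label in labels)

print(helo_looks_valid("mail.example.com"))
print(helo_looks_valid("192.0.2.1"))
```

Strict enforcement carries a risk of rejecting mail from legitimate but misconfigured servers, so an administrator might prefer to score such failures rather than reject outright.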

o Greylisting

Greylisting means temporarily rejecting messages from unknown sending mail servers. It is based on the observation that spammers' software usually does not retry a rejected message. Instead, it moves on to the next message and the next address, so the spam is never delivered. The drawback is that mail from valid senders is delayed until it is sent again. This is normally handled on the sender's side, since a mail server retries delivery after a temporary rejection.
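
A minimal sketch of the idea, assuming the filter remembers each (client IP, sender, recipient) combination and a five-minute minimum delay; both of these choices, the class name, and the reply strings are assumptions made for this example.

```python
class Greylist:
    """Temporarily reject mail from unknown (client, sender, recipient) triplets."""

    def __init__(self, min_delay=300):
        self.min_delay = min_delay  # seconds a new triplet must wait (assumed value)
        self.first_seen = {}        # triplet -> time of first delivery attempt

    def check(self, client_ip, sender, recipient, now):
        """Return an SMTP-style reply for a delivery attempt at time `now`."""
        triplet = (client_ip, sender, recipient)
        first = self.first_seen.setdefault(triplet, now)
        if now - first < self.min_delay:
            return "451 try again later"  # temporary failure: sender should retry
        return "250 accepted"

g = Greylist(min_delay=300)
args = ("192.0.2.1", "alice@example.com", "bob@example.org")
print(g.check(*args, now=0))    # first attempt is deferred
print(g.check(*args, now=60))   # retried too soon, still deferred
print(g.check(*args, now=400))  # retried after the delay, accepted
```

A sender that never retries never gets past the first reply, which is exactly the behaviour the technique relies on.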


3. Automated Techniques for Email Senders

o Background checks on new users and customers

Spammers and service providers are in a constant race with each other. ISPs and network providers remove suspected spammers from their networks so that the service cannot be abused for spam. However, spammers continuously create new accounts and use them to send spam. Even if an account lasts only a few hours, the damage to the service provider's reputation is immense, and the amount of spam that can be sent in that window is enough for the spammer. As a result, one measure that ISPs and web mail providers employ is a background check on new customers, for example by verifying credit card numbers. If a credit card number turns out to be stolen, the account is disabled. Other background checks can similarly be used to verify the legitimacy of a new account holder.

o Confirmed opt-in for mailing lists

An issue with legal and valid mailing lists is that a spammer might subscribe a list of email addresses to a mailing list even though the list is in no way connected to the spammer. This results in a bad reputation for a legitimate mailing list and in spam for the victims whose email addresses were entered by the spammer. To counter this, a confirmed opt-in mechanism is recommended. It involves sending a confirmation email to each address for which a subscription was requested. Once the real user receives the confirmation email, he or she can decide whether the subscription request was genuine and, if so, confirm the subscription (usually by clicking on a link in the email).
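
The confirmation step can be sketched as below. Deriving the token with an HMAC is one possible design, chosen here for illustration; the class name, the secret key, and the token format are all assumptions, and sending the confirmation email itself is omitted.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # placeholder key for this sketch

def confirmation_token(address, list_name):
    """Token that goes into the confirmation link for this address and list."""
    message = f"{list_name}:{address.lower()}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

class MailingList:
    def __init__(self, name):
        self.name = name
        self.pending = set()      # addresses awaiting confirmation
        self.subscribers = set()  # confirmed addresses only

    def request_subscription(self, address):
        """Record the request and return the token to email to the address."""
        self.pending.add(address.lower())
        return confirmation_token(address, self.name)

    def confirm(self, address, token):
        """Subscribe the address only if it is pending and the token matches."""
        address = address.lower()
        expected = confirmation_token(address, self.name)
        if address in self.pending and hmac.compare_digest(token, expected):
            self.pending.discard(address)
            self.subscribers.add(address)
            return True
        return False

news = MailingList("news")
token = news.request_subscription("victim@example.com")
print(news.confirm("victim@example.com", "guessed-token"))  # rejected
print(news.confirm("victim@example.com", token))            # confirmed
```

Because only the owner of the mailbox ever sees the token, an address entered by a spammer stays in the pending set and never receives list traffic.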



To be continued; please see the other parts of this posting series.
------------------------------
You can read other parts of this posting list from the list below.

This posting series provides information about spam and anti-spam. It includes attack methods and techniques that are useful for administrators and users.
Please note: this posting is copied from a report for the software security class that I attended at San Jose State University in Fall 2007.
