michael orlitzky

Overview of email spam and forgery countermeasures

Introduction

An insane number of technologies comprise a modern email system. Among these technologies is a growing list of spam and forgery countermeasures; the former are meant to prevent unwanted junk mail, while the latter are meant to ensure that the “sender” is who he says he is.

The situation is going to get worse before it gets better. The following is a very non-comprehensive (incomprehensive? ha ha) list of the anti-spam and anti-forgery techniques used today.

Spam

Bare-newline Test

The SMTP (email) protocol is line-based. Its lines are terminated by carriage-return/linefeed pairs, i.e. <CR><LF>.

However, the authors of junk mail software have yet to read this specification, because in their free time they're too busy robbing holocaust survivors. So instead they end their lines with a bare newline (i.e. <LF>) character. By rejecting clients who send a bare newline, you can prevent a good deal of botnet zombie spam.

All real SMTP servers implement the protocol correctly; therefore the risk of false positives is low. A few legitimate senders may have e.g. ancient internal Java software that could cause trouble.
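A minimal Python sketch of the bare-newline check, assuming the server has already buffered the bytes the client sent. The function name has_bare_newline is hypothetical and is not taken from Postscreen or any other real tool:

```python
# Hypothetical helper, for illustration only.
def has_bare_newline(data: bytes) -> bool:
    """Return True if data contains a LF that is not preceded by a CR.

    Assumes data is one complete buffered chunk: a CR at the end of one
    read followed by a LF at the start of the next is not handled.
    """
    for i, byte in enumerate(data):
        # 0x0A is LF; a legal SMTP line ending has CR (0x0D) right before it.
        if byte == 0x0A and (i == 0 or data[i - 1] != 0x0D):
            return True
    return False


print(has_bare_newline(b"HELO mail.example.com\r\n"))  # False
print(has_bare_newline(b"HELO zombie\n"))              # True
```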

For an example implementation of this restriction, see Postscreen.

Blacklists

Blacklists are probably the most common anti-spam measure. If a mail server is sending spam, there are a number of organizations who monitor and keep a record of that fact. See for example Spamhaus. These organizations then publish a “black” list of the servers that are spamming.

When someone is receiving a message that could potentially be spam, he can look up the sender on these blacklists. If the sender is blacklisted, it is likely that the message in transit is spam, and the recipient can reject it.
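Most blacklists are published as DNS zones (DNSBLs). The receiver reverses the octets of the sender's IPv4 address, appends the list's zone, and does an ordinary A lookup: an answer means “listed,” and a nonexistent name means “not listed.” Below is a sketch under those assumptions. zen.spamhaus.org appears only as an example zone, and the function names are hypothetical:

```python
import ipaddress
import socket


# Hypothetical helpers, for illustration only.
def dnsbl_query_name(ip: str, zone: str) -> str:
    """192.0.2.99 + zen.spamhaus.org -> 99.2.0.192.zen.spamhaus.org"""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str, resolve=socket.gethostbyname) -> bool:
    """Any A record for the query name means the IP is listed.

    A lookup error (socket.gaierror) counts as "not listed", i.e. the check
    fails open. Real code should also look at which 127.0.0.x address came
    back, since lists use it to encode the reason for the listing.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False
```

RFC 5782 asks IPv4 DNSBLs to list 127.0.0.2 as a test entry, so is_listed("127.0.0.2", "zen.spamhaus.org") should return True as long as your resolver can reach the list.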

Blacklisting is prone to false positives due to lazy shared hosts (who rent to spammers), and freemail companies like Yahoo, AOL, and Hotmail, who are all assholes of an unrivaled caliber. Nevertheless, blacklisting is the single most effective anti-spam measure that we have today.

For a much lengthier discussion of blacklisting, see So you're Blacklisted….

Forward-confirmed Reverse DNS

This is a stronger version of the Reverse DNS (PTR) Requirement.

The “forward DNS” for a mail server's hostname should contain an IP address. The reverse DNS for that IP address should then contain a hostname, and that hostname should match the one you started with: the mail server's hostname.

For example, if mail.example.com resolves to 192.168.1.100, and 192.168.1.100 has a PTR record, then that record should contain mail.example.com. This is simply common sense and good DNS design. For a mail server, there is no reason why this should not be the case.
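The receiving server only knows the connecting IP address, so in practice the check walks the same chain from the other end: IP address → PTR hostname → that hostname's addresses, which must include the original IP. Here is a minimal IPv4-only sketch that uses Python's standard resolver functions. The name fcrdns_ok is hypothetical, and the two lookups are parameters so they can be swapped out:

```python
import socket


# Hypothetical helper, for illustration only.
def fcrdns_ok(ip: str,
              reverse=socket.gethostbyaddr,
              forward=socket.gethostbyname_ex) -> bool:
    """Forward-confirmed reverse DNS for an IPv4 client address.

    Simplification: only the primary PTR name returned by gethostbyaddr is
    checked, and gethostbyname_ex only returns IPv4 addresses.
    """
    try:
        ptr_name, _aliases, _addrs = reverse(ip)        # IP -> hostname
        _name, _aliases, addresses = forward(ptr_name)  # hostname -> IPs
    except OSError:
        # No PTR record, or the PTR name does not resolve: not confirmed.
        return False
    return ip in addresses
```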

However, there is no requirement in the RFC or elsewhere that mail servers have forward-confirmed reverse DNS. So while it is a good practice, many legitimate mail servers fail to implement it. The risk of false positives is high.

For an example implementation of this restriction, see Postfix's reject_unknown_client_hostname.

Fully-qualified HELO/EHLO Hostname Requirement

The specification for SMTP, RFC5321, states,

The domain name given in the EHLO command MUST be either a primary host name (a domain name that resolves to an address RR) or, if the host has no name, an address literal…

The HELO/EHLO hostname is under the control of the administrator of the sending server. By rejecting names that are not fully-qualified hostnames, you reject mail from people who are too stupid to configure their servers according to the specification. This includes a large number of junk mailers, but also a good number of Windows admins (if you send “HELO EXCHGSVR01”, you-a culpa).

The risk for false positives here is medium, but that risk is offset by how trivial it is to fix. If there is a real human responsible for the sending server, all he or she needs to do is pick a fully-qualified hostname (any hostname!) and set it as the HELO/EHLO name.
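One rough way to express the requirement is to approximate “fully-qualified” as “contains at least one dot.” That is a heuristic and not necessarily the exact rule Postfix applies. The function name is hypothetical, and IPv6 address literals are left out:

```python
import re

# IPv4 address literal such as "[192.0.2.1]"; IPv6 literals omitted for brevity.
ADDRESS_LITERAL = re.compile(r"\[\d{1,3}(?:\.\d{1,3}){3}\]")


# Hypothetical helper, for illustration only.
def helo_is_fqdn_or_literal(helo: str) -> bool:
    """Heuristic: accept an address literal, or a name containing a dot."""
    if ADDRESS_LITERAL.fullmatch(helo):
        return True
    name = helo.rstrip(".")  # tolerate a trailing root dot
    return "." in name and not name.startswith(".")


print(helo_is_fqdn_or_literal("EXCHGSVR01"))        # False
print(helo_is_fqdn_or_literal("mail.example.com"))  # True
```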

For an example implementation of this restriction, see Postfix's reject_non_fqdn_helo_hostname.

Generic RDNS

The Reverse DNS (PTR) Requirement is great for stopping botnet zombies, but some ISPs have inadvertently neutered it. Comcast, for example, assigns a generic reverse DNS entry for every IP address on its network. For example, 192.168.1.100 might have a PTR for c-192-168-1-100.hsd1.md.comcast.net. When every host has reverse DNS, reverse DNS no longer distinguishes the bad guys.

Fortunately, these ISPs all use a pretty obvious pattern in their generic RDNS. So it is possible to reject mail from machines that have RDNS, but whose RDNS looks generic. The most popular list of such patterns is fqrdns.pcre, which is designed to be used with Postfix.
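The patterns are regular expressions matched against the client's PTR hostname. The two below are made up for illustration and are not taken from fqrdns.pcre. The first targets the Comcast naming scheme described above. The second matches any name containing a dash-separated IPv4 address, which is exactly the kind of broad pattern that produces false positives. The function name is hypothetical:

```python
import re

# Illustrative patterns only; NOT copied from fqrdns.pcre.
GENERIC_RDNS_PATTERNS = [
    # Comcast residential style, e.g. c-192-168-1-100.hsd1.md.comcast.net
    re.compile(r"^c(?:-\d{1,3}){4}\.hsd1\.[a-z]{2}\.comcast\.net$", re.IGNORECASE),
    # Broad catch-all: a dash-separated IPv4 address anywhere in the name
    re.compile(r"\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}"),
]


# Hypothetical helper, for illustration only.
def looks_generic(ptr_hostname: str) -> bool:
    """True if the PTR hostname matches any of the generic patterns."""
    return any(p.search(ptr_hostname) for p in GENERIC_RDNS_PATTERNS)
```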

The risk of false positives would be low if one were careful to audit the patterns. However, when using someone else's list, there are simply too many patterns to verify. So the risk is medium to high.

For an example implementation of this restriction, see Postfix's check_reverse_client_hostname_access combined with the fqrdns.pcre file.

Greylisting

The name “greylisting” is meant to indicate something between “whitelisting” and “blacklisting.”

Greylisting delays some incoming mail (usually from new senders/servers) for a few minutes. The specification for SMTP, RFC5321, states that if the sender receives a temporary 4xx error, it must retry sending at a later time. Any real mail server will do so. But the implementation details of a retry queue are messy, and so most botnet zombies don't have the ability to resend their failed spam. Therefore, they never attempt to send it a second time, and the spam is never accepted.

Any legitimate mail software (that has not been misconfigured) will try the message again. Therefore the risk of false positives is extremely low. The downside of greylisting is instead that some messages will be delayed for a few minutes. Remembering which senders have already retried successfully, so that they are not delayed a second time, can eliminate this penalty most of the time.
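Below is a minimal in-memory sketch of one common design, keyed on the (client IP, envelope sender, recipient) triplet. Postscreen's accidental greylisting works differently. The 300-second delay is an assumed value and the class name is hypothetical. A real implementation would also expire old entries and persist them across restarts:

```python
import time


# Hypothetical class, for illustration only.
class Greylist:
    """Greylisting keyed on (client IP, envelope sender, recipient)."""

    def __init__(self, delay=300, now=time.monotonic):
        self.delay = delay    # seconds a new triplet must wait (assumed value)
        self.now = now        # injectable clock, handy for testing
        self.first_seen = {}  # triplet -> time of its first attempt
        self.passed = set()   # triplets that already retried: the cache

    def check(self, client_ip, sender, recipient):
        """Return an SMTP reply code: 250 to accept, 450 to defer."""
        triplet = (client_ip, sender.lower(), recipient.lower())
        if triplet in self.passed:
            return 250        # cached: no delay the second time around
        t = self.now()
        first = self.first_seen.setdefault(triplet, t)
        if t - first >= self.delay:
            self.passed.add(triplet)
            return 250        # the client came back after the delay
        return 450            # temporary failure: a real MTA will retry
```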

For an example implementation of this restriction, see Postscreen, which implements greylisting by accident when deep protocol tests are enabled:

When any "deep protocol tests" are configured, postscreen(8) cannot hand off the “live” connection to a Postfix SMTP server process in the middle of the session. Instead, postscreen(8) defers mail delivery attempts with a 4XX status, logs the helo/sender/recipient information, and waits for the client to disconnect. The next time the client connects it will be allowed to talk to a Postfix SMTP server process to deliver its mail. postscreen(8) mitigates the impact of this limitation by giving deep protocol tests a long expiration time.

HELO/EHLO Requirement

The specification for SMTP, RFC5321, states,

A session that will contain mail transactions MUST first be initialized by the use of the EHLO command. An SMTP server SHOULD accept commands for non-mail transactions (e.g., VRFY or EXPN) without this initialization.

Some mail servers will accept mail even if the sender does not send this HELO or EHLO greeting. By requiring the sender to follow the rules, you eliminate a large number of junk mailers that don't implement the specification properly.
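The rule can be shown with a toy command sequencer: MAIL is refused with 503 (RFC 5321's “bad sequence of commands” reply) until HELO or EHLO has been seen. The class name and reply texts are made up for illustration:

```python
# Hypothetical class, for illustration only.
class SmtpSession:
    """Toy command sequencer: MAIL is refused until HELO/EHLO has been seen."""

    def __init__(self):
        self.greeted = False

    def handle(self, line: str) -> str:
        """Return the reply for one command line (CRLF already stripped)."""
        verb = line.split(" ", 1)[0].upper()
        if verb in ("HELO", "EHLO"):
            self.greeted = True
            return "250 hello"
        if verb == "MAIL":
            if not self.greeted:
                # 503 means "bad sequence of commands" in RFC 5321.
                return "503 5.5.1 send HELO/EHLO first"
            return "250 2.1.0 ok"
        return "502 5.5.1 command not implemented in this sketch"
```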

Almost all mailers (even junk mailers) send HELO/EHLO these days, so the risk here is low.

For an example implementation of this restriction, see Postfix's smtpd_helo_required.

HELO/EHLO Syntax Validity

The specification for SMTP, RFC5321, states,

The domain name given in the EHLO command MUST be either a primary host name (a domain name that resolves to an address RR) or, if the host has no name, an address literal…

Since the HELO/EHLO name needs to be a valid hostname, it cannot contain (for example) the characters @#$%!, etc. If it does, it is obviously invalid and can be rejected right away.

This is much safer than the Fully-qualified HELO/EHLO Hostname Requirement, since even Microsoft Exchange doesn't come configured with invalid characters out-of-the-box.
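A sketch of the syntax check follows, assuming ordinary hostname rules: dot-separated labels of letters, digits, and hyphens, each 1 to 63 characters long. It shows the difference from the previous section, since “EXCHGSVR01” is syntactically fine here even though it is not fully qualified. The function name is hypothetical, and IPv6 address literals are not handled:

```python
import ipaddress
import re

# One DNS label: letters, digits and hyphens, 1-63 characters,
# not starting or ending with a hyphen.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


# Hypothetical helper, for illustration only.
def helo_syntax_ok(helo: str) -> bool:
    """True if helo is a syntactically valid hostname or IPv4 address literal."""
    if helo.startswith("[") and helo.endswith("]"):
        try:
            ipaddress.IPv4Address(helo[1:-1])
            return True
        except ValueError:
            return False
    name = helo[:-1] if helo.endswith(".") else helo  # allow a trailing dot
    if not name or len(name) > 253:
        return False
    return all(LABEL.fullmatch(label) for label in name.split("."))
```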

For an example implementation of this restriction, see Postfix's reject_invalid_helo_hostname.

Non-SMTP Command Test

This test is similar to the bare newline test. If a sender attempts to use invalid commands, it was likely written in a hurry by someone who didn't know what he was doing. In other words, it isn't a real mail server.
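One way to sketch this is as an allowlist of command verbs. The set below contains RFC 5321's commands plus a few common extensions (STARTTLS, AUTH, BDAT). It is an assumption, not any particular server's list. Some implementations take the opposite approach and reject a denylist of known non-SMTP commands, such as HTTP's GET and POST. The function name is hypothetical:

```python
# RFC 5321 commands plus common extensions (an assumed list, not any
# particular server's configuration).
SMTP_VERBS = {
    "HELO", "EHLO", "MAIL", "RCPT", "DATA", "RSET", "VRFY", "EXPN",
    "HELP", "NOOP", "QUIT", "STARTTLS", "AUTH", "BDAT",
}


# Hypothetical helper, for illustration only.
def is_smtp_command(line: str) -> bool:
    """True if the line starts with a known SMTP command verb."""
    parts = line.split(None, 1)
    return bool(parts) and parts[0].upper() in SMTP_VERBS
```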

For an example implementation of this restriction, see Postscreen.

Premature Pipelining Test

This test is similar to the bare newline test. If a sender attempts to use SMTP pipelining before the recipient has agreed to it, then the client was likely written in a hurry by someone who didn't know what he was doing. In other words, it isn't a real mail server.
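A sketch of one way to detect it follows. It assumes buffer holds everything the client sent before the server replied to its first command. If PIPELINING (RFC 2920) was not advertised, nothing should follow the first complete command line. The function name is hypothetical, and the DATA phase, where many lines legitimately arrive at once, is out of scope:

```python
# Hypothetical helper, for illustration only.
def premature_pipelining(buffer: bytes, pipelining_advertised: bool) -> bool:
    """True if the client kept talking after its first command line without
    waiting for our reply, although PIPELINING was never advertised.

    Assumes buffer holds everything received before the server answered
    the first command; the DATA phase is out of scope.
    """
    if pipelining_advertised:
        return False
    _first, sep, rest = buffer.partition(b"\r\n")
    return bool(sep) and bool(rest)
```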

For an example implementation of this restriction, see Postscreen.

Reverse DNS (PTR) Requirement

A good deal of spam comes from infected home/office PCs. These PCs usually have dynamic IP addresses, and dynamic IPs don't often have reverse DNS associated with them, since the host behind the IP address changes frequently.

Legitimate mail servers, on the other hand, have fixed IP addresses. If you have a fixed IP address, and are in fact the owner of that IP address (and not just the author of some virus/malware that inhabits it), then it is possible and desirable to create a reverse DNS record for that IP address.

The idea is that by rejecting mail from addresses that do not have reverse DNS, you eliminate mail from the former but not the latter. Legitimate hosts without reverse DNS will cause false positives, but the risk is small: the restriction is so widely used across the internet that most administrators have already created a PTR record. The risk is further mitigated by how easily it can be corrected. The administrator of the sending server simply needs to create a PTR record (any record!) for the IP address of the server.
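The check itself is a single reverse lookup of the connecting address. A minimal sketch using Python's standard resolver follows. The function name is hypothetical, and the lookup is a parameter so it can be replaced:

```python
import socket


# Hypothetical helper, for illustration only.
def has_reverse_dns(ip: str, reverse=socket.gethostbyaddr) -> bool:
    """True if the connecting IP address has a PTR record (of any kind)."""
    try:
        reverse(ip)
        return True
    except OSError:
        # socket.herror ("unknown host") and friends. A careful server would
        # defer with a 4xx reply on temporary DNS errors rather than reject.
        return False
```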

For an example implementation of this restriction, see Postfix's reject_unknown_reverse_client_hostname.

SpamAssassin Content Filtering

While most of the other tests are based on the SMTP protocol and the reputation of the sending server, it is also possible to inspect the body of the message. This is a much larger class of tests, and each one is assigned a score by the SpamAssassin program. Messages with too high a score are classified as spam and either quarantined or rejected outright. Virus scanning is included in this category.

Inspection of the message body and attachments requires more resources than the other, more superficial tests. It is therefore impractical to perform these body tests on every message. In most cases, a message is scanned only if the sender is not blacklisted. This is discussed in So you're Blacklisted….

The risk of false positives is variable, since the “spam threshold” is up to the administrator and can be set to taste. The administrator can also override the “spam score” for certain tests.
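The scoring model fits in a few lines. Each matching test contributes its score, the administrator's overrides replace the defaults, and the total is compared to a threshold. The rule names and scores below are invented, while 5.0 is SpamAssassin's default required_score. This models the idea and is not SpamAssassin's own code:

```python
# Invented rule names and scores, for illustration only.
DEFAULT_SCORES = {
    "EXAMPLE_PILL_SPAM_WORDS": 2.5,
    "EXAMPLE_SUBJECT_ALL_CAPS": 1.0,
    "EXAMPLE_KNOWN_CORRESPONDENT": -3.0,
}

REQUIRED_SCORE = 5.0  # SpamAssassin's default spam threshold (required_score)


# Hypothetical helper, for illustration only.
def classify(hits, overrides=None):
    """Sum the scores of the tests that matched; return (total, is_spam)."""
    scores = dict(DEFAULT_SCORES)
    scores.update(overrides or {})  # the administrator's per-test overrides win
    total = sum(scores.get(rule, 0.0) for rule in hits)
    return total, total >= REQUIRED_SCORE
```

For example, classify(["EXAMPLE_PILL_SPAM_WORDS", "EXAMPLE_SUBJECT_ALL_CAPS"]) returns (3.5, False). Raising EXAMPLE_SUBJECT_ALL_CAPS to 3.0 through overrides pushes the same message to 5.5, which counts as spam.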

Valid Domain Requirement

If the sender is using a domain that doesn't exist, the email address is certainly forged. There is no reason to accept mail from a domain that doesn't exist. There is also no risk of false positives.

The only imaginable scenario where a legitimate message could come from a nonexistent domain would be if there was a typo in the sender's email client or mail server configuration. But in that case, very few things would work, and the problem would soon be corrected.
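For mail purposes, a domain “exists” if it has MX records or, failing that, address records, because RFC 5321 treats a domain without MX records as its own implicit mail exchanger. Python's standard library has no MX lookup, so in the sketch below lookup_mx is an assumed callable that you would supply from a DNS library. The function name is also hypothetical:

```python
import socket


# Hypothetical helper, for illustration only.
def sender_domain_exists(sender: str, lookup_mx, lookup_addr=socket.getaddrinfo) -> bool:
    """Does the envelope sender's domain exist for mail purposes?

    lookup_mx is an assumed callable returning a (possibly empty) list of MX
    hostnames. A real version must also tell "no such domain" apart from a
    temporary DNS failure.
    """
    if sender == "":
        return True  # the null sender <> (used for bounces) has no domain
    _local, at, domain = sender.rpartition("@")
    if not at or not domain:
        return False
    if lookup_mx(domain):
        return True
    # No MX records: RFC 5321 falls back to the domain's address records.
    try:
        lookup_addr(domain, None)
        return True
    except socket.gaierror:
        return False
```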

For an example implementation of this restriction, see Postfix's reject_unknown_sender_domain.

Forgery

Sender Policy Framework (SPF)

It is a little-known fact that the “From” address of a message is utter nonsense. It is set by the sender, and can be whatever he or she wants it to be. SPF does not address this problem, but it solves a related one: it makes sure the bounce address (i.e. the envelope sender) is not forged.

SPF works through DNS. The administrator for a domain—say, example.com—knows which mail servers should be sending mail for addresses @example.com. He places an SPF record in the DNS that lists those servers. Anyone receiving a message from an address @example.com can then check the record, and if the sending server is not listed, reject the message as a forgery.
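SPF records are published as DNS TXT records, for example “v=spf1 ip4:192.0.2.0/24 -all”. The sketch below evaluates only the ip4 and all mechanisms of a record that is passed in as a string. include, a, mx, redirect, macros, and the DNS lookup itself are deliberately left out, so treat it as an illustration of the matching rules rather than a usable SPF implementation. The function name is hypothetical:

```python
import ipaddress

# SPF qualifiers and the result each one produces when its mechanism matches.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


# Hypothetical helper, for illustration only.
def spf_check_ip4(record: str, client_ip: str) -> str:
    """Evaluate only the ip4 and all mechanisms of an SPF record."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record
    ip = ipaddress.IPv4Address(client_ip)
    for term in terms[1:]:
        qualifier = "+"  # the default qualifier is "pass"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        mech = term.lower()
        if mech == "all":
            return QUALIFIERS[qualifier]
        if mech.startswith("ip4:"):
            if ip in ipaddress.IPv4Network(term[4:], strict=False):
                return QUALIFIERS[qualifier]
        # include:, a, mx, redirect=, etc. are ignored in this sketch
    return "neutral"  # nothing matched
```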

SPF is completely opt-in. By implementing it as a sender, you can only hurt yourself and cause more of your mail to be rejected. Therefore, you would think that the false positive risk is low: anyone worried about rejections could do… nothing! That noblest of things! Alas, there is an unending supply of people who think that they need an SPF record, but can't be bothered to figure out what SPF is or how it works. So false positives do occur. But the fix is trivial: tell them to delete the goddamn record!

For an example implementation of SPF, see pypolicyd-spf.

Domain-based Message Authentication, Reporting & Conformance (DMARC)

DMARC is a companion to SPF and DKIM. The purpose of DMARC is to allow domain owners to indicate what recipients should do with a message that fails DMARC, meaning that neither SPF nor DKIM validates it for the domain in the “From” address. For example, some domain owners would prefer recipients to reject the (potential) forgery outright. Others may wish the message to be accepted normally, for example while they are testing their SPF/DKIM implementations. This preference is indicated with a DNS TXT record at _dmarc.example.com (for the domain example.com).
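A simplified sketch of how a recipient might apply a published policy follows. DMARC passes if SPF or DKIM passes for a domain that is aligned with the “From” domain. For simplicity this sketch uses strict alignment only (exact domain match), while the DMARC default is relaxed alignment. It also ignores the pct, sp, and reporting tags. The function names are hypothetical:

```python
# Hypothetical helpers, for illustration only.
def parse_dmarc(record: str) -> dict:
    """Parse 'v=DMARC1; p=reject; ...' into {'v': 'DMARC1', 'p': 'reject', ...}."""
    tags = {}
    for part in record.split(";"):
        key, sep, value = part.partition("=")
        if sep:
            tags[key.strip().lower()] = value.strip()
    return tags


def dmarc_result(record, from_domain, spf_pass, spf_domain, dkim_pass, dkim_domain):
    """Return 'pass', the policy to apply ('none', 'quarantine' or 'reject'),
    or None if the record is not a usable DMARC record.

    Simplification: strict alignment only; pct, sp and rua are ignored.
    """
    tags = parse_dmarc(record)
    if tags.get("v") != "DMARC1":
        return None
    from_domain = from_domain.lower()
    aligned_spf = spf_pass and spf_domain.lower() == from_domain
    aligned_dkim = dkim_pass and dkim_domain.lower() == from_domain
    if aligned_spf or aligned_dkim:
        return "pass"
    return tags.get("p", "none").lower()
```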

DomainKeys Identified Mail (DKIM)

DKIM does for the “From” address what SPF does for the envelope sender: it prevents forgery. It involves cryptographically signing portions of the message, and then publishing in DNS the public key that verifies those signatures. If someone forges an address in a domain which implements DKIM, the message will lack a valid signature, and recipients can reject it.
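The first step of verification is locating the key. The DKIM-Signature header names the signing domain (d=) and a selector (s=), and the public key lives in a TXT record at selector._domainkey.domain. The sketch below does only that step. Actual verification, which hashes canonicalized headers and checks the signature, needs a cryptography library and is omitted. The function name is hypothetical:

```python
# Hypothetical helper, for illustration only.
def dkim_key_name(signature_header: str) -> str:
    """Given the value of a DKIM-Signature header, return the DNS name whose
    TXT record holds the signer's public key: <selector>._domainkey.<domain>."""
    tags = {}
    for part in signature_header.split(";"):
        key, sep, value = part.partition("=")
        if sep:
            # Tag values may be folded across lines; drop all whitespace.
            tags[key.strip()] = "".join(value.split())
    return f"{tags['s']}._domainkey.{tags['d']}"
```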

Since signature verification involves the body of the message, it usually takes place in a content filter like SpamAssassin.

People are still figuring this out, so it is not safe to reject a message due to an invalid DKIM signature at this time. However, an invalid signature can still be given a bad “spam score.”