'Knock-Knock' - A Modest Proposal For Client-Side SPAM Suppression
Tim Daneliuk (tundra@tundraware.com)
TundraWare Inc.
August 24, 2002
Copyright © 2002 TundraWare Inc. Permission to freely reproduce this material is hereby granted under the following conditions: 1) The material must be reproduced in its entirely without modification, editing, condensing, or any change. 2) No fee may be charged for the dissemination of this material. Commercial use such as publishing this material in a book or anthology is expressly forbidden. 3) Full attribution of the author and source of this material must be included in the reproduction.
OK, we all are sick of the garbage we see in our email boxes. Electronic
junk mail clutters our computers, wastes our time, and costs businesses untold
millions in the resources consumed passing this nonsense around. Unsolicited
emails are also probably the single biggest vector for passing around computer
viruses. Worst of all, they are the prime vehicle for propagating fraudulent
and otherwise illegal schemes.
So, what would a "perfect" SPAM suppression system look like? In my view,
it would exhibit these properties:
- Only pass email from desired senders - i.e., 100% suppression of SPAM.
- Never suppress or discard email from legitimate senders, even if they are new to us.
- Trivially simple to learn and administer, even for a technically unsophisticated user.
- Break nothing. Must be compatible with all existing email clients/servers and standards. At worst, SPAM suppression should fail to work with older technology while email still flows.
- Unbreakable by any evolving bulk mailer technology - i.e., make "harvesting" of email addresses ineffective.
- Easily implemented across all OS and email client platforms.
This sounds ambitious, but it may turn out to be pretty simple to do.
I hasten to point out that I'm "thinking out loud" here. This
idea may be bad, dangerous, or just dumb. If so, let me know, and I'll
slink off quietly into the weeds ...
Some technology has been brought to bear on this problem on the server
side (MAPS/RBL) as well as the client side (SPAM detection systems), but
they suffer from a number of limitations. The server-side solutions
are effective only to the extent that the point of SPAM origination
is known. Keeping the databases up-to-date with every new open relay
on the Internet is essentially impossible. Other solutions such as the
'sendmail' access database are administratively intensive. They're fine
for a small business or SOHO operation, but are impractical if the mail server
is responsible for a large user community getting SPAM from lots of different
places. Moreover, most SPAM has forged headers making it effectively
impossible to determine the true point-of-origination of the message.
The client-side solutions also work to some degree by doing textual analysis
and scoring a message to see if it is legitimate. However, this too
has a number of problems. First, all such approaches to date (at least
that I have seen) are heuristic in method - there is not a canonical method
of separating junk email from desired email with 100% correctness. Secondly,
these kinds of systems are typically more complex to set up than the
average non-technical user can probably handle. Thirdly, these systems
require a fair amount of CPU horsepower to run the textual analysis heuristic.
That's fine on a Pentium, but what about a mobile device like a PDA
or a cell phone? Here computation time translates directly into reduced
battery life, the bane of all traveling devices.
I make no claim that the approach outlined below is entirely new or novel.
Elements of this exist already, but I've not yet seen it all packaged
into a single mail client (if it has been, please let me know), and it is
this integration of features that makes this approach work, IMHO. I
call this the 'Knock-Knock' method.
The central idea here is that instead of trying to identify SPAM, we design
a system to recognize legitimate senders and discard all else as SPAM.
In effect, we want to push the "opt in" mechanism to the end-user and
take it away from the bulk mailer. This has to be a client-side technology
because who a "legitimate sender" is will vary considerably by individual
email recipient. To do this, I would suggest adding the following behaviors
to every mail client. They're not complicated and ought to serve pretty
much every one of the goals stated above:
- By default, disallow any message not originating from an address found in the Address Book. Most modern email clients already have a filtering mechanism. This is effectively just another kind of filter - "If Sender Not Found In Address Book, Discard Message" is roughly the semantic here.
- Now we have to solve the problem of receiving email from someone who is not yet in our Address Book. Easy. Define an X-Header to carry a 'Knock-Knock' access ID (KKID). When I hand you my business card, it has my email address and that ID on it. I could even put it in my email signature using some of the same subterfuge people already are using to mask their email addresses from 'bot address harvesters ("KKID is 2 2 3 4 backwards without spaces divided by 2 ..."). When you send me email, your 'Knock-Knock' enabled client has a place for you to enter this ID. If you add me to your address book, there would also be a place to permanently store that ID. When I receive the mail, my client first looks at the Address Book. If you're not there, it looks for the 'Knock-Knock' ID. Upon finding it, the message is displayed.
It is important that the ID be stripped off in any reply to or forwarding of that message. There is some remote possibility that a 'bot could be written to harvest your email address and ID, but doing so would require access to an upstream MTA that handles both the sender's email and yours. This is not impossible, but unlikely. If the major ISPs like AOL or Hotmail started doing this kind of harvesting, it would provide a nice commercial opportunity for someone like Earthlink to guarantee they did not do this.
There are also some advanced twists we might want to add to our client:
- Enable/disable passing email from a particular sender based on whether/not it is PGP (or whatever) encoded. In effect, this could force senders to encrypt their messages if they wanted to talk to us. This not only raises the general level of email security, it would help enable corporations to enforce security policy systemically (by configuring/buying a client that made this mandatory). This might be a bad idea, I dunno...
- For advanced users, give the client functionality similar to that of the 'sendmail' access database, but in this case, for the client. The user would enter email addresses or wildcards and whether to allow/disallow them. This is a bit dangerous because email headers are so easily spoofed. If I allow "mydomain.com", it wouldn't be real hard for a bulk emailer to start sending me emails that looked like they came from within my own domain. Then again, the existing 'sendmail' access scheme allows access control based on IP address as well, and this would work pretty well here also.
- Provide an option to respond to email that fails the delivery test - "Hi, your email could not be delivered because you're not in my Address Book and didn't send a legitimate 'Knock-Knock' ID." Obviously, under no circumstances should the KKID also be sent in such a reply.
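As a rough illustration of the two basic behaviors and the optional rejection notice, here is a minimal Python sketch. The header name `X-KKID`, the function names `is_accepted` and `rejection_notice`, and the exact notice wording are assumptions made for this example; the proposal only calls for "an X-Header" and does not fix a name.

```python
from email.message import EmailMessage
from email.utils import parseaddr

KKID_HEADER = "X-KKID"  # hypothetical header name, not defined by the proposal


def is_accepted(msg, address_book, my_kkid):
    """Accept if the sender is in the address book or supplied our KKID."""
    sender = parseaddr(msg.get("From", ""))[1].lower()
    if sender in address_book:
        return True
    supplied = (msg.get(KKID_HEADER) or "").strip()
    return bool(my_kkid) and supplied == my_kkid


def rejection_notice(msg, my_address):
    """Build the optional 'could not be delivered' reply.

    The notice is built from scratch, so the KKID is never included.
    """
    notice = EmailMessage()
    notice["From"] = my_address
    notice["To"] = parseaddr(msg.get("From", ""))[1]
    notice["Subject"] = "Undeliverable: " + msg.get("Subject", "")
    notice.set_content(
        "Your email could not be delivered because you are not in my "
        "Address Book and did not send a legitimate 'Knock-Knock' ID."
    )
    return notice
```

Note that the address book check trusts the From header, which is exactly the spoofing weakness discussed for the access database above.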
The first two alone ought to knock out pretty much all SPAM, which meets our original objectives. For really strict SPAM suppression, you could turn off the last three and require presence in an Address Book for validation. Moreover, it is being done in a way that is entirely compliant with existing RFCs. The X-Header would be ignored by non-'Knock-Knock' clients, which would deliver such mail unconditionally. The only problem here is that such clients could potentially forward such a message with the KKID intact. The more the KKID gets passed around electronically, the more exposed you become to having it harvested and used to get into your system. In that case, you could switch to a new KKID - only people who were not yet in your Address Book and had your old KKID would be affected. This is still way better, IMHO, than putting up with bags of SPAM.
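The stripping step that a 'Knock-Knock' enabled client would perform before forwarding could look something like the Python sketch below. Again, `X-KKID` and the helper name `forward_copy` are assumptions for illustration only. The sketch removes the header, but it cannot remove a KKID that someone typed into the message body.

```python
import copy
from email.message import EmailMessage

KKID_HEADER = "X-KKID"  # hypothetical header name, not defined by the proposal


def forward_copy(original, from_addr, to_addr):
    """Return a copy of `original` readdressed for forwarding, minus the KKID."""
    fwd = copy.deepcopy(original)
    subject = original.get("Subject", "")
    # Deleting a header that is absent is a no-op, so this is always safe.
    for name in (KKID_HEADER, "From", "To", "Subject"):
        del fwd[name]
    fwd["From"] = from_addr
    fwd["To"] = to_addr
    fwd["Subject"] = "Fwd: " + subject
    return fwd
```

The original message is left untouched, so the receiving client can still show which KKID the sender used.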
- Sender known via Address Book
- Sender recognized via 'Knock-Knock' ID
- Sender allowed via access database
- Sender recognized via encryption
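The four outcomes listed above could be reported by a single function that tries each rule in order. This is only a sketch under several assumptions: the `X-KKID` header name and the function names are invented here, the access database is reduced to a list of shell-style allow patterns, and "encryption" is approximated by checking for the PGP/MIME `multipart/encrypted` content type rather than by actually decrypting anything. The proposal describes the encryption rule per sender; a single on/off flag is used here for brevity.

```python
from email.message import EmailMessage
from email.utils import parseaddr
from fnmatch import fnmatchcase

KKID_HEADER = "X-KKID"  # hypothetical header name, not defined by the proposal


def looks_pgp_encrypted(msg):
    """Rough check: does the message declare itself as PGP/MIME encrypted?"""
    return (
        msg.get_content_type() == "multipart/encrypted"
        and msg.get_param("protocol") == "application/pgp-encrypted"
    )


def recognized_by(msg, address_book, my_kkid, allow_patterns, accept_encrypted):
    """Return which rule accepted the sender, or None to discard the message."""
    sender = parseaddr(msg.get("From", ""))[1].lower()
    if sender in address_book:
        return "address book"
    if my_kkid and (msg.get(KKID_HEADER) or "").strip() == my_kkid:
        return "knock-knock id"
    # Wildcard rules match the (easily spoofed) From address only.
    if any(fnmatchcase(sender, pattern.lower()) for pattern in allow_patterns):
        return "access database"
    if accept_encrypted and looks_pgp_encrypted(msg):
        return "encryption"
    return None
```

A client could show the returned label next to each message so the user can see why it was let through.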
- This does not reduce the load on email servers. Until something better comes along, servers are going to continue to need services like RBL/MAPS to have some control of their mail volume. One approach that might work would be to add a mechanism to automatically export each user's client 'Knock-Knock' and address data to the upstream MTA to create a system-wide database. This would be a major undertaking because it would require retooling the common MTAs like 'sendmail'. Also, the issue of data cleanliness would have to be considered - you don't want a cobbled client database blowing up the central email server. It is also problematic because many of us (I among them) would be unwilling to have our address books exported to our ISPs. But it would be a Pretty Good Thing in a corporate environment running its own MTAs. It would also be pretty straightforward to implement something like this if the organization were running a central LDAP (or whatever) server for addresses.
- The majority of email-borne viruses work by forging the mail envelope so it looks like the email is from a legitimate sender. Thus, even if 'Knock-Knock' is implemented, such email would be recognized as legitimate and be passed to the recipient. Fortunately, the current crop of anti-virus tools seems to have this scenario pretty well covered at least for the known virus types.