'Knock-Knock' - A Modest Proposal For Client-Side SPAM Suppression

Tim Daneliuk (tundra@tundraware.com)
TundraWare Inc.

August 24, 2002

Copyright © 2002 TundraWare Inc. Permission to freely reproduce this material is hereby granted under the following conditions: 1) The material must be reproduced in its entirety without modification, editing, condensing, or any change. 2) No fee may be charged for the dissemination of this material. Commercial use such as publishing this material in a book or anthology is expressly forbidden. 3) Full attribution of the author and source of this material must be included in the reproduction.

Defining The Problem

OK, we are all sick of the garbage we see in our email boxes. Electronic junk mail clutters our computers, wastes our time, and costs businesses untold millions in the resources consumed passing this nonsense around. Unsolicited emails are also probably the single biggest vector for passing around computer viruses. Worst of all, they are the prime vehicle for propagating fraudulent and otherwise illegal schemes.

So, what would a "perfect" SPAM suppression system look like? In my view, it would exhibit these properties:

Only pass email from desired senders - i.e., 100% suppression of SPAM.

Never suppress or discard email from legitimate senders, even if they are new to us.

Trivially simple to learn and administer, even for a technically unsophisticated user.

Break nothing. Must be compatible with all existing email clients/servers and standards - Worst case should be that SPAM suppression does not work with older technology, but email still flows.

Unbreakable by any evolving bulk mailer technology - i.e., Make "harvesting" of email addresses ineffective.

Easily implemented across all OS and email client platforms.

This sounds ambitious, but it may turn out to be pretty simple to do. I hasten to point out that I'm "thinking out loud" here. This idea may be bad, dangerous, or just dumb. If so, let me know, and I'll slink off quietly into the weeds ...

Why Current Anti-SPAM 'Solutions' Aren't

Some technology has been brought to bear on this problem on the server side (MAPS/RBL) as well as the client side (SPAM detection systems), but they suffer from a number of limitations. The server side solutions are effective only to the extent that a well-known point of SPAM origination is known. Keeping the databases up-to-date with every new open relay on the Internet is essentially impossible. Other solutions such as the 'sendmail' access database are administratively intensive. They're fine for a small business or SOHO operation, but are impractical if the mail server is responsible for a large user community getting SPAM from lots of different places. Moreover, most SPAM has forged headers making it effectively impossible to determine the true point-of-origination of the message.

The client-side solutions also work to some degree by doing textual analysis and scoring a message to see if it is legitimate. However, this too has a number of problems. First, all such approaches to-date (at least that I have seen) are heuristic in method - there is not a canonical method of separating junk email from desired email with 100% correctness. Secondly, these kinds of systems are typically more complex to set up than the average non-technical user can probably handle. Thirdly, these systems require a fair amount of CPU horsepower to run the textual analysis heuristic. That's fine on a Pentium, but what about a mobile device like a PDA or a cell phone? Here computation time translates directly into reduced battery life, the bane of all traveling devices.

A Modest Proposal - 'Knock-Knock'

I make no claim that the approach outlined below is entirely new or novel. Elements of this exist already, but I've not yet seen it all packaged into a single mail client (if it has been, please let me know), and it is this integration of features that makes this approach work, IMHO. I call this the 'Knock-Knock' method.

The central idea here is that instead of trying to identify SPAM, we design a system to recognize legitimate senders and discard all else as SPAM. In effect, we want to push the "opt in" mechanism to the end-user and take it away from the bulk mailer. This has to be a client-side technology because who a "legitimate sender" is will vary considerably by individual email recipient. To do this, I would suggest adding the following behaviors to every mail client. They're not complicated and ought to serve pretty much every one of the goals stated above:

By default, disallow any message not originating from an address found in the Address Book. Most modern email clients already have a filtering mechanism. This is effectively just another kind of filter - "If Sender Not Found In Address Book, Discard Message", is roughly the semantic here.

Now we have to solve the problem of receiving email from someone who is not yet in our Address Book. Easy. Define an X-Header to carry a 'Knock-Knock' access ID (KKID). When I hand you my business card, it has my email address and that ID on it. I could even put it in my email signature using some of the same subterfuge people are already using to mask their email addresses from 'bot address harvesters ("KKID is 2 2 3 4 backwards without spaces divided by 2 ..."). When you send me email, your 'Knock-Knock' enabled client has a place for you to enter this ID. If you add me to your address book, there would also be a place to permanently store that ID. When I receive the mail, my client first looks at the Address Book. If you're not there, it looks for the 'Knock-Knock' ID. Upon finding it, the message is displayed.

It is important that the ID be stripped off in any reply or forwarding of that message. There is some remote possibility that a 'bot could be written to harvest your email address and ID, but doing so would require access to an upstream MTA that handles both the sender's email and yours. This is not impossible, but unlikely. If the major ISPs like AOL or Hotmail started doing this kind of harvesting, it would provide a nice commercial opportunity for someone like Earthlink to guarantee they did not do this.
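
To make the two core behaviors concrete, here is a rough sketch in Python of how a client might apply them. It is only an illustration under stated assumptions, not a finished design: the header name "X-Knock-Knock-ID" is made up (the proposal only calls for "an X-Header"), the function names are hypothetical, and the Address Book is assumed to be a simple set of lowercase addresses.

```python
from email.message import EmailMessage
from email.utils import parseaddr

# Hypothetical header name; the proposal only says "an X-Header".
KKID_HEADER = "X-Knock-Knock-ID"


def is_accepted(msg, address_book, my_kkid):
    """Return True if the sender is in the Address Book or supplied our KKID.

    address_book is assumed to be a set of lowercase email addresses.
    """
    _, sender = parseaddr(msg.get("From", ""))
    if sender.lower() in address_book:
        return True
    supplied = msg.get(KKID_HEADER, "").strip()
    # An empty or missing KKID never matches, even if ours is unset.
    return bool(supplied) and supplied == my_kkid


def strip_kkid(msg):
    """Remove the KKID header before a message is replied to or forwarded."""
    del msg[KKID_HEADER]  # no error is raised if the header is absent
    return msg


if __name__ == "__main__":
    book = {"alice@example.com"}

    stranger = EmailMessage()
    stranger["From"] = "Bob <bob@example.org>"
    stranger[KKID_HEADER] = "2161"
    print(is_accepted(stranger, book, "2161"))  # stranger who knows the KKID

    strip_kkid(stranger)
    print(KKID_HEADER in stranger)  # header is gone before forwarding
```

Note that the "From" header checked here is exactly the thing a bulk mailer can forge, which is why the KKID matters for strangers and why it must never leak out in a reply or forward.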

There are also some advanced twists we might want to add to our client:

Enable/disable passing email from a particular sender based on whether or not it is PGP (or whatever) encrypted. In effect, this could force senders to encrypt their messages if they wanted to talk to us. This not only raises the general level of email security, it would help enable corporations to enforce security policy systemically (by configuring/buying a client that made this mandatory). This might be a bad idea, I dunno...

For advanced users, give the client functionality similar to that of the 'sendmail' access database, but in this case, for the client. The user would enter email addresses or wildcards and whether to allow/disallow them. This is a bit dangerous because email headers are so easily spoofed. If I allow "mydomain.com", it wouldn't be real hard for a bulk emailer to start sending me emails that looked like they came from within my own domain. Then again, the existing 'sendmail' access scheme allows access control based on IP address as well, and this would work pretty well here also.

Provide an option to respond to email that fails the delivery test - "Hi, your email could not be delivered because you're not in my Address Book and didn't send a legitimate 'Knock-Knock' ID." Obviously, under no circumstances should the KKID also be sent in such a reply.
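
Two of these twists, the client-side access list and the rejection notice, might look something like the following Python sketch. Everything here is an assumption made for illustration: the rule format (pattern plus allow/disallow, first match wins) is only loosely inspired by the idea of an access database and is not the 'sendmail' format, and the header name and function names are hypothetical.

```python
from email.message import EmailMessage
from fnmatch import fnmatchcase

# Hypothetical header name, used here only to show it is never sent back.
KKID_HEADER = "X-Knock-Knock-ID"

# Hypothetical rule table: (pattern, allow?) pairs, first match wins.
RULES = [
    ("boss@mydomain.com", True),
    ("*@bulkmail.example", False),
    ("*@mydomain.com", True),
]


def access_decision(sender, rules):
    """Return True or False for the first matching rule, None if none match."""
    sender = sender.lower()
    for pattern, allow in rules:
        if fnmatchcase(sender, pattern.lower()):
            return allow
    return None  # no opinion; fall back to the other checks


def rejection_notice(my_address, sender):
    """Build the optional 'could not be delivered' reply, without any KKID."""
    notice = EmailMessage()
    notice["From"] = my_address
    notice["To"] = sender
    notice["Subject"] = "Your email could not be delivered"
    notice.set_content(
        "Hi, your email could not be delivered because you're not in my "
        "Address Book and didn't send a legitimate 'Knock-Knock' ID."
    )
    # The message is built from scratch, so it carries no KKID header.
    return notice
```

Because the sender address is so easily spoofed, a match in this table is probably best treated as weaker evidence than a valid KKID.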

In summary, then, we enable email passing into an Inbox based on multiple selection criteria:

Sender known via Address Book

Sender recognized via 'Knock-Knock' ID

Sender allowed via access database

Sender recognized via encryption

The first two alone ought to knock out pretty much all SPAM, which meets our original objectives. For really strict SPAM suppression, you could turn off the last three and require presence in an Address Book for validation. Moreover, it is being done in a way that is entirely compliant with existing RFCs. The X-Header would be ignored by non-'Knock-Knock' clients, which would deliver such mail unconditionally. The only problem here is that such clients could potentially forward such a message with the KKID intact. The more this gets passed around electronically, the more exposed you become to having it harvested and used to get into your system. In that case, you could switch to a new KKID - only people who were not yet in your Address Book and had your old KKID would be affected. This is still way better, IMHO, than putting up with bags of SPAM.
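
The four criteria and the ability to switch each one on or off can be summarized in a few lines of Python. This is just a sketch of the decision logic as I read it from the summary above; the class and field names are invented for the example, and the default settings (first two criteria on, last two off) are an assumption rather than part of the proposal.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Which of the four criteria the user has switched on (hypothetical)."""
    use_address_book: bool = True
    use_kkid: bool = True
    use_access_db: bool = False
    use_encryption: bool = False


@dataclass
class Facts:
    """What the client has already determined about one incoming message."""
    in_address_book: bool
    kkid_matches: bool
    access_db_allows: bool
    is_encrypted: bool


def deliver(policy, facts):
    """Pass the message if any enabled criterion recognizes the sender."""
    return any([
        policy.use_address_book and facts.in_address_book,
        policy.use_kkid and facts.kkid_matches,
        policy.use_access_db and facts.access_db_allows,
        policy.use_encryption and facts.is_encrypted,
    ])


# "Really strict" mode: only presence in the Address Book counts.
STRICT = Policy(use_kkid=False, use_access_db=False, use_encryption=False)
```

With the strict policy, a stranger who knows the KKID is still turned away, which is the trade-off described above.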

What This Will NOT Solve

Assuming this all works as it should, there are still some problems this approach does not solve:

This does not reduce the load on email servers. Until something better comes along, servers are going to continue to need services like RBL/MAPS to have some control of their mail volume. One approach that might work would be to add a mechanism to automatically export each user's client 'Knock-Knock' and address data to the upstream MTA to create a system-wide database. This would be a major undertaking because it would require retooling the common MTAs like 'sendmail'. Also, the issue of data cleanliness would have to be considered - you don't want a cobbled client database blowing up the central email server. It is also problematic because many of us (I among them) would be unwilling to have our address books exported to our ISPs. But, it would be a Pretty Good Thing in a corporate environment running its own MTAs. It would also be pretty straightforward to implement something like this if the organization was running a central LDAP (or whatever) server for addresses.

The majority of email-borne viruses work by forging the mail envelope so it looks like the email is from a legitimate sender. Thus, even if 'Knock-Knock' is implemented, such email would be recognized as legitimate and be passed to the recipient. Fortunately, the current crop of anti-virus tools seems to have this scenario pretty well covered at least for the known virus types.

What About 'Free Speech'?

Finally, we should dispose of the (really stupid) argument that suppressing unwanted email is a form of limiting Free Speech - a value dear to all free societies. This argument has been used by bulk emailers in legal action against some of the existing anti-SPAM mechanisms, and this argument is complete nonsense. In a free society, you are absolutely entitled to speak your mind as you see fit with few limitations. "Speech", in this case, has been defined by the courts to pretty much embrace any form of personal expression including written works, music, photographs, movies, and yes, email. There are a few limitations on such expression - your Free Speech does not include the right to engage in fraud, violence, threats, and so forth, but that's pretty much it. However, and this is key, an individual's right to free expression does not include the right to be heard! I have no moral or legal obligation to listen just because you are talking. Arguing that suppressing SPAM inhibits Free Speech is effectively arguing that the mass emailers have an even more specific right to make you listen to their foolishness. Hogwash!

One other reason I am drawn to this approach is that it would remove the instinct for the Government to stick its Big Nose into the issue. The Congress Critters are making noises about finding a regulatory "solution" to SPAM. This means it will be ineffective, complicated, expensive, and useless. Better we should find our own solution without "help" from Washington.

Conclusions

Like I said, I'm just Thinking Out Loud here. I'd be very interested in comments, input, and improvements...