

CyberSleuthing!

Expert sourcing strategy from http://aces.arbita.net/shally/

Eradicate spurious spam!

There are four main methods of killing spam. They can be classed as heuristic, Bayesian, dictionary, and fingerprinting.

Sorry Maureen! Sometimes when it's really early in the morning my fingers fail to find their correct resting places, you know, asdf and jkl; on the keyboard.

What I'm trying to say is that it's a typo ;)

Thank you for finding it. I misspelled it entirely unintentionally. The correct spelling is heuristic, and the meaning I intended in that context can be summarized as software that uses exploratory problem-solving methods and self-educating techniques to improve performance. Or, specifically in this case, to catch evil spamwankers.

Heuristic scans create a profile of an email message from its headers and other core attributes to rate its likelihood of being spam. Bayesian filters use a statistical approach: the filtering system is trained on examples of both spam and legitimate email, so it learns which words make a message more or less likely to be spam. Dictionary scans filter against particular words and phrases in the headers or body of an email. You know those words... the ones in the sometimes funny subject lines which can cause embarrassment at work. Finally, email fingerprinting creates a hash that represents each known spam message, which makes it a reactive rather than a predictive technique.
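
Here is a rough sketch of three of those methods in miniature: a dictionary scan, a fingerprint, and a toy Bayesian filter. Everything in it is my own assumption for illustration (the phrase list, the function names, the `TinyBayes` class, the choice of SHA-256); real products use far larger word lists and training sets.

```python
import hashlib
import math
from collections import Counter

# Hypothetical phrase list for the dictionary scan.
BLOCKED_PHRASES = {"free money", "act now"}


def dictionary_hit(text):
    """Dictionary scan: True if any blocked phrase appears in the text."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKED_PHRASES)


def fingerprint(text):
    """Fingerprinting: hash of the lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class TinyBayes:
    """Toy naive Bayes filter over words, with add-one smoothing."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the words of one example; label is "spam" or "ham"."""
        self.words[label].update(text.lower().split())
        self.docs[label] += 1

    def spam_probability(self, text):
        """Estimate P(spam | words). Needs at least one example of each label."""
        vocab = len(set(self.words["spam"]) | set(self.words["ham"]))
        spam_total = sum(self.words["spam"].values())
        ham_total = sum(self.words["ham"].values())
        # Start from the prior odds, then add evidence word by word.
        log_odds = math.log(self.docs["spam"]) - math.log(self.docs["ham"])
        for word in text.lower().split():
            p_spam = (self.words["spam"][word] + 1) / (spam_total + vocab)
            p_ham = (self.words["ham"][word] + 1) / (ham_total + vocab)
            log_odds += math.log(p_spam / p_ham)
        # Convert log-odds to a probability without overflowing exp().
        if log_odds >= 0:
            return 1 / (1 + math.exp(-log_odds))
        e = math.exp(log_odds)
        return e / (1 + e)
```

To use the fingerprint you would keep a set of hashes of messages already reported as spam and drop anything whose hash is in the set, which is exactly why it is reactive: a message has to be seen once before it can be blocked.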

Instead of using a fixed set of virus definition files and known-spammer blocklists, which must be kept updated, advanced heuristic spam (and virus) detection may explore the message in a "let's try it and see" sandbox. By letting the program do what it's going to do inside a "quarantine" environment, it can detect whether the program will behave badly if let loose on the rest of your system, without needing the most current definition of the virus or spam message. With spam, it explores other things besides executables, like third-party links, web bugs, malformed headers, spoofed addresses, spoofed URLs, etc.
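
The sandbox part is beyond a blog snippet, but the static checks on links, web bugs, and headers can be sketched. This is a minimal example under my own assumptions: the rules, the one-point-per-rule scoring, and the names `spoofed_links` and `heuristic_score` are all made up for illustration, and it only handles a single-part message with double-quoted HTML attributes.

```python
import re
from email import message_from_string
from email.utils import parseaddr
from urllib.parse import urlparse

# Matches <a href="...">visible text</a> (double-quoted href only).
LINK_RE = re.compile(r'<a\s[^>]*href="([^"]+)"[^>]*>(.*?)</a>', re.I | re.S)
# Matches a 1x1 image, the classic "web bug" tracking pixel.
WEB_BUG_RE = re.compile(
    r'<img[^>]*(width="1"[^>]*height="1"|height="1"[^>]*width="1")', re.I
)


def spoofed_links(html):
    """Return (shown, actual) pairs where the visible URL's host differs from the href's."""
    mismatches = []
    for href, text in LINK_RE.findall(html):
        shown = text.strip()
        if not shown.lower().startswith(("http://", "https://")):
            continue  # visible text is not a URL, nothing to compare
        if urlparse(shown).hostname != urlparse(href).hostname:
            mismatches.append((shown, href))
    return mismatches


def domain_of(header_value):
    """Extract the lowercased domain from a From/Reply-To style header."""
    address = parseaddr(header_value)[1]
    return address.rpartition("@")[2].lower()


def heuristic_score(raw_message):
    """Add one point per suspicious trait; a higher score means more spam-like."""
    msg = message_from_string(raw_message)
    score = 0
    if msg["Date"] is None or msg["Message-ID"] is None:
        score += 1  # missing standard headers
    reply_domain = domain_of(msg.get("Reply-To", ""))
    if reply_domain and reply_domain != domain_of(msg.get("From", "")):
        score += 1  # replies go to a different domain than the sender's
    body = "" if msg.is_multipart() else msg.get_payload()
    if spoofed_links(body):
        score += 1  # link text shows one site, href goes to another
    if WEB_BUG_RE.search(body):
        score += 1  # invisible tracking image
    return score
```

Where to set the cutoff is a judgment call. Set it too low and you start eating legitimate mail, which is the usual complaint about heuristic filters.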

The idea behind most of the thinking around spam killing is that you should not need a list of spammers before you can detect who they are. Some mail providers are getting wise to these types of programs and building them right into the mail server. The host I use for Jobmachine.net employs SmarterMail, which uses blacklists like SpamCop and ORDB, reverse DNS checks, and Bayesian filtering to dynamically block incoming spam. I then pass everything through SpamArrest, and finally overlay my heuristics on that for messages that actually make it to my inbox, in case neither SmarterMail nor SpamArrest caught them. In addition to the heuristics, I have an IP blockfile loaded in my personal proxy that blocks hundreds of thousands of known spamwanking, anti-privacy and eavesdropping IPs, and an inbox management system that runs whitelists along with rules to process incoming mail. All in all, my system handles 2,000 messages per day, of which I read about 100 at most.
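
The layering itself is easy to picture in code. This is only a sketch of the general idea, not how SmarterMail or SpamArrest work internally: the `Mail` class, the stage functions, their order, and the sample addresses are all assumptions of mine. Each stage either gives a verdict or passes the message on to the next one.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Mail:
    sender: str
    sender_ip: str
    subject: str
    body: str


# A stage returns "accept", "reject", or None when it has no opinion.
Stage = Callable[[Mail], Optional[str]]

# Hypothetical lists; a real setup would load these from files.
BLOCKED_IPS = {"203.0.113.7"}
WHITELIST = {"friend@example.org"}
BAD_PHRASES = ("free money",)


def ip_block_stage(mail: Mail) -> Optional[str]:
    """Reject anything arriving from a known-bad IP."""
    return "reject" if mail.sender_ip in BLOCKED_IPS else None


def whitelist_stage(mail: Mail) -> Optional[str]:
    """Accept known senders without further checks."""
    return "accept" if mail.sender.lower() in WHITELIST else None


def keyword_stage(mail: Mail) -> Optional[str]:
    """Last line of defense: a crude phrase check on subject and body."""
    text = (mail.subject + " " + mail.body).lower()
    return "reject" if any(p in text for p in BAD_PHRASES) else None


STAGES: List[Stage] = [ip_block_stage, whitelist_stage, keyword_stage]


def run_pipeline(mail: Mail, stages: List[Stage]) -> Tuple[str, str]:
    """Return (verdict, name of the deciding stage); accept if no stage objects."""
    for stage in stages:
        verdict = stage(mail)
        if verdict is not None:
            return verdict, stage.__name__
    return "accept", "default"
```

Putting the cheap, certain checks first means the slower or fuzzier ones only ever see what is left over.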
 
Hope this helps you all kill some spam! Die spam, die!

3 comments


  • 1 point 4 years ago

    This is what I've noticed in the last 60 days or so: Some of my email isn't getting through to people - I have a tendency to put the gist of what I'm sending in the subject lines - do you think that's why I'm encountering problems?

    And if so, isn't this an example of the spamblockers getting TOO stringent?

  • 1 point 4 years ago

    Wow, Steve, thanks for the translation. You pretty much nailed the concept of heuristics. Hey, if we replace the Bayesian Expected Value Calculation by the Risk Threshold of the Plausibility Theorem, then what is the statistical probability I would get you to translate some of my other posts?

  • 1 point 4 years ago

    I'm offering this free translation service for readers of Shally's blog...

    Having spent a few of my non-recruiting years in AI (that would be artificial intelligence; yes, there is also AS - artificial stupidity), I learned early on that heuristics are shortcuts to decision making: rules - statistical really - that we learn and can apply to any situation to help us move mentally from point A to point B. For instance, as a non-Internet names sourcer, you have learned that certain phrases can be used to open up the flood gates to org charts. When a junior names sourcer goes through the process, they may move procedurally from A to B to C to D, whereas an experienced sourceress can go from A to B to D.

    In the spam world, heuristics can help decide the spamminess of an email and deal with it accordingly.

    I'm sure Probability 102 will be posted shortly...