Fighting Spam
With the recent news being splashed around about spam appearing in ‘open’ systems like Yahoo and Google, some have taken a very harsh stance on user-generated content (UGC).
At iBegin we solicited ‘reviewed’ submissions (like Yahoo) for over two years, and then switched to instantaneous updates (à la Google) over a year ago (over at iBegin Source), and over that time we’ve become quite aware of a dozen different kinds of spam. While I don’t want to throw open the entire book, here are some interesting things that should be in the conversation:
Track it All (User Behavior)
This goes without saying, but all updates and submissions are tracked. It doesn’t take a genius to figure out when a single IP comes out of nowhere, submits a business, edits three businesses in a related category, and then doesn’t come back. It does require some processing power to do this, but the problem is smaller than you think (you focus on recent modifications, not all listings).
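As a rough sketch of that ‘single IP out of nowhere’ check (this is not our production code; the record layout, the function name `flag_drive_by_ips` and every threshold are assumptions made up for illustration):

```python
from collections import defaultdict
from datetime import datetime, timedelta

def flag_drive_by_ips(edits, now, recent=timedelta(days=7),
                      burst=timedelta(hours=1), min_edits=3):
    """Flag IPs whose whole history is one short burst in a single category.

    edits: list of (ip, timestamp, listing_id, category) tuples (hypothetical layout).
    """
    by_ip = defaultdict(list)
    for ip, ts, listing_id, category in edits:
        by_ip[ip].append((ts, category))

    flagged = []
    for ip, rows in by_ip.items():
        times = [ts for ts, _ in rows]
        first_seen, last_seen = min(times), max(times)
        # Only consider IPs that first showed up recently ("out of nowhere").
        if now - first_seen > recent:
            continue
        one_category = len({cat for _, cat in rows}) == 1
        short_burst = last_seen - first_seen <= burst
        if len(rows) >= min_edits and one_category and short_burst:
            flagged.append(ip)
    return sorted(flagged)
```

Because only IPs first seen inside the `recent` window are examined, the work scales with recent modifications rather than with the whole listing database.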
Still, this covers most of it. I was shocked when I read that Google Maps only allowed 5 edits. Literally - I was shocked! The world’s most powerful search/data processing company cannot deal with more than 5 edits at a time? User behavior includes building a user trustrank - how reliable have they been in the past? One good user may do 100 legitimate updates, whereas 5 bad users may do a total of 10 bad updates. Is it worth pissing off the good user even though he produces 10x as much? It is even more of an issue when you consider that all edits can be reverted with a click of a button.
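One simple way to picture a user trustrank is the share of a user’s past edits that were kept rather than reverted. The sketch below is only an illustration under that assumption; the smoothing, the 0.9 threshold and the function names are hypothetical, not a description of what we or Google actually run:

```python
def trust_score(accepted, reverted):
    """Smoothed share of a user's past edits that were kept.

    The +1 / +2 terms (Laplace smoothing) start a brand-new user at 0.5
    instead of at an undefined or perfect score.
    """
    return (accepted + 1) / (accepted + reverted + 2)

def needs_review(accepted, reverted, threshold=0.9):
    """Queue a user's edits for manual review until trust is earned."""
    return trust_score(accepted, reverted) < threshold
```

With these numbers, a user with 100 kept edits and no reverts goes live without review, while a new user or one with a couple of reverts is queued. No hard cap on the number of edits is needed.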
Still, the truth is that …
Most Spammers are Quite Stupid
And by spammers I mean anyone trying to ruin the intent of the local site (modifying competitors, submitting faux reviews, etc). When we still actively ran our city sites like iBegin Toronto, the amount of ‘obvious’ spam was stunning. Of course my favorite involved user ‘companyx’ who would submit ‘Company X’, review it saying how it was the best, all while using ‘xxx@companyx.com’. This happens a lot more than you think!
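The ‘companyx’ case is easy to catch mechanically. A minimal sketch, assuming you have the reviewer’s username and email next to the business name (the function names here are made up for the example):

```python
import re

def _normalize(text):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

def looks_like_self_review(username, email, business_name):
    """True if the username or the email domain matches the business name."""
    name = _normalize(business_name)
    if not name:
        return False
    domain = email.rsplit("@", 1)[-1]
    # Compare every domain label except the last one (the TLD, e.g. "com").
    labels = {_normalize(label) for label in domain.split(".")[:-1]}
    return _normalize(username) == name or name in labels
```

For the example above, `looks_like_self_review("companyx", "xxx@companyx.com", "Company X")` returns `True`. It only catches exact matches after normalization, so it is a flag for a human to look at, not a verdict.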
Users want to do Good
Looking back at our numbers, far less than 1% of updates/submissions are incorrect. Fewer than half of those that are incorrect are made with ‘evil’ intent. Most are accidents - typos, not realizing a business had moved, etc. Empowering the user is a damn good thing.
You Need to Verify
Yahoo has a ‘review submission’, yet somehow let in over a thousand affiliate links. Google doesn’t seem to be verifying (or is overwhelmed) - I submitted a slight typo update to the local Marriott (eg changed the URL to mariot.com, which redirects fine), and it hasn’t been caught in 3+ business days. It is unfortunate, it is a pain in the ass … but you have to verify. Most verifications can easily be done via eye-balling - eg adding a website, you just click on it and can quickly see via WHOIS/info on the page if it matches.
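A URL edit like the Marriott one can at least be pushed to the front of the review queue automatically. The sketch below assumes you keep the old and new URL for each edit and flags changes where the new hostname is a near-miss of the old one; the 0.8 similarity threshold and the function names are guesses for illustration, using only Python’s standard `urllib.parse` and `difflib`:

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse

def _host(url):
    """Hostname of a URL, lowercased, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def is_suspicious_url_edit(old_url, new_url, threshold=0.8):
    """True if the hostname changed, but only slightly (possible typo-squat)."""
    old_host, new_host = _host(old_url), _host(new_url)
    if not old_host or not new_host or old_host == new_host:
        return False
    similarity = SequenceMatcher(None, old_host, new_host).ratio()
    return similarity >= threshold
```

A completely different domain is not flagged by this check; that case is better handled by the eyeball/WHOIS verification described above.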
Addresses Match
Part of getting to that verification process is to flag entries that are questionable. So if someone submits an address that matches 3+ businesses or is the address of an established mailbox source, you flag it. There are many other combinations of data analysis that can help aid you in flagging questionable records.
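As a sketch of that flagging rule, assuming listings arrive as (business name, address) pairs and you maintain your own list of known mailbox addresses (the normalization here is deliberately crude and all names are hypothetical):

```python
from collections import Counter

def normalize_address(address):
    """Lowercase, drop commas/periods, collapse whitespace."""
    cleaned = address.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())

def flag_addresses(listings, mailbox_addresses, max_shared=3):
    """Return {business_name: [reasons]} for listings worth a manual look.

    listings: list of (business_name, address) tuples.
    """
    mailboxes = {normalize_address(a) for a in mailbox_addresses}
    counts = Counter(normalize_address(addr) for _, addr in listings)
    flagged = {}
    for name, addr in listings:
        key = normalize_address(addr)
        reasons = []
        if counts[key] >= max_shared:   # address shared by 3+ businesses
            reasons.append("shared_address")
        if key in mailboxes:            # address of a known mailbox provider
            reasons.append("mailbox")
        if reasons:
            flagged[name] = reasons
    return flagged
```

Real addresses need proper parsing (suite numbers, abbreviations, geocoding), so treat the string matching here as a stand-in for whatever address normalization you already have.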
Businesses where their legal name is stuffed with keywords
We ran into this about a year ago. Some company had legally changed its name to have over 10 keywords stuffed in their name. Eg if I was a fishing company, it was something like ‘Bass Fishing Bait Reeling Fly Ice’ etc etc. They had successfully submitted to companies like YellowPages.com and SuperPages.com, and were asking why we kept shortening their name.
The conundrum is obvious - the intent behind their legal name is clear, but is it any good for our users (and customers)? There is a lot of talk about optimizing for search engines (including this hilarious riff on ‘If Google had to optimize for Google’), but no one has yet mentioned the use of keyword stuffing in the naming of a legitimate company.
While the above deals with 99% of the issues, there are still borderline issues to deal with. Eg Pay Day - what is to stop me from registering ‘Pay Day Loans of Toronto’ as an incorporated company, and then selling lead gens just through my website? Even though I am nothing more than a glorified affiliate site, I can claim I am just like the hundreds of mortgage brokers found throughout North America. I am sure they get listed - why wouldn’t I?
Going on my above ‘Addresses Match’, what about businesses that are sole proprietorships and work from home? I wouldn’t want to give out my home address, which means I need to use a mailbox. Does that preclude me from local listings? Most local search has a distance factor built into it - so where do I get established? My mailbox? City center? Nowhere?! Not an easy answer.
Hopefully my post has helped you think about the context of spam in local search, and I would love to hear ideas on both fighting spam and borderline issues (I love gray-area issues - the source of many interesting conversations).
March 28th, 2008 at 8:57 pm
all of it seems to require manual work, so manual verification and moderation is a must
March 29th, 2008 at 5:07 am
True Andre, but with the help of algorithms (like the point with “the single IP out of nowhere” mentioned in paragraph 3) you can minimize the amount of data that needs to be verified manually.