Tag Archives: Search Engines

Trackback Spam Gateway

It’s over. My referrer experiment is over… at least, in its current form. Today, I roll out firsttube.com referrer gateway version 1.0. That makes it sound fancy, but it’s not. Basically, it’s PHP to prevent trackback spam.

Traffic at firsttube.com has grown steadily, for some reason, and the logs reveal it: we get a TON of traffic from search engines, and the most popular terms are surprising – sensitive readers beware – here are the terms that most frequently drive people here:

cumtube, red-tube, uporn, adult youtube, milf, gay tube, tube 8 and many more equally odd terms.

You know why? Because, in a shrewd move that search engines seem to love, I display links back to my referrers, treating them as trackbacks. But when a referrer isn’t Google, Yahoo, Live.com, or OSNews, it’s most often spam. Why? Because not only are we using the name “tube” in our title, but with each erroneous entry, we tell the search engine it’s a good thing by back-linking to that search. In short, I’m perpetuating the problem. As a result, dozens of spammers have begun issuing basic GET requests by the hundreds to place their sites in my referrer lists.

Some time ago, I began the battle by adding rel="nofollow" to all outgoing links not added via the admin section. But alas, that wasn’t good enough: the spammers didn’t care. So I implemented a pre-check, whereby referrers are matched, via regular expressions, against a list of known crap. As of today, there are 36 terms that I actively filter. In time, this will become performance intensive, if it isn’t already.
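For the curious, here is roughly what that kind of pre-check looks like. This is a minimal sketch in Python rather than the PHP the site actually runs, and the three terms are stand-ins pulled from these posts, not the real list of 36. The names `BLOCKED_TERMS` and `is_blocked` are made up for illustration. Folding the terms into one compiled pattern may help with the performance worry, since each referrer is scanned once instead of once per term.

```python
import re

# Hypothetical blocklist: stand-in terms, not the site's actual 36 entries.
BLOCKED_TERMS = ["mp3", "lyrics", "musicforum"]

# Build a single case-insensitive alternation, escaping each term so it is
# matched literally, and compile it once at import time.
BLOCKED_RE = re.compile(
    "|".join(re.escape(term) for term in BLOCKED_TERMS),
    re.IGNORECASE,
)


def is_blocked(referrer: str) -> bool:
    """Return True if the referrer URL contains any blocked term."""
    return BLOCKED_RE.search(referrer) is not None
```

A referrer such as `http://mp3-downloads.example/` would be rejected, while `http://www.osnews.com/story/1` would pass. Plain substring matching like this is blunt, so a legitimate page that happens to mention a blocked term gets dropped too.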

Thus, a gateway. Now, *all* referring traffic goes into a temp table, and each entry must be approved. I wrote a nice tool to batch import, batch delete, or even approve based on certain filters, such as domain or term. As it matures and I get an idea of the time involved, I will “whitelist” certain domains that can immediately post to the referrer table. In the meantime, I need to decide if I want to filter referrers with obscene, unrelated terms or just leave them and let the magic run its course; after all, these are not “spam,” they are simply organic mistakes. An argument could be made that they’re interesting, and that seeing which terms and sites around the internet drive traffic to a site is most of the reason to post referrers in the first place.
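The gateway idea can be sketched like so. Again, this is Python with an in-memory SQLite database, not the real PHP, and every name here (`pending_referrers`, `referrers`, `record`, `approve_domain`, the `WHITELIST` contents) is an assumption for the sake of the example. The whitelist part is the planned future behavior, not something running today.

```python
import sqlite3
from urllib.parse import urlsplit

# Hypothetical whitelist, seeded with the sources named above as legitimate.
WHITELIST = {"google.com", "yahoo.com", "live.com", "osnews.com"}


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a holding table and a live table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE pending_referrers (page TEXT, referrer TEXT)")
    db.execute("CREATE TABLE referrers (page TEXT, referrer TEXT)")
    return db


def host_matches(referrer: str, domain: str) -> bool:
    """True if the referrer's host is the domain or one of its subdomains."""
    host = (urlsplit(referrer).hostname or "").lower()
    return host == domain or host.endswith("." + domain)


def record(db: sqlite3.Connection, page: str, referrer: str) -> str:
    """Store a referrer; only whitelisted hosts skip the approval queue."""
    trusted = any(host_matches(referrer, d) for d in WHITELIST)
    table = "referrers" if trusted else "pending_referrers"
    db.execute(f"INSERT INTO {table} (page, referrer) VALUES (?, ?)",
               (page, referrer))
    return table


def approve_domain(db: sqlite3.Connection, domain: str) -> int:
    """Move every pending row from the given domain into the live table."""
    rows = db.execute(
        "SELECT rowid, page, referrer FROM pending_referrers").fetchall()
    moved = 0
    for rowid, page, referrer in rows:
        if host_matches(referrer, domain):
            db.execute("INSERT INTO referrers (page, referrer) VALUES (?, ?)",
                       (page, referrer))
            db.execute("DELETE FROM pending_referrers WHERE rowid = ?",
                       (rowid,))
            moved += 1
    return moved
```

The point of the design is that nothing unknown reaches the public referrer list by default: a spammer's GET request lands in the holding table and sits there until someone approves or deletes it.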

Anyway, spammers, take note: I gotcher number! Stop referrer spamming me! That means you, you stupid lyrics sites!


Trackback Spam, Again

Once again, I am dealing with trackback spam, aka referrer spam. Since firsttube.com records the pages that refer hits to us, I’ve had to deal with jerks who issue HTTP requests so that they get a link back. Too bad they don’t realize that every referrer gets a rel="nofollow" attribute (more here).

So, I had to issue these SQL statements to the database today:

DELETE FROM user_agent_table
WHERE (referrer LIKE 'http://mp3%' OR referrer LIKE '%mp3.com%');

DELETE FROM user_agent_table
WHERE referrer LIKE '%musicforum.org%';

Musicforum.org has some asshole posting all sorts of links that pass a GET variable with a firsttube.com URL in it, which appears to do nothing other than ping the page. So, effective immediately, we run a regex validator on referrers and will be doing a more frequent clean up.
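One check a validator like that could include is spotting referrers whose query string carries one of our own URLs, which is the Musicforum.org pattern. Here is a rough Python sketch of just that one check. It is not the actual validator, and the `embeds_own_url` name and the example `go.php?url=` link shape are invented for illustration.

```python
from urllib.parse import parse_qsl, urlsplit

# Our own host; a referrer that passes this around in a GET variable is suspect.
OWN_HOST = "firsttube.com"


def embeds_own_url(referrer: str) -> bool:
    """True if any query-string value of the referrer mentions our own host."""
    query = urlsplit(referrer).query
    return any(OWN_HOST in value.lower() for _, value in parse_qsl(query))
```

So something shaped like `http://musicforum.org/go.php?url=http://www.firsttube.com/read.php?id=1` gets flagged, while an ordinary link from another site does not. Note that a search-engine referrer where someone searched for the full domain name would also trip this, so a check like this probably belongs behind a whitelist.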

Hear that spammers? Take your crap elsewhere.


Trackback Spam

There is a new trend out there, one that hasn’t received much coverage, but it’s a big deal, and it’s getting bigger. As user-generated content becomes more and more prevalent, we have a new type of spam out there: trackback spam. On my blog, beneath all of the entries (above the comments), there is a section that shows you the user agents that loaded that page as well as the referring pages. I recently discovered something: people gaming the system. Read on…
