<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>firsttube.com &#187; Trackback Spam</title>
	<atom:link href="http://www.firsttube.com/tag/trackback-spam/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.firsttube.com</link>
	<description>crunchy nuggets, served semi-daily</description>
	<lastBuildDate>Tue, 03 Jan 2012 00:14:01 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Trackback Spam Gateway</title>
		<link>http://www.firsttube.com/read/Trackback-Spam-Gateway/</link>
		<comments>http://www.firsttube.com/read/Trackback-Spam-Gateway/#comments</comments>
		<pubDate>Thu, 13 Mar 2008 20:33:46 +0000</pubDate>
		<dc:creator>Adam S</dc:creator>
				<category><![CDATA[Meta]]></category>
		<category><![CDATA[Search Engines]]></category>
		<category><![CDATA[Spam]]></category>
		<category><![CDATA[Trackback Spam]]></category>

		<guid isPermaLink="false">http://firsttubecom/read/Trackback-Spam-Gateway</guid>
		<description><![CDATA[It&#8217;s over. My referrer experiment is over&#8230; at least, in its current form. Today, I roll out firsttube.com referrer gateway version 1.0. That makes it sound fancy, but it&#8217;s not. Basically, it&#8217;s PHP to prevent trackback spam. Traffic at firsttube.com &#8230; <a href="http://www.firsttube.com/read/Trackback-Spam-Gateway/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s over.  My referrer experiment is over&#8230; at least, in its current form.  Today, I roll out <a href='http://firsttube.com'>firsttube.com</a> referrer gateway version 1.0.  That makes it sound fancy, but it&#8217;s not.  Basically, it&#8217;s PHP to prevent <a href="http://firsttube.com/tag/trackback_spam">trackback spam</a>.</p>
<p>Traffic at <a href='http://firsttube.com'>firsttube.com</a> has grown steadily, for some reason, and the logs reveal it: we get a TON of traffic from search engines, and the most popular terms are surprising &#8211; sensitive readers beware &#8211; here are the terms that most frequently drive people here: </p>
<p>cumtube, red-tube, uporn, adult youtube, milf, gay tube, tube 8 and many more equally odd terms.   </p>
<p>You know why? Because, in a shrewd move that search engines seem to love, I display links back to my referrers, treating them as trackbacks.  But when a referrer is not <a href="http://google.com">Google</a>, <a href="http://yahoo.com">Yahoo</a>, <a href="http://live.com">Live.com</a>, or <a href="http://osnews.com">OSNews</a>, it&#8217;s most often spam.  Why? Because not only do we have &#8220;tube&#8221; in our title, but with each <b>erroneous</b> entry, we tell the search engine the match was a good one by linking back to that search.  In short, I&#8217;m perpetuating the problem.  As a result, dozens of spammers have begun issuing basic GET requests by the hundreds to place their sites in my referrer lists.</p>
<p>Some time ago, I began the battle by adding <a href="http://googleblog.blogspot.com/2005/01/preventing-comment-spam.html">rel=&#8221;nofollow&#8221;</a> to all outgoing links not added via the admin section.  But alas, that wasn&#8217;t good enough: the spammers didn&#8217;t care.  So I implemented a pre-check that matches each referrer, via regular expressions, against a list of known crap.  As of today, I actively filter 36 terms.  In time, this will become performance intensive, if it isn&#8217;t already.</p>
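<p>For anyone who wants to try the same kind of pre-check, here is a minimal sketch in Python rather than the PHP this site runs.  The names <code>BLOCKED_TERMS</code> and <code>is_blocked</code> and the three sample terms are assumptions for illustration; the real list of 36 terms is not reproduced here.  Compiling every term into a single pattern means each referrer is scanned once, which may ease the performance worry as the list grows.</p>

```python
import re

# Hypothetical sample terms; the actual filter list is not shown in this post.
BLOCKED_TERMS = ["casino", "mp3", "lyrics"]

# One compiled alternation instead of one regex per term, so each referrer
# is scanned a single time no matter how long the list gets.
BLOCKED_RE = re.compile(
    "|".join(re.escape(term) for term in BLOCKED_TERMS),
    re.IGNORECASE,
)


def is_blocked(referrer):
    """Return True if the referrer URL contains any blocked term."""
    return BLOCKED_RE.search(referrer) is not None
```

<p>A referrer that passes <code>is_blocked</code> would be dropped before it is ever stored; plain substring terms like these are blunt, so expect the occasional false positive.</p>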
<p>Thus, a gateway.  Now, *all* referring traffic goes into a temp table, and each entry must be approved.  I wrote a nice tool to batch import, batch delete, or even approve based on certain filters, such as domain or term.  As it matures and I get an idea of the time involved, I will &#8220;<a href="http://en.wikipedia.org/wiki/Whitelist">whitelist</a>&#8221; certain domains that can immediately post to the referrer table.  In the meantime, I need to decide if I want to filter referrers with obscene unrelated terms or just leave them and let the magic run its course; after all, these are not &#8220;spam,&#8221; they are simply organic mistakes.  An argument could be made that seeing which terms and sites around the internet drive traffic to a site is interesting, and that this is most of the reason to post referrers in the first place.</p>
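<p>The flow above can be sketched roughly as follows.  This is an in-memory Python illustration, not the PHP and database code behind the site: the class <code>ReferrerGateway</code>, its method names, and the contents of <code>WHITELIST</code> are all assumptions, and the two lists merely stand in for the temp table and the referrer table.</p>

```python
from urllib.parse import urlparse

# Hypothetical whitelist; a real one would be built up over time.
WHITELIST = {"google.com", "yahoo.com", "live.com", "osnews.com"}


class ReferrerGateway:
    """Holds new referrers in a pending queue until they are approved."""

    def __init__(self, whitelist=WHITELIST):
        self.whitelist = set(whitelist)
        self.pending = []   # stands in for the temp table
        self.approved = []  # stands in for the referrer table

    @staticmethod
    def domain_of(url):
        """Lower-cased host name with a leading "www." removed."""
        host = (urlparse(url).hostname or "").lower()
        return host[4:] if host.startswith("www.") else host

    def record(self, url):
        """Whitelisted domains post immediately; everything else waits."""
        if self.domain_of(url) in self.whitelist:
            self.approved.append(url)
        else:
            self.pending.append(url)

    def approve_domain(self, domain):
        """Batch-approve every pending referrer from one domain."""
        keep = []
        for url in self.pending:
            if self.domain_of(url) == domain:
                self.approved.append(url)
            else:
                keep.append(url)
        self.pending = keep

    def delete_term(self, term):
        """Batch-delete every pending referrer whose URL contains a term."""
        needle = term.lower()
        self.pending = [u for u in self.pending if needle not in u.lower()]
```

<p>The domain match is exact, so a subdomain of a whitelisted site would still land in the pending queue in this sketch.</p>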
<p>Anyway, spammers, take note: I gotcher number! Stop referrer spamming me! That means you, you stupid lyrics sites!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.firsttube.com/read/Trackback-Spam-Gateway/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Trackback Spam, Again</title>
		<link>http://www.firsttube.com/read/trackback-spam-again/</link>
		<comments>http://www.firsttube.com/read/trackback-spam-again/#comments</comments>
		<pubDate>Wed, 23 Jan 2008 15:30:34 +0000</pubDate>
		<dc:creator>Adam S</dc:creator>
				<category><![CDATA[Meta]]></category>
		<category><![CDATA[Search Engines]]></category>
		<category><![CDATA[Spam]]></category>
		<category><![CDATA[Trackback Spam]]></category>

		<guid isPermaLink="false">http://firsttubecom/read/Trackback-Spam-Again</guid>
		<description><![CDATA[Once again, I am dealing with trackback spam, aka referrer spam. Since firsttube.com records the pages that refer hits to us, I&#8217;ve had to deal with jerks who issue HTTP requests so that they get a link back. Too bad &#8230; <a href="http://www.firsttube.com/read/trackback-spam-again/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Once again, I am dealing with <a href="http://firsttube.com/read/Trackback-Spam">trackback spam</a>, aka referrer spam.  Since <a href='http://firsttube.com'>firsttube.com</a> records the pages that refer hits to us, I&#8217;ve had to deal with jerks who issue HTTP requests so that they get a link back.  Too bad they don&#8217;t realize that every referrer gets a <a href="http://googleblog.blogspot.com/2005/01/preventing-comment-spam.html">rel=&#8221;nofollow&#8221; attribute</a> (<a href="http://en.wikipedia.org/wiki/Nofollow">more here</a>).</p>
<p>So, I had to issue these SQL statements to the database today:</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">DELETE</span> <span style="color: #993333; font-weight: bold;">FROM</span> user_agent_table
<span style="color: #993333; font-weight: bold;">WHERE</span> <span style="color: #66cc66;">&#40;</span>referrer <span style="color: #993333; font-weight: bold;">LIKE</span> <span style="color: #ff0000;">'http://mp3%'</span> <span style="color: #993333; font-weight: bold;">OR</span> referrer <span style="color: #993333; font-weight: bold;">LIKE</span> <span style="color: #ff0000;">'%mp3.com%'</span><span style="color: #66cc66;">&#41;</span>;
&nbsp;
<span style="color: #993333; font-weight: bold;">DELETE</span> <span style="color: #993333; font-weight: bold;">FROM</span> user_agent_table
<span style="color: #993333; font-weight: bold;">WHERE</span> referrer <span style="color: #993333; font-weight: bold;">LIKE</span> <span style="color: #ff0000;">'%musicforum.org%'</span>;</pre></div></div>

<p>Musicforum.org has some asshole posting all sorts of links that pass a GET variable with a <a href='http://firsttube.com'>firsttube.com</a> URL in it, which appears to do nothing other than ping the page.  So, effective immediately, we run a regex validator on referrers and will be doing a more frequent clean up.</p>
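<p>One way to catch this particular trick, sketched in Python under some assumptions: instead of a regex, it parses the referrer&#8217;s query string and flags any GET variable whose value mentions our own domain.  The function name <code>embeds_own_url</code> and the constant <code>OWN_DOMAIN</code> are made up for the example, and this is not the validator actually running here.</p>

```python
from urllib.parse import urlparse, parse_qsl

# Assumed constant: the domain whose URLs should not show up inside a referrer.
OWN_DOMAIN = "firsttube.com"


def embeds_own_url(referrer, own_domain=OWN_DOMAIN):
    """Return True if any query-string value of the referrer mentions own_domain."""
    query = urlparse(referrer).query
    # parse_qsl decodes percent-encoding, so encoded URLs are caught as well.
    return any(own_domain in value.lower() for _name, value in parse_qsl(query))
```

<p>A search-engine referrer for the query &#8220;firsttube.com&#8221; would be flagged too, so a check like this probably belongs after any whitelist of trusted domains.</p>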
<p>Hear that spammers? Take your crap elsewhere.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.firsttube.com/read/trackback-spam-again/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Trackback Spam</title>
		<link>http://www.firsttube.com/read/trackback-spam/</link>
		<comments>http://www.firsttube.com/read/trackback-spam/#comments</comments>
		<pubDate>Tue, 13 Feb 2007 14:51:00 +0000</pubDate>
		<dc:creator>Adam S</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[OSNews]]></category>
		<category><![CDATA[Search Engines]]></category>
		<category><![CDATA[Spam]]></category>
		<category><![CDATA[Trackback Spam]]></category>

		<guid isPermaLink="false">http://firsttubecom/read/Trackback-Spam</guid>
		<description><![CDATA[There is a new trend out there, one that hasn&#8217;t received much coverage, but it&#8217;s a big deal, and it&#8217;s getting bigger. As user generated content becomes more and more prevalent, we have a new type of spam out there: &#8230; <a href="http://www.firsttube.com/read/trackback-spam/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>There is a new trend out there, one that hasn&#8217;t received much coverage, but it&#8217;s a big deal, and it&#8217;s getting bigger.  As user-generated content becomes more and more prevalent, we have a new type of spam out there: trackback spam.  On my blog, beneath all of the entries (above the comments), there is a section that shows you the user agents that loaded that page as well as the referring pages.  I recently discovered something: people gaming the system.  Read on&#8230;<br />
<span id="more-217"></span><br />
The thing is, I sometimes follow those links to see who is linking to my site.  There are several reasons I do this, but part of it is the effect on search engines.  Anyone who links a blog entry to my article is immediately linked back on the next page load.  It&#8217;s good search engine karma.  That said, I started seeing some sites that didn&#8217;t have a link back, but somehow referred someone to my site.  And then I realized they were unrelated &#8211; car insurance, casino, etc.  Typical spam crap.  </p>
<p>All were coming from different IPs.  All had different user agents.  All had different referrer links.  This is spam, pure and simple.  It&#8217;s someone trying to piggyback off of my pagerank.  </p>
<p>These are the jerks who spammed me:<br />
<small>theonlineslotsmachine .com<br />
online-casino-special .com<br />
adencitycasino .com<br />
onterminsurance .com<br />
ontermlifeinsurancerate .com<br />
onslotmachinesonline .com<br />
actoncasino .com<br />
onusinter .com<br />
iloanmortgageonline .com<br />
scrail .net<br />
ppplastic .com<br />
mysteryclips .com<br />
e-z-ly-treat-e-d .com<br />
onhomecontentsinsurance .com</small></p>
<p>And that&#8217;s just a small slice of the pie.  </p>
<p>I&#8217;ve removed all of the spam links I could find, added some tighter controls to try to avoid recording these faked headers, and also added &#8216;rel=&#8221;nofollow&#8221;&#8217; to the links, which means I still reward referrers with a link, but bots won&#8217;t follow them, so they get no pagerank bonus until I manually change it.</p>
<p>Trackback spam is going to be a big problem, particularly as people continue to use commenting engines that allow you to link your name to a URL.  It makes sense to start posting fake comments just to get that link on a worthwhile website with a high pagerank.  So combating this early will be important.  </p>
<p>I&#8217;ve thought about some ways we might combat this, and was thinking that on OSNews, I might only light up your blog/homepage link if you have a positive &#8220;trust&#8221; level.  Otherwise, it will be just plain text.  Or maybe add the &#8220;nofollow&#8221; to links of untrusted users.  Not sure yet.  </p>
<p>Either way, trust me, though the subject may catch on with a different name, you haven&#8217;t heard the last of trackback spam.  </p>
]]></content:encoded>
			<wfw:commentRss>http://www.firsttube.com/read/trackback-spam/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

