Tag Archives: Nerd

A Guide to Base Changing For Short URLs

Some time ago, I developed a VERY simple way to fake a bit.ly-style short URL. It works on any Apache server that supports mod_rewrite, for any site that uses some form of integer to identify an article (either in the database or the URL). You edit your .htaccess file like so:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/index\.php$
RewriteRule . index.php [NC,L]
</IfModule>

This essentially tells your server to send any request that isn’t for an existing file or directory to index.php.

Then index.php looks like this:

$url = str_replace("/","",$_SERVER['REQUEST_URI']);
if(trim($url)!='') {
	// turn the base36 short code back into the integer ID
	$id = base_convert($url,36,10);
	//this is where you either query your database for a slug or build the URL
	$uri = 'http://your-site-goes-here.com/path/to/article';
	if($uri) {
		header('HTTP/1.1 301 Moved Permanently');
		header('Link: <'.$uri.'>; rel=shortlink');
		header('Location: '.$uri);
		exit;
	} else {
		header("Location: http://your-site-goes-here.com/");
		exit;
	}
}

How do you get your short links? That’s easy. Just run this function:

$shorturl = base_convert($id,10,36);
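To see the round trip in one place, here is a minimal sketch. The helper names short_code() and article_id() are made up for illustration; only base_convert() comes from PHP itself.

```php
<?php
// Hypothetical helpers wrapping base_convert(); the names are illustrative only.

// Turn a numeric article ID into a base36 short code.
function short_code($id) {
    return base_convert((string)$id, 10, 36);
}

// Turn a base36 short code back into the numeric article ID.
function article_id($code) {
    return (int)base_convert(strtolower($code), 36, 10);
}

$code = short_code(50687);
echo $code."\n";             // the short code for article 50687
echo article_id($code)."\n"; // and back to the original ID
```

One caveat: base_convert() works with floats internally, so very large IDs can lose precision. For ordinary article IDs that should not matter.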

However, this isn’t the most compact way to condense. Obviously, this is base36, the highest base_convert() can go. But what about uppercase letters? And other characters?

So I set out, for some reason, to build a better condenser.

This is the result of several hours of work, mostly wasted, on some intellectual pursuit that was more a case of simply not letting it defeat me. A few notes: I’m quite confident that given enough time, and if I cared, I could make the code cleaner and more efficient in some places. I’m also aware that on 32-bit machines, it maxes out at the integer limit. It does support signed integers though, from min to max.

$ft_str = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
# uncomment the next line to extend the alphabet with potentially non-URL-safe characters
# $ft_str .= '_@$!#%^&*()=+\|}{][,;:~';

// precompute the powers of the base, ordered from highest to lowest
$powers = array();
for($p=0;$p<=10;$p++) {
	$powers[$p] = pow(strlen($ft_str),$p);
}
krsort($powers);

function ft_unconvert($str) {
	global $ft_str;
	if(substr($str,0,1)=='-') { $pfx='-'; $str=substr($str,1); } else { $pfx=''; }
	$base = strlen($ft_str);
	$decimal = 0;
	// walk the digits from least to most significant
	foreach(str_split(strrev($str)) as $k=>$v) {
		$decimal += pow($base,$k)*strpos($ft_str,$v);
	}
	return $pfx.$decimal;
}

function ft_converter($int) {
	global $ft_str,$powers;
	$int = (int)$int;
	if($int < 0) { $pfx='-'; $int=abs($int); } else { $pfx=''; }
	$digit = '';
	foreach($powers as $k=>$v) {
		if($int>=$v) {
			$timesinto = intdiv($int,$v);
			$digit .= $ft_str[$timesinto];
			$int = $int % $v;
		} elseif($digit!='' || $k==0) {
			// emit a zero digit once the number has started (or for 0 itself)
			$digit .= $ft_str[0];
		}
	}
	return $pfx.$digit;
}

function ft_convert_demo($num) {
	global $ft_str;
	$ftc = ft_converter($num);
	return "Converting ".$num." into base ".strlen($ft_str).": ".$ftc."<br />Unconverting ".$ftc." to base 10: ".ft_unconvert($ftc);
}

echo ft_convert_demo('50687');

Using the abbr tag

Kroc Camen, long time OSNews reader and frequent IM buddy of mine, has an interesting piece examining the use of the <abbr> HTML tag.  Kroc is one of those people who is very serious about the presentation and efficiency of his code, a trait I do not share, at least not in practice and not to the same degree, and it makes us good companions.  My focus is typically on clean, fast, scalable code that forsakes beauty in favor of performance.  My code, in the form of OSNews, has sustained a simultaneous Digging and Slashdotting, something of which I’m very proud.

But my CSS isn’t going to win any awards, my javascript could be collapsed a lot and made much more efficient, and my HTML often suffers from “div-itis” and “class-itis.” Enter Mr Camen, whose motto, “code is art,” is evident upon initial inspection.  Kroc’s code is not only well written, the source itself is actually beautiful.  We have collaborated on both CSS and PHP in the past and both are the better for it.  

That said, we have strikingly different positions about publishing on the web.  Kroc writes his website for himself, and as a result, publishes in HTML 5; his site doesn’t work in IE, his mindset being “if you choose to use a subpar browser, you will have a subpar experience.”  Indeed, his site is a complete mess in IE 7, the fault only of IE and its abysmal CSS support, not the code itself.  I, conversely, attempt to code with a much more conservative bent, coding to the masses, at the expense of using several great tricks.

Getting back on track, when it came to discussing the <abbr> tag, both of us found ourselves remarkably on the same page.  Although one can get into the nitty-gritty details and find the whole conversation trivial, I think there’s something to be said for using tags properly and getting your information properly parsed.  After all, screen readers exist with regularity today, XML is very popular (most commonly in the form of RSS), and search engines spider the majority of popular websites several times a day if not every hour.  Using tags, and using them properly, should be important to content publishers and republishers.

I also agree with Kroc’s point that it’s not your job to educate your reader like an encyclopedia.  The <abbr> tag is not so much about education as it is about properly marking up your  code.  

As the second wave of the browser war heats up, and as Tracemonkey, Squirrelfish Extreme, and V8 start really setting themselves apart from IE in even larger ways, coding to standards will become even more important.  Understanding lesser used tags is elemental in writing the best, most concise code and ranking well in search engines.


Facebook Translations

Did you know that Facebook is offered in both Pirate and l33t sp34k?



PHP Weirdness

Beware: this post is definitely not for the faint of heart. It includes a lot of code. You have been warned.

I wrote an application some time ago for my company that looks up the longitude and latitude of an address for use in our geocoding initiative. It relied on the Yahoo Maps API and on yahoo_geo(), a function written by PHP creator Rasmus Lerdorf. It was largely dependent on this function:

function yahoo_geo($location) {
	$q = 'http://api.local.yahoo.com/MapsService/V1/geocode?appid=rlerdorf&location='.urlencode(trim($location));
	$tmp = '/tmp/yws_geo_'.md5($q);
	// request_cache() is defined elsewhere (not shown in this post); it fetches $q into $tmp
	request_cache($q, $tmp, 43200);
	$xml = simplexml_load_file($tmp);
	$ret['precision'] = (string)$xml->Result['precision'];
	foreach($xml->Result->children() as $key=>$val) {
		if(strlen($val)) $ret[(string)$key] = (string)$val;
	}
	return $ret;
}

This function worked for over two years for us with no problems at all. Then suddenly, in the last month, it started getting spotty. I fixed things by commenting out the caching parts of the function and forcing each execution to run again. Then I got errors about the libxml_use_internal_errors() function, so I commented that out. But today, the function just flat out failed, every single time returning the same error:

Warning: file_get_contents(http://XXXXXXXXXX/XXX) [function.file-get-contents]: failed to open stream: HTTP request failed! in /home/intranet/html/fetch.php on line X

What the heck? This code is all over the web. I’ve tried a million permutations of this function, including using fopen() and ob_get_contents(), and none have worked. And most frustratingly, I could load the URL successfully in Lynx and eLinks, so the machine could quickly and easily fetch the URL.

So I ventured into a sandbox I’ve never really played in before: cURL. cURL is an interesting animal. But the interesting thing is, once I got it working, it worked faster than ever! So, without further ado, here is the new and improved yahoo_geo() function:

function yahoo_geo($location) {
	$q = 'http://api.local.yahoo.com/MapsService/V1/geocode?appid=rlerdorf&location='.urlencode(trim($location));
	$ch = curl_init($q);
	curl_setopt($ch, CURLOPT_HEADER, 0);
	// curl_exec() prints the response by default, so capture it with output buffering
	ob_start();
	curl_exec($ch);
	curl_close($ch);
	$stream = ob_get_contents();
	ob_end_clean();
	if($stream) {
		$ret = array();
		$xml = simplexml_load_string($stream);
		if($xml) {
			$ret['precision'] = (string)$xml->Result['precision'];
			foreach($xml->Result->children() as $key=>$val) {
				if(strlen($val)) $ret[(string)$key] = (string)$val;
			}
		}
		return $ret;
	} else {
		return FALSE;
	}
}

Note: If you’re reproducing these functions elsewhere, be careful – WordPress may have converted the quotes into smart quotes that will need to be fixed before this script will run properly.
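If you want to try the parsing half of the function without hitting the network, something like the following sketch should do. The sample XML and the helper name parse_geo_xml() are invented for illustration; the real service's response may contain different fields.

```php
<?php
// Hypothetical sample response; the real service's fields may differ.
$sample = '<ResultSet>
  <Result precision="address">
    <Latitude>37.416397</Latitude>
    <Longitude>-122.025055</Longitude>
    <City>Sunnyvale</City>
    <Zip></Zip>
  </Result>
</ResultSet>';

// Flatten the first <Result> into an array, skipping empty fields.
function parse_geo_xml($stream) {
    // collect XML errors quietly instead of emitting PHP warnings
    libxml_use_internal_errors(true);
    $xml = simplexml_load_string($stream);
    if ($xml === false || !isset($xml->Result)) {
        return false;
    }
    $ret = array('precision' => (string)$xml->Result['precision']);
    foreach ($xml->Result->children() as $key => $val) {
        if (strlen((string)$val)) {
            $ret[(string)$key] = (string)$val;
        }
    }
    return $ret;
}

print_r(parse_geo_xml($sample));
```

Keeping the parsing separate from the fetching makes it easier to tell whether a failure is in the HTTP request or in the XML handling.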


OSNews vs. WordPress

I’ve spent quite a bit of time, over the last 5 or 6 days, diving into WordPress and learning what makes it tick.  Parts of WordPress are really impressive – just flat out cool. The way some of it works is fairly complex and deciphering it sometimes means reading page after page after page to understand an entire routine.  But sometimes, when you finally see, end to end, how something in WordPress works –  I mean really see individual bits of the engine – you have to admit it teaches you a little about PHP.  WordPress, underneath it all, is a pretty big beast and its strength and ubiquitous presence comes largely, I think, from the fact that it can do virtually anything.  The really sweet plugin system, the ways hooks work, “The Loop,” the dynamic options panel – it’s all very educational.  

The interesting thing here is that I’ve browsed the source of Slash, Scoop, phpNuke, and now WordPress, and all of them are definitely more complex and much heavier than the entire OSNews codebase. Now, before you jump all over me – firstly, Slash and Scoop are Perl, and I don’t really read Perl, so I can’t speak as an expert there.  Secondly, WordPress and Nuke both are very portable and dynamic, whereas OSNews has a narrow focus and, location-wise, is very static.  But that aside, OSNews has withstood simultaneous link bombs from Slashdot and Digg.  As amazing as WordPress is, it’s mostly amazing that it functions at all and loads in less than 2 minutes per page with as much going on as I can see behind the scenes.  That’s not a cut on WordPress, by the way.

In fact, if anything, what has really impressed me is how smooth and simple OSNews code is, if I may be so bold.  OSNews runs superfast due, in part, to lots of creative caching, some on-demand, some via cron.  But it also does so because of highly efficient queries that are measured for speed on their JOINs, meaning in some cases, it’s faster to do 20 simple queries than one complex one, or build a long and scary chain of “OR x=a OR x=b OR x=c OR x=d…”  Watching WordPress code in action is really fun for me, but watching OSNews work knowing what I now know about how much work PHP can cram into its threads is even more fun.


Hacking WordPress, Day Two

Thus far, my move to WordPress has been an adventure.  Here are a few lessons learned.

First off, I was very excited about the features of WordPress.  I was really excited, most specifically, about the API, and about the rich text WYSIWYG of the backend.  I’ve done a lot of work on Small Axe’s backend, but it’s still nothing compared to WordPress.

When I imported my stuff, it worked well, but the “slugs” — or URL-friendly post titles — did not convert properly.  They converted as WordPress friendly, properly escaped slugs.  The problem was, my slugs needed to stay intact, because I didn’t want all old links to break.

Understanding the way WordPress functions is really tough for a WP newbie, because the code is so spread out, yet compact, voluminous, yet digestible. Start with index.php, onto wp-blog-header.php, into wp-settings.php, and then you find the massive list of files in the wp-includes directory.  You’ll dig all over trying to find files to find includes in includes in includes. I finally found a great article that tries to explain the WordPress slug architecture. It’s fairly complex. Much of it lives in /wp-includes/query.php. However, my problem was very specific.

Many of my post slugs had periods in them. The period does not interfere with the URL, but WordPress doesn’t like them, and somewhere in the massive beast they were being stripped. So I had to find the page that “gets” posts. Lo and behold, there is a function called “get_posts” that lives in /wp-includes/query.php. I kept poking around. Like anyone who keeps digging, eventually, you’ll find yourself in wp-includes/formatting.php. And there it is.

Post slugs get sanitized – like everything, virtually all input is strictly sanitized – by a function called sanitize_title_with_dashes(). This function generates the slug. In order to include dots in your slug titles, just replace lines 366 and 367 (on WordPress 2.6.0) with this:

$title = preg_replace('/&.+?;/', '', $title); // kill entities
$title = preg_replace('/[^%a-z0-9 _.-]/', '', $title);

Then your slug titles will not strip periods. Of course, I don’t recommend you actually use periods, I just wanted them to work when fetching old posts created before I knew any better.
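To check what those two patterns do without touching WordPress itself, here is a simplified standalone sketch. The function name slug_keep_dots() is invented, and this is not the full sanitize_title_with_dashes() logic, just the two replaced lines plus lowercasing and dash conversion so the result is readable.

```php
<?php
// Simplified, hypothetical sketch of the slug cleanup; not WordPress's full function.
function slug_keep_dots($title) {
    $title = strtolower($title);
    $title = preg_replace('/&.+?;/', '', $title);          // kill entities
    $title = preg_replace('/[^%a-z0-9 _.-]/', '', $title); // drop unsafe characters, but keep dots
    $title = preg_replace('/\s+/', '-', trim($title));     // runs of whitespace become one dash
    return $title;
}

echo slug_keep_dots('Small Axe 0.3 Released!')."\n"; // small-axe-0.3-released
```

The only change from the stock character class is the added dot, placed before the trailing hyphen so the hyphen is still treated as a literal.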

After that adventure, I have to tell you, I’m really loving WordPress. There are some incredible plugins that have done some amazing functionality extension for me. So far, so good.


Offline: The Silly Script Disaster

I have several websites. The way my web host has them set up, like many hosts who use cPanel, is that one site is a “master” and the others essentially exist as directories within that site. My master site is smallaxesolutions.com, which is the “company” under which I sometimes do my web design and network support business.

One of the things I (used to) do as Small Axe Solutions was publish the core code of the engine that powers firsttube.com, Small Axe. Small Axe code was built up as 0.1, then 0.2, then 0.3. At that point, I had added several features to firsttube.com that I had yet to merge upstream into Small Axe. So, I created a build system so I could slowly integrate the changes. In short, it worked like this: I had a directory called “build_source” which contained my current code. Of course, it had all kinds of problems out of the box, like the config files which pointed to nonsensical locations like /path/to/your/blog/. It had no valid database connection info. The flatfiles were unwritable. So, in short, the code was (usually) solid, but it couldn’t actually run as-is.

Meanwhile, another directory called “demo” was waiting silently.

Lastly, a third directory, outside the web root, called “static” was sitting with pre-built config files, db connection files, and some other stuff.

Then it was just a matter of a simple shell script. The script did the following: it deleted everything in the “demo” directory. Then it copied all of the files in the “build_source” directory into the demo directory. It deleted the config file and overwrote it with a copy from the “static” directory. Same for the db connection and a few other files. It left the demo directory as a live, fully functional build of the current code. Then it zipped everything in the “build_source” directory and put it into my downloads section. The script has run every 30 minutes for probably 2 years now. I only chose 30 minutes because it made sense from a development standpoint to see the updates quickly. I stopped working on that version some time ago, but never got around to updating or changing the script.

Fast forward to a few weeks ago: I was cleaning out a bunch of old directories. Within 5 minutes, EVERYTHING was gone: my mail, *all* of my sites, my temp files, everything in my home directory that wasn’t a hidden file preceded with a dot. I didn’t realize this for several hours, but then I restored from a backup and within 45 minutes, everything was gone again! Oh noes!

I immediately began researching security and disabling all of my upload scripts. Something is wrong, I thought. I searched high and low. But, as you guessed, I didn’t find anything wrong, because there was nothing wrong. In my cleanup, as you may have guessed by now, I decided to delete the “demo” folder. The first line of my shell script is “cd /home/adam/public_html/build_source.” The second, scary line is “rm -rf *”. Since there was no “build_source” folder, the first line flat out failed, leaving the script in /home/adam. Then, unfortunately, it ran rm -rf * in the root of my home directory. Killer!

It took me some time to swallow my own stupidity. All I had to do was comment out the cron job to prevent this disaster. But alas, I dropped the ball. We’re back online now, and a little smarter.
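For the record, the missing safeguard is tiny: check that the target directory exists before deleting anything. My script was shell, but here is a rough sketch of the same idea in PHP. The function name empty_directory() is made up for illustration and this is not the script I actually ran.

```php
<?php
// Hypothetical PHP rendering of the cleanup step; the original was a shell script.
// Deletes everything inside $dir, but refuses to run if $dir is missing.
function empty_directory($dir) {
    if (!is_dir($dir)) {
        return false; // the equivalent of a failed "cd": stop here, delete nothing
    }
    $items = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::CHILD_FIRST // children first, so directories are empty when removed
    );
    foreach ($items as $item) {
        if ($item->isDir() && !$item->isLink()) {
            rmdir($item->getPathname());
        } else {
            unlink($item->getPathname());
        }
    }
    return true;
}

// A directory that does not exist: nothing is deleted and the call reports failure.
$missing = sys_get_temp_dir().'/build_source_'.uniqid();
var_dump(empty_directory($missing));
```

The point is that the delete step is tied to an explicit path and a check, rather than to whatever the current working directory happens to be.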


Math in Real Life, Part 1: Fruit Algorithms

I recently went to Costco and bought a rather large tub of blueberries. I am a huge fan of blueberries – in fact, the engine of this blog was once named “blueberry,” – and I am a huge fan of fresh fruit in general. While picking from said tub, I mentioned to a friend that as I munch away, I frequently scan the entire viewable area of berries and quickly select the “best” one in view for my next berry. I do this not just with blueberries, but with strawberries, raspberries, blackberries… in fact, I probably do it with many more foods. But in this case, we did an experiment, which goes thusly:

Shake a tub of berries so it’s a fresh “layout” and have a friend peruse it. Then, you each reveal the “best” berry – the one you’d go for if you were choosing. The first FIVE throws we matched 100%. The next few took us up to two or three picks to match. But the fact remains, we agreed that as we ate, we’d do a quick scan – all in an instant, of course, the deliberation is almost entirely subconscious – and choose the best remaining berry/berries. And furthermore, in the first five throws, we were able to agree with no debate as to which was the best remaining berry, without defining what qualities should be prized in an assessment of “best.” I think many people do this, and not just with fruit, but with all sorts of things. Is it just human nature?

There you go: math in real life.


An Argument for PHP

Currently, over on Slashdot, there is an article on forthcoming features in PHP version 6. And, like most PHP articles, the comments section is flooded with jackasses arguing that PHP sucks as a language. I get frustrated by the entire “PHP sucks” campaign, largely because it’s like the HTML e-mail argument – mostly driven by the fact that it’s stylish to hate them – but I’m going to go further. I argue that everyone posting about how PHP is a bad language as a whole is an idiot. Every single one. Each is a foolish, arrogant, nerd sheep who can’t think for themselves. Update 5/14/08 20:39 UTC: Okay, this piece was linked by several sources, and the truth is, I had just read some George Carlin, so I was probably more aggressive than I intended to be. What I really mean is that people posting about how PHP is a bad language as a whole without citing any reasons are generally following a trend, trying to look cool, or too narrow-minded to be considered credible. And the responses I’ve seen across the net have, thus far, supported this argument.

Why? Let’s argue for a second that everything people say about PHP is true, as many of the complaints are sound.

It’s true the primary namespace has way too many functions – over three thousand, I’m told. It’s true that the function names are inconsistent, some have underscores, some don’t. It’s true that the function names are often verbose. It’s true that OOP was weak until recently, it’s true that register_globals was a security nightmare. All those things are potential issues, and all languages have them. As the “real programmers” who write Perl would never admit, reading other people’s terse Perl is often a f’ing disaster, even for seasoned Perl-ites. And when using compiled ASP.net – for best performance, natch – you must update your entire site (well, all the concerned ASPX pages and DLLs) to make elementary changes.

That said, PHP is easy. Really easy. And it’s a trivial task to get a website up and running fairly quickly. And you can serve enormous amounts of traffic as proven not only by OSNews (who have been dugg and Slashdotted concurrently), but by Yahoo!, Wikipedia, Flickr, Facebook, and many, many others. And why are there so many open source PHP frameworks, apps, CMSes, etc? Because PHP is installable virtually everywhere, it’s very portable, and it’s really simple to hack up. Try installing something dependent on mod_perl (e.g. Slash or Scoop) and get back to me on the ease of the install.

The fact is, even if everyone’s fears about writing insecure code are true, the ability to make mistakes does not mean everyone does, and those who would forsake “the right tool for the job at hand” shouldn’t be trusted even to water your plants, because they are obviously nitwits. If you can’t concede that PHP can be the right tool some of the time for some situations, you shouldn’t be trusted to code or make adult decisions. No, I argue that the reason they dislike PHP is because many start with PHP and thus, admitting to liking it would make them appear to be a “noob.” It’s because they must appear to be seasoned pros. It’s the bragging rights of the 21st century.

Nobody has ever claimed PHP is the solution to everything, but it is a remarkably easy tool for scripting dynamically generated HTML. And, in my opinion and experience, it does so better than Perl, better than Ruby, and a hell of a lot better than both ASP.net and JSP.


Dope Wars for the iPhone

I love my jailbroken iPhone, and I am always looking for a new “game of the week.” I’ve been through several. At first, it was LightsOff, but that ends at 225 levels or so. Then it was Five Dice. Then 4 Balls, Domino, and finally PuzzleManiak. I was so happy recently when someone decided to port Dope Wars to the iPhone in the form of “iDope.”

iDope currently has a lot of bugs. Mainly, your jacket storage is irrelevant: you can actually store unlimited items, you just can’t buy unlimited items unless you hit “buy all.” You can’t store money in a bank. It never ends until you die. You are mugged or fight the cops maybe 80% of the time you travel. But most importantly, this:

Notice my dollars? That’s right, I have $2,147,483,647. Two billion, one hundred forty seven million, four hundred eighty three thousand, six hundred forty seven dollars. Recognize that number? If you read my blog regularly, you might. After all, it’s the upper limit of signed integers. The game is officially boring – no matter what I do, I’m always capped at that number, I can never get more money. I wonder if the iPhone can support BIGINT.
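If the number doesn't ring a bell, it is 2 to the 31st power minus 1, the largest value a signed 32-bit integer can hold. A quick sketch of where it comes from, plus a guess at what the cap looks like in code; add_cash() is a made-up function, and I have no idea how iDope actually handles it internally.

```php
<?php
// Largest value of a signed 32-bit integer: 2^31 - 1.
$max32 = (2 ** 31) - 1;
echo number_format($max32)."\n"; // 2,147,483,647

// Hypothetical sketch of a cash counter that stops at the cap instead of growing.
function add_cash($cash, $profit, $cap) {
    return min($cash + $profit, $cap);
}

echo number_format(add_cash(2147483000, 5000, $max32))."\n"; // still 2,147,483,647
```

A 64-bit counter would push the ceiling far beyond anything a game of Dope Wars could reach.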

Anyway, I really hope to see iDope get some love and attention, because Dope Wars is a fabulous and addictive game, but as is, I eventually get to the upper limit and have to start over… and over… and over.
