PHP Weirdness

Beware: this post is definitely not for the faint of heart. It includes a lot of code. You have been warned.

I wrote an application some time ago for my company that looks up the longitude and latitude of an address for use in our geocoding initiative. It relied on the Yahoo Maps API and on yahoo_geo(), a function written by PHP creator Rasmus Lerdorf:

function yahoo_geo($location) {
	$q = 'http://api.local.yahoo.com/MapsService/V1/geocode?appid=rlerdorf&location='
		.rawurlencode($location);
	$tmp = '/tmp/yws_geo_'.md5($q);
	request_cache($q, $tmp, 43200);
	libxml_use_internal_errors(true);
	$xml = simplexml_load_file($tmp);
	$ret['precision'] = (string)$xml->Result['precision'];
	foreach($xml->Result->children() as $key=>$val) {
		if(strlen($val)) $ret[(string)$key] = (string)$val;
	}
	return $ret;
}

This function worked for over two years for us with no problems at all. Then suddenly, in the last month, it started getting spotty. I fixed things by commenting out the caching parts of the function and forcing each execution to run again. Then I got errors about the libxml_use_internal_errors() function, so I commented that out. But today, the function just flat out failed, every single time returning the same error:

Warning: file_get_contents(http://XXXXXXXXXX/XXX) [function.file-get-contents]: failed to open stream: HTTP request failed! in /home/intranet/html/fetch.php on line X

What the heck? This code is all over the web. I’ve tried a million permutations of this function, including using fopen() and ob_get_contents(), and none have worked. And most frustratingly, I could load the URL successfully in Lynx and eLinks, so the machine could quickly and easily fetch the URL.

So I ventured into a sandbox I’ve never really played in before: cURL. cURL is an interesting animal, and once I got it working, it worked faster than ever! So, without further ado, here is the new and improved yahoo_geo() function:

function yahoo_geo($location) {
	$q = 'http://api.local.yahoo.com/MapsService/V1/geocode?appid=rlerdorf&location='.urlencode(trim($location));
	$ch = curl_init($q);
	curl_setopt($ch, CURLOPT_HEADER, 0);
	// Hand the response body back as a string instead of printing it
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
	$stream = curl_exec($ch);
	// Close the handle whether or not the request succeeded
	curl_close($ch);
	if(!$stream) {
		return FALSE;
	}
	$xml = simplexml_load_string($stream);
	// Bail out if the body isn't parseable XML or has no Result element
	if($xml === FALSE || !isset($xml->Result)) {
		return FALSE;
	}
	$ret = array();
	$ret['precision'] = (string)$xml->Result['precision'];
	foreach($xml->Result->children() as $key=>$val) {
		if(strlen($val)) $ret[(string)$key] = (string)$val;
	}
	return $ret;
}

Note: If you’re reproducing these functions elsewhere, be careful – WordPress may have converted the quotes into smart quotes that will need to be fixed before this script will run properly.
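If you want to poke at the parsing half without hitting the network at all, one option is to split it into its own function. This is only a sketch: parse_geo_xml() is a name I made up, and the sample XML is invented to match the shape the functions above expect (a Result element with a precision attribute and child elements), not copied from a real response.

```php
<?php
// Hypothetical helper: turn a geocode response body into a flat array.
// Returns false when the body is not well-formed XML or has no <Result>.
function parse_geo_xml($body) {
    // Collect XML parse errors internally instead of emitting warnings
    libxml_use_internal_errors(true);
    $xml = simplexml_load_string($body);
    if ($xml === false || !isset($xml->Result)) {
        return false;
    }
    $ret = array('precision' => (string) $xml->Result['precision']);
    foreach ($xml->Result->children() as $key => $val) {
        // Skip empty elements, same as the original function does
        if (strlen((string) $val)) {
            $ret[(string) $key] = (string) $val;
        }
    }
    return $ret;
}

// Made-up sample body, shaped the way the code above assumes
$sample = '<ResultSet><Result precision="address">'
        . '<Latitude>37.416384</Latitude><Longitude>-122.024853</Longitude>'
        . '<City>SUNNYVALE</City></Result></ResultSet>';

print_r(parse_geo_xml($sample));
var_dump(parse_geo_xml('not xml'));
```

With the parsing isolated like this, a failure in the fetch (file_get_contents(), cURL, whatever) can be told apart from a failure in the XML handling.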

OSNews vs. WordPress

I’ve spent quite a bit of time, over the last 5 or 6 days, diving into WordPress and learning what makes it tick. Parts of WordPress are really impressive – just flat out cool. The way some of it works is fairly complex and deciphering it sometimes means reading page after page after page to understand an entire routine. But sometimes, when you finally see, end to end, how something in WordPress works – I mean really see individual bits of the engine – you have to admit it teaches you a little about PHP. WordPress, underneath it all, is a pretty big beast and its strength and ubiquitous presence come largely, I think, from the fact that it can do virtually anything. The really sweet plugin system, the way hooks work, “The Loop,” the dynamic options panel – it’s all very educational.

The interesting thing here is that I’ve browsed the source of Slash, Scoop, phpNuke, and now WordPress, and all of them are definitively more complex and much heavier than the entire OSNews codebase. Now, before you jump all over me – firstly, Slash and Scoop are Perl, and I don’t really read Perl, so I can’t speak as an expert there.  Secondly, WordPress and Nuke both are very portable and dynamic, whereas OSNews has a narrow focus and, location-wise, is very static.  But that aside, OSNews has withstood simultaneous link bombs from Slashdot and Digg.  As amazing as WordPress is, it’s mostly amazing that it functions at all and loads in less than 2 minutes per page with as much going on as I can see behind the scenes.   That’s not a cut on WordPress, by the way.

In fact, if anything, what is really impressed upon me is how smooth and simple OSNews code is, if I may be so bold. OSNews runs superfast due, in part, to lots of creative caching, some on-demand, some via cron. But it also does so because of highly efficient queries that are measured for speed on their JOINs, meaning in some cases, it’s faster to do 20 simple queries than one complex one, or build a long and scary chain of “OR x=a OR x=b OR x=c OR x=d…” Watching WordPress code in action is really fun for me, but watching OSNews work knowing what I now know about how much work PHP can cram into its threads is even more fun.

An Argument for PHP

Currently, over on Slashdot, there is an article on forthcoming features in PHP version 6. And, like most PHP articles, the comments section is flooded with jackasses arguing that PHP sucks as a language. I get frustrated by the entire “PHP sucks” campaign, largely because it’s like the HTML e-mail argument – mostly driven by the fact that it’s stylish to hate them – but I’m going to go further. I argue that everyone posting about how PHP is a bad language as a whole is an idiot. Every single one. Each is a foolish, arrogant, nerd sheep who can’t think for themselves. Update 5/14/08 20:39 UTC: Okay, this piece was linked by several sources, and the truth is, I had just read some George Carlin, so I was probably more aggressive than I intended to be. What I really mean is that people posting about how PHP is a bad language as a whole without citing any reasons are generally following a trend, trying to look cool, or too narrow-minded to be considered credible. And the responses I’ve seen across the net have, thus far, supported this argument.

Why? Let’s argue for a second that everything people say about PHP is true, as many of the complaints are sound.

It’s true the primary namespace has way too many functions – over three thousand, I’m told. It’s true that the function names are inconsistent, some have underscores, some don’t. It’s true that the function names are often verbose. It’s true that OOP was weak until recently, it’s true that register_globals was a security nightmare. All those things are potential issues, and all languages have them. As the “real programmers” who write Perl would never admit, reading other people’s terse Perl is often a f’ing disaster, even for seasoned Perl-ites. And when using compiled ASP.net – for best performance, natch – you must update your entire site (well, all the concerned ASPX pages and DLLs) to make elementary changes.

That said, PHP is easy. Really easy. And it’s a trivial task to get a website up and running fairly quickly. And you can serve enormous amounts of traffic as proven not only by OSNews (who have been dugg and Slashdotted concurrently), but by Yahoo!, Wikipedia, Flickr, Facebook, and many, many others. And why are there so many open source PHP frameworks, apps, CMSes, etc? Because PHP is installable virtually everywhere, it’s very portable, and it’s really simple to hack up. Try installing something dependent on mod_perl (e.g. Slash or Scoop) and get back to me on the ease of the install.

The fact is, even if everyone’s fears about writing insecure code are true, the ability to make mistakes does not mean everyone does, and those who would forsake “the right tool for the job at hand” shouldn’t be trusted even to water your plants, because they are obviously nitwits. If you can’t concede that PHP can be the right tool some of the time for some situations, you shouldn’t be trusted to code or make adult decisions. No, I argue that the reason they dislike PHP is that many start with PHP and thus, admitting to liking it would make them appear to be a “noob.” It’s because they must appear to be seasoned pros. It’s the bragging rights of the 21st century.

Nobody has ever claimed PHP is the solution to everything, but it is a remarkably easy tool for scripting dynamically generated HTML. And, in my opinion and experience, it does so better than Perl, better than Ruby, and a hell of a lot better than both ASP.net and JSP.

HAXX0RED

So, I updated firsttube.com to “revision 9” on Friday, and when I went to show someone last night, imagine my surprise when I found the whole thing hosed. The site was missing entire chunks – random, non-sequential directories, missing entirely.

I’ll spare you the details: I got hacked. Someone either brute forced their way into the admin site (which is now pretty locked down, until I figure this all out) or brute forced into SSH and uploaded several malicious PHP scripts. They are scary, and I actually have them intact in a backup from a few days ago. How much has been revealed? My MySQL passwords? It’s impossible to tell. Virtually everything will need scrubbing.

In the meantime, excuse any wonkiness until all is repaired. The good news is this finally forces me to finish work on the new administrative area I’ve been playing with.

Integers on the Intertubes

Some time ago, I wrote an application for my company. Like most weblets I’ve written, this used PHP and either MySQL or MSSQL for the backend. This particular application logged all phone calls. As part of the record, it would record the caller’s account number, which is a 5 or 6 digit integer.

So, I got a phone call from the director of our customer contact department this week. He was concerned about the reports. He made a decision last week that when a call came in that was a lead – in other words, a non-customer – his people would fill the phone number from the caller ID into the account number field. But when he ran his export reports, he found that his techs had entered this phone number for ALL of the calls: 429-496-7295. That’s weird, he said. So he called me and asked why that was. I checked all the calls and most were from one woman, so my first instinct was “Check if her browser has autocomplete turned on”. But he swore that he had tried it too and gotten the same result.

I checked the database and sure enough, it was right there: 429-496-7295, in all of the fields. So I went back to the code. In short, I took the input from the form, and declared it like this:

$accountnum = (int) $_POST['accountnum'];

Pretty straightforward: explicitly declare the type. So, I started my debugging by attempting to manually enter the data into the database. Sure enough: the account key field showed this: 4294967295.

So, I went back to the PHP and started by dumping out the raw SQL query:

INSERT INTO calls VALUES ('','x','x','x','4294967295','x','x');

What? So the database automatically converts it to this weird phone number and PHP does too? Suddenly it occurred to me. One of the benefits of 64-bit computing is the ability to address more memory. There are limits to what can be done in 32-bit computing, and one is that integers have a limit! In this case, a database field called “integer” is limited to numbers between -2,147,483,648 and +2,147,483,647 when signed, or between 0 and 4,294,967,295 when unsigned, and that upper bound is exactly the “phone number” that kept showing up. It just so happens that the number is the same length as a US phone number – 10 digits. Changing the db field to “BIGINT” allowed me to manually run the SQL query and it worked. But the app still didn’t.
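To make the limits concrete, here is a rough sketch of a range check. The function name fits_in_4_bytes() and the sample phone number are made up for illustration. It compares digit strings rather than integers, on the assumption that the PHP build doing the checking may itself be 32-bit.

```php
<?php
// Hypothetical helper: does a string of digits fit in a 4-byte integer column?
// Compared as strings so the check itself cannot overflow on 32-bit PHP.
function fits_in_4_bytes($digits, $unsigned = false) {
    $max = $unsigned ? '4294967295' : '2147483647';   // 2^32 - 1 or 2^31 - 1
    $digits = ltrim($digits, '0');
    if (strlen($digits) !== strlen($max)) {
        // Fewer digits always fits, more digits never does
        return strlen($digits) < strlen($max);
    }
    // Same length: plain string comparison orders digit strings correctly
    return strcmp($digits, $max) <= 0;
}

var_dump(fits_in_4_bytes('123456'));            // a 6-digit account number
var_dump(fits_in_4_bytes('7035551234'));        // a made-up 10-digit phone number
var_dump(fits_in_4_bytes('7035551234', true));  // same number, unsigned column
```

A 5 or 6 digit account number fits either way, while this 10-digit phone number is too big for both the signed and the unsigned range.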

PHP’s intval() and (int) $var syntaxes both conform to the integer limit. So I devised a work around:

$ac = $_POST['accountnum'];
if(!is_numeric($ac)) { $ac = (int) $ac; }

It’s not gorgeous, but it will more than suffice for an internal app. We web programmers don’t usually have to deal with big integers, so it’s totally possible that web developers would never have had occasion to handle a situation like this. Here’s looking forward to native 64-bit for our next server, though.
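A slightly stricter variation on the same idea, sketched here with a made-up function name (clean_account_number()) and an assumed 10-digit cap: never cast at all, strip the punctuation, and hand the value to the BIGINT column as a string of digits.

```php
<?php
// Hypothetical helper: keep the value as a string of digits so no integer
// cast can clip it. Returns false if nothing usable is left.
function clean_account_number($raw) {
    // Drop dashes, spaces, parentheses and anything else that isn't a digit
    $digits = preg_replace('/[^0-9]/', '', (string) $raw);
    // Assumed rule: account numbers and US phone numbers are at most 10 digits
    if ($digits === '' || strlen($digits) > 10) {
        return false;
    }
    return $digits;
}

var_dump(clean_account_number('123456'));        // account number
var_dump(clean_account_number('703-555-1234'));  // made-up phone number
var_dump(clean_account_number('n/a'));           // nothing usable
```

Because the result is only ever digits, it can go into the query as-is, and the integer size of the PHP build never enters into it.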

PHPsuexec and My Adventure With Hostgator

I left for vacation on June 28, and before doing so, I took a quick glance over firsttube.com and jotted a quick blog post about it. firsttube.com was fully functional and officially dormant for 10 days as of June 28.

Imagine my surprise when on Monday, my wife said, “Hey, your site isn’t working!” The index page worked, but none of the other pages.

In short, my webhost, Hostgator, decided to implement PHPsuexec. Here’s the gist of this awesome program: typically, your web server runs as the “nobody” user on a server, but you login as yourself, say your username is “jdough.” You need to use certain tricks, like using .htaccess files and chmodding to get around certain limitations. PHPsuexec makes PHP run *as you,* removing the need for world writable directories and creating a need for custom php.ini files to replace certain PHP directives in your .htaccess files.

Since my site doesn’t use file extensions on most files, I used a directive called DefaultType to make everything PHP. This stopped functioning when Hostgator made the changes on Monday. Instead, every one of the pages that relied upon that value for parsing stopped working and started displaying HTTP error 500.

When I returned to town on Sunday, I opened a high priority ticket with Hostgator. An hour later, I called the support line and was told an admin would reply presently. An hour later, I replied to my confirmation email to their email support line. Another hour later, I called again. After 35 minutes on the phone, they finally helped me get the pages running. But images across the site were broken. They were generating parsing errors! They were being interpreted by PHP. Yikes! Another 25 minutes on the phone today resulted in new .htaccess files everywhere. I should tell you that today’s phone calls were with two “gators” who were both very friendly and helped me very enthusiastically.

Hostgator did not email me about these changes, even though they have my email address. They did not call me, even though they have my phone number. They did not post anything in my control panel, even though they can. Instead, they posted it in their own support forums and expected me to check it. A major change to the very core of the server behavior and they simply didn’t tell me. And as a result, my sites were down for a week plus. So if you tried visiting firsttube.com in that time, I’m sorry for the interruption: the view page, the print page, the comments page, and nearly every other meaningful page failed to parse.

If I were a business and monetized my site in any way, I would immediately cancel. But to be fair, Hostgator has unparalleled uptime, unmatched availability, awesome tools (cpanel based), a competitive rate, and a friendly support staff. So I decided to give them one more chance. They have burned all the trust they gained with me, and I will not be recommending them to anyone right now, but I am not taking my business elsewhere just yet.

PHPsuexec is a great tool that provides a nice security boost, but do some serious testing before you implement it. It can dramatically alter the way your websites work.

PHP vs. ASP.NET

We have a new web-based client portal application we are going to use for my company extranet. However, because it was originally designed to be a hosted application, there are several variables involved in all areas that don’t apply to us, since we host it ourselves.

When using said portal, every URL looks something like:

domain.com/login.aspx?QS=jasbndfiaubnfoaeuifwoeifbwfe

The only difference is that the “QS” GET variable is even longer. I made the request of our developers to get rid of this query string for the login page, and the login page only. This is what that code looks like in PHP, inserted at line 1.

if(empty($_GET['QS'])) { 
     $_GET['QS'] = 'jasbndfiaubnfoaeuifwoeifbwfe'; 
}

That’s it. One line of code. In ASP.net, this cost me 3 hours of developer time. THREE hours.

Then I asked our old developers to make a change to their code. It was doing a check at login to see whether they are customers from the new app or the old one. If they are old, it processes the login. If they are new, it gives them an error message. So I said, instead of giving them the error, let’s redirect them to /new-directory/login.aspx?email=[base64_encoded email]&password=[base64_encoded password].

This is that code in PHP:

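if($is_new) { 
     // base64 output can contain +, / and =, so URL-encode it before it goes into the query string
     header("Location: /newdirectory/login.aspx?email="
.urlencode(base64_encode(stripslashes($_POST['email']))) . "&password="
.urlencode(base64_encode(stripslashes($_POST['password']))));
     exit; // stop here so the rest of the page doesn't run after the redirect
} else {
     //process login
}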

This cost me 2 hours at $165. Am I getting taken for a ride? I keep telling them – this would take 30 seconds in PHP. And they tell me, yes but ASP.net doesn’t work that way, and we need to change the web.config, and we need to recompile the entire site, etc, etc. If it were just one vendor, I’d be more suspicious, but two separate, unrelated developers are giving me crazy quotes like this.

I hear people bitch about PHP online ad nauseam. Every time I see real code, it appears PHP is FAR faster and far more friendly when it comes to customization.

PHP Lesson 2: Behind the Scenes of Threading

This is going to be a very nerdy post, because I’m going to get into some actual PHP code. I’ve been thinking a lot about efficient threading. The implementation of threading on OSNews is very complex, because it involves lots of math in order to properly construct and align tables. Furthermore, because we don’t use CSS for positioning, it’s accomplished via ‘align’ commands and TWO templates, which is really clumsy, because between flat mode, admin mode, collapsed threading mode, and expanded threaded mode, we have several templates, and since they are all independent, they tend to unintentionally vary, so you might see different things in replies and threads. My goal in writing a threaded display for firsttube.com was to avoid all of the pitfalls in that implementation and come up with something clean. Read on for the gory details.

Slug Transition Complete

The transition to a smart, modern, fancy URL system is complete. This is how it works: every item has a title called a “slug.” The slug of *this piece* is “Slug-Transition-Complete.”

Now, your basic operations are read, print, and comment.

So, the URLs work like so:

http://firsttube.com/read/Slug-Transition-Complete
http://firsttube.com/print/Slug-Transition-Complete
http://firsttube.com/comment/Slug-Transition-Complete

This requires no tweaking of Apache (other than permitting .htaccess files). There is no mod_rewrite going on here. It’s all in PHP. Huzzah.
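For the curious, here is a rough sketch of how slug URLs like these can be handled in plain PHP. It is not the actual firsttube.com code. It assumes an extensionless script (say, one named “read”) that the server parses as PHP and that receives the rest of the URL in $_SERVER['PATH_INFO']; both function names are made up.

```php
<?php
// Hypothetical helper: build a slug from an item's title.
function make_slug($title) {
    // Collapse every run of non-alphanumeric characters into one hyphen
    $slug = preg_replace('/[^A-Za-z0-9]+/', '-', trim($title));
    return trim($slug, '-');
}

// Hypothetical helper: pull the slug out of the extra path after the script name.
// Returns false for anything that doesn't look like a slug.
function slug_from_path_info($pathInfo) {
    $slug = trim((string) $pathInfo, '/');
    return preg_match('/^[A-Za-z0-9-]+$/', $slug) ? $slug : false;
}

// For /read/Slug-Transition-Complete, PATH_INFO would be "/Slug-Transition-Complete".
// The fallback value only exists so this sketch also runs from the command line.
$pathInfo = isset($_SERVER['PATH_INFO']) ? $_SERVER['PATH_INFO'] : '/Slug-Transition-Complete';

echo make_slug('Slug Transition Complete'), "\n";
var_dump(slug_from_path_info($pathInfo));
```

From there, the script would look the slug up in the database the same way it would look up a numeric ID.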