E_NOTICE stays off.
I'm sure you've used this idiom a lot when writing JavaScript code:
options['a'] = options['a'] || 'foobar';
It's short, it's concise and it's clear what it does. In Ruby, you can be even more concise:
params[:a] ||= 'foobar'
So you can imagine that I was happy with PHP 5.3's new ?: operator:
<? $options['a'] = $options['a'] ?: 'foobar'; ?>
In all three cases, the syntax is concise and readable. Arguably, the PHP version could read a bit better, but ?: is still better than writing the full ternary expression and spelling out $options['a'] three times.
PopScan has, since forever (forever being 2004), run with E_NOTICE turned off. Back then, I felt notices were just baggage and I just wanted (had) to get things done quickly.
This, of course, led to people not taking enough care with the code and recently, I had one too many cases of a bug caused by accessing a variable that was undefined in a specific code path.
I decided that I'm willing to spend the effort to clean all of this up and make sure that there are no undeclared fields and variables in all of PopScan's codebase.
Which turned out to be quite a bit of work, as a lot of code is apparently happily relying on the default null that you can read out of undefined variables. Those instances might be ugly, but they are by no means bugs. Cases where the null wouldn't be expected are the ones I care about, but I don't even want to go and discern the two - I'll just fix all of the instances (embarrassingly many, most of them, thankfully, not mine).
Of course, if I put hours into a cleanup project like this, I want to be sure that nobody destroys my work again over time.
Which is why I was looking into running PHP with E_NOTICE in development
mode at least.
Which brings us back to the introduction.
<? $options['a'] = $options['a'] ?: 'foobar'; ?>
is wrong code. Accessing an undefined index of an array always raises a notice. It's not like Python where you can choose (accessing a dictionary using [] will throw a KeyError, but there's get() which just returns None). No. You don't get to choose. You only get to add boilerplate:
<? $options['a'] = isset($options['a']) ? $options['a'] : 'foobar'; ?>
See how I'm now spelling $options['a'] three times again? ?: just got a
whole lot less useful.
But not only that. Let's say you have code like this:
<?
list($host, $port) = explode(':', trim($def));
$port = $port ?: 11211; ?>
IMHO very readable and clear what it does: It extracts a host and a port and sets the port to 11211 if there's none in the initial string.
This of course won't work with E_NOTICE enabled. You either lose the very concise list() syntax, or you do - ugh - this:
<?
list($host, $port) = explode(':', trim($def)) + array(null, null);
$port = $port ?: 11211; ?>
Which looks ugly as hell. And no, you can't write a wrapper around explode() which always returns an array big enough, because you don't know what's big enough. You would have to pass the number of nulls you want into the call too. That would look nicer than the hack above, but it still doesn't even come close in conciseness to the solution which throws a notice.
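For illustration, such a wrapper could be sketched like this. The function name explode_padded and the example host name are made up for this sketch; array_pad() and the limit argument of explode() are standard PHP. Note how the expected element count has to be passed in explicitly, which is exactly the extra noise described above:

```php
<?php
// Hypothetical helper (not part of PHP): explode() that always returns
// exactly $count elements, padding missing ones with null.
function explode_padded($delimiter, $string, $count)
{
    // The limit keeps explode() from returning more than $count parts;
    // array_pad() fills up to $count with null when there are fewer.
    return array_pad(explode($delimiter, $string, $count), $count, null);
}

// No port given: $port is null and falls back to the default.
list($host, $port) = explode_padded(':', trim(' cache.example.com '), 2);
$port = $port ?: 11211;

// Port given: it is kept (as a string, since explode() returns strings).
list($host2, $port2) = explode_padded(':', 'localhost:11212', 2);
$port2 = $port2 ?: 11211;
```

This runs without notices, but the call site now has to state the count, so it is still longer than the plain list()/explode() version.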
So. In the end, I'm just complaining about syntax, you might think? I thought so too and I wanted to add the syntax I liked, so I did a bit of experimenting.
Here's a little something I've come up with:
The wrapped array solution looks really compelling syntax-wise and I could totally see myself using this and even forcing everybody else to go there. But of course, I didn't trust PHP's interpreter and thus benchmarked the thing.
pilif@tali ~ % php e_notice_stays_off.php
Notices off. Array 100000 iterations took 0.118751s
Notices off. Inline. Array 100000 iterations took 0.044247s
Notices off. Var. Array 100000 iterations took 0.118603s
Wrapped array. 100000 iterations took 0.962119s
Parameter call. 100000 iterations took 0.406003s
Undefined var. 100000 iterations took 0.194525s
So. Using nice syntactic sugar costs about 8 times the performance. The second best solution? Still more than 3 times. Out of the question. Yes, it could be seen as a micro-optimization, but 100'000 iterations, while a lot, is not that many. Waiting nearly a second instead of 0.1 seconds is crazy, especially for a common operation like this.
Interestingly, the most bloated code (that checks with isset()) is twice as fast as the most readable (just assign). Likely, the notice gets fired regardless of error_reporting() and then just ignored later on.
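The code behind the "Wrapped array" and "Parameter call" rows isn't shown here, so what follows is only a guess at their general shape, not the benchmarked code. The class name LenientArray and the function get_or_default are invented for this sketch; ArrayAccess is PHP's standard interface for objects that support [] syntax. The return type declarations assume PHP 7.1 or later:

```php
<?php
// Hypothetical sketch, not the original benchmark code.

// "Wrapped array": an object usable with [] that returns null for
// missing keys instead of raising a notice.
class LenientArray implements ArrayAccess
{
    private $data;

    public function __construct(array $data = array())
    {
        $this->data = $data;
    }

    public function offsetExists($offset): bool
    {
        return isset($this->data[$offset]);
    }

    #[\ReturnTypeWillChange]
    public function offsetGet($offset)
    {
        // isset() guards the read, so no notice is raised for missing keys.
        return isset($this->data[$offset]) ? $this->data[$offset] : null;
    }

    public function offsetSet($offset, $value): void
    {
        if ($offset === null) {
            $this->data[] = $value;   // handles $obj[] = $value
        } else {
            $this->data[$offset] = $value;
        }
    }

    public function offsetUnset($offset): void
    {
        unset($this->data[$offset]);
    }
}

// "Parameter call": a plain helper function taking array, key and default.
function get_or_default(array $array, $key, $default = null)
{
    return isset($array[$key]) ? $array[$key] : $default;
}

// The wrapped array keeps the concise ?: syntax ...
$options = new LenientArray(array('b' => 'set'));
$options['a'] = $options['a'] ?: 'foobar';

// ... while the helper needs a function call per access.
$plain = array();
$plain['a'] = get_or_default($plain, 'a', 'foobar');
```

Every read through the wrapper is a method call into userland code, which is a plausible reason for the kind of slowdown measured above.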
What really pisses me off about this is the fact that everywhere else PHP doesn't give a damn. '0' is equal to 0. Heck, even 'abc' is equal to 0. It even fails silently many times.
But in a case like this, where there is even newly added nice and concise syntax, it has to be anal and bitchy. And there's no way to get to the needed solution but to either write too expensive wrappers or ugly boilerplate.
Dynamic languages give us a very useful tool to be dynamic in the APIs we write. We can create functions that take a dictionary (an array in PHP) of options. We can extend our objects at runtime by just adding a property. And with PHP's (way too) lenient data conversion rules, we can even do math with user supplied string data.
But can we read data from $_GET without boilerplate? No. Not in PHP. Can we use a dictionary of optional parameters? Not in PHP. PHP would require boilerplate.
If a language basically mandates retyping the same expression three times, then, IMHO, something is broken. And if all the workarounds are either crappy to read or have very bad runtime properties, then something is terribly broken.
So, I decided to just fix the problem (undefined variable access) but leave
E_NOTICE where it is (off). There's always git blame and I'll make sure I
will get a beer every time somebody lets another undefined variable slip in.
Asking for permission
Only just last year, I told @brainlock (in real life, so I can't link) that the coolest thing about our industry was that you don't have to ask for permission to do anything.
Want to start the next big web project? Just start it. Want to write about your opinions? Just write about them. Want to get famous? It's still a lot of work and marketing, but nothing (aside from lack of talent) is stopping you.
Whenever you have a good idea for a project, you start working on it, you see how it turns out and you decide whether to continue working on it or whether to scrap it. Aside from a bit of cash for hosting, you don't need anything else.
This is very cool because it empowers "normal people". Heck, I probably wouldn't be where I currently am if it wasn't for this. Back in 1996 I had no money, I wasn't known, I had no past experience. What I had though was enthusiasm.
Which is all that's needed.
Only a year later though, I'm sad to see that we are on the verge of losing all of this. Piece by piece.
First was Apple with their iPhone. Even with all the enthusiasm in the world, you are not going to write an app that other people can run on the phone. No. First you will have to ask Apple for permission.
Want to access some third-party hardware from that iPhone app? Sure. But now you have to not only ask Apple, but also the third party vendor for permission.
The explanation we were given is that a malicious app could easily bring down the mobile network. Thus they needed to be careful what we could run on our phones.
But then, we got the iPad with the exact same restrictions even though not all of them even have mobile network access.
The explanation this time? Security.
As nobody wants their machine to be insecure, everybody just accepts it.
Next came Microsoft: In the Windows Mobile days before the release of 7, you didn't have to ask anybody for permission. You bought (or pirated if you didn't have money) Visual Studio, you wrote your app, you published it.
All of this is lost now. Now you ask for permission. Now you hope for the powers that be to allow you to write your software.
Finally, you can't even do what you want with your PC - all because of security.
So there's still the web you think? I wish I could be positive about that, but as we are running out of IP-addresses and the adoption of IPv6 is slow as ever, I believe that public IP addresses are becoming a scarce good at which point, again, you will be asking for permission.
In some countries, even today, it's not possible to just write a blog post
because the government is afraid of "unrest" (read: losing even more
credibility). That's not just countries we always perceived as "not free" -
heck, even in Italy you must register with the government if you want to have
a blog (it turns out that law didn't come to pass - let's hope no other country
has the same bright idea). In Germany, if you read the law by the letter, you
can't blog at all without getting every post approved - you could write
something that a minor might see.
«But permission will be granted anyways», you might say. Are you sure though? What if you are a minor wanting to create an application for your first client? Back in my days, I could just do it. Are you sure that whatever entity is going to have to give permission wants to do business with minors? You do know that you can't have a Gmail account if you are younger than 13 years, don't you? So age barriers exist.
What if your project competes with whatever entity has to give permission? Remember the story about the Google Voice app? Once we are out of IP addresses, the big provider and media companies who still have addresses might see your little startup web project as competition in some way. Are you sure you will still get permission?
Back in 1996 when I started my company in high school, all you needed to earn your living was enthusiasm and a PC (yes - I started doing web programming without having access to the internet).
Now you need signed contracts, signed NDAs, lobbying, developer program memberships, cash - the barriers to entry are infinitely higher at this point.
I'm afraid though, that this is just the beginning. If we don't stand up now, if we continue to let big companies and governments take away our freedom of expression piece by piece, if we give up more and more of our freedom because of the false promise of security, then, at one point, all of what we had will be lost.
We won't be able to just start our projects. We won't be able to create - only to work on other people's projects. We will lose all that makes our profession interesting.
Let's not go there.
Please.
Lion Server authentication issues
Lately I was having an issue with a Lion Server that refused logins of users stored in OpenDirectory. A quick check of /var/log/opendirectoryd.log revealed an issue with the «Password Server»:
Module: AppleODClient - unable to send command to Password Server - sendmsg() on socket fd 16 failed: Broken pipe (5205)
As this message apparently doesn't appear on Google yet, there's my contribution to solving this.
The fix was to kill the Kerberos authentication daemon. A plain killall was my first attempt:
sudo killall kpasswdd
which in fact didn't help (sometimes even sudo isn't enough), so I had to be more persuasive to get rid of the apparently badly hanging process:
sudo killall -9 kpasswdd
This time the process was really killed and subsequently instantly restarted by launchd.
After that, the problem went away.
serialize() output is binary data!
When you call serialize() in PHP, to serialize a value into something that you store for later use with unserialize(), then be very careful what you are doing with that data.
When you look at the output, you'd be tempted to assume that it's text data:
php > $a = array('foo' => 'bar');
php > echo serialize($a);
a:1:{s:3:"foo";s:3:"bar";}
php >
and as such, you'd be tempted to treat this as text data (i.e. store it in a TEXT column in your database).
But what looks like text on first glance isn't text data at all. Assume that my terminal is in ISO-8859-1 encoding:
php > echo serialize(array('foo' => 'bär'));
a:1:{s:3:"foo";s:3:"bär";}
and now assume it's in UTF-8 encoding:
php > echo serialize(array('foo' => 'bär'));
a:1:{s:3:"foo";s:4:"bär";}
You will notice that the format encodes the string's length together with the string. And because PHP is inherently not Unicode capable, it's not encoding the string's character length, but its byte length.
unserialize() checks whether the encoded length matches the actual delimited string's length. This means that if you treat the serialized output as text and your database's encoding changes along the way, the retrieved string can't be unserialized any more.
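A minimal sketch of that failure, written with byte escapes so it doesn't depend on the encoding of the source file. The str_replace() call is only a stand-in for whatever re-encoding happens between storing and retrieving the data; it is an assumption for the sake of the demo, not how a database does it:

```php
<?php
// "bär" as ISO-8859-1 bytes (ä = 0xE4) and as UTF-8 bytes (ä = 0xC3 0xA4).
$latin1 = "b\xE4r";
$utf8   = "b\xC3\xA4r";

// Serialized while the string was still ISO-8859-1: the length prefix is 3.
$serialized = serialize($latin1);

// Stand-in for the re-encoding step: the payload becomes 4 bytes long,
// but the length prefix in the serialized data still says 3.
$reencoded = str_replace($latin1, $utf8, $serialized);

$ok     = unserialize($serialized);   // the original string comes back
$broken = @unserialize($reencoded);   // false: length prefix no longer matches
```

The @ only hides the diagnostic PHP emits for the malformed input; the return value is false either way.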
I just learned that the hard way (even though it's obvious in hindsight) while migrating PopScan from ISO-8859-1 to UTF-8:
The databases of existing systems now contain a lot of output from serialize() which was run over ISO strings but now that the client-encoding in the database client is set to utf-8, the data will be retrieved as UTF-8 and because the serialize() output was stored in a TEXT column, it happily gets UTF-8 encoded.
If we remove the database from the picture and express the problem in code, this is what's going on:
unserialize(utf8_encode(serialize('data with 8bit chàracters')));
i.e. the data gets altered after serializing, and it gets altered in a way that unserialize() can't deal with any more.
So, for everybody else not yet in this dead end:
The output of serialize() is binary data. It looks like textual data, but it isn't. Treat it as binary. If you store it somewhere, make sure that the medium you store it to treats the data as binary. No transformation whatsoever must ever be made on it.
Of course, that leaves you with a problem later on if you switch character sets and you have to unserialize, but at least you get to unserialize then. I have to go to great lengths now to salvage the old data.
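If a binary column type isn't an option, one possible workaround (my suggestion, not something PopScan does) is to base64-encode the serialized data before storing it. The function names pack_value and unpack_value are made up for this sketch; base64_encode() and base64_decode() are standard PHP:

```php
<?php
// Hypothetical helpers: wrap serialize() output in base64 so that only
// plain ASCII characters ever reach a text-typed storage medium.
function pack_value($value)
{
    return base64_encode(serialize($value));
}

function unpack_value($packed)
{
    return unserialize(base64_decode($packed));
}

// An array holding ISO-8859-1 bytes (ä = 0xE4) survives the round trip.
$original = array('foo' => "b\xE4r");
$packed   = pack_value($original);
$restored = unpack_value($packed);
```

Since the stored form is pure ASCII, converting it between ASCII-compatible character sets leaves it untouched. The price is about a third more storage and data that is no longer readable in the database.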