‘Twas a Cosmic Ray!

A simulation of a cosmic ray shower formed when a proton with 1 TeV (10^12 eV) of energy hits the atmosphere about 20 km above the ground. The ground shown here is an 8 km × 8 km map of Chicago's lakefront. This visualization was made by Dinoj Surendran, Mark SubbaRao, and Randy Landsberg of the COSMUS group at the University of Chicago, with the help of physicists at the Kavli Institute for Cosmological Physics and the Pierre Auger Observatory.

Image credit: Dinoj Surendran, Mark SubbaRao, and Randy Landsberg of the COSMUS group at the University of Chicago.

Have you ever had a bug that was caused by a hardware error? Something that looked like a software bug, where the code was just fine, yet the machine gave wrong results? Is it always just a blind spot preventing the programmer from seeing a subtle problem in the code, or can it be that some transistor in the CPU simply doesn't work right? Does that ever actually happen, or is it just the stuff of legends?

As programmers, we are taught that the hardware is (almost) always right. We first have to question our own code, then maybe consider third-party libraries, the OS, and finally the compiler. But we practically never say things like "maybe the CPU is taking a wrong branch here" or "maybe the memory has remembered this wrong". In his programming classic "Code Complete", Steve McConnell mentions an interesting, and controversial, data point: only 1% of reported bugs are caused by hardware errors. He noted that the data was measured in the '70s and that the number is almost certainly lower "today" (the book was originally published in the '90s). But how much lower?

While I don't have any up-to-date numbers on that, I can at least provide some anecdotal evidence. What good is anecdotal evidence? Well, it certainly doesn't give accurate statistics, but it can at least give an indication of what kinds of weird things are possible. Sometimes.

This story starts with something as ordinary as a build bug. Over here at Croteam, we run about a dozen builder machines doing CI builds all the time for various versions of several games on multiple platforms. Besides the expected, developer-caused build bugs, we've come to accept transient "glitches" as normal. With several hundred builds (or at least build attempts) per day, one can see all the fun ways things can go wrong without anyone submitting buggy code or data: remote signing servers being offline, Windows updates rebooting the machine in the middle of a build, weird OS bugs randomly reporting "file not found" when the file is obviously there, internal compiler errors appearing at random when recompiling the same file that was compiled correctly before...

Those are mostly easy to explain: Server was down, check. Logs show the machine was rebooted by Windows Update, check. The OS can leak stuff sometimes; we are not going to sweat over that. Compilers are complex beasts, especially when using batch compilation, since they then process multiple source files in one go without completely cleaning up between them.[1] In all those (and many other) cases, we've already learned to recognize the pattern and mostly just click "retry" on such a build failure. And it almost always passes on the second try.

But this one was different. A build machine that had been happily churning out builds on an otherwise idle branch did a trouble-free clean nightly build on Saturday (just as on several days before), but then failed on Sunday. The error was a report from our own code that a data file had the wrong version: "Saved with a newer version of the application", it said. Huh. But no one had changed either the application or the file. Heck, no one had been working in that branch for weeks![2]

My first reaction was: "Oh, the OS is having a tantrum again. Let's just re-run it." But it came back with the exact same error, on the same file. I became suspicious and compared the file against the copy in source control. And what do you know... the file was different. A single bit of the file had changed. But how could that be? The file's timestamp showed that it had been synced months ago.
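Spotting a single flipped bit is straightforward once you have a known-good copy to compare against: XOR each pair of bytes, and any 1 bits left in the result mark exactly where the two copies disagree. Below is a minimal Python sketch of such a comparison, assuming both copies are available as local files that fit in memory. The function names `find_bit_flips` and `compare_files` are hypothetical, and this is an illustration of the idea rather than necessarily how we did it.

```python
from __future__ import annotations


def find_bit_flips(expected: bytes, actual: bytes) -> list[tuple[int, int]]:
    """Return (byte_offset, bit_index) for every bit that differs.

    bit_index 0 is the least significant bit of the byte.
    """
    if len(expected) != len(actual):
        # A flipped bit never changes the file size, so this is something else.
        raise ValueError("files differ in length, so this is not a simple bit flip")
    flips = []
    for offset, (a, b) in enumerate(zip(expected, actual)):
        diff = a ^ b  # XOR leaves a 1 wherever the two bytes disagree
        for bit in range(8):
            if diff & (1 << bit):
                flips.append((offset, bit))
    return flips


def compare_files(expected_path: str, actual_path: str) -> list[tuple[int, int]]:
    # Read both files as raw bytes (hypothetical helper; fine for files that fit in RAM).
    with open(expected_path, "rb") as f:
        expected = f.read()
    with open(actual_path, "rb") as f:
        actual = f.read()
    return find_bit_flips(expected, actual)


if __name__ == "__main__":
    good = bytes([0x10, 0x20, 0x30])
    bad = bytes([0x10, 0x28, 0x30])  # 0x20 ^ 0x28 == 0x08, i.e. bit 3 of byte 1
    print(find_bit_flips(good, bad))  # [(1, 3)]
```

A result with exactly one entry is the signature of the single-bit damage described above. Note that if the damaged copy lives in the OS file cache, reading the file this way returns the damaged data, which is precisely what the build was seeing.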

That was time for a brainstorm. We went through all the scenarios we could think of in which a file on disk that was never written to could suddenly change its contents. None of them applied, not without Windows reporting a read error on the disk, and it hadn't reported any. Just out of paranoia, we checked the disk for S.M.A.R.T. errors (with no results), even though it didn't make sense.

Then a colleague had an idea: what if the file was not damaged on disk? What if the file was sitting in the Windows file cache, and that bit got damaged in RAM? That made sense, since the machine doesn't have ECC memory, so a RAM error would go unnoticed. The machine did have 8GB of RAM, but it sounded like a bit of a stretch to expect that the file would still be in the cache even after we ran another build, since each build generates several GB of data files alone, not to mention intermediates, etc.

Nevertheless, just to be on the safe side, he rebooted the machine. Lo and behold - the file was now magically correct!

These kinds of errors are often attributed to cosmic rays, but in normal work I always considered that more of a joke than a real possibility. Guess I'll have to reconsider that notion.

Lessons learned: Don't add too much RAM to build machines. Seems it can have detrimental effects on build stability sometimes. Also, hardware is always right - except when it isn't.


 

[1] A careful reader will notice that the last two might also be caused by "cosmic rays". But who knows? It is hard to diagnose something like that in a closed-source executable, let alone an entire closed-source OS.
[2] Why the heck do we run nightly builds on branches no one is working on? Precisely for situations like this: if anything breaks on the machine, we want to know about it ASAP. If (when!) someone suddenly needs to urgently ship a patch on that branch, we want to be sure the build can be done, rather than having to fight exotic hardware problems that appeared months earlier, while no one was watching.

Reselling Steam Games

In light of the recent news that Valve Software was sued in Germany over the right to resell used games, I'd like to get out some ramblings that have been lingering in my head for a while.

First of all, let me make it clear that I Am Not A Lawyer. So, for what it's worth, here's why I think lawsuits like this don't have substantial grounds. Basically, they refer to the Oracle vs UsedSoft case from last summer, where the European Court of Justice ruled that software licenses can be resold, even for digital downloads. The ruling is widely seen as a precedent that EULAs claiming the software is "licensed, not sold" don't protect publishers against the first-sale doctrine. There isn't much to say about it. It does make sense.

However, does this really apply to the Steam platform? What the court said there was this: "The principle of exhaustion of the distribution right applies not only where the copyright holder markets copies of his software on a material medium (CD-ROM or DVD) but also where he distributes them by means of downloads from his website." But Valve doesn't just sell users digital copies of the software. They sell access to a service, one part of which is providing unlimited digital copies to the buyer. For most games, that service also happens to provide functionality the game requires in order to work.

Granted, if you buy a totally DRM-free game on Steam (examples are rather rare, but even some Serious Sam games were made available on Steam completely DRM-free), then yes, according to the ruling, you can resell it to someone else. Note that under the ruling you are then legally obliged to "destroy" all copies of the game in your possession. However, for most Steam games, if you sell the "digital copy" of the game to a second-hand buyer (by giving them the files on disk and destroying them on your side), it's worthless: the game won't work without access to Steam services, and only from an account that "owns" (i.e. has a license to) that game.

For me, the case here is clear: unlike buying a game from a pure online distributor (so-called e-tailers, e.g. GoG), when buying a game on Steam you are not buying a product. You are buying a license to use Steam services for that game. So, unlike the attempt by normal EULAs to turn software into a license (with largely controversial and still unclear legal ramifications), this really is a license to a service.

Ok, but let's say the District Court of Berlin doesn't agree with my conclusions and they rule that Valve has to provide facilities to allow users to resell their games, with no cut of that transaction going to the original publisher. Would that be beneficial to the customers in the end?

First of all, the first-sale doctrine sets no lower limit on the resale price. It could in fact be zero. Oh, but that means... you can give a game to your friend's account for free when you are done playing it. And he can then give it to another friend of yours. Any of them can give it back to you. In fact, all your friends, family, relatives, neighbors, etc. can share the one game you paid for once.

It is just like with physical objects: you can lend and borrow them. Sounds great, doesn't it? Except it isn't, on both counts. Such "borrowing" is not the same as with physical objects, and it is not nearly as great as it sounds.

Would you lend me your favorite ______, please? (Fill in the blank with something like a book, bicycle, pen, DVD movie... or a boxed version of a game, for that matter!)

There are some true altruists in the world, and there are people who don't care about stuff, but if you are like 95% of people, chances are you would be a bit reluctant. Not (only) because you would fear that I will not return it - but because it might break, wear, tear, I might lose it, etc... None of that can happen to a digital good - especially one that a "cloud" provider keeps online for you, and will deliver a fresh copy on demand.

As you can imagine, such an ability would quickly lead to a total chaos of "sharing", not that different from the piracy scene: one person buys the thing, and everyone else uses it. Granted, only one person can use it at a time, so one copy cannot "seed" the entire Internet of "leechers". But it can easily be multiplied tenfold or more. And what's worse, for each such "borrowing", Valve would have to provide a download for the new user. For free, as that is a term of the license: the owner can download the game as many times as he wants. Most people don't exercise that right much, especially not for every game.

So, with lower income per actual user, what will Valve (and publishers on Steam) do? You can bet the very next thing would be raising prices, and probably ditching the steep discounts Steam is famous for. See, one fallacy about software pricing, and game pricing especially, is thinking that it is about how much one copy "should cost". It is really about the total earnings of a project. This is especially important for smaller and indie developers (like Croteam, the company I work for). If the total money earned by a project is not greater than the money spent developing it, there will not be a "next game" from that developer. Plain and simple. So, allowing resales absolutely guarantees that either prices go up (since the first buyer now has to pay for all the second, third... users), or developers don't cover their expenses and games don't get made.

Even worse than the pricing: when presented with the fact that Valve would have to allow each second-hand, third-hand, ... n-th-hand user to download the game from its servers, some people suggested that Valve impose a limit on the number of downloads. Can you believe that? That goes against the very idea of Steam.

All in all, you can't have your cake and eat it too. [*] Steam is a great distribution platform, providing unmatched support and excellent prices. What one has to accept about it is that it is based on you not "owning" the game in the physical sense. This case seems to be yet another one where so-called "consumer protection" organizations are actually working against the best interest of consumers.

[*] The cake is a lie, anyway. Or was it pie?

Disclaimer: I work for Croteam, and we sell games on Steam. But this article does not represent the opinions of either Croteam or Valve.

Simple and (Rather) Secure Mirroring Using rsync Over ssh

When it comes to remote backups, there are certainly many ways to skin a cat. In this particular case, I will show a little-known, yet rather simple, approach that can exactly mirror all files together with their permissions and ownership information, and is very secure (in the sense that it is hard for a third party to hack into your stream and capture or alter data, or to gain access to your systems).
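For orientation, here is the common baseline form of an rsync-over-ssh mirror, wrapped in a small Python sketch. It is an assumption on my part that the approach builds on these standard flags; the full write-up may differ. `-a` (archive mode) preserves permissions, timestamps, symlinks, group and owner, `--delete` removes files from the mirror that no longer exist at the source (which is what makes it an exact mirror), `--numeric-ids` keeps raw uid/gid numbers, and `-e ssh` tunnels the transfer over ssh. The host name, paths, and the functions `build_mirror_command` and `mirror` are hypothetical.

```python
from __future__ import annotations

import subprocess


def build_mirror_command(source: str, remote: str, dest: str) -> list[str]:
    """Build an rsync argv that mirrors `source` to `remote:dest` over ssh."""
    # A trailing slash makes rsync copy the *contents* of source into dest,
    # rather than creating a subdirectory named after source.
    if not source.endswith("/"):
        source += "/"
    return [
        "rsync",
        "-a",             # archive: recursive, keeps perms, times, symlinks, group, owner
        "--delete",       # drop files on the mirror that no longer exist at the source
        "--numeric-ids",  # keep raw uid/gid numbers instead of mapping by user name
        "-e", "ssh",      # run the transfer over ssh (encrypted and authenticated)
        source,
        f"{remote}:{dest}",
    ]


def mirror(source: str, remote: str, dest: str) -> None:
    # check=True raises subprocess.CalledProcessError if rsync exits non-zero.
    subprocess.run(build_mirror_command(source, remote, dest), check=True)


if __name__ == "__main__":
    # Hypothetical host and paths; print the command instead of running it.
    print(build_mirror_command("/srv/data", "backup@backup.example.com", "/backups/data"))
```

One caveat worth knowing: rsync only preserves file ownership (`-o`, implied by `-a`) when the receiving side runs as the super-user, so an exact mirror of ownership information generally needs root on the backup end.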
