Digg’s woes continue with their new architecture. The rewrite is a mythical dragon in the software world that lives in a cave and occasionally wipes out entire cities. The idea for a web application is that you create all the software from scratch, leaving the data intact, then on the big day you do a sleight of hand and “voila!” your site is running on a brand new platform. You avoid all the architecture mistakes you made along the way and rid yourself of nasty legacy code written by that snarky engineer you fired for incompetence years ago. Now you’re set for years of growth on a platform that scales with ease and grace. Everything is just faster. Engineers are happy. That’s the theory, but it rarely happens in practice.
In reality, engineers work overtime for months on end with no weekends, slowly getting disillusioned by the sheer amount of work involved. The technical leads start seeing unforeseen issues with the new architecture that could lead to problems. Data needs to be preserved and migrated seamlessly. As the rewrite drags on, the code gets sloppier, and what’s more, there’s no feedback from the real world, so people are coding against a subset of test data and not getting pounded by real traffic. When the rewrite does go live, people rejoice, and then the “wait” begins (negative feedback or bugs, lots of bugs). When the system starts coming down in flames, everyone’s overworked and drained, not to mention that a part of them let go when the switch was flipped, so it takes a lot of mental effort to get back into the flow. Stress levels are even higher than during the rewrite. People start panicking and yelling. Everyone’s unhappy and full of regret.
Of course, this is what happens when rewrites go terribly wrong. Any rewrite will involve some kind of negative adjustment. There are cases where products vanish from the face of the earth after a rewrite because the product becomes something totally different or the competition keeps chugging along while you’re fixing problems. But a rewrite can also be done in style, like Foursquare did. Reddit also rewrote their system (not counting the original version done in Lisp) from one framework to another. It can be painful but rewarding in the long term, like Apple’s rewrite of the OS, which gave us OS X and provides the underpinnings of their current success.
Digg’s current rewrite involves both architecture and function, which is rarely a good thing. On the surface it makes a lot of sense: if we’re going to write it from scratch, why not improve the algorithms and functionality in one go? However, in practice, it’s hard to fix things that are broken because you’re fixing something totally different from what you had before, and it’s hard to tell whether a problem comes from the new architecture or the new functionality.
One notable thing about Digg’s current predicament is that the VP of Engineering was either fired or quit in the middle of this crisis. If he was fired, it raises the question of why, because he’s the one who made the decision and should be the one to fix it or guide the team in the right direction. If he couldn’t, he shouldn’t have been leading the team to start with. Of course, his past experience aside from being “VP of Engineering” doesn’t seem to indicate anything relevant to running a highly trafficked web application. If he did walk out on his own, that’s also not good because he’s basically leaving a mess behind him. Either way it’s career suicide and I wonder what would lead to this kind of drama.
On the plus side, Digg is getting lots of attention lately thanks to all the drama surrounding the rewrite. The dust will eventually settle as the issues get sorted out, and hopefully they’ll be able to incorporate feedback to make the site better.