Monday, May 08, 2017

Another book on algorithms?

I recently read "Grokking Algorithms" by Aditya Bhargava[1] and my initial reaction was "does the world really need another book on algorithms?" I was soon won over by the light yet accurate style and ended up really enjoying it. It is pitched at the beginner (or beginner's mind) and I think he's hit on a gap in the current market, especially when it comes to shorter texts that enlighten and entertain. It's lucid and doesn't get bogged down in programming language details. He happens to use Python, but it's almost incidental. I'm not a fan of algorithms books using a programming language, for reasons I'll go into further on. It got me thinking about the algorithms books I already possess, so I spent a bit of time skimming them in order to form an opinion on what I like about them. So here is a list of the texts I've read - or, to be completely candid, partially read; I haven't read any of them cover to cover, but I have studied enough of each to feel entitled to an opinion.

Introduction to Algorithms, by Cormen et al. (aka CLRS)[2]. For comprehensiveness and theoretical rigor this is hard to beat. It gives the most in-depth treatment of how to prove algorithm correctness, and if you're interested in academic rigor this is probably the best on the list. However, if you're coming from a practitioner's standpoint it can be a little daunting, and perhaps overkill. For example, the chapter on NP-completeness starts by laying out a formal model for discussing the subject, an unexpected preamble to get to grips with before NP-completeness itself is addressed. Yes, it may be the most comprehensive treatment of the subject, but if you're pressed for time it can feel a bit like disappearing down a rabbit hole.

Algorithms, by Papadimitriou et al. Probably my personal favourite, this strikes a good balance between academic rigor and practical application. It's a shining example of communicating a technical subject using a mix of concise formalism and plain language. Case in point: the knapsack algorithm, with and without repetition, is covered in under two pages in a way that conveys the essence of the algorithm and its variants with crystal clarity - impressive for a non-trivial algorithm. For balance between clarity and conciseness this is hard to beat. It's the one I try to emulate whenever I have to explain or document something.
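To give a flavour of why that economy impresses me, here is a rough sketch of both knapsack variants as dynamic programs in Python. This is my own illustration rather than the book's code; the function names and the small worked example at the end are arbitrary choices made for the sake of the sketch.

```python
def knapsack_with_repetition(weights, values, capacity):
    """Unbounded knapsack: each item may be used any number of times.

    best[w] holds the best value achievable within capacity w. Sweeping
    capacities upwards lets an item build on solutions that already use it,
    which is what permits repetition.
    """
    best = [0] * (capacity + 1)
    for wt, val in zip(weights, values):
        for w in range(wt, capacity + 1):          # ascending: item may recur
            best[w] = max(best[w], best[w - wt] + val)
    return best[capacity]


def knapsack_without_repetition(weights, values, capacity):
    """0/1 knapsack: each item may be used at most once.

    Identical structure, but sweeping capacities downwards means each update
    only sees solutions that do not yet include the current item.
    """
    best = [0] * (capacity + 1)
    for wt, val in zip(weights, values):
        for w in range(capacity, wt - 1, -1):      # descending: item used once
            best[w] = max(best[w], best[w - wt] + val)
    return best[capacity]


if __name__ == "__main__":
    weights, values, capacity = [6, 3, 4, 2], [30, 14, 16, 9], 10
    print(knapsack_with_repetition(weights, values, capacity))     # -> 48
    print(knapsack_without_repetition(weights, values, capacity))  # -> 46
```

Both variants run in O(nW) time, and the only structural difference is the direction of the inner sweep over capacities - exactly the kind of distinction the book manages to convey in a couple of paragraphs.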

Algorithms, by Sedgewick & Wayne.[3] Another one that strikes a nice balance between academic rigor and implementation concerns. On the minus side, it's the only book on the list that uses a programming language (Java), and while I've nothing against Java (none that I'll go into here) I prefer to see a notation such as predicate calculus[4a] or TLA[4] being used, for a couple of reasons: with a programming language you occasionally need to reason about the minutiae of the language, warts and all, and you're being spoon-fed an end-to-end solution; in my experience you understand an algorithm better by implementing it from a declarative specification than by working backwards through code. But the authors take a nice API-first approach and steer clear of the more idiosyncratic features of Java, so this is a minor complaint; it's an excellent book regardless.

The Algorithm Design Manual, by Skiena.[5] This is a great compendium, and of all the texts on this list it is the one most aimed at the practitioner. There are "war stories" in each section that relate the topic at hand to real-life applications, mostly drawn from the author's experience. In terms of breadth it probably covers, or at least references, more algorithms than any other text on this list. On the minus side, many of the references are URLs that have gone stale thanks to the vagaries of internet time. In this respect it's probably the text that could most do with an overhaul, although keeping URL references up to date is a particularly challenging task. Overall this is a minor point; I like the author's writing style and reach for this book for insights into the variations on, and applications of, certain algorithms.

Algorithm Design, by Kleinberg and Tardos.[6] The one on the list I'm least familiar with, but I really like the clarity of the writing in the parts I've studied. Algorithms are explained comprehensively, yet in a manner that is more digestible than, say, CLRS (to me anyway). It helps that each topic is presented in three clear stages - problem statement, algorithm design and algorithm implementation - guiding you through the conceptual, logical and physical levels in turn. It recently gave me a valuable insight into the spectrum of scheduling problems, in particular how some stray into NP-complete territory. I think this is one I might be spending more time with in future.

Whatever about a "favorite" or "best", there's one thing that all these writers need to contend with, and that's change. It may be stating the obvious to say that algorithms are a dynamic and evolving field, but when you pick up a book (or five) you may subconsciously think it's all "solved problems", permanently fixed in ink for you to refer to in perpetuity. But improvements to existing algorithms and entirely new algorithms appear all the time, and consequently a textbook often lags the state of the art by several years (or decades in some cases). For example, you won't find Ukkonen's algorithm[7] for building a suffix tree in linear time in any of the aforementioned publications. It, and other algorithms on strings, have been put to extensive use in biotech[8], a great example of how intense research in a subject - DNA sequencing in particular - forces the field to create new and better algorithms. New inventions and improvements do make their way into the standard texts; Robert Sedgewick published an improvement to insertion and deletion in red-black trees[13], a discovery that made it into the 4th edition of his book. So it pays to keep an eye on what's new in the new editions. Given that we're firmly in the era of many-core architectures, perhaps work on parallelizing algorithms or making them cache-friendlier[14][15a,15b] will make its way into the standard texts. Thanks largely to "big data", probabilistic algorithms[9][10][11] have seen extensive adoption in the last few years, and increasing volumes of data have spawned an increasingly enthusiastic adoption of Machine Learning algorithms[12], so one could also speculate that these topics will see more coverage in future editions. All bearing in mind that the biggest challenge these authors face is probably deciding what to leave out...
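For a taste of that probabilistic side, here is a minimal Bloom filter sketch in Python - my own illustration, not drawn from any of the texts or references above. Membership queries can return false positives but never false negatives, in exchange for a small, fixed memory footprint; the bit-array size and hash count below are arbitrary illustrative choices.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: approximate set membership in a fixed-size bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    seen = BloomFilter()
    for word in ["cat", "dog", "fish"]:
        seen.add(word)
    print("dog" in seen)    # True
    print("zebra" in seen)  # almost certainly False (small chance of a false positive)
```

The appeal for big-data workloads is that the memory cost is fixed up front regardless of how many items are added; the trade-off is a false-positive rate that grows as the bit array fills up.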

Circling back to the list, I'm going to sit on the fence when it comes to a favorite and reach for that well-worn cliché, "it depends" - but it depends fundamentally on where you're coming from. If you need a rigorous, academic approach and want to learn about proof and correctness, then CLRS is hard to beat. If you're coming from a professional practitioner's standpoint, then Papadimitriou or Skiena will probably get you going more expeditiously. And if your approach is beginner's mind, one I like to adopt because I like to approach problems from first principles, then Sedgewick is a good one. If you're stuck on a problem and looking for insights or variations on a theme, Skiena might provide the necessary direction. If I had to pick one, it would be Papadimitriou, for the clear and succinct exposition. But the tl;dr is that it pays to have a few within reach, and different perspectives go a long way.

In conclusion, I'll briefly acknowledge there are some classic texts I have yet to read in any depth[16][17]. Some day...

Monday, April 10, 2017

And the (ACM) Award Goes To...

Congratulations to Sir Tim Berners-Lee, the latest recipient[1] of the ACM Turing Award. Purists may scoff[2], but in terms of producing something tangible that pushes forward the boundaries of technology and turns out to be a great social leveler, it's hard to think of a more deserving recipient of this accolade. Berners-Lee is not a practitioner of pure computer science in the same vein as a Codd, Hoare or Milner, to name three previous Turing awardees (who also happen to be - or were - his compatriots). His background is in physics, and he was working at CERN when the first web servers came online. So the Web was essentially a product of applied computing, building on several earlier ideas[3][4][5], and it's refreshing that the ACM are willing to recognize such a contribution, in addition to more fundamentally theoretical ones.

Berners-Lee is described as the "Inventor" of the World Wide Web in the ACM's announcement[6]. There's something quaint about the term 'inventor'. Not commonly used in modern technical language (except perhaps for patents), it feels like a throwback to a previous age. In this sense, perhaps Berners-Lee's achievements are closer in spirit to those of an Edison or a Tesla than to his previously named compatriots. In terms of personal impact, here's my "where were you when..." story on first encountering the web. I was in grad school in the early 90s, and at that time academia was a heavy user of the internet (probably the heaviest). The tools of the day were ftp, gopher and wais, news, telnet and others I no longer recall, all requiring rudimentary computer skills, and each with its own quirks. At some point, possibly around early '93, the web became impossible to ignore. I remember going to a talk entitled "mosaic - is this the coolest program ever?", and before long our research group had put up a web server and we were tinkering with personal websites, and it became clear this was a seismic shift. A unifying, simple-to-use interface to the internet. Point and click instead of arcane and cryptic command-line protocol syntax. A brilliant invention!

Like all brilliant inventions, the web has taken on a life beyond its creator's vision; I'm not sure if Berners-Lee had in mind that the web would become the underpinning communication mechanism between (and within) today's internet-facing systems, but with the ubiquity of APIs built on the Web, and in particular the work of Roy Fielding[7], it has become the de facto way for software systems, as well as humans, to communicate on the internet. Not immune to criticism[8], it has nonetheless enabled services and infrastructure that are genuinely planet-scale.

This is also a timely announcement, particularly given the recent decision in the US to enable ISPs to harvest customer data[9], never mind that the various search and social media data trails we leave have long been put to work by their seemingly benign enablers[10][11][12]. All of which serves only to exacerbate the feeling that the internet has become corporatized to the extent that to partake in daily life there is no choice but to be on it, yet to do so is to merely be a sitting duck for whoever wants your data. In short, a darkening, capitalist dystopia. Berners-Lee, to his credit, never hesitates to speak out[13] on where it has gone wrong and how it can be made better. So let's all celebrate the original democratizing spirit of the web and follow his lead in demanding change, and changing it for the better.

Friday, January 26, 2007

Why you should do computer science

It's that time of year when many young people are considering what to do when they leave school and enter university. I want to take this opportunity to strongly urge anyone with an interest in technology to really consider computer science. Unfortunately, there has been a shortfall in the number of computer science graduates in recent years, and a perception that the dot-com era and outsourcing have made it an outmoded option. That perception is very wrong. Check the 2007 Economist yearbook to see where eCommerce is headed in the coming years; in the words of the old Yazz tune, "the only way is up".

Let's leave the growing IT markets to the Economist and just say you have a keen interest in tech - maybe you're curious about how your iPod or Bebo works - you're handy at mathematics and interested in science. It boils down to your point of view: do you see yourself as the source of creativity and invention, or the source of service support? Where would you rather be? Doing computer science, even if you never actually program in your professional career, will give you a huge head start in getting into the former, infinitely more interesting camp. You will get a grounding in how software is developed and what is involved, and set yourself up for a good career in development or project management, or perhaps to start your own company.

I think that outsourcing has been a good thing for the IT industry, because it has provided extra resources in the form of people power. Huge opportunities are afforded by having these resources available. Your idea for a social networking site based on musical and entertainment figures, one that will require a high-quality hosted services environment? It suddenly got a lot more feasible with hundreds of programmers and a range of hosted service companies to choose from.

I believe that far from killing off software development, outsourcing has had the direct opposite effect. The pace of development and the scope for creativity and innovation got a huge boost, accelerating the delivery of sophisticated software. In the developed world, we are in pole position to drive the direction of software. Do you want to get in on the act?
