Sequence Analysis in a Nutshell Scott Markel and Darryl Léon
Published by O'Reilly and Associates
£ 20.95
Published: 7th February 2003
reviewed by Damian Counsell
   in the June 2003 issue

I spent a brief (and unpleasant) time working for an online scientific publishing company. Back then I liked that unoriginal rule of thumb: ``Any book entered via its index is better implemented electronically''. That ``rule'' often comes to mind when I pick up volumes in O'Reilly's ``Nutshell'' series. I try to avoid printing out documents whenever I can, and, most of the time, I solve my computer problems more quickly by Googling than by reading, but I still find the dead tree format pleasing. (Of course, most O'Reilly publications, including this one, are available online via the Safari service, so this point is becoming academic.)

Of all the currently fashionable topics in scientific, technical and medical publishing -- both online and on dead tree -- bioinformatics is among the hippest. Roughly, the term ``bioinformatics'' describes any use of computers to handle biological information. In practice, most people use the term to mean ``computational molecular biology'' -- the use of computers to characterize the molecular components of living things. Biological molecules are generally polymers: ordered chains of simpler molecular modules called monomers. Think of the monomers as beads or building blocks which, despite having different colours and shapes, all have the same thickness and the same way of connecting to one another.

These beads may make pretty necklaces (as any number of 3D graphics of molecular helices suggest), but some of the popular shine has begun to come off the new cell and molecular biological revolution: PPL Therapeutics, the company that cloned Dolly the Sheep, has decided not to build a £42m factory to manufacture drugs based on this technology, and many pharma and biotech firms are shrinking their bioinformatics departments. Despite this, O'Reilly has, in the past few years, gone from rejecting bioinformatics book proposals for want of buyers to finding bioinformatics books amongst its bestsellers. Its most recent bioinformatics-related title is ``Sequence Analysis in a Nutshell''.

Chains of DNA or protein monomers can be treated computationally as letters of an alphabet, put together in pre-programmed arrangements to carry messages or do work in a living cell. Since the ``completion'' of the Human Genome Project, the mission to read the order of all the monomers in human DNA, we have a great many more sequences to analyze. There is, unsurprisingly, a whole range of techniques for interpreting these ``biological stories''. Most of these techniques come under the heading of sequence analysis. Most of the analysis is done on UNIX boxes.
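
To make the ``letters of an alphabet'' idea concrete, here is a minimal Python sketch (mine, not taken from the book) that treats a DNA chain as a plain string over the four letters A, C, G and T. The function names are hypothetical, chosen only for illustration, and the sketch ignores the ambiguity codes that real sequence data often contains.

```python
# A DNA chain modelled as a string over a four-letter alphabet.
# All names here are illustrative assumptions, not from any library.
DNA_ALPHABET = set("ACGT")

# Translation table pairing each base with its complement (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_dna(seq: str) -> bool:
    """Return True if seq is non-empty and uses only the letters A, C, G, T."""
    return len(seq) > 0 and set(seq.upper()) <= DNA_ALPHABET


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read in the usual direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Return the fraction of letters in a non-empty seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)
```

Once the molecule is a string, ordinary text-processing operations (counting, reversing, substituting) become small pieces of sequence analysis, which is roughly why so much of this work sits comfortably on UNIX boxes.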

Although there is still a search for standards in the relatively young science (engineering discipline?) of bioinformatics, certain data formats and collections of analysis tools have become more widely used than others in the area. The authors of ``Sequence Analysis in a Nutshell'', Scott Markel and Darryl Léon, have made some shrewd choices about which of these to cover in the 300 pages or so of this handbook. This is not surprising, given that they are both experienced PhD bioinformaticians. Of the data formats, they describe FASTA, GenBank/EMBL/DDBJ, SwissProt, Pfam and PROSITE. Of the software tools, they deal with more specialized packages like Readseq, BLAST, BLAT, ClustalW, HMMER, and MEME/MAST, plus the ultimate Swiss Army Knife (or perhaps that should be ``Swiss Army'') that is EMBOSS. The book also includes appendices containing various tables of information useful to practising bioinformaticians.
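
For readers who have not met these formats, FASTA is the simplest of them: a header line beginning with ``>'' followed by one or more lines of sequence letters. The following Python sketch is my own illustration rather than code from the book; the function name `parse_fasta` is hypothetical, and it assumes well-formed input with no comment lines or duplicate headers.

```python
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (without the leading '>') to its full sequence.

    Illustrative sketch only: assumes well-formed input and unique headers.
    """
    records: Dict[str, List[str]] = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record begins; keep the whole header text as the key.
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            records[header].append(line)
    # Join the wrapped lines of each record into one sequence string.
    return {name: "".join(parts) for name, parts in records.items()}


example = """>seq1 example
ATGC
GGTA
>seq2
TTAA
""".splitlines()

records = parse_fasta(example)
```

In practice one would reach for an established tool such as Readseq or EMBOSS, both covered in the book, rather than a hand-rolled parser; the sketch is only meant to show how little structure the format has.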

So what lifts this book above the level of Google searches? Firstly, the authors have done the hard work of gathering surprisingly scattered chunks of information together in one mass -- a neat, glossy mass which should fit easily on a shelf near your desk. Secondly, their work is packaged and produced to the usual high O'Reilly standard of typesetting and layout: the text is clear, consistent and tasteful (with a striking cover image of a liger). Thirdly, by the simple act of making an informed selection, Markel and Léon have served the field by more clearly defining the de facto bioinformatics standards and systems.

This reference is sensibly aimed at the generalist, possibly in a commercial, administrative or service bioinformatics role, who just needs to get things done. ``The liger book'' would also be especially useful to relatively inexperienced bioinformaticians, or to ones only superficially familiar with the tools it covers: for example, students tackling a research project. Both groups would find it a handy ``meta tool'' to help themselves and help others.

