BLAST, by Ian Korf, Mark Yandell and Joseph Bedell
Published by O'Reilly and Associates
360 pages
£28.50
Published: 8th August 2003
reviewed by Damian Counsell
in the December 2003 issue

Bioinformatics is the term people use to describe the young science of using computers to analyse genetic data, particularly sequences of genes (DNA) and their products (RNA and protein). I would argue that "bioinformatics" describes all sorts of activities at the interface between biology and computing, but that is like trying to persuade people that computers are not just those three-box devices that sit on their desks, nursing Windows infections.

The most commonly used bioinformatics software belongs to a family of programs called BLAST. BLAST does a simple (but difficult and powerful) thing very fast: it compares one sequence of nucleotides (DNA letters) or amino acid residues (protein letters) with lots of others. Many biologists who would never call themselves bioinformaticians run a lot of BLAST jobs; some spend more time running BLAST than they do walking around a lab in a white coat.

Of the three O'Reilly bioinformatics books I have looked at so far, I think BLAST is the best. It covers an important topic with a depth and breadth that, to my knowledge, no other book on the market matches. I learned more from reading the practical chapters during a couple of train journeys than from hours of wading through any other BLAST documentation. I am not exaggerating when I say that any biologist planning to run a large number of BLAST searches, especially with a view to publishing results or drawing significant conclusions, should be made to read the relevant sections of this book first, well before going anywhere near a computer. The "20 Tips To Improve Your BLAST Searches" and "BLAST Protocols" chapters alone are worth the £28.50 cover price to a practising molecular biologist.

O'Reilly's BLAST book may be the best we have, but it is not perfect. I found the theory chapters both fascinating and confusing: fascinating because they mention intriguing experimental results that should change the way people use BLAST in (for example) protein science, and because they include concise Perl implementations of important algorithms; confusing because some of the explanations lost me and because I simply disagreed with some important statements. Examples of the latter: contrary to what the book says, scores are not necessarily metrics; genes that are 80 percent similar do not have an 80 percent chance of sharing a common ancestor; and genetic drift concerns random changes in the frequencies of alleles in a population over time, not changes in the sequence of a gene through time. A further illustration: the book contains a clear description of "dynamic programming", an approach central to most alignment algorithms, but a poor account of evolution, a process many believe central to most of the biological sciences.
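For readers who have not met dynamic programming in this context, here is a minimal Python sketch of the Smith-Waterman local-alignment score, the exhaustive dynamic-programming method whose results BLAST's heuristics trade some sensitivity to approximate quickly. It is not taken from the book (whose examples are in Perl). The function name `smith_waterman_score` and the scoring values (match, mismatch and a linear gap penalty) are illustrative assumptions, not BLAST defaults; real searches use substitution matrices and affine gap costs.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b, using a linear gap penalty.

    Scoring values are illustrative assumptions, not BLAST defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1] and b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            # Extend diagonally (match or mismatch), or open a gap in either sequence.
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap
            left = H[i][j - 1] + gap
            # The 0 lets a local alignment start afresh at any cell.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


# Aligns GTT-AC with GTTGAC: five matches (5 x 3) and one gap (-2).
print(smith_waterman_score("TGTTACGG", "GGTTGACTA", match=3, mismatch=-3, gap=-2))  # 13
```

The number returned is a similarity score: a sequence scores highest against itself rather than zero, which is one reason such scores are not, in general, metrics.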

There is much more of value in BLAST: rich information about installation, optimisation, hardware and database issues; useful appendices; and, of course, reference chapters on commands and options for the major BLAST versions. Although I believe there are some glitches in the text, and I would have structured the book differently (partly to engage the "casual" reader more fully), this is an important work. BLAST should be a talismanic tome against bad bioinformatic analysis in contemporary biomedical research, but because of the geeky aura of O'Reilly books and the priority it gives to theory, it may not be read by as many of the high priests of modern biology as it should; those who do read it may not read it as closely as they should. More fool them. But well done to the authors. I believe this volume will be referred to unambiguously by both computational and biological scientists as "The BLAST Book" for years to come.

