High Performance Linux Clusters with OSCAR, Rocks, OpenMOSIX and MPI
Joseph Sloan
Published by O'Reilly Media
ISBN: 0-596-00570-9
367 pages
£28.50
Published: 3rd December 2004
Reviewed by John Hearns in the March 2005 issue

Anyone reviewing this book should, of course, be aware that this is the second foray by O'Reilly into Beowulf clustering. The first, ``Building Linux Clusters'' by David Spector, was not well received by the community. By coincidence, just as I was completing this review, a rather uncomplimentary review of Sloan's book was posted to the Beowulf list.

This book does exactly what it says on the tin -- it gets you onto the path of constructing clusters using the above-mentioned packages. The book is divided into four parts: Introduction, Getting Started Quickly, Building Custom Clusters and Cluster Programming.

In the section regarding choice of hardware for a Beowulf system, there is a recommendation to make sure that a video adapter is included in the purchase -- in my experience any suitable motherboard these days has an inbuilt adapter, which is perfectly adequate for diagnostic use. The author also quite correctly recommends motherboards with PXE network booting capability, but then talks about using PXE ROMs in sockets on the board. PXE capability is on the list of `must haves' for any motherboard suitable for a Beowulf these days, and you should not be concerning yourself with inserting boot ROMs. This section refers to motherboards which may refuse to boot when a keyboard is not detected -- in my opinion, if you cannot set `ignore keyboard' in the BIOS then such motherboards are likewise not suitable for a Beowulf. Better to make an informed choice of motherboard based on these requirements before starting your project. As regards serial console access, we enable this on all our systems and find it a highly convenient way of accessing multiple rack-mounted systems. Using Cat5 cable for the serial connections makes for neat cabling, and a terminal server in the rack gives you direct access to all systems. The author does discuss this, but claims that it is ``a fair amount of work on most Linux systems'', which I don't agree with.
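For the record, enabling a serial console on a distribution of that era amounts to little more than a kernel argument and a getty entry. The lines below are my own rough sketch, not taken from the book; the device name, speed and runlevels are assumptions that will vary with your hardware and distribution.

    # kernel command line (lilo.conf or grub.conf): send console output to the first serial port as well
    console=tty0 console=ttyS0,9600n8

    # /etc/inittab: spawn a login on the serial port
    S0:2345:respawn:/sbin/agetty -L ttyS0 9600 vt100

    # /etc/securetty: add ttyS0 to allow root logins over the serial line
    ttyS0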

This section belies the book's bias towards the homebrew, ``build a cluster from a pile of donated machines'' philosophy. Things have moved on dramatically from those days: Linux clusters are now an integral part of many scientific and engineering departments' research tools, and critical to large business compute resources. The author too quickly dismisses rackmount cases as being expensive and ``for the high end''. On the contrary, the LOBOS (Lots of Boxes on Shelves) approach is an invitation to a snake's wedding unless one person rules the cabling infrastructure with a rod of iron, and desktop PSUs are really not intended to take the 24-hour high loads of a cluster on full song. A proper rack mount is the professional way to do things: you get server-grade nodes with good cooling flows, and when you inevitably come to grow, adding nodes is as easy as slotting them into the rack. The author certainly does discuss the issues of cooling and power requirements, but extracting heat from (say) a 128-node dual Opteron cluster takes a proper machine room infrastructure.
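To put a rough figure on that (my own back-of-envelope numbers, not the book's): if each dual Opteron node draws somewhere in the region of 250 to 300 W under load, then 128 nodes dissipate on the order of 32 to 38 kW of heat, continuously, which is far beyond anything that office air conditioning or a fan in the corner will cope with.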

There is a short mention in Chapter Three concerning high performance networks for Beowulf clusters. Less than half a page does not do this topic justice, as it is an integral part of specifying and tuning a high performance cluster. Many clusters certainly do perform very well using gigabit ethernet. Using a motherboard with twin inbuilt gigabit interfaces to segment parallel traffic from general cluster/NFS traffic, as discussed here, is a good technique. We also provide low-latency drivers which work on commodity ethernet. The author fails to flag up that the choice of gigabit switch is important for performance -- you can't expect a cheap office-grade eight-port switch to perform in a heavily loaded network.
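As an illustration of that segmentation (mine rather than the book's, and the addresses are arbitrary): put eth0 on one private subnet, say 192.168.1.0/24, carrying NFS, administration and monitoring traffic, and eth1 on a second subnet, say 192.168.2.0/24, reserved for MPI message passing, with the MPI machine file listing the hostnames bound to the second set of interfaces.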

However, many applications will benefit from a high-speed, low-latency interconnect such as Myrinet, Quadrics QsNet or Infiniband. The choice between these interconnects, and benchmarking them against your own applications, is a fascinating part of clustering and the subject of much debate; it deserves more space in the book. Dismissing Quadrics and Infiniband as ``competitive technologies that are emerging or are available'' certainly does them no justice. QsNet is based on a mature technology (Meiko).

Quadrics is used in some of the biggest clusters in the world, and was developed in Bristol to boot. Myrinet is commonly deployed in many clusters, both research and industrial. Quadrics and Myrinet continue to innovate, for example with plans to utilise 10-gigabit ethernet switches, which lets them use as many commodity components as possible. Infiniband is the newcomer to the market, but it is gaining ground and is moving from the R&D stage to position itself as a useful cluster interconnect.

The choice of an interconnect becomes increasingly important as node count increases. Scaling a network which can handle thousands of processors across a large fabric, without introducing bottlenecks, is the forte of these high performance interconnects. If you want to achieve the best performance and scaling, you need one of these. The book's comment that ``these highly expensive technologies are no longer needed for most applications'' is misleading. They certainly do cost more than onboard gigabit interfaces and ethernet switches. However, if you are planning a new HPC cluster you should do the figures and get the benchmark results. You should aim to get the most computing performance for your budget -- and setting aside part of that budget for a good interconnect, rather than piling on more nodes, may well achieve that, depending on the nature of your application. HPC is about network/memory performance, balance and tuning as well as adding nodes.

Another topic which the author doesn't deal with in depth is the choice of commercial compilers. Many high performance applications run significantly faster using an optimised compiler. It again makes sense to reserve some of your budget for a compiler. You can easily obtain an evaluation license for most compilers, and if you (say) prove a 30% speedup in your application it will be more cost effective than buying more nodes.
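To make that concrete (again, my own back-of-envelope figures rather than the book's): on a 32-node cluster, a 30% speedup delivers roughly the extra throughput you would otherwise get from about ten additional nodes, so if the compiler licence costs less than ten nodes' worth of hardware, power and rack space, the compiler is the better buy.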

Chapter 4 discusses the choice of Linux distribution and how to configure it. It is certainly important to use a stable distribution which supports your chosen hardware well -- this probably won't be the latest and greatest test kernel. The book shows its slant here, again discussing recycled hardware and the older distributions needed to cope with it. This chapter is a useful high-level overview of the services which a typical cluster depends on -- DHCP, NFS, SSH, NTP and security -- though it is really no substitute for proper systems admin knowledge. When it comes to debugging problems with these services, there's no substitute for experience.

The section on cloning systems was interesting to me. I had not heard of g4u, which is an equivalent of Norton Ghost for Linux systems. g4u has the nice feature of storing compressed images, but the downsides of having to reboot machines to make any updates, and of coping with different disk geometries, make me think it is not that useful for HPC cluster installs. It should certainly be in your mind if you have to roll out many systems, e.g. in a classroom or office environment. Kickstart or image-based install tools are the way to go with HPC clusters.

The next three chapters give details on openMOSIX, OSCAR and Rocks. The first is a set of kernel patches which provide process checkpointing and migration. This gives the behaviour of a Single System Image machine, but it is not truly an SSI machine, despite what the author implies. OSCAR and Rocks are both frameworks for installing and deploying clusters, though with different features. One useful feature of Rocks is the provision of Rolls (geddit?) for the easy inclusion of additional software, e.g. commercial compilers or batch systems. For my taste, the OSCAR and Rocks chapters are a little too heavy on the ``this is how to download, and here are the exact steps to take'' approach. This is common in many O'Reilly books, and you should be prepared to download and follow the latest documentation for any package you implement, not depend solely on the book.

The section on batch schedulers deals with OpenPBS adequately, and is certainly enough to get you up and running. My only personal quibble is that little mention is made of one of the main alternatives, Sun Gridengine: http://gridengine.sunsource.net

We configure SGE on the majority of our clusters, and find that the free (as in beer) version and the commercially supported N1 Grid Engine version do what our customers want. The support on the SGE mailing list is excellent, and should you decide to give it a try -- see you on the list! However one can't expect an author to become an expert in all alternative batch systems just to write a book.

In such a fast-moving field, a book such as this will inevitably be slightly out of date. For instance, in the section on filesystems, there is a link to the OpenGFS project, and a comment that: ``RedHat markets a commercial, enterprise version of OpenGFS''. RedHat have now released GFS in open source again, and of course provide a supported version with their Enterprise Linux. Anyone considering GFS would be unwise to go with OpenGFS now.

The author discusses parallel command tools (the C3 toolkit) and the Ganglia monitoring framework, both of which are important for the effective control of a whole set of machines. You want your cluster management to scale well: clusters will only get bigger, the tools to manage them will continue to be an area of active work, and there are plenty of interesting problems still to solve there.

The final section of the book is an introduction to parallel programming. After all, now that you've chosen the hardware, installed the distribution and compiled up your libraries, you want the thing to DO something. And hopefully the skills you learn here, using standards such as MPI, will be transferable to other installations and larger setups.
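To give a flavour of the sort of thing that section builds up to, here is a minimal MPI program in C. It is my own sketch rather than an example from the book, and it assumes an MPI implementation such as LAM/MPI or MPICH providing mpicc and mpirun:

    /* hello_mpi.c -- each process reports its rank and the total process count.
       Compile:  mpicc -o hello_mpi hello_mpi.c
       Run:      mpirun -np 4 ./hello_mpi                                       */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);               /* start the MPI environment       */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* which process am I? (0..size-1) */
        MPI_Comm_size(MPI_COMM_WORLD, &size); /* how many processes in total?    */

        printf("Hello from process %d of %d\n", rank, size);

        MPI_Finalize();                       /* shut MPI down cleanly           */
        return 0;
    }

The same skeleton, with MPI_Send and MPI_Recv calls added, is the starting point for real parallel codes.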

This book is a good resource for anyone wanting to get started building a homebrew cluster, or a cluster as a learning project at high school or university level. There are, in addition, lots of other sources of information, many listed in the appendices. If you are working as a scientist or engineer, or for a company which needs reliable, well-managed high performance computing, I would urge you also to consult online resources such as the original Beowulf site or Clusterworld magazine. One resource which stands out is Robert Brown's online book, released under the Open Publication License and based on his experience of building clusters at Duke University. Another excellent treatment of clusters, by one of the original Beowulf team, is Thomas Sterling's ``Beowulf Cluster Computing with Linux''.

Linux clusters have come of age in the last five or six years, scaling up to the leading systems in the Top500, with thousands of processors, and down to turnkey rack-mounted clusters suitable for a workgroup or research group, which can be delivered and working the same day.

If you are specifying or purchasing a new cluster for your department or company, then the `roll your own' approach isn't the best these days. Consult the Beowulf community, and ask your campus IT services and the regional e-Science centres for advice. Any clustering company worth its salt will be happy to spend a good deal of time with you, finding out about your applications, giving informed advice about hardware choices, networking, high performance interconnects and system software, and running benchmarks.

References:
http://www.beowulf.org
http://www.clusterworld.com
http://www.phy.duke.edu/resources/computing/brahma/Resources/beowulf_book.php
``Beowulf Cluster Computing with Linux'', ed. Thomas Sterling, MIT Press.
