LinuxConf Europe 2007
Conference and Tutorials
---------------------------------------------------
Sunday 2nd - Wednesday 5th September
University Arms Hotel, Cambridge, England

Fernando Luis Vázquez Cao - NTT Open Source Software Center

Generating a White List for Hardware which Works with Kexec/Kdump

The mainstream Linux kernel lacked a crash dumping mechanism from its inception until the recent adoption of Kdump, even though several out-of-tree solutions were available and some of them were even included in major distributions. However, concerns about their intrusiveness and reliability kept them out of the mainstream (vanilla) kernel; the main argument was that relying on the resources of a crashing kernel to capture a dump, as they did, is inherently dangerous.

The appearance of Eric Biederman's Kexec patches and their subsequent inclusion in the kernel as a new system call paved the way for the implementation of an idea that had been floating around for some time: the use of a memory-preserving soft-booted kernel to capture the crash dump. This was the approach adopted by Kdump, which made it possible to achieve high reliability by isolating the crash dumping process from the crashed kernel.
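As a rough illustration of what this setup looks like from user space, the sketch below checks two preconditions for a Kdump capture: that memory was reserved for the capture kernel via the `crashkernel=` boot parameter, and that a capture kernel has been loaded. It assumes a kernel that exposes `/sys/kernel/kexec_crash_loaded`; the helper names are hypothetical and not taken from the paper.

```python
from pathlib import Path
from typing import Optional


def parse_crashkernel(cmdline: str) -> Optional[str]:
    """Return the value of the last crashkernel= parameter, or None if absent."""
    value = None
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            value = token[len("crashkernel="):]
    return value


def capture_kernel_loaded(flag_path: str = "/sys/kernel/kexec_crash_loaded") -> bool:
    """True if sysfs reports that a crash (capture) kernel has been loaded."""
    try:
        return Path(flag_path).read_text().strip() == "1"
    except OSError:
        # File absent or unreadable: no kexec support, or not a Linux system.
        return False


if __name__ == "__main__":
    try:
        cmdline = Path("/proc/cmdline").read_text()
    except OSError:
        cmdline = ""
    print("reserved region:", parse_crashkernel(cmdline))
    print("capture kernel loaded:", capture_kernel_loaded())
```

If either check fails, a crash would not hand control to a capture kernel, so no dump would be taken.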

In theory, Kdump's approach constitutes the most reliable way of capturing a dump. Even though testing proved the theory right (i.e. Kdump is much more robust and reliable than in-kernel crash dumping solutions), some deficiencies in Kdump were revealed too.

Kernel crash dumping is a multi-stage process which involves three basic operations: detecting the crash, a minimal shutdown of the previously running system (i.e. the crashed kernel), and, finally, the capture of the crash dump. Kdump is very good at the first two, but there are still some issues when the dump capture kernel takes control of the system. In particular, the new kernel may fail to initialize the underlying devices, which in turn is likely to lead to a kernel panic or an oops.

The underlying problem is that the state of the devices during a kdump boot is not predictable because no device shutdown is performed in the crashed kernel (it cannot be trusted), and the firmware stage of the standard boot process is skipped (the dump capture kernel is a soft-booted kernel after all). In other words, the inherent assumption that the firmware (known as the BIOS on some systems) is always there to do the dirty work is not valid anymore.

The Linux kernel in general, and device drivers in particular, need to be improved so that they are able to boot in potentially unreliable environments, which, with the advent of soft-reboot mechanisms such as kexec, is likely to become a common scenario. But this is bound to be a painstaking and never-ending task, which requires the creation of a white list that is updated as bugs are fixed and new hardware appears. This paper discusses possible ways of fixing the aforementioned reliability problems and an automated testing method that can be used to create a white list for hardware that works with kdump.
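One way to picture how automated test results could be turned into such a white list is sketched below. This is not the method described in the paper; the `Trial` record, the outcome labels, the `min_runs` threshold and the hardware identifiers are all assumptions made for illustration. Hardware is white-listed only if every recorded kdump trial succeeded and enough trials were run; any failure keeps it off the list, along with the stage at which it failed.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Trial:
    hardware: str  # hypothetical identifier for a device/driver combination
    outcome: str   # "ok", or the stage that failed: "detect", "shutdown", "capture"


def build_white_list(
    trials: Iterable[Trial], min_runs: int = 3
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (white_list, rejected), where rejected maps hardware to failed stages."""
    outcomes: Dict[str, List[str]] = defaultdict(list)
    for trial in trials:
        outcomes[trial.hardware].append(trial.outcome)

    white_list: List[str] = []
    rejected: Dict[str, List[str]] = {}
    for hardware, results in sorted(outcomes.items()):
        failures = sorted({r for r in results if r != "ok"})
        if failures:
            rejected[hardware] = failures
        elif len(results) >= min_runs:
            white_list.append(hardware)
        # Otherwise: too few runs to decide, so the entry stays unlisted.
    return white_list, rejected


if __name__ == "__main__":
    # Made-up example data, not real test results.
    sample = [Trial("example-nic", "ok")] * 3 + [
        Trial("example-hba", "ok"),
        Trial("example-hba", "capture"),
        Trial("example-raid", "ok"),
    ]
    print(build_white_list(sample))
```

Re-running the aggregation as new trial records arrive keeps the list current as bugs are fixed and new hardware appears.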

Submitted paper

Paper (PDF) and Paper (tgz).


© Copyright 2007 UKUUG Ltd