\documentclass[a4paper,twocolumn]{article}

\usepackage{latexsym}
\usepackage{xspace}
\usepackage{graphicx}
\usepackage{url}
\usepackage{amssymb}

\newif\ifpdf
\ifx\pdfoutput\undefined
\pdffalse
\else
\pdfoutput=1
\pdftrue
\fi

\newcommand{\tm}{\texttrademark\xspace}
\newcommand{\R}{\textsuperscript{\textregistered}\xspace}
\newcommand{\ppc}{PowerPC\xspace}
\newcommand{\iic}{I\textsuperscript{2}C\xspace}
\newcommand{\benh}{Benjamin Herrenschmidt\xspace}
\newcommand{\coreocp}{\texttt{core\_ocp}\xspace}
\newcommand{\devel}{\texttt{linuxppc\_2\_4\_devel}\xspace}
\newcommand{\birec}{\texttt{bi\_record}\xspace}
\newcommand{\PM}{power management\xspace}
\newcommand{\elap}{eLAP\xspace}
\newcommand{\mv}{MontaVista\xspace}

\newcommand{\fixme}[1]{$\bigstar$\emph{\textbf{\large #1}}$\bigstar$}

\date{}
\title{Linux on the eLAP}
\author{David Gibson\\
  \emph{OzLabs, IBM Linux Technology Center}\\
  {\normalsize \tt <{dwg}{@}{au1.ibm.com}>}\\
  {\normalsize \tt <{ukuug}{@}{gibson.dropbear.id.au}>}
}

\begin{document}
\bibliographystyle{plain}

\maketitle

\begin{abstract}
  The IBM\R PowerPC\R 405LP is an embedded CPU designed especially for
  handheld applications.  Like other PowerPC 4xx CPUs, it includes a
  number of peripheral devices built into the CPU die itself, such as
  an LCD controller, PCMCIA controller and serial ports.  The 405LP
  also includes a number of novel power management features.  As well
  as being able to power on and off the various chip components as
  necessary, it can adjust the CPU core and internal bus frequencies
  with very low latency allowing the the chip to provide high
  performance when necessary, but low-power operation the rest of the
  time.
  
  The IBM PowerPC 405LP PDA Reference Design, also known as the
  Embedded Linux Application Platform, or \elap, is a sample board
  made by IBM based on the 405LP (see \cite{press:tomshardware} or
  \cite{press:linuxdevices}).  It is designed to run Linux and
  demonstrate the 405LP's capabilities.  It is expected that handheld
  devices derived from this design will be made and sold by third
  parties.  The hardware in this device is very different from a
  typical PC or server, so we examine what is involved in getting
  Linux to deal with the device's peculiarities and take advantage of
  the hardware's features.  We pay particular attention to power
  management.
\end{abstract}

\section{The hardware}

\subsection{The CPU}

The IBM \ppc 405LP is a CPU from the \ppc 4xx family.  This series of
CPUs is designed for ``system-on-chip'' embedded applications.  As the
name suggests these processors are implementations of the \ppc
Architecture\tm, however they have some notable differences from
``classic'' \ppc CPUs (as used in IBM pSeries\tm servers and Apple
workstations).  The 4xx CPUs operate at much lower clock rates (and
hence are cooler and cheaper), although they are in the high end by
embedded standards.  They have a much simpler MMU (just a software
loaded TLB) and they have no floating point unit.

More interestingly, the 4xx CPUs include a number of peripheral
devices built into the CPU die itself (hence the term
``system-on-chip'').  Different CPUs in the family are designed for
different applications and so have different sets of on-chip
peripherals.  The 405LP is aimed at handheld, battery-powered
applications and includes an LCD controller, PCMCIA/Compact Flash
controller and real time clock along with more general purpose
devices.  Other 4xx chips can include devices such as Ethernet
controllers, HDLC interfaces, PCI host bridges, IDE and USB
controllers.

The 405LP also includes a number of features to reduce power
consumption, some of which are quite novel.  The chip's base power
consumption is already quite low: it will run at 266MHz with no
heatsink or fan.  The various on-chip devices can be individually
turned on and off, saving power when they are not in use.  In
addition, the CPU clock speed and the frequencies of the several
internal buses can be adjusted while the CPU is operating.  This
allows the OS to adjust the CPU's compute and IO performance, allowing
a high peak performance when necessary, but keeping power usage to a
minimum the rest of the time.  This adjustment can be done quickly
(microseconds in many cases), and without disrupting the running
system, which allows the system to take advantage of even brief
periods of inactivity to save power.

The 405LP can also operate at a number of different voltages, and the
voltage can also be adjusted while the CPU is running (although with
much higher latency).  With appropriate board-level support, this
allows for further savings in power consumption.  Power usage is
directly proportional to the frequency, but to the square of the
operating voltage (for a fixed frequency).  The maximum frequency is
roughly proportional to the voltage.  Thus, voltage scaling can
achieve much greater power savings than frequency adjustment alone.

\subsection{The \elap}

\begin{figure*}[t!]
  \centering
  \ifpdf
  \includegraphics[width=\textwidth]{elap.pdf}
  \else
  \includegraphics[width=\textwidth]{elap.eps}
  \fi
  \begin{tabular}{l|l}
    \tiny \sf
    \begin{tabular}{rp{0.35\textwidth}}
      \textsf{APM} & Power management unit, implements the CPU
      sleep/suspend states (NB: this is not related to the APM BIOS in
      PC laptops) \\
      \textsf{Audio Codec} & Texas Instruments TLV320AIC23 stereo audio
      codec \\
      \textsf{Battery ADC} & ADS7823 ADC monitoring battery
      voltage \\
      \textsf{CPM} & Clock and Power Management unit \\
      \textsf{CSI} & Codec Serial Interface (interface to audio codec
      devices) \\
      \textsf{DMAC} & DMA controller (can DMA to/from devices on
      both the PLB and OPB) \\
      \textsf{EBC} & External Bus Controller \\
      \textsf{Ethernet MAC} & RTL8109 Ethernet controller \\
      \textsf{Frontlight Ctrl} & DS1050 LCD frontlight controller \\
      \textsf{GPIO} & General Purpose IO interface \\
      \textsf{IIC} & \iic bus interface (both master and slave capable) \\
      \textsf{LED Ctrl} & BU8770KN tricolour LED driver \\
      \textsf{LCDC} & LCD display controller \\
    \end{tabular} &
    \tiny \sf
    \begin{tabular}{rp{0.3\textwidth}}
      \textsf{MDOC} & 64M of M-Systems Millennium Plus Disk-on-Chip
      (NAND flash with  specialised controller) \\
      \textsf{PCCF} & PCMCIA and Compact Flash controller (16-bit
      only, not CardBus)\\
      \textsf{POB} & PLB-to-OPB bridge \\
      \textsf{RTC} & Real Time Clock \\
      \textsf{SDIO} & Toshiba TC6380AF Secure Digital and SDIO
      interface \\
      \textsf{SDRAM} & SDRAM controller \\
      \textsf{TCPA} & Atmel AT97SC3201 TPM \\
      \textsf{TDES} & Triple-DES accelerator \\
      \textsf{TS Ctrl} & Semtech UR7HCT52\_S840L touchscreen
      controller \\
      \textsf{UARTx} & NS16550 compatible UARTs \\
      \textsf{UIC} & Universal Interrupt Controller \\
      \textsf{USB} & Phillips ISP1161 USB interface (includes both
      host controller and target interface) \\
    \end{tabular}
  \end{tabular}
  \caption{Block diagram of the \elap}
  \label{fig:elap}
\end{figure*}

The IBM \ppc 405LP PDA Reference Design (also known as the Embedded
Linux Application Platform or \elap) is, as the name suggests, a
handheld design based on the 405LP and designed to demonstrate the
CPU's capabilities in a handheld device.  Figure \ref{fig:elap} shows
a block diagram of the \elap, including the devices built into the
405LP itself.  In addition to the devices within the 405LP, the \elap
includes some RAM and flash memory and a number of additional
peripherals.  Most of these are connected via a simple bus driven by
the 405LP's on-chip External Bus Controller (EBC) unit.  In particular
it has several ways of attaching further external devices: in addition
to the PCMCIA controller built into the 405LP, the \elap has both a
socket for Secure Digital (SD) memory and IO cards and a Phillips
ISP1161 USB interface.  This chip is both a USB host controller
(allowing USB devices like keyboards and mice to be connected to the
\elap) and a USB client interface (allowing the \elap to be connected
to a PC as a USB device).

An extra debug and development sled can be attached to the \elap, also
shown in Figure \ref{fig:elap}.  It includes an Ethernet controller,
the physical PCMCIA slot driven by the 405LP's PCCF core and the
physical connectors for the USB host port and serial port.

\section{Current support}

As the name ``Embedded Linux Application Platform'' suggests, the
\elap is intended to have Linux as its OS.  Although work is still in
progress in some areas, kernel support now exists for most of the
hardware and features of the device.  The development for this has
mostly been done by IBM and \mv Software building on the code
supporting other \ppc 4xx processors which has existed for some
time\footnote{\cite{dwg:linux-on-4xx} covers Linux 4xx support more
  generally.}.

Unfortunately, for various historical reasons, the code for the \elap
has not yet been merged into the standard 2.4 or 2.5 Linux development
trees.  In fact what little 4xx support is in the 2.4 tree is broken,
and the 4xx code in 2.5 is quite incomplete.  Eventually the 4xx and
\elap code should be merged back into the mainstream trees --- once
someone can find the time to do the necessary cleanups, consolidation
and merging.  For now, most of the development for 4xx based machines,
including the \elap, takes place in the \devel BitKeeper tree
(\cite{bk:24devel}).

\mv Software maintains a distribution for embedded devices, which has
been used as the basis for the userspace software on the \elap.  This
distribution also includes a kernel which includes the \elap and other
4xx support from \devel (or at least will in a future release).  It
also includes a few extra pieces for the \elap which have not yet been
merged into the \devel tree.

\subsection{Booting and the basics}

As a handheld device, the \elap is designed to boot from flash memory.
The \elap's flash is wired so that the CPU will execute the \elap
firmware, PIBS (\ppc Initialization and Boot Software), from flash at
power-on.  PIBS performs some basic hardware configuration (such as
initialising the 405LP's SDRAM controller) then boots the Linux
kernel.  It does this by loading the kernel image into RAM then
jumping to it, much as LILO does on a normal PC.

Usually the kernel image is stored in another section of flash, but
for development purposes PIBS can also load a kernel over the serial
port, or (if the development sled is attached) over Ethernet.  The
kernel image will usually include an initial ramdisk with a basic
userland for the device.  The startup scripts will then usually mount
a JFFS2 filesystem which takes up the remainder of the flash memory
and contains the rest the userspace programs and data.  For
development it is also possible to use an NFS root filesystem.

The kernel includes drivers for the 405LP's built in serial ports, and
uses one of these (UART0) for the console.  Of course the console is
only expected to be used for development and debug. The kernel also
has full support for the 405LP's LCD controller (framebuffer device)
and the Semtech touchscreen controller attached to the second serial
port.  The normal interface to the device is through the Qtopia GUI
running on the handheld's built in LCD display.

\subsection{Bells and whistles}

In addition to the devices needed for basic operation, most of the
peripherals on the 405LP and the rest of the \elap are supported in
the \devel tree.  The 405LP's real-time clock and GPIO lines are fully
supported, as is its PCCF (PCMCIA and Compact Flash) controller which
drives the PCMCIA slot in the \elap's development sled.  Likewise the
\iic interface (IIC) is supported as are the \elap's \iic devices.
The TI audio codec which is controlled via \iic is supported in
conjunction with the 405LP's CSI, to form a complete audio device.
The \elap's buttons and its multicolour LED, accessed via a custom
FPGA, are both supported, as is the RTL8019 Ethernet on the
development sled.

The Phillips USB chip is partially supported.  There is a driver for
the host controller side of the chip, although this has not been
merged into the \devel tree (it is in \mv's kernel).  Similarly a
driver for the Atmel TCPA chip exists, but only has an external module
which has not been merged into the development tree.  In any case the
driver is of very limited use without considerable userspace library
support, which is still in progress.  A driver for the Toshiba SDIO
chip has been written by Toshiba with IBM's assistance, but is only
available in binary form.  SD Association rules prevent Toshiba from
releasing specifications to write an open source driver, with luck
this will change eventually.

\section{Continuing and future work}

\subsection{Missing drivers}

Although, as we have seen, most of the the \elap's hardware is already
supported, a few devices still lack drivers.  The most obvious of
these is the \mbox{M-Systems} Disk-on-Chip (MDOC), which comprises the
bulk of the handheld's storage.  A driver for this device, developed
by M-Systems and adapted for the \elap with IBM and \mv's assistance,
does now exist.  Unfortunately a number of technical hitches ---
hardware and software --- have delayed getting problems ironed out of
this driver.  The most difficult problems now seem to have been
resolved, so MDOC support should be working in the near future.

Using the USB target side of the Phillips USB chip --- allowing the
\elap to act as a USB device --- is, as yet, wholly unsupported.
Adding support for this is made more difficult by the fact that,
unlike USB host controller support, USB gadget support is not yet in
the mainstream kernel trees.  However, some groups, notably
Lineo\footnote{See \url{http://opensource.lineo.com/usb/}}, have
worked on USB gadget support, so it should be possible to use this
work as a basis for a driver for the \elap.

The 405LP's triple-DES accelerator (TDES) is also unsupported thus
far.  It is quite a simple device to operate, however, and a driver
should appear in due course --- so far it has not had a high priority.

\subsection{Power management}

The obvious importance of reducing power consumption in a battery
powered device makes support for \PM on the \elap particularly
interesting.  A number of power management techniques are already
supported on the device, but work continues to best take advantage of
the 405LP's features in this area.

The simplest form of power management supported by the \elap is
suspend and resume, that is, powering off most of the device while
preserving the state of the OS and applications.  This is just like
the ``sleep'' modes supported on most laptops.  The 405LP's APM core
supports several different methods of preserving the CPU state.

In the \elap, suspend-to-RAM is implemented by saving the CPU's
register state to RAM, ensuring caches are flushed, then switching the
SDRAM into a self-refresh mode to preserve its contents.  The device
is then powered off, except for the RAM and the power management units
themselves.  The suspend code also records a special flag in an unused
bit in one of the real time clock registers (which are preserved while
the device is off, of course), and stores a pointer to a restore
routine at a fixed address in memory.  When the device is powered on
again, the PIBS firmware checks the flag to determine that the device
has been suspended and instead of booting normally passes control to
the Linux restore routine.  This restores the register state from
memory and resumes running.

In order for this to work properly, device drivers also need to shut
down devices before suspending, then re-initialise them and restore
their state after resume.  The most important \elap drivers, such as
the LCDC driver do this already, but some other drivers still need to
have suspend support added.

Similarly, some drivers already support powering the device they
control off when possible to save power.  Other drivers still need to
be enhanced to save power in this manner.  The power to the on-chip
devices is controlled by the Clock and Power Management (CPM) core,
while power to most of the off-chip devices is controlled by registers
in the FPGA or by GPIO lines.

\subsubsection{Dynamic Power Management}

Of course, by far the most interesting aspect of \PM on the 405LP is
its ability to adjust operating frequency and voltage while running.
Linux support for this is under continuing development by IBM and \mv.
This and other means of saving power while the machine is running are
known as ``dynamic power management'' (DPM).

This simplest way of using this capability is to allow the user to
explicitly select a particular frequency and voltage mode, and this
mode of operation is already supported.  This approach is analogous to
that used by the \texttt{cpufreq}\footnote{See
  \url{http://www.brodo.de/cpufreq_old/}} project which allows
frequency adjustment on a number of other CPUs (although the 405LP
implementation is, for now, independent of the \texttt{cpufreq}
infrastructure).  However this fails to really make use of the 405LP's
abilities: because the 405LP can adjust frequency with particularly
low latency we want to take advantage of this by automatically
adjusting the frequency to match runtime requirements.

How to best automatically manage frequency and voltage scaling is
still under active investigation.  A promising scheme developed at
IBM's Austin Research Lab is described in \cite{ibm:dpm} and partially
implemented for the \elap.

The essence of this approach is for the kernel to keep track of which
``operating state'' it is in at any moment.  The operating states
include things such as ``idle'', ``executing user code'', ``executing
an interrupt handler'' and so forth.  If different processes on the
system have different compute or IO requirements they could have
different operating states e.g., ``IO intensive task'' or ``compute
intensive task''.

When the operating state changes, the kernel consults a DPM ``policy''
to map the new state to an ``operating point'' --- that is, a set of
frequency and voltage parameters.  The DPM policy can itself be
changed at runtime in order to adapt to longer term changes in
performance requirements.  This is handled by a userspace program
known as a DPM policy manager.  Suitably modified application programs
could give hints to the policy manager to let it better anticipate
their performance requirements.  For example, a video playback program
might inform the policy manager that it needs a certain amount of
compute time in order to maintain full speed playback.

The problem is complicated by the fact that some devices in the system
may only operate properly at certain bus frequencies.  These devices
place constraints on what operating points can be selected when they
are active.  The scheme in \cite{ibm:dpm} allows drivers to register
these constraints with the DPM subsystem so that they can be properly
taken into account.

\subsection{Consolidation and cleanups}

In addition to ongoing work to add support for currently missing
features on the \elap, some work on consolidation and cleanup of the
existing code will be needed as the \elap support is merged back into
the main kernel trees.  Many of the problems faced supporting the
\elap also affect other \ppc 4xx based machines or other embedded
devices generally.  This work is somewhat delayed by the need for
consolidation and cleanup of the general \ppc 4xx kernel code.

Most of the devices on the \elap --- particularly those built into the
405LP --- are quite different to what would be found on a normal PC or
server machine.  They are neither ``standard'' devices for the
architecture nor devices found on a standard bus, such as PCI, which
can be systematically probed.

At the moment the kernel locates most of these devices by ad-hoc
methods.  Many of the drivers are \elap specific and ``just know''
that a particular device exists at a particular address.  This
approach quickly becomes messy as the kernel supports more embedded
machines --- especially when a particular peripheral is included in
multiple embedded devices connected in slightly different ways.

The 4xx code in the \devel tree includes a partial solution to this
problem for the on-chip devices: the OCP (for ``On-Chip Peripheral'')
subsystem uses a table of devices for a particular CPU to properly
initialise each of the drivers, which can be common across multiple
4xx CPUs.  However, this system has a number of implementation
problems and an improved and consolidated approach handling both on
and off-chip embedded devices is needed. \cite{dwg:device-discovery}
covers this topic in more detail.

Finally, the \elap \PM code is quite 405LP specific at the moment.
However, it has a certain amount of overlap with the functionality of
the \texttt{cpufreq} mechanism.  The techniques used on the \elap are
also likely to be applicable to more embedded machines in the future.
At some point, thus, the \PM code will need consolidation into a
flexible and portable dynamic power management framework, though this
might be best left until it is better understood what techniques work
well in practice.

\nocite{ibm:405lp-um}
\begin{flushleft}
  \bibliography{linux-on-elap}
\end{flushleft}

\section*{About the author}

David Gibson is an employee of the IBM Linux Technology Center,
working from Canberra, Australia.  Most recently he has been working
on board and device bringup for Linux on embedded \ppc machines, along
with various bits of kernel infrastructure for cleanly supporting
PowerPC 4xx and other system-on-chip CPUs.  He is also the author and
maintainer of the orinoco driver for Prism II based 802.11b NICs.  In
the past he has worked on ramfs (as included in the \texttt{-ac}
kernel tree), and ``esky'', a userspace implementation of
checkpoint/resume.

\section*{Legal Statement}

This work represents the view of the author and does not necessarily
represent the view of IBM.

IBM, \ppc, \ppc Architecture and pSeries are trademarks or registered
trademarks of International Business Machines Corporation in the
United States and/or other countries.

Linux is a registered trademark of Linus Torvalds.

Other company, product, and service names may be trademarks or service
marks of others.

\end{document}

