The Linux kernel has proven itself a very stable, high-quality base for several application domains. However, using standard Linux to support an application that requires real-time response is problematic, because Linux was designed to deliver good system throughput when many tasks compete for the system's resources and to give those competing tasks fair access to them. In contrast, a real-time kernel deliberately gives the most important tasks an unfair share of the system's resources, so that those tasks can respond to real-world events within guaranteed times.
The approaches for adding real-time performance to Linux fall into two camps:
- Allow a small real-time executive to control the system and run standard Linux as one process under that real-time executive
- Modify Linux to provide real-time performance
In the past, a developer of a real-time application has had to make a large trade-off between these two approaches. The real-time executive approach provided superior performance, but it forced the developer to program in two completely different environments. This article presents an overview of shielded CPUs, a construct that allows an application to run under a single Linux kernel while guaranteeing interrupt response of less than 30 microseconds on a dual Xeon Pentium 4 system. By comparison, the real-time executive scheme in RTLinuxPro 1.2 guarantees 19 microseconds.
The advantages of using a pure Linux environment for application development over the dual Linux/real-time executive approach are many. First and foremost is that a developer can simplify application design by having a single programming interface for all of the application. Using a pure Linux environment means there is only one set of tools for use in debugging and performance analysis. That tool set can view the system as a whole rather than have the limited scope of focusing on either the real-time executive or on the Linux-based application components at any given time.
Using Linux for application development also has many benefits because it has support for:
- Many device drivers, which lowers the overall cost of implementing a complete application solution
- A wide variety of high-level languages for better programming efficiency
- Commercial applications (that may not be central to the design of the real-time system but that are helpful during the development phase or that can provide additional functionality in the end system)
- Complex protocol stacks such as CORBA
- Extensive graphics capabilities
- Advanced application development tools
Besides all of the functionality available in standard Linux today, the engineering community is developing an ever-expanding list of features for the Linux operating system, due to the very strong momentum of the Linux phenomenon. By using Linux as the basis for an application design, a user will have many more options in the future in terms of new features that are now under development both in the open source community and at commercial software companies.
Some commercial Linux providers have made minor changes to the OS to provide limited soft real-time support. Other providers have made more extensive changes that allow for better guarantees on real-time response. An example of a Linux kernel that provides hard real-time performance guarantees is RedHawk Linux from Concurrent Computer Corporation's Integrated Solutions Division.
Why standard Linux is not a real-time OS
A real-time application is one that must respond to a real-world event and complete some processing task within a given deadline. A correct answer that a system delivers after the deadline becomes an incorrect answer. The deadlines themselves are very application-dependent and can vary from tens of microseconds up to several seconds. For hard real-time applications, a system should not miss any deadlines. This means that worst-case measurements of system metrics are the only things that matter to a hard real-time application, since they are the cases that will cause a missed deadline.
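To illustrate why the worst case, rather than the average, decides whether a hard deadline is met, here is a minimal Python sketch. The sample latencies and the 100-microsecond deadline are invented for illustration and are not measurements from any system.

```python
from statistics import mean


def deadline_report(latencies_us, deadline_us):
    """Summarize response-time samples against a hard deadline."""
    worst = max(latencies_us)
    return {
        "mean_us": mean(latencies_us),
        "worst_us": worst,
        # Number of individual responses that arrived too late.
        "misses": sum(1 for t in latencies_us if t > deadline_us),
        # A hard real-time system passes only if the worst sample passes.
        "meets_hard_deadline": worst <= deadline_us,
    }


# Hypothetical samples: 999 fast responses and one slow outlier.
samples = [20] * 999 + [5000]
report = deadline_report(samples, deadline_us=100)
print(report)
```

The mean of these samples is well under the deadline, yet the single outlier is enough to fail the hard real-time requirement.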
Because an interrupt communicates the occurrence of a real-world event to a computer system, a real-time operating system must provide guaranteed, worst-case interrupt response time. In responding to an interrupt and giving control to the real-time application, the computer system has performed the first step to meet the deadline. Once the real-time application is running, the system must provide deterministic execution times. If the time it takes to execute the code associated with a real-time application's response varies widely, then the system will miss deadlines.
To guarantee good interrupt response, the operating system must be able to quickly preempt any tasks that are currently executing when an interrupt occurs. Because the standard 2.4 Linux kernel series does not allow one task to preempt the execution of another task executing inside the Linux kernel, a kernel based on this series will have very poor worst-case interrupt response. A preemption patch is available to make a task that is executing within the Linux kernel preemptible. However, even in a Linux kernel with the preemption patch installed, there is a hidden problem that still causes very long interrupt response delays.
The job of any operating system is to coordinate the execution of the many tasks that share the resources of the system. A system can corrupt the data structures that describe these shared resources if it allows multiple tasks to access the data structures at the same time. Therefore, all operating systems have critical sections of code that tasks can access only in a sequential fashion. When a high priority task suddenly becomes runnable, because an interrupt it was awaiting has occurred, that task cannot take control of the CPU if there is another task currently executing inside of one of these critical sections. This means that long critical sections have a big impact on the ability of the system to respond to an interrupt.
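The effect of critical sections on interrupt response can be sketched with a deliberately simplified model. It assumes a single CPU, ignores interrupt-handler and context-switch costs, and uses made-up interval values; it is an illustration of the reasoning above, not a model of the actual Linux kernel.

```python
def preemption_delay(arrival, critical_sections):
    """Time the high-priority task must wait before it can take the CPU.

    critical_sections is a list of (start, end) intervals, in microseconds,
    during which the currently running task cannot be preempted.
    """
    for start, end in critical_sections:
        if start <= arrival < end:
            # The waiting task runs only once the section has been exited.
            return end - arrival
    return 0


def worst_case_delay(critical_sections):
    """Worst case: the interrupt arrives just as the longest section begins."""
    return max((end - start for start, end in critical_sections), default=0)


# Hypothetical timeline: one 50-microsecond and one 10-millisecond section.
sections = [(100, 150), (400, 10400)]
print(preemption_delay(50, sections))    # outside any critical section
print(preemption_delay(120, sections))   # inside the short section
print(preemption_delay(400, sections))   # at the start of the long section
print(worst_case_delay(sections))
```

In this model the worst-case delay equals the length of the longest critical section, which is why a single long section dominates the guarantee a system can offer.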
In general, the more complex a subsystem is, the longer the critical sections are. Because Linux supports many such complex subsystems, such as file systems, networking, and graphics subsystems, its critical sections are very long compared to those in a small real-time executive. There are low-latency patches that make algorithmic changes that shorten some of the longest critical sections in the Linux kernel. The preemption and low-latency patches have greatly improved the responsiveness of Linux, but there are still many critical sections that can last tens of milliseconds. This delay is unacceptable for the deadlines that many real-time applications require. This is one of the primary reasons that standard Linux, even with the preemption and low-latency patches, does not offer performance guarantees acceptable to many real-time applications.
It is possible to identify and shorten the longest critical sections in Linux, thus improving the worst-case interrupt response. The problem with this approach is that it is essentially impossible to guarantee interrupt response times of less than a millisecond, because there are hundreds of critical sections in the millisecond range. To provide a guarantee of less than a millisecond would require such extensive modification to the Linux kernel that it would be impossible to maintain it against the evolving Linux base.
How shielded CPUs provide the solution
The shielded CPU model is an approach for obtaining the best real-time performance in a Symmetric Multiprocessor (SMP) system. Because the latest Pentium chips support hyper-threading (two virtual CPUs on a single physical CPU), a developer can even apply the shielded CPU model to a uniprocessor system built on such a chip.
In the shielded CPU model, a developer can assign tasks and interrupts to CPUs in such a way as to guarantee a high grade of service to certain important real-time functions. In particular, a high-priority task is bound to a shielded CPU, while most interrupts and low priority tasks are bound to other CPUs. On a shielded CPU, the system will never prevent a high priority task from responding to an interrupt because another task is currently executing inside a critical section on that CPU.
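The task-binding half of this model can be approximated with the standard Linux scheduling calls that Python exposes in its os module. The sketch below is not the RedHawk shield interface; it only pins the calling process to one CPU and optionally requests a real-time priority. The choice of target CPU is an arbitrary assumption, and keeping other tasks and interrupts off that CPU is a separate step that this code does not perform.

```python
import os


def bind_to_cpu(cpu, rt_priority=None):
    """Bind the calling process to one CPU and optionally make it SCHED_FIFO.

    Returns True if the real-time priority was applied, False otherwise.
    """
    os.sched_setaffinity(0, {cpu})  # pid 0 means the calling process
    if rt_priority is None:
        return False
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(rt_priority))
        return True
    except PermissionError:
        # Real-time scheduling classes normally require root or CAP_SYS_NICE.
        return False


# Hypothetical choice: treat the highest-numbered allowed CPU as the
# CPU reserved for the real-time task.
allowed = os.sched_getaffinity(0)
target = max(allowed)
bind_to_cpu(target)
print(os.sched_getaffinity(0))
```

Passing a value such as rt_priority=50 additionally requests fixed-priority scheduling; without sufficient privileges the function reports that the request was refused and leaves the affinity setting in place.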
The shielded CPU model also allows for deterministic execution of a real-time application once it has control of the CPU. The biggest threat to a process's execution-time determinism is interrupt routines that become active at random. Interrupt routines occur at unpredictable points in time, and they always execute at a priority above that of the highest-priority process. A shielded CPU handles only the high-priority interrupts associated with the real-time tasks that execute on it, and so it shields those tasks from the unpredictable processing associated with other interrupts.
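Standard Linux lets an administrator steer an interrupt away from particular CPUs by writing a hexadecimal CPU bitmask to /proc/irq/<irq>/smp_affinity. The sketch below computes such a mask for the CPUs that remain unshielded. The function names and the CPU numbering are illustrative assumptions, the mask format assumes fewer than 32 CPUs, and this is generic Linux usage rather than the RedHawk mechanism.

```python
def irq_affinity_mask(all_cpus, shielded_cpus):
    """Hex bitmask of CPUs allowed to service an interrupt (bit n = CPU n)."""
    allowed = set(all_cpus) - set(shielded_cpus)
    if not allowed:
        raise ValueError("at least one CPU must remain unshielded")
    mask = 0
    for cpu in allowed:
        mask |= 1 << cpu
    return format(mask, "x")


def steer_irq(irq, mask_hex):
    """Write the mask to /proc/irq/<irq>/smp_affinity (requires root)."""
    with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
        f.write(mask_hex)


# Four CPUs with CPU 3 shielded: interrupts may use CPUs 0 to 2.
print(irq_affinity_mask(range(4), {3}))  # prints "7"
```

A call such as steer_irq(19, irq_affinity_mask(range(4), {3})) would then keep a hypothetical IRQ 19 off CPU 3; it is left uncalled here because it needs root privileges and a real IRQ number.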
RedHawk Linux supports the ability to dynamically mark CPUs as shielded via both a shield command and the /proc file system. Developers have used extended test runs under a heavy system load that included both graphics and networking activity to measure a worst-case interrupt response time on a shielded CPU that is less than 30 microseconds. For a complete description of these tests, see the white paper at http://www.ccur.com/rtdocs/wp-shielded-cpu.pdf.
. . . . .
For more information about Concurrent Computer Corporation and its RedHawk Linux offering, visit www.ccur.com.
Stephen A. Brosky is chief scientist of the Integrated Solutions Division of Concurrent Computer Corporation, a leading provider of high-performance, real-time computer systems solutions and software for commercial and government markets. Steve determines future development strategies, defines new software projects, and actively participates in the design and development of new software products for real-time computers. He was also a member of the IEEE committee that developed the POSIX 1003.1b and 1003.1c standards for real-time application interfaces and thread interfaces.
Linux is a registered trademark of Linus Torvalds. All other trademarks referred to herein are the property of their respective owners.