An A4 version of the thesis is available here, and the B5 version (as it will appear in print) is available here.
Simon Kågström, Håkan Grahn and Lars Lundberg. Scalability vs. Development Effort for Multiprocessor Operating System Kernels
Abstract
With multiprocessors becoming increasingly common, many operating systems have to be adapted to run on multiprocessor systems. In this paper, we present a categorization of porting methods for multiprocessor operating systems. We also perform a case study of the evolution of multiprocessor support in the Linux kernel, both in terms of performance and implementation complexity.
Simon Kågström, Håkan Grahn and Lars Lundberg. The Design and Implementation of Multiprocessor Support for an Industrial Operating System Kernel, To appear in the International Journal of Computers and Their Applications
Abstract
The ongoing transition from uniprocessor to multi-core computers requires support from the operating system kernel. Although many general-purpose multiprocessor operating systems exist, there is a large number of specialized operating systems which require porting in order to work on multiprocessors. In this paper we describe the multiprocessor port of a cluster operating system kernel from a producer of industrial systems. Our initial implementation uses a giant locking scheme that serializes kernel execution. We also employed a method in which CPU-local variables are placed in a special section mapped to per-CPU physical memory pages. The giant lock and CPU-local section allowed us to implement an initial working version with only minor changes to the original code, although the giant lock and kernel-bound applications limit the performance of our multiprocessor port. Finally, we also discuss experiences from the implementation.
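The giant-lock idea in the abstract above can be illustrated with a small, hypothetical sketch. It is not the kernel's code: threads stand in for CPUs, a single `threading.Lock` plays the giant lock, and a per-CPU list slot is only an analogy for the CPU-local section (which the paper implements through per-CPU physical memory pages). All names (`NUM_CPUS`, `syscall`, `simulate`) are assumptions made for this example.

```python
import threading

NUM_CPUS = 4  # hypothetical CPU count for this sketch

giant_lock = threading.Lock()       # one lock serializing all "kernel" execution
shared_counter = 0                  # kernel-global state, protected by the giant lock
per_cpu_syscalls = [0] * NUM_CPUS   # analogy for CPU-local data: one slot per CPU


def syscall(cpu_id):
    """Enter the 'kernel': only one CPU may hold the giant lock at a time."""
    global shared_counter
    with giant_lock:
        shared_counter += 1             # shared state is safe under the giant lock
        per_cpu_syscalls[cpu_id] += 1   # each CPU only ever writes its own slot


def run_cpu(cpu_id, n_calls):
    """Simulate one CPU issuing n_calls kernel entries."""
    for _ in range(n_calls):
        syscall(cpu_id)


def simulate(n_calls=1000):
    """Run NUM_CPUS simulated CPUs and return (shared total, per-CPU counts)."""
    global shared_counter
    shared_counter = 0
    for cpu in range(NUM_CPUS):
        per_cpu_syscalls[cpu] = 0
    threads = [threading.Thread(target=run_cpu, args=(cpu, n_calls))
               for cpu in range(NUM_CPUS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared_counter, list(per_cpu_syscalls)
```

Because every kernel entry takes the same lock, the original uniprocessor code paths need few changes, but kernel-bound workloads cannot run in parallel, which matches the performance limitation the abstract mentions.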
Simon Kågström, Balazs Tuska, Håkan Grahn and Lars Lundberg. Implementation issues and evolution of a multiprocessor operating system port, submitted for publication
Abstract
As multiprocessors become more and more common, operating system support for these systems is increasingly important. In this paper, we describe the evolution and performance of a multiprocessor port of a special-purpose cluster operating system. The port is based on an earlier prototype which serializes kernel execution. This paper describes experiences, problems and solutions from the transition to a coarse-grained approach. Our evaluation shows an improvement over both the uniprocessor and the serialized execution approach on a 4-core multiprocessor, and we also show how a lock-free heap improves performance for Java applications.
Simon Kågström, Håkan Grahn and Lars Lundberg. The Application Kernel Approach - a Novel Approach for Adding SMP Support to Uniprocessor Operating Systems, In Software: Practice and Experience, volume 36, issue 14, November 2006
Abstract
The current trend of using multiprocessor computers for server applications requires operating system adaptations to take advantage of more powerful hardware. However, modifying large bodies of software is very costly and time-consuming, and the cost of porting an operating system to a multiprocessor might not be justified by the potential performance benefits. In this paper we present a novel method, the application kernel approach, for adaptation of an existing uniprocessor kernel to multiprocessor hardware. Our approach considers the existing uniprocessor kernel as a "black box", to which no or very small changes are made. Instead, the original kernel runs OS services unmodified on one processor whereas the other processors execute applications on top of a small custom kernel. We have implemented the application kernel for the Linux operating system, which illustrates that the approach can be realized with fairly small resources. We also present an evaluation of the performance and complexity of our approach, where we show that it is possible to achieve good performance while at the same time keeping the implementation complexity low.
Simon Kågström, Håkan Grahn and Lars Lundberg. Automatic Low Overhead Program Instrumentation with the LOPI Framework, In proceedings of the 9th Workshop on Interaction between Compilers and Computer Architectures, San Francisco, CA, USA, February 13 2005
Abstract
Program instrumentation is an important technique for different tasks such as performance measurements, debugging, and coverage analysis. Instrumentation, however, poses two important requirements to be useful: it must be easy to apply and it should perturb the application as little as possible. In this paper, we present the LOPI framework, which provides a simple means to automatically instrument binary files with low perturbation. An evaluation of the LOPI framework with detailed measurements of seven SPEC CPU2000 benchmarks shows that it gives lower perturbation in terms of instructions executed and cache behavior than Dyninst. For example, for a common performance-oriented instrumentation, a LOPI-instrumented application executes on average 36% more instructions than the uninstrumented application, while a Dyninst-instrumented application executes 49% more.
Simon Kågström, Håkan Grahn and Lars Lundberg. Cibyl - an Environment for Language Diversity on Mobile Devices, In proceedings of the Virtual Execution Environments (VEE), San Diego, USA, June 13-15 2007
Abstract
With an estimated installation base of around 1 billion units, the Java J2ME platform is one of the largest development targets available. For mobile devices, J2ME is often the only available environment. For the very large body of software written in C and other languages, this means difficult and costly porting to another language to support J2ME devices. This paper presents the Cibyl programming environment, which allows existing code written in C and other languages supported by GCC to be recompiled into Java bytecode and run with close to native Java performance on J2ME devices. Cibyl translates compiled MIPS binaries into Java bytecode. In contrast to other approaches, Cibyl supports the full C language, is based on unmodified standard tools, and does not rely on source code conversion. To achieve good performance, Cibyl employs extensions to the MIPS architecture to support low-overhead calls to native Java functionality and uses knowledge of the MIPS ABI to avoid computing unused values and transferring unnecessary registers. An evaluation on multiple virtual machines shows that Cibyl achieves performance similar to native Java, with results ranging from a slowdown of around 2 to a speedup of over 9 depending on the JVM and the benchmark.
Simon Kågström, Håkan Grahn and Lars Lundberg. Optimizations in the Cibyl binary translator for J2ME devices, In INTERACT-12: Workshop on Interaction between Compilers and Computer Architectures, Salt Lake City, February 2008
Abstract
The Java J2ME platform is one of the largest software platforms available, and often the only available development platform for mobile phones, which is a problem when porting C or C++ applications. The Cibyl binary translator targets this problem, translating MIPS binaries into Java bytecode to run on J2ME devices. This paper presents the optimization framework used by Cibyl to provide compact and well-performing translated code. Cibyl optimizes expensive multiplications/divisions, floating point support, and function co-location to Java methods, and provides a peephole optimizer. The paper also evaluates Cibyl performance both in a real-world GPS navigation application, where the optimizations increase the display update frequency by around 15%, and in a comparison against native Java and the NestedVM binary translator, where we show that Cibyl can provide significant advantages for common code patterns.
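As a generic illustration of what a peephole optimizer does (not Cibyl's actual rule set, which the abstract does not describe), the following minimal sketch rewrites short windows of a toy, bytecode-like instruction list. The instruction names mimic JVM mnemonics, but the tuple encoding and the three rewrite rules are assumptions chosen for the example.

```python
def peephole(code):
    """Rewrite adjacent instruction pairs in a toy stack-machine program.

    Instructions are tuples such as ("iconst", 8) or ("imul",).
    Hypothetical rules, applied to the two most recently emitted instructions:
      iconst 0; iadd    -> (removed)             adding zero is a no-op
      iconst 1; imul    -> (removed)             multiplying by one is a no-op
      iconst 2^k; imul  -> iconst k; ishl        shift replaces multiply
    """
    out = []
    for insn in code:
        out.append(insn)
        if len(out) < 2:
            continue
        prev, cur = out[-2], out[-1]
        if prev[0] != "iconst":
            continue
        value = prev[1]
        if cur == ("iadd",) and value == 0:
            del out[-2:]
        elif cur == ("imul",) and value == 1:
            del out[-2:]
        elif cur == ("imul",) and value > 1 and value & (value - 1) == 0:
            # value is a power of two: x * 2^k equals x << k for wrapping ints
            out[-2:] = [("iconst", value.bit_length() - 1), ("ishl",)]
    return out


program = [
    ("iload", 0),
    ("iconst", 8), ("imul",),   # becomes iconst 3; ishl
    ("iconst", 0), ("iadd",),   # removed
    ("istore", 1),
]
optimized = peephole(program)
```

The optimizer only inspects a small sliding window, so it is cheap to run and easy to extend with further patterns; the cost is that it cannot see opportunities that span more than the window.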