Add SMT/HT awareness to DragonFlyBSD scheduler

Mihai Carabas

Short description: The aim of this project is to add awareness of SMT (Simultaneous multithreading/HyperThreads) to DragonFly scheduler so that scheduling on multithreaded CPUs (HyperThreading technology) improves.

Name: Mihai Carabas

Email: mihai.carabas@gmail.com

Physical address: 10A-12 Merisor Street, Constanta, Romania

Phone number (include country and area code): +40730021550

 

I. Description:

In order to add support for the SMT to the scheduler, I have to establish answers for the following basic questions:

  1. which CPUs are siblings (share the same core)
  2. which is the status of each core
  3. where to schedule

Before I begin, I must get a strong understanding of the current scheduler implementation [1]. Also I have to take a good look in the initialization of the CPU information that the kernel is holding. For this I will start and dig from the entry point of the kernel [2][6]. As for documenting, a good start is the article [3] mentioned in your project proposal and the current linux implementation of the scheduler [4]. Another place where I can look is at the ULE scheduler from FreeBSD [8][9].

I have found a technical report from 2005, written by James R. Bulpin, which describes the entire hyper-threading technology (from hardware to software) [7]. Though is an old paper and there are current advanced SMT-aware schedulers, the theoretical base is the same and is a good place to understand well how I can reach my final goal. In this article, there is a section provinding different tests from where I can start developing the testing suite. 

After a brief documentation I can start working on the following goals:

A. Detect how many threads have each core

  • adding a generic way of detecting how many threads each core has (create a map of siblings for example). First, I will be looking into a simple extension where i would only want to differentiate between physical and logical cores. This can then be expanded so that I can take into account physical packages/CPUs in the system. I will also look in the FreeBSD ULE scheduler[8] because it has knowledge about the nodes in the system.
  • guard all new added information regarding HyperThreading (physical/logical CPU discovery). For this, I will add an option (Enable HT) in the config file [5].
  • create a mechanism that allows to modify the behaviour of the scheduler, from runtime, using sysctl (enable the new HT added opperations versus the original scheduler).

B. Effective SMT aware scheduling task

In order to achieve the final goal, I have to make a design of how threads will get scheduled. I will have to treat different cases:

  • "passive" scheduling - select a free physical CPU (if available) for the next thread to schedule
  • "active" scheduling" - if a physical CPU becomes idle and on another CPU are running two threads, migrate one of them on the idle physical cpu
  • what task to get from a list - get the one that was scheduled on that domain (physical CPU), no matter on what logical CPU has run before
  • tasks should stick to the same domain (physical CPU)

In order to achieve all cases presented above, a shared runqueue per physical CPU is required to fulfill all the requirements, as stated in the paper presenting the linux implementation [3]. But, in the first part of the project, I don't want to make changes to be too invasive. I will instead work with what it is now in usched_bsd4.c. After getting the basic idea working, I will try to made more radical changes, as stated above.

C. Testing and testing again

  • benchmarks with different types of loads.
    • intensive CPU long jobs
    • much more CPU short jobs
  • each benchmark will be run on the scheduler with and without HT support (using the sysctl option).
  • generate some test-cases from the test scenarios presented in James Bulpin's article [7]
  • another test scenario can be found here [10], where are compared sched_4bsd.c (HT-unaware) and sched_ule.c (HT-aware). Here they compile Apache2 with -j[2 | 3 |4] and they measure the overall time. Also they used synthetic tests using two utilities:  stream and ubench. We can
  • at this step, not only test the performance of the SMT aware scheduler, but discover any bugs introduced by this feature. Bugs will be easily noticeable, as they tend to cause a massive drop in performance under certain workloads or simply panic the system.
 

II. Project timeline


Note: I will finish this semester on 10th of June, but int the last three weeks is the session exam and I have no classes. There will be no problem for working as a full time job. Of course, I am willing to start working in advance (especial on documenting and make comfortable with the DragonFly kernel environment) because I always preffer finishing in advance all my work.

Before:
- accommodate with the developing platform and the kernel code

Week 1-2:
  • search the current linux kernel for SMT techniques (design issues) and techniques for discovering number of threads per physical CPU
  • search the FreeBSD ULE scheduler and see how are CPUs grouped and what decisions are taken in order to ensure the HT-aware scheduler
  • study the build system in order to add new option to guard all new added information (Enable HT option)

Week 3-4:
  • implement the mechanism of detecting how many threads each core has starting, by updating initialization CPU info
  • make this information, through some structures, available to the kernel scheduler (here will be some design issues - eg: how the structure will look like)
  • enable/disable HT from hardware to see if the information is consistent.

Week 5:
  • make changes to the scheduler to be SMT aware in case of a "passive" scheduling (when a new thread is ready for scheduling, first look for an idle physical CPU). Or instead of looking where to schedule, another option is to look where not to schedule
  • add a new sysctl option in order to disable/enable the future added above
  • submit for review

Week 6:
  • high workload benchmark to see if there are some perfomance changes after the new decision of scheduling was added in the last week
  • make changes to the scheduler to be SMT aware in case of a "active" scheduling (if 2 threads are running on the same physical CPU and another CPU becomes idle, move one of the theads there)
  • here will be introduced the problem of flip-flaping (one thread may spend more time moving around idle phyisical CPUs than executing).

Week7 - Mid-term evaluation:
  • Mid-term evaluation with awareness of how many threads per core are and basic functionalities for a SMT(HT) aware scheduler (ONLY passive scheduling). The option added in last week may not have been finished and tested.

Week8:
  • high workload benchmark to see if there are some perfomance changes after the new decision of scheduling was added in the week6 ("active" scheduling).
  • improving the scheduler's decision, as it must choose from the threads that ran on that physical CPU (no matter what logical CPU), not other threads that had run somewhere else
  • implementing this decision, we will make use of the hot caches

Week9:
  • benchmarks to see where we stand with the perfomance after all the new features were added
  • the threads should attempt to run everywhere on the physical CPU, not only on the logical CPU where was first planned to run.

Week10:
  • another new technique for HT-aware scheduler: when a thread is woken-up on a logical CPU, that is currently busy running other thread, and has a sibling in 
  • idle, the initial thread should be scheduled on the idle sibling, not somewhere else.

Week11-12:
  • high workload benchmark to see if there are some perfomance changes after the all new features added
  • it is a posibility that not all the techniques presented above, may be a perfect match with the current scheduler, but all of this can only be determined 
  • through benchmarking
  • here I can customize the sysctl option, to enable/disable certain option implemented for HT-awareness
  • compare performance between the HT aware kernel and HT un-aware kernel
  • submit for review.

Week13:
  • Final review.

All the benchmarking will be done with the synthetic utilities that was mentioned above and with real life workload (compiling sources using different number of concurrent processes; multiple "openssl speed" that stress the CPU cores).

Future work (after GSoC program):
  • add features for SMT affinity - low power consumption (for example, on laptops you want ocupy first all the logical cores from the physical CPU and then go 
  • to other idle physical CPUs)
  • implement any new scheduling ideas.
  • porting any needed drivers.
Deliverables: For both, on mid-term and on final evaluation, I will have the following ready: a diff patch with the modifications made to the kernel, a document describing what have I done and how, what problems I had. Also here I will include the benchmarks for each step of my implementation. This document will be updated weekly, so to cover all my work (failures and successes).

Resources: I’m currently running DragonFlyBSD on my laptop, on VMware, but after the first 2 weeks I’ll have my desktop available - for better performance. I currently don't have a machine with minimum 2 CPUs that have HyperThreading technology. I will try to get access to one machine from our faculty cluster for testing purposes.
As an editor, when it comes to C programming (especially kernel programming), I always prefer using vim with cscope/ctags support code browsing.

III. About me

I am a frist year student at network-security master program from Politehnica University of Bucharest, Computer Science and Engineering Department. As a 
research project, I am working at a microkernel from L4 family (the research is done only during the school time, as part of the master program). The goal is 
to create a High Available system, but we are currently adding different mechanisms and services that we need, to the microkernel, in order to accomplish our objective (eg. porting a MMC driver and a FAT filesystem from u-boot, in order to be able to store the checkpoints on a permanent memory location, add a feature to start at runtime different services [for now the services are included from the build step only], etc). I graduated bachelor also at Politehnica University of Bucharest. My favourite courses were regarding the operating systems design. In one of the classes we studied the interface to the kernel (how to use system calls) and have done the following projects in user-space (with POSIX and Win32 API): mini-shell, swapper&demmand pager (using virtual memory mechanisms), a client-server protocol using IPC and a generic monitor (using locking mechanisms from linux). We also had a class where we had studied operating systems internals and had some projects that implied kernel programming: driver for a serial port (UART16550), a minimal software RAID, a filesystem using linux VFS, syscall interceptor (that monitors and logs data about the calling processes), a new protocol for network and transport layer 
based on datagrams (the protocol have the role of a transport layer protocol but it was implemented at layer 3, over the data link layer. All these projects [11] were finished before deadline with a maximum score. There will be no problem respecting the deadlines.
Also on Bachelor I had a class that studied computer system architecture. Here we had to port the Viola-Jones Face Detection algorithm on CellBE and make use of the Cell architecture to improve perfomance (a SIMD implementation). I managed to get a running time five times lower than the original algorithm (there 
was a PPU and 8 SPUs - only the SPUs were doing calculations, the PPU was collecting the results and put them together).
Compilers are another interesting domain I have studied. I had to do a mini-compiler for COOL language (Classroom Object Oriented Language)[12]. Here I did syntactic and semantic analysis. Then, based on the AST (Abstract Syntax Tree), I generated intermediary code. On this code I applied different optimizations (finding dead variables, constant folding, etc).

As you can see, I worked on various platforms and I am able to adapt to changes. I worked on different flavours of kernels (microkernel and monolithic 
kernel). There will not be a problem adapting to DragonFlyBSD kernel programming. For some code samples, please look at my kernel programming projects [11].


Language skills:
- Good knowledge of C, Java, Python and Bash shell scripting.
- Good network administration skills (BGP, OSPF, STP, VLAN (dot1q), Apache2, Bind9, Postfix, MySQL, LDAP, tc - HTB)
- Basic knowledge of Scheme