Add SMT/HT awareness to DragonFlyBSD scheduler
Short description: The aim of this project is to add SMT (Simultaneous Multithreading / HyperThreading) awareness to the DragonFly scheduler so that scheduling on multithreaded CPUs (HyperThreading technology) improves.
Name: Mihai Carabas
Physical address: 10A-12 Merisor Street, Constanta, Romania
Phone number (include country and area code): +40730021550
In order to add SMT support to the scheduler, I have to establish answers to the following basic questions:
- which CPUs are siblings (share the same core)
- what the status of each core is
- where to schedule
Before I begin, I must get a strong understanding of the current scheduler implementation. I also have to take a good look at the initialization of the CPU information that the kernel holds; for this I will start digging from the entry point of the kernel. As for documentation, a good starting point is the article mentioned in your project proposal and the current Linux implementation of the scheduler. Another place to look is the ULE scheduler from FreeBSD.
I have found a technical report from 2005, written by James R. Bulpin, which describes the entire hyper-threading technology (from hardware to software). Though it is an old paper and more advanced SMT-aware schedulers exist today, the theoretical base is the same, so it is a good place to understand how I can reach my final goal. The report also contains a section providing different tests from which I can start developing the testing suite.
After this brief documentation phase, I can start working on the following goals:
A. Detect how many threads each core has
- add a generic way of detecting how many threads each core has (for example, by creating a map of siblings). First, I will look into a simple extension that only differentiates between physical and logical cores. This can then be expanded to also take the physical packages/CPUs in the system into account. I will also look at the FreeBSD ULE scheduler because it has knowledge about the nodes in the system.
- guard all newly added information regarding HyperThreading (physical/logical CPU discovery). For this, I will add an option (Enable HT) in the config file.
- create a mechanism that allows the scheduler's behaviour to be modified at runtime using sysctl (enabling the new HT operations versus the original scheduler).
B. Effective SMT-aware scheduling
In order to achieve the final goal, I have to design how threads will get scheduled. I will have to treat different cases:
- "passive" scheduling - select a free physical CPU (if available) for the next thread to schedule
- "active" scheduling" - if a physical CPU becomes idle and on another CPU are running two threads, migrate one of them on the idle physical cpu
- which task to take from a list - take the one that was scheduled on that domain (physical CPU), no matter which logical CPU it ran on before
- tasks should stick to the same domain (physical CPU)
In order to cover all the cases presented above, a shared runqueue per physical CPU is required, as stated in the paper presenting the Linux implementation. But in the first part of the project I don't want the changes to be too invasive, so I will instead work with what is now in usched_bsd4.c. After getting the basic idea working, I will try to make the more radical changes stated above.
C. Testing and testing again
- benchmarks with different types of loads:
- long, CPU-intensive jobs
- many short CPU jobs
- each benchmark will be run on the scheduler with and without HT support (using the sysctl option).
- generate some test-cases from the test scenarios presented in James Bulpin's article 
- another test scenario can be found here, where sched_4bsd.c (HT-unaware) and sched_ule.c (HT-aware) are compared. They compile Apache2 with -j[2 | 3 | 4] and measure the overall time, and they also run synthetic tests using two utilities: stream and ubench. We can adapt these tests for DragonFly.
- at this step, not only test the performance of the SMT-aware scheduler, but also discover any bugs introduced by this feature. Bugs will be easily noticeable, as they tend to cause a massive drop in performance under certain workloads or simply panic the system.
II. Project timeline
- search the current Linux kernel for SMT techniques (design issues) and for techniques for discovering the number of threads per physical CPU
- study the FreeBSD ULE scheduler to see how CPUs are grouped and what decisions are taken to make the scheduler HT-aware
- study the build system in order to add a new option guarding all the newly added information (the Enable HT option)
- implement the mechanism for detecting how many threads each core has, starting by updating the CPU info initialization
- make this information available to the kernel scheduler through some structures (there will be some design issues here - e.g. what the structures will look like)
- enable/disable HT in hardware to check that the detected information is consistent
- make changes to the scheduler to be SMT-aware in the "passive" scheduling case (when a new thread is ready for scheduling, first look for an idle physical CPU). Or, instead of looking at where to schedule, another option is to look at where not to schedule
- add a new sysctl option in order to enable/disable the feature added above
- submit for review
- high-workload benchmark to see if there are any performance changes after the new scheduling decision added in the last week
- make changes to the scheduler to be SMT-aware in the "active" scheduling case (if two threads are running on the same physical CPU and another CPU becomes idle, move one of the threads there)
- this introduces the problem of flip-flopping (a thread may spend more time moving between idle physical CPUs than executing)
- Mid-term evaluation: awareness of how many threads per core there are, plus the basic functionality of an SMT (HT) aware scheduler (ONLY passive scheduling). The option added in the last week may not be finished and tested yet.
- high-workload benchmark to see if there are any performance changes after the new scheduling decision added in week 6 ("active" scheduling)
- improve the scheduler's decision: it must choose from the threads that ran on that physical CPU (no matter which logical CPU), not threads that ran somewhere else
- by implementing this decision, we will make use of the hot caches
- benchmarks to see where we stand with the performance after all the new features were added
- threads should attempt to run anywhere on the physical CPU, not only on the logical CPU where they were first planned to run.
- another new technique for an HT-aware scheduler: when a thread is woken up on a logical CPU that is currently busy running another thread and its sibling is idle, the thread should be scheduled on the idle sibling, not somewhere else.
- high-workload benchmark to see if there are any performance changes after all the new features were added
- it is possible that not all the techniques presented above will be a perfect match for the current scheduler, but this can only be determined through benchmarking
- here I can refine the sysctl option to enable/disable each individual option implemented for HT-awareness
- compare performance between the HT-aware kernel and the HT-unaware kernel
- submit for review.
- Final review.
- add features for SMT affinity - low power consumption (for example, on laptops you want to occupy all the logical cores of one physical CPU first and only then go to other idle physical CPUs)
- implement any new scheduling ideas.
- porting any needed drivers.