Probabilistic Models for the Execution Time of Individual Tasks in Stochastic Scheduling
June 7th 2020
The execution time of programs is a key element in many areas of computer science, mainly those where achieving good performance (e.g., scheduling in cloud computing) or a predictable one (e.g., meeting deadlines in embedded systems) is the objective. Despite being random variables, execution times are most often treated as deterministic in the literature, with few works taking advantage of their randomness; even in those, the underlying distributions are assumed as being normal or uniform for no particular reason. In this work we investigate these distributions in various machines and algorithms. A mathematical problem arises when dealing with samples whose populational minimum is unknown, so a significant portion of this monograph is dedicated to such problem. We propose several different effective or computationally cheap ways to overcome the problem, which also apply to execution times. These methods are tested experimentally, and results point to the superiority of our proposed inference methods. We demonstrate the existence of execution time distributions with long tails, and also conclude that two particular probability distributions were the most suitable for modelling all execution times. While we do not discuss direct applications to stochastic scheduling, we hope to promote the usage of probabilistic execution times to yield better results in, for example, task scheduling.
In this project we investigate the underlying distribution of execution times of programs, attempting to reason about:
The main hindrance we found was performing maximum likelihood estimation over variables whose ``location'' (or populational minimum) is large and unknown, which we have thoroughly investigated and thus far proposed six inference methods that are discussed and justified in the thesis. On top of that, experiments were performed to test the capacity of a specified set of distribution families to fit samples of execution times obtained experimentally. Results demonstrate the existence of execution time distributions with heavy tails on either side, which could indeed have significant impact in scheduling decisions. We show that the exponentiated Weibull and the odd log-logistic generalized gamma distributions achieve very good performance when fitting samples of execution times.
The experimental data can be found here. File naming is on the format
[algorithm]-[machine]-[problem-size1]-...-[problem-sizeN].txt, and the file has 1000 samples for each of these problem
sizes, in the same order that they appear in the file name.
dijkstra_helix_500K_1M_2M_10M.txt refers to the Dijkstra program ran on the Helix machine with problem
sizes 500K, 1M, 2M and 10M; the first 1000 samples refer to the 500K problem size, the next 1000 samples to the 1M problem size,
and so on.
|Andromeda||AMD FX(tm)-8350 Eight-Core Processor||4x Corsair 8GB DIMM DDR3 Synchronous 800 MHz||Seagate HD 2TB ST2000DM001-1ER1||Gigabyte 970A-D3|
|HalleyHD||Intel Core i7-4790 CPU 3.60GHz||4x AMI 8GB DIMM DDR3 Synchronous 1600 MHz||Seagate HD 2TB ST2000DM001-1CH1||Gigabyte Z97X-SLI-CF|
|HalleySSD||Intel Core i7-4790 CPU 3.60GHz||4x AMI 8GB DIMM DDR3 Synchronous 1600 MHz||Kingston SSD 240GB SA400S3||Gigabyte Z97X-SLI-CF|
|Helix||Intel Core i5-4440 CPU 3.10GHz||4x Kingston 4GB DIMM DDR3 Synchronous 1333 MHz||Kingston SSD 240GB SA400S3||Gigabyte Z87-D3HP-CF|