Dimitrios Los
I completed my PhD under the supervision of Prof Thomas Sauerwald at the University of Cambridge, where I mostly worked on randomised algorithms for load balancing. I was fortunate to be a member of St John's College, which, together with the Cambridge Trust, funded my PhD. I am currently a postdoctoral fellow with George Giakkoupis in the WIDE team at INRIA, Rennes.
Contact details
If you think I could be of help, don't hesitate to contact me.
Email: [JavaScript required]
Publications
- Naively sorting evolving data is optimal and robust [ FOCS 2024 ][ Abstract ][ arXiv ][ visualisations ]
In Proceedings of the 65th IEEE Symposium on Foundations of Computer Science (to appear)
We study comparison sorting in the evolving data model [AKMU11], where the true total order changes while the sorting algorithm is processing the input. More precisely, each comparison operation of the algorithm is followed by a sequence of evolution steps, where an evolution step perturbs the rank of a random item by a "small" random value. The goal is to maintain an ordering that remains close to the true order over time. Previous works have analyzed adaptations of classic sorting algorithms, assuming that an evolution step changes the rank of an item by just one, and that a fixed constant number $b$ of evolution steps take place between two comparisons. In fact, the only previous result achieving optimal $O(n)$ total deviation from the true order, where $n$ is the number of items, applies just for $b=1$ [BvDEGJ18].
We analyze a very simple sorting algorithm suggested in [M14], which samples a random pair of adjacent items in each step and swaps them if they are out of order. We show that the algorithm achieves and maintains, with high probability, optimal total deviation, $O(n)$, and optimal maximum deviation, $O(\log n)$, under very general model settings. Namely, the perturbation introduced by each evolution step is sampled from a general distribution of bounded moment generating function, and we just require that the *average* number of evolution steps between two sorting steps be bounded by an (arbitrary) constant, where the average is over a linear number of steps.
The key ingredients of our proof are a novel potential function argument that inserts "gaps" in the list of items, and a general analysis framework which separates the analysis of sorting from that of the evolution steps, and is applicable to a variety of settings for which previous approaches do not apply. Our results settle conjectures by [AKMU11] and [M14], and provide theoretical support for the empirical evidence that simple quadratic algorithms are optimal and robust for sorting evolving data [BvDEGJ18].
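The process itself is only a few lines long. Below is a minimal simulation sketch (illustrative only, not code from the paper): it uses the simplest evolution model, where each evolution step swaps a random adjacent pair in the true order, and the parameters `n`, `steps` and `b` are arbitrary choices.

```python
import random

def simulate(n=200, steps=200_000, b=1):
    """Toy simulation of the adjacent-swap sorter on evolving data.

    true_rank[x] is the (evolving) true rank of item x; `order` is the ordering
    maintained by the algorithm. Between consecutive comparisons, b evolution
    steps each swap a uniformly random adjacent pair in the true order."""
    true_rank = {x: x for x in range(n)}
    rank_to_item = list(range(n))
    order = list(range(n))
    random.shuffle(order)

    def evolve():
        r = random.randrange(n - 1)              # perturb a random adjacent pair of true ranks
        a, c = rank_to_item[r], rank_to_item[r + 1]
        rank_to_item[r], rank_to_item[r + 1] = c, a
        true_rank[a], true_rank[c] = r + 1, r

    for _ in range(steps):
        i = random.randrange(n - 1)              # sorting step: compare a random adjacent pair
        if true_rank[order[i]] > true_rank[order[i + 1]]:
            order[i], order[i + 1] = order[i + 1], order[i]
        for _ in range(b):                       # evolution steps between comparisons
            evolve()

    # total deviation: sum over items of |position in `order` - true rank|
    pos = {x: i for i, x in enumerate(order)}
    return sum(abs(pos[x] - true_rank[x]) for x in range(n))

if __name__ == "__main__":
    print(simulate())
```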
- An asymptotically optimal algorithm for generating bin cardinalities [ MATCOM ][ doi ][ arXiv ][ Abstract ]
In Mathematics and Computers in Simulation, vol. 228, February 2025, p. 147-155
In the balls-into-bins setting, $n$ balls are thrown uniformly at random into $n$ bins. The naïve way to generate the final load vector takes $\Theta(n)$ time. However, it is well known that, with high probability, the maximum load is $\Theta(\frac{\log n}{\log \log n})$, and hence the load vector contains only $\Theta(\frac{\log n}{\log \log n})$ distinct bin cardinalities. Here, we present an algorithm in the RAM model that generates the bin cardinalities of the final load vector in the optimal $\Theta(\frac{\log n}{\log \log n})$ time, in expectation and with high probability.
Further, the algorithm that we present is still optimal for any $m \in [n, n \log n]$ balls and can also be used as a building block to efficiently simulate more involved load balancing algorithms. In particular, for the $\TwoChoice$ algorithm, which samples two bins in each step and allocates to the least-loaded of the two, we obtain roughly a quadratic speed-up over the naïve simulation.
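For reference, the naïve $\Theta(n)$ baseline that the paper improves upon is straightforward to state; the sketch below (not the algorithm from the paper) throws the balls one by one and tabulates how many bins have each load value.

```python
import random
from collections import Counter

def naive_bin_cardinalities(n):
    """Naive Theta(n) baseline: throw n balls into n bins uniformly at random
    and tabulate, for each load value k, how many bins have load exactly k."""
    loads = [0] * n
    for _ in range(n):
        loads[random.randrange(n)] += 1
    return Counter(loads)

if __name__ == "__main__":
    # W.h.p. only Theta(log n / log log n) distinct load values appear.
    print(naive_bin_cardinalities(1_000_000))
```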
- Balanced allocations under incomplete information: New settings and techniques [ PhD Dissertation ][ doi-1 doi-2 ][ Abstract ][ visualisations ]
In the balanced allocations framework, there are \( m \) balls to be allocated into \( n \) bins with the aim of minimising the maximum load of any of the bins, or equivalently minimising the gap, i.e., the difference between the maximum load and the average load. In this dissertation, we focus on the heavily-loaded case where \( m \gg n \), which tends to be more challenging to analyse.
In a decentralised setting, the simplest process is \( \OneChoice \), which allocates each ball to a bin sampled uniformly at random. It is well-known that w.h.p. \( \Gap(m) = \Theta( \sqrt{(m/n) \cdot \log n } ) \) for any \( m \gg n \). A great improvement over this is the \( \TwoChoice \) process [ABKU99, KLM96], which allocates each ball to the least loaded of two bins sampled uniformly at random. Berenbrink, Czumaj, Steger, and Vöcking (2006) showed that w.h.p. \( \Gap(m) = \log_2 \log n + \Theta(1) \) for any \( m \geq n \). This improvement is known as the "power of two choices". It has found several applications in hashing, load balancing and routing, and its importance was recently recognised by the 2020 ACM Paris Kanellakis Theory and Practice Award.
In this dissertation, we introduce a set of techniques based on potential functions. These enable us to analyse (both in terms of gap and load distribution) a wide range of processes and settings in the heavily-loaded case and to establish interesting insights in the balanced allocations framework:
We analyse variants of the \(\TwoChoice\) process which trade sample efficiency, completeness of information and gap guarantees. For the \( (1+\beta) \)-process, which takes a \(\TwoChoice\) step with probability \( \beta \in (0, 1] \) and a \(\OneChoice\) step otherwise, we prove tight bounds for small and large \( \beta \), extending the results of Peres, Talwar and Wieder (2015). Another sample-efficient family is that of \( \TwoThinning \) processes, which allocate to one of two sampled bins, making the decision in an online manner. For \( \TwoThinning \) processes whose decision function uses thresholds relative to the average load, or thresholds in the rank domain, we establish tight bounds and also resolve a conjecture by Feldheim and Gurel-Gurevich (2021). We also quantify trade-offs between the number of queries and the gap bound for two-sample processes, establishing a "power of two queries" phenomenon.
We analyse the \( \TwoChoice \) process with random, adversarial and delay noise, proving tight bounds for various settings. In the adversarial setting, the adversary may decide which of the two sampled bins the ball is allocated to, but only when the two loads differ by at most \( g \). The analysis of this setting implies bounds for settings with random noise and delay.
For the setting where load information is updated periodically every \( b \) steps, for \( b = n \) we tighten the bound of [BCEFN12] to \( \Theta(\log n/\log \log n ) \) and prove that \( \TwoChoice \) is optimal in this setting for any \( b \in [n \cdot \exp(-\log^c n), n \log n] \) and any constant \( c > 0 \). For \( b \in [n \log n, n^3] \), we show that \( \TwoChoice \) achieves w.h.p. a \( \Theta(b/n) \) gap, while surprisingly the \( (1+\beta) \)-process with appropriately chosen \( \beta \) achieves w.h.p. a \( \Theta( \sqrt{ (b/n) \cdot \log n } ) \) gap, which is optimal over a large family of processes. This proves that in the presence of outdated information, less aggressive strategies can outperform greedy processes (such as \( \TwoChoice \)), which has been empirically observed in the queuing setting [D00, M00] for centralised processes since 2000, but to the best of our knowledge had not been formally proven.
Next, we analyse \( \TwoChoice \) in the graphical setting, where bins are vertices of a graph and each ball is allocated to the less loaded of the two endpoints of a randomly sampled edge. We extend the results of Kenthapadi and Panigrahy (2006), proving that for dense expanders in the heavily-loaded case the gap is w.h.p. \( O(\log \log n) \). In the presence of weights, we make progress towards [Open Problem 1, PTW15] by proving that for graphs with conductance \( \phi \), the gap is w.h.p. \( O(\log n / \phi) \).
Further, we introduce and analyse processes which can allocate more than one ball to a sampled bin. We prove that these processes achieve w.h.p. an \( O(\log n) \) gap (which also applies to any \( d \)-regular graph), while still being more sample-efficient than \( \OneChoice \) ("power of filling").
For the \( \Memory \) process that can store bins in a cache, we generalise the \( O(\log \log n) \) gap bound by Mitzenmacher, Prabhakar and Shah (2002) to the heavily-loaded case and prove a matching lower bound. Further, in the presence of heterogeneous sampling distributions, we establish a striking difference between \( \TwoChoice \) (or even \( \DChoice \) with \( d = O(1) \)) and \( \Memory \), showing that for the latter the gap is bounded, while for the former it is known to diverge [W07] ("power of memory").
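For readers who prefer code to notation, here is a minimal simulation sketch (illustrative only, not taken from the dissertation) of the three basic processes discussed above; the parameter choices are arbitrary, and running it shows the qualitative separation between the \(\OneChoice\), \((1+\beta)\) and \(\TwoChoice\) gaps in the heavily-loaded case.

```python
import random

def gap(loads):
    """Gap = maximum load minus average load."""
    return max(loads) - sum(loads) / len(loads)

def one_choice(n, m):
    loads = [0] * n
    for _ in range(m):
        loads[random.randrange(n)] += 1
    return gap(loads)

def two_choice(n, m):
    loads = [0] * n
    for _ in range(m):
        i, j = random.randrange(n), random.randrange(n)
        loads[i if loads[i] <= loads[j] else j] += 1
    return gap(loads)

def one_plus_beta(n, m, beta):
    """With probability beta take a TwoChoice step, otherwise a OneChoice step."""
    loads = [0] * n
    for _ in range(m):
        if random.random() < beta:
            i, j = random.randrange(n), random.randrange(n)
            i = i if loads[i] <= loads[j] else j
        else:
            i = random.randrange(n)
        loads[i] += 1
    return gap(loads)

if __name__ == "__main__":
    n, m = 1000, 100 * 1000          # heavily loaded: m >> n
    print("OneChoice:", one_choice(n, m))
    print("TwoChoice:", two_choice(n, m))
    print("(1+beta), beta=0.5:", one_plus_beta(n, m, 0.5))
```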
- Balanced allocations in batches: The tower of two choices [ SPAA 2023 ][ doi ][ arXiv ][ Abstract ][ visualisations ][ code ]
In Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, p. 51-61
In the balanced allocations framework, the goal is to allocate \(m\) balls into \(n\) bins, so as to minimize the gap (the difference between the maximum and the average load). The \(\OneChoice\) process allocates each ball to a randomly sampled bin and achieves w.h.p. a \(\Theta(\sqrt{(m/n) \cdot \log n})\) gap. The \(\TwoChoice\) process allocates to the least loaded of two randomly sampled bins, and achieves w.h.p. a \(\log_2 \log n + \Theta(1)\) gap. Finally, the \((1+\beta)\) process interpolates between these two, taking a \(\TwoChoice\) step with probability \(\beta \in (0, 1)\) and a \(\OneChoice\) step otherwise, and achieves w.h.p. a \(\Theta(\log n/\beta)\) gap.
We focus on the outdated information setting of Berenbrink, Czumaj, Englert, Friedetzky and Nagel (2012), where balls are allocated in batches of size \(b\). For almost the entire range \(b \in [1,\mathcal{O}(n \log n)]\), it was shown in [LS22a] that \(\TwoChoice\) achieves w.h.p. the asymptotically optimal gap and for \(b = \Omega(n\log n)\) it was shown in [LS22b] that it achieves w.h.p. a \(\Theta(b/n)\) gap.
In this work, we establish that the \((1+\beta)\) process, for appropriately chosen \(\beta\), achieves w.h.p. the asymptotically optimal gap of \(\Oh(\sqrt{(b/n) \cdot \log n})\) for any \(b \in [2n \log n, n^3]\). This not only proves the surprising phenomenon that allocating greedily based on \(\TwoChoice\) is not the best strategy, but also that mixing two processes (\(\OneChoice\) and \(\TwoChoice\)) leads to a gap smaller than that of either. Furthermore, the upper bound on the gap applies to a larger family of processes and continues to hold in the presence of weights sampled from distributions with bounded MGFs.
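The batched setting is easy to simulate: within each batch, all decisions are made against a snapshot of the loads taken at the start of the batch. The sketch below is illustrative only (all parameters are arbitrary choices); \(\beta = 1\) recovers \(\TwoChoice\), and \(\beta < 1\) gives the \((1+\beta)\) process.

```python
import random

def batched_gap(n, m, b, beta=1.0):
    """Balls arrive in batches of size b; within a batch, every allocation
    decision uses the (outdated) loads from the start of the batch."""
    loads = [0] * n
    for start in range(0, m, b):
        snapshot = loads[:]                      # load information at the batch start
        for _ in range(min(b, m - start)):
            if random.random() < beta:           # TwoChoice step on stale loads
                i, j = random.randrange(n), random.randrange(n)
                i = i if snapshot[i] <= snapshot[j] else j
            else:                                # OneChoice step
                i = random.randrange(n)
            loads[i] += 1
    return max(loads) - sum(loads) / n

if __name__ == "__main__":
    n = 1000
    for b in (n, 10 * n, 100 * n):
        print(b,
              round(batched_gap(n, 200 * n, b, beta=1.0), 1),
              round(batched_gap(n, 200 * n, b, beta=0.5), 1))
```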
- Balanced allocations with heterogeneous bins: The power of memory [ SODA 2023 ][ doi ][ arXiv ][ Abstract ][ visualisations ]
In Proceedings of the 34th Annual ACM-SIAM Symposium on Discrete Algorithms, p. 4448-4477
We consider the allocation of \(m\) balls (jobs) into \(n\) bins (servers). In the standard \(\TwoChoice\) process, at each step \(t=1,2,\ldots,m\) we first sample two bins uniformly at random and place a ball in the least loaded bin. It is well-known that for any \(m \geq n\), this results in a gap (difference between the maximum and average load) of \(\log_2 \log n + \Theta(1)\) (with high probability). In this work, we consider the memory process by Shah and Prabhakar (2002) where instead of two choices, we only sample one bin per step but we have access to a cache which can store the location of one bin. Mitzenmacher, Prabhakar and Shah (2002) showed that in the lightly loaded case (\(m = n\)), the memory process achieves a gap of \(\Oh(\log \log n)\).
Extending the setting of [MPS02] in two ways, we first allow the number of balls \(m\) to be arbitrary, which includes the challenging heavily loaded case where \(m \geq n\). Secondly, we follow the heterogeneous bins model of Wieder (2007), where the sampling distribution of bins can be biased up to some arbitrary multiplicative constant. Somewhat surprisingly, we prove that even in this setting, the memory process still achieves an \(\Oh(\log \log n)\) gap bound. This is in stark contrast with the \(\TwoChoice\) (or any \(d{\rm \text{-}C{\small HOICE}}\) with \(d=\Oh(1)\)) process, where it is known that the gap diverges [W07]. Finally, we prove a tight gap bound of \(\Oh(\log n)\) for memory in another relaxed setting with heterogeneous (weighted) balls and a cache which can only be maintained for two steps.
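Below is a minimal sketch of the memory process (illustrative only, not the paper's code). It assumes one common variant of the rule, where after each allocation the cache keeps whichever of the sampled and cached bins is now less loaded; the skewed sampling weights are an arbitrary choice modelling the heterogeneous-bins setting.

```python
import random
from itertools import accumulate

def memory_gap(n, m, weights=None):
    """Memory process: one bin sample per step plus a cache holding one bin.
    The ball goes to the less loaded of the sampled and cached bins, and the
    cache then keeps whichever of the two is (now) less loaded."""
    loads = [0] * n
    bins = list(range(n))
    cum = list(accumulate(weights)) if weights else None   # heterogeneous sampling
    cache = random.randrange(n)
    for _ in range(m):
        i = random.choices(bins, cum_weights=cum)[0] if cum else random.randrange(n)
        if loads[i] <= loads[cache]:
            loads[i] += 1
        else:
            loads[cache] += 1
        if loads[i] < loads[cache]:                        # update the cache
            cache = i
    return max(loads) - m / n

if __name__ == "__main__":
    n = 1000
    skew = [2.0 if i < n // 2 else 1.0 for i in range(n)]  # bins biased by a constant factor
    print(memory_gap(n, 100 * n, weights=skew))
```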
- Balanced allocations with the choice of noise [ PODC 2022 – Best Student Paper Award ][ doi ][ arXiv ][ Abstract ][ visualisations ][ code ]
In Proceedings of the 41st ACM Symposium on Principles of Distributed Computing, p. 164–175
We consider the allocation of \(m\) balls (jobs) into \(n\) bins (servers). In the standard \(\TwoChoice\) process, at each step \(t=1,2,\ldots,m\) we first sample two randomly chosen bins, compare their two loads and then place a ball in the least loaded bin. It is well-known that for any \(m \geq n\), this results in a gap (difference between the maximum and average load) of \(\log_2 \log n + \Theta(1)\) (with high probability).
In this work, we consider \(\TwoChoice\) in different models with noisy load comparisons. One key model involves an adaptive adversary whose power is limited by some threshold \(g \in \mathbb{N}\). In each round, such an adversary can determine the result of any load comparison between two bins whose loads differ by at most \(g\), while if the load difference is greater than \(g\), the comparison is correct.
For this adversarial model, we first prove that for any \(m \geq n\) the gap is \(O(g+\log n)\) with high probability. Then through a refined analysis we prove that if \(g \leq \log n\), then for any \(m \geq n\) the gap is \(\mathcal{O}(\frac{g}{\log g} \cdot \log \log n)\). For constant values of \(g\), this generalizes the heavily loaded analysis of Berenbrink, Czumaj, Steger and Vöcking (2006) and of Talwar and Wieder (2014) for the \(\TwoChoice\) process, and establishes that asymptotically the same gap bound holds even if many (or possibly all) load comparisons among "similarly loaded" bins are wrong. Finally, we complement these upper bounds with tight lower bounds, which establishes an interesting phase transition on how the parameter \(g\) impacts the gap.
We also apply a similar analysis to other noise models, including ones where bins only update their load information with delay. For example, for the model of Berenbrink, Czumaj, Englert, Friedetzky and Nagel (2012), where balls are allocated in consecutive batches of size \(n\), we present an improved and tight gap bound of \(\Theta(\log n/ \log \log n )\).
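To illustrate the adversarial model, the sketch below (illustrative only, not from the paper) runs \(\TwoChoice\) where every comparison between bins whose loads differ by at most \(g\) is answered in the worst possible way, while all other comparisons are correct; the parameters are arbitrary.

```python
import random

def two_choice_with_g_adversary(n, m, g):
    """TwoChoice where comparisons between bins with load difference at most g
    are decided adversarially (the ball is sent to the heavier bin), while
    comparisons with load difference greater than g are answered correctly."""
    loads = [0] * n
    for _ in range(m):
        i, j = random.randrange(n), random.randrange(n)
        if abs(loads[i] - loads[j]) <= g:
            target = i if loads[i] >= loads[j] else j   # adversarial answer
        else:
            target = i if loads[i] < loads[j] else j    # correct answer
        loads[target] += 1
    return max(loads) - m / n

if __name__ == "__main__":
    n = 1000
    for g in (0, 2, 8, 32):
        print(g, round(two_choice_with_g_adversary(n, 100 * n, g), 1))
```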
- Balanced allocations in batches: Simplified and generalized [ SPAA 2022 ][ doi ][ arXiv ][ Abstract ][ visualisations ][ code ]
In Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures, p. 389–399
We consider the allocation of \(m\) balls (jobs) into \(n\) bins (servers). In the \(\TwoChoice\) process, for each of \(m\) sequentially arriving balls, two randomly chosen bins are sampled and the ball is placed in the least loaded bin. It is well-known that the maximum load is \(m/n+\log_2 \log n + \mathcal{O}(1)\) w.h.p.
Berenbrink, Czumaj, Englert, Friedetzky and Nagel (2012) introduced a parallel version of this process, where \(m\) balls arrive in consecutive batches of size \(b=n\) each. Balls within the same batch are allocated in parallel, using the load information of the bins at the beginning of the batch. They proved that the gap of this process is \(\mathcal{O}(\log n)\) with high probability.
In this work, we present a new analysis of this setting, which is based on exponential potential functions. This allows us to both simplify and generalize the analysis of [BCEFN12] in different ways:
- Our analysis covers a broad class of processes. This includes not only \(\TwoChoice\), but also processes with fewer bin samples like \((1+\beta)\), processes which can only receive one bit of information from each bin sample, and graphical allocation, where bins correspond to vertices in a graph (see the sketch after this list).
- Balls may be of different weights, as long as their weights are independent samples from a distribution satisfying a technical condition on its moment generating function.
- For arbitrary batch sizes \(b \geq n\), we prove a gap of \(\mathcal{O}(b/n \cdot \log n)\). For any \(b \in [n , n^3]\), we improve this to \(\mathcal{O}(b/n + \log n)\) and show that it is tight for a family of processes. This implies the unexpected result that, for example, for \((1+\beta)\) with constant \(\beta \in (0, 1]\), the gap is \(\Theta(\log n)\) for all \(b \in [n,n \log n]\). We also conduct experiments which support our theoretical results, and even hint at the superiority of less powerful processes like \((1+\beta)\) for large batch sizes.
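As referenced in the first item above, graphical allocation is perhaps the easiest of these variants to state in code. Below is a minimal sketch (illustrative only; the cycle graph and all parameters are arbitrary choices, not from the paper).

```python
import random

def graphical_gap(edges, n, m):
    """Graphical allocation: bins are the vertices of a graph; each ball
    samples a uniformly random edge and is placed on its less loaded endpoint."""
    loads = [0] * n
    for _ in range(m):
        u, v = random.choice(edges)
        loads[u if loads[u] <= loads[v] else v] += 1
    return max(loads) - m / n

if __name__ == "__main__":
    n, m = 1000, 10 * 1000
    cycle = [(i, (i + 1) % n) for i in range(n)]   # a sparse, non-expander example
    print(graphical_gap(cycle, n, m))
```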
- An improved drift theorem for balanced allocations [ TALG ][ doi ][ arXiv ][ Abstract ]
In ACM Transactions on Algorithms, vol. 20, iss. 4, p. 35:1-35:39
Refines and extends the content on the drift theorem and its applications in "Balanced Allocations in Batches: Simplified and Generalized".
In the balanced allocations framework, there are \(m\) jobs (balls) to be allocated to \(n\) servers (bins). The goal is to minimize the gap, the difference between the maximum and the average load. Peres, Talwar and Wieder (RSA 2015) used the hyperbolic cosine potential function to analyze a large family of allocation processes including the \((1+\beta)\)-process and graphical balanced allocations. The key ingredient was to prove that the potential drops in every step, i.e., a drift inequality.
In this work we improve the drift inequality so that (i) it is asymptotically tighter, (ii) it assumes weaker preconditions, (iii) it applies to processes allocating to more than one bin in a single step, and (iv) to processes allocating a varying number of balls depending on the sampled bin. Our applications include the processes of (RSA 2015), but also several new processes, and we believe that our techniques may lead to further results in future work.
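For concreteness, the potential in question is (up to constants) the hyperbolic cosine potential \( \Gamma = \sum_i \cosh(\alpha (x_i - m/n)) \). The sketch below (illustrative only, with arbitrary parameter choices) simply tracks this quantity along a run of the \((1+\beta)\) process.

```python
import math
import random

def hyperbolic_cosine_potential(loads, alpha):
    """Gamma = sum_i cosh(alpha * (x_i - average load)), the quantity whose
    expected one-step decrease the drift inequality controls."""
    avg = sum(loads) / len(loads)
    return sum(math.cosh(alpha * (x - avg)) for x in loads)

if __name__ == "__main__":
    n, beta, alpha = 1000, 0.5, 0.1     # arbitrary illustrative parameters
    loads = [0] * n
    for t in range(1, 100 * n + 1):
        if random.random() < beta:      # (1+beta): TwoChoice step with probability beta
            i, j = random.randrange(n), random.randrange(n)
            i = i if loads[i] <= loads[j] else j
        else:                           # otherwise a OneChoice step
            i = random.randrange(n)
        loads[i] += 1
        if t % (20 * n) == 0:
            print(t, round(hyperbolic_cosine_potential(loads, alpha), 1))
```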
- Tight bounds for repeated balls-into-bins [ STACS 2023 ][ doi ][ arXiv ][ Abstract ][ visualisations ][ code ]
In Proceedings of the 40th International Symposium on Theoretical Aspects of Computer Science, p. 45:1-45:22
We study the repeated balls-into-bins process introduced by Becchetti, Clementi, Natale, Pasquale and Posta (2019). This process starts with \(m\) balls arbitrarily distributed across \(n\) bins. At each step \(t=1,2,\ldots\), we select one ball from each non-empty bin, and then place each selected ball into a bin chosen independently and uniformly at random.
We prove the following results:
- For any \(n \leq m \leq \mathrm{poly}(n)\), we prove a lower bound of \(\Omega(m/n \cdot \log n)\) on the maximum load. For the special case \(m=n\), this matches the upper bound of \(\mathcal{O}(\log n)\), as shown in [BCNPP19]. It also provides a positive answer to the conjecture in [BCNPP19] that for \(m=n\) the maximum load is \(\omega(\log n/ \log \log n)\) in a polynomially large window. For \(m \in [\omega(n), n \log n]\), our new lower bound disproves the conjecture in [BCNPP19] that the maximum load remains \(\mathcal{O}(\log n)\).
- For any \(n \leq m \leq \mathrm{poly}(n)\), we prove an upper bound of \(\mathcal{O}(m/n \cdot \log n)\) on the maximum load for a polynomially large window. This matches our lower bound up to multiplicative constants.
- For any \(m \geq n\), our analysis also implies an \(\mathcal{O}(m^2 / n)\) waiting time to a configuration with \(\mathcal{O}(m/n \cdot \log m)\) maximum load, even for worst-case initial distributions.
- For \(m \geq n\), we show that every ball visits every bin in \(\mathcal{O}(m \log m)\) steps. For \(m = n\), this improves the previous upper bound of \(\mathcal{O}(n \log^2 n)\) in [BCNPP19]. We also prove that the upper bound is tight up to multiplicative constants for any \(n \leq m \leq \mathrm{poly}(n)\).
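The process is simple to simulate directly; the sketch below (illustrative only, with small arbitrary parameters and a deliberately unbalanced starting configuration) reports the maximum load after a given number of rounds.

```python
import random

def repeated_balls_into_bins(n, m, rounds):
    """Repeated balls-into-bins: start from an arbitrary configuration of m
    balls (here: all in bin 0); in each round remove one ball from every
    non-empty bin and re-assign each removed ball uniformly at random."""
    loads = [0] * n
    loads[0] = m                             # worst-case-style initial configuration
    for _ in range(rounds):
        k = sum(1 for x in loads if x > 0)   # one ball leaves each non-empty bin
        for i in range(n):
            if loads[i] > 0:
                loads[i] -= 1
        for _ in range(k):                   # re-assign the removed balls
            loads[random.randrange(n)] += 1
    return max(loads)

if __name__ == "__main__":
    n = 500
    print(repeated_balls_into_bins(n, m=5 * n, rounds=20 * n))
```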
- Brief announcement: Tight bounds for repeated balls-into-bins [ SPAA 2022 ][ doi ][ Abstract ]
In Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures, p. 419-421
The abstract and results coincide with those of the full version, "Tight bounds for repeated balls-into-bins" (STACS 2023), listed above.
- Balanced allocations: Caching and packing, twinning and thinning [ SODA 2022 ][ doi ][ arXiv ][ Abstract ][ visualisations ]
In Proceedings of the 33rd Annual ACM-SIAM Symposium on Discrete Algorithms, p. 1847–1874
We consider the sequential allocation of \(m\) balls (jobs) into \(n\) bins (servers) by allowing each ball to choose from some bins sampled uniformly at random. The goal is to maintain a small gap between the maximum load and the average load. In this paper, we present a general framework that allows us to analyze various allocation processes that slightly prefer allocating into underloaded, as opposed to overloaded bins. Our analysis covers several natural instances of processes, including:
- The \({\rm C{\small ACHING}}\) process (a.k.a. memory protocol) as studied by Mitzenmacher, Prabhakar and Shah (2002): At each round we only take one bin sample, but we also have access to a cache in which the most recently used bin is stored. We place the ball into the least loaded of the two.
- The \({\rm P{\small ACKING}}\) process: At each round we only take one bin sample. If the load is below some threshold (e.g., the average load), then we place as many balls as needed to reach the threshold; otherwise, we place only one ball.
- The \(\Twinning\) process: At each round, we only take one bin sample. If the load is below some threshold, then we place two balls; otherwise, we place only one ball.
- The \({\rm T{\small HINNING}}\) process as recently studied by Feldheim and Gurel-Gurevich (2021): At each round, we first take one bin sample. If its load is below some threshold, we place one ball; otherwise, we place one ball into a second bin sample.
As we demonstrate, our general framework implies for all these processes a gap of \(\mathcal{O}(\log n)\) between the maximum load and average load, even when an arbitrary number of balls \(m \geq n\) are allocated (heavily loaded case). Our analysis is inspired by a previous work of Peres, Talwar and Wieder (2010) for the \((1+\beta)\)-process; however, here we rely on the interplay between different potential functions to prove stabilization.
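The one-sample processes above are easy to simulate. The sketch below (illustrative only, using the current average load as the threshold; not code from the paper) covers packing, twinning and thinning-style allocation; the caching process is sketched under the SODA 2023 entry above.

```python
import random

def one_sample_gap(n, m, process):
    """Gap of a one-sample process biased towards underloaded bins, using the
    current average load as the threshold.  `process` is "packing",
    "twinning" or "thinning"."""
    loads, balls = [0] * n, 0
    while balls < m:
        i = random.randrange(n)
        avg = balls / n
        if process == "packing" and loads[i] < avg:
            k = int(avg - loads[i]) + 1        # fill the bin up to (just above) the average
        elif process == "twinning" and loads[i] < avg:
            k = 2                              # two balls for an underloaded bin
        else:
            if process == "thinning" and loads[i] >= avg:
                i = random.randrange(n)        # overloaded first sample: use a second sample
            k = 1
        loads[i] += k
        balls += k
    return max(loads) - balls / n

if __name__ == "__main__":
    n = 1000
    for p in ("packing", "twinning", "thinning"):
        print(p, round(one_sample_gap(n, 100 * n, p), 1))
```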
- The power of filling in balanced allocations [ SIDMA ][ doi ][ arXiv ][ Abstract ]
In SIAM Journal on Discrete Mathematics, vol. 37, iss. 1, p. 529-565
Expands some of the results in "Balanced Allocations: Caching and Packing, Twinning and Thinning".
It is well known that if \(m\) balls (jobs) are placed sequentially into \(n\) bins (servers) according to the \(\OneChoice\) protocol — choose a single bin in each round and allocate one ball to it — then, for \(m \gg n\), the gap between the maximum and average load diverges. Many refinements of the \(\OneChoice\) protocol have been studied that achieve a gap that remains bounded by a function of \(n\), for any \(m\). However, most of these variations, such as \(\TwoChoice\), are less sample-efficient than \(\OneChoice\), in the sense that for each allocated ball more than one sample is needed (in expectation).
We introduce a new class of processes which are primarily characterized by "filling" underloaded bins. A prototypical example is the \(\rm P{\small ACKING}\) process: At each round we take only one bin sample; if its load is below the average load, then we place as many balls as needed to reach the average load; otherwise, we place only one ball. We prove that for any process in this class the gap between the maximum and average load is \(\mathcal{O}(\log n)\) for any number of balls \(m\). For the \(\rm P{\small ACKING}\) process, we also prove a matching lower bound. We also prove that the \(\rm P{\small ACKING}\) process is more sample-efficient than \(\OneChoice\), that is, it allocates on average more than one ball per sample. Finally, we demonstrate that the upper bound of \(\mathcal{O}(\log n)\) on the gap can be extended to the \({\rm C{\small ACHING}}\) process (a.k.a. memory protocol) studied by Mitzenmacher, Prabhakar and Shah (2002).
- Mean-biased processes for balanced allocations [ arXiv ][ Abstract ]
Expands some of the results in "Balanced Allocations: Caching and Packing, Twinning and Thinning".
We introduce a new class of balanced allocation processes which bias towards underloaded bins (those with load below the mean load), either by skewing the probability with which a bin is chosen for an allocation (probability bias), or alternatively, by adding more balls to an underloaded bin (weight bias). A prototypical process satisfying the probability bias condition is \(\MeanThinning\): At each round, we sample one bin and if it is underloaded, we allocate one ball; otherwise, we allocate one ball to a second bin sample. Versions of this process have been in use since at least 1986. An example of a process, introduced by us, which satisfies the weight bias condition is \(\Twinning\): At each round, we only sample one bin. If the bin is underloaded, then we allocate two balls; otherwise, we allocate only one ball.
Our main result is that for any process with a probability or weight bias, with high probability the gap between maximum and minimum load is logarithmic in the number of bins. This result holds for any number of allocated balls (heavily loaded case), covers many natural processes that relax the \( \TwoChoice \) process, and we also prove it is tight for many such processes, including \( \MeanThinning \) and \( \Twinning \).
Our analysis employs a delicate interplay between linear, quadratic and exponential potential functions. It also hinges on a phenomenon we call "mean quantile stabilization", which holds in greater generality than our framework and may be of independent interest.
- Balanced allocations with incomplete information: The power of two queries [ ITCS 2022 ][ doi ][ arXiv ][ Abstract ][ visualisations ]
In Proceedings of the 13th Innovations in Theoretical Computer Science, p. 103:1–103:23
We consider the allocation of \(m\) balls into \(n\) bins with incomplete information. In the classical \(\TwoChoice\) process a ball first queries the load of two randomly chosen bins and is then placed in the least loaded bin. In our setting, each ball also samples two random bins but can only estimate a bin's load by sending binary queries of the form "Is the load at least the median?" or "Is the load at least 100?".
For the lightly loaded case \(m=O(n)\), Feldheim and Gurel-Gurevich (2021) showed that with one query it is possible to achieve a maximum load of \(O(\sqrt{\log n/\log \log n})\), and posed the question whether a maximum load of \(m/n+\mathcal{O}(\sqrt{\log n/\log \log n})\) is possible for any \(m = \Omega(n)\). In this work, we resolve this open problem by proving a lower bound of \(m/n+\Omega( \sqrt{\log n})\) for a fixed \(m=\Theta(n \sqrt{\log n})\), and a lower bound of \(m/n+\Omega(\log n/\log \log n)\) for some \(m\) depending on the used strategy. We complement this negative result by proving a positive result for multiple queries. In particular, we show that with only two binary queries per chosen bin, there is an oblivious strategy which ensures a maximum load of \(m/n+O(\sqrt{\log n})\) for any \(m \geq 1\). Further, for any number of \(k = \mathcal{O}(\log \log n)\) binary queries, the upper bound on the maximum load improves to \(m/n + \mathcal{O}(k(\log n)^{1/k})\) for any \(m \geq 1\).
Further, this result for \(k\) queries implies (i) new bounds for the \((1+\beta)\)-process introduced by Peres, Talwar and Wieder (2015), (ii) new bounds for the graphical balanced allocation process on dense expander graphs, and (iii) the bound of \(m/n+\mathcal{O}(\log \log n)\) on the maximum load achieved by the \(\TwoChoice\) process, including the heavily loaded case \(m=\Omega(n)\) derived by Berenbrink, Czumaj, Steger and Vöcking (2006). One novel aspect of our proofs is the use of multiple super-exponential potential functions, which might be of use in future work.
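The query model is easy to experiment with. Below is a toy sketch (illustrative only, and deliberately simpler than the paper's two-query strategy) of a single-query strategy that compares the sampled bin's load against the current median; all names and parameters are arbitrary choices.

```python
import random

def one_query_median_gap(n, m):
    """Toy one-query strategy: each ball samples a bin and asks the single
    binary query "is this bin's load at least the median load?"; if the answer
    is yes, the ball is placed in a second, fresh uniform sample instead.
    (The median is recomputed from scratch each step purely for clarity.)"""
    loads = [0] * n
    for _ in range(m):
        median = sorted(loads)[n // 2]
        i = random.randrange(n)
        if loads[i] >= median:
            i = random.randrange(n)
        loads[i] += 1
    return max(loads) - m / n

if __name__ == "__main__":
    n = 300   # kept small: the naive median recomputation costs O(n log n) per ball
    print(one_query_median_gap(n, 20 * n))
```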
Some visualisations of the processes described in the papers above can be found here.
Presentations/Talks
- Naively sorting evolving data is optimal and robust [ FOCS 2024 ][ slides ]
- Load balancing algorithms and evolving data [ Wide Team Seminar, INRIA ]
- On the exponential potential for analysing algorithms with dynamic data [ Department Seminar, University of Liverpool ]
- On the exponential potential for analysing algorithms with dynamic data [ Algorithms and Complexity Theory Seminar, University of Oxford ]
- Balanced allocations: A refined drift theorem [ WAND @ DISC 2023 ][ slides ]
- Balanced allocations in batches: The tower of two choices [ SPAA 2023 ][ slides ]
- Tight bounds for repeated balls-into-bins [ STACS 2023 ][ slides ]
- Balanced allocations with heterogeneous bins: The power of memory [ SODA 2023 ][ slides ]
- Balanced allocations with the choice of noise [ ART & TEA, University of Hamburg ]
- Balanced allocations with the choice of noise [ PODC 2022 ][ slides ]
- Brief announcement: Tight bounds for repeated balls-into-bins [ SPAA 2022 ][ slides ]
- Balanced allocations in batches: Simplified and generalized [ SPAA 2022 ][ slides ]
- Balanced allocations: Relaxing Two-Choice [ HALG 2022 ][ slides ][ poster ]
- Balanced allocations under incomplete information [ BCTCS 2022 ][ slides ]
- Balanced allocations using potential functions [ FATA Seminar, University of Glasgow ]
- Balanced allocations: Caching and packing, twinning and thinning [ SODA 2022 ][ slides ]
- Balanced allocations with incomplete information: The power of two queries [ ITCS 2022 ][ slides ]
The animations on some of the slides run only in certain PDF viewers (e.g., Adobe Reader or Okular).
Part II projects
During my PhD I co-supervised the following Part II projects:
- A comparison of approaches for the capacitated vehicle routing problem (by K. N. Ganesh)
- Intelligent composer tool (by P. Kram)
- Reinforcement learning meets balls into bins (by A. Vári-Kakas)
- Investigating algorithms for grouping people to maximise new interactions (by J. Koh)
Supervisions
Detailed material for the supervisions can be accessed here (Raven access required). Below is a list of the handouts:
- Foundations of computer science:
- Example sheets: S1 (Es, ESs), S2 (Es, ESs), S3 (Es, ESs)
- Extras: Projects, PPs, T1, T2, T3, T4, T5, T6
- Discrete mathematics:
- Algorithms:
- Example sheets: S1 (Qs, Es, QSs, ESs, Cs), S2 (Qs, Es, F, QSs, ESs, Cs), S3 (Qs, Es, F), S4 (Qs, Es, F, QSs, ESs, Cs), S5 (Qs, Es, F, QSs, ESs, Cs), S6 (Qs, Es, F, QSs, ESs)
- Extras: PPs, T1
- Introduction to probability:
- Data science:
- Computation theory:
- Complexity theory:
- Extras: PPs
- Formal models of language:
- Bioinformatics:
- Randomised algorithms: