The visualizations below are based on simulated DARTS cycles that use data collected during our case study. Experiment with the different explore/exploit policies, parameters, and allocation strategies to see how target pool allocations change, and observe how reward (correct targets) and regret (incorrect targets) accumulate under each setting. For more information about how DARTS works under the hood, see our technical overview.
Adaptive Module Parameters
Explore/Exploit Policies govern how the Adaptive Module balances exploring target pools it knows little about against exploiting the target pool it currently believes is best.
- UCB1: scores each target pool by its average reward plus an upper confidence bound that quantifies how uncertain the Adaptive Module is about that pool. The bound shrinks as a pool is tried more often.
- Bayes UCB: similar to UCB1, but computes the upper confidence bound differently so that the Adaptive Module gains confidence in target pools more quickly.
- Epsilon-Greedy: balances exploration and exploitation with a simple heuristic, exploring sub-optimal target pools a fraction of the time equal to epsilon and exploiting the best-performing pool the rest of the time.
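As a rough illustration of the UCB1 idea, the sketch below uses the textbook UCB1 index: a pool's mean reward plus sqrt(2 ln N / n_i), where N is the total number of tries across all pools and n_i is the number of tries of pool i. It assumes a reward of 1 for a correct target and 0 otherwise; the function names are hypothetical, and this is not DARTS's own code.

```python
import math


def ucb1_scores(rewards: list[float], pulls: list[int]) -> list[float]:
    """Score each target pool with the textbook UCB1 index.

    Hypothetical helper: rewards[i] is the total reward (correct targets)
    earned by pool i, and pulls[i] is how many times pool i has been tried.
    """
    total_pulls = sum(pulls)
    scores = []
    for reward, n in zip(rewards, pulls):
        if n == 0:
            # An untried pool gets an infinite score, so it is explored first.
            scores.append(math.inf)
            continue
        mean = reward / n
        # The uncertainty bonus shrinks as a pool is tried more often.
        bonus = math.sqrt(2 * math.log(total_pulls) / n)
        scores.append(mean + bonus)
    return scores


def choose_pool(scores: list[float]) -> int:
    """Exploit the current scores: return the index of the highest-scoring pool."""
    return max(range(len(scores)), key=lambda i: scores[i])
```

For example, a pool with 5 correct targets out of 10 tries loses to a pool with 1 correct target out of 1 try, because the second pool's large uncertainty bonus outweighs the first pool's lower but better-established mean.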
Bayes UCB Scale
Parameter applicable to the Bayes UCB policy. Sets the width of the upper confidence bound as a number of standard deviations of the rewards earned by each target pool. Larger values widen the bound and encourage more exploration.
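One common way to build a Bayes UCB score from a standard-deviation scale, assumed here rather than taken from DARTS, is to keep a Beta posterior over each pool's success rate and add `scale` posterior standard deviations to the posterior mean. The original Bayes-UCB formulation uses a posterior quantile instead. The function name, the binary-reward assumption, and the uniform Beta(1, 1) prior are all illustrative.

```python
import math


def bayes_ucb_scores(successes: list[int], failures: list[int], scale: float) -> list[float]:
    """Score pools by Beta-posterior mean plus `scale` posterior standard deviations.

    Hypothetical helper. Assumes binary rewards: successes are correct targets,
    failures are incorrect targets, and each pool starts from a Beta(1, 1) prior.
    """
    scores = []
    for s, f in zip(successes, failures):
        a, b = 1 + s, 1 + f  # Beta posterior parameters
        mean = a / (a + b)
        # Standard deviation of a Beta(a, b) distribution.
        std = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
        scores.append(mean + scale * std)
    return scores
```

Two pools with the same 50% success rate score differently once `scale` is positive: the pool with fewer observations has a wider posterior and therefore a higher score, which is how a larger scale pushes the policy toward exploration.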
Epsilon
Parameter applicable to the Epsilon-Greedy policy. Higher values explore more often, whereas lower values favor exploiting the best-performing target pool.
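A minimal epsilon-greedy sketch that follows the description on this page: with probability epsilon it explores a randomly chosen sub-optimal pool, and otherwise it exploits the pool with the highest mean reward. Textbook epsilon-greedy often explores uniformly over all pools, including the best one; which variant DARTS uses is an assumption here, and the function name is hypothetical.

```python
import random


def epsilon_greedy_choice(mean_rewards: list[float], epsilon: float, rng: random.Random) -> int:
    """Pick a target pool index with an epsilon-greedy rule (hypothetical helper).

    With probability epsilon, explore a random sub-optimal pool; otherwise
    exploit the pool with the highest mean reward.
    """
    best = max(range(len(mean_rewards)), key=lambda i: mean_rewards[i])
    others = [i for i in range(len(mean_rewards)) if i != best]
    if others and rng.random() < epsilon:
        return rng.choice(others)
    return best
```

With `epsilon=0.1`, roughly one choice in ten goes to a sub-optimal pool; with `epsilon=0.0` the rule always exploits.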
Allocation Strategy Parameters
Greed Factor
A general factor that modifies the scores output by the selected explore/exploit policy. The higher the factor, the greedier the policy becomes, regardless of which policy is selected.
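This page does not state exactly how the Greed Factor modifies policy scores. The sketch below shows one hypothetical mechanism, treating the factor as an exponent applied to positive scores before they are normalized into allocation shares, purely to illustrate why a larger factor concentrates allocation on the top-scoring pool. The function name and the exponent rule are assumptions.

```python
def allocation_shares(scores: list[float], greed_factor: float) -> list[float]:
    """Turn policy scores into allocation shares, sharpened by a greed factor.

    HYPOTHETICAL mechanism: assumes positive, finite scores and that the greed
    factor acts as an exponent. A factor of 1 keeps shares proportional to the
    scores; larger factors shift more of the allocation to the top-scoring pool.
    """
    weighted = [s ** greed_factor for s in scores]
    total = sum(weighted)
    return [w / total for w in weighted]
```

For scores `[2.0, 1.0]`, a factor of 1 gives the top pool two thirds of the allocation, while a factor of 3 gives it eight ninths.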
Allocation strategies dictate how targets are picked from the target pools.
- Round-Robin: picks one target at a time from each target pool, rotating through the pools until allocations have been exhausted. This is the fairest method.
- Greedy: picks all of the targets at once from the target pool with the best performance. Favors the top-performing target pool.
- Altruist: picks all of the targets at once from the target pool with the worst performance. Favors the worst-performing target pool.
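The three allocation strategies can be sketched as follows. Pool names, targets, the `performance` mapping, and the `allocate` function are hypothetical; within each pool, targets are taken in list order, leaving the ordering question to the target-selection rule described next. When the chosen pool holds fewer than `n` targets, this sketch simply returns what is available.

```python
def allocate(pools: dict[str, list[str]], performance: dict[str, float], n: int, strategy: str) -> list[str]:
    """Pick up to n targets from the pools (hypothetical helper).

    round_robin: one target at a time from each pool in turn.
    greedy:      all targets from the best-performing pool.
    altruist:    all targets from the worst-performing pool.
    """
    if strategy == "greedy":
        best = max(pools, key=lambda p: performance[p])
        return pools[best][:n]
    if strategy == "altruist":
        worst = min(pools, key=lambda p: performance[p])
        return pools[worst][:n]
    if strategy == "round_robin":
        queues = {name: list(targets) for name, targets in pools.items()}
        picked: list[str] = []
        # Rotate through the pools, skipping any that have run out of targets,
        # until n targets are picked or every pool is empty.
        while len(picked) < n and any(queues.values()):
            for name in queues:
                if queues[name] and len(picked) < n:
                    picked.append(queues[name].pop(0))
        return picked
    raise ValueError(f"unknown strategy: {strategy}")
```

With pools `A: [a1, a2, a3]`, `B: [b1, b2]`, `C: [c1]` and `A` performing best, Round-Robin with `n=4` yields `a1, b1, c1, a2`, Greedy with `n=2` yields `a1, a2`, and Altruist with `n=2` yields only `c1` because pool `C` has a single target.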
Within a target pool, this parameter determines which targets are prioritized when picking.
- Best: if the target pool assigns scores to targets, picks the highest-scoring targets first. If it does not, picks the first target in the list of targets.
- Worst: if the target pool assigns scores to targets, picks the lowest-scoring targets first. If it does not, picks the last target in the list of targets.
- Random: picks a target from the target pool at random.
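A sketch of the three target-selection rules, assuming each pool's targets arrive as a list and any scores as a dictionary keyed by target. For unscored pools it extends "first target" and "last target" to list order and reverse list order, which is an interpretation rather than documented DARTS behavior. The function name and rule strings are hypothetical.

```python
import random
from typing import Optional


def order_targets(targets: list[str], scores: Optional[dict[str, float]], rule: str, rng: random.Random) -> list[str]:
    """Return targets in the order they would be picked from one pool (hypothetical helper).

    best:   highest score first, or original list order if the pool is unscored.
    worst:  lowest score first, or reversed list order if the pool is unscored.
    random: uniformly shuffled order.
    """
    if rule == "random":
        shuffled = list(targets)
        rng.shuffle(shuffled)
        return shuffled
    if rule == "best":
        if scores is None:
            return list(targets)
        return sorted(targets, key=lambda t: scores[t], reverse=True)
    if rule == "worst":
        if scores is None:
            return list(reversed(targets))
        return sorted(targets, key=lambda t: scores[t])
    raise ValueError(f"unknown rule: {rule}")
```

An allocation strategy would then take targets from the front of the returned list for each pool it draws from.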