
Multi-armed Bandit

In the multi-armed bandit problem, a gambler faces a row of slot machines and must decide which machines to play, how many times to play each machine, and in which order to play them. When played, each machine provides a random reward from a distribution specific to that machine. The gambler's objective is to maximize the sum of rewards earned through a sequence of lever pulls.
The multi-armed bandit problem also models the management of research projects in a large organization, such as a science foundation or a pharmaceutical company. Given a fixed budget, the problem is to allocate resources among competing projects whose properties are only partially known now but may be better understood as time passes.
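
The slot-machine setting above can be simulated in a few lines. The sketch below is only an illustration under stated assumptions: each machine pays 1 or 0 (a Bernoulli reward), and the gambler uses an epsilon-greedy rule, a common simple heuristic that these notes do not prescribe. The names `run_epsilon_greedy`, `true_means`, `n_pulls`, and `epsilon` are hypothetical.

```python
import random


def run_epsilon_greedy(true_means, n_pulls=5000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit with an epsilon-greedy rule (illustrative only).

    true_means[i] is the (hidden) probability that machine i pays 1.
    Returns the total reward, the pull count per machine, and the
    estimated mean reward per machine.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms       # how many times each machine was played
    estimates = [0.0] * n_arms  # running average reward of each machine
    total_reward = 0.0

    for _ in range(n_pulls):
        if rng.random() < epsilon:
            # Explore: pick a machine uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the machine with the best estimate so far
            # (ties go to the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The machine pays out according to its own distribution.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the running average for this machine.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return total_reward, counts, estimates


if __name__ == "__main__":
    total, counts, estimates = run_epsilon_greedy([0.2, 0.5, 0.7])
    print("total reward:", total)
    print("pulls per machine:", counts)
    print("estimated means:", [round(e, 3) for e in estimates])
```

The `epsilon` parameter sets how often the gambler tries a random machine instead of the one that currently looks best; this is one simple way to trade off learning about the machines against collecting reward.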

Other applications include clinical trials investigating the effects of different experimental treatments while minimizing patient losses,[2][3][6][7] and adaptive routing efforts for minimizing delays in a network.

Gittins: Scheduling Problem: a machine has to perform a set of jobs and has a set time period (every hour or every day, for example) in which to finish each job. The machine receives a reward value based on whether or not it finishes within the time period, and for each job a probability of finishing is calculated. The problem is "to decide which job to process next at each stage so as to maximize the total expected reward."
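
To make the scheduling question concrete, here is a deliberately simplified toy model, not Gittins' own formulation or his index. It assumes each job gets exactly one attempt, job i finishes with probability p_i and then pays reward r_i, and rewards earned at later stages are multiplied by a discount factor. The discount factor and the names `expected_discounted_reward` and `best_order` are assumptions made for illustration. The sketch simply tries every processing order and keeps the one with the highest expected reward.

```python
from itertools import permutations


def expected_discounted_reward(order, jobs, discount):
    """Expected total reward for one attempt per job, in the given order.

    jobs[j] is a (finish_probability, reward) pair; the job processed at
    stage t contributes discount**t * probability * reward.
    """
    return sum(
        (discount ** stage) * jobs[j][0] * jobs[j][1]
        for stage, j in enumerate(order)
    )


def best_order(jobs, discount=0.9):
    """Brute-force search over all processing orders (toy sizes only)."""
    return max(
        permutations(range(len(jobs))),
        key=lambda order: expected_discounted_reward(order, jobs, discount),
    )


if __name__ == "__main__":
    # Hypothetical jobs: (probability of finishing, reward if finished).
    jobs = [(0.9, 1.0), (0.5, 4.0), (0.2, 6.0)]
    order = best_order(jobs)
    print("best order:", order)
    print("expected reward:", expected_discounted_reward(order, jobs, 0.9))
```

In this toy model the best order processes jobs in decreasing order of probability times reward. Brute force is only workable for a handful of jobs, since the number of orders grows factorially.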