LightRL API Reference¶
Welcome to the detailed API reference for LightRL. Below you'll find documentation for the key classes and functions available in the library, complete with usage guidelines and examples.
Bandits Module¶
LightRL includes a variety of bandit algorithms, each tailored for specific use cases in reinforcement learning environments. The following classes are part of the lightrl.bandits module:
Base Bandit Class¶
Bandit
: The foundational class for all bandit algorithms. Subclasses provide specialized implementations.
Bases: ABC
Source code in lightrl/bandits.py
__init__(arms)¶
Initialize a Bandit with a specified number of arms.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| arms | List[Any] | A list representing different arms or tasks that the Bandit can choose from. | required |
Source code in lightrl/bandits.py
__repr__()¶
String representation of the Bandit object.
Returns:

| Name | Type | Description |
|---|---|---|
| str | str | String representation of the Bandit, showing its arms. |
report()¶
Print a report of the average rewards (Q-values) and selection counts for each arm.
Source code in lightrl/bandits.py
select_arm() abstractmethod¶
Abstract method to select the next arm to be used.
Returns:

| Name | Type | Description |
|---|---|---|
| int | int | The index of the selected arm. |
update(arm_index, reward)¶
Update the value estimates for a given arm based on the reward received.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| arm_index | int | Index of the arm that was selected. | required |
| reward | float | Reward received after selecting the arm. | required |
Source code in lightrl/bandits.py
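Below is a minimal sketch of how a custom strategy can be built on top of the abstract base class. Only Bandit(arms), update(), and report() are taken from the reference above; the arm labels are illustrative, and the self.arms attribute is an assumption about how the base class stores its constructor argument.

```python
import random

from lightrl.bandits import Bandit


class RandomBandit(Bandit):
    """Toy subclass that picks arms uniformly at random (illustrative only)."""

    def select_arm(self):
        # select_arm() is the only abstract method, so it must be overridden.
        # Assumes the base class keeps the constructor's list as self.arms.
        return random.randrange(len(self.arms))


bandit = RandomBandit(arms=["task_a", "task_b", "task_c"])
chosen = bandit.select_arm()       # index of the chosen arm
bandit.update(chosen, reward=0.5)  # feed back the observed reward
bandit.report()                    # print Q-values and selection counts per arm
```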
Epsilon-Based Bandits¶
These bandits use epsilon strategies to balance exploration and exploitation.
EpsilonGreedyBandit
: Implements an epsilon-greedy algorithm, allowing for a tunable exploration rate.
Bases: Bandit
Source code in lightrl/bandits.py
__init__(arms, epsilon=0.1)¶
Initialize an EpsilonGreedyBandit with a specified number of arms and an exploration probability.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| arms | List[Any] | A list representing different arms or tasks that the Bandit can choose from. | required |
| epsilon | float | The probability of choosing a random arm for exploration. Defaults to 0.1. | 0.1 |
Source code in lightrl/bandits.py
select_arm()¶
Select an arm to use based on the epsilon-greedy strategy.
This method uses exploration with probability epsilon and exploitation otherwise, selecting the arm with the highest estimated value.
Returns:

| Name | Type | Description |
|---|---|---|
| int | int | The index of the selected arm. |
Source code in lightrl/bandits.py
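A minimal usage sketch for EpsilonGreedyBandit. The hidden reward probabilities and arm names are illustrative stand-ins for a real environment; only the constructor signature, select_arm(), update(), and report() come from the reference above.

```python
import random

from lightrl.bandits import EpsilonGreedyBandit

true_means = [0.2, 0.5, 0.8]  # hidden reward probability of each arm (illustrative)
bandit = EpsilonGreedyBandit(arms=["arm_0", "arm_1", "arm_2"], epsilon=0.1)

for _ in range(1000):
    idx = bandit.select_arm()    # explore with probability epsilon, else exploit
    reward = 1.0 if random.random() < true_means[idx] else 0.0
    bandit.update(idx, reward)   # update the Q-value estimate for the chosen arm

bandit.report()  # average rewards (Q-values) and selection counts per arm
```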
EpsilonFirstBandit
: Prioritizes exploration for a set number of initial steps before switching to exploitation.
Bases: Bandit
Source code in lightrl/bandits.py
__init__(arms, exploration_steps=100, epsilon=0.1)¶
Initialize an EpsilonFirstBandit with a specified number of arms, exploration steps, and exploration probability.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| arms | List[Any] | A list representing different arms or tasks that the Bandit can choose from. | required |
| exploration_steps | int | The number of initial steps to purely explore. Defaults to 100. | 100 |
| epsilon | float | The probability of choosing a random arm once the initial exploration phase has ended (used by the epsilon-greedy strategy followed thereafter). Defaults to 0.1. | 0.1 |
Source code in lightrl/bandits.py
select_arm()¶
Select an arm to use based on the epsilon-first strategy.
This method uses pure exploration for a defined number of initial steps and then follows an epsilon-greedy strategy thereafter.
Returns:

| Name | Type | Description |
|---|---|---|
| int | int | The index of the selected arm. |
Source code in lightrl/bandits.py
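A short construction sketch for EpsilonFirstBandit. The arm labels, step count, and reward value are illustrative; the constructor arguments mirror the signature documented above.

```python
from lightrl.bandits import EpsilonFirstBandit

# Explore purely for the first 200 selections, then act epsilon-greedily.
bandit = EpsilonFirstBandit(
    arms=["variant_a", "variant_b"],  # illustrative arm labels
    exploration_steps=200,
    epsilon=0.05,
)

idx = bandit.select_arm()       # pure exploration during the first 200 steps
bandit.update(idx, reward=1.0)  # illustrative reward
```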
EpsilonDecreasingBandit
: Uses a decreasing epsilon value over time to reduce exploration as understanding improves.
Bases: Bandit
Source code in lightrl/bandits.py
__init__(arms, initial_epsilon=1.0, limit_epsilon=0.1, half_decay_steps=100)¶
Initialize an EpsilonDecreasingBandit with a specified number of arms and epsilon parameters.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| arms | List[Any] | A list representing different arms or tasks that the Bandit can choose from. | required |
| initial_epsilon | float | The initial exploration probability. Defaults to 1.0. | 1.0 |
| limit_epsilon | float | The minimum limit for the exploration probability. Defaults to 0.1. | 0.1 |
| half_decay_steps | int | The number of steps after which the exploration probability has decayed to half of the difference between initial_epsilon and limit_epsilon. Defaults to 100. | 100 |
Source code in lightrl/bandits.py
select_arm()¶
Select an arm to use based on the epsilon-decreasing strategy.
This method adjusts the exploration probability over time and selects an arm accordingly.
Returns:

| Name | Type | Description |
|---|---|---|
| int | int | The index of the selected arm. |
Source code in lightrl/bandits.py
update_epsilon()¶
Update the exploration probability epsilon based on the current step.
The exploration probability decays towards the limit probability over time, according to a half-life decay model.
Source code in lightrl/bandits.py
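A minimal sketch for EpsilonDecreasingBandit. The reference above does not state whether select_arm() advances the decay internally, so this sketch calls update_epsilon() explicitly once per step; treat that call, the arm labels, and the rewards as assumptions.

```python
from lightrl.bandits import EpsilonDecreasingBandit

bandit = EpsilonDecreasingBandit(
    arms=["small_batch", "large_batch"],  # illustrative arm labels
    initial_epsilon=1.0,    # start fully exploratory
    limit_epsilon=0.05,     # never explore less than 5% of the time
    half_decay_steps=250,   # controls the half-life of the decay
)

arm_rewards = [0.3, 0.7]  # illustrative fixed reward per arm

for _ in range(500):
    idx = bandit.select_arm()
    bandit.update(idx, arm_rewards[idx])
    bandit.update_epsilon()  # assumption: decay is advanced explicitly each step
```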
Other Bandit Strategies¶
UCB1Bandit
: Employs the UCB1 algorithm, selecting arms according to upper confidence bounds on their estimated rewards.
Bases: Bandit
Source code in lightrl/bandits.py
__init__(arms)¶
Initialize a UCB1Bandit with a specified number of arms.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| arms | List[Any] | A list representing different arms or tasks that the Bandit can choose from. | required |
Source code in lightrl/bandits.py
select_arm()¶
Select an arm to use based on the Upper Confidence Bound (UCB1) strategy.
This method selects an arm that maximizes the UCB estimate, accounting for exploration and exploitation.
Returns:

| Name | Type | Description |
|---|---|---|
| int | int | The index of the selected arm. |
Source code in lightrl/bandits.py
update(arm_index, reward)¶
Update the value estimates for a given arm based on the reward received and increment the total count.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| arm_index | int | Index of the arm that was selected. | required |
| reward | float | Reward received after selecting the arm. Must be in the range [0, 1]. | required |

Raises:

| Type | Description |
|---|---|
| ValueError | If the reward is not within the range [0, 1]. |
Source code in lightrl/bandits.py
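A hedged sketch for UCB1Bandit. Because update() requires rewards in [0, 1] and raises ValueError otherwise, the raw metric below is normalised first; the arm labels, latency measurement, and normalisation are illustrative assumptions.

```python
from lightrl.bandits import UCB1Bandit

bandit = UCB1Bandit(arms=["cache_off", "cache_on"])  # illustrative arm labels

idx = bandit.select_arm()
raw_latency_ms = 120.0  # illustrative measurement for the chosen arm
reward = max(0.0, min(1.0, 1.0 - raw_latency_ms / 1000.0))  # clamp into [0, 1]
bandit.update(idx, reward)  # a value outside [0, 1] would raise ValueError
```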
GreedyBanditWithHistory
: A variant that uses historical performance data to adjust its greedy selection strategy.
Bases: Bandit
Source code in lightrl/bandits.py
__init__(arms, history_length=100)¶
Initialize a GreedyBanditWithHistory with a specified number of arms and history length.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| arms | List[Any] | A list representing different arms or tasks that the Bandit can choose from. | required |
| history_length | int | The maximum length of history to maintain for each arm's rewards. Defaults to 100. | 100 |
Source code in lightrl/bandits.py
select_arm()¶
Select an arm to use based on the greedy strategy with bounded history.
This method ensures that each arm's history reaches the defined length before purely exploiting.
Returns:

| Name | Type | Description |
|---|---|---|
| int | int | The index of the selected arm. |
Source code in lightrl/bandits.py
update(arm_index, reward)¶
Update the value estimates for a given arm based on the reward received and update its history.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| arm_index | int | Index of the arm that was selected. | required |
| reward | float | Reward received after selecting the arm. | required |
Source code in lightrl/bandits.py
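A minimal sketch for GreedyBanditWithHistory. The estimates are based on a bounded history of recent rewards per arm, and selection stays exploratory until each arm's history is full; the arm labels and reward below are illustrative.

```python
from lightrl.bandits import GreedyBanditWithHistory

bandit = GreedyBanditWithHistory(
    arms=["worker_pool_small", "worker_pool_large"],  # illustrative arm labels
    history_length=50,  # keep only the 50 most recent rewards per arm
)

idx = bandit.select_arm()       # keeps exploring until each arm's history is full
bandit.update(idx, reward=0.7)  # illustrative reward, also appended to the arm's history
```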
Runners Module¶
two_state_time_dependent_process
: The function two_state_time_dependent_process() takes a bandit and keeps two states, ALIVE and WAITING. The bandit switches between these two states in order to probe the rewards (tasks per second multiplied by reward_factor). In the WAITING state, a smaller number of tasks can be selected for processing (waiting_args). See the usage sketch after the parameter tables below.
Execute a two-state time-dependent process with a bandit decision-maker.
This function simulates a process which alternates between an "ALIVE" state and a "WAITING" state based on the performance of a given task in relation to a failure threshold. It updates the bandit model with rewards calculated from successful tasks.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| bandit | | An object with methods select_arm() and update(), e.g. an instance of a Bandit subclass. | required |
| fun | Callable[..., Tuple[float, float]] | A function to be called with the current arm's arguments. Should return a tuple containing the number of successful and failed tasks. | required |
| failure_threshold | float | The fraction of failed tasks that triggers a switch to the "WAITING" state. | 0.1 |
| default_wait_time | float | The base wait time in seconds between task executions in the "ALIVE" state. | 5 |
| extra_wait_time | float | Additional wait time in seconds to be added in the "WAITING" state. | 10 |
| waiting_args | Optional[Union[Tuple, List]] | Arguments to be used when calling fun in the "WAITING" state. | None |
| max_steps | int | Maximum number of iterations/steps to be performed. | 500 |
| verbose | bool | If True, prints additional detailed logs and progress via tqdm. | False |
| reward_factor | float | A scaling factor to adjust the magnitude of the reward computed. | 1e-06 |
Raises:

| Type | Description |
|---|---|
| ValueError | If … |
Source code in lightrl/runners.py
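A hedged usage sketch for two_state_time_dependent_process(). Only the parameter names and the (successful, failed) return contract of fun come from the reference above; the task function, batch sizes, and the assumption that each arm is a tuple of positional arguments for fun (mirroring the type of waiting_args) are illustrative.

```python
import random
from typing import Tuple

from lightrl.bandits import EpsilonGreedyBandit
from lightrl.runners import two_state_time_dependent_process


def process_batch(batch_size: int) -> Tuple[float, float]:
    """Pretend to process batch_size tasks; return (successful, failed) counts."""
    failed = sum(1 for _ in range(batch_size) if random.random() < 0.05)
    return batch_size - failed, failed


# Assumption: each arm holds the positional arguments passed to `fun`.
bandit = EpsilonGreedyBandit(arms=[(10,), (50,), (100,)], epsilon=0.1)

two_state_time_dependent_process(
    bandit=bandit,
    fun=process_batch,
    failure_threshold=0.1,  # switch to WAITING when more than 10% of tasks fail
    default_wait_time=5,    # seconds between executions in the ALIVE state
    extra_wait_time=10,     # extra seconds added while WAITING
    waiting_args=(5,),      # process a smaller batch while WAITING
    max_steps=50,
    verbose=True,
    reward_factor=1e-06,    # scales tasks-per-second into the reward
)

bandit.report()  # inspect which batch size performed best
```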
If you have any questions or require further assistance, feel free to open an issue.