Running the RL Scripts
Basic training scripts are provided for getting started with the environments and for reproducing the experiments presented in the paper.
The default and base configurations are located in cfg/. The base config for training looks like
# hydra.job.chdir: false
headless: false
sim: ${task.sim}
env: ${task.env}
total_frames: 250_000_000
max_iters: -1
eval_interval: -1
save_interval: -1
seed: 0
viewer:
  resolution: [960, 720]
  eye: [8, 0., 6.]
  lookat: [0., 0., 1.]
wandb:
  group: ${oc.select:..task.name}
  run_name: ${oc.select:..task.name,test}-${oc.select:..algo.name,none}
  job_type: train
  entity: marl-drones
  project: omnidrones
  mode: online # set to 'disabled' when debugging locally
  run_id:
  monitor_gym: True
  tags:
defaults:
  - task: Hover
  - algo: mappo
  - _self_
With these defaults, running train.py without any overrides trains an agent for task=Hover using algo=mappo.
The following examples demonstrate typical usage of train.py.
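Values in this config can be changed from the command line with dotted key=value overrides such as task.reset_thres=0.6. The following is a minimal sketch of that idea using plain dictionaries. It is not Hydra's implementation: apply_overrides and parse_value are hypothetical helpers, the nested keys under task and algo are placeholders, and config-group selection (for example task=Track, which swaps in a whole task config) is deliberately not modeled.

```python
import copy


def parse_value(text):
    """Convert an override string into a bool, int, or float when possible."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)  # int() accepts underscores, e.g. "100_000_000"
        except ValueError:
            pass
    return text  # fall back to the raw string


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted `key=value` overrides applied.

    Simplified illustration only; Hydra's real override grammar is richer.
    """
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Walk down the nested dictionaries, creating levels as needed.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw_value)
    return result


# Placeholder base config (assumed structure, for illustration only).
base = {
    "headless": False,
    "total_frames": 250_000_000,
    "task": {"name": "Track"},
    "algo": {"name": "mappo"},
}

cfg = apply_overrides(
    base,
    ["headless=true", "total_frames=100_000_000",
     "task.reset_thres=0.6", "algo.train_every=32"],
)
print(cfg)
```

The base dictionary is left unmodified, which mirrors the way the files under cfg/ stay untouched while each run receives its own composed configuration.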
# most commonly, we train in headless mode for better sampling performance
python train.py task=Hover headless=true
python train.py task=InvPendulum/InvPendulumHover headless=true
# enable evaluation during training (which records a video clip for visualization)
python train.py task=Hover total_frames=100_000_000 eval_interval=60 headless=true
# modifying task and algorithm configurations
python train.py \
    task=Track task.reset_thres=0.6 \
    algo=mappo algo.train_every=32 \
    headless=true
# transform the action space
# e.g., transform the original action space into a multi-discrete space
# with each dimension quantized into 5 bins
python train.py task.action_transform=multidiscrete:5
# or use a velocity controller
python train.py task.action_transform=velocity
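To make the multidiscrete:5 transform above more concrete, here is a minimal sketch of quantizing a continuous action dimension into a fixed number of bins. It assumes each dimension ranges over [-1, 1] and that the bin values are evenly spaced with both endpoints included. The source does not state how the library defines its bins, so bin_values and decode_action are hypothetical helpers rather than the project's API.

```python
def bin_values(num_bins, low=-1.0, high=1.0):
    """Evenly spaced action values from `low` to `high`, endpoints included.

    Assumption: uniform spacing over [low, high]; the real transform may differ.
    """
    if num_bins < 2:
        raise ValueError("num_bins must be at least 2")
    step = (high - low) / (num_bins - 1)
    return [low + i * step for i in range(num_bins)]


def decode_action(indices, num_bins, low=-1.0, high=1.0):
    """Map one discrete bin index per action dimension to a continuous value."""
    values = bin_values(num_bins, low, high)
    return [values[i] for i in indices]


# With 5 bins, each dimension can take one of five values.
print(bin_values(5))
# A multi-discrete action with one bin index per dimension.
print(decode_action([0, 2, 4, 3], num_bins=5))
```

Under these assumptions, a policy over a multi-discrete space picks one index per dimension, and the transform converts those indices back into the continuous command expected by the original action space.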