Running the RL Scripts

Basic training scripts are provided for getting started with the environments and for reproducing the experiments presented in the paper.

Note

We use Hydra for configuration and WandB for experiment tracking.

The default and base configurations are located in cfg/. The base config for training looks like

# hydra.job.chdir: false

headless: false

sim: ${task.sim}
env: ${task.env}

total_frames: 250_000_000
max_iters: -1
eval_interval: -1
save_interval: -1
seed: 0

viewer:
  resolution: [960, 720]
  eye: [8, 0., 6.]
  lookat: [0., 0., 1.]

wandb:
  group: ${oc.select:..task.name}
  run_name: ${oc.select:..task.name,test}-${oc.select:..algo.name,none}
  job_type: train
  entity: marl-drones
  project: omnidrones
  mode: online # set to 'disabled' when debugging locally
  run_id: 
  monitor_gym: True
  tags: 

defaults:
  - task: Hover
  - algo: mappo
  - _self_

With these defaults, running train.py without any overrides trains an agent for task=Hover using algo=mappo.
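The wandb.group and wandb.run_name entries in the config above rely on OmegaConf's oc.select resolver, which looks up a key and falls back to a default when the key is absent. The following is a minimal plain-Python imitation of that fallback behavior, not OmegaConf's implementation. The helper names `select` and `run_name` are hypothetical, and the assumption that the Hover task config sets name to "Hover" and the mappo config sets name to "mappo" is ours.

```python
from typing import Any, Optional


def select(cfg: dict, dotted_key: str, default: Optional[Any] = None) -> Any:
    """Look up a dotted key such as 'task.name' in a nested dict.

    Returns `default` (None if not given) when any part of the path is missing,
    which is our understanding of how oc.select treats absent keys.
    """
    node: Any = cfg
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


def run_name(cfg: dict) -> str:
    # Imitates: ${oc.select:..task.name,test}-${oc.select:..algo.name,none}
    task = select(cfg, "task.name", "test")
    algo = select(cfg, "algo.name", "none")
    return f"{task}-{algo}"


# Hypothetical composed configs; the real task and algo files define many more keys.
full = {"task": {"name": "Hover"}, "algo": {"name": "mappo"}}
no_algo = {"task": {"name": "Hover"}}

print(run_name(full))     # Hover-mappo
print(run_name(no_algo))  # Hover-none
print(run_name({}))       # test-none
```

The fallbacks mean a run still gets a usable WandB name when a script composes the config without an algo (or task) group.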

The following examples demonstrate typical usage of train.py.

# most commonly, we train in headless mode for better sampling performance
python train.py task=Hover headless=true
python train.py task=InvPendulum/InvPendulumHover headless=true

# enable evaluation (which records a video clip for visualization) during training
python train.py task=Hover total_frames=100_000_000 eval_interval=60 headless=true

# modifying task and algorithm configurations
python train.py \
    task=Track task.reset_thres=0.6 \
    algo=mappo algo.train_every=32 \
    headless=true

# transform the action space
# e.g., transform the original action space into a multi-discrete space
# with each dimension quantized into 5 bins
python train.py task.action_transform=multidiscrete:5
# or use a velocity controller
python train.py task.action_transform=velocity
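To make the multidiscrete:5 option above more concrete, the sketch below shows one way a continuous action dimension can be quantized into 5 levels and how discrete indices map back to continuous values. It is an illustration only: the action range of [-1, 1], the evenly spaced levels that include both endpoints, and the function names `levels` and `decode` are assumptions, and the transform actually applied by the library may differ.

```python
from typing import List


def levels(low: float, high: float, n_bins: int) -> List[float]:
    """Return n_bins evenly spaced values from low to high, endpoints included."""
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    step = (high - low) / (n_bins - 1)
    return [low + i * step for i in range(n_bins)]


def decode(indices: List[int], low: float = -1.0, high: float = 1.0,
           n_bins: int = 5) -> List[float]:
    """Map one discrete index per action dimension to its continuous value."""
    table = levels(low, high, n_bins)
    return [table[i] for i in indices]


# Assumed range [-1, 1] with 5 bins per dimension.
print(levels(-1.0, 1.0, 5))   # [-1.0, -0.5, 0.0, 0.5, 1.0]

# A 4-dimensional multi-discrete action (one index per dimension).
print(decode([0, 2, 4, 3]))   # [-1.0, 0.0, 1.0, 0.5]
```

With this scheme the policy chooses among 5 options per action dimension instead of outputting a real number, which trades control resolution for a simpler, discrete action space.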