PlatformTrack
=============


A cooperative control task in which a group of `k` UAVs is connected by a rigid frame
to form an overactuated platform. Each UAV is attached to the frame through a 2-DoF passive
gimbal joint and acts as a thrust generator. The agents' goal is to
collectively control the platform so that it tracks a given reference trajectory
while orienting it toward a given direction.
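
To illustrate what "overactuated" means here, the sketch below sums the force and torque that the UAVs' thrust vectors exert on the rigid frame. It assumes a hypothetical layout in which the `k` UAVs sit evenly spaced on a circle of radius `arm_length` around the frame center, and it ignores gravity, gimbal joint limits, and the UAVs' own attitude dynamics. The helper names (`cross`, `attachment_points`, `net_wrench`) are illustrative only and are not part of the environment.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def cross(a: Vec3, b: Vec3) -> Vec3:
    """Cross product a x b."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def attachment_points(k: int, arm_length: float) -> List[Vec3]:
    """Hypothetical layout: k UAVs evenly spaced on a circle of radius
    `arm_length` in the frame's xy-plane, centered on the frame center."""
    return [
        (
            arm_length * math.cos(2 * math.pi * i / k),
            arm_length * math.sin(2 * math.pi * i / k),
            0.0,
        )
        for i in range(k)
    ]


def net_wrench(points: List[Vec3], thrusts: List[Vec3]) -> Tuple[Vec3, Vec3]:
    """Total force and torque (about the frame center) exerted on the rigid
    frame by thrust vector thrusts[i] applied at points[i]."""
    force = [0.0, 0.0, 0.0]
    torque = [0.0, 0.0, 0.0]
    for r, f in zip(points, thrusts):
        tau = cross(r, f)  # torque of a single thrust about the center
        for j in range(3):
            force[j] += f[j]
            torque[j] += tau[j]
    return (force[0], force[1], force[2]), (torque[0], torque[1], torque[2])
```

With four UAVs all thrusting straight up, the frame receives pure lift and no torque. Tilting each thrust tangentially through its gimbal adds a yaw torque while the net horizontal force still cancels, which is the kind of decoupled force and torque control that an overactuated platform permits.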

## Observation

The observation is a `CompositeSpec` containing:

- `obs_self` (1, \*): The state of each UAV observed by itself, containing its kinematic
  information (with the position expressed relative to the frame center) and a one-hot
  identity indicating the UAV's index.
- `obs_others` (k-1, \*): The observed states of the other agents.
- `obs_frame`:
  - `state_frame` (1, \*): The state of the frame.
  - `rpos` (3 * `future_traj_steps`): The relative position of the platform to the
    reference positions at the next `future_traj_steps` time steps.
  - `time_encoding` (optional): A 4-dimensional vector encoding the current
    progress of the episode.
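
A minimal sketch of how the frame-level entries could be assembled. It assumes that `rpos` is the reference position minus the platform position (the description above does not pin down the sign), that the reference trajectory is sampled once per step and clamped at its last point near the end of an episode, and that the time encoding simply repeats the episode-progress fraction four times. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def future_rpos(platform_pos: Vec3, ref_traj: Sequence[Vec3], t: int,
                future_traj_steps: int) -> List[float]:
    """Flattened offsets from the platform to the next `future_traj_steps`
    reference points, giving a vector of length 3 * future_traj_steps.

    Assumptions: offset = reference - platform, one reference point per
    step, and indices past the end of the trajectory reuse its last point.
    """
    out: List[float] = []
    for step in range(future_traj_steps):
        idx = min(t + step, len(ref_traj) - 1)  # clamp at the trajectory end
        ref = ref_traj[idx]
        out.extend(ref[j] - platform_pos[j] for j in range(3))
    return out


def time_encoding(progress_step: int, max_episode_length: int, dim: int = 4) -> List[float]:
    """Hypothetical encoding: episode progress in [0, 1] repeated `dim` times."""
    frac = progress_step / max_episode_length
    return [frac] * dim
```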

## Reward

- `reward_pose`: Reward based on the pose error (position and orientation) between
  the platform and the reference.
- `reward_up`: Reward for the alignment between the platform's up vector and
  the reference up vector.
- `reward_spin`: Reward computed from the spin of the drone to discourage spinning.
- `reward_effort`: Reward computed from the drones' control effort to encourage
  lower energy consumption.
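
The exact shaping of each term is not specified here. The sketch below uses bounded, hypothetical forms that peak at 1 for zero error (`1 / (1 + x^2)`-style decay for pose, up, and spin; a mean of `exp(-effort)` for effort) purely to show how each signal could map to a reward. The `weight` on the effort term is an assumption, and `reward_distance_scale` is borrowed from the config table.

```python
import math
from typing import Sequence


def reward_pose(pos_err: Sequence[float], heading_err: Sequence[float],
                reward_distance_scale: float = 1.2) -> float:
    """Assumed shaping: 1 / (1 + (scale * d)^2), where d is the norm of the
    stacked position and heading errors. Equals 1 at zero error."""
    d = math.sqrt(sum(e * e for e in list(pos_err) + list(heading_err)))
    return 1.0 / (1.0 + (reward_distance_scale * d) ** 2)


def reward_up(up: Sequence[float], up_ref: Sequence[float]) -> float:
    """Assumed shaping: decays with the squared distance between up vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(up, up_ref))
    return 1.0 / (1.0 + sq)


def reward_spin(angular_velocity: Sequence[float]) -> float:
    """Assumed shaping: penalizes the summed absolute angular rate."""
    spin = sum(abs(w) for w in angular_velocity)
    return 1.0 / (1.0 + spin ** 2)


def reward_effort(efforts: Sequence[float], weight: float = 0.1) -> float:
    """Assumed shaping: mean of exp(-effort) over the UAVs, scaled by a
    hypothetical weight; less effort yields a larger reward."""
    return weight * sum(math.exp(-e) for e in efforts) / len(efforts)
```

Keeping every term bounded keeps the components on comparable scales, which matters because the total reward multiplies `r_pose` with the other terms.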

The total reward is computed as follows:

```{math}
r = r_\text{pose} + r_\text{pose} \cdot (r_\text{up} + r_\text{spin}) + r_\text{effort}
```
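
The formula translates directly into code. The function below takes the four term values as plain floats (how they are computed is left open) and makes the gating effect visible: `r_up` and `r_spin` are multiplied by `r_pose`, so they contribute little until the platform is near the reference pose.

```python
def total_reward(r_pose: float, r_up: float, r_spin: float, r_effort: float) -> float:
    """r = r_pose + r_pose * (r_up + r_spin) + r_effort.

    The up and spin terms are scaled by r_pose, so staying level and still
    pays off only when the pose error is already small.
    """
    return r_pose + r_pose * (r_up + r_spin) + r_effort
```

For example, `total_reward(0.0, 1.0, 1.0, 0.0)` returns `0.0`, while `total_reward(1.0, 1.0, 1.0, 0.0)` returns `3.0`. A plausible motivation for this design is that the agents cannot collect the up and spin rewards while drifting far from the trajectory.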

## Config

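| Parameter               | Type  | Default       | Description |
|-------------------------|-------|---------------|-------------|
| `drone_model`           | str   | "hummingbird" | UAV model used for each agent. |
| `num_drones`            | int   | 4             | Number of UAVs `k` attached to the frame. |
| `arm_length`            | float | 0.85          | Distance from the frame center to each UAV's attachment point. |
| `reset_thres`           | float | 0.5           | Tracking-error threshold beyond which the episode is reset. |
| `future_traj_steps`     | int   | 4             | Number of future reference points included in `rpos`. |
| `reward_distance_scale` | float | 1.2           | Scale applied to the pose error when computing `reward_pose`. |
| `time_encoding`         | bool  | True          | Whether to include `time_encoding` in the observation. |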
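
The defaults in the table can be mirrored in a small typed container. The `PlatformTrackConfig` class below is a hypothetical sketch, not the environment's actual config loader; it also derives a few observation sizes implied by the Observation section (the `rpos` length, the number of `obs_others` rows, and the time-encoding width). The validation rules are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class PlatformTrackConfig:
    """Hypothetical container whose defaults mirror the config table."""
    drone_model: str = "hummingbird"
    num_drones: int = 4
    arm_length: float = 0.85
    reset_thres: float = 0.5
    future_traj_steps: int = 4
    reward_distance_scale: float = 1.2
    time_encoding: bool = True

    def __post_init__(self) -> None:
        # Hypothetical sanity checks: obs_others has k-1 rows, so a group
        # needs at least two UAVs, and rpos needs at least one future step.
        if self.num_drones < 2:
            raise ValueError("num_drones must be >= 2")
        if self.future_traj_steps < 1:
            raise ValueError("future_traj_steps must be >= 1")

    @property
    def rpos_dim(self) -> int:
        # One 3-D offset per future reference step.
        return 3 * self.future_traj_steps

    @property
    def num_others(self) -> int:
        # Rows in obs_others: every agent except the observer.
        return self.num_drones - 1

    @property
    def time_encoding_dim(self) -> int:
        # 4-D encoding when enabled, otherwise omitted.
        return 4 if self.time_encoding else 0
```

With the defaults, `PlatformTrackConfig().rpos_dim` is `12` and `num_others` is `3`.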