AerialVLN Data Collection
1. Trajectory Collection
The output of the path generation step includes the multirotor’s pose trace (a series of time-stamped 6-DoF multirotor poses).
Trajectories were generated in the AirSim + Unreal Engine 4 simulator by AOPA-certified drone pilots flying the multirotor manually. The system records the drone's time-stamped 6DoF poses (position + attitude), which form the ground-truth trajectory. So the data is not a rosbag: what is stored is a 6DoF sequence of (x, y, z, pitch, roll, yaw).
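As a minimal sketch of what one time-stamped 6DoF sample could look like (the field names and coordinate convention here are illustrative assumptions, not the dataset's actual schema):

```python
from dataclasses import dataclass

# Hypothetical record for one pose sample in the trace;
# field names are illustrative, not the released schema.
@dataclass
class PoseSample:
    t: float      # timestamp in seconds (simulator clock)
    x: float      # position in metres
    y: float
    z: float
    pitch: float  # attitude in degrees
    roll: float
    yaw: float

# A pose trace is simply an ordered list of such samples.
trace = [
    PoseSample(0.0, 0.0, 0.0, -10.0, 0.0, 0.0, 90.0),
    PoseSample(0.1, 0.5, 0.0, -10.0, 0.0, 0.0, 90.0),
]
print(len(trace), trace[1].x)  # → 2 0.5
```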
2. Visual Input
At each step the simulator outputs an RGB image and a depth image.
Images are needed when running the task, but the dataset itself does not store per-timestep images. At run time, the AirSim simulator renders RGB and depth images on the fly as the agent's visual input (from the front-facing view). Given the outdoor setting, the depth sensor is allowed to perceive up to 100 m ahead.
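The 100 m sensing limit can be thought of as a cap on per-pixel depth readings. A minimal sketch of that cap (a hypothetical post-processing step; the actual sensor range is configured inside AirSim):

```python
MAX_DEPTH_M = 100.0  # outdoor sensing range described above

def clip_depth(depth_row):
    """Clamp per-pixel depth readings (in metres) to the 100 m sensing range."""
    return [min(d, MAX_DEPTH_M) for d in depth_row]

# Readings beyond 100 m saturate at the cap.
print(clip_depth([3.2, 250.0, 99.9]))  # → [3.2, 100.0, 99.9]
```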
3. Language Annotation
We show videos of drone flights and require annotators to give natural language commands.
Each video is annotated three times by different annotators.
The flight trajectories were recorded as videos and posted to Amazon Mechanical Turk. Annotators watched a video and wrote natural-language navigation instructions. Each trajectory was annotated by three different annotators, and a separate group of workers performed manual review. So each trajectory has 3 instructions.
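Since each trajectory carries three independent instructions, an episode entry could be laid out like this (keys and example texts are hypothetical, not the released JSON schema):

```python
# Hypothetical episode layout: one trajectory, three independent annotations.
episode = {
    "trajectory_id": "traj_0001",
    "waypoints": [(0.0, 0.0, -10.0), (5.0, 0.0, -10.0)],  # (x, y, z) only
    "instructions": [
        "Take off and fly forward past the red building.",
        "Ascend slightly, then head toward the tall tower ahead.",
        "Fly straight until you reach the parking lot, then stop.",
    ],
}

# Three annotators per trajectory, as described above.
print(len(episode["instructions"]))  # → 3
```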
4. Action Discretization
continuous paths are discretised into meta actions.
The raw flight trajectory is a continuous sequence of 6DoF poses. To make training feasible, the authors discretize the continuous trajectory into 8 meta-actions:
Move Forward: move forward 5 m
Move Left: move left 5 m
Move Right: move right 5 m
Turn Left: yaw left 15°
Turn Right: yaw right 15°
Ascend: climb 2 m
Descend: descend 2 m
Stop: end the episode

For example, a stretch of "fly forward for some distance" in the continuous trajectory is converted into multiple Move Forward actions, each corresponding to a fixed displacement (e.g., 5 m forward). In the dataset this shows up as follows: the continuous trajectory is first flown in AirSim and recorded as a sequence of waypoints, but the JSON data contains no timestamps, so the real time interval between two trajectory waypoints cannot be recovered.
The raw flying paths may have redundant motions, as manipulators sometimes need to look around to identify their positions and decide where to go. We remove such redundant motion for smoother ground truth trajectories. Then, the continuous paths are discretised into meta actions, such as “turn left” and “move forward” to enable training.
Moreover, as the paper states, the trajectories were resampled: redundant motions were removed, and the paths were then discretized into meta-actions.
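A toy sketch of the discretization idea (not the authors' actual algorithm): greedily cover a heading change with 15° turns and a straight segment with 5 m forward steps, rounding to the nearest whole action.

```python
FORWARD_STEP = 5.0  # metres per Move Forward
TURN_STEP = 15.0    # degrees per Turn Left / Turn Right

def discretise_heading(delta_yaw_deg):
    """Turn a heading change into repeated 15-degree turn actions (rounded)."""
    n = round(abs(delta_yaw_deg) / TURN_STEP)
    action = "Turn Left" if delta_yaw_deg < 0 else "Turn Right"
    return [action] * n

def discretise_forward(distance_m):
    """Cover a straight segment with 5 m Move Forward actions (rounded)."""
    return ["Move Forward"] * round(distance_m / FORWARD_STEP)

# Turn 30 degrees left, fly ~12 m forward, then stop.
actions = discretise_heading(-30.0) + discretise_forward(12.0) + ["Stop"]
print(actions)
# → ['Turn Left', 'Turn Left', 'Move Forward', 'Move Forward', 'Stop']
```

Rounding means the discretized path only approximates the continuous one, which is consistent with the paper's point that exact timing between waypoints is lost.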
