AerialVLN/CityNav Dataset Format Analysis

2026-03-15

AerialVLN

Inspect the structure of train.json by running a script that reports each field and its type.

Script for analyzing the JSON structure

import json
from pprint import pformat

def type_desc(v):
    """Describe the type of a value; lists are described by their first element."""
    if isinstance(v, list):
        if len(v) == 0:
            return "list (empty)"
        return f"list[{type(v[0]).__name__}]"
    elif isinstance(v, dict):
        return "dict"
    else:
        return type(v).__name__

def short_example(v, max_len=200):
    """Truncate the example so the output does not get too long."""
    s = pformat(v, width=80, compact=True)
    if len(s) > max_len:
        s = s[:max_len] + " ... (truncated)"
    return s

with open("train.json", "r", encoding="utf-8") as f:
    data = json.load(f)

episodes = data["episodes"]
ep = episodes[0]  # only the first episode is inspected

print("Top-level type:", type(data).__name__)
print("Top-level keys:", list(data.keys()))
print("episodes type:", type(episodes).__name__)
print("Number of episodes:", len(episodes))
print()

print("Field summary for the first episode:")
print("-" * 80)

for k, v in ep.items():
    print(f"Field name: {k}")
    print(f"Field type: {type_desc(v)}")
    print(f"Example: {short_example(v)}")
    print("-" * 80)

Overall structure

train.json
 └── episodes (list)
      ├── episode (dict)
      │     ├── episode_id
      │     ├── trajectory_id
      │     ├── scene_id
      │     ├── start_position
      │     ├── start_rotation
      │     ├── instruction
      │     ├── goals
      │     ├── reference_path
      │     └── actions
      └── ...

Field types and examples

Top-level type: dict
Top-level keys: ['episodes']
episodes type: list
Number of episodes: 16386

Field summary for the first episode:
--------------------------------------------------------------------------------
Field name: episode_id
Field type: str
Example: '3018Q3ZVORO4Z811ZR054U1M3ODARH'
--------------------------------------------------------------------------------
Field name: trajectory_id
Field type: str
Example: '39KV3A5D2G5VV16NHAJOVU94299S7W'
--------------------------------------------------------------------------------
Field name: scene_id
Field type: int
Example: 5
--------------------------------------------------------------------------------
Field name: start_position
Field type: list[float]
Example: [66.29647064208984, -60.00279998779297, 0.21424497663974762]
--------------------------------------------------------------------------------
Field name: start_rotation
Field type: list[float]
Example: [0.9999738902366156, 0.0, 0.0, 0.007226260793033801]
--------------------------------------------------------------------------------
Field name: instruction
Field type: dict
Example: {'instruction_text': 'take off to first floor height and turn right by 180 '
                     'degrees then proceed. stop at bridge and turn right by '
                     '90 degrees then ascend ... (truncated)
--------------------------------------------------------------------------------
Field name: goals
Field type: list[dict]
Example: [{'position': [-85.99118098412254, -25.287750098101004, -21.785755023360252]}]
--------------------------------------------------------------------------------
Field name: reference_path
Field type: list[list]
Example: [[66.29647064208984, -60.00279998779297, 0.21424497663974762, 0, 0,
  0.014452647371354266],
 [66.29647064208984, -60.00279998779297, -1.7857550233602524, 0, 0,
  0.014452647371354266],
 [66.296470642 ... (truncated)
--------------------------------------------------------------------------------
Field name: actions
Field type: list[int]
Example: [4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 7, 1, 1, 1, 1, 1, 7, 1,
 1, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3,
 3, 3, 7, 1, 1, 1, 7, 1, 2, 2, 2, 1, 3, 1, ... (truncated)
--------------------------------------------------------------------------------
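Once the fields are known, a per-episode summary is a natural next step. The following is only a minimal sketch: the helper name summarize_episode is hypothetical, the sample episode is hand-made rather than taken from train.json, and it assumes that the first three values of each reference_path entry are the x, y, z position (the output above does not say what the six values mean).

```python
import math
from collections import Counter

def summarize_episode(ep):
    """Return action counts and reference-path length for one episode dict."""
    # Assumption: the first three values of each reference_path entry are x, y, z.
    points = [tuple(p[:3]) for p in ep["reference_path"]]
    # Sum the straight-line distances between consecutive path points.
    length = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    return {
        "num_actions": len(ep["actions"]),
        "action_counts": dict(Counter(ep["actions"])),
        "path_length": length,
    }

# Tiny hand-made episode (not real data), shaped like the fields above.
sample = {
    "actions": [4, 4, 3, 1, 1, 7],
    "reference_path": [
        [0.0, 0.0, 0.0, 0, 0, 0.0],
        [0.0, 0.0, -2.0, 0, 0, 0.0],
        [3.0, 4.0, -2.0, 0, 0, 0.0],
    ],
}
summary = summarize_episode(sample)
print(summary)
```

The action ids are counted but deliberately not interpreted, since the mapping from ids to motions is not part of train.json.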

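CityNav

CityNav is a UAV vision-and-language navigation dataset built on 3D point clouds of real cities. It contains 32637 natural-language navigation descriptions paired with human demonstration trajectories. Unlike AerialVLN, CityNav spreads its data across several files, splitting the navigation task into three data modules: navigation trajectories (the citynav JSON files), language descriptions (the processed_descriptions JSON), and target object information (the objects JSON). These are linked by IDs and together make up a complete vision-and-language navigation task.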

Which files to analyze

According to the README, the data directory is laid out as follows:

data/
├ citynav/
│   ├ citynav_train_seen.json
│   ├ citynav_val_seen.json
│   └ citynav_val_unseen.json
│
├ cityrefer/
│   ├ objects.json
│   └ processed_descriptions.json

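Three representative CityNav JSON files need to be analyzed:

citynav_train_seen.json // navigation task data
processed_descriptions.json // natural-language instruction data
objects.json // target object information

According to scripts/download_data.sh in the project, the dataset can be downloaded from: https://www.dropbox.com/scl/fi/ekbogjn2ptxdde2gik6nx/data.tar.gz?rlkey=oq5smcqlbgc6do5mcowetj3mp&st=gx563bhw&dl=0

1. citynav_val_seen.json (navigation trajectories)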

File: citynav/citynav_val_seen.json
Top type: list
Length: 2498

Fields
------------------------------------------------------------
area                      str    example: 'birmingham'

block                     int    example: 1

object_ids                list[int]    example: [11]

ann_ids                   list[int]    example: [1]

descriptions              list[str]    example: ['The row of grayish brown houses on Leslie Road to the '
 'left of the gray hou ...

trajectory                list[list]    example: [[33.6426964733, 561.3063272661, 141.8241423345,
  0.7629737435, 0.1680965778, - ...

marker_positions          list[list]    example: [[378.8295684158, 465.7754565511, 17.5935184049]]

target_positions          list[list]    example: [[368.21875, 440.78125, 18.3899993896]]

total_score               float    example: 21.86

dist_marker_to_target     float    example: 27.16

split                     str    example: 'val_seen'

Structure:

citynav_val_seen.json (list)
 └ episode (dict)
      ├ area
      ├ block
      ├ object_ids
      ├ ann_ids
      ├ descriptions
      ├ trajectory
      ├ marker_positions
      ├ target_positions
      ├ total_score
      ├ dist_marker_to_target
      └ split

Example record:

{
 "area": "birmingham",
 "block": 1,
 "object_ids": [11],
 "ann_ids": [1],
 "descriptions": [
   "The row of grayish brown houses on Leslie Road..."
 ],
 "trajectory": [
   [x, y, z, roll, pitch, yaw],
   ...
 ],
 "marker_positions": [
   [x, y, z]
 ],
 "target_positions": [
   [x, y, z]
 ],
 "total_score": 21.86,
 "dist_marker_to_target": 27.16,
 "split": "val_seen"
}
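The dist_marker_to_target field can be cross-checked against the two position fields. The sketch below assumes the field is the 3D Euclidean distance between the first marker and the first target; that definition is a guess, not something the data files state, and the helper name marker_target_distance is hypothetical. The coordinates are copied from the citynav_val_seen.json output above.

```python
import math

def marker_target_distance(episode):
    """3D Euclidean distance between the first marker and the first target."""
    marker = tuple(episode["marker_positions"][0])
    target = tuple(episode["target_positions"][0])
    return math.dist(marker, target)

# Values copied from the citynav_val_seen.json example output above.
episode = {
    "marker_positions": [[378.8295684158, 465.7754565511, 17.5935184049]],
    "target_positions": [[368.21875, 440.78125, 18.3899993896]],
    "dist_marker_to_target": 27.16,
}
distance = marker_target_distance(episode)
# Print the recomputed value next to the stored one for comparison.
print(round(distance, 2), episode["dist_marker_to_target"])
```

For this one record the recomputed value rounds to the stored 27.16, which is consistent with the assumed definition but does not prove it for the whole dataset.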

2. processed_descriptions.json (language annotations)

File: cityrefer/processed_descriptions.json
Top type: dict
Number of top-level keys: 34
Example key: cambridge_block_3

Fields
------------------------------------------------------------
0                         list[dict]    example: [{'landmarks': ['Buckingham Room'],
  'surroundings': ['car park', 'similar buil ...
1                         list[dict]    example: [{'landmarks': ['School of Pythagoras'],        
  'surroundings': ['empty land', 'small ...
2                         list[dict]    example: [{'landmarks': ['Art Room building', 'Merton Hall'],
  'surroundings': [],
  'ta ...
...

This file stores the language annotations keyed by map block.

The actual structure of processed_descriptions.json is:

dict
 └ block_name
      └ ann_id
           └ list[dict]

Example entry:

{
  "landmarks": ["Buckingham Room"],
  "surroundings": ["car park", "similar building"],
  "target": "building"
}
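Because this file nests dicts inside dicts inside lists, a flat field listing hides part of the shape. A small recursive walker can print the nesting instead. This is a hypothetical helper (describe is not part of the CityNav code), and the sample is a trimmed, hand-typed value shaped like the output above rather than the real file. It follows only the first key or first element at each level, so it shows one path through the structure.

```python
def describe(value, level=0, max_level=4):
    """Return indented lines describing the nesting of a JSON-like value."""
    pad = "  " * level
    if isinstance(value, dict):
        if not value:
            return [f"{pad}dict (empty)"]
        key = next(iter(value))  # follow only the first key
        lines = [f"{pad}dict ({len(value)} keys, e.g. {key!r})"]
        child = value[key]
    elif isinstance(value, list):
        if not value:
            return [f"{pad}list (empty)"]
        lines = [f"{pad}list ({len(value)} items)"]
        child = value[0]  # follow only the first element
    else:
        return [f"{pad}{type(value).__name__}"]
    if level < max_level:
        lines.extend(describe(child, level + 1, max_level))
    return lines

# Trimmed sample shaped like processed_descriptions.json (not the real file).
sample = {
    "cambridge_block_3": {
        "0": [{"landmarks": ["Buckingham Room"],
               "surroundings": ["car park", "similar building"],
               "target": "building"}],
    }
}
lines = describe(sample)
print("\n".join(lines))
```

With the real file, the call would be describe(json.load(f)), and max_level controls how deep the walk goes.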

3. objects.json (object information)

Structure:

objects.json
│
├ object_id
│    ├ contour
│    ├ descriptions
│    ├ dimension
│    ├ id
│    ├ map_name
│    ├ name
│    ├ object_type
│    └ position
│
├ object_id
│    └ ...
│
└ ...

Example entry:

'97': {'contour': [[367.95781014565546, 1486.5657059110895],
                    [364.929432446007, 1486.8588177821202],
                    [363.56177800100454, 1488.8128969223253],
                    [363.7571572074334, 1491.5486077186126],
                    [365.02712204922153, 1493.6003908158282],
                    [366.8832245102963, 1495.065950170982],
                    [369.7162230035158, 1495.1636541279922],
                    [371.5723254645907, 1493.6980947728384],
                    [371.7677046710196, 1490.962383976551],
                    [371.1815670517328, 1488.2266731802638],
                    [369.130085384229, 1486.9565217391305]],
        'descriptions': ['A red car in between a silver car and a white car, '
                         'with a blue car in front of it, in the parking lot '
                         'of the Cripps building and the squash court building',
                         'The dark red car between a white car and gray van in '
                         'the parking lot behind the Squash Court building. '
                         'There is a bush near the front of it.',
                         'A red car parked in the parking lot between the '
                         'Squash Court and Bin Brook with a grey vehicle to '
                         'one side and a white vehicle to the other.',
                         'In the parking lot between Bin Brook and Squash '
                         'Court, this red car is in the bottom half of the '
                         'bottom row, seventh car from the right.',
                         'The red car parked in the middle of the first '
                         'parking row, in the parking lot beside Bin Brook.',
                         'A red car between a white car and gray car in the '
                         'Buckingham Room parking lot.'],
        'dimension': [3.5625, 4.078125, 1.6800003051757812],
        'id': 97,
        'map_name': 'cambridge_block_3',
        'name': '',
        'object_type': 'Car',
        'position': [367.65625, 1490.8984375, 34.25]},

Relationship between the three JSON files

File	Level	Purpose
`citynav_val_seen.json`	navigation-level	Navigation tasks
`processed_descriptions.json`	language-level	Parsed language annotations
`objects.json`	object-level	Map object information
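The ID-based linkage can be sketched as a small join. Several things here are assumptions rather than facts from the data files: the helper name resolve_targets is hypothetical, block names are assumed to follow the pattern <area>_block_<block> (inferred from the key cambridge_block_3), objects.json is assumed to be keyed by object id at the top level as in the tree above, and the sample episode is made up for illustration. The lookup into processed_descriptions stops at the block level, because the second-level key is identified above only as ann_id.

```python
def resolve_targets(episode, objects, descriptions):
    """Attach object metadata and block-level annotations to one episode."""
    # Assumption: block names follow "<area>_block_<block>".
    block_name = f"{episode['area']}_block_{episode['block']}"
    targets = []
    for object_id in episode["object_ids"]:
        obj = objects.get(str(object_id))  # JSON object keys are strings
        if obj is None or obj["map_name"] != block_name:
            continue  # skip ids that do not resolve within this block
        targets.append({"id": obj["id"], "type": obj["object_type"],
                        "position": obj["position"]})
    return {
        "block_name": block_name,
        "targets": targets,
        "block_annotations": descriptions.get(block_name, {}),
    }

# Object entry trimmed from the objects.json example above.
objects = {"97": {"id": 97, "map_name": "cambridge_block_3",
                  "object_type": "Car",
                  "position": [367.65625, 1490.8984375, 34.25]}}
# Trimmed annotation sample shaped like processed_descriptions.json.
descriptions = {"cambridge_block_3": {"0": [{"landmarks": ["Buckingham Room"],
                                             "surroundings": ["car park"],
                                             "target": "building"}]}}
# Made-up episode for illustration only (not a record from the dataset).
episode = {"area": "cambridge", "block": 3, "object_ids": [97], "ann_ids": [0]}

result = resolve_targets(episode, objects, descriptions)
print(result["block_name"], result["targets"])
```

Checking map_name against the derived block name guards against an object id that exists in objects.json but belongs to a different block.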