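AerialVLN/CityNav Dataset Format Analysis
AerialVLN
Inspect the structure of train.json with a script that reports the name, type, and a sample value of each field.
Script for analyzing the JSON structure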
import json
from pprint import pformat


def type_desc(v):
    """Describe the type of a value; for lists, include the element type."""
    if isinstance(v, list):
        if len(v) == 0:
            return "list (empty)"
        return f"list[{type(v[0]).__name__}]"
    elif isinstance(v, dict):
        return "dict"
    else:
        return type(v).__name__


def short_example(v, max_len=200):
    """Truncate the example so the output does not get too long."""
    s = pformat(v, width=80, compact=True)
    if len(s) > max_len:
        s = s[:max_len] + " ... (truncated)"
    return s


with open("train.json", "r", encoding="utf-8") as f:
    data = json.load(f)

episodes = data["episodes"]
ep = episodes[0]

print("Top-level type:", type(data).__name__)
print("Top-level keys:", list(data.keys()))
print("Type of episodes:", type(episodes).__name__)
print("Number of episodes:", len(episodes))
print()
print("Field summary of the first episode:")
print("-" * 80)
for k, v in ep.items():
    print(f"Field name: {k}")
    print(f"Field type: {type_desc(v)}")
    print(f"Field example: {short_example(v)}")
    print("-" * 80)

Overall structure
train.json
└── episodes (list)
    ├── episode (dict)
    │   ├── episode_id
    │   ├── trajectory_id
    │   ├── scene_id
    │   ├── start_position
    │   ├── start_rotation
    │   ├── instruction
    │   ├── goals
    │   ├── reference_path
    │   └── actions
    └── ...

Field types and examples
Top-level type: dict
Top-level keys: ['episodes']
Type of episodes: list
Number of episodes: 16386

Field summary of the first episode:
--------------------------------------------------------------------------------
Field name: episode_id
Field type: str
Field example: '3018Q3ZVORO4Z811ZR054U1M3ODARH'
--------------------------------------------------------------------------------
Field name: trajectory_id
Field type: str
Field example: '39KV3A5D2G5VV16NHAJOVU94299S7W'
--------------------------------------------------------------------------------
Field name: scene_id
Field type: int
Field example: 5
--------------------------------------------------------------------------------
Field name: start_position
Field type: list[float]
Field example: [66.29647064208984, -60.00279998779297, 0.21424497663974762]
--------------------------------------------------------------------------------
Field name: start_rotation
Field type: list[float]
Field example: [0.9999738902366156, 0.0, 0.0, 0.007226260793033801]
--------------------------------------------------------------------------------
Field name: instruction
Field type: dict
Field example: {'instruction_text': 'take off to first floor height and turn right by 180 '
                     'degrees then proceed. stop at bridge and turn right by '
                     '90 degrees then ascend ... (truncated)
--------------------------------------------------------------------------------
Field name: goals
Field type: list[dict]
Field example: [{'position': [-85.99118098412254, -25.287750098101004, -21.785755023360252]}]
--------------------------------------------------------------------------------
Field name: reference_path
Field type: list[list]
Field example: [[66.29647064208984, -60.00279998779297, 0.21424497663974762, 0, 0,
  0.014452647371354266],
 [66.29647064208984, -60.00279998779297, -1.7857550233602524, 0, 0,
  0.014452647371354266],
 [66.296470642 ... (truncated)
--------------------------------------------------------------------------------
Field name: actions
Field type: list[int]
Field example: [4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 7, 1, 1, 1, 1, 1, 7, 1,
 1, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3,
 3, 3, 7, 1, 1, 1, 7, 1, 2, 2, 2, 1, 3, 1, ... (truncated)
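The fields listed above can be cross-checked against each other. The sketch below is only an illustration: it assumes that the first three values of each reference_path entry are x, y, z (the sample is consistent with this, since its first entry repeats start_position, but the files shown here do not state it). The names `check_episode` and `toy_episode` are hypothetical, and the toy values are copied from the sample with a shortened action list. The meaning of each action id is not assumed; the code only counts them.

```python
import math
from collections import Counter


def check_episode(ep, tol=1e-6):
    """Collect simple consistency facts about one AerialVLN-style episode."""
    start = ep["start_position"]
    first_pose = ep["reference_path"][0]
    # Assumption: the first three values of a reference_path entry are x, y, z.
    starts_match = all(
        math.isclose(a, b, abs_tol=tol) for a, b in zip(start, first_pose[:3])
    )
    return {
        "starts_match": starts_match,
        "num_poses": len(ep["reference_path"]),
        "num_actions": len(ep["actions"]),
        # Histogram of action ids; no semantics are attached to the ids here.
        "action_counts": Counter(ep["actions"]),
    }


# Toy episode in the same shape as the sample above (action list shortened).
toy_episode = {
    "start_position": [66.29647064208984, -60.00279998779297, 0.21424497663974762],
    "reference_path": [
        [66.29647064208984, -60.00279998779297, 0.21424497663974762, 0, 0,
         0.014452647371354266],
        [66.29647064208984, -60.00279998779297, -1.7857550233602524, 0, 0,
         0.014452647371354266],
    ],
    "actions": [4, 4, 3, 3, 1],
}

report = check_episode(toy_episode)
print(report)
```

To run the same check on the real file, pass `json.load(f)["episodes"][0]` (or any other episode) instead of `toy_episode`.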
--------------------------------------------------------------------------------

CityNav
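CityNav is a UAV vision-and-language navigation dataset built on 3D point clouds of real cities. It contains 32,637 natural-language navigation descriptions paired with human demonstration trajectories. Unlike AerialVLN, CityNav spreads its data across several files, organized into three modules: navigation trajectories (the citynav JSON files), language descriptions (the processed_descriptions JSON), and target object information (the objects JSON). These are linked by IDs and together make up a complete vision-and-language navigation task.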
Which files to analyze
According to the README:
data/
├ citynav/
│ ├ citynav_train_seen.json
│ ├ citynav_val_seen.json
│ └ citynav_val_unseen.json
│
├ cityrefer/
│ ├ objects.json
│ └ processed_descriptions.json

CityNav has three representative JSON files to analyze:
citynav_train_seen.json // navigation task data
processed_descriptions.json // natural-language instruction data
objects.json // target object information

According to scripts/download_data.sh in the project, the dataset can be downloaded from: https://www.dropbox.com/scl/fi/ekbogjn2ptxdde2gik6nx/data.tar.gz?rlkey=oq5smcqlbgc6do5mcowetj3mp&st=gx563bhw&dl=0
Scripts for analyzing the JSON files
Omitted: there were too many ad hoc scripts, and they were too messy to include.
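Since the original scripts are not reproduced, here is a minimal sketch of a generic summarizer that handles both list-shaped and dict-shaped JSON files. It is not the script that was actually used; `summarize`, `summarize_file`, and the toy record are illustrative names and values, and the helper functions simply follow the same idea as the AerialVLN script earlier in these notes.

```python
import json
from pprint import pformat


def type_desc(value):
    """Describe a value's type; for non-empty lists, include the element type."""
    if isinstance(value, list):
        if not value:
            return "list (empty)"
        return f"list[{type(value[0]).__name__}]"
    return type(value).__name__


def short_example(value, max_len=80):
    """Render a value compactly and truncate long renderings."""
    text = pformat(value, width=80, compact=True)
    return text if len(text) <= max_len else text[:max_len] + " ..."


def summarize(data):
    """Summarize parsed JSON: top-level type, size, and fields of one sample."""
    if isinstance(data, list):
        sample = data[0] if data else None
    elif isinstance(data, dict):
        # For dict-shaped files, inspect the value stored under the first key.
        sample = next(iter(data.values()), None)
    else:
        sample = None
    fields = []
    if isinstance(sample, dict):
        fields = [(str(k), type_desc(v), short_example(v)) for k, v in sample.items()]
    length = len(data) if isinstance(data, (list, dict)) else None
    return {"top_type": type(data).__name__, "length": length, "fields": fields}


def summarize_file(path):
    """Load a JSON file from a caller-supplied path and summarize it."""
    with open(path, "r", encoding="utf-8") as f:
        return summarize(json.load(f))


# Toy record shaped like a citynav episode, so the sketch runs without the dataset.
toy = [{"area": "birmingham", "block": 1, "object_ids": [11], "total_score": 21.86}]
summary = summarize(toy)
print("Top type:", summary["top_type"])
print("Length:", summary["length"])
for name, kind, example in summary["fields"]:
    print(f"{name:<24}{kind:<12}example: {example}")
```

With the dataset unpacked, `summarize_file("data/citynav/citynav_val_seen.json")` would be the corresponding call, assuming the directory layout from the README.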
Analysis results
1. citynav_val_seen.json (navigation task)
File: citynav/citynav_val_seen.json
Top type: list
Length: 2498
Fields
------------------------------------------------------------
area str example: 'birmingham'
block int example: 1
object_ids list[int] example: [11]
ann_ids list[int] example: [1]
descriptions list[str] example: ['The row of grayish brown houses on Leslie Road to the '
'left of the gray hou ...
trajectory list[list] example: [[33.6426964733, 561.3063272661, 141.8241423345,
0.7629737435, 0.1680965778, - ...
marker_positions list[list] example: [[378.8295684158, 465.7754565511, 17.5935184049]]
target_positions list[list] example: [[368.21875, 440.78125, 18.3899993896]]
total_score float example: 21.86
dist_marker_to_target float example: 27.16
split str example: 'val_seen'

Structure:
citynav_val_seen.json (list)
└ episode (dict)
  ├ area
  ├ block
  ├ object_ids
  ├ ann_ids
  ├ descriptions
  ├ trajectory
  ├ marker_positions
  ├ target_positions
  ├ total_score
  ├ dist_marker_to_target
  └ split

Example record:
{
  "area": "birmingham",
  "block": 1,
  "object_ids": [11],
  "ann_ids": [1],
  "descriptions": [
    "The row of grayish brown houses on Leslie Road..."
  ],
  "trajectory": [
    [x, y, z, roll, pitch, yaw],
    ...
  ],
  "marker_positions": [
    [x, y, z]
  ],
  "target_positions": [
    [x, y, z]
  ],
  "total_score": 21.86,
  "dist_marker_to_target": 27.16,
  "split": "val_seen"
}

2. processed_descriptions.json (language annotations)
File: cityrefer/processed_descriptions.json
Top type: dict
Number of top-level keys: 34
Example key: cambridge_block_3
Fields
------------------------------------------------------------
0 list[dict] example: [{'landmarks': ['Buckingham Room'],
'surroundings': ['car park', 'similar buil ...
1 list[dict] example: [{'landmarks': ['School of Pythagoras'],
'surroundings': ['empty land', 'small ...
2 list[dict] example: [{'landmarks': ['Art Room building', 'Merton Hall'],
'surroundings': [],
'ta ...
...

This file stores the language annotations per map block.
The actual structure of processed_descriptions.json is:
dict
└ block_name
  └ ann_id
    └ list[dict]

Example entry:
{
  "landmarks": ["Buckingham Room"],
  "surroundings": ["car park", "similar building"],
  "target": "building"
}

3. objects.json (object information)
Structure:
objects.json
│
├ object_id
│ ├ contour
│ ├ descriptions
│ ├ dimension
│ ├ id
│ ├ map_name
│ ├ name
│ ├ object_type
│ └ position
│
├ object_id
│ └ ...
│
└ ...

Example entry:
'97': {'contour': [[367.95781014565546, 1486.5657059110895],
                   [364.929432446007, 1486.8588177821202],
                   [363.56177800100454, 1488.8128969223253],
                   [363.7571572074334, 1491.5486077186126],
                   [365.02712204922153, 1493.6003908158282],
                   [366.8832245102963, 1495.065950170982],
                   [369.7162230035158, 1495.1636541279922],
                   [371.5723254645907, 1493.6980947728384],
                   [371.7677046710196, 1490.962383976551],
                   [371.1815670517328, 1488.2266731802638],
                   [369.130085384229, 1486.9565217391305]],
       'descriptions': ['A red car in between a silver car and a white car, '
                        'with a blue car in front of it, in the parking lot '
                        'of the Cripps building and the squash court building',
                        'The dark red car between a white car and gray van in '
                        'the parking lot behind the Squash Court building. '
                        'There is a bush near the front of it.',
                        'A red car parked in the parking lot between the '
                        'Squash Court and Bin Brook with a grey vehicle to '
                        'one side and a white vehicle to the other.',
                        'In the parking lot between Bin Brook and Squash '
                        'Court, this red car is in the bottom half of the '
                        'bottom row, seventh car from the right.',
                        'The red car parked in the middle of the first '
                        'parking row, in the parking lot beside Bin Brook.',
                        'A red car between a white car and gray car in the '
                        'Buckingham Room parking lot.'],
       'dimension': [3.5625, 4.078125, 1.6800003051757812],
       'id': 97,
       'map_name': 'cambridge_block_3',
       'name': '',
       'object_type': 'Car',
       'position': [367.65625, 1490.8984375, 34.25]},

Relationship between the three JSON files
| File | Level | Role |
|---|---|---|
| citynav_val_seen.json | navigation-level | Navigation task |
| processed_descriptions.json | language-level | Language parsing |
| objects.json | object-level | Map object information |
A navigation task (citynav) is linked to its parsed language structure (processed_descriptions) through ann_ids, and to the map object information (objects) through object_ids.
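The linkage described above can be sketched as a small lookup. This is an illustration under several assumptions that should be checked against the real files: that block names follow the pattern "<area>_block_<block>" (suggested by the key cambridge_block_3), that processed_descriptions is keyed by block name and then by ann_id, and that objects is keyed by object_id at the top level, as in the structure diagrams in these notes. JSON object keys are always strings, so integer IDs are converted before lookup. `block_key`, `resolve_episode`, and the toy data are hypothetical; the toy values reuse fragments of the samples and are not a real pairing.

```python
def block_key(episode):
    """Build the block name for an episode.

    Assumption: names look like "<area>_block_<block>", e.g. "cambridge_block_3".
    """
    return f"{episode['area']}_block_{episode['block']}"


def resolve_episode(episode, descriptions, objects):
    """Attach parsed descriptions and object records to one navigation episode."""
    key = block_key(episode)
    block_descriptions = descriptions.get(key, {})
    resolved = []
    for object_id, ann_id in zip(episode["object_ids"], episode["ann_ids"]):
        obj = objects.get(str(object_id))  # JSON keys are strings
        resolved.append({
            "object_id": object_id,
            "ann_id": ann_id,
            "parsed": block_descriptions.get(str(ann_id)),
            "object": obj,
            # Cross-check that the object belongs to the same block, if known.
            "same_block": obj is not None and obj.get("map_name") == key,
        })
    return key, resolved


# Toy data in the shapes shown in these notes (values are illustrative only).
toy_episode = {"area": "cambridge", "block": 3, "object_ids": [97], "ann_ids": [0]}
toy_descriptions = {
    "cambridge_block_3": {
        "0": [{"landmarks": ["Buckingham Room"],
               "surroundings": ["car park", "similar building"],
               "target": "building"}],
    }
}
toy_objects = {
    "97": {"id": 97, "map_name": "cambridge_block_3", "object_type": "Car",
           "position": [367.65625, 1490.8984375, 34.25]},
}

key, resolved = resolve_episode(toy_episode, toy_descriptions, toy_objects)
print(key)
for item in resolved:
    print(item["object_id"], item["object"]["object_type"], item["parsed"])
```

Using `.get` for every lookup means a missing block, annotation, or object shows up as `None` in the result rather than raising, which makes mismatches in the assumed key layout easy to spot.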
