Creating a STAC of Multi-sensor Topographic Data¶
Welcome! This notebook walks through a complete example for building a multi-sensor STAC catalog for the Isar research site. We harvest metadata from UAV Laser Scanning (ULS), UAV Photogrammetry (UPH) and some processing products (e.g., M3C2 distances), then package everything into interoperable STAC Items, Collections, and a top-level Catalog using the topo4d STAC extension together with core STAC features.
You will learn how to
- gather raw LAS/LAZ metadata and combine it with user supplied context
- create individual STAC Items that include topo4d, projection, point cloud, and timestamps extensions
- attach Topo4D-specific Product and Trafo metadata
- organise Items into thematic Collections and publish a reusable Catalog layout
Before you start
- make sure the demo data under
demo/Isar
is available locally pystac
and other required libraries installed in the active environmenttopo4d
helper script locally stored and accessible- basic familiarity with Python and STAC terminology
The notebook highlights each step, explains why it is needed, and points to the relevant STAC fields so you can adapt the flow to your own projects.
Reference
- The STAC Specification
- Create a Basic STAC Catalog Using PySTAC
- Topographic 4D Extension Specification
Last update: 19.09.2025
Import libraries¶
We import required libraries alongside Topo4D STAC helper classes from topo4d
. The sys.path.append
call ensures the repository root is available so the local topo4d
package can be imported while the notebook runs from docs/notebooks
.
import json
from datetime import datetime
from pathlib import Path
import pandas as pd
from datetime import datetime, timezone, timedelta
from dateutil import parser
import numpy as np
from tqdm import tqdm
from typing import Dict, Any
import sys
import os
sys.path.append(os.path.abspath('../..'))
import pystac
from topo4d.topo4d_ext import DataType, Topo4DExtension, ProductMeta, TrafoMeta
from topo4d.builder import extract_metadata_from_las, make_item_asset
Define dirs and paths¶
The demo files are stored in demo/Isar/data
, but you can swap these paths for your own acquisition folders. The point cloud files are under Isar_pointcloud
and the processing products are under Isar_m3c2
.
data_root = "../../demo/Isar/data"
data_dir = "../../demo/Isar/data/Isar_pointclouds"
Extract metadata from las/laz file¶
extract_metadata_from_las
parses the LAS headers, point counts, and VLR blocks for every LAS/LAZ file we find. We list all candidate files first, then iterate through them so the helper can persist metadata snapshots alongside the raw data. Having structured metadata up front makes the later STAC construction straightforward. Adapt these block based on your own dataset.
file_paths = [f for f in Path(data_dir).glob("*") if f.suffix.lower() in [".las", ".laz"]]
for file_path in file_paths:
print(f"Processing {file_path}")
meta_las = extract_metadata_from_las(file_path, if_save=True)
Processing ..\..\demo\Isar\data\Isar_pointclouds\Isar_20240812_UPH_10cm.copc.laz Processing ..\..\demo\Isar\data\Isar_pointclouds\Isar_20241105_UPH_10cm.copc.laz Processing ..\..\demo\Isar\data\Isar_pointclouds\Isar_20250325_ULS_10cm.copc.laz
# Use the first epoch of file as an example
meta_las = extract_metadata_from_las(file_paths[0], if_save=True)
file_id = Path(file_paths[0]).stem.split(".")[0]
Ingest metadata¶
Besides the automatically extracted values we add domain knowledge that cannot be inferred from the file alone: sensor model, acquisition mode, timezone, transformation matrix, and so on. The resulting dictionary merges machine derived and user supplied fields and becomes the single source of truth when we instantiate the STAC Item.
meta_userinput = pd.read_csv(os.path.join(data_dir, "auxilary.csv"), index_col="id").to_dict(orient="index")
# merge meta_las and meta_userinput
metadata = meta_las
metadata.update(meta_userinput[file_id])
print(json.dumps(metadata, indent=2))
{ "id": "Isar_20240812_UPH_10cm", "header": { "DEFAULT_POINT_FORMAT": "<PointFormat(3, 0 bytes of extra dims)>", "DEFAULT_VERSION": [ 1, 2 ], "are_points_compressed": true, "creation_date": "2025-05-05", "evlrs": "[<laspy.copc.CopcHierarchyVlr object at 0x0000018CC84A8DD0>]", "extra_header_bytes": null, "extra_vlr_bytes": null, "file_source_id": 0, "generating_software": "PDAL 2.7.0 (d5d146)", "global_encoding": "<laspy.header.GlobalEncoding object at 0x0000018CC554AF00>", "major_version": 1, "maxs": [ 674345.436, 5266840.8687, 910.46 ], "minor_version": 4, "mins": [ 673496.386, 5266235.7505, 880.4851 ], "number_of_evlrs": 1, "number_of_points_by_return": [ 23314488, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ], "offset_to_point_data": 1239, "offsets": [ 673934.0, 5266523.0, 901.0 ], "point_count": 23314488, "point_format": "<PointFormat(7, 0 bytes of extra dims)>", "scales": [ 0.0001, 0.0001, 0.0001 ], "start_of_first_evlr": 214694905, "start_of_waveform_data_packet_record": 0, "system_identifier": "PDAL", "uuid": "00000000-0000-0000-0000-000000000000", "version": [ 1, 4 ], "vlrs": "[<laspy.copc.CopcInfoVlr object at 0x0000018CC837B260>, <laspy.vlrs.known.LasZipVlr object at 0x0000018CC85F8470>, <laspy.vlrs.known.WktCoordinateSystemVlr object at 0x0000018CC8776120>]", "x_max": 674345.436, "x_min": 673496.386, "x_offset": 673934.0, "x_scale": 0.0001, "y_max": 5266840.8687, "y_min": 5266235.7505, "y_offset": 5266523.0, "y_scale": 0.0001, "z_max": 910.46, "z_min": 880.4851, "z_offset": 901.0, "z_scale": 0.0001 }, "vlrs": [ { "user_id": "copc", "record_id": 1, "description": "COPC info VLR", "center": [ 673920.9110000001, 5266660.2754999995, 1305.010099999965 ], "gps_max": 0.0, "gps_min": 0.0, "halfsize": 424.5249999999651, "hierarchy_root_offset": 214694965, "hierarchy_root_size": 21184, "spacing": 5.775850340135579 }, { "user_id": "laszip encoded", "record_id": 22204, "description": "http://laszip.org" }, { "user_id": "LASF_Projection", "record_id": 2112, "description": "", "string": "PROJCS[\"WGS 84 / UTM zone 32N\",GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563]],PRIMEM[\"Greenwich\",0],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AUTHORITY[\"EPSG\",\"4326\"]],PROJECTION[\"Transverse_Mercator\"],PARAMETER[\"latitude_of_origin\",0],PARAMETER[\"central_meridian\",9],PARAMETER[\"scale_factor\",0.9996],PARAMETER[\"false_easting\",500000],PARAMETER[\"false_northing\",0],UNIT[\"metre\",1],AXIS[\"Easting\",EAST],AXIS[\"Northing\",NORTH],AUTHORITY[\"EPSG\",\"32632\"]]" } ], "evlrs": [ { "user_id": "copc", "record_id": 1000, "description": "EPT Hierarchy", "root_page": "<laspy.copc.HierarchyPage object at 0x0000018CC879C050>" } ], "native_crs": "EPSG:25832", "datetime": "2024-08-12T00:00:00", "data_type": "pointcloud", "sensor": "DJI Phantom 4 RTK", "timezone": "UTC+1", "acquisition_mode": "UPH", "orientation": "Nadir", "spatial_resolution": 0.1, "measurement_error": 0.04, "asset_url": "./Isar_20240812_UPH_10cm.copc.laz" }
Create a STAC Item from metadata¶
To keep the workflow readable and testable we split the Item construction into small helper functions. Each of the next subsections covers one aspect of STAC metadata so you can reuse the logic in scripts or pipelines.
- Item
id
Every STAC Item needs a stable identifier. Here we derive it from the filename: each LAS is named after the acquisition timestamp, so the ID doubles as a human readable capture time and remains unique across the dataset.
def get_id(metadata: Dict[str, Any]) -> str:
return metadata.get("id")
item_id = get_id(metadata)
item_id
'Isar_20240812_UPH_10cm'
- Item
Datetime
This helper normalises the acquisition timestamp. It parses the string, attaches the correct timezone (either a named timezone or an explicit UTC offset), and returns a timezone aware datetime
. STAC Items expect naive UTC values, so we strip the tzinfo after performing the conversion.
def get_datetime(metadata: Dict) -> datetime:
dt_str = metadata.get("datetime")
if not dt_str:
raise ValueError("Missing 'datetime' in metadata")
dt = parser.parse(dt_str)
tz_str = metadata.get("timezone")
if tz_str and tz_str.startswith("UTC"):
sign = 1 if "+" in tz_str else -1
offset_str = tz_str[3:].replace("+", "").replace("-", "")
parts = offset_str.split(":")
hours = int(parts[0]) if parts[0] else 0
minutes = int(parts[1]) if len(parts) > 1 else 0
offset = timedelta(hours=sign * hours, minutes=sign * minutes)
tz = timezone(offset)
dt = dt.replace(tzinfo=tz)
elif tz_str and "/" in tz_str:
tz = pytz.timezone(tz_str)
dt = tz.localize(dt)
elif dt.tzinfo is None:
dt = dt.replace(tzinfo=timezone.utc)
return dt
item_datetime = get_datetime(metadata)
item_datetime
datetime.datetime(2024, 8, 12, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(seconds=3600)))
- Item
bbox
,geometry
and CRS information
We compute the spatial footprint from the LAS header min and max coordinates. When a global transformation matrix is available it is applied before determining the bounding box. We also scan the LAS VLRs to recover the native CRS so we can report the footprint both in the acquisition CRS and in WGS84 for the Projection extension.
from shapely.geometry import box, mapping
from pyproj import CRS, Transformer
def get_geo(metadata: Dict[str, Any]) -> Dict[str, Any]:
header = metadata.get("header")
vlrs = metadata.get("vlrs", [])
native_crs = metadata.get("native_crs", None)
if not header or "mins" not in header or "maxs" not in header:
raise ValueError("Invalid header format. 'mins' and 'maxs' are required.")
xmin, ymin, zmin = header["mins"]
xmax, ymax, zmax = header["maxs"]
crs = None
for vlr in vlrs:
if vlr.get("user_id", "").lower() == "lasf_projection":
if "wkt" in vlr and vlr["wkt"]:
crs = CRS.from_wkt(vlr["wkt"])
break
elif "epsg_code" in vlr and vlr["epsg_code"]:
crs = CRS.from_epsg(vlr["epsg_code"])
break
if not crs and native_crs:
try:
crs = CRS.from_user_input(native_crs)
except Exception:
crs = None
bbox = [xmin, ymin, xmax, ymax]
if crs and crs.to_epsg() != 4326:
try:
transformer = Transformer.from_crs(crs, CRS.from_epsg(4326), always_xy=True)
min_lon, min_lat = transformer.transform(xmin, ymin)
max_lon, max_lat = transformer.transform(xmax, ymax)
# geom in WGS84
bbox = [min_lon, min_lat, max_lon, max_lat]
geom = mapping(box(min_lon, min_lat, max_lon, max_lat))
except Exception:
pass
else:
# geom in native CRS
geom = mapping(box(xmin, ymin, xmax, ymax))
return {
"bbox": bbox,
"geometry": geom,
"native_crs": crs.to_string() if crs else None
}
item_geo = get_geo(metadata)
print(json.dumps(item_geo, indent=2))
{ "bbox": [ 11.304866670991222, 47.52637673872775, 11.316376321635966, 47.53158992234481 ], "geometry": { "type": "Polygon", "coordinates": [ [ [ 11.316376321635966, 47.52637673872775 ], [ 11.316376321635966, 47.53158992234481 ], [ 11.304866670991222, 47.53158992234481 ], [ 11.304866670991222, 47.52637673872775 ], [ 11.316376321635966, 47.52637673872775 ] ] ] }, "native_crs": "EPSG:25832" }
Create the item¶
With the temporal and spatial metadata in hand we instantiate pystac.Item
. At this point the Item only contains the required geometry, bounding box, and datetime; later cells enrich it with extensions, properties, and assets.
item = pystac.Item(
id=item_id,
geometry=item_geo["geometry"],
bbox=item_geo["bbox"],
datetime=item_datetime.replace(tzinfo=None),
properties={}
)
Adding STAC Extensions¶
We register the projection, point cloud, and timestamps extensions and then use the Topo4D extension to store topographic 4D specific metadata such as timezone, acquisition mode, and transformation matrices. Mapping custom fields onto established STAC extensions keeps the Item interoperable with other tooling.
topo4d_ext = Topo4DExtension.ext(item, add_if_missing=True)
print(f"Topo4D Extension added: {Topo4DExtension.has_extension(item)}")
Topo4D Extension added: True
# Add standard STAC extensions first
item.stac_extensions.extend([
"https://stac-extensions.github.io/projection/v2.0.0/schema.json",
"https://stac-extensions.github.io/pointcloud/v2.0.0/schema.json",
"https://stac-extensions.github.io/timestamps/v1.1.0/schema.json"
])
# Apply topo4d extension
topo4d_ext = Topo4DExtension.ext(item, add_if_missing=True)
topo4d_ext.timezone = metadata.get("timezone", "UTC")
topo4d_ext.data_type = DataType(metadata.get("data_type", "pointcloud"))
# Use standard STAC extensions for formerly topo4d properties
item.properties["proj:code"] = item_geo.get("native_crs", None)
item.properties["instruments"] = [metadata.get("sensor")] if metadata.get("sensor") else []
item.properties["pc:count"] = metadata.get("header", {}).get("point_count", None)
item.properties["pc:type"] = 'lidar' if metadata.get('acquisition_mode') == 'ULS' else 'other'
item.properties
{'topo4d:timezone': 'UTC+1', 'topo4d:data_type': 'pointcloud', 'proj:code': 'EPSG:25832', 'instruments': ['DJI Phantom 4 RTK'], 'pc:count': 23314488, 'pc:type': 'other'}
Adding assets¶
The make_item_asset
helper wraps the LAS file path with the appropriate media type and asset metadata. Adding the asset under a descriptive key lets downstream clients discover and download the raw point cloud directly from the Item.
asset_name, item_asset = make_item_asset(asset_url=metadata.get("asset_url"), user_input=metadata)
item.add_asset(
key=asset_name,
asset=item_asset
)
item
- type "Feature"
- stac_version "1.1.0"
stac_extensions[] 4 items
- 0 "https://tum-rsa.github.io/topo4d/v0.1.0/schema.json"
- 1 "https://stac-extensions.github.io/projection/v2.0.0/schema.json"
- 2 "https://stac-extensions.github.io/pointcloud/v2.0.0/schema.json"
- 3 "https://stac-extensions.github.io/timestamps/v1.1.0/schema.json"
- id "Isar_20240812_UPH_10cm"
geometry
- type "Polygon"
coordinates[] 1 items
0[] 5 items
0[] 2 items
- 0 11.316376321635966
- 1 47.52637673872775
1[] 2 items
- 0 11.316376321635966
- 1 47.53158992234481
2[] 2 items
- 0 11.304866670991222
- 1 47.53158992234481
3[] 2 items
- 0 11.304866670991222
- 1 47.52637673872775
4[] 2 items
- 0 11.316376321635966
- 1 47.52637673872775
bbox[] 4 items
- 0 11.304866670991222
- 1 47.52637673872775
- 2 11.316376321635966
- 3 47.53158992234481
properties
- topo4d:timezone "UTC+1"
- topo4d:data_type "pointcloud"
- proj:code "EPSG:25832"
instruments[] 1 items
- 0 "DJI Phantom 4 RTK"
- pc:count 23314488
- pc:type "other"
- datetime "2024-08-12T00:00:00Z"
links[] 0 items
assets
pointcloud
- href "./Isar_20240812_UPH_10cm.copc.laz"
- type "application/vnd.laszip+copc"
roles[] 1 items
- 0 "data"
Validating the item¶
item.validate()
performs schema checks across the Item and its extensions.
item.validate() # Uncomment this line to validate the item
['https://schemas.stacspec.org/v1.1.0/item-spec/json-schema/item.json', 'https://tum-rsa.github.io/topo4d/v0.1.0/schema.json', 'https://stac-extensions.github.io/projection/v2.0.0/schema.json', 'https://stac-extensions.github.io/pointcloud/v2.0.0/schema.json', 'https://stac-extensions.github.io/timestamps/v1.1.0/schema.json']
Create more items¶
After confirming the helpers work for a single file we scale up to the full dataset. The loop below merges in the shared user metadata for each LAS file, builds a list of Items, and assigns self HREFs so they can be written to disk later on.
def item_from_metadata(metadata: Dict[str, Any]) -> pystac.Item:
item_id = get_id(metadata)
item_datetime = get_datetime(metadata)
item_geo = get_geo(metadata)
item = pystac.Item(
id=item_id,
geometry=item_geo["geometry"],
bbox=item_geo["bbox"],
datetime=item_datetime.replace(tzinfo=None),
properties={}
)
# Add standard STAC extensions
item.stac_extensions.extend([
"https://stac-extensions.github.io/projection/v2.0.0/schema.json",
"https://stac-extensions.github.io/pointcloud/v2.0.0/schema.json",
"https://stac-extensions.github.io/timestamps/v1.1.0/schema.json"
])
# Apply topo4d extension
topo4d_ext = Topo4DExtension.ext(item, add_if_missing=True)
topo4d_ext.tz = metadata.get("timezone", "UTC")
topo4d_ext.data_type = DataType(metadata.get("data_type", "pointcloud"))
topo4d_ext.acquisition_mode = metadata.get("acquisition_mode", None) # Fixed typo: was acquisition_date
topo4d_ext.orientation = metadata.get("orientation", None)
topo4d_ext.spatial_resolution = metadata.get("spatial_resolution", None)
topo4d_ext.measurement_error = metadata.get("measurement_error", None)
# Use standard STAC extensions for formerly topo4d properties
item.properties["proj:code"] = item_geo.get("native_crs", None)
item.properties["instruments"] = [metadata.get("sensor")] if metadata.get("sensor") else []
item.properties["pc:count"] = metadata.get("header", {}).get("point_count", None)
item.properties["pc:type"] = 'lidar' if metadata.get('acquisition_mode') == 'ULS' else 'other'
# Add timestamps if available
if metadata.get("header", {}).get("creation_date"):
try:
creation_date = datetime.strptime(metadata["header"]["creation_date"], "%Y-%m-%d").isoformat() + "Z"
item.properties["timestamps:published"] = creation_date
except:
pass
asset_name, item_asset = make_item_asset(asset_url=metadata.get("asset_url"), user_input=metadata)
item.add_asset(
key=asset_name,
asset=item_asset
)
return item
meta_userinput = pd.read_csv(os.path.join(data_dir, "auxilary.csv"), index_col="id").to_dict(orient="index")
item_list = []
metadata_list = []
for file_path in tqdm(file_paths, desc="Processing files"):
meta_las = extract_metadata_from_las(file_path, if_save=True)
file_id = Path(file_path).stem.split(".")[0]
if file_id not in meta_userinput:
print(f"Skipping {file_id} as it is not in user input metadata")
continue
metadata = meta_las
metadata.update(meta_userinput[file_id])
metadata_list.append(metadata)
item = item_from_metadata(metadata)
# item.validate() # Uncomment when the Schema url is public
item.set_self_href(f"{data_dir}/{item.id}.json")
item_list.append(item)
Processing files: 100%|██████████| 3/3 [00:00<00:00, 374.41it/s]
item_list[0]
- type "Feature"
- stac_version "1.1.0"
stac_extensions[] 4 items
- 0 "https://stac-extensions.github.io/projection/v2.0.0/schema.json"
- 1 "https://stac-extensions.github.io/pointcloud/v2.0.0/schema.json"
- 2 "https://stac-extensions.github.io/timestamps/v1.1.0/schema.json"
- 3 "https://tum-rsa.github.io/topo4d/v0.1.0/schema.json"
- id "Isar_20240812_UPH_10cm"
geometry
- type "Polygon"
coordinates[] 1 items
0[] 5 items
0[] 2 items
- 0 11.316376321635966
- 1 47.52637673872775
1[] 2 items
- 0 11.316376321635966
- 1 47.53158992234481
2[] 2 items
- 0 11.304866670991222
- 1 47.53158992234481
3[] 2 items
- 0 11.304866670991222
- 1 47.52637673872775
4[] 2 items
- 0 11.316376321635966
- 1 47.52637673872775
bbox[] 4 items
- 0 11.304866670991222
- 1 47.52637673872775
- 2 11.316376321635966
- 3 47.53158992234481
properties
- topo4d:data_type "pointcloud"
- topo4d:acquisition_mode "UPH"
- topo4d:orientation "Nadir"
- topo4d:spatial_resolution 0.1
- topo4d:measurement_error 0.04
- proj:code "EPSG:25832"
instruments[] 1 items
- 0 "DJI Phantom 4 RTK"
- pc:count 23314488
- pc:type "other"
- timestamps:published "2025-05-05T00:00:00Z"
- datetime "2024-08-12T00:00:00Z"
links[] 1 items
0
- rel "self"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/Isar/data/Isar_pointclouds/Isar_20240812_UPH_10cm.json"
- type "application/json"
assets
pointcloud
- href "./Isar_20240812_UPH_10cm.copc.laz"
- type "application/vnd.laszip+copc"
roles[] 1 items
- 0 "data"
Adding TrafoMeta¶
Transformation matrices describe how each acquisition relates to a reference epoch. We load the per-file matrices, reference them back to the chosen epoch Item, and store them via the Topo4D TrafoMeta
helper so the downstream processing chain can reproduce alignment steps.
trafo_paths = [f for f in Path(data_dir).glob("trafo*.txt")]
ref_id = "Isar_20241105_UPH_10cm"
# find the match id in item_list
if ref_id in [item.id for item in item_list]:
reference_epoch = [item for item in item_list if item.id == ref_id][0]
reference_epoch_link = pystac.Link(
rel=reference_epoch.id,
target=reference_epoch.get_self_href(),
title="Reference Epoch",
media_type=pystac.MediaType.JSON)
for idx, trafo_path in enumerate(trafo_paths):
with open(trafo_path, "r") as f:
# read trafo meta from txt file as array
trafo_mat = [np.array(line.strip().split(), dtype=float).tolist() for line in f if line.strip()]
# TrafoMeta.create() now requires reference_epoch as first argument
trafo_meta = TrafoMeta.create(
reference_epoch=reference_epoch_link.to_dict(), # Required parameter
transformation=trafo_mat,
)
item = item_list[idx]
Topo4DExtension.ext(item, add_if_missing=True).trafometa = trafo_meta
else:
print(f"Warning: Reference epoch '{ref_id}' not found in item list")
Adding ProductMeta¶
Product metadata captures how a derivative product was generated. When available we convert product attributes into the Topo4D ProductMeta structure and also propagate publication timestamps to the standard timestamps extension.
for idx, meta in enumerate(metadata_list):
item = item_list[idx]
if "product_name" in meta:
product_meta = ProductMeta.create(
product_name=meta["product_name"],
param={"param": meta.get("param", None)},
derived_from=meta.get("derived_from", None),
product_level=meta.get("product_level", None),
)
Topo4DExtension.ext(item, add_if_missing=True).productmeta = product_meta
# Use timestamps extension for publication date instead of ProductMeta.lastupdate
if meta.get('header', {}).get("creation_date"):
try:
creation_date = datetime.strptime(meta['header']["creation_date"], "%Y-%m-%d").isoformat() + "Z"
item.properties["timestamps:published"] = creation_date
except:
pass
Create the Collection¶
A Collection summarises a group of related Items. The cells that follow fill out each required field so you can tailor the description, license, and providers to match your organisation's publishing standards.
- Collection
id
Choose a short, unique identifier that reflects the site and acquisition technique.
collection_id = f"Isar_pointclouds_collection"
collection_id
'Isar_pointclouds_collection'
- Collection
title
Provide a human friendly title that will appear in catalog browsers and search results.
collection_title = "Isar Point Clouds Collection"
collection_title
'Isar Point Clouds Collection'
- Collection
description
Describe what the dataset contains, the sensors involved, and any processing highlights. Markdown formatting is supported.
collection_desc = f'''### {collection_title}
A collection of point clouds from Isar riverbank, near Wallgau, Germany.
'''
print(collection_desc)
### Isar Point Clouds Collection A collection of point clouds from Isar riverbank, near Wallgau, Germany.
- Collection
license
Declare the usage rights for the data. You can reference a standard SPDX identifier or supply a custom license string.
collection_license = "CC-BY-4.0"
collection_license
'CC-BY-4.0'
- Collection
provider
List the organisations responsible for producing, hosting, or licensing the dataset. Add as many providers as needed and assign the appropriate roles.
collection_providers = [
pystac.Provider(
name="TUM Remote Sensing Applications",
roles=[
pystac.ProviderRole.PROCESSOR,
pystac.ProviderRole.PRODUCER,
pystac.ProviderRole.LICENSOR,
pystac.ProviderRole.HOST
],
url="https://www.asg.ed.tum.de/rsa/startseite/",
),
]
- Collection
extent
The extent captures both spatial and temporal coverage. We seed it with broad bounds here and later tighten it using the Items so search clients receive accurate metadata.
spatial_extent = pystac.SpatialExtent([[-180.0, -90.0, 180.0, 90.0]])
temporal_extent = pystac.TemporalExtent([[datetime(2013, 6, 1), None]])
collection_extent = pystac.Extent(spatial_extent, temporal_extent)
collection_pc = pystac.Collection(
id=collection_id,
title=collection_title,
description=collection_desc,
extent=collection_extent,
license=collection_license,
providers=collection_providers,
)
collection_pc.to_dict()
{'type': 'Collection', 'id': 'Isar_pointclouds_collection', 'stac_version': '1.1.0', 'description': '### Isar Point Clouds Collection\n\nA collection of point clouds from Isar riverbank, near Wallgau, Germany.\n', 'links': [], 'title': 'Isar Point Clouds Collection', 'extent': {'spatial': {'bbox': [[-180.0, -90.0, 180.0, 90.0]]}, 'temporal': {'interval': [['2013-06-01T00:00:00Z', None]]}}, 'license': 'CC-BY-4.0', 'providers': [{'name': 'TUM Remote Sensing Applications', 'roles': ['processor', 'producer', 'licensor', 'host'], 'url': 'https://www.asg.ed.tum.de/rsa/startseite/'}]}
# add items to the collection
for item in item_list:
collection_pc.add_item(item)
- Collection
summaries
Summaries expose quick statistics about the Collection (counts, ranges, enumerations) and help STAC clients reason about the dataset without loading every Item.
# add collection summaries
collection_pc.summaries.add("num_items", {"count": len(item_list)})
collection_pc.summaries.add("timestamp_list", {"list": [item.datetime.isoformat() for item in item_list]})
collection_pc.summaries.add("temporal_resolution", {"resolution": "half-annual"})
collection_pc.update_extent_from_items()
collection_pc.extent.to_dict()
collection_pc
- type "Collection"
- id "Isar_pointclouds_collection"
- stac_version "1.1.0"
- description "### Isar Point Clouds Collection A collection of point clouds from Isar riverbank, near Wallgau, Germany. "
links[] 3 items
0
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/Isar/data/Isar_pointclouds/Isar_20240812_UPH_10cm.json"
- type "application/geo+json"
1
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/Isar/data/Isar_pointclouds/Isar_20241105_UPH_10cm.json"
- type "application/geo+json"
2
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/Isar/data/Isar_pointclouds/Isar_20250325_ULS_10cm.json"
- type "application/geo+json"
- title "Isar Point Clouds Collection"
extent
spatial
bbox[] 1 items
0[] 4 items
- 0 11.304168016412058
- 1 47.52600520593624
- 2 11.3172511501222
- 3 47.53281392856261
temporal
interval[] 1 items
0[] 2 items
- 0 "2024-08-12T00:00:00Z"
- 1 "2025-03-25T00:00:00Z"
- license "CC-BY-4.0"
providers[] 1 items
0
- name "TUM Remote Sensing Applications"
roles[] 4 items
- 0 "processor"
- 1 "producer"
- 2 "licensor"
- 3 "host"
- url "https://www.asg.ed.tum.de/rsa/startseite/"
summaries
num_items
- count 3
timestamp_list
list[] 3 items
- 0 "2024-08-12T00:00:00"
- 1 "2024-11-05T00:00:00"
- 2 "2025-03-25T00:00:00"
temporal_resolution
- resolution "half-annual"
Create more collections¶
If the site has multiple sensor or other groupings, repeat the workflow to produce additional Collections. This section shows how to combine Items into thematic buckets (for example, M3C2 products).
data_dir = "../../demo/Isar/data/Isar_m3c2"
file_paths = [f for f in Path(data_dir).glob("*") if f.suffix.lower() in [".las", ".laz"]]
for file_path in file_paths:
print(f"Processing {file_path}")
meta_las = extract_metadata_from_las(file_path, if_save=True)
file_paths = [f for f in Path(data_dir).glob("*") if f.suffix.lower() in [".las", ".laz"]]
meta_las = extract_metadata_from_las(file_paths[0], if_save=True)
file_id = Path(file_paths[0]).stem.split(".")[0]
Processing ..\..\demo\Isar\data\Isar_m3c2\Isar_m3c2_20240812_20241105_10cm.copc.laz Processing ..\..\demo\Isar\data\Isar_m3c2\Isar_m3c2_20240812_20250325_10cm.copc.laz
meta_userinput = pd.read_csv(os.path.join(data_dir, "auxilary.csv"), index_col="id").to_dict(orient="index")
item_list = []
metadata_list = []
for file_path in tqdm(file_paths, desc="Processing files"):
meta_las = extract_metadata_from_las(file_path, if_save=True)
file_id = Path(file_path).stem.split(".")[0]
if file_id not in meta_userinput:
print(f"Skipping {file_id} as it is not in user input metadata")
continue
metadata = meta_las
metadata.update(meta_userinput[file_id])
metadata_list.append(metadata)
item = item_from_metadata(metadata)
# item.validate() # Uncomment when the Schema url is public
item.set_self_href(f"{data_dir}/{item.id}.json")
item_list.append(item)
Processing files: 100%|██████████| 2/2 [00:00<00:00, 303.85it/s]
item_list[0].get_self_href()
'c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/Isar/data/Isar_m3c2/Isar_m3c2_20240812_20241105_10cm.json'
for idx, meta in enumerate(metadata_list):
item = item_list[idx]
if "product_name" in meta:
product_meta = ProductMeta.create(
product_name=meta["product_name"],
param={
"cyl_radius": meta.get("cyl_radius", None),
"reference_epoch": meta.get("reference_epoch", None),
"compared_epoch": meta.get("compared_epoch", None)
},
derived_from=meta.get("derived_from", None),
product_level=meta.get("product_level", None),
)
Topo4DExtension.ext(item, add_if_missing=True).productmeta = product_meta
# Use timestamps extension for publication date instead of ProductMeta.lastupdate
if meta.get('header', {}).get("creation_date"):
try:
creation_date = datetime.strptime(meta['header']["creation_date"], "%Y-%m-%d").isoformat() + "Z"
item.properties["timestamps:published"] = creation_date
except:
pass
collection_id = f"Isar_m3c2_collection"
collection_title = "Isar M3C2 Products Collection"
collection_desc = f'''### {collection_title}
A collection of M3C2 products from Isar riverbank, near Wallgau, Germany.
'''
collection_license = "CC-BY-4.0"
collection_providers = [
pystac.Provider(
name="TUM Remote Sensing Applications",
roles=[
pystac.ProviderRole.PROCESSOR,
pystac.ProviderRole.PRODUCER,
pystac.ProviderRole.LICENSOR,
pystac.ProviderRole.HOST
],
url="https://www.asg.ed.tum.de/rsa/startseite/",
),
]
spatial_extent = pystac.SpatialExtent([[-180.0, -90.0, 180.0, 90.0]])
temporal_extent = pystac.TemporalExtent([[datetime(2013, 6, 1), None]])
collection_extent = pystac.Extent(spatial_extent, temporal_extent)
collection_m3c2 = pystac.Collection(
id=collection_id,
title=collection_title,
description=collection_desc,
extent=collection_extent,
license=collection_license,
providers=collection_providers,
)
collection_m3c2.to_dict()
{'type': 'Collection', 'id': 'Isar_m3c2_collection', 'stac_version': '1.1.0', 'description': '### Isar M3C2 Products Collection\n\nA collection of M3C2 products from Isar riverbank, near Wallgau, Germany.\n', 'links': [], 'title': 'Isar M3C2 Products Collection', 'extent': {'spatial': {'bbox': [[-180.0, -90.0, 180.0, 90.0]]}, 'temporal': {'interval': [['2013-06-01T00:00:00Z', None]]}}, 'license': 'CC-BY-4.0', 'providers': [{'name': 'TUM Remote Sensing Applications', 'roles': ['processor', 'producer', 'licensor', 'host'], 'url': 'https://www.asg.ed.tum.de/rsa/startseite/'}]}
# add items to the collection
for item in item_list:
collection_m3c2.add_item(item)
# add collection summaries
collection_m3c2.summaries.add("num_items", {"count": len(item_list)})
collection_m3c2.summaries.add("timestamp_list", {"list": [item.datetime.isoformat() for item in item_list]})
collection_m3c2.summaries.add("temporal_resolution", {"resolution": "half-annual"})
collection_m3c2.update_extent_from_items()
collection_m3c2.extent.to_dict()
collection_m3c2
- type "Collection"
- id "Isar_m3c2_collection"
- stac_version "1.1.0"
- description "### Isar M3C2 Products Collection A collection of M3C2 products from Isar riverbank, near Wallgau, Germany. "
links[] 2 items
0
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/Isar/data/Isar_m3c2/Isar_m3c2_20240812_20241105_10cm.json"
- type "application/geo+json"
1
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/Isar/data/Isar_m3c2/Isar_m3c2_20240812_20250325_10cm.json"
- type "application/geo+json"
- title "Isar M3C2 Products Collection"
extent
spatial
bbox[] 1 items
0[] 4 items
- 0 11.304866670991222
- 1 47.52637673872775
- 2 11.31637632165577
- 3 47.53158992279433
temporal
interval[] 1 items
0[] 2 items
- 0 "2024-11-05T00:00:00Z"
- 1 "2025-03-25T00:00:00Z"
- license "CC-BY-4.0"
providers[] 1 items
0
- name "TUM Remote Sensing Applications"
roles[] 4 items
- 0 "processor"
- 1 "producer"
- 2 "licensor"
- 3 "host"
- url "https://www.asg.ed.tum.de/rsa/startseite/"
summaries
num_items
- count 2
timestamp_list
list[] 2 items
- 0 "2024-11-05T00:00:00"
- 1 "2025-03-25T00:00:00"
temporal_resolution
- resolution "half-annual"
Create the Catalog¶
Finally we wrap the Collection in a top-level Catalog so the structure is portable. Catalogs can contain multiple Collections.
catalog_id = "Isar-catalog"
catalog_desc = "This catalog contains multi-temporal point cloud data and M3C2 products from Isar river near Wallgau, Germany."
catalog = pystac.Catalog(
id=catalog_id,
description=catalog_desc
)
catalog.add_child(collection_pc)
catalog.add_child(collection_m3c2)
catalog.describe()
* <Catalog id=Isar-catalog> * <Collection id=Isar_pointclouds_collection> * <Item id=Isar_20240812_UPH_10cm> * <Item id=Isar_20241105_UPH_10cm> * <Item id=Isar_20250325_ULS_10cm> * <Collection id=Isar_m3c2_collection> * <Item id=Isar_m3c2_20240812_20241105_10cm> * <Item id=Isar_m3c2_20240812_20250325_10cm>
Save the STAC¶
Normalising HREFs writes a clean directory layout to disk. Saving as SELF_CONTAINED
ensures every JSON document references relatives paths so moving the catalog elsewhere keeps the links intact.
from pathlib import Path
root_path = str(Path(f"{data_root}"))
print(f"Root path for catalog: {root_path}")
from pystac.layout import TemplateLayoutStrategy
# Set up a flatten layout strategy
strategy = TemplateLayoutStrategy(
item_template="${id}.json"
)
catalog.normalize_hrefs(root_path, strategy=strategy)
Root path for catalog: ..\..\demo\Isar\data
catalog.validate_all() # Uncomment when the Schema url is public
5
catalog.save(pystac.CatalogType.SELF_CONTAINED)
catalog_load = pystac.Catalog.from_file(f"{root_path}/catalog.json")
catalog_load.describe()
* <Catalog id=Isar-catalog> * <Collection id=Isar_pointclouds_collection> * <Item id=Isar_20240812_UPH_10cm> * <Item id=Isar_20241105_UPH_10cm> * <Item id=Isar_20250325_ULS_10cm> * <Collection id=Isar_m3c2_collection> * <Item id=Isar_m3c2_20240812_20241105_10cm> * <Item id=Isar_m3c2_20240812_20250325_10cm>