Creating a STAC of Multi-sensor Topographic Data¶

Welcome! This notebook walks through a complete example for building a multi-sensor STAC catalog for the Isar research site. We harvest metadata from UAV Laser Scanning (ULS), UAV Photogrammetry (UPH) and some processing products (e.g., M3C2 distances), then package everything into interoperable STAC Items, Collections, and a top-level Catalog using the topo4d STAC extension together with core STAC features.

You will learn how to

gather raw LAS/LAZ metadata and combine it with user supplied context
create individual STAC Items that include topo4d, projection, point cloud, and timestamps extensions
attach Topo4D-specific Product and Trafo metadata
organise Items into thematic Collections and publish a reusable Catalog layout

Before you start

make sure the demo data under demo/Isar is available locally
pystac and other required libraries installed in the active environment
topo4d helper script locally stored and accessible
basic familiarity with Python and STAC terminology

The notebook highlights each step, explains why it is needed, and points to the relevant STAC fields so you can adapt the flow to your own projects.

Reference

Last update: 19.09.2025

Import libraries¶

We import required libraries alongside Topo4D STAC helper classes from topo4d. The sys.path.append call ensures the repository root is available so the local topo4d package can be imported while the notebook runs from docs/notebooks.

In [1]:

Copied!





import json
from datetime import datetime
from pathlib import Path
import pandas as pd
from datetime import datetime, timezone, timedelta
from dateutil import parser
import numpy as np
from tqdm import tqdm
from typing import Dict, Any

import sys
import os
sys.path.append(os.path.abspath('../..'))

import pystac
from topo4d.topo4d_ext import DataType, Topo4DExtension, ProductMeta, TrafoMeta
from topo4d.builder import extract_metadata_from_las, make_item_asset
import json
from datetime import datetime
from pathlib import Path
import pandas as pd
from datetime import datetime, timezone, timedelta
from dateutil import parser
import numpy as np
from tqdm import tqdm
from typing import Dict, Any

import sys
import os
sys.path.append(os.path.abspath('../..'))

import pystac
from topo4d.topo4d_ext import DataType, Topo4DExtension, ProductMeta, TrafoMeta
from topo4d.builder import extract_metadata_from_las, make_item_asset

Define dirs and paths¶

The demo files are stored in demo/Isar/data, but you can swap these paths for your own acquisition folders. The point cloud files are under Isar_pointcloud and the processing products are under Isar_m3c2.

In [2]:

Copied!

data_root = "../../demo/Isar/data"
data_dir = "../../demo/Isar/data/Isar_pointclouds"
data_root = "../../demo/Isar/data"
data_dir = "../../demo/Isar/data/Isar_pointclouds"

Extract metadata from las/laz file¶

extract_metadata_from_las parses the LAS headers, point counts, and VLR blocks for every LAS/LAZ file we find. We list all candidate files first, then iterate through them so the helper can persist metadata snapshots alongside the raw data. Having structured metadata up front makes the later STAC construction straightforward. Adapt these block based on your own dataset.

In [3]:

Copied!

file_paths = [f for f in Path(data_dir).glob("*") if f.suffix.lower() in [".las", ".laz"]]

for file_path in file_paths:
    print(f"Processing {file_path}")
    meta_las = extract_metadata_from_las(file_path, if_save=True)
file_paths = [f for f in Path(data_dir).glob("*") if f.suffix.lower() in [".las", ".laz"]]

for file_path in file_paths:
    print(f"Processing {file_path}")
    meta_las = extract_metadata_from_las(file_path, if_save=True)

Processing ..\..\demo\Isar\data\Isar_pointclouds\Isar_20240812_UPH_10cm.copc.laz
Processing ..\..\demo\Isar\data\Isar_pointclouds\Isar_20241105_UPH_10cm.copc.laz
Processing ..\..\demo\Isar\data\Isar_pointclouds\Isar_20250325_ULS_10cm.copc.laz

In [4]:

Copied!

# Use the first epoch of file as an example
meta_las = extract_metadata_from_las(file_paths[0], if_save=True)
file_id = Path(file_paths[0]).stem.split(".")[0]
# Use the first epoch of file as an example
meta_las = extract_metadata_from_las(file_paths[0], if_save=True)
file_id = Path(file_paths[0]).stem.split(".")[0]

Ingest metadata¶

Besides the automatically extracted values we add domain knowledge that cannot be inferred from the file alone: sensor model, acquisition mode, timezone, transformation matrix, and so on. The resulting dictionary merges machine derived and user supplied fields and becomes the single source of truth when we instantiate the STAC Item.

In [5]:

Copied!

meta_userinput = pd.read_csv(os.path.join(data_dir, "auxilary.csv"), index_col="id").to_dict(orient="index")

# merge meta_las and meta_userinput
metadata = meta_las
metadata.update(meta_userinput[file_id])

print(json.dumps(metadata, indent=2))
meta_userinput = pd.read_csv(os.path.join(data_dir, "auxilary.csv"), index_col="id").to_dict(orient="index")

# merge meta_las and meta_userinput
metadata = meta_las
metadata.update(meta_userinput[file_id])

print(json.dumps(metadata, indent=2))

{
  "id": "Isar_20240812_UPH_10cm",
  "header": {
    "DEFAULT_POINT_FORMAT": "<PointFormat(3, 0 bytes of extra dims)>",
    "DEFAULT_VERSION": [
      1,
      2
    ],
    "are_points_compressed": true,
    "creation_date": "2025-05-05",
    "evlrs": "[<laspy.copc.CopcHierarchyVlr object at 0x0000018CC84A8DD0>]",
    "extra_header_bytes": null,
    "extra_vlr_bytes": null,
    "file_source_id": 0,
    "generating_software": "PDAL 2.7.0 (d5d146)",
    "global_encoding": "<laspy.header.GlobalEncoding object at 0x0000018CC554AF00>",
    "major_version": 1,
    "maxs": [
      674345.436,
      5266840.8687,
      910.46
    ],
    "minor_version": 4,
    "mins": [
      673496.386,
      5266235.7505,
      880.4851
    ],
    "number_of_evlrs": 1,
    "number_of_points_by_return": [
      23314488,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0
    ],
    "offset_to_point_data": 1239,
    "offsets": [
      673934.0,
      5266523.0,
      901.0
    ],
    "point_count": 23314488,
    "point_format": "<PointFormat(7, 0 bytes of extra dims)>",
    "scales": [
      0.0001,
      0.0001,
      0.0001
    ],
    "start_of_first_evlr": 214694905,
    "start_of_waveform_data_packet_record": 0,
    "system_identifier": "PDAL",
    "uuid": "00000000-0000-0000-0000-000000000000",
    "version": [
      1,
      4
    ],
    "vlrs": "[<laspy.copc.CopcInfoVlr object at 0x0000018CC837B260>, <laspy.vlrs.known.LasZipVlr object at 0x0000018CC85F8470>, <laspy.vlrs.known.WktCoordinateSystemVlr object at 0x0000018CC8776120>]",
    "x_max": 674345.436,
    "x_min": 673496.386,
    "x_offset": 673934.0,
    "x_scale": 0.0001,
    "y_max": 5266840.8687,
    "y_min": 5266235.7505,
    "y_offset": 5266523.0,
    "y_scale": 0.0001,
    "z_max": 910.46,
    "z_min": 880.4851,
    "z_offset": 901.0,
    "z_scale": 0.0001
  },
  "vlrs": [
    {
      "user_id": "copc",
      "record_id": 1,
      "description": "COPC info VLR",
      "center": [
        673920.9110000001,
        5266660.2754999995,
        1305.010099999965
      ],
      "gps_max": 0.0,
      "gps_min": 0.0,
      "halfsize": 424.5249999999651,
      "hierarchy_root_offset": 214694965,
      "hierarchy_root_size": 21184,
      "spacing": 5.775850340135579
    },
    {
      "user_id": "laszip encoded",
      "record_id": 22204,
      "description": "http://laszip.org"
    },
    {
      "user_id": "LASF_Projection",
      "record_id": 2112,
      "description": "",
      "string": "PROJCS[\"WGS 84 / UTM zone 32N\",GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563]],PRIMEM[\"Greenwich\",0],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AUTHORITY[\"EPSG\",\"4326\"]],PROJECTION[\"Transverse_Mercator\"],PARAMETER[\"latitude_of_origin\",0],PARAMETER[\"central_meridian\",9],PARAMETER[\"scale_factor\",0.9996],PARAMETER[\"false_easting\",500000],PARAMETER[\"false_northing\",0],UNIT[\"metre\",1],AXIS[\"Easting\",EAST],AXIS[\"Northing\",NORTH],AUTHORITY[\"EPSG\",\"32632\"]]"
    }
  ],
  "evlrs": [
    {
      "user_id": "copc",
      "record_id": 1000,
      "description": "EPT Hierarchy",
      "root_page": "<laspy.copc.HierarchyPage object at 0x0000018CC879C050>"
    }
  ],
  "native_crs": "EPSG:25832",
  "datetime": "2024-08-12T00:00:00",
  "data_type": "pointcloud",
  "sensor": "DJI Phantom 4 RTK",
  "timezone": "UTC+1",
  "acquisition_mode": "UPH",
  "orientation": "Nadir",
  "spatial_resolution": 0.1,
  "measurement_error": 0.04,
  "asset_url": "./Isar_20240812_UPH_10cm.copc.laz"
}

Create a STAC Item from metadata¶

To keep the workflow readable and testable we split the Item construction into small helper functions. Each of the next subsections covers one aspect of STAC metadata so you can reuse the logic in scripts or pipelines.

Item id

Every STAC Item needs a stable identifier. Here we derive it from the filename: each LAS is named after the acquisition timestamp, so the ID doubles as a human readable capture time and remains unique across the dataset.

In [6]:

Copied!

def get_id(metadata: Dict[str, Any]) -> str:
    return metadata.get("id")
def get_id(metadata: Dict[str, Any]) -> str:
    return metadata.get("id")

In [7]:

Copied!

item_id = get_id(metadata)
item_id
item_id = get_id(metadata)
item_id

Out[7]:

'Isar_20240812_UPH_10cm'

Item Datetime

This helper normalises the acquisition timestamp. It parses the string, attaches the correct timezone (either a named timezone or an explicit UTC offset), and returns a timezone aware datetime. STAC Items expect naive UTC values, so we strip the tzinfo after performing the conversion.

In [8]:

Copied!





def get_datetime(metadata: Dict) -> datetime:
    dt_str = metadata.get("datetime")
    if not dt_str:
        raise ValueError("Missing 'datetime' in metadata")

    dt = parser.parse(dt_str)

    tz_str = metadata.get("timezone")
    if tz_str and tz_str.startswith("UTC"):
        sign = 1 if "+" in tz_str else -1
        offset_str = tz_str[3:].replace("+", "").replace("-", "")
        parts = offset_str.split(":")
        hours = int(parts[0]) if parts[0] else 0
        minutes = int(parts[1]) if len(parts) > 1 else 0
        offset = timedelta(hours=sign * hours, minutes=sign * minutes)
        tz = timezone(offset)
        dt = dt.replace(tzinfo=tz)
    elif tz_str and "/" in tz_str:
        tz = pytz.timezone(tz_str)
        dt = tz.localize(dt)
    elif dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)

    return dt
def get_datetime(metadata: Dict) -> datetime:
    dt_str = metadata.get("datetime")
    if not dt_str:
        raise ValueError("Missing 'datetime' in metadata")

    dt = parser.parse(dt_str)

    tz_str = metadata.get("timezone")
    if tz_str and tz_str.startswith("UTC"):
        sign = 1 if "+" in tz_str else -1
        offset_str = tz_str[3:].replace("+", "").replace("-", "")
        parts = offset_str.split(":")
        hours = int(parts[0]) if parts[0] else 0
        minutes = int(parts[1]) if len(parts) > 1 else 0
        offset = timedelta(hours=sign * hours, minutes=sign * minutes)
        tz = timezone(offset)
        dt = dt.replace(tzinfo=tz)
    elif tz_str and "/" in tz_str:
        tz = pytz.timezone(tz_str)
        dt = tz.localize(dt)
    elif dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)

    return dt

In [9]:

Copied!

item_datetime = get_datetime(metadata)
item_datetime
item_datetime = get_datetime(metadata)
item_datetime

Out[9]:

datetime.datetime(2024, 8, 12, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(seconds=3600)))

Item bbox, geometry and CRS information

We compute the spatial footprint from the LAS header min and max coordinates. When a global transformation matrix is available it is applied before determining the bounding box. We also scan the LAS VLRs to recover the native CRS so we can report the footprint both in the acquisition CRS and in WGS84 for the Projection extension.

In [10]:

Copied!





from shapely.geometry import box, mapping
from pyproj import CRS, Transformer

def get_geo(metadata: Dict[str, Any]) -> Dict[str, Any]:
    header = metadata.get("header")
    vlrs = metadata.get("vlrs", [])
    native_crs = metadata.get("native_crs", None)

    if not header or "mins" not in header or "maxs" not in header:
        raise ValueError("Invalid header format. 'mins' and 'maxs' are required.")

    xmin, ymin, zmin = header["mins"]
    xmax, ymax, zmax = header["maxs"]

    crs = None
    for vlr in vlrs:
        if vlr.get("user_id", "").lower() == "lasf_projection":
            if "wkt" in vlr and vlr["wkt"]:
                crs = CRS.from_wkt(vlr["wkt"])
                break
            elif "epsg_code" in vlr and vlr["epsg_code"]:
                crs = CRS.from_epsg(vlr["epsg_code"])
                break

    if not crs and native_crs:
        try:
            crs = CRS.from_user_input(native_crs)
        except Exception:
            crs = None

    bbox = [xmin, ymin, xmax, ymax]

    if crs and crs.to_epsg() != 4326:
        try:
            transformer = Transformer.from_crs(crs, CRS.from_epsg(4326), always_xy=True)

            min_lon, min_lat = transformer.transform(xmin, ymin)
            max_lon, max_lat = transformer.transform(xmax, ymax)

            # geom in WGS84
            bbox = [min_lon, min_lat, max_lon, max_lat]
            geom = mapping(box(min_lon, min_lat, max_lon, max_lat))
        except Exception:
            pass 
    else:
        # geom in native CRS
        geom = mapping(box(xmin, ymin, xmax, ymax))

    return {
        "bbox": bbox,
        "geometry": geom,
        "native_crs": crs.to_string() if crs else None
    }
from shapely.geometry import box, mapping
from pyproj import CRS, Transformer

def get_geo(metadata: Dict[str, Any]) -> Dict[str, Any]:
    header = metadata.get("header")
    vlrs = metadata.get("vlrs", [])
    native_crs = metadata.get("native_crs", None)

    if not header or "mins" not in header or "maxs" not in header:
        raise ValueError("Invalid header format. 'mins' and 'maxs' are required.")

    xmin, ymin, zmin = header["mins"]
    xmax, ymax, zmax = header["maxs"]

    crs = None
    for vlr in vlrs:
        if vlr.get("user_id", "").lower() == "lasf_projection":
            if "wkt" in vlr and vlr["wkt"]:
                crs = CRS.from_wkt(vlr["wkt"])
                break
            elif "epsg_code" in vlr and vlr["epsg_code"]:
                crs = CRS.from_epsg(vlr["epsg_code"])
                break

    if not crs and native_crs:
        try:
            crs = CRS.from_user_input(native_crs)
        except Exception:
            crs = None

    bbox = [xmin, ymin, xmax, ymax]

    if crs and crs.to_epsg() != 4326:
        try:
            transformer = Transformer.from_crs(crs, CRS.from_epsg(4326), always_xy=True)

            min_lon, min_lat = transformer.transform(xmin, ymin)
            max_lon, max_lat = transformer.transform(xmax, ymax)

            # geom in WGS84
            bbox = [min_lon, min_lat, max_lon, max_lat]
            geom = mapping(box(min_lon, min_lat, max_lon, max_lat))
        except Exception:
            pass 
    else:
        # geom in native CRS
        geom = mapping(box(xmin, ymin, xmax, ymax))

    return {
        "bbox": bbox,
        "geometry": geom,
        "native_crs": crs.to_string() if crs else None
    }

In [11]:

Copied!

item_geo = get_geo(metadata)
print(json.dumps(item_geo, indent=2))
item_geo = get_geo(metadata)
print(json.dumps(item_geo, indent=2))

{
  "bbox": [
    11.304866670991222,
    47.52637673872775,
    11.316376321635966,
    47.53158992234481
  ],
  "geometry": {
    "type": "Polygon",
    "coordinates": [
      [
        [
          11.316376321635966,
          47.52637673872775
        ],
        [
          11.316376321635966,
          47.53158992234481
        ],
        [
          11.304866670991222,
          47.53158992234481
        ],
        [
          11.304866670991222,
          47.52637673872775
        ],
        [
          11.316376321635966,
          47.52637673872775
        ]
      ]
    ]
  },
  "native_crs": "EPSG:25832"
}

Create the item¶

With the temporal and spatial metadata in hand we instantiate pystac.Item. At this point the Item only contains the required geometry, bounding box, and datetime; later cells enrich it with extensions, properties, and assets.

In [12]:

Copied!





item = pystac.Item(
    id=item_id,
    geometry=item_geo["geometry"],
    bbox=item_geo["bbox"],
    datetime=item_datetime.replace(tzinfo=None),
    properties={}
)
item = pystac.Item(
    id=item_id,
    geometry=item_geo["geometry"],
    bbox=item_geo["bbox"],
    datetime=item_datetime.replace(tzinfo=None),
    properties={}
)

Adding STAC Extensions¶

We register the projection, point cloud, and timestamps extensions and then use the Topo4D extension to store topographic 4D specific metadata such as timezone, acquisition mode, and transformation matrices. Mapping custom fields onto established STAC extensions keeps the Item interoperable with other tooling.

In [13]:

Copied!

topo4d_ext = Topo4DExtension.ext(item, add_if_missing=True)
print(f"Topo4D Extension added: {Topo4DExtension.has_extension(item)}")
topo4d_ext = Topo4DExtension.ext(item, add_if_missing=True)
print(f"Topo4D Extension added: {Topo4DExtension.has_extension(item)}")

Topo4D Extension added: True

In [14]:

Copied!





# Add standard STAC extensions first
item.stac_extensions.extend([
    "https://stac-extensions.github.io/projection/v2.0.0/schema.json",
    "https://stac-extensions.github.io/pointcloud/v2.0.0/schema.json", 
    "https://stac-extensions.github.io/timestamps/v1.1.0/schema.json"
])

# Apply topo4d extension
topo4d_ext = Topo4DExtension.ext(item, add_if_missing=True)
topo4d_ext.timezone = metadata.get("timezone", "UTC")
topo4d_ext.data_type = DataType(metadata.get("data_type", "pointcloud"))

# Use standard STAC extensions for formerly topo4d properties
item.properties["proj:code"] = item_geo.get("native_crs", None)
item.properties["instruments"] = [metadata.get("sensor")] if metadata.get("sensor") else []
item.properties["pc:count"] = metadata.get("header", {}).get("point_count", None)
item.properties["pc:type"] = 'lidar' if metadata.get('acquisition_mode') == 'ULS' else 'other'
# Add standard STAC extensions first
item.stac_extensions.extend([
    "https://stac-extensions.github.io/projection/v2.0.0/schema.json",
    "https://stac-extensions.github.io/pointcloud/v2.0.0/schema.json", 
    "https://stac-extensions.github.io/timestamps/v1.1.0/schema.json"
])

# Apply topo4d extension
topo4d_ext = Topo4DExtension.ext(item, add_if_missing=True)
topo4d_ext.timezone = metadata.get("timezone", "UTC")
topo4d_ext.data_type = DataType(metadata.get("data_type", "pointcloud"))

# Use standard STAC extensions for formerly topo4d properties
item.properties["proj:code"] = item_geo.get("native_crs", None)
item.properties["instruments"] = [metadata.get("sensor")] if metadata.get("sensor") else []
item.properties["pc:count"] = metadata.get("header", {}).get("point_count", None)
item.properties["pc:type"] = 'lidar' if metadata.get('acquisition_mode') == 'ULS' else 'other'

In [15]:

Copied!

item.properties
item.properties

Out[15]:

{'topo4d:timezone': 'UTC+1',
 'topo4d:data_type': 'pointcloud',
 'proj:code': 'EPSG:25832',
 'instruments': ['DJI Phantom 4 RTK'],
 'pc:count': 23314488,
 'pc:type': 'other'}

Adding assets¶

The make_item_asset helper wraps the LAS file path with the appropriate media type and asset metadata. Adding the asset under a descriptive key lets downstream clients discover and download the raw point cloud directly from the Item.

In [16]:

Copied!





asset_name, item_asset = make_item_asset(asset_url=metadata.get("asset_url"), user_input=metadata)

item.add_asset(
    key=asset_name,
    asset=item_asset
)

item
asset_name, item_asset = make_item_asset(asset_url=metadata.get("asset_url"), user_input=metadata)

item.add_asset(
    key=asset_name,
    asset=item_asset
)

item

Out[16]:

type "Feature"
stac_version "1.1.0"
stac_extensions[] 4 items
- 0 "https://tum-rsa.github.io/topo4d/v0.1.0/schema.json"
- 1 "https://stac-extensions.github.io/projection/v2.0.0/schema.json"
- 2 "https://stac-extensions.github.io/pointcloud/v2.0.0/schema.json"
- 3 "https://stac-extensions.github.io/timestamps/v1.1.0/schema.json"
id "Isar_20240812_UPH_10cm"
geometry
- type "Polygon"
- coordinates[] 1 items
  - 0[] 5 items
    
    0[] 2 items
    
    0 11.316376321635966
    
    1 47.52637673872775
    
    1[] 2 items
    
    0 11.316376321635966
    
    1 47.53158992234481
    
    2[] 2 items
    
    0 11.304866670991222
    
    1 47.53158992234481
    
    3[] 2 items
    
    0 11.304866670991222
    
    1 47.52637673872775
    
    4[] 2 items
    
    0 11.316376321635966
    
    1 47.52637673872775
bbox[] 4 items
- 0 11.304866670991222
- 1 47.52637673872775
- 2 11.316376321635966
- 3 47.53158992234481
properties
- topo4d:timezone "UTC+1"
- topo4d:data_type "pointcloud"
- proj:code "EPSG:25832"
- instruments[] 1 items
  - 0 "DJI Phantom 4 RTK"
- pc:count 23314488
- pc:type "other"
- datetime "2024-08-12T00:00:00Z"
links[] 0 items
assets
- pointcloud
  - href "./Isar_20240812_UPH_10cm.copc.laz"
  - type "application/vnd.laszip+copc"
  - roles[] 1 items
    
    0 "data"

Validating the item¶

item.validate() performs schema checks across the Item and its extensions.

In [17]:

Copied!

item.validate() # Uncomment this line to validate the item
item.validate() # Uncomment this line to validate the item

Out[17]:

['https://schemas.stacspec.org/v1.1.0/item-spec/json-schema/item.json',
 'https://tum-rsa.github.io/topo4d/v0.1.0/schema.json',
 'https://stac-extensions.github.io/projection/v2.0.0/schema.json',
 'https://stac-extensions.github.io/pointcloud/v2.0.0/schema.json',
 'https://stac-extensions.github.io/timestamps/v1.1.0/schema.json']

Create more items¶

After confirming the helpers work for a single file we scale up to the full dataset. The loop below merges in the shared user metadata for each LAS file, builds a list of Items, and assigns self HREFs so they can be written to disk later on.

In [18]:

Copied!





def item_from_metadata(metadata: Dict[str, Any]) -> pystac.Item:
    item_id = get_id(metadata)
    item_datetime = get_datetime(metadata)
    item_geo = get_geo(metadata)

    item = pystac.Item(
        id=item_id,
        geometry=item_geo["geometry"],
        bbox=item_geo["bbox"],
        datetime=item_datetime.replace(tzinfo=None),
        properties={}
    )

    # Add standard STAC extensions
    item.stac_extensions.extend([
        "https://stac-extensions.github.io/projection/v2.0.0/schema.json",
        "https://stac-extensions.github.io/pointcloud/v2.0.0/schema.json", 
        "https://stac-extensions.github.io/timestamps/v1.1.0/schema.json"
    ])

    # Apply topo4d extension
    topo4d_ext = Topo4DExtension.ext(item, add_if_missing=True)
    topo4d_ext.tz = metadata.get("timezone", "UTC")
    topo4d_ext.data_type = DataType(metadata.get("data_type", "pointcloud"))
    topo4d_ext.acquisition_mode = metadata.get("acquisition_mode", None)  # Fixed typo: was acquisition_date
    topo4d_ext.orientation = metadata.get("orientation", None)
    topo4d_ext.spatial_resolution = metadata.get("spatial_resolution", None)
    topo4d_ext.measurement_error = metadata.get("measurement_error", None)

    # Use standard STAC extensions for formerly topo4d properties
    item.properties["proj:code"] = item_geo.get("native_crs", None)
    item.properties["instruments"] = [metadata.get("sensor")] if metadata.get("sensor") else []
    item.properties["pc:count"] = metadata.get("header", {}).get("point_count", None)
    item.properties["pc:type"] = 'lidar' if metadata.get('acquisition_mode') == 'ULS' else 'other'

    # Add timestamps if available
    if metadata.get("header", {}).get("creation_date"):
        try:
            creation_date = datetime.strptime(metadata["header"]["creation_date"], "%Y-%m-%d").isoformat() + "Z"
            item.properties["timestamps:published"] = creation_date
        except:
            pass

    asset_name, item_asset = make_item_asset(asset_url=metadata.get("asset_url"), user_input=metadata)

    item.add_asset(
        key=asset_name,
        asset=item_asset
    )

    return item
def item_from_metadata(metadata: Dict[str, Any]) -> pystac.Item:
    item_id = get_id(metadata)
    item_datetime = get_datetime(metadata)
    item_geo = get_geo(metadata)

    item = pystac.Item(
        id=item_id,
        geometry=item_geo["geometry"],
        bbox=item_geo["bbox"],
        datetime=item_datetime.replace(tzinfo=None),
        properties={}
    )

    # Add standard STAC extensions
    item.stac_extensions.extend([
        "https://stac-extensions.github.io/projection/v2.0.0/schema.json",
        "https://stac-extensions.github.io/pointcloud/v2.0.0/schema.json", 
        "https://stac-extensions.github.io/timestamps/v1.1.0/schema.json"
    ])

    # Apply topo4d extension
    topo4d_ext = Topo4DExtension.ext(item, add_if_missing=True)
    topo4d_ext.tz = metadata.get("timezone", "UTC")
    topo4d_ext.data_type = DataType(metadata.get("data_type", "pointcloud"))
    topo4d_ext.acquisition_mode = metadata.get("acquisition_mode", None)  # Fixed typo: was acquisition_date
    topo4d_ext.orientation = metadata.get("orientation", None)
    topo4d_ext.spatial_resolution = metadata.get("spatial_resolution", None)
    topo4d_ext.measurement_error = metadata.get("measurement_error", None)

    # Use standard STAC extensions for formerly topo4d properties
    item.properties["proj:code"] = item_geo.get("native_crs", None)
    item.properties["instruments"] = [metadata.get("sensor")] if metadata.get("sensor") else []
    item.properties["pc:count"] = metadata.get("header", {}).get("point_count", None)
    item.properties["pc:type"] = 'lidar' if metadata.get('acquisition_mode') == 'ULS' else 'other'

    # Add timestamps if available
    if metadata.get("header", {}).get("creation_date"):
        try:
            creation_date = datetime.strptime(metadata["header"]["creation_date"], "%Y-%m-%d").isoformat() + "Z"
            item.properties["timestamps:published"] = creation_date
        except:
            pass

    asset_name, item_asset = make_item_asset(asset_url=metadata.get("asset_url"), user_input=metadata)

    item.add_asset(
        key=asset_name,
        asset=item_asset
    )

    return item

In [19]:

Copied!





meta_userinput = pd.read_csv(os.path.join(data_dir, "auxilary.csv"), index_col="id").to_dict(orient="index")

item_list = []
metadata_list = []

for file_path in tqdm(file_paths, desc="Processing files"):
    meta_las = extract_metadata_from_las(file_path, if_save=True)
    file_id = Path(file_path).stem.split(".")[0]

    if file_id not in meta_userinput:
        print(f"Skipping {file_id} as it is not in user input metadata")
        continue

    metadata = meta_las
    metadata.update(meta_userinput[file_id])

    metadata_list.append(metadata)

    item = item_from_metadata(metadata)
    # item.validate() # Uncomment when the Schema url is public

    item.set_self_href(f"{data_dir}/{item.id}.json")
    item_list.append(item)
meta_userinput = pd.read_csv(os.path.join(data_dir, "auxilary.csv"), index_col="id").to_dict(orient="index")

item_list = []
metadata_list = []

for file_path in tqdm(file_paths, desc="Processing files"):
    meta_las = extract_metadata_from_las(file_path, if_save=True)
    file_id = Path(file_path).stem.split(".")[0]

    if file_id not in meta_userinput:
        print(f"Skipping {file_id} as it is not in user input metadata")
        continue

    metadata = meta_las
    metadata.update(meta_userinput[file_id])

    metadata_list.append(metadata)

    item = item_from_metadata(metadata)
    # item.validate() # Uncomment when the Schema url is public

    item.set_self_href(f"{data_dir}/{item.id}.json")
    item_list.append(item)

Processing files: 100%|██████████| 3/3 [00:00<00:00, 374.41it/s]

In [20]:

Copied!

item_list[0]
item_list[0]

Out[20]:

type "Feature"
stac_version "1.1.0"
stac_extensions[] 4 items
- 0 "https://stac-extensions.github.io/projection/v2.0.0/schema.json"
- 1 "https://stac-extensions.github.io/pointcloud/v2.0.0/schema.json"
- 2 "https://stac-extensions.github.io/timestamps/v1.1.0/schema.json"
- 3 "https://tum-rsa.github.io/topo4d/v0.1.0/schema.json"
id "Isar_20240812_UPH_10cm"
geometry
- type "Polygon"
- coordinates[] 1 items
  - 0[] 5 items
    
    0[] 2 items
    
    0 11.316376321635966
    
    1 47.52637673872775
    
    1[] 2 items
    
    0 11.316376321635966
    
    1 47.53158992234481
    
    2[] 2 items
    
    0 11.304866670991222
    
    1 47.53158992234481
    
    3[] 2 items
    
    0 11.304866670991222
    
    1 47.52637673872775
    
    4[] 2 items
    
    0 11.316376321635966
    
    1 47.52637673872775
bbox[] 4 items
- 0 11.304866670991222
- 1 47.52637673872775
- 2 11.316376321635966
- 3 47.53158992234481
properties
- topo4d:data_type "pointcloud"
- topo4d:acquisition_mode "UPH"
- topo4d:orientation "Nadir"
- topo4d:spatial_resolution 0.1
- topo4d:measurement_error 0.04
- proj:code "EPSG:25832"
- instruments[] 1 items
  - 0 "DJI Phantom 4 RTK"
- pc:count 23314488
- pc:type "other"
- timestamps:published "2025-05-05T00:00:00Z"
- datetime "2024-08-12T00:00:00Z"
links[] 1 items
- 0
  - rel "self"
  - href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/Isar/data/Isar_pointclouds/Isar_20240812_UPH_10cm.json"
  - type "application/json"
assets
- pointcloud
  - href "./Isar_20240812_UPH_10cm.copc.laz"
  - type "application/vnd.laszip+copc"
  - roles[] 1 items
    
    0 "data"

Adding TrafoMeta¶

Transformation matrices describe how each acquisition relates to a reference epoch. We load the per-file matrices, reference them back to the chosen epoch Item, and store them via the Topo4D TrafoMeta helper so the downstream processing chain can reproduce alignment steps.

In [21]:

Copied!





trafo_paths = [f for f in Path(data_dir).glob("trafo*.txt")]
ref_id = "Isar_20241105_UPH_10cm"

# find the match id in item_list
if ref_id in [item.id for item in item_list]:
    reference_epoch = [item for item in item_list if item.id == ref_id][0]

    reference_epoch_link = pystac.Link(
        rel=reference_epoch.id,
        target=reference_epoch.get_self_href(),
        title="Reference Epoch",
        media_type=pystac.MediaType.JSON)

    for idx, trafo_path in enumerate(trafo_paths):
        with open(trafo_path, "r") as f:
            # read trafo meta from txt file as array
            trafo_mat = [np.array(line.strip().split(), dtype=float).tolist() for line in f if line.strip()]

        # TrafoMeta.create() now requires reference_epoch as first argument
        trafo_meta = TrafoMeta.create(
            reference_epoch=reference_epoch_link.to_dict(),  # Required parameter
            transformation=trafo_mat,
        )

        item = item_list[idx]
        Topo4DExtension.ext(item, add_if_missing=True).trafometa = trafo_meta
else:
    print(f"Warning: Reference epoch '{ref_id}' not found in item list")
trafo_paths = [f for f in Path(data_dir).glob("trafo*.txt")]
ref_id = "Isar_20241105_UPH_10cm"

# find the match id in item_list
if ref_id in [item.id for item in item_list]:
    reference_epoch = [item for item in item_list if item.id == ref_id][0]

    reference_epoch_link = pystac.Link(
        rel=reference_epoch.id,
        target=reference_epoch.get_self_href(),
        title="Reference Epoch",
        media_type=pystac.MediaType.JSON)

    for idx, trafo_path in enumerate(trafo_paths):
        with open(trafo_path, "r") as f:
            # read trafo meta from txt file as array
            trafo_mat = [np.array(line.strip().split(), dtype=float).tolist() for line in f if line.strip()]

        # TrafoMeta.create() now requires reference_epoch as first argument
        trafo_meta = TrafoMeta.create(
            reference_epoch=reference_epoch_link.to_dict(),  # Required parameter
            transformation=trafo_mat,
        )

        item = item_list[idx]
        Topo4DExtension.ext(item, add_if_missing=True).trafometa = trafo_meta
else:
    print(f"Warning: Reference epoch '{ref_id}' not found in item list")

Adding ProductMeta¶

Product metadata captures how a derivative product was generated. When available we convert product attributes into the Topo4D ProductMeta structure and also propagate publication timestamps to the standard timestamps extension.

In [22]:

Copied!





for idx, meta in enumerate(metadata_list):
    item = item_list[idx]
    if "product_name" in meta:
        product_meta = ProductMeta.create(
            product_name=meta["product_name"],
            param={"param": meta.get("param", None)},
            derived_from=meta.get("derived_from", None),
            product_level=meta.get("product_level", None),
        )
        Topo4DExtension.ext(item, add_if_missing=True).productmeta = product_meta
        
        # Use timestamps extension for publication date instead of ProductMeta.lastupdate
        if meta.get('header', {}).get("creation_date"):
            try:
                creation_date = datetime.strptime(meta['header']["creation_date"], "%Y-%m-%d").isoformat() + "Z"
                item.properties["timestamps:published"] = creation_date
            except:
                pass
for idx, meta in enumerate(metadata_list):
    item = item_list[idx]
    if "product_name" in meta:
        product_meta = ProductMeta.create(
            product_name=meta["product_name"],
            param={"param": meta.get("param", None)},
            derived_from=meta.get("derived_from", None),
            product_level=meta.get("product_level", None),
        )
        Topo4DExtension.ext(item, add_if_missing=True).productmeta = product_meta
        
        # Use timestamps extension for publication date instead of ProductMeta.lastupdate
        if meta.get('header', {}).get("creation_date"):
            try:
                creation_date = datetime.strptime(meta['header']["creation_date"], "%Y-%m-%d").isoformat() + "Z"
                item.properties["timestamps:published"] = creation_date
            except:
                pass

Create the Collection¶

A Collection summarises a group of related Items. The cells that follow fill out each required field so you can tailor the description, license, and providers to match your organisation's publishing standards.

Collection id

Choose a short, unique identifier that reflects the site and acquisition technique.

In [23]:

Copied!

collection_id = f"Isar_pointclouds_collection"
collection_id
collection_id = f"Isar_pointclouds_collection"
collection_id

Out[23]:

'Isar_pointclouds_collection'

Collection title

Provide a human friendly title that will appear in catalog browsers and search results.

In [24]:

Copied!

collection_title = "Isar Point Clouds Collection"
collection_title
collection_title = "Isar Point Clouds Collection"
collection_title

Out[24]:

'Isar Point Clouds Collection'

Collection description

Describe what the dataset contains, the sensors involved, and any processing highlights. Markdown formatting is supported.

In [25]:

Copied!

collection_desc = f'''### {collection_title}

A collection of point clouds from Isar riverbank, near Wallgau, Germany.
'''
print(collection_desc)
collection_desc = f'''### {collection_title}

A collection of point clouds from Isar riverbank, near Wallgau, Germany.
'''
print(collection_desc)

### Isar Point Clouds Collection

A collection of point clouds from Isar riverbank, near Wallgau, Germany.

Collection license

Declare the usage rights for the data. You can reference a standard SPDX identifier or supply a custom license string.

In [26]:

Copied!

collection_license = "CC-BY-4.0"
collection_license
collection_license = "CC-BY-4.0"
collection_license

Out[26]:

'CC-BY-4.0'

Collection provider

List the organisations responsible for producing, hosting, or licensing the dataset. Add as many providers as needed and assign the appropriate roles.

In [27]:

Copied!





collection_providers = [
    pystac.Provider(
        name="TUM Remote Sensing Applications",
        roles=[
            pystac.ProviderRole.PROCESSOR,
            pystac.ProviderRole.PRODUCER,
            pystac.ProviderRole.LICENSOR,
            pystac.ProviderRole.HOST
            ],
        url="https://www.asg.ed.tum.de/rsa/startseite/",
    ),
]
collection_providers = [
    pystac.Provider(
        name="TUM Remote Sensing Applications",
        roles=[
            pystac.ProviderRole.PROCESSOR,
            pystac.ProviderRole.PRODUCER,
            pystac.ProviderRole.LICENSOR,
            pystac.ProviderRole.HOST
            ],
        url="https://www.asg.ed.tum.de/rsa/startseite/",
    ),
]

Collection extent

The extent captures both spatial and temporal coverage. We seed it with broad bounds here and later tighten it using the Items so search clients receive accurate metadata.

In [28]:

Copied!

spatial_extent = pystac.SpatialExtent([[-180.0, -90.0, 180.0, 90.0]])
temporal_extent = pystac.TemporalExtent([[datetime(2013, 6, 1), None]])
collection_extent = pystac.Extent(spatial_extent, temporal_extent)
spatial_extent = pystac.SpatialExtent([[-180.0, -90.0, 180.0, 90.0]])
temporal_extent = pystac.TemporalExtent([[datetime(2013, 6, 1), None]])
collection_extent = pystac.Extent(spatial_extent, temporal_extent)

In [29]:

Copied!





collection_pc = pystac.Collection(
    id=collection_id,
    title=collection_title,
    description=collection_desc,
    extent=collection_extent,
    license=collection_license,
    providers=collection_providers,
)
collection_pc = pystac.Collection(
    id=collection_id,
    title=collection_title,
    description=collection_desc,
    extent=collection_extent,
    license=collection_license,
    providers=collection_providers,
)

In [30]:

Copied!

collection_pc.to_dict()
collection_pc.to_dict()

Out[30]:

{'type': 'Collection',
 'id': 'Isar_pointclouds_collection',
 'stac_version': '1.1.0',
 'description': '### Isar Point Clouds Collection\n\nA collection of point clouds from Isar riverbank, near Wallgau, Germany.\n',
 'links': [],
 'title': 'Isar Point Clouds Collection',
 'extent': {'spatial': {'bbox': [[-180.0, -90.0, 180.0, 90.0]]},
  'temporal': {'interval': [['2013-06-01T00:00:00Z', None]]}},
 'license': 'CC-BY-4.0',
 'providers': [{'name': 'TUM Remote Sensing Applications',
   'roles': ['processor', 'producer', 'licensor', 'host'],
   'url': 'https://www.asg.ed.tum.de/rsa/startseite/'}]}

In [31]:

Copied!

# add items to the collection
for item in item_list:
    collection_pc.add_item(item)
# add items to the collection
for item in item_list:
    collection_pc.add_item(item)

Collection summaries

Summaries expose quick statistics about the Collection (counts, ranges, enumerations) and help STAC clients reason about the dataset without loading every Item.

In [32]:

Copied!





# add collection summaries
collection_pc.summaries.add("num_items", {"count": len(item_list)})
collection_pc.summaries.add("timestamp_list", {"list": [item.datetime.isoformat() for item in item_list]})
collection_pc.summaries.add("temporal_resolution", {"resolution": "half-annual"})
# add collection summaries
collection_pc.summaries.add("num_items", {"count": len(item_list)})
collection_pc.summaries.add("timestamp_list", {"list": [item.datetime.isoformat() for item in item_list]})
collection_pc.summaries.add("temporal_resolution", {"resolution": "half-annual"})

In [33]:

Copied!

collection_pc.update_extent_from_items()
collection_pc.extent.to_dict()
collection_pc
collection_pc.update_extent_from_items()
collection_pc.extent.to_dict()
collection_pc

Out[33]:

type "Collection"
id "Isar_pointclouds_collection"
stac_version "1.1.0"
description "### Isar Point Clouds Collection A collection of point clouds from Isar riverbank, near Wallgau, Germany. "
links[] 3 items
- 0
  - rel "item"
  - href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/Isar/data/Isar_pointclouds/Isar_20240812_UPH_10cm.json"
  - type "application/geo+json"
- 1
  - rel "item"
  - href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/Isar/data/Isar_pointclouds/Isar_20241105_UPH_10cm.json"
  - type "application/geo+json"
- 2
  - rel "item"
  - href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/Isar/data/Isar_pointclouds/Isar_20250325_ULS_10cm.json"
  - type "application/geo+json"
title "Isar Point Clouds Collection"
extent
- spatial
  - bbox[] 1 items
    
    0[] 4 items
    
    0 11.304168016412058
    
    1 47.52600520593624
    
    2 11.3172511501222
    
    3 47.53281392856261
- temporal
  - interval[] 1 items
    
    0[] 2 items
    
    0 "2024-08-12T00:00:00Z"
    
    1 "2025-03-25T00:00:00Z"
license "CC-BY-4.0"
providers[] 1 items
- 0
  - name "TUM Remote Sensing Applications"
  - roles[] 4 items
    
    0 "processor"
    
    1 "producer"
    
    2 "licensor"
    
    3 "host"
  - url "https://www.asg.ed.tum.de/rsa/startseite/"
summaries
- num_items
  - count 3
- timestamp_list
  - list[] 3 items
    
    0 "2024-08-12T00:00:00"
    
    1 "2024-11-05T00:00:00"
    
    2 "2025-03-25T00:00:00"
- temporal_resolution
  - resolution "half-annual"

Create more collections¶

If the site has multiple sensor or other groupings, repeat the workflow to produce additional Collections. This section shows how to combine Items into thematic buckets (for example, M3C2 products).

In [34]:

Copied!





data_dir = "../../demo/Isar/data/Isar_m3c2"
file_paths = [f for f in Path(data_dir).glob("*") if f.suffix.lower() in [".las", ".laz"]]

for file_path in file_paths:
    print(f"Processing {file_path}")
    meta_las = extract_metadata_from_las(file_path, if_save=True)

file_paths = [f for f in Path(data_dir).glob("*") if f.suffix.lower() in [".las", ".laz"]]

meta_las = extract_metadata_from_las(file_paths[0], if_save=True)
file_id = Path(file_paths[0]).stem.split(".")[0]
data_dir = "../../demo/Isar/data/Isar_m3c2"
file_paths = [f for f in Path(data_dir).glob("*") if f.suffix.lower() in [".las", ".laz"]]

for file_path in file_paths:
    print(f"Processing {file_path}")
    meta_las = extract_metadata_from_las(file_path, if_save=True)

file_paths = [f for f in Path(data_dir).glob("*") if f.suffix.lower() in [".las", ".laz"]]

meta_las = extract_metadata_from_las(file_paths[0], if_save=True)
file_id = Path(file_paths[0]).stem.split(".")[0]

Processing ..\..\demo\Isar\data\Isar_m3c2\Isar_m3c2_20240812_20241105_10cm.copc.laz
Processing ..\..\demo\Isar\data\Isar_m3c2\Isar_m3c2_20240812_20250325_10cm.copc.laz

In [35]:

Copied!





meta_userinput = pd.read_csv(os.path.join(data_dir, "auxilary.csv"), index_col="id").to_dict(orient="index")

item_list = []
metadata_list = []

for file_path in tqdm(file_paths, desc="Processing files"):
    meta_las = extract_metadata_from_las(file_path, if_save=True)
    file_id = Path(file_path).stem.split(".")[0]

    if file_id not in meta_userinput:
        print(f"Skipping {file_id} as it is not in user input metadata")
        continue

    metadata = meta_las
    metadata.update(meta_userinput[file_id])

    metadata_list.append(metadata)

    item = item_from_metadata(metadata)
    # item.validate() # Uncomment when the Schema url is public

    item.set_self_href(f"{data_dir}/{item.id}.json")
    item_list.append(item)
meta_userinput = pd.read_csv(os.path.join(data_dir, "auxilary.csv"), index_col="id").to_dict(orient="index")

item_list = []
metadata_list = []

for file_path in tqdm(file_paths, desc="Processing files"):
    meta_las = extract_metadata_from_las(file_path, if_save=True)
    file_id = Path(file_path).stem.split(".")[0]

    if file_id not in meta_userinput:
        print(f"Skipping {file_id} as it is not in user input metadata")
        continue

    metadata = meta_las
    metadata.update(meta_userinput[file_id])

    metadata_list.append(metadata)

    item = item_from_metadata(metadata)
    # item.validate() # Uncomment when the Schema url is public

    item.set_self_href(f"{data_dir}/{item.id}.json")
    item_list.append(item)

Processing files: 100%|██████████| 2/2 [00:00<00:00, 303.85it/s]

In [36]:

Copied!

item_list[0].get_self_href()
item_list[0].get_self_href()

Out[36]:

'c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/Isar/data/Isar_m3c2/Isar_m3c2_20240812_20241105_10cm.json'

In [37]:

Copied!





for idx, meta in enumerate(metadata_list):
    item = item_list[idx]
    if "product_name" in meta:
        product_meta = ProductMeta.create(
            product_name=meta["product_name"],
            param={
                "cyl_radius": meta.get("cyl_radius", None),
                "reference_epoch": meta.get("reference_epoch", None),
                "compared_epoch": meta.get("compared_epoch", None)
                   },
            derived_from=meta.get("derived_from", None),
            product_level=meta.get("product_level", None),
        )
        Topo4DExtension.ext(item, add_if_missing=True).productmeta = product_meta
        
        # Use timestamps extension for publication date instead of ProductMeta.lastupdate
        if meta.get('header', {}).get("creation_date"):
            try:
                creation_date = datetime.strptime(meta['header']["creation_date"], "%Y-%m-%d").isoformat() + "Z"
                item.properties["timestamps:published"] = creation_date
            except:
                pass
for idx, meta in enumerate(metadata_list):
    item = item_list[idx]
    if "product_name" in meta:
        product_meta = ProductMeta.create(
            product_name=meta["product_name"],
            param={
                "cyl_radius": meta.get("cyl_radius", None),
                "reference_epoch": meta.get("reference_epoch", None),
                "compared_epoch": meta.get("compared_epoch", None)
                   },
            derived_from=meta.get("derived_from", None),
            product_level=meta.get("product_level", None),
        )
        Topo4DExtension.ext(item, add_if_missing=True).productmeta = product_meta
        
        # Use timestamps extension for publication date instead of ProductMeta.lastupdate
        if meta.get('header', {}).get("creation_date"):
            try:
                creation_date = datetime.strptime(meta['header']["creation_date"], "%Y-%m-%d").isoformat() + "Z"
                item.properties["timestamps:published"] = creation_date
            except:
                pass

In [38]:

Copied!





collection_id = f"Isar_m3c2_collection"
collection_title = "Isar M3C2 Products Collection"
collection_desc = f'''### {collection_title}

A collection of M3C2 products from Isar riverbank, near Wallgau, Germany.
'''
collection_license = "CC-BY-4.0"
collection_providers = [
    pystac.Provider(
        name="TUM Remote Sensing Applications",
        roles=[
            pystac.ProviderRole.PROCESSOR,
            pystac.ProviderRole.PRODUCER,
            pystac.ProviderRole.LICENSOR,
            pystac.ProviderRole.HOST
            ],
        url="https://www.asg.ed.tum.de/rsa/startseite/",
    ),
]

spatial_extent = pystac.SpatialExtent([[-180.0, -90.0, 180.0, 90.0]])
temporal_extent = pystac.TemporalExtent([[datetime(2013, 6, 1), None]])
collection_extent = pystac.Extent(spatial_extent, temporal_extent)

collection_m3c2 = pystac.Collection(
    id=collection_id,
    title=collection_title,
    description=collection_desc,
    extent=collection_extent,
    license=collection_license,
    providers=collection_providers,
)
collection_m3c2.to_dict()
collection_id = f"Isar_m3c2_collection"
collection_title = "Isar M3C2 Products Collection"
collection_desc = f'''### {collection_title}

A collection of M3C2 products from Isar riverbank, near Wallgau, Germany.
'''
collection_license = "CC-BY-4.0"
collection_providers = [
    pystac.Provider(
        name="TUM Remote Sensing Applications",
        roles=[
            pystac.ProviderRole.PROCESSOR,
            pystac.ProviderRole.PRODUCER,
            pystac.ProviderRole.LICENSOR,
            pystac.ProviderRole.HOST
            ],
        url="https://www.asg.ed.tum.de/rsa/startseite/",
    ),
]

spatial_extent = pystac.SpatialExtent([[-180.0, -90.0, 180.0, 90.0]])
temporal_extent = pystac.TemporalExtent([[datetime(2013, 6, 1), None]])
collection_extent = pystac.Extent(spatial_extent, temporal_extent)

collection_m3c2 = pystac.Collection(
    id=collection_id,
    title=collection_title,
    description=collection_desc,
    extent=collection_extent,
    license=collection_license,
    providers=collection_providers,
)
collection_m3c2.to_dict()

Out[38]:

{'type': 'Collection',
 'id': 'Isar_m3c2_collection',
 'stac_version': '1.1.0',
 'description': '### Isar M3C2 Products Collection\n\nA collection of M3C2 products from Isar riverbank, near Wallgau, Germany.\n',
 'links': [],
 'title': 'Isar M3C2 Products Collection',
 'extent': {'spatial': {'bbox': [[-180.0, -90.0, 180.0, 90.0]]},
  'temporal': {'interval': [['2013-06-01T00:00:00Z', None]]}},
 'license': 'CC-BY-4.0',
 'providers': [{'name': 'TUM Remote Sensing Applications',
   'roles': ['processor', 'producer', 'licensor', 'host'],
   'url': 'https://www.asg.ed.tum.de/rsa/startseite/'}]}

In [39]:

Copied!





# add items to the collection
for item in item_list:
    collection_m3c2.add_item(item)

# add collection summaries
collection_m3c2.summaries.add("num_items", {"count": len(item_list)})
collection_m3c2.summaries.add("timestamp_list", {"list": [item.datetime.isoformat() for item in item_list]})
collection_m3c2.summaries.add("temporal_resolution", {"resolution": "half-annual"})

collection_m3c2.update_extent_from_items()
collection_m3c2.extent.to_dict()
collection_m3c2
# add items to the collection
for item in item_list:
    collection_m3c2.add_item(item)

# add collection summaries
collection_m3c2.summaries.add("num_items", {"count": len(item_list)})
collection_m3c2.summaries.add("timestamp_list", {"list": [item.datetime.isoformat() for item in item_list]})
collection_m3c2.summaries.add("temporal_resolution", {"resolution": "half-annual"})

collection_m3c2.update_extent_from_items()
collection_m3c2.extent.to_dict()
collection_m3c2

Out[39]:

type "Collection"
id "Isar_m3c2_collection"
stac_version "1.1.0"
description "### Isar M3C2 Products Collection A collection of M3C2 products from Isar riverbank, near Wallgau, Germany. "
links[] 2 items
- 0
  - rel "item"
  - href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/Isar/data/Isar_m3c2/Isar_m3c2_20240812_20241105_10cm.json"
  - type "application/geo+json"
- 1
  - rel "item"
  - href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/Isar/data/Isar_m3c2/Isar_m3c2_20240812_20250325_10cm.json"
  - type "application/geo+json"
title "Isar M3C2 Products Collection"
extent
- spatial
  - bbox[] 1 items
    
    0[] 4 items
    
    0 11.304866670991222
    
    1 47.52637673872775
    
    2 11.31637632165577
    
    3 47.53158992279433
- temporal
  - interval[] 1 items
    
    0[] 2 items
    
    0 "2024-11-05T00:00:00Z"
    
    1 "2025-03-25T00:00:00Z"
license "CC-BY-4.0"
providers[] 1 items
- 0
  - name "TUM Remote Sensing Applications"
  - roles[] 4 items
    
    0 "processor"
    
    1 "producer"
    
    2 "licensor"
    
    3 "host"
  - url "https://www.asg.ed.tum.de/rsa/startseite/"
summaries
- num_items
  - count 2
- timestamp_list
  - list[] 2 items
    
    0 "2024-11-05T00:00:00"
    
    1 "2025-03-25T00:00:00"
- temporal_resolution
  - resolution "half-annual"

Create the Catalog¶

Finally we wrap the Collection in a top-level Catalog so the structure is portable. Catalogs can contain multiple Collections.

In [40]:

Copied!





catalog_id = "Isar-catalog"
catalog_desc = "This catalog contains multi-temporal point cloud data and M3C2 products from Isar river near Wallgau, Germany."
catalog = pystac.Catalog(
    id=catalog_id,
    description=catalog_desc
)
catalog_id = "Isar-catalog"
catalog_desc = "This catalog contains multi-temporal point cloud data and M3C2 products from Isar river near Wallgau, Germany."
catalog = pystac.Catalog(
    id=catalog_id,
    description=catalog_desc
)

In [41]:

Copied!

catalog.add_child(collection_pc)
catalog.add_child(collection_m3c2)
catalog.describe()
catalog.add_child(collection_pc)
catalog.add_child(collection_m3c2)
catalog.describe()

* <Catalog id=Isar-catalog>
    * <Collection id=Isar_pointclouds_collection>
      * <Item id=Isar_20240812_UPH_10cm>
      * <Item id=Isar_20241105_UPH_10cm>
      * <Item id=Isar_20250325_ULS_10cm>
    * <Collection id=Isar_m3c2_collection>
      * <Item id=Isar_m3c2_20240812_20241105_10cm>
      * <Item id=Isar_m3c2_20240812_20250325_10cm>

Save the STAC¶

Normalising HREFs writes a clean directory layout to disk. Saving as SELF_CONTAINED ensures every JSON document references relatives paths so moving the catalog elsewhere keeps the links intact.

In [42]:

Copied!





from pathlib import Path

root_path = str(Path(f"{data_root}"))
print(f"Root path for catalog: {root_path}")

from pystac.layout import TemplateLayoutStrategy

# Set up a flatten layout strategy
strategy = TemplateLayoutStrategy(
    item_template="${id}.json"
)

catalog.normalize_hrefs(root_path, strategy=strategy)
from pathlib import Path

root_path = str(Path(f"{data_root}"))
print(f"Root path for catalog: {root_path}")

from pystac.layout import TemplateLayoutStrategy

# Set up a flatten layout strategy
strategy = TemplateLayoutStrategy(
    item_template="${id}.json"
)

catalog.normalize_hrefs(root_path, strategy=strategy)

Root path for catalog: ..\..\demo\Isar\data

In [43]:

Copied!

catalog.validate_all() # Uncomment when the Schema url is public
catalog.validate_all() # Uncomment when the Schema url is public

Out[43]:

In [44]:

Copied!

catalog.save(pystac.CatalogType.SELF_CONTAINED)
catalog.save(pystac.CatalogType.SELF_CONTAINED)

In [45]:

Copied!

catalog_load = pystac.Catalog.from_file(f"{root_path}/catalog.json")
catalog_load.describe()
catalog_load = pystac.Catalog.from_file(f"{root_path}/catalog.json")
catalog_load.describe()

* <Catalog id=Isar-catalog>
    * <Collection id=Isar_pointclouds_collection>
      * <Item id=Isar_20240812_UPH_10cm>
      * <Item id=Isar_20241105_UPH_10cm>
      * <Item id=Isar_20250325_ULS_10cm>
    * <Collection id=Isar_m3c2_collection>
      * <Item id=Isar_m3c2_20240812_20241105_10cm>
      * <Item id=Isar_m3c2_20240812_20250325_10cm>