Creating a STAC of Near-Continuous 4D Point Clouds¶
Welcome! This notebook walks through building a STAC catalog for high-frequency terrestrial laser scanning (TLS) data captured at Kijkduin beach. We convert raw LAS/LAZ files into well-structured STAC Items using the topo4d STAC extension together with core STAC features.
You will learn how to:
- inspect the source files and user-supplied metadata
- build individual STAC Items with the topo4d, projection, point cloud, and timestamps extensions
- enrich Items with domain-specific Topo4D metadata such as transformation matrices and product information
- assemble the Items into a Collection and publish a local, self-contained Catalog
Prerequisites
- the demo data under demo/kijkduin is available locally
- pystac and other required libraries are installed in the active environment
- the topo4d helper script is stored locally and accessible
- basic familiarity with Python and STAC terminology
The notebook highlights each step, explains why it is needed, and points to the relevant STAC fields so you can adapt the flow to your own projects.
Reference
- The STAC Specification
- Create a Basic STAC Catalog Using PySTAC
- Topographic 4D Extension Specification
Last update: 19.09.2025
Import libraries¶
We import the required libraries alongside the Topo4D STAC helper classes from topo4d. The sys.path.append call ensures the repository root is available so the local topo4d package can be imported while the notebook runs from docs/notebooks.
import json
from datetime import datetime, timezone, timedelta
from pathlib import Path
import pandas as pd
import pytz
from dateutil import parser
import numpy as np
from tqdm import tqdm
from typing import Dict, Any
import sys
import os
sys.path.append(os.path.abspath('../..'))
import pystac
from topo4d.topo4d_ext import DataType, Topo4DExtension, ProductMeta, TrafoMeta
from topo4d.builder import extract_metadata_from_las, make_item_asset
Define dirs and paths¶
The demo files are stored in demo/kijkduin. Separating data_dir and aux_dir keeps the raw LAS files and the auxiliary information tidy. Adapt these variables to your own dataset before executing the rest of the notebook.
data_dir = "../../demo/kijkduin/data"
aux_dir = "../../demo/kijkduin/auxilary"
Extract metadata from LAS/LAZ files¶
extract_metadata_from_las parses the LAS headers, point counts, and VLR blocks for every LAS/LAZ file we find. We list all candidate files first, then iterate through them so the helper can persist metadata snapshots alongside the raw data. Having structured metadata up front makes the later STAC construction straightforward. Adapt this block to your own dataset.
file_dirs = os.listdir(data_dir)
file_paths = []
for f_dir in file_dirs:
    f_paths = Path(data_dir).glob(f"{f_dir}/*")
    for f_path in f_paths:
        if f_path.suffix.lower() in [".las", ".laz"]:
            file_paths.append(f_path)
for file_path in file_paths:
    print(f"Processing {file_path}")
    meta_las = extract_metadata_from_las(file_path, if_save=True)
Processing ..\..\demo\kijkduin\data\161111\161111_200058.laz Processing ..\..\demo\kijkduin\data\161111\161111_210058.laz Processing ..\..\demo\kijkduin\data\161111\161111_220103.laz Processing ..\..\demo\kijkduin\data\161111\161111_230104.laz Processing ..\..\demo\kijkduin\data\161112\161112_000104.laz Processing ..\..\demo\kijkduin\data\161112\161112_010104.laz Processing ..\..\demo\kijkduin\data\161112\161112_020106.laz Processing ..\..\demo\kijkduin\data\161112\161112_030107.laz Processing ..\..\demo\kijkduin\data\161112\161112_040108.laz Processing ..\..\demo\kijkduin\data\161112\161112_050109.laz Processing ..\..\demo\kijkduin\data\161112\161112_060112.laz Processing ..\..\demo\kijkduin\data\161112\161112_070110.laz Processing ..\..\demo\kijkduin\data\161112\161112_080111.laz Processing ..\..\demo\kijkduin\data\161112\161112_090111.laz Processing ..\..\demo\kijkduin\data\161112\161112_100112.laz Processing ..\..\demo\kijkduin\data\161112\161112_110113.laz Processing ..\..\demo\kijkduin\data\161112\161112_120115.laz Processing ..\..\demo\kijkduin\data\161112\161112_130115.laz Processing ..\..\demo\kijkduin\data\161112\161112_140116.laz Processing ..\..\demo\kijkduin\data\161112\161112_150118.laz Processing ..\..\demo\kijkduin\data\161112\161112_160118.laz Processing ..\..\demo\kijkduin\data\161112\161112_170118.laz Processing ..\..\demo\kijkduin\data\161112\161112_180119.laz Processing ..\..\demo\kijkduin\data\161112\161112_190130.laz Processing ..\..\demo\kijkduin\data\161112\161112_200120.laz Processing ..\..\demo\kijkduin\data\161112\161112_210121.laz Processing ..\..\demo\kijkduin\data\161112\161112_220123.laz Processing ..\..\demo\kijkduin\data\161112\161112_230123.laz
# Use the first epoch file as an example
meta_las = extract_metadata_from_las(file_paths[0], if_save=True)
file_id = Path(file_paths[0]).stem.split(".")[0]
Ingest metadata¶
Besides the automatically extracted values we add domain knowledge that cannot be inferred from the file alone: sensor model, acquisition mode, timezone, transformation matrix, and so on. The resulting dictionary merges machine-derived and user-supplied fields and becomes the single source of truth when we instantiate the STAC Item.
global_trafo_path = f"{aux_dir}/_Global transformation matrix/Kijkduin_Global_Transformation.txt"
with open(global_trafo_path, "r") as f:
    # read trafo meta from txt file as array
    global_trafo_mat = [np.array(line.strip().split(), dtype=float).tolist() for line in f if line.strip()]
meta_userinput = {
    "id": file_id,
    "data_type": DataType.POINTCLOUD,
    "native_crs": "EPSG:7415",
    "sensor": "RIEGL VZ2000",
    "acquisition_mode": "TLS",
    "datetime": datetime.strptime(file_id, "%y%m%d_%H%M%S").isoformat(),
    "timezone": "Europe/Amsterdam",
    "asset_url": file_paths[0].as_posix(),
    "global_trafo": global_trafo_mat
}
# merge meta_las and meta_userinput
metadata = meta_las
metadata.update(meta_userinput)
print(json.dumps(metadata, indent=2))
{ "id": "161111_200058", "header": { "DEFAULT_POINT_FORMAT": "<PointFormat(3, 0 bytes of extra dims)>", "DEFAULT_VERSION": [ 1, 2 ], "are_points_compressed": true, "creation_date": "2021-04-01", "evlrs": [], "extra_header_bytes": null, "extra_vlr_bytes": null, "file_source_id": 0, "generating_software": "txt2las (version 171231)", "global_encoding": "<laspy.header.GlobalEncoding object at 0x00000271AAF1E930>", "major_version": 1, "maxs": [ 162.585, 1304.714, 95.185 ], "minor_version": 4, "mins": [ -627.258, -1236.375, -42.409 ], "number_of_evlrs": 0, "number_of_points_by_return": [ 1123106, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ], "offset_to_point_data": 2065, "offsets": [ 0.0, 0.0, 0.0 ], "point_count": 1123106, "point_format": "<PointFormat(0, 27 bytes of extra dims)>", "scales": [ 0.001, 0.001, 0.001 ], "start_of_first_evlr": 0, "start_of_waveform_data_packet_record": 0, "system_identifier": "LAStools (c) by rapidlasso GmbH", "uuid": "00000000-0000-0000-0000-000000000000", "version": [ 1, 4 ], "vlrs": "[<ExtraBytesVlr(extra bytes structs: 8)>, <laspy.vlrs.known.LasZipVlr object at 0x00000271AAF75BE0>]", "x_max": 162.585, "x_min": -627.258, "x_offset": 0.0, "x_scale": 0.001, "y_max": 1304.714, "y_min": -1236.375, "y_offset": 0.0, "y_scale": 0.001, "z_max": 95.185, "z_min": -42.409, "z_offset": 0.0, "z_scale": 0.001 }, "vlrs": [ { "user_id": "LASF_Spec", "record_id": 4, "description": "by LAStools of rapidlasso GmbH", "extra_bytes_structs": "[<ExtraBytesStruct(range, 6, b'')>, <ExtraBytesStruct(theta, 6, b'')>, <ExtraBytesStruct(phi, 6, b'')>, <ExtraBytesStruct(reflectance, 6, b'')>, <ExtraBytesStruct(amplitude, 6, b'')>, <ExtraBytesStruct(deviation, 4, b'')>, <ExtraBytesStruct(echo, 1, b'')>, <ExtraBytesStruct(time, 6, b'')>]" }, { "user_id": "laszip encoded", "record_id": 22204, "description": "http://laszip.org" } ], "evlrs": [], "data_type": "pointcloud", "native_crs": "EPSG:7415", "sensor": "RIEGL VZ2000", "acquisition_mode": "TLS", "datetime": "2016-11-11T20:00:58", "timezone": "Europe/Amsterdam", "asset_url": "../../demo/kijkduin/data/161111/161111_200058.laz", "global_trafo": [ [ 0.708647477, 0.705554059, 0.003496133, 75124.664999332 ], [ -0.705561978, 0.708645222, 0.002060151, 454173.49757747503 ], [ -0.00102397, -0.003926659, 0.999991766, 37.853088875 ], [ 0.0, 0.0, 0.0, 1.0 ] ] }
Create a STAC Item from metadata¶
To keep the workflow readable and testable we split the Item construction into small helper functions. Each of the next subsections covers one aspect of STAC metadata so you can reuse the logic in scripts or pipelines.
- Item id
Every STAC Item needs a stable identifier. Here we derive it from the filename: each LAS file is named after the acquisition timestamp, so the ID doubles as a human-readable capture time and remains unique across the dataset.
def get_id(metadata: Dict[str, Any]) -> str:
    return metadata.get("id")
item_id = get_id(metadata)
item_id
'161111_200058'
- Item datetime
This helper normalises the acquisition timestamp. It parses the string, attaches the correct timezone (either a named timezone or an explicit UTC offset), and returns a timezone-aware datetime. The tzinfo is stripped again when the Item is instantiated later, and the original zone is recorded separately in the topo4d:timezone property.
def get_datetime(metadata: Dict) -> datetime:
    dt_str = metadata.get("datetime")
    if not dt_str:
        raise ValueError("Missing 'datetime' in metadata")
    dt = parser.parse(dt_str)
    tz_str = metadata.get("timezone")
    if tz_str and tz_str.startswith("UTC"):
        sign = 1 if "+" in tz_str else -1
        offset_str = tz_str[3:].replace("+", "").replace("-", "")
        parts = offset_str.split(":")
        hours = int(parts[0]) if parts[0] else 0
        minutes = int(parts[1]) if len(parts) > 1 else 0
        offset = timedelta(hours=sign * hours, minutes=sign * minutes)
        tz = timezone(offset)
        dt = dt.replace(tzinfo=tz)
    elif tz_str and "/" in tz_str:
        tz = pytz.timezone(tz_str)
        dt = tz.localize(dt)
    elif dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt
item_datetime = get_datetime(metadata)
item_datetime
datetime.datetime(2016, 11, 11, 20, 0, 58, tzinfo=<DstTzInfo 'Europe/Amsterdam' CET+1:00:00 STD>)
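The function also understands explicit UTC offsets. A minimal sketch with hypothetical metadata (all demo files use the named Europe/Amsterdam zone, so this branch is not exercised above):
# Hypothetical input to exercise the explicit-offset branch
example_meta = {"datetime": "2016-11-11T20:00:58", "timezone": "UTC+01:00"}
get_datetime(example_meta)
# -> datetime.datetime(2016, 11, 11, 20, 0, 58, tzinfo=datetime.timezone(datetime.timedelta(seconds=3600)))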
- Item bbox, geometry and CRS information
We compute the spatial footprint from the LAS header min and max coordinates. When a global transformation matrix is available it is applied before determining the bounding box. We also scan the LAS VLRs to recover the native CRS so we can report the footprint both in the acquisition CRS and in WGS84 for the Projection extension.
from shapely.geometry import box, mapping
from pyproj import CRS, Transformer
def get_geo(metadata: Dict[str, Any]) -> Dict[str, Any]:
    header = metadata.get("header")
    vlrs = metadata.get("vlrs", [])
    native_crs = metadata.get("native_crs", None)
    global_trafo = metadata.get("global_trafo", None)
    if not header or "mins" not in header or "maxs" not in header:
        raise ValueError("Invalid header format. 'mins' and 'maxs' are required.")
    xmin, ymin, zmin = header["mins"]
    xmax, ymax, zmax = header["maxs"]
    if global_trafo:
        # Apply the global transformation matrix to the bounding box corners
        trafo_matrix = np.array(global_trafo)
        bbox_corners = np.array([[xmin, ymin, zmin, 1],
                                 [xmax, ymin, zmin, 1],
                                 [xmin, ymax, zmin, 1],
                                 [xmax, ymax, zmin, 1]])
        transformed_corners = bbox_corners @ trafo_matrix.T
        xmin, ymin, zmin = transformed_corners.min(axis=0)[:3]
        xmax, ymax, zmax = transformed_corners.max(axis=0)[:3]
    # Recover the native CRS from the VLRs, falling back to the user-supplied value
    crs = None
    for vlr in vlrs:
        if vlr.get("user_id", "").lower() == "lasf_projection":
            if "wkt" in vlr and vlr["wkt"]:
                crs = CRS.from_wkt(vlr["wkt"])
                break
            elif "epsg_code" in vlr and vlr["epsg_code"]:
                crs = CRS.from_epsg(vlr["epsg_code"])
                break
    if not crs and native_crs:
        try:
            crs = CRS.from_user_input(native_crs)
        except Exception:
            crs = None
    # Default to the native-CRS footprint, then reproject to WGS84 when possible
    bbox = [xmin, ymin, xmax, ymax]
    geom = mapping(box(xmin, ymin, xmax, ymax))
    if crs and crs.to_epsg() != 4326:
        try:
            transformer = Transformer.from_crs(crs, CRS.from_epsg(4326), always_xy=True)
            min_lon, min_lat = transformer.transform(xmin, ymin)
            max_lon, max_lat = transformer.transform(xmax, ymax)
            # bbox and geom in WGS84
            bbox = [min_lon, min_lat, max_lon, max_lat]
            geom = mapping(box(min_lon, min_lat, max_lon, max_lat))
        except Exception:
            pass
    return {
        "bbox": bbox,
        "geometry": geom,
        "native_crs": crs.to_string() if crs else None
    }
item_geo = get_geo(metadata)
print(json.dumps(item_geo, indent=2))
{ "bbox": [ 4.2032353058274445, 52.060984149209446, 4.236990109111119, 52.082514395889724 ], "geometry": { "type": "Polygon", "coordinates": [ [ [ 4.236990109111119, 52.060984149209446 ], [ 4.236990109111119, 52.082514395889724 ], [ 4.2032353058274445, 52.082514395889724 ], [ 4.2032353058274445, 52.060984149209446 ], [ 4.236990109111119, 52.060984149209446 ] ] ] }, "native_crs": "EPSG:7415" }
- Create the item
With the temporal and spatial metadata in hand we instantiate pystac.Item. At this point the Item only contains the required geometry, bounding box, and datetime; later cells enrich it with extensions, properties, and assets.
item = pystac.Item(
    id=item_id,
    geometry=item_geo["geometry"],
    bbox=item_geo["bbox"],
    datetime=item_datetime.replace(tzinfo=None),
    properties={}
)
- Adding STAC Extensions
We register the projection, point cloud, and timestamps extensions and then use the Topo4D extension to store TLS specific metadata such as timezone, acquisition mode, and transformation matrices. Mapping custom fields onto established STAC extensions keeps the Item interoperable with other tooling.
topo4d_ext = Topo4DExtension.ext(item, add_if_missing=True)
print(f"Topo4D Extension added: {Topo4DExtension.has_extension(item)}")
Topo4D Extension added: True
# Add standard STAC extensions
item.stac_extensions.extend([
    "https://stac-extensions.github.io/projection/v2.0.0/schema.json",
    "https://stac-extensions.github.io/pointcloud/v2.0.0/schema.json",
    "https://stac-extensions.github.io/timestamps/v1.1.0/schema.json"
])
# Apply topo4d extension
topo4d_ext = Topo4DExtension.ext(item, add_if_missing=True)
topo4d_ext.timezone = metadata.get("timezone", "UTC")
topo4d_ext.data_type = DataType(metadata.get("data_type", "pointcloud"))
topo4d_ext.acquisition_mode = metadata.get("acquisition_mode", None)
topo4d_ext.global_trafo = metadata.get("global_trafo", None)
# Use standard STAC extensions for formerly topo4d properties
item.properties["proj:code"] = item_geo.get("native_crs", None)
item.properties["instruments"] = [metadata.get("sensor")] if metadata.get("sensor") else []
item.properties["pc:count"] = metadata.get("header", {}).get("point_count", None)
item.properties["pc:type"] = 'lidar' if metadata.get('acquisition_mode') == 'ULS' or 'TLS' else 'other'
item.properties
{'topo4d:timezone': 'Europe/Amsterdam', 'topo4d:data_type': 'pointcloud', 'topo4d:acquisition_mode': 'TLS', 'topo4d:global_trafo': [[0.708647477, 0.705554059, 0.003496133, 75124.664999332], [-0.705561978, 0.708645222, 0.002060151, 454173.49757747503], [-0.00102397, -0.003926659, 0.999991766, 37.853088875], [0.0, 0.0, 0.0, 1.0]], 'proj:code': 'EPSG:7415', 'instruments': ['RIEGL VZ2000'], 'pc:count': 1123106, 'pc:type': 'lidar'}
Adding assets¶
The make_item_asset helper wraps the LAS file path with the appropriate media type and asset metadata. Adding the asset under a descriptive key lets downstream clients discover and download the raw point cloud directly from the Item.
asset_name, item_asset = make_item_asset(asset_url=metadata.get("asset_url"), user_input=metadata)
item.add_asset(
    key=asset_name,
    asset=item_asset
)
item
- type "Feature"
- stac_version "1.1.0"
stac_extensions[] 4 items
- 0 "https://tum-rsa.github.io/topo4d/v0.1.0/schema.json"
- 1 "https://stac-extensions.github.io/projection/v2.0.0/schema.json"
- 2 "https://stac-extensions.github.io/pointcloud/v2.0.0/schema.json"
- 3 "https://stac-extensions.github.io/timestamps/v1.1.0/schema.json"
- id "161111_200058"
geometry
- type "Polygon"
coordinates[] 1 items
0[] 5 items
0[] 2 items
- 0 4.236990109111119
- 1 52.060984149209446
1[] 2 items
- 0 4.236990109111119
- 1 52.082514395889724
2[] 2 items
- 0 4.2032353058274445
- 1 52.082514395889724
3[] 2 items
- 0 4.2032353058274445
- 1 52.060984149209446
4[] 2 items
- 0 4.236990109111119
- 1 52.060984149209446
bbox[] 4 items
- 0 4.2032353058274445
- 1 52.060984149209446
- 2 4.236990109111119
- 3 52.082514395889724
properties
- topo4d:timezone "Europe/Amsterdam"
- topo4d:data_type "pointcloud"
- topo4d:acquisition_mode "TLS"
topo4d:global_trafo[] 4 items
0[] 4 items
- 0 0.708647477
- 1 0.705554059
- 2 0.003496133
- 3 75124.664999332
1[] 4 items
- 0 -0.705561978
- 1 0.708645222
- 2 0.002060151
- 3 454173.49757747503
2[] 4 items
- 0 -0.00102397
- 1 -0.003926659
- 2 0.999991766
- 3 37.853088875
3[] 4 items
- 0 0.0
- 1 0.0
- 2 0.0
- 3 1.0
- proj:code "EPSG:7415"
instruments[] 1 items
- 0 "RIEGL VZ2000"
- pc:count 1123106
- pc:type "lidar"
- datetime "2016-11-11T20:00:58Z"
links[] 0 items
assets
pointcloud
- href "../../demo/kijkduin/data/161111/161111_200058.laz"
- type "application/vnd.laszip+copc"
roles[] 1 items
- 0 "data"
Validating the item¶
item.validate() performs schema checks across the Item and its extensions.
item.validate()  # validate the Item against the core and extension schemas
['https://schemas.stacspec.org/v1.1.0/item-spec/json-schema/item.json', 'https://tum-rsa.github.io/topo4d/v0.1.0/schema.json', 'https://stac-extensions.github.io/projection/v2.0.0/schema.json', 'https://stac-extensions.github.io/pointcloud/v2.0.0/schema.json', 'https://stac-extensions.github.io/timestamps/v1.1.0/schema.json']
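When a check fails, pystac raises a STACValidationError rather than returning a report, so in batch pipelines it is worth wrapping the call. A small sketch:
from pystac.errors import STACValidationError

try:
    item.validate()
except STACValidationError as err:
    print(f"Item {item.id} failed validation: {err}")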
Create more items¶
After confirming the helpers work for a single file we scale up to the full dataset. The loop below merges in the shared user metadata for each LAS file, builds a list of Items, and assigns self HREFs so they can be written to disk later on.
def item_from_metadata(metadata: Dict[str, Any]) -> pystac.Item:
    item_id = get_id(metadata)
    item_datetime = get_datetime(metadata)
    item_geo = get_geo(metadata)
    item = pystac.Item(
        id=item_id,
        geometry=item_geo["geometry"],
        bbox=item_geo["bbox"],
        datetime=item_datetime.replace(tzinfo=None),
        properties={}
    )
    # Add standard STAC extensions
    item.stac_extensions.extend([
        "https://stac-extensions.github.io/projection/v2.0.0/schema.json",
        "https://stac-extensions.github.io/pointcloud/v2.0.0/schema.json",
        "https://stac-extensions.github.io/timestamps/v1.1.0/schema.json"
    ])
    # Apply topo4d extension
    topo4d_ext = Topo4DExtension.ext(item, add_if_missing=True)
    topo4d_ext.timezone = metadata.get("timezone", "UTC")
    topo4d_ext.data_type = DataType(metadata.get("data_type", "pointcloud"))
    topo4d_ext.acquisition_mode = metadata.get("acquisition_mode", None)
    topo4d_ext.global_trafo = metadata.get("global_trafo", None)
    # Use standard STAC extensions for formerly topo4d properties
    item.properties["proj:code"] = item_geo.get("native_crs", None)
    item.properties["instruments"] = [metadata.get("sensor")] if metadata.get("sensor") else []
    item.properties["pc:count"] = metadata.get("header", {}).get("point_count", None)
    item.properties["pc:type"] = 'lidar' if metadata.get('acquisition_mode') in ['ULS', 'TLS'] else 'other'
    # Add timestamps if available
    if metadata.get("header", {}).get("creation_date"):
        try:
            creation_date = datetime.strptime(metadata["header"]["creation_date"], "%Y-%m-%d").isoformat() + "Z"
            item.properties["timestamps:published"] = creation_date
        except:
            pass
    asset_name, item_asset = make_item_asset(asset_url=metadata.get("asset_url"), user_input=metadata)
    item.add_asset(
        key=asset_name,
        asset=item_asset
    )
    return item
global_trafo_path = f"{aux_dir}/_Global transformation matrix/Kijkduin_Global_Transformation.txt"
with open(global_trafo_path, "r") as f:
    # read trafo meta from txt file as array
    global_trafo_mat = [np.array(line.strip().split(), dtype=float).tolist() for line in f if line.strip()]
item_list = []
metadata_list = []
for file_path in tqdm(file_paths, desc="Processing files"):
    meta_las = extract_metadata_from_las(file_path, if_save=True)
    file_id = Path(file_path).stem.split(".")[0]
    meta_userinput = {
        "id": file_id,
        "data_type": DataType.POINTCLOUD,
        "native_crs": "EPSG:7415",
        "sensor": "RIEGL VZ2000",
        "acquisition_mode": "TLS",
        "datetime": datetime.strptime(file_id, "%y%m%d_%H%M%S").isoformat(),
        "timezone": "Europe/Amsterdam",
        "asset_url": file_path.as_posix(),
        "global_trafo": global_trafo_mat
    }
    metadata = meta_las
    metadata.update(meta_userinput)
    metadata_list.append(metadata)
    item = item_from_metadata(metadata)
    # item.validate() # Uncomment when the Schema url is public
    item.set_self_href(f"{data_dir}/{item.id}.json")
    item_list.append(item)
Processing files: 100%|██████████| 28/28 [00:00<00:00, 79.60it/s]
item_list[0]
- type "Feature"
- stac_version "1.1.0"
stac_extensions[] 4 items
- 0 "https://stac-extensions.github.io/projection/v2.0.0/schema.json"
- 1 "https://stac-extensions.github.io/pointcloud/v2.0.0/schema.json"
- 2 "https://stac-extensions.github.io/timestamps/v1.1.0/schema.json"
- 3 "https://tum-rsa.github.io/topo4d/v0.1.0/schema.json"
- id "161111_200058"
geometry
- type "Polygon"
coordinates[] 1 items
0[] 5 items
0[] 2 items
- 0 4.236990109111119
- 1 52.060984149209446
1[] 2 items
- 0 4.236990109111119
- 1 52.082514395889724
2[] 2 items
- 0 4.2032353058274445
- 1 52.082514395889724
3[] 2 items
- 0 4.2032353058274445
- 1 52.060984149209446
4[] 2 items
- 0 4.236990109111119
- 1 52.060984149209446
bbox[] 4 items
- 0 4.2032353058274445
- 1 52.060984149209446
- 2 4.236990109111119
- 3 52.082514395889724
properties
- topo4d:timezone "Europe/Amsterdam"
- topo4d:data_type "pointcloud"
- topo4d:acquisition_mode "TLS"
topo4d:global_trafo[] 4 items
0[] 4 items
- 0 0.708647477
- 1 0.705554059
- 2 0.003496133
- 3 75124.664999332
1[] 4 items
- 0 -0.705561978
- 1 0.708645222
- 2 0.002060151
- 3 454173.49757747503
2[] 4 items
- 0 -0.00102397
- 1 -0.003926659
- 2 0.999991766
- 3 37.853088875
3[] 4 items
- 0 0.0
- 1 0.0
- 2 0.0
- 3 1.0
- proj:code "EPSG:7415"
instruments[] 1 items
- 0 "RIEGL VZ2000"
- pc:count 1123106
- pc:type "lidar"
- timestamps:published "2021-04-01T00:00:00Z"
- datetime "2016-11-11T20:00:58Z"
links[] 1 items
0
- rel "self"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161111_200058.json"
- type "application/json"
assets
pointcloud
- href "../../demo/kijkduin/data/161111/161111_200058.laz"
- type "application/vnd.laszip+copc"
roles[] 1 items
- 0 "data"
Adding TrafoMeta¶
Multi-temporal TLS acquisitions often need to be aligned with per-epoch transformation matrices. We load the matrices saved next to each LAS file, link them to a reference epoch Item, and attach them with the Topo4D TrafoMeta helper.
file_paths[0].stem
'161111_200058'
trafo_paths = [f"{f.parent}/{f.stem}_trafomat.txt" for f in file_paths]
ref_id = "161111_200058"
# find the match id in item_list
if ref_id in [item.id for item in item_list]:
    reference_epoch = [item for item in item_list if item.id == ref_id][0]
    reference_epoch_link = pystac.Link(
        rel=reference_epoch.id,
        target=reference_epoch.get_self_href(),
        title="Reference Epoch",
        media_type=pystac.MediaType.JSON)
for idx, trafo_path in enumerate(trafo_paths):
    with open(trafo_path, "r") as f:
        # read trafo meta from txt file as array
        trafo_mat = [np.array(line.strip().split(), dtype=float).tolist() for line in f if line.strip()]
    trafo_meta = TrafoMeta.create(
        reference_epoch=reference_epoch_link.to_dict(),
        transformation=trafo_mat,
    )
    item = item_list[idx]
    Topo4DExtension.ext(item, add_if_missing=True).trafometa = trafo_meta
Adding ProductMeta¶
If the extracted metadata includes product-level information we convert it into the Topo4D ProductMeta structure. Publication or creation dates are recorded via the timestamps extension so other STAC clients can understand when derivatives were produced.
for idx, meta in enumerate(metadata_list):
    item = item_list[idx]
    if "product_name" in meta:
        product_meta = ProductMeta.create(
            product_name=meta["product_name"],
            param=meta.get("param", None),
            derived_from=meta.get("derived_from", None),
            product_level=meta.get("product_level", None),
        )
        Topo4DExtension.ext(item, add_if_missing=True).productmeta = product_meta
    # Use timestamps extension for publication date instead of ProductMeta.lastupdate
    if meta.get('header', {}).get("creation_date"):
        try:
            creation_date = datetime.strptime(meta['header']["creation_date"], "%Y-%m-%d").isoformat() + "Z"
            item.properties["timestamps:published"] = creation_date
        except:
            pass
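The demo LAS files carry no product-level fields, so the loop above leaves ProductMeta untouched. For a derived product, say a DEM rasterised from one epoch, the user-supplied metadata could look like the sketch below; the field names follow the loop above, but every value is illustrative only:
# Hypothetical product-level metadata for a derived product (illustrative values)
meta_product_example = {
    "product_name": "DEM_1m",
    "param": {"cell_size": 1.0, "aggregation": "mean"},
    "derived_from": "161111_200058",
    "product_level": "L2",
}
product_meta_example = ProductMeta.create(
    product_name=meta_product_example["product_name"],
    param=meta_product_example.get("param"),
    derived_from=meta_product_example.get("derived_from"),
    product_level=meta_product_example.get("product_level"),
)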
Create the Collection¶
With all Items prepared we describe the dataset as a whole. The next cells populate the required Collection fields step by step so you can see how each attribute maps to the STAC Collection and adapt it to your own projects.
- Collection id
Choose a concise, stable identifier for the Collection. We use Kijkduin_TLS here to capture both the location and the acquisition technique.
collection_id = "Kijkduin_TLS"
collection_id
'Kijkduin_TLS'
- Collection title
Provide a human-friendly title that will appear in catalog browsers and search results.
collection_title = "Kijkduin TLS Point Clouds Collection"
collection_title
'Kijkduin TLS Point Clouds Collection'
- Collection description
Describe what the dataset contains, the sensors involved, and any processing highlights. Markdown formatting is supported.
collection_desc = f'''### {collection_title}

A collection of point clouds from Kijkduin Sandy Beach, The Netherlands.
'''
print(collection_desc)
### Kijkduin TLS Point Clouds Collection A collection of point clouds from Kijkduin Sandy Beach, The Netherlands.
- Collection license
Declare the usage rights for the data. You can reference a standard SPDX identifier or supply a custom license string.
collection_license = "CC-BY-4.0"
collection_license
'CC-BY-4.0'
- Collection providers
List the organisations responsible for producing, hosting, or licensing the dataset. Add as many providers as needed and assign the appropriate roles.
collection_providers = [
    pystac.Provider(
        name="TU Delft",
        roles=[
            pystac.ProviderRole.PROCESSOR,
            pystac.ProviderRole.PRODUCER,
            pystac.ProviderRole.LICENSOR,
            pystac.ProviderRole.HOST
        ],
        url="https://coastscan.citg.tudelft.nl/",
    ),
]
- Collection extent
The extent captures both spatial and temporal coverage. We seed it with broad bounds here and later tighten it using the Items so search clients receive accurate metadata.
spatial_extent = pystac.SpatialExtent([[-180.0, -90.0, 180.0, 90.0]])
temporal_extent = pystac.TemporalExtent([[datetime(2013, 6, 1), None]])
collection_extent = pystac.Extent(spatial_extent, temporal_extent)
collection = pystac.Collection(
    id=collection_id,
    title=collection_title,
    description=collection_desc,
    extent=collection_extent,
    license=collection_license,
    providers=collection_providers,
)
collection.to_dict()
{'type': 'Collection', 'id': 'Kijkduin_TLS', 'stac_version': '1.1.0', 'description': '### Kijkduin TLS Point Clouds Collection\n\nA collection of point clouds from Kijkduin Sandy Beach, The Netherlands.\n', 'links': [], 'title': 'Kijkduin TLS Point Clouds Collection', 'extent': {'spatial': {'bbox': [[-180.0, -90.0, 180.0, 90.0]]}, 'temporal': {'interval': [['2013-06-01T00:00:00Z', None]]}}, 'license': 'CC-BY-4.0', 'providers': [{'name': 'TU Delft', 'roles': ['processor', 'producer', 'licensor', 'host'], 'url': 'https://coastscan.citg.tudelft.nl/'}]}
# add items to the collection
for item in item_list:
    collection.add_item(item)
- Collection summaries
Summaries expose quick statistics about the Collection (counts, ranges, enumerations) and help STAC clients reason about the dataset without loading every Item.
# add collection summaries
collection.summaries.add("num_items", {"count": len(item_list)})
collection.summaries.add("timestamp_list", {"list": [item.datetime.isoformat() for item in item_list]})
collection.summaries.add("temporal_resolution", {"resolution": "1 hour"})
collection.update_extent_from_items()
collection.extent.to_dict()
collection
- type "Collection"
- id "Kijkduin_TLS"
- stac_version "1.1.0"
- description "### Kijkduin TLS Point Clouds Collection A collection of point clouds from Kijkduin Sandy Beach, The Netherlands. "
links[] 28 items
0
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161111_200058.json"
- type "application/geo+json"
1
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161111_210058.json"
- type "application/geo+json"
2
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161111_220103.json"
- type "application/geo+json"
3
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161111_230104.json"
- type "application/geo+json"
4
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_000104.json"
- type "application/geo+json"
5
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_010104.json"
- type "application/geo+json"
6
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_020106.json"
- type "application/geo+json"
7
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_030107.json"
- type "application/geo+json"
8
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_040108.json"
- type "application/geo+json"
9
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_050109.json"
- type "application/geo+json"
10
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_060112.json"
- type "application/geo+json"
11
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_070110.json"
- type "application/geo+json"
12
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_080111.json"
- type "application/geo+json"
13
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_090111.json"
- type "application/geo+json"
14
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_100112.json"
- type "application/geo+json"
15
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_110113.json"
- type "application/geo+json"
16
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_120115.json"
- type "application/geo+json"
17
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_130115.json"
- type "application/geo+json"
18
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_140116.json"
- type "application/geo+json"
19
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_150118.json"
- type "application/geo+json"
20
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_160118.json"
- type "application/geo+json"
21
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_170118.json"
- type "application/geo+json"
22
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_180119.json"
- type "application/geo+json"
23
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_190130.json"
- type "application/geo+json"
24
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_200120.json"
- type "application/geo+json"
25
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_210121.json"
- type "application/geo+json"
26
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_220123.json"
- type "application/geo+json"
27
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_230123.json"
- type "application/geo+json"
- title "Kijkduin TLS Point Clouds Collection"
extent
spatial
bbox[] 1 items
0[] 4 items
- 0 4.197016592792564
- 1 52.059182481715176
- 2 4.238527655962627
- 3 52.08538506475349
temporal
interval[] 1 items
0[] 2 items
- 0 "2016-11-11T20:00:58Z"
- 1 "2016-11-12T23:01:23Z"
- license "CC-BY-4.0"
providers[] 1 items
0
- name "TU Delft"
roles[] 4 items
- 0 "processor"
- 1 "producer"
- 2 "licensor"
- 3 "host"
- url "https://coastscan.citg.tudelft.nl/"
summaries
num_items
- count 28
timestamp_list
list[] 28 items
- 0 "2016-11-11T20:00:58"
- 1 "2016-11-11T21:00:58"
- 2 "2016-11-11T22:01:03"
- 3 "2016-11-11T23:01:04"
- 4 "2016-11-12T00:01:04"
- 5 "2016-11-12T01:01:04"
- 6 "2016-11-12T02:01:06"
- 7 "2016-11-12T03:01:07"
- 8 "2016-11-12T04:01:08"
- 9 "2016-11-12T05:01:09"
- 10 "2016-11-12T06:01:12"
- 11 "2016-11-12T07:01:10"
- 12 "2016-11-12T08:01:11"
- 13 "2016-11-12T09:01:11"
- 14 "2016-11-12T10:01:12"
- 15 "2016-11-12T11:01:13"
- 16 "2016-11-12T12:01:15"
- 17 "2016-11-12T13:01:15"
- 18 "2016-11-12T14:01:16"
- 19 "2016-11-12T15:01:18"
- 20 "2016-11-12T16:01:18"
- 21 "2016-11-12T17:01:18"
- 22 "2016-11-12T18:01:19"
- 23 "2016-11-12T19:01:30"
- 24 "2016-11-12T20:01:20"
- 25 "2016-11-12T21:01:21"
- 26 "2016-11-12T22:01:23"
- 27 "2016-11-12T23:01:23"
temporal_resolution
- resolution "1 hour"
Create the Catalog¶
Finally we wrap the Collection in a top-level Catalog so the structure is portable. Catalogs can contain multiple Collections; in this example we focus on the Kijkduin TLS dataset.
catalog_id = "kijkduin-catalog"
catalog_desc = "This catalog contains TLS point cloud data from Kijkduin Sandy Beach, The Netherlands."
catalog = pystac.Catalog(
    id=catalog_id,
    description=catalog_desc
)
catalog.add_child(collection)
catalog.describe()
* <Catalog id=kijkduin-catalog> * <Collection id=Kijkduin_TLS> * <Item id=161111_200058> * <Item id=161111_210058> * <Item id=161111_220103> * <Item id=161111_230104> * <Item id=161112_000104> * <Item id=161112_010104> * <Item id=161112_020106> * <Item id=161112_030107> * <Item id=161112_040108> * <Item id=161112_050109> * <Item id=161112_060112> * <Item id=161112_070110> * <Item id=161112_080111> * <Item id=161112_090111> * <Item id=161112_100112> * <Item id=161112_110113> * <Item id=161112_120115> * <Item id=161112_130115> * <Item id=161112_140116> * <Item id=161112_150118> * <Item id=161112_160118> * <Item id=161112_170118> * <Item id=161112_180119> * <Item id=161112_190130> * <Item id=161112_200120> * <Item id=161112_210121> * <Item id=161112_220123> * <Item id=161112_230123>
Save the STAC¶
Normalising HREFs writes a clean directory layout to disk. Saving as SELF_CONTAINED ensures every JSON document uses relative paths, so moving the catalog elsewhere keeps the links intact.
from pathlib import Path
root_path = str(Path(f"{data_dir}"))
print(f"Root path for catalog: {root_path}")
from pystac.layout import TemplateLayoutStrategy
# Set up a flat layout strategy: write every Item JSON directly under the catalog root
strategy = TemplateLayoutStrategy(
    item_template="${id}.json"
)
catalog.normalize_hrefs(root_path, strategy=strategy)
Root path for catalog: ..\..\demo\kijkduin\data
catalog.validate_all()  # validate the Collection and every Item in the catalog
28
catalog.save(pystac.CatalogType.SELF_CONTAINED)
catalog_load = pystac.Catalog.from_file(f"{root_path}/catalog.json")
catalog_load.describe()
* <Catalog id=kijkduin-catalog> * <Collection id=Kijkduin_TLS> * <Item id=161111_200058> * <Item id=161111_210058> * <Item id=161111_220103> * <Item id=161111_230104> * <Item id=161112_000104> * <Item id=161112_010104> * <Item id=161112_020106> * <Item id=161112_030107> * <Item id=161112_040108> * <Item id=161112_050109> * <Item id=161112_060112> * <Item id=161112_070110> * <Item id=161112_080111> * <Item id=161112_090111> * <Item id=161112_100112> * <Item id=161112_110113> * <Item id=161112_120115> * <Item id=161112_130115> * <Item id=161112_140116> * <Item id=161112_150118> * <Item id=161112_160118> * <Item id=161112_170118> * <Item id=161112_180119> * <Item id=161112_190130> * <Item id=161112_200120> * <Item id=161112_210121> * <Item id=161112_220123> * <Item id=161112_230123>
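As a final sanity check you can walk the reloaded catalog and read the Items back, for example printing each acquisition time and the Topo4D acquisition mode. A short sketch (get_items(recursive=True) requires pystac ≥ 1.8):
for item in catalog_load.get_items(recursive=True):
    print(item.id, item.datetime, item.properties.get("topo4d:acquisition_mode"))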