Creating a STAC of Near-Continuous 4D Point Clouds¶
Welcome! This notebook walks through building a STAC catalog for high-frequency terrestrial laser scanning (TLS) data captured at Kijkduin beach. We convert raw LAS/LAZ files into well-structured STAC Items using the topo4d STAC extension together with core STAC features.
You will learn how to:
- inspect the source files and user-supplied metadata
- build individual STAC Items with the topo4d, projection, point cloud, and timestamps extensions
- enrich Items with domain-specific Topo4D metadata such as transformation matrices and product information
- assemble the Items into a Collection and publish a local, self-contained Catalog
Prerequisites
- the demo data under demo/kijkduin is available locally
- pystac and other required libraries are installed in the active environment
- the topo4d helper script is stored locally and accessible
- basic familiarity with Python and STAC terminology
The notebook highlights each step, explains why it is needed, and points to the relevant STAC fields so you can adapt the flow to your own projects.
Reference
- The STAC Specification
- Create a Basic STAC Catalog Using PySTAC
- Topographic 4D Extension Specification
Last update: 19.09.2025
Import libraries¶
We import the required libraries alongside the Topo4D STAC helper classes from topo4d. The sys.path.append call ensures the repository root is available so the local topo4d package can be imported while the notebook runs from docs/notebooks.
import json
from datetime import datetime, timezone, timedelta
from pathlib import Path
import pandas as pd
import pytz
from dateutil import parser
import numpy as np
from tqdm import tqdm
from typing import Dict, Any
import sys
import os
sys.path.append(os.path.abspath('../..'))
import pystac
from topo4d.topo4d_ext import DataType, Topo4DExtension, ProductMeta, TrafoMeta
from topo4d.builder import extract_metadata_from_las, make_item_asset
Define dirs and paths¶
The demo files are stored in demo/kijkduin. Separating data_dir and aux_dir keeps the raw LAS files and the auxiliary information tidy. Adapt these variables to your own dataset before executing the rest of the notebook.
data_dir = "../../demo/kijkduin/data"
aux_dir = "../../demo/kijkduin/auxilary"
Extract metadata from LAS/LAZ files¶
extract_metadata_from_las parses the LAS headers, point counts, and VLR blocks for every LAS/LAZ file we find. We list all candidate files first, then iterate through them so the helper can persist metadata snapshots alongside the raw data. Having structured metadata up front makes the later STAC construction straightforward. Adapt this block to your own dataset.
file_dirs = os.listdir(data_dir)
file_paths = []
for f_dir in file_dirs:
    f_paths = Path(data_dir).glob(f"{f_dir}/*")
    for f_path in f_paths:
        if f_path.suffix.lower() in [".las", ".laz"]:
            file_paths.append(f_path)
for file_path in file_paths:
    print(f"Processing {file_path}")
    meta_las = extract_metadata_from_las(file_path, if_save=True)
Processing ..\..\demo\kijkduin\data\161111\161111_200058.laz Processing ..\..\demo\kijkduin\data\161111\161111_210058.laz Processing ..\..\demo\kijkduin\data\161111\161111_220103.laz Processing ..\..\demo\kijkduin\data\161111\161111_230104.laz Processing ..\..\demo\kijkduin\data\161112\161112_000104.laz Processing ..\..\demo\kijkduin\data\161112\161112_010104.laz Processing ..\..\demo\kijkduin\data\161112\161112_020106.laz Processing ..\..\demo\kijkduin\data\161112\161112_030107.laz Processing ..\..\demo\kijkduin\data\161112\161112_040108.laz Processing ..\..\demo\kijkduin\data\161112\161112_050109.laz Processing ..\..\demo\kijkduin\data\161112\161112_060112.laz Processing ..\..\demo\kijkduin\data\161112\161112_070110.laz Processing ..\..\demo\kijkduin\data\161112\161112_080111.laz Processing ..\..\demo\kijkduin\data\161112\161112_090111.laz Processing ..\..\demo\kijkduin\data\161112\161112_100112.laz Processing ..\..\demo\kijkduin\data\161112\161112_110113.laz Processing ..\..\demo\kijkduin\data\161112\161112_120115.laz Processing ..\..\demo\kijkduin\data\161112\161112_130115.laz Processing ..\..\demo\kijkduin\data\161112\161112_140116.laz Processing ..\..\demo\kijkduin\data\161112\161112_150118.laz Processing ..\..\demo\kijkduin\data\161112\161112_160118.laz Processing ..\..\demo\kijkduin\data\161112\161112_170118.laz Processing ..\..\demo\kijkduin\data\161112\161112_180119.laz Processing ..\..\demo\kijkduin\data\161112\161112_190130.laz Processing ..\..\demo\kijkduin\data\161112\161112_200120.laz Processing ..\..\demo\kijkduin\data\161112\161112_210121.laz Processing ..\..\demo\kijkduin\data\161112\161112_220123.laz Processing ..\..\demo\kijkduin\data\161112\161112_230123.laz
# Use the first epoch file as an example
meta_las = extract_metadata_from_las(file_paths[0], if_save=True)
file_id = Path(file_paths[0]).stem.split(".")[0]
Ingest metadata¶
Besides the automatically extracted values we add domain knowledge that cannot be inferred from the file alone: sensor model, acquisition mode, timezone, transformation matrix, and so on. The resulting dictionary merges machine-derived and user-supplied fields and becomes the single source of truth when we instantiate the STAC Item.
global_trafo_path = f"{aux_dir}/_Global transformation matrix/Kijkduin_Global_Transformation.txt"
with open(global_trafo_path, "r") as f:
    # read trafo meta from txt file as array
    global_trafo_mat = [np.array(line.strip().split(), dtype=float).tolist() for line in f if line.strip()]
meta_userinput = {
    "id": file_id,
    "data_type": DataType.POINTCLOUD,
    "native_crs": "EPSG:7415",
    "sensor": "RIEGL VZ2000",
    "acquisition_mode": "TLS",
    "datetime": datetime.strptime(file_id, "%y%m%d_%H%M%S").isoformat(),
    "timezone": "Europe/Amsterdam",
    "asset_url": file_paths[0].as_posix(),
    "global_trafo": global_trafo_mat
}
# merge meta_las and meta_userinput
metadata = meta_las
metadata.update(meta_userinput)
print(json.dumps(metadata, indent=2))
{ "id": "161111_200058", "header": { "DEFAULT_POINT_FORMAT": "<PointFormat(3, 0 bytes of extra dims)>", "DEFAULT_VERSION": [ 1, 2 ], "are_points_compressed": true, "creation_date": "2021-04-01", "evlrs": [], "extra_header_bytes": null, "extra_vlr_bytes": null, "file_source_id": 0, "generating_software": "txt2las (version 171231)", "global_encoding": "<laspy.header.GlobalEncoding object at 0x00000271AAF1E930>", "major_version": 1, "maxs": [ 162.585, 1304.714, 95.185 ], "minor_version": 4, "mins": [ -627.258, -1236.375, -42.409 ], "number_of_evlrs": 0, "number_of_points_by_return": [ 1123106, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ], "offset_to_point_data": 2065, "offsets": [ 0.0, 0.0, 0.0 ], "point_count": 1123106, "point_format": "<PointFormat(0, 27 bytes of extra dims)>", "scales": [ 0.001, 0.001, 0.001 ], "start_of_first_evlr": 0, "start_of_waveform_data_packet_record": 0, "system_identifier": "LAStools (c) by rapidlasso GmbH", "uuid": "00000000-0000-0000-0000-000000000000", "version": [ 1, 4 ], "vlrs": "[<ExtraBytesVlr(extra bytes structs: 8)>, <laspy.vlrs.known.LasZipVlr object at 0x00000271AAF75BE0>]", "x_max": 162.585, "x_min": -627.258, "x_offset": 0.0, "x_scale": 0.001, "y_max": 1304.714, "y_min": -1236.375, "y_offset": 0.0, "y_scale": 0.001, "z_max": 95.185, "z_min": -42.409, "z_offset": 0.0, "z_scale": 0.001 }, "vlrs": [ { "user_id": "LASF_Spec", "record_id": 4, "description": "by LAStools of rapidlasso GmbH", "extra_bytes_structs": "[<ExtraBytesStruct(range, 6, b'')>, <ExtraBytesStruct(theta, 6, b'')>, <ExtraBytesStruct(phi, 6, b'')>, <ExtraBytesStruct(reflectance, 6, b'')>, <ExtraBytesStruct(amplitude, 6, b'')>, <ExtraBytesStruct(deviation, 4, b'')>, <ExtraBytesStruct(echo, 1, b'')>, <ExtraBytesStruct(time, 6, b'')>]" }, { "user_id": "laszip encoded", "record_id": 22204, "description": "http://laszip.org" } ], "evlrs": [], "data_type": "pointcloud", "native_crs": "EPSG:7415", "sensor": "RIEGL VZ2000", "acquisition_mode": "TLS", "datetime": "2016-11-11T20:00:58", "timezone": "Europe/Amsterdam", "asset_url": "../../demo/kijkduin/data/161111/161111_200058.laz", "global_trafo": [ [ 0.708647477, 0.705554059, 0.003496133, 75124.664999332 ], [ -0.705561978, 0.708645222, 0.002060151, 454173.49757747503 ], [ -0.00102397, -0.003926659, 0.999991766, 37.853088875 ], [ 0.0, 0.0, 0.0, 1.0 ] ] }
Create a STAC Item from metadata¶
To keep the workflow readable and testable we split the Item construction into small helper functions. Each of the next subsections covers one aspect of STAC metadata so you can reuse the logic in scripts or pipelines.
- Item id
Every STAC Item needs a stable identifier. Here we derive it from the filename: each LAS file is named after the acquisition timestamp, so the ID doubles as a human-readable capture time and remains unique across the dataset.
def get_id(metadata: Dict[str, Any]) -> str:
    return metadata.get("id")
item_id = get_id(metadata)
item_id
'161111_200058'
- Item datetime
This helper normalises the acquisition timestamp. It parses the string, attaches the correct timezone (either a named timezone or an explicit UTC offset), and returns a timezone-aware datetime. The tzinfo is stripped again when the Item is instantiated later, and the original zone is recorded separately in the topo4d:timezone property.
def get_datetime(metadata: Dict) -> datetime:
    dt_str = metadata.get("datetime")
    if not dt_str:
        raise ValueError("Missing 'datetime' in metadata")
    dt = parser.parse(dt_str)
    tz_str = metadata.get("timezone")
    if tz_str and tz_str.startswith("UTC"):
        sign = 1 if "+" in tz_str else -1
        offset_str = tz_str[3:].replace("+", "").replace("-", "")
        parts = offset_str.split(":")
        hours = int(parts[0]) if parts[0] else 0
        minutes = int(parts[1]) if len(parts) > 1 else 0
        offset = timedelta(hours=sign * hours, minutes=sign * minutes)
        tz = timezone(offset)
        dt = dt.replace(tzinfo=tz)
    elif tz_str and "/" in tz_str:
        tz = pytz.timezone(tz_str)
        dt = tz.localize(dt)
    elif dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt
item_datetime = get_datetime(metadata)
item_datetime
datetime.datetime(2016, 11, 11, 20, 0, 58, tzinfo=<DstTzInfo 'Europe/Amsterdam' CET+1:00:00 STD>)
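The function also understands explicit UTC offsets. A minimal sketch with hypothetical metadata (all demo files use the named Europe/Amsterdam zone, so this branch is not exercised above):
# Hypothetical input to exercise the explicit-offset branch
example_meta = {"datetime": "2016-11-11T20:00:58", "timezone": "UTC+01:00"}
get_datetime(example_meta)
# -> datetime.datetime(2016, 11, 11, 20, 0, 58, tzinfo=datetime.timezone(datetime.timedelta(seconds=3600)))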
- Item bbox, geometry and CRS information
We compute the spatial footprint from the LAS header min and max coordinates. When a global transformation matrix is available it is applied before determining the bounding box. We also scan the LAS VLRs to recover the native CRS so we can report the footprint both in the acquisition CRS and in WGS84 for the Projection extension.
from shapely.geometry import box, mapping
from pyproj import CRS, Transformer
def get_geo(metadata: Dict[str, Any]) -> Dict[str, Any]:
    header = metadata.get("header")
    vlrs = metadata.get("vlrs", [])
    native_crs = metadata.get("native_crs", None)
    global_trafo = metadata.get("global_trafo", None)
    if not header or "mins" not in header or "maxs" not in header:
        raise ValueError("Invalid header format. 'mins' and 'maxs' are required.")
    xmin, ymin, zmin = header["mins"]
    xmax, ymax, zmax = header["maxs"]
    if global_trafo:
        # Apply the global transformation matrix to the bounding box corners
        trafo_matrix = np.array(global_trafo)
        bbox_corners = np.array([[xmin, ymin, zmin, 1],
                                 [xmax, ymin, zmin, 1],
                                 [xmin, ymax, zmin, 1],
                                 [xmax, ymax, zmin, 1]])
        transformed_corners = bbox_corners @ trafo_matrix.T
        xmin, ymin, zmin = transformed_corners.min(axis=0)[:3]
        xmax, ymax, zmax = transformed_corners.max(axis=0)[:3]
    # Recover the native CRS from the VLRs, falling back to the user-supplied value
    crs = None
    for vlr in vlrs:
        if vlr.get("user_id", "").lower() == "lasf_projection":
            if "wkt" in vlr and vlr["wkt"]:
                crs = CRS.from_wkt(vlr["wkt"])
                break
            elif "epsg_code" in vlr and vlr["epsg_code"]:
                crs = CRS.from_epsg(vlr["epsg_code"])
                break
    if not crs and native_crs:
        try:
            crs = CRS.from_user_input(native_crs)
        except Exception:
            crs = None
    # Default to the native-CRS footprint, then reproject to WGS84 when possible
    bbox = [xmin, ymin, xmax, ymax]
    geom = mapping(box(xmin, ymin, xmax, ymax))
    if crs and crs.to_epsg() != 4326:
        try:
            transformer = Transformer.from_crs(crs, CRS.from_epsg(4326), always_xy=True)
            min_lon, min_lat = transformer.transform(xmin, ymin)
            max_lon, max_lat = transformer.transform(xmax, ymax)
            # bbox and geom in WGS84
            bbox = [min_lon, min_lat, max_lon, max_lat]
            geom = mapping(box(min_lon, min_lat, max_lon, max_lat))
        except Exception:
            pass
    return {
        "bbox": bbox,
        "geometry": geom,
        "native_crs": crs.to_string() if crs else None
    }
item_geo = get_geo(metadata)
print(json.dumps(item_geo, indent=2))
{ "bbox": [ 4.2032353058274445, 52.060984149209446, 4.236990109111119, 52.082514395889724 ], "geometry": { "type": "Polygon", "coordinates": [ [ [ 4.236990109111119, 52.060984149209446 ], [ 4.236990109111119, 52.082514395889724 ], [ 4.2032353058274445, 52.082514395889724 ], [ 4.2032353058274445, 52.060984149209446 ], [ 4.236990109111119, 52.060984149209446 ] ] ] }, "native_crs": "EPSG:7415" }
- Create the item
With the temporal and spatial metadata in hand we instantiate pystac.Item. At this point the Item only contains the required geometry, bounding box, and datetime; later cells enrich it with extensions, properties, and assets.
item = pystac.Item(
    id=item_id,
    geometry=item_geo["geometry"],
    bbox=item_geo["bbox"],
    datetime=item_datetime.replace(tzinfo=None),
    properties={}
)
- Adding STAC Extensions
We register the projection, point cloud, and timestamps extensions and then use the Topo4D extension to store TLS specific metadata such as timezone, acquisition mode, and transformation matrices. Mapping custom fields onto established STAC extensions keeps the Item interoperable with other tooling.
topo4d_ext = Topo4DExtension.ext(item, add_if_missing=True)
print(f"Topo4D Extension added: {Topo4DExtension.has_extension(item)}")
Topo4D Extension added: True
# Add standard STAC extensions
item.stac_extensions.extend([
    "https://stac-extensions.github.io/projection/v2.0.0/schema.json",
    "https://stac-extensions.github.io/pointcloud/v2.0.0/schema.json",
    "https://stac-extensions.github.io/timestamps/v1.1.0/schema.json"
])
# Apply topo4d extension
topo4d_ext = Topo4DExtension.ext(item, add_if_missing=True)
topo4d_ext.timezone = metadata.get("timezone", "UTC")
topo4d_ext.data_type = DataType(metadata.get("data_type", "pointcloud"))
topo4d_ext.acquisition_mode = metadata.get("acquisition_mode", None)
topo4d_ext.global_trafo = metadata.get("global_trafo", None)
# Use standard STAC extensions for formerly topo4d properties
item.properties["proj:code"] = item_geo.get("native_crs", None)
item.properties["instruments"] = [metadata.get("sensor")] if metadata.get("sensor") else []
item.properties["pc:count"] = metadata.get("header", {}).get("point_count", None)
item.properties["pc:type"] = 'lidar' if metadata.get('acquisition_mode') == 'ULS' or 'TLS' else 'other'
item.properties
{'topo4d:timezone': 'Europe/Amsterdam', 'topo4d:data_type': 'pointcloud', 'topo4d:acquisition_mode': 'TLS', 'topo4d:global_trafo': [[0.708647477, 0.705554059, 0.003496133, 75124.664999332], [-0.705561978, 0.708645222, 0.002060151, 454173.49757747503], [-0.00102397, -0.003926659, 0.999991766, 37.853088875], [0.0, 0.0, 0.0, 1.0]], 'proj:code': 'EPSG:7415', 'instruments': ['RIEGL VZ2000'], 'pc:count': 1123106, 'pc:type': 'lidar'}
Adding assets¶
The make_item_asset helper wraps the LAS file path with the appropriate media type and asset metadata. Adding the asset under a descriptive key lets downstream clients discover and download the raw point cloud directly from the Item.
asset_name, item_asset = make_item_asset(asset_url=metadata.get("asset_url"), user_input=metadata)
item.add_asset(
    key=asset_name,
    asset=item_asset
)
item
- type "Feature"
- stac_version "1.1.0"
stac_extensions[] 4 items
- 0 "https://tum-rsa.github.io/topo4d/v0.1.0/schema.json"
- 1 "https://stac-extensions.github.io/projection/v2.0.0/schema.json"
- 2 "https://stac-extensions.github.io/pointcloud/v2.0.0/schema.json"
- 3 "https://stac-extensions.github.io/timestamps/v1.1.0/schema.json"
- id "161111_200058"
geometry
- type "Polygon"
coordinates[] 1 items
0[] 5 items
0[] 2 items
- 0 4.236990109111119
- 1 52.060984149209446
1[] 2 items
- 0 4.236990109111119
- 1 52.082514395889724
2[] 2 items
- 0 4.2032353058274445
- 1 52.082514395889724
3[] 2 items
- 0 4.2032353058274445
- 1 52.060984149209446
4[] 2 items
- 0 4.236990109111119
- 1 52.060984149209446
bbox[] 4 items
- 0 4.2032353058274445
- 1 52.060984149209446
- 2 4.236990109111119
- 3 52.082514395889724
properties
- topo4d:timezone "Europe/Amsterdam"
- topo4d:data_type "pointcloud"
- topo4d:acquisition_mode "TLS"
topo4d:global_trafo[] 4 items
0[] 4 items
- 0 0.708647477
- 1 0.705554059
- 2 0.003496133
- 3 75124.664999332
1[] 4 items
- 0 -0.705561978
- 1 0.708645222
- 2 0.002060151
- 3 454173.49757747503
2[] 4 items
- 0 -0.00102397
- 1 -0.003926659
- 2 0.999991766
- 3 37.853088875
3[] 4 items
- 0 0.0
- 1 0.0
- 2 0.0
- 3 1.0
- proj:code "EPSG:7415"
instruments[] 1 items
- 0 "RIEGL VZ2000"
- pc:count 1123106
- pc:type "lidar"
- datetime "2016-11-11T20:00:58Z"
links[] 0 items
assets
pointcloud
- href "../../demo/kijkduin/data/161111/161111_200058.laz"
- type "application/vnd.laszip+copc"
roles[] 1 items
- 0 "data"
Validating the item¶
item.validate() performs schema checks across the Item and its extensions.
item.validate()  # validate the Item against the core and extension schemas
['https://schemas.stacspec.org/v1.1.0/item-spec/json-schema/item.json', 'https://tum-rsa.github.io/topo4d/v0.1.0/schema.json', 'https://stac-extensions.github.io/projection/v2.0.0/schema.json', 'https://stac-extensions.github.io/pointcloud/v2.0.0/schema.json', 'https://stac-extensions.github.io/timestamps/v1.1.0/schema.json']
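When a check fails, pystac raises a STACValidationError rather than returning a report, so in batch pipelines it is worth wrapping the call. A small sketch:
from pystac.errors import STACValidationError

try:
    item.validate()
except STACValidationError as err:
    print(f"Item {item.id} failed validation: {err}")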
Create more items¶
After confirming the helpers work for a single file we scale up to the full dataset. The loop below merges in the shared user metadata for each LAS file, builds a list of Items, and assigns self HREFs so they can be written to disk later on.
def item_from_metadata(metadata: Dict[str, Any]) -> pystac.Item:
    item_id = get_id(metadata)
    item_datetime = get_datetime(metadata)
    item_geo = get_geo(metadata)
    item = pystac.Item(
        id=item_id,
        geometry=item_geo["geometry"],
        bbox=item_geo["bbox"],
        datetime=item_datetime.replace(tzinfo=None),
        properties={}
    )
    # Add standard STAC extensions
    item.stac_extensions.extend([
        "https://stac-extensions.github.io/projection/v2.0.0/schema.json",
        "https://stac-extensions.github.io/pointcloud/v2.0.0/schema.json",
        "https://stac-extensions.github.io/timestamps/v1.1.0/schema.json"
    ])
    # Apply topo4d extension
    topo4d_ext = Topo4DExtension.ext(item, add_if_missing=True)
    topo4d_ext.timezone = metadata.get("timezone", "UTC")
    topo4d_ext.data_type = DataType(metadata.get("data_type", "pointcloud"))
    topo4d_ext.acquisition_mode = metadata.get("acquisition_mode", None)
    topo4d_ext.global_trafo = metadata.get("global_trafo", None)
    # Use standard STAC extensions for formerly topo4d properties
    item.properties["proj:code"] = item_geo.get("native_crs", None)
    item.properties["instruments"] = [metadata.get("sensor")] if metadata.get("sensor") else []
    item.properties["pc:count"] = metadata.get("header", {}).get("point_count", None)
    item.properties["pc:type"] = 'lidar' if metadata.get('acquisition_mode') in ['ULS', 'TLS'] else 'other'
    # Add timestamps if available
    if metadata.get("header", {}).get("creation_date"):
        try:
            creation_date = datetime.strptime(metadata["header"]["creation_date"], "%Y-%m-%d").isoformat() + "Z"
            item.properties["timestamps:published"] = creation_date
        except:
            pass
    asset_name, item_asset = make_item_asset(asset_url=metadata.get("asset_url"), user_input=metadata)
    item.add_asset(
        key=asset_name,
        asset=item_asset
    )
    return item
global_trafo_path = f"{aux_dir}/_Global transformation matrix/Kijkduin_Global_Transformation.txt"
with open(global_trafo_path, "r") as f:
    # read trafo meta from txt file as array
    global_trafo_mat = [np.array(line.strip().split(), dtype=float).tolist() for line in f if line.strip()]
item_list = []
metadata_list = []
for file_path in tqdm(file_paths, desc="Processing files"):
    meta_las = extract_metadata_from_las(file_path, if_save=True)
    file_id = Path(file_path).stem.split(".")[0]
    meta_userinput = {
        "id": file_id,
        "data_type": DataType.POINTCLOUD,
        "native_crs": "EPSG:7415",
        "sensor": "RIEGL VZ2000",
        "acquisition_mode": "TLS",
        "datetime": datetime.strptime(file_id, "%y%m%d_%H%M%S").isoformat(),
        "timezone": "Europe/Amsterdam",
        "asset_url": file_path.as_posix(),
        "global_trafo": global_trafo_mat
    }
    metadata = meta_las
    metadata.update(meta_userinput)
    metadata_list.append(metadata)
    item = item_from_metadata(metadata)
    # item.validate() # Uncomment when the Schema url is public
    item.set_self_href(f"{data_dir}/{item.id}.json")
    item_list.append(item)
Processing files: 100%|██████████| 28/28 [00:00<00:00, 79.60it/s]
item_list[0]
- type "Feature"
- stac_version "1.1.0"
stac_extensions[] 4 items
- 0 "https://stac-extensions.github.io/projection/v2.0.0/schema.json"
- 1 "https://stac-extensions.github.io/pointcloud/v2.0.0/schema.json"
- 2 "https://stac-extensions.github.io/timestamps/v1.1.0/schema.json"
- 3 "https://tum-rsa.github.io/topo4d/v0.1.0/schema.json"
- id "161111_200058"
geometry
- type "Polygon"
coordinates[] 1 items
0[] 5 items
0[] 2 items
- 0 4.236990109111119
- 1 52.060984149209446
1[] 2 items
- 0 4.236990109111119
- 1 52.082514395889724
2[] 2 items
- 0 4.2032353058274445
- 1 52.082514395889724
3[] 2 items
- 0 4.2032353058274445
- 1 52.060984149209446
4[] 2 items
- 0 4.236990109111119
- 1 52.060984149209446
bbox[] 4 items
- 0 4.2032353058274445
- 1 52.060984149209446
- 2 4.236990109111119
- 3 52.082514395889724
properties
- topo4d:timezone "Europe/Amsterdam"
- topo4d:data_type "pointcloud"
- topo4d:acquisition_mode "TLS"
topo4d:global_trafo[] 4 items
0[] 4 items
- 0 0.708647477
- 1 0.705554059
- 2 0.003496133
- 3 75124.664999332
1[] 4 items
- 0 -0.705561978
- 1 0.708645222
- 2 0.002060151
- 3 454173.49757747503
2[] 4 items
- 0 -0.00102397
- 1 -0.003926659
- 2 0.999991766
- 3 37.853088875
3[] 4 items
- 0 0.0
- 1 0.0
- 2 0.0
- 3 1.0
- proj:code "EPSG:7415"
instruments[] 1 items
- 0 "RIEGL VZ2000"
- pc:count 1123106
- pc:type "lidar"
- timestamps:published "2021-04-01T00:00:00Z"
- datetime "2016-11-11T20:00:58Z"
links[] 1 items
0
- rel "self"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161111_200058.json"
- type "application/json"
assets
pointcloud
- href "../../demo/kijkduin/data/161111/161111_200058.laz"
- type "application/vnd.laszip+copc"
roles[] 1 items
- 0 "data"
Adding TrafoMeta¶
Multi-temporal TLS acquisitions often need to be aligned with per-epoch transformation matrices. We load the matrices saved next to each LAS file, link them to a reference epoch Item, and attach them with the Topo4D TrafoMeta helper.
file_paths[0].stem
'161111_200058'
trafo_paths = [f"{f.parent}/{f.stem}_trafomat.txt" for f in file_paths]
ref_id = "161111_200058"
# find the match id in item_list
if ref_id in [item.id for item in item_list]:
    reference_epoch = [item for item in item_list if item.id == ref_id][0]
    reference_epoch_link = pystac.Link(
        rel=reference_epoch.id,
        target=reference_epoch.get_self_href(),
        title="Reference Epoch",
        media_type=pystac.MediaType.JSON)
for idx, trafo_path in enumerate(trafo_paths):
    with open(trafo_path, "r") as f:
        # read trafo meta from txt file as array
        trafo_mat = [np.array(line.strip().split(), dtype=float).tolist() for line in f if line.strip()]
    trafo_meta = TrafoMeta.create(
        reference_epoch=reference_epoch_link.to_dict(),
        transformation=trafo_mat,
    )
    item = item_list[idx]
    Topo4DExtension.ext(item, add_if_missing=True).trafometa = trafo_meta
Adding ProductMeta¶
If the extracted metadata includes product-level information we convert it into the Topo4D ProductMeta structure. Publication or creation dates are recorded via the timestamps extension so other STAC clients can understand when derivatives were produced.
for idx, meta in enumerate(metadata_list):
    item = item_list[idx]
    if "product_name" in meta:
        product_meta = ProductMeta.create(
            product_name=meta["product_name"],
            param=meta.get("param", None),
            derived_from=meta.get("derived_from", None),
            product_level=meta.get("product_level", None),
        )
        Topo4DExtension.ext(item, add_if_missing=True).productmeta = product_meta
    # Use timestamps extension for publication date instead of ProductMeta.lastupdate
    if meta.get('header', {}).get("creation_date"):
        try:
            creation_date = datetime.strptime(meta['header']["creation_date"], "%Y-%m-%d").isoformat() + "Z"
            item.properties["timestamps:published"] = creation_date
        except:
            pass
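The demo LAS files carry no product-level fields, so the loop above leaves ProductMeta untouched. For a derived product, say a DEM rasterised from one epoch, the user-supplied metadata could look like the sketch below; the field names follow the loop above, but every value is illustrative only:
# Hypothetical product-level metadata for a derived product (illustrative values)
meta_product_example = {
    "product_name": "DEM_1m",
    "param": {"cell_size": 1.0, "aggregation": "mean"},
    "derived_from": "161111_200058",
    "product_level": "L2",
}
product_meta_example = ProductMeta.create(
    product_name=meta_product_example["product_name"],
    param=meta_product_example.get("param"),
    derived_from=meta_product_example.get("derived_from"),
    product_level=meta_product_example.get("product_level"),
)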
Create the Collection¶
With all Items prepared we describe the dataset as a whole. The next cells populate the required Collection fields step by step so you can see how each attribute maps to the STAC Collection and adapt it to your own projects.
- Collection id
Choose a concise, stable identifier for the Collection. We use Kijkduin_TLS here to capture both the location and the acquisition technique.
collection_id = "Kijkduin_TLS"
collection_id
'Kijkduin_TLS'
- Collection title
Provide a human-friendly title that will appear in catalog browsers and search results.
collection_title = "Kijkduin TLS Point Clouds Collection"
collection_title
'Kijkduin TLS Point Clouds Collection'
- Collection description
Describe what the dataset contains, the sensors involved, and any processing highlights. Markdown formatting is supported.
collection_desc = f'''### {collection_title}

A collection of point clouds from Kijkduin Sandy Beach, The Netherlands.
'''
print(collection_desc)
### Kijkduin TLS Point Clouds Collection A collection of point clouds from Kijkduin Sandy Beach, The Netherlands.
- Collection license
Declare the usage rights for the data. You can reference a standard SPDX identifier or supply a custom license string.
collection_license = "CC-BY-4.0"
collection_license
'CC-BY-4.0'
- Collection providers
List the organisations responsible for producing, hosting, or licensing the dataset. Add as many providers as needed and assign the appropriate roles.
collection_providers = [
    pystac.Provider(
        name="TU Delft",
        roles=[
            pystac.ProviderRole.PROCESSOR,
            pystac.ProviderRole.PRODUCER,
            pystac.ProviderRole.LICENSOR,
            pystac.ProviderRole.HOST
        ],
        url="https://coastscan.citg.tudelft.nl/",
    ),
]
- Collection extent
The extent captures both spatial and temporal coverage. We seed it with broad bounds here and later tighten it using the Items so search clients receive accurate metadata.
spatial_extent = pystac.SpatialExtent([[-180.0, -90.0, 180.0, 90.0]])
temporal_extent = pystac.TemporalExtent([[datetime(2013, 6, 1), None]])
collection_extent = pystac.Extent(spatial_extent, temporal_extent)
collection = pystac.Collection(
    id=collection_id,
    title=collection_title,
    description=collection_desc,
    extent=collection_extent,
    license=collection_license,
    providers=collection_providers,
)
collection.to_dict()
{'type': 'Collection', 'id': 'Kijkduin_TLS', 'stac_version': '1.1.0', 'description': '### Kijkduin TLS Point Clouds Collection\n\nA collection of point clouds from Kijkduin Sandy Beach, The Netherlands.\n', 'links': [], 'title': 'Kijkduin TLS Point Clouds Collection', 'extent': {'spatial': {'bbox': [[-180.0, -90.0, 180.0, 90.0]]}, 'temporal': {'interval': [['2013-06-01T00:00:00Z', None]]}}, 'license': 'CC-BY-4.0', 'providers': [{'name': 'TU Delft', 'roles': ['processor', 'producer', 'licensor', 'host'], 'url': 'https://coastscan.citg.tudelft.nl/'}]}
# add items to the collection
for item in item_list:
    collection.add_item(item)
- Collection summaries
Summaries expose quick statistics about the Collection (counts, ranges, enumerations) and help STAC clients reason about the dataset without loading every Item.
# add collection summaries
collection.summaries.add("num_items", {"count": len(item_list)})
collection.summaries.add("timestamp_list", {"list": [item.datetime.isoformat() for item in item_list]})
collection.summaries.add("temporal_resolution", {"resolution": "1 hour"})
collection.update_extent_from_items()
collection.extent.to_dict()
collection
- type "Collection"
- id "Kijkduin_TLS"
- stac_version "1.1.0"
- description "### Kijkduin TLS Point Clouds Collection A collection of point clouds from Kijkduin Sandy Beach, The Netherlands. "
links[] 28 items
0
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161111_200058.json"
- type "application/geo+json"
1
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161111_210058.json"
- type "application/geo+json"
2
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161111_220103.json"
- type "application/geo+json"
3
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161111_230104.json"
- type "application/geo+json"
4
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_000104.json"
- type "application/geo+json"
5
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_010104.json"
- type "application/geo+json"
6
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_020106.json"
- type "application/geo+json"
7
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_030107.json"
- type "application/geo+json"
8
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_040108.json"
- type "application/geo+json"
9
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_050109.json"
- type "application/geo+json"
10
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_060112.json"
- type "application/geo+json"
11
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_070110.json"
- type "application/geo+json"
12
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_080111.json"
- type "application/geo+json"
13
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_090111.json"
- type "application/geo+json"
14
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_100112.json"
- type "application/geo+json"
15
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_110113.json"
- type "application/geo+json"
16
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_120115.json"
- type "application/geo+json"
17
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_130115.json"
- type "application/geo+json"
18
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_140116.json"
- type "application/geo+json"
19
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_150118.json"
- type "application/geo+json"
20
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_160118.json"
- type "application/geo+json"
21
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_170118.json"
- type "application/geo+json"
22
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_180119.json"
- type "application/geo+json"
23
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_190130.json"
- type "application/geo+json"
24
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_200120.json"
- type "application/geo+json"
25
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_210121.json"
- type "application/geo+json"
26
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_220123.json"
- type "application/geo+json"
27
- rel "item"
- href "c:/Users/jiapan/02_projects/2025_4DWORKS/4D-WORKS/demo/kijkduin/data/161112_230123.json"
- type "application/geo+json"
- title "Kijkduin TLS Point Clouds Collection"
extent
spatial
bbox[] 1 items
0[] 4 items
- 0 4.197016592792564
- 1 52.059182481715176
- 2 4.238527655962627
- 3 52.08538506475349
temporal
interval[] 1 items
0[] 2 items
- 0 "2016-11-11T20:00:58Z"
- 1 "2016-11-12T23:01:23Z"
- license "CC-BY-4.0"
providers[] 1 items
0
- name "TU Delft"
roles[] 4 items
- 0 "processor"
- 1 "producer"
- 2 "licensor"
- 3 "host"
- url "https://coastscan.citg.tudelft.nl/"
summaries
num_items
- count 28
timestamp_list
list[] 28 items
- 0 "2016-11-11T20:00:58"
- 1 "2016-11-11T21:00:58"
- 2 "2016-11-11T22:01:03"
- 3 "2016-11-11T23:01:04"
- 4 "2016-11-12T00:01:04"
- 5 "2016-11-12T01:01:04"
- 6 "2016-11-12T02:01:06"
- 7 "2016-11-12T03:01:07"
- 8 "2016-11-12T04:01:08"
- 9 "2016-11-12T05:01:09"
- 10 "2016-11-12T06:01:12"
- 11 "2016-11-12T07:01:10"
- 12 "2016-11-12T08:01:11"
- 13 "2016-11-12T09:01:11"
- 14 "2016-11-12T10:01:12"
- 15 "2016-11-12T11:01:13"
- 16 "2016-11-12T12:01:15"
- 17 "2016-11-12T13:01:15"
- 18 "2016-11-12T14:01:16"
- 19 "2016-11-12T15:01:18"
- 20 "2016-11-12T16:01:18"
- 21 "2016-11-12T17:01:18"
- 22 "2016-11-12T18:01:19"
- 23 "2016-11-12T19:01:30"
- 24 "2016-11-12T20:01:20"
- 25 "2016-11-12T21:01:21"
- 26 "2016-11-12T22:01:23"
- 27 "2016-11-12T23:01:23"
temporal_resolution
- resolution "1 hour"
Create the Catalog¶
Finally we wrap the Collection in a top-level Catalog so the structure is portable. Catalogs can contain multiple Collections; in this example we focus on the Kijkduin TLS dataset.
catalog_id = "kijkduin-catalog"
catalog_desc = "This catalog contains TLS point cloud data from Kijkduin Sandy Beach, The Netherlands."
catalog = pystac.Catalog(
    id=catalog_id,
    description=catalog_desc
)
catalog.add_child(collection)
catalog.describe()
* <Catalog id=kijkduin-catalog> * <Collection id=Kijkduin_TLS> * <Item id=161111_200058> * <Item id=161111_210058> * <Item id=161111_220103> * <Item id=161111_230104> * <Item id=161112_000104> * <Item id=161112_010104> * <Item id=161112_020106> * <Item id=161112_030107> * <Item id=161112_040108> * <Item id=161112_050109> * <Item id=161112_060112> * <Item id=161112_070110> * <Item id=161112_080111> * <Item id=161112_090111> * <Item id=161112_100112> * <Item id=161112_110113> * <Item id=161112_120115> * <Item id=161112_130115> * <Item id=161112_140116> * <Item id=161112_150118> * <Item id=161112_160118> * <Item id=161112_170118> * <Item id=161112_180119> * <Item id=161112_190130> * <Item id=161112_200120> * <Item id=161112_210121> * <Item id=161112_220123> * <Item id=161112_230123>
Save the STAC¶
Normalising HREFs writes a clean directory layout to disk. Saving as SELF_CONTAINED ensures every JSON document uses relative paths, so moving the catalog elsewhere keeps the links intact.
from pathlib import Path
root_path = str(Path(f"{data_dir}"))
print(f"Root path for catalog: {root_path}")
from pystac.layout import TemplateLayoutStrategy
# Set up a flat layout strategy: write every Item JSON directly under the catalog root
strategy = TemplateLayoutStrategy(
    item_template="${id}.json"
)
catalog.normalize_hrefs(root_path, strategy=strategy)
Root path for catalog: ..\..\demo\kijkduin\data
catalog.validate_all()  # validate the Collection and every Item in the catalog
28
catalog.save(pystac.CatalogType.SELF_CONTAINED)
catalog_load = pystac.Catalog.from_file(f"{root_path}/catalog.json")
catalog_load.describe()
* <Catalog id=kijkduin-catalog> * <Collection id=Kijkduin_TLS> * <Item id=161111_200058> * <Item id=161111_210058> * <Item id=161111_220103> * <Item id=161111_230104> * <Item id=161112_000104> * <Item id=161112_010104> * <Item id=161112_020106> * <Item id=161112_030107> * <Item id=161112_040108> * <Item id=161112_050109> * <Item id=161112_060112> * <Item id=161112_070110> * <Item id=161112_080111> * <Item id=161112_090111> * <Item id=161112_100112> * <Item id=161112_110113> * <Item id=161112_120115> * <Item id=161112_130115> * <Item id=161112_140116> * <Item id=161112_150118> * <Item id=161112_160118> * <Item id=161112_170118> * <Item id=161112_180119> * <Item id=161112_190130> * <Item id=161112_200120> * <Item id=161112_210121> * <Item id=161112_220123> * <Item id=161112_230123>
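As a final sanity check you can walk the reloaded catalog and read the Items back, for example printing each acquisition time and the Topo4D acquisition mode. A short sketch (get_items(recursive=True) requires pystac ≥ 1.8):
for item in catalog_load.get_items(recursive=True):
    print(item.id, item.datetime, item.properties.get("topo4d:acquisition_mode"))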