Commit 7fe08ac

Merge branch 'main' into check-antimeridian
2 parents 88962e1 + 9daa5a6 commit 7fe08ac

9 files changed: +353 additions, -12 deletions


CHANGELOG.md

Lines changed: 7 additions & 0 deletions
@@ -13,6 +13,13 @@ The format is (loosely) based on [Keep a Changelog](http://keepachangelog.com/)
 - Detects and reports cases where a bbox incorrectly "belts the globe" instead of properly crossing the antimeridian
 - Provides clear error messages to help users fix incorrectly formatted bboxes
 - Added sponsors and supporters section with logos ([#122](https://github.com/stac-utils/stac-check/pull/122))
+- Added check to verify that bbox matches item's polygon geometry ([#123](https://github.com/stac-utils/stac-check/pull/123))
+- Added configuration documentation to README ([#124](https://github.com/stac-utils/stac-check/pull/124))
+- Added `--pydantic` option for validating STAC objects using stac-pydantic models, providing enhanced type checking and validation ([#126](https://github.com/stac-utils/stac-check/pull/126))
+
+### Enhanced
+
+- Improved bbox validation output to show detailed information about mismatches between bbox and geometry bounds, including which specific coordinates differ and by how much ([#126](https://github.com/stac-utils/stac-check/pull/126))
 
 ### Updated
 
README.md

Lines changed: 56 additions & 0 deletions
@@ -25,6 +25,7 @@ The intent of this project is to provide a validation tool that also follows the
 - [Docker](#docker)
 - [Usage](#usage)
 - [CLI Usage](#cli-usage)
+- [Configuration](#configuration)
 - [Python API Usage](#python-api-usage)
 - [Examples](#examples)
 - [Basic Validation](#basic-validation)
@@ -85,9 +86,64 @@ Options:
 (enabled by default).
 --header KEY VALUE HTTP header to include in the requests. Can be used
 multiple times.
+--pydantic Use stac-pydantic for enhanced validation with Pydantic models.
 --help Show this message and exit.
 ```
 
+### Configuration
+
+stac-check uses a configuration file to control which validation checks are performed. By default, it uses the built-in configuration at `stac_check/stac-check.config.yml`. You can customize the validation behavior by creating your own configuration file.
+
+The configuration file has two main sections:
+
+1. **linting**: Controls which best practices checks are enabled
+2. **settings**: Configures thresholds for certain checks
+
+Here's an example of the configuration options:
+
+```yaml
+linting:
+  # Identifiers should consist of only lowercase characters, numbers, '_', and '-'
+  searchable_identifiers: true
+  # Item name should not contain ':' or '/'
+  percent_encoded: true
+  # Item file names should match their ids
+  item_id_file_name: true
+  # Collections and catalogs should be named collection.json and catalog.json
+  catalog_id_file_name: true
+  # A STAC collection should contain a summaries field
+  check_summaries: true
+  # Datetime fields should not be set to null
+  null_datetime: true
+  # Check unlocated items to make sure bbox field is not set
+  check_unlocated: true
+  # Check if bbox matches the bounds of the geometry
+  check_bbox_geometry_match: true
+  # Check to see if there are too many links
+  bloated_links: true
+  # Check for bloated metadata in properties
+  bloated_metadata: true
+  # Ensure thumbnail is a small file size ["png", "jpeg", "jpg", "webp"]
+  check_thumbnail: true
+  # Ensure that links in catalogs and collections include a title field
+  links_title: true
+  # Ensure that links in catalogs and collections include self link
+  links_self: true
+
+settings:
+  # Number of links before the bloated links warning is shown
+  max_links: 20
+  # Number of properties before the bloated metadata warning is shown
+  max_properties: 20
+```
+
+To use a custom configuration file, set the `STAC_CHECK_CONFIG` environment variable to the path of your configuration file:
+
+```bash
+export STAC_CHECK_CONFIG=/path/to/your/config.yml
+stac-check sample_files/1.0.0/core-item.json
+```
+
 ### Python API Usage
 
 ```python
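The `export` step in the README can also be done from Python, which helps when stac-check is driven from a script. The sketch below makes the following assumptions:

* It writes a copy of the README's example configuration, plus the `check_geometry` key from the shipped `stac-check.config.yml`, with `check_bbox_geometry_match` switched off.
* It then sets `STAC_CHECK_CONFIG` in the current process environment.
* The helper name `install_custom_config` and the file name `stac-check.custom.yml` are hypothetical.
* This diff does not show whether omitted keys fall back to the defaults, so the sketch keeps every key.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional

# The README's example configuration (plus `check_geometry` from the shipped
# stac-check.config.yml) with the bbox/geometry check switched off.
CUSTOM_CONFIG = """\
linting:
  searchable_identifiers: true
  percent_encoded: true
  item_id_file_name: true
  catalog_id_file_name: true
  check_summaries: true
  null_datetime: true
  check_unlocated: true
  check_geometry: true
  check_bbox_geometry_match: false
  bloated_links: true
  bloated_metadata: true
  check_thumbnail: true
  links_title: true
  links_self: true

settings:
  max_links: 20
  max_properties: 20
"""


def install_custom_config(directory: Optional[str] = None) -> Path:
    """Write CUSTOM_CONFIG to disk and point STAC_CHECK_CONFIG at it.

    The variable is set in this process's environment, so any stac-check
    subprocess started afterwards inherits it.
    """
    target_dir = Path(directory) if directory else Path(tempfile.mkdtemp())
    path = target_dir / "stac-check.custom.yml"  # hypothetical file name
    path.write_text(CUSTOM_CONFIG, encoding="utf-8")
    os.environ["STAC_CHECK_CONFIG"] = str(path)
    return path
```

A child process started afterwards inherits the variable, mirroring the shell example. An example is `subprocess.run(["stac-check", "sample_files/1.0.0/core-item.json"])`.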

sample_files/1.0.0/bad-item.json

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 -122.59750209,
 37.48803556,
 -122.2880486,
-37.613537207
+37.613531207
 ],
 "geometry": {
 "type": "Polygon",

setup.py

Lines changed: 2 additions & 2 deletions
@@ -3,7 +3,7 @@
 
 from setuptools import find_packages, setup
 
-__version__ = "1.6.0"
+__version__ = "1.7.0"
 
 with open("README.md", "r") as fh:
     long_description = fh.read()
@@ -20,7 +20,7 @@
         "requests>=2.32.3",
         "jsonschema>=4.23.0",
         "click>=8.1.8",
-        "stac-validator>=3.6.0",
+        "stac-validator[pydantic]>=3.7.0",
         "PyYAML",
         "python-dotenv",
     ],

stac_check/cli.py

Lines changed: 25 additions & 2 deletions
@@ -91,6 +91,13 @@ def intro_message(linter: Linter) -> None:
         f"Validator: stac-validator {linter.validator_version}", bg="blue", fg="white"
     )
 
+    # Always show validation method
+    validation_method = (
+        "Pydantic" if hasattr(linter, "pydantic") and linter.pydantic else "JSONSchema"
+    )
+    click.secho()
+    click.secho(f"Validation method: {validation_method}", bg="yellow", fg="black")
+
     click.secho()
 
 
@@ -111,7 +118,17 @@ def cli_message(linter: Linter) -> None:
 
     """ schemas validated for core object """
     click.secho()
-    if len(linter.schema) > 0:
+
+    # Determine if we're using Pydantic validation
+    using_pydantic = hasattr(linter, "pydantic") and linter.pydantic
+
+    # For Pydantic validation, always show the appropriate schema model
+    if using_pydantic:
+        click.secho("Schemas validated: ", fg="blue")
+        asset_type = linter.asset_type.capitalize() if linter.asset_type else "Item"
+        click.secho(f" stac-pydantic {asset_type} model")
+    # For JSONSchema validation or when schemas are available
+    elif len(linter.schema) > 0:
         click.secho("Schemas validated: ", fg="blue")
         for schema in linter.schema:
             click.secho(f" {schema}")
@@ -194,10 +211,15 @@ def cli_message(linter: Linter) -> None:
     multiple=True,
     help="HTTP header to include in the requests. Can be used multiple times.",
 )
+@click.option(
+    "--pydantic",
+    is_flag=True,
+    help="Use stac-pydantic for enhanced validation with Pydantic models.",
+)
 @click.command()
 @click.argument("file")
 @click.version_option(version=importlib.metadata.distribution("stac-check").version)
-def main(file, recursive, max_depth, assets, links, no_assets_urls, header):
+def main(file, recursive, max_depth, assets, links, no_assets_urls, header, pydantic):
     linter = Linter(
         file,
         assets=assets,
@@ -206,6 +228,7 @@ def main(file, recursive, max_depth, assets, links, no_assets_urls, header):
         max_depth=max_depth,
         assets_open_urls=not no_assets_urls,
         headers=dict(header),
+        pydantic=pydantic,
     )
     intro_message(linter)
     if recursive > 0:
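The checks added to `intro_message` and `cli_message` both hinge on `hasattr(linter, "pydantic") and linter.pydantic`. Below is a minimal sketch of the same selection logic, written with `getattr` and a default so the two steps collapse into one. It makes the following assumptions:

* The function names `validation_method` and `schema_summary` are hypothetical.
* The sketch returns the strings instead of printing them with `click.secho`.
* Any object with `pydantic`, `asset_type` and `schema` attributes stands in for a `Linter`.

```python
from types import SimpleNamespace
from typing import Any, List


def validation_method(linter: Any) -> str:
    """Label shown in the intro banner: "Pydantic" or "JSONSchema"."""
    # getattr(..., False) covers both "attribute missing" and "attribute falsy",
    # which is what `hasattr(linter, "pydantic") and linter.pydantic` checks.
    return "Pydantic" if getattr(linter, "pydantic", False) else "JSONSchema"


def schema_summary(linter: Any) -> List[str]:
    """Lines printed under "Schemas validated:" (returned instead of printed)."""
    if getattr(linter, "pydantic", False):
        # Fall back to "Item" when no asset type is known, as in the diff.
        asset_type = linter.asset_type.capitalize() if linter.asset_type else "Item"
        return ["Schemas validated: ", f" stac-pydantic {asset_type} model"]
    if linter.schema:
        return ["Schemas validated: "] + [f" {schema}" for schema in linter.schema]
    # The diff does not show what happens when no schemas are available.
    return []


if __name__ == "__main__":
    demo = SimpleNamespace(pydantic=True, asset_type="ITEM", schema=[])
    print(validation_method(demo))  # Pydantic
    print(schema_summary(demo))
```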

stac_check/lint.py

Lines changed: 136 additions & 6 deletions
@@ -3,7 +3,7 @@
 import json
 import os
 from dataclasses import dataclass, field
-from typing import Any, Dict, List, Optional, Union
+from typing import Any, Dict, List, Optional, Tuple, Union
 
 import requests
 import yaml
@@ -27,6 +27,7 @@ class Linter:
         max_depth (Optional[int], optional): An optional integer indicating the maximum depth to validate recursively. Defaults to None.
         assets_open_urls (bool): Whether to open assets URLs when validating assets. Defaults to True.
         headers (dict): HTTP headers to include in the requests.
+        pydantic (bool, optional): A boolean value indicating whether to use pydantic validation. Defaults to False.
 
     Attributes:
         data (dict): A dictionary representing the STAC JSON file.
@@ -125,14 +126,15 @@ def check_summaries(self) -> bool:
         Creates a message with best practices recommendations for the STAC JSON file.
     """
 
-    item: Union[str, dict] # url, file name, or dictionary
+    item: Union[str, Dict]
     config_file: Optional[str] = None
     assets: bool = False
     links: bool = False
     recursive: bool = False
     max_depth: Optional[int] = None
     assets_open_urls: bool = True
-    headers: dict = field(default_factory=dict)
+    headers: Dict = field(default_factory=dict)
+    pydantic: bool = False
 
     def __post_init__(self):
         self.data = self.load_data(self.item)
@@ -276,16 +278,21 @@ def validate_file(self, file: Union[str, dict]) -> Dict[str, Any]:
                 assets=self.assets,
                 assets_open_urls=self.assets_open_urls,
                 headers=self.headers,
+                pydantic=self.pydantic,
             )
             stac.run()
         elif isinstance(file, dict):
             stac = StacValidate(
-                assets_open_urls=self.assets_open_urls, headers=self.headers
+                assets_open_urls=self.assets_open_urls,
+                headers=self.headers,
+                pydantic=self.pydantic,
             )
             stac.validate_dict(file)
         else:
             raise ValueError("Input must be a file path or STAC dictionary.")
-        return stac.message[0]
+
+        message = stac.message[0]
+        return message
 
     def recursive_validation(self, file: Union[str, Dict[str, Any]]) -> str:
         """Recursively validate a STAC item or catalog file and its child items.
@@ -308,6 +315,7 @@ def recursive_validation(self, file: Union[str, Dict[str, Any]]) -> str:
                 max_depth=self.max_depth,
                 assets_open_urls=self.assets_open_urls,
                 headers=self.headers,
+                pydantic=self.pydantic,
             )
             stac.run()
         else:
@@ -316,6 +324,7 @@ def recursive_validation(self, file: Union[str, Dict[str, Any]]) -> str:
                 max_depth=self.max_depth,
                 assets_open_urls=self.assets_open_urls,
                 headers=self.headers,
+                pydantic=self.pydantic,
             )
             stac.validate_dict(file)
         return stac.message
@@ -456,10 +465,75 @@ def check_geometry_null(self) -> bool:
             bool: A boolean indicating whether the geometry property is null (True) or not (False).
         """
         if "geometry" in self.data:
-            return self.data["geometry"] is None
+            return self.data.get("geometry") is None
         else:
             return False
 
+    def check_bbox_matches_geometry(
+        self,
+    ) -> Union[bool, Tuple[bool, List[float], List[float], List[float]]]:
+        """Checks if the bbox of a STAC item matches its geometry.
+
+        This function verifies that the bounding box (bbox) accurately represents
+        the minimum bounding rectangle of the item's geometry. It only applies to
+        items with non-null geometry of type Polygon or MultiPolygon.
+
+        Returns:
+            Union[bool, Tuple[bool, List[float], List[float], List[float]]]:
+                - True if the bbox matches the geometry or if the check is not applicable
+                  (e.g., null geometry or non-polygon type).
+                - When there's a mismatch: a tuple containing (False, calculated_bbox, actual_bbox, differences)
+        """
+        # Skip check if geometry is null or bbox is not present
+        if (
+            "geometry" not in self.data
+            or self.data.get("geometry") is None
+            or "bbox" not in self.data
+            or self.data.get("bbox") is None
+        ):
+            return True
+
+        geometry = self.data.get("geometry")
+        bbox = self.data.get("bbox")
+
+        # Only process Polygon and MultiPolygon geometries
+        geom_type = geometry.get("type")
+        if geom_type not in ["Polygon", "MultiPolygon"]:
+            return True
+
+        # Extract coordinates based on geometry type
+        coordinates = []
+        if geom_type == "Polygon":
+            # For Polygon, use the exterior ring (first element)
+            if len(geometry.get("coordinates", [])) > 0:
+                coordinates = geometry.get("coordinates")[0]
+        elif geom_type == "MultiPolygon":
+            # For MultiPolygon, collect all coordinates from all polygons
+            for polygon in geometry.get("coordinates", []):
+                if len(polygon) > 0:
+                    coordinates.extend(polygon[0])
+
+        # If no valid coordinates, skip check
+        if not coordinates:
+            return True
+
+        # Calculate min/max from coordinates
+        lons = [coord[0] for coord in coordinates]
+        lats = [coord[1] for coord in coordinates]
+
+        calc_bbox = [min(lons), min(lats), max(lons), max(lats)]
+
+        # Allow for differences that would be invisible when rounded to 6 decimal places
+        # 1e-6 would be exactly at the 6th decimal place, so use 5e-7 to be just under that threshold
+        epsilon = 5e-7
+        differences = [abs(bbox[i] - calc_bbox[i]) for i in range(4)]
+
+        if any(diff > epsilon for diff in differences):
+            # Return False along with the calculated bbox, actual bbox, and the differences
+            return (False, calc_bbox, bbox, differences)
+
+        return True
+
     def check_searchable_identifiers(self) -> bool:
         """Checks if the identifiers of a STAC item are searchable, i.e.,
         they only contain lowercase letters, numbers, hyphens, and underscores.
@@ -622,6 +696,62 @@ def create_best_practices_dict(self) -> Dict:
             msg_1 = "All items should have a geometry field. STAC is not meant for non-spatial data"
             best_practices_dict["null_geometry"] = [msg_1]
 
+        # best practices - check if bbox matches geometry
+        bbox_check_result = self.check_bbox_matches_geometry()
+        bbox_mismatch = False
+
+        if isinstance(bbox_check_result, tuple):
+            bbox_mismatch = not bbox_check_result[0]
+        else:
+            bbox_mismatch = not bbox_check_result
+
+        if bbox_mismatch and config.get("check_bbox_geometry_match", True) == True:
+            if isinstance(bbox_check_result, tuple):
+                # Unpack the result
+                _, calc_bbox, actual_bbox, differences = bbox_check_result
+
+                # Format the bbox values for display
+                calc_bbox_str = ", ".join([f"{v:.6f}" for v in calc_bbox])
+                actual_bbox_str = ", ".join([f"{v:.6f}" for v in actual_bbox])
+
+                # Create a more detailed message about which coordinates differ
+                coordinate_labels = [
+                    "min longitude",
+                    "min latitude",
+                    "max longitude",
+                    "max latitude",
+                ]
+                mismatch_details = []
+
+                # Use the same epsilon threshold as in check_bbox_matches_geometry
+                epsilon = 5e-7
+
+                for i, (diff, label) in enumerate(zip(differences, coordinate_labels)):
+                    if diff > epsilon:
+                        mismatch_details.append(
+                            f"{label}: calculated={calc_bbox[i]:.6f}, actual={actual_bbox[i]:.6f}, diff={diff:.7f}"
+                        )
+
+                msg_1 = "The bbox field does not match the bounds of the geometry. The bbox should be the minimum bounding rectangle of the geometry."
+                msg_2 = f"Calculated bbox from geometry: [{calc_bbox_str}]"
+                msg_3 = f"Actual bbox in metadata: [{actual_bbox_str}]"
+
+                messages = [msg_1, msg_2, msg_3]
+                if mismatch_details:
+                    messages.append("Mismatched coordinates:")
+                    messages.extend(mismatch_details)
+                else:
+                    # If we got here but there are no visible differences at 6 decimal places,
+                    # add a note explaining that the differences are too small to matter
+                    messages.append(
+                        "Note: The differences are too small to be visible at 6 decimal places and can be ignored."
+                    )
+
+                best_practices_dict["bbox_geometry_mismatch"] = messages
+            else:
+                msg_1 = "The bbox field does not match the bounds of the geometry. The bbox should be the minimum bounding rectangle of the geometry."
+                best_practices_dict["bbox_geometry_mismatch"] = [msg_1]
+
         # check to see if there are too many links
         if (
             self.check_bloated_links(max_links=max_links)
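`check_bbox_matches_geometry` takes the minimum and maximum longitude and latitude of the exterior ring(s) and compares them with the item's `bbox`, using a 5e-7 tolerance. The stand-alone sketch below reproduces that comparison on a plain item dictionary, so it can be tried without installing stac-check. Its assumptions are:

* The function name `bbox_matches_geometry` is hypothetical.
* Like the code in the diff, it assumes a four-element 2D bbox.
* Like the code in the diff, it only looks at exterior rings.

```python
from typing import Any, Dict, List, Tuple, Union

# Tolerance used by the commit's check_bbox_matches_geometry.
EPSILON = 5e-7

BboxResult = Union[bool, Tuple[bool, List[float], List[float], List[float]]]


def bbox_matches_geometry(item: Dict[str, Any], epsilon: float = EPSILON) -> BboxResult:
    """Return True, or (False, calculated_bbox, actual_bbox, differences)."""
    geometry = item.get("geometry")
    bbox = item.get("bbox")
    if geometry is None or bbox is None:
        return True  # nothing to compare

    geom_type = geometry.get("type")
    coordinates = geometry.get("coordinates") or []
    if geom_type == "Polygon":
        # A Polygon's first ring is its exterior ring.
        exterior_rings = coordinates[:1]
    elif geom_type == "MultiPolygon":
        # Take the exterior ring of every non-empty polygon.
        exterior_rings = [polygon[0] for polygon in coordinates if polygon]
    else:
        return True  # only polygonal geometries are checked

    points = [point for ring in exterior_rings for point in ring]
    if not points:
        return True

    lons = [point[0] for point in points]
    lats = [point[1] for point in points]
    calc_bbox = [min(lons), min(lats), max(lons), max(lats)]

    # Assumes a 2D bbox: [min lon, min lat, max lon, max lat].
    differences = [abs(bbox[i] - calc_bbox[i]) for i in range(4)]
    if any(diff > epsilon for diff in differences):
        return (False, calc_bbox, list(bbox), differences)
    return True
```

The bbox `[0, 0, 1, 1.00001]` on a unit-square Polygon is reported as a mismatch, and the calculated bbox is `[0.0, 0.0, 1.0, 1.0]`. A difference of `4e-7` stays within the tolerance. Like the version in the diff, the sketch does not special-case bboxes that cross the antimeridian.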

stac_check/stac-check.config.yml

Lines changed: 2 additions & 0 deletions
@@ -15,6 +15,8 @@ linting:
   check_unlocated: true
   # best practices - recommend items have a geometry
   check_geometry: true
+  # best practices - check if bbox matches the bounds of the geometry
+  check_bbox_geometry_match: true
   # check to see if there are too many links
   bloated_links: true
   # best practices - check for bloated metadata in properties
