Posted in Gitter:
Hi. I encountered a very weird problem when using osm4scala which I cannot really explain :-(.
I have a PBF which has a node with id 5103977631
# osmium getid input.osm.pbf n5103977631 -o /tmp/output.osm.pbf
[======================================================================] 100%
# osmium cat /tmp/output.osm.pbf -f osm
<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6" generator="osmium/1.13.1">
<bounds minlat="-90" minlon="-180" maxlat="90" maxlon="180"/>
<node id="5103977631" version="1" timestamp="2017-09-13T19:57:39Z" uid="74746" changeset="52018502" lat="26.1914693" lon="-81.689915"/>
</osm>
When reading the same input PBF using osm4scala, I can perfectly read the same node:
spark.read.format("osm.pbf")
.load("/mnt/data/input.osm.pbf")
.filter("type == 0")
.select("id","type","latitude","longitude","nodes","relations","tags")
.filter("id == 5103977631")
+----------+----+-----------------+------------------+-----+---------+----+
|        id|type|         latitude|         longitude|nodes|relations|tags|
+----------+----+-----------------+------------------+-----+---------+----+
|5103977631|   0|       26.1914693|        -81.689915|   []|       []|  {}|
+----------+----+-----------------+------------------+-----+---------+----+
However, when I add the column "info" to the select clause, I get this:
spark.read.format("osm.pbf")
.load("/mnt/data/input.osm.pbf")
.filter("type == 0")
.select("id","type","latitude","longitude","nodes","relations","tags","info")
.filter("id == 5103977631")
+---+----+--------+---------+-----+---------+----+----+
| id|type|latitude|longitude|nodes|relations|tags|info|
+---+----+--------+---------+-----+---------+----+----+
+---+----+--------+---------+-----+---------+----+----+
=> suddenly, the node can no longer be found?
So you would assume something is wrong with the "info" column, right? Let's try removing the "tags" column and we keep the "info" column
Bug reason:
Every header is preceded by 4 bytes that give the length of the header.
If a split starts between these bytes, the next header (and therefore the next blob as well) is ignored.
Usually, because there are not many blocks (17.126 on the planet from Geofabrik), the chance of hitting this error is really low. But in PBF files with a lot of small blocks (say, a million), the chance of hitting it is high.
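To make the block-count argument concrete, here is a minimal sketch in plain Scala. It is not osm4scala's actual reader code. It assumes that a split start is only harmful when it falls strictly inside the 4-byte length prefix, and that split starts are spread uniformly over the file. The names SplitHazard, cutsLengthPrefix, hazardousSplits and hazardProbability are hypothetical and exist only for this illustration.

```scala
object SplitHazard {
  // Size of the length prefix that precedes every header in the file.
  val LengthPrefixBytes = 4

  /** True when `splitStart` falls strictly inside the length prefix
    * that begins at `blockOffset` (assumption: a split starting exactly
    * at `blockOffset` is harmless). */
  def cutsLengthPrefix(blockOffset: Long, splitStart: Long): Boolean =
    splitStart > blockOffset && splitStart < blockOffset + LengthPrefixBytes

  /** Split starts (multiples of `splitSize`, excluding 0) that land
    * inside the length prefix of some block. */
  def hazardousSplits(blockOffsets: Seq[Long], fileSize: Long, splitSize: Long): Seq[Long] = {
    val starts = splitSize until fileSize by splitSize
    starts.filter(s => blockOffsets.exists(cutsLengthPrefix(_, s)))
  }

  /** Rough chance that a single, uniformly placed split start cuts a
    * length prefix: 3 harmful positions per block out of `fileSize`. */
  def hazardProbability(numBlocks: Long, fileSize: Long): Double =
    math.min(1.0, (LengthPrefixBytes - 1).toDouble * numBlocks / fileSize)
}
```

For a toy file of ten 100-byte blocks, SplitHazard.hazardousSplits((0L until 1000L by 100L).toSeq, 1000L, 201L) reports the split starts 201, 402 and 603, while a split size of 250 reports none. For a fixed file size, hazardProbability grows linearly with the number of blocks, which matches the observation that files with many small blocks hit the bug far more often.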
With the "tags" column removed and the "info" column kept, the node can be found again??? Huh! :-D
Some environment details:
Does anybody have an idea what I'm doing wrong?