Skip to content

Add GeoJson style map to counter style duplication #1046

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

sgvandijk
Copy link
Contributor

Often when I use a GeoJson, feature styling is based on a categorical/ordinal property with only a small number of distinct values. Currently when supplying a style function, a style dictionary is added to each feature, which possibly results in a lot of repetition and unnecessarily large output, especially when the GeoJson contains many more features than number of styles.

This PR introduces the GeoJsonStyleMap class to help combat that: it takes a dictionary with styles and creates a JavaScript style function that applies the styles on the client side, rather than adding the styles to each feature beforehand.

I still want to add some better input handling, tests, proper docstring, etc, but already wanted to put this forward to see if the approach is acceptable and get any feedback early :) it was inspired by the GeoJsonToolTip implementation.

@jtbaker
Copy link
Contributor

jtbaker commented Dec 29, 2018

@sgvandijk thanks for putting together this PR. The bloated styling embedded in GeoJson is something I've wanted to address for a while too, but hadn't thought of a good way to approach it - I was initially trying to think of a way to directly translate a python function -> JS function.

I think your style dictionary mapper approach is really good. I have a couple of additional ideas on possible functionality.

  1. Reordering the arguments so that they read left-to-right clearly in a logical manner so it's quite clear which property is being queried, the condition being set, etc - i.e. property_name="____", style_map={__:__}

  2. Adding an op or operator kwarg that takes string representations of bitwise operators, to allow us do to some additional comparative logic with quantitative values. Default to op="==".
    So perhaps take this line to something like:

if (val {{this.op}} {{key}}) { return {{this.style_map[key]}}; }
  1. Inherit 'default' style properties to each style mapper from default. To remove redundancy for properties you'd like to persist for every feature in the map - like weight, dash-array, etc. So update keys for the default mapper for each style where the {property} {operator value} {key value} returns {True/true(JS)}. Something like:
    for key in style_map:
        copy = default.copy()
        copy.update(style_map[key])
        style_map[key] = copy

Maybe wrap this ^ into a function so that the copy variable doesn't persist out into the global scope.

how I could envision it looking:

population_density_values={
    40: {'fillColor': 'lightblue'},
    100: {'fillColor': 'blue'},
    200: {'fillColor': 'darkblue'}
}

default = {'fillColor':'white', 'color': 'black', 'weight': 2}

my_style_mapper = GeoJsonStyleMap(property_name='density', op='>=', style_map=population_density_values, default=default)

Python will convert the numeric dictionary keys into strings, but JavaScript will still compare numeric strings to Number values fine.
edit:
Looks like python dictionary preserves the keys as Integer/booleans, etc. But json.dumps() writes the keys out with string quotes.

Let me know what you think about these ideas. Thanks for your contribution to the project!

@sgvandijk
Copy link
Contributor Author

Thanks for the feedback @jtbaker! I'll address your points:

  1. Reordering the arguments so that they read left-to-right clearly

I also didn't like this, but it was because I wanted to make it so that a feature's ID is used by default to determine the value, unless a property is given, so the default value is None, so it couldn't come before the style map. This was when I followed the examples in the notebook, where the ID is used, but now I realize that using the ID results in an if and style dictionary in the output for every single feature, which is exactly when this map shouldn't be used :) I will change it to make property_name obligatory and then can indeed put it first.

  1. Adding an op or operator kwarg

I'm hesitant about this, at least for this PR. I feel this would allow the user to break things too easily by providing some operator that is not valid (in JS), or makes the output more difficult to predict; for instance if > or similar comparison is used, the output highly depends on the ordering of the keys in the provided dictionary. For instance, your example already wouldn't do what you expect I think :) because the output will be like:

if (value >= 40) { return {...}; }
if (value >= 100) { return {...}; }
if (value >= 200) { return {...}; }

So if the value is greater or equal than 40, the other ifs will never be checked.

At the moment it is always possible to encode the more complex logic in an additional property (like I did in the notebook) and use equality on this property. I like it that that way it's all explicit on the Python side. I also have an idea how to more easily support different common style functions, by using different objects, I think I will build that out as an example here.

  1. Inherit 'default' style properties to each style mapper from default

That's a good idea, this can be easily done on the JS side, I will add that!

Regarding your comment about json.dumps: it doesn't add quotes around integers so it's all good (or boolean values, the original reason I also use dumps for the key):

In [2]: 'an integer: {}'.format(json.dumps(4))
Out[2]: 'an integer: 4'

@sgvandijk
Copy link
Contributor Author

I have broken up the class into a base and derived class, with another derived class, GeoJsonStylePropertyFunction, that is used in the original case when a plain function is passed in, and can be used separately to use the default style functionality to minimise duplication.

I thought about introducing more examples, such as a binning function that could be used for defining styles for ranges of values to implement @jtbaker's example, but decided that would be too much feature creep for this PR.

I have also added docstrings and basic tests, please have a look and let me know any further feedback!

@sgvandijk
Copy link
Contributor Author

Updated to fix merge conflicts. Possibly something for @talbertc-usgs and @Conengmo to look at too, because there was a conflict with 75a44c3, which has new features, cleanups and stylistic changes squashed together into one commit. Tests and notebooks seem fine, but it's more difficult to see what all changes are for and whether I've resolved conflicts correctly.

@Conengmo
Copy link
Member

Conengmo commented Jan 12, 2019

Hi @sgvandijk, thanks for making this PR! I have some comments I hope we can discuss.

which possibly results in a lot of repetition and unnecessarily large output

You're right, the current way is not very smart. But it's not that big of a problem I would say. I think usability is more important than filesize, so I'm critical of adding new ways of working, new syntax. Preferable there is only one way to style a GeoJson object. I think the current way a user provides a (lambda) style function is pretty straight forward, so I wouldn't want to make that more complicated without good reason.

What would be awesome if we could make embedding styling in the html more efficient without placing the burden of that on the user. Can we run the function a users passes to the style_function argument and make a better way to embed the styling? Something like:

  • Run style_function for each feature, get a list of style dictionaries.
  • Combine identical dictionaries.
  • Also create a mapping from feature id to these style dictionaries.
  • Store the style dictionaries and the mapping in Javascript.
  • Set the style for each feature using the mapping.

So in short I'm certainly not against this, but I want to keep it simple for the user (and ourselves as maintainers). I hope to hear your thoughts on this.

@Conengmo Conengmo added the in discussion This PR or issue is being discussed label Jan 12, 2019
@sgvandijk
Copy link
Contributor Author

Hi @Conengmo, thanks for the input, I think those are indeed good points. I also didn't want to take away the very convenient option to just pass in a lambda, this PR leaves that intact so there is no added complexity for those who don't want it. My aim is only to give more options to those who do care about the output size, like in our case where we regularly create and store maps from GeoJson with 1000s of features with a range of style properties set. This really adds to the size, into the megabytes; in the very simple US states example in the GeoJson example notebook, using GeoJsonStyleMap already removes more than 3KB.

I think your idea of creating a map automatically would be pretty cool. The work in this PR supports that as it captures some of those steps already. I would not add that to this PR though.

As a final note, part of my motivation for this work is also to be able to reinstate the ability to load GeoJson from an external source (in a separate PR), rather than having to embed it. With GeoJsonStyleMap, or another implementation of its base, this is possible again, by having the styling completely separated from the data.

Related to that, having a separate class (hierarchy) responsible for styling, away from the main GeoJson class, IMHO could make maintenance simpler too. Except if it results into a whole bunch of different classes, but those could be maintained in separate plugins perhaps.

Let me know what you think!

@Conengmo
Copy link
Member

part of my motivation for this work is also to be able to reinstate the ability to load GeoJson from an external source (in a separate PR), rather than having to embed it

That's a more important goal than skimming of style declaration size IMO. They are related though; if we can keep the style out of the geojson data, we don't need to embed the geojson data. (Except for in Jupyter notebooks.)

having a separate class (hierarchy) responsible for styling (...) could make maintenance simpler too. Except if it results into a whole bunch of different classes

I think your idea of creating a map automatically would be pretty cool. (...) I would not add that to this PR though.

I feel that the way the PR is now the added complexity doesn't outweigh the benefits. For me that's mainly due to 1. the new styling syntax and 2. the added classes. I think changing or expanding the syntax is a no-go for me unless really necessary or warranted.

If we agree that the goal is to have html files without embedded geojson data, than I'd rather regard this problem as a whole:

  1. Store styling separately from geojson data.
  2. Add option to have external geojson data.
  3. Reduce styling declaration size. (possibly during 1)

So I would like to invite you to think about how we can achieve 1 (and optionally 3 at the same time) without changing the syntax for the users. I appreciate you taking the time to help discuss this topic, getting this right is maybe not as straight forward as a simple plugin but will be rewarding I think.

@sgvandijk
Copy link
Contributor Author

Thanks again for the feedback. I definitely see your point about aiming to minimise complexity. However. I'm not sure I really see the difference between this and the new GeoJsonToolTip for instance, where for both you can still pass in a simple lambda/string, or you can use the bit more complex classes to achieve more complex things. Maybe because the added tool tip complexity adds more visual features? In our use cases, where we use folium to do more than show a one-off map for a single user in Jupyter, significantly reducing data bloat is at least as relevant as more nice visual features. But if that's not a priority for this project for now, that's fine. Though if you agree being able to avoid embedding is a priority, I again think you can't do without these extensions.

Because I don't think you will be able to generate HTML files without embedding while still be able to define styling without changing the syntax: if you restrict the user to only pass in a function you will always need to fetch the data in Python to create the right function in JS, which would defeat the purposes of not having the data embedded; if you only can pass in styling that is a function of the data, they cannot be separate and you won't achieve 1. Except maybe if you a) transpile the Python function to JS which is a nonstarter I'd think b) define a subset of functions that can generate the JS function in a deterministic way without having to see the data, which I've gone for, for which I don't see how you can get around having to extend the syntax.

I'd be very happy to be proven wrong though if you can think of a smart way to achieve it still :)

@Conengmo
Copy link
Member

Conengmo commented Jan 21, 2019

significantly reducing data bloat

Definitely a goal! Not first priority though, but important, let’s do this.

Your comment is very helpful. Your premise is that not embedding the data means not seeing it at all. I don’t agree with that, I think we should always see the data, even when we don’t embed it in the html. It’s possible, load the data in Python but don’t put it in the html.

The reason is that we should do all our parsing and error checking in Python, that’s where our users are and that’s where we can provide meaningful error messages. A lot of the issues here and on other channels are from users seeing blank maps, not knowing how to debug the JavaScript. I want to prevent that as much as possible, also to reduce the strain on the team having to handle the issues. That means checking whether a field name provided by a user actually matches the geojson data. We’ve not done that fully in Choropleth for example and you can see a lot of questions about that on Stackoverflow.

So what I’m trying to say is that I think it’s a given we should see the GeoJson data in Python, even when not embedding in the html.

It may be more efficient not to load the data, but I think a more important goal is to have any rendered html be error free, so all errors are shown to the user in Python.

I hope to hear your thoughts on this! I’ve also worked on a demonstrator which I’ll share later, might be interesting to discuss it.

@sgvandijk
Copy link
Contributor Author

a more important goal is to have any rendered html be error free, so all errors are shown to the user in Python

Fair enough, that's a good reason yeah, and can definitely understand your point as a maintainer to make it impossible to use things incorrectly. Our priorities are a bit different, where we'd like the instance running the Python app generating the map not having to be granted access to the datasource, or where generating the GeoJson is quite complex, so we want to present the map as fast as possible with other info and have the GeoJson loaded separately. I will have a look at putting this kind of things into a separate plugin, ExternalGeoJson or something like that, so it's separate from the core, and with warnings that if you get issues on the JS side you should use the standard option, or you're on your own ;) would you agree with that approach?

@Conengmo
Copy link
Member

Thanks for understanding. Can I ask what your use case is exactly? Interesting to know what folium is used for in practice.

It would be cool if we can accomodate more advanced usage as well. But I wonder, if speed is really important and you don’t have your data in Python, why not use Leaflet natively in JavaScript. You of course have your reasons so I’m not judging.

Interesting idea to create a sort of geojson class for experts without error checking, we could consider that. As a plugin, or possibly in the core.

Alternatively, we could consider adding a sort of “don’t do any checks” flag for the regular geojson class, after we applied the style embedding and data embedding changes. But whether that’s possible depends on how the code turns out.

@ocefpaf ocefpaf force-pushed the master branch 3 times, most recently from 53546b8 to 9f2299a Compare February 26, 2019 19:49
@Conengmo
Copy link
Member

Closing this since we tackled the style duplication and enabling not embedding geojson data in #1058.

Note that we chose to always load the geojson data in Python for error checking. If your use case requires speed, I reckon you can put together a custom class by inheriting from MacroElement. If you got something working it would be nice if you could share it by making an example notebook for in our gallery :)

@Conengmo Conengmo closed this Mar 24, 2019
@sgvandijk
Copy link
Contributor Author

I saw the work on on the deduplication, looks nice 👍 :)

If your use case requires speed, I reckon you can put together a custom class by inheriting from MacroElement. If you got something working it would be nice if you could share it by making an example notebook for in our gallery :)

Not meaning to drag this out further :) but this PR did do just that, it has a class based on MacroElement that gave some support for making custom style functions. That could be used for #1112 too for instance.

@Conengmo Conengmo removed the in discussion This PR or issue is being discussed label Apr 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants