Skip to content

Proposal: Add path column to categories table as ltree type to make tree traversing easier #1122

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Aug 30, 2018
Merged

Proposal: Add path column to categories table as ltree type to make tree traversing easier #1122

merged 22 commits into from
Aug 30, 2018

Conversation

erewok
Copy link
Contributor

@erewok erewok commented Oct 11, 2017

Description

This is a proposal inspired by tickets (1, and 2) with the goal of eventually revealing more information about categories on pages where category information is available. It is also motivated by the current application code surrounding categories, which contains lots of SQL with operations like this: split_part(c2.slug, '::', 1).

This PR is intended as a suggestion for what to do with the categories table.

Following are changes included here:

  • Adds a path column as a Postgresql ltree type to the categories table in order to more easily query trees of categories and subcategories. This allows navigating trees of arbitrary depth and potentially more ergonomic path-finding queries.

  • Adds a parent_categories method to Category, which returns parent categories in order of traversal from root to this node.

Trade-offs to this Approach

Pros

  • Easily handle trees of arbitrary depth, ex: "category::sub-category::sub-catgeory-A::sub-category-A-1".
  • Makes queries to find a whole tree of nodes simpler (and possibly quicker?). Ex:
    select slug, path from categories where path <@ 'root.web_programming';
  • Makes finding all root-level nodes simpler (assuming one is cool with the syntax). Ex:
    select slug, path from categories where path ~ 'root.*{1}';
  • Makes finding the direct path from the root node to a particular node easier (see parent_categories method). Ex:
    select slug, path from categories where path @> 'root.web_programming.http_client' order by path;
  • Invisible to current application code: nothing needs to be updated to handle the path column because it happens via database triggers. (This could also be a con depending on one's perspective.)

Cons

  • Postgresql only (harder to switch databases in the future)
  • Makes certain characters in slugs disallowed: [',",-]. Path labels must consist of "A-Za-z0-9_" separated by "." We swap - for _ and :: for . in the included Postgresql procedure.
  • This path column is nowhere in application code, so this is all hand-written SQL (but the current solution is doing that as well).
  • Error messages are opaque if you happen to have a bad character in a slug:
cargo_registry=# update categories set slug = 'test-new::bla$bla-::parent::child' where slug = 'test-new::parent::child';
ERROR:  syntax error at position 17
CONTEXT:  PL/pgSQL function set_category_path_to_slug() line 3 at assignment

Notes

  • Needs a test for the parent_categories method.

@erewok erewok changed the title Proposal: Add path column as ltree type to make tree traversing easier Proposal: Add path column to categories table as ltree type to make tree traversing easier Oct 11, 2017
src/category.rs Outdated
@@ -161,6 +162,24 @@ impl Category {
).bind::<Text, _>(&self.category)
.load(conn)
}

pub fn parent_categories(&self, conn: &PgConnection) -> QueryResult<Vec<Category>> {
Copy link
Contributor

@vignesh-sankaran vignesh-sankaran Oct 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a doc comment here outlining what this function does?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know I thought I had done that but I forgot about the three-slashes thing. Sorry about that.

@kivikakk
Copy link
Contributor

I've now adapted #1130 into this PR! Parent category traversal is now easier (and accounts for arbitrary category tree depth).

@carols10cents
Copy link
Member

Sorry this has taken me so long to look at-- my first impression is WHERE HAS LTREE BEEN ALL MY LIFE?!!?!? Like seriously, is this new? If not, I am irrationally angry at everyone i've interacted with who has known about this feature and not told me about it. I'm now going to start telling people as soon as I meet them-- "Hi, I'm Carol, have you heard about Postgres' ltree?"

Digging into the code and trying this out now :)

@erewok
Copy link
Contributor Author

erewok commented Oct 18, 2017

@carols10cents ltree has been part of Postgresql since at least 2004, I think. I only found out about it in the last couple of years and I have been surprised how rarely people recommend it or even know about it because it seems like a surprisingly useful data type. My theory is that Postgresql offers so many different options that it's difficult to actually know about all of them. For instance, I'm only aware of a handful of the myriad index types and I feel like I don't hear mention of those either.

The syntax, perhaps, takes a bit to get used to, but I find it pretty ergonomic myself:

@> some_path: means a path that's greater than or equal to some_path.
<@ some_path: means a path that's less than or equal to some_path.
~ some_path: looks sort of like a regex comparison, usually?

In this PR, I added root. to all the constructed paths, and this makes path-finding a whole lot easier (as long as no one ever adds a new category with a slug called root!) because you can find all top-level categories by simply querying all categories where the path is a direct descendent of root.. That looks like this:

select * from categories where path ~ 'root.*{1}';

Copy link
Member

@carols10cents carols10cents left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, and it's working great too! The migration worked, adding crates to categories worked, adding new categories to categories.toml and restarting the server worked!

I also like how this easily enabled the parent category links!!

The only reason I'm not going to merge this right now is that when I generated the schema after running the migration, by running diesel print-schema --with-docs > src/schema.rs, I couldn't then use that schema because diesel doesn't define an Ltree type (for the type that it happily prints out).

I've filed a bug over in diesel's repo: diesel-rs/diesel#1262 and we'll see what they say :)

Thank you for this work, both of you!!!!


-- Create some indices that allow us to use GIST operators: '@>', etc
CREATE INDEX path_gist_categories_idx ON categories USING GIST(path);
CREATE INDEX path_categories_idx ON categories USING btree(path);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to have both a gist and a btree index? The postgres docs say:

F.21.3. Indexes
ltree supports several types of indexes that can speed up the indicated operators:

  • B-tree index over ltree: <, <=, =, >=, >
  • GiST index over ltree: <, <=, =, >=, >, @>, <@, @, ~, ?

which looks like gist indexes take care of the same operators that btree indexes take care of?

All of this is probably moot anyway since the categories table is so smalll :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I've just seen everyone do it that way, so I figured it was a best practice, if not necessary.

@@ -1,8 +1,11 @@
{{ title category.category ' - Categories' }}

<div id='crates-heading'>
{{svg-jar "crate"}}
<h1>{{ category.category }}</h1>
{{#link-to "categories"}}{{svg-jar "crate"}}{{/link-to}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat easter-egg :)

@erewok
Copy link
Contributor Author

erewok commented Oct 18, 2017

thanks @carols10cents and thanks to @kivikakk for working on the frontend.

One thing I considered was rewriting the other queries that do split_part(some_value, "::", 1) in order for them to be all be consistent. I didn't do that because I wasn't sure if there would be buy-in for these changes. I knew there would be an issue with diesel, and I didn't want to rock the boat too much, even though I felt like this would be a useful direction to go in.

@sgrif
Copy link
Contributor

sgrif commented Oct 18, 2017

You can just delete the path column from schema.rs, or create struct Ltree; somewhere and use it at the top of the table declaration.

@kivikakk
Copy link
Contributor

kivikakk commented Oct 19, 2017

It looks like print-schema won't automatically do what we want with extensions (going by this and #1133).

I've created a super basic diesel_ltree crate with an Ltree struct that does what we want (i.e. almost nothing right now), and manually added it to schema.rs in b7c49f8.

(Feel free to back this commit out if we don't want this after all.)

@kivikakk
Copy link
Contributor

@erewok

One thing I considered was rewriting the other queries that do split_part(some_value, "::", 1) in order for them to be all be consistent. I didn't do that because I wasn't sure if there would be buy-in for these changes. I knew there would be an issue with diesel, and I didn't want to rock the boat too much, even though I felt like this would be a useful direction to go in.

diesel_ltree now supports the common operators and functions on ltrees/lqueries, so we could indeed redo those idiomatically from here on out.

@sgrif
Copy link
Contributor

sgrif commented Oct 22, 2017

@kivikakk diesel_ltree needs to use http://docs.diesel.rs/diesel/pg/struct.PgMetadataLookup.html#method.lookup_type for its HasSqlType impls. Types from extensions do not have stable OIDs. I suspect that the only reason this isn't breaking things is because you aren't currently dealing with bind params of those types

@kivikakk
Copy link
Contributor

Using latest ltree which fixes this. 👍 Thanks sgrif.

@kivikakk
Copy link
Contributor

@carols10cents This is brought up to speed with master; would you mind taking another look? Just got an accessibility violation to clear up.

src/category.rs Outdated
use diesel::sql_types::Text;

sql_query(
"SELECT c.id, c.category, c.slug, c.description, \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this whole thing into a .sql file and use include_str!, add some more newlines and indentation? The whole thing is very hard to parse. It's not obvious where the call to COALESCE ends for example

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Done.

src/category.rs Outdated
#[test]
fn category_parent_categories_includes_path_to_node_with_count() {
let conn = pg_connection();
conn.batch_execute(
Copy link
Contributor

@sgrif sgrif Jan 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use Diesel's public API for this? (I need to update the other tests). You can write this as

insert_into(categories)
    .values(vec![
        (category.eq("Cat 1"), slug.eq("cat1"), crates_cnt.eq(1)),
        ...
    ])

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done this (and fixed up the other tests too).

src/lib.rs Outdated
@@ -5,7 +5,7 @@
//! [krate](krate/index.html), [user](user/index.html) and [version](version/index.html) modules.
#![deny(warnings)]
#![deny(missing_debug_implementations, missing_copy_implementations)]
#![recursion_limit = "128"]
#![recursion_limit = "256"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this necessary?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without:

error: recursion limit reached while expanding the macro `table_body`
   --> src/schema.rs:223:1
    |
223 | / table! {
224 | |     use diesel_full_text_search::{TsVector as Tsvector};
225 | |     use diesel_ltree::Ltree;
226 | |     use diesel::sql_types::*;
...   |
322 | |     }
323 | | }
    | |_^
    |
    = help: consider adding a `#![recursion_limit="256"]` attribute to your crate
    = note: this error originates in a macro outside of the current crate

error: Could not compile `cargo-registry`.

@kivikakk
Copy link
Contributor

kivikakk commented Jan 25, 2018

error[E0063]: missing field `parent_categories` in initializer of `category::EncodableCategoryWithSubcategories`
   --> src/category.rs:526:19
    |
526 |         let cat = EncodableCategoryWithSubcategories {
    |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ missing `parent_categories`

There is no line 526 of category.rs? The same similar line where it actually exists does have a parent_categories? The line as highlighted has not existed in any parent of the current branch? Travis what are you doing

I'll try amend committing to get a different SHA and force push.

git fetch wasn't updating the right remote nEVER MIND

@erewok
Copy link
Contributor Author

erewok commented Jan 25, 2018

nice work @kivikakk. I was actually wondering about this PR the other day. Thank you for continuing to push it forward.

@kivikakk
Copy link
Contributor

kivikakk commented Feb 1, 2018

Brought this back up to date.

WHERE split_part(c.slug, '::', 1) = c.slug
GROUP BY c.id
sql_query(format!(
"SELECT c.id, c.category, c.slug, c.description, \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you feel about moving this into a separate file and include_str!ing it. We can also change this to call .bind rather than using string interpolation (other than the order by)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@kivikakk
Copy link
Contributor

Thank you for the ✅, sgrif!

Let me know if there's anything else to be done to get this merged, if we're still happy with it.

@carols10cents
Copy link
Member

Thank you @erewok and @kivikakk for keeping this up to date, and sorry for the delay!

I tested this out and it looks like it's working great in the UI!

Buuuuut I noticed in the schema.rs that path was added as a column on the crates table, but if anything, it should be added to the categories table... but if I add it to the categories table by running diesel print-schema --with-docs > src/schema.rs and then manually adding use diesel_ltree::Ltree; within the table! call for categories (i.e. this commit) I then get errors, a bunch like this:

error[E0412]: cannot find type `Int4` in this scope
  --> src/schema.rs:83:15
   |
83 |         id -> Int4,
   |               ^^^^ did you mean `Into`?

So then I added use diesel::sql_types::*; in this commit to be more like the crates table!, which has custom types and works.

That gets rid of ~15 errors, but leaves me with 4 that look like this:

error[E0277]: the trait bound `(i32, std::string::String, std::string::String, std::string::String, i32, chrono::NaiveDateTime): diesel::Queryable<(diesel::sql_types::Integer, diesel::sql_types::Text, diesel::sql_types::Text, diesel::sql_types::Text, diesel::sql_types::Integer, diesel::sql_types::Timestamp, diesel_ltree::Ltree), _>` is not satisfied
  --> src/controllers/category.rs:42:10
   |
42 |         .first::<Category>(&*conn)?;
   |          ^^^^^ the trait `diesel::Queryable<(diesel::sql_types::Integer, diesel::sql_types::Text, diesel::sql_types::Text, diesel::sql_types::Text, diesel::sql_types::Integer, diesel::sql_types::Timestamp, diesel_ltree::Ltree), _>` is not implemented for `(i32, std::string::String, std::string::String, std::string::String, i32, chrono::NaiveDateTime)`

which, ok, the Category struct that's Queryable doesn't have the path column specified.

I tried adding path as type LTree in this commit, but then I get these errors:

error[E0277]: the trait bound `diesel_ltree::Ltree: diesel::Queryable<diesel_ltree::Ltree, diesel::pg::Pg>` is not satisfied
  --> src/controllers/category.rs:42:10
   |
42 |         .first::<Category>(&*conn)?;
   |          ^^^^^ the trait `diesel::Queryable<diesel_ltree::Ltree, diesel::pg::Pg>` is not implemented for `diesel_ltree::Ltree`
   |

The tests in diesel_ltree use Queryable with the path column as type String, so I tried that too in this commit, but then I get these errors:

error[E0277]: the trait bound `*const str: diesel::deserialize::FromSql<diesel_ltree::Ltree, diesel::pg::Pg>` is not satisfied
  --> src/controllers/category.rs:42:10
   |
42 |         .first::<Category>(&*conn)?;
   |          ^^^^^ the trait `diesel::deserialize::FromSql<diesel_ltree::Ltree, diesel::pg::Pg>` is not implemented for `*const str`
   |
   = help: the following implementations were found:
             <*const [u8] as diesel::deserialize::FromSql<diesel::sql_types::Binary, DB>>
             <*const str as diesel::deserialize::FromSql<diesel::sql_types::Text, DB>>

soooo @sgrif @kivikakk am I using diesel or diesel_ltree wrong? or is there something incompatible/missing here?

erewok added 2 commits June 20, 2018 20:04
* Add migration for ltree path column on categories table

* Add "ancestors" property to categories/:category_id endpoint
kivikakk and others added 5 commits June 20, 2018 23:15
Not sure why this is needed exactly, but the crates table does this sooo
This is necessary because:
- The path column is of type Ltree
- To get text out of an ltree column, it has to be selected using the
postgres function ltree2text
- We can't modify the select statement generated by the table! macro to
select columns with type ltree differently
- So we'd have to explicitly select columns for all Category queries
anyway
- But we never use categories.path, only query against it, so save all
this trouble and just never select it in the same way Crate never
selects the fulltext search columns.

There might be better ways to do this, or at the very least a better way
of defining the type aliases, but this is what I got working.
@carols10cents
Copy link
Member

carols10cents commented Jun 26, 2018

Ok, I think I've successfully figured this one out. It worked when the path column was accidentally on the crates table because the crates code was already explicitly selecting fewer than all columns; when the path column was added to the category table in the schema instead, the same don't-select-all-columns code needed to happen for category too.

@sgrif if you have a minute, could you review the "Explicitly never select categories.path column" commit of mine and make sure i did this right?

@kivikakk
Copy link
Contributor

It'd be nice if we could land this!

@carols10cents
Copy link
Member

Yes, yes it would. YOLO

bors: r+

@carols10cents
Copy link
Member

hMMMM why didn't that work....

bors: r+

bors-voyager bot added a commit that referenced this pull request Aug 29, 2018
1122: Proposal: Add `path` column to categories table as `ltree` type to make tree traversing easier r=carols10cents a=erewok

## Description

This is a proposal inspired by tickets ([1](#1093), and [2](#721)) with the goal of eventually revealing more information about categories on pages where category information is available. It is also motivated by the current application code surrounding categories, which contains lots of SQL with operations like this: `split_part(c2.slug, '::', 1)`.

This PR is intended as a suggestion for what to do with the `categories` table.

Following are changes included here:

- Adds a `path` column as a Postgresql `ltree` type to the `categories` table in order to more easily query trees of categories and subcategories. This allows navigating trees of arbitrary depth and potentially more ergonomic path-finding queries.

- Adds a `parent_categories` method to `Category`, which returns parent categories _in order of traversal_ from root to this node.

## Trade-offs to this Approach

### Pros

- Easily handle trees of arbitrary depth, ex: "category::sub-category::sub-catgeory-A::sub-category-A-1".
- Makes queries to find a whole tree of nodes simpler (and possibly quicker?). Ex:
  `select slug, path from categories where path <@ 'root.web_programming';`
- Makes finding all root-level nodes simpler (assuming one is cool with the syntax). Ex:
  `select slug, path from categories where path ~ 'root.*{1}';`
- Makes finding the direct path from the root node to a particular node easier (see `parent_categories` method). Ex:
  `select slug, path from categories where path @> 'root.web_programming.http_client' order by path;`
- Invisible to current application code: nothing needs to be updated to handle the `path` column because it happens via database triggers. (This could also be a _con_ depending on one's perspective.)

### Cons

- Postgresql only (harder to switch databases in the future)
- Makes certain characters in slugs disallowed: `[',",-]`. Path labels must consist of "A-Za-z0-9_" separated by "." We swap `-` for `_` and `::` for `.` in the included Postgresql procedure.
- This `path` column is nowhere in application code, so this is all hand-written SQL (but the current solution is doing that as well).
- Error messages are opaque if you happen to have a bad character in a slug:
```
cargo_registry=# update categories set slug = 'test-new::bla$bla-::parent::child' where slug = 'test-new::parent::child';
ERROR:  syntax error at position 17
CONTEXT:  PL/pgSQL function set_category_path_to_slug() line 3 at assignment
```

## Notes

- [x] Needs a test for the `parent_categories` method.

Co-authored-by: Erik Aker <[email protected]>
Co-authored-by: Ashe Connor <[email protected]>
Co-authored-by: Carol (Nichols || Goulding) <[email protected]>
@carols10cents
Copy link
Member

there we go. good bot!

@bors-voyager
Copy link
Contributor

bors-voyager bot commented Aug 29, 2018

Build failed

@carols10cents
Copy link
Member

hmmmm looks like rustfmt had differences and clippy failed to install? i get different results when running rustfmt locally, fun. we'll see if this fixes it

@carols10cents
Copy link
Member

ok let's try this one more time!

bors: r+

bors-voyager bot added a commit that referenced this pull request Aug 30, 2018
1122: Proposal: Add `path` column to categories table as `ltree` type to make tree traversing easier r=carols10cents a=erewok

## Description

This is a proposal inspired by tickets ([1](#1093), and [2](#721)) with the goal of eventually revealing more information about categories on pages where category information is available. It is also motivated by the current application code surrounding categories, which contains lots of SQL with operations like this: `split_part(c2.slug, '::', 1)`.

This PR is intended as a suggestion for what to do with the `categories` table.

Following are changes included here:

- Adds a `path` column as a Postgresql `ltree` type to the `categories` table in order to more easily query trees of categories and subcategories. This allows navigating trees of arbitrary depth and potentially more ergonomic path-finding queries.

- Adds a `parent_categories` method to `Category`, which returns parent categories _in order of traversal_ from root to this node.

## Trade-offs to this Approach

### Pros

- Easily handle trees of arbitrary depth, ex: "category::sub-category::sub-catgeory-A::sub-category-A-1".
- Makes queries to find a whole tree of nodes simpler (and possibly quicker?). Ex:
  `select slug, path from categories where path <@ 'root.web_programming';`
- Makes finding all root-level nodes simpler (assuming one is cool with the syntax). Ex:
  `select slug, path from categories where path ~ 'root.*{1}';`
- Makes finding the direct path from the root node to a particular node easier (see `parent_categories` method). Ex:
  `select slug, path from categories where path @> 'root.web_programming.http_client' order by path;`
- Invisible to current application code: nothing needs to be updated to handle the `path` column because it happens via database triggers. (This could also be a _con_ depending on one's perspective.)

### Cons

- Postgresql only (harder to switch databases in the future)
- Makes certain characters in slugs disallowed: `[',",-]`. Path labels must consist of "A-Za-z0-9_" separated by "." We swap `-` for `_` and `::` for `.` in the included Postgresql procedure.
- This `path` column is nowhere in application code, so this is all hand-written SQL (but the current solution is doing that as well).
- Error messages are opaque if you happen to have a bad character in a slug:
```
cargo_registry=# update categories set slug = 'test-new::bla$bla-::parent::child' where slug = 'test-new::parent::child';
ERROR:  syntax error at position 17
CONTEXT:  PL/pgSQL function set_category_path_to_slug() line 3 at assignment
```

## Notes

- [x] Needs a test for the `parent_categories` method.

Co-authored-by: Erik Aker <[email protected]>
Co-authored-by: Ashe Connor <[email protected]>
Co-authored-by: Carol (Nichols || Goulding) <[email protected]>
@bors-voyager
Copy link
Contributor

bors-voyager bot commented Aug 30, 2018

Build succeeded

@bors-voyager bors-voyager bot merged commit 81dff1e into rust-lang:master Aug 30, 2018
@kivikakk kivikakk deleted the category_tree_experimental branch August 30, 2018 23:21
@kivikakk
Copy link
Contributor

Yay!! ✨

@kivikakk
Copy link
Contributor

@carols10cents Thank you so much for pushing this through!

@sgrif
Copy link
Contributor

sgrif commented Aug 30, 2018

🎉

@carols10cents
Copy link
Member

carols10cents commented Aug 31, 2018 via email

@erewok
Copy link
Contributor Author

erewok commented Sep 4, 2018

Wow, nice work all. I completely forgot about this PR (sorry about that) and found it just now in my github unread.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants