Skip to content

Support u64 version parts in SQL function semver_no_prerelease #1846

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 2, 2019

Conversation

carols10cents
Copy link
Member

@carols10cents carols10cents commented Sep 19, 2019

Fixes #1839.

So the reverse dependencies API request for crc32fast started failing because the susu crate depends on crc32fast and has published versions like 0.1.20190509190436.

In the logs for the request that were responding with 500, I saw a message that said value "20190409163015" is out of range for type integer 😰

So I went digging, and after isolating which query was the problem and then which part of the query was the problem, I figured out it was this call to to_semver_no_prerelease.

That function was defined in this migration, which defines a type semver_triple as PostgreSQL type int4, which corresponds to Rust type i32. And 20,190,409,163,015 > 2,147,483,647 😰

The SemVer spec doesn't actually specify the size of integers that should be used for the version parts.

The semver crate, that we use in the Rust structures, uses u64. Cargo uses the semver crate as well, and cargo build doesn't work if you give the current crate a version number with one part set to the value of std::u64::MAX + 1.

PostgreSQL's integer types include an int8 type, but it's a signed integer, so that wouldn't support values between i64::MAX and u64::MAX that we need to support. So we have to use the numeric type.

Here's a travis build showing the test failing

@rust-highfive
Copy link

r? @jtgeibel

(rust_highfive has picked a reviewer for you, use r? to override)

@sgrif
Copy link
Contributor

sgrif commented Sep 23, 2019

Do you have some perf numbers on how this affects the query on a large crate like serde? I'd also be interested in comparing this with a version that casts to int8 and rescues the exception with null

@carols10cents
Copy link
Member Author

The changes in this PR add 20ms to the query to get serde's reverse dependencies; EXPLAIN ANALYZE output below. I don't think this is a significant performance degradation.

Query used for analysis (reverse dependencies of serde on a recent production snapshot locally)
SELECT *, COUNT(*) OVER () as total FROM (
    SELECT DISTINCT ON (crate_downloads, crate_name)
    dependencies.*,
    crates.downloads AS crate_downloads,
    crates.name AS crate_name
    FROM dependencies
    INNER JOIN (
        SELECT versions.*,
        row_number() OVER (
            PARTITION BY crate_id
            ORDER BY to_semver_no_prerelease(num) DESC NULLS LAST
        ) rn
        FROM versions
        WHERE NOT yanked
        AND crate_id = ANY(
            SELECT versions.crate_id
            FROM versions
            INNER JOIN dependencies
            ON dependencies.version_id = versions.id
            WHERE dependencies.crate_id = 463
        )
    ) versions
      ON versions.id = dependencies.version_id
    INNER JOIN crates
      ON crates.id = versions.crate_id
    WHERE dependencies.crate_id = 463
      AND rn = 1
    ORDER BY crate_downloads DESC
) t
OFFSET 0
LIMIT 10
;
EXPLAIN ANALYZE output before this change (Execution Time: 720.728 ms)
                                                                                                     QUERY PLAN

Limit (cost=141754.79..141755.09 rows=10 width=90) (actual time=714.269..714.280 rows=10 loops=1)
-> WindowAgg (cost=141754.79..141760.76 rows=199 width=90) (actual time=714.269..714.271 rows=10 loops=1)
-> Unique (cost=141754.79..141756.29 rows=199 width=82) (actual time=711.451..713.341 rows=5274 loops=1)
-> Sort (cost=141754.79..141755.29 rows=199 width=82) (actual time=711.450..711.846 rows=5336 loops=1)
Sort Key: crates.downloads DESC, crates.name
Sort Method: quicksort Memory: 945kB
-> Nested Loop (cost=86852.68..141747.19 rows=199 width=82) (actual time=616.426..703.089 rows=5336 loops=1)
-> Nested Loop (cost=86852.40..140907.52 rows=199 width=71) (actual time=616.412..685.551 rows=5336 loops=1)
-> Subquery Scan on versions (cost=86851.97..134424.41 rows=842 width=8) (actual time=616.373..667.874 rows=5447 loops=1)
Filter: (versions.rn = 1)
Rows Removed by Filter: 42083
-> WindowAgg (cost=86851.97..132319.43 rows=168398 width=173) (actual time=616.371..662.615 rows=47530 loops=1)
-> Sort (cost=86851.97..87272.97 rows=168398 width=40) (actual time=616.345..624.877 rows=47530 loops=1)
Sort Key: versions_1.crate_id, (to_semver_no_prerelease((versions_1.num)::text)) DESC NULLS LAST
Sort Method: external merge Disk: 2456kB
-> Hash Join (cost=18816.05..67627.75 rows=168398 width=40) (actual time=198.748..525.807 rows=47530 loops=1)
Hash Cond: (versions_1.crate_id = versions_2.crate_id)
-> Seq Scan on versions versions_1 (cost=0.00..4396.73 rows=168398 width=14) (actual time=0.013..34.578 rows=166778 loops=1)
Filter: (NOT yanked)
Rows Removed by Filter: 5952
-> Hash (cost=18593.68..18593.68 rows=17789 width=4) (actual time=198.431..198.431 rows=5537 loops=1)
Buckets: 32768 Batches: 1 Memory Usage: 451kB
-> HashAggregate (cost=18415.79..18593.68 rows=17789 width=4) (actual time=196.593..197.438 rows=5537 loops=1)
Group Key: versions_2.crate_id
-> Hash Join (cost=8007.28..18316.29 rows=39802 width=4) (actual time=105.373..187.666 rows=38157 loops=1)
Hash Cond: (dependencies_1.version_id = versions_2.id)
-> Bitmap Heap Scan on dependencies dependencies_1 (cost=748.89..9959.42 rows=39802 width=4) (actual time=3.839..44.720 rows=38157 loops=1)
Recheck Cond: (crate_id = 463)
Heap Blocks: exact=8146
-> Bitmap Index Scan on index_dependencies_crate_id (cost=0.00..738.94 rows=39802 width=0) (actual time=2.898..2.898 rows=38157 loops=1)
Index Cond: (crate_id = 463)
-> Hash (cost=4396.73..4396.73 rows=174373 width=8) (actual time=101.047..101.048 rows=172730 loops=1)
Buckets: 131072 Batches: 4 Memory Usage: 2711kB
-> Seq Scan on versions versions_2 (cost=0.00..4396.73 rows=174373 width=8) (actual time=0.010..47.729 rows=172730 loops=1)
-> Index Scan using dependencies_crate_id_version_id_idx on dependencies (cost=0.42..7.69 rows=1 width=67) (actual time=0.003..0.003 rows=1 loops=5447)
Index Cond: ((crate_id = 463) AND (version_id = versions.id))
-> Index Scan using packages_pkey on crates (cost=0.29..4.22 rows=1 width=19) (actual time=0.003..0.003 rows=1 loops=5336)
Index Cond: (id = versions.crate_id)
Planning Time: 2.195 ms
Execution Time: 720.728 ms
(40 rows)

EXPLAIN ANALYZE output after this change (Execution Time: 739.230 ms)
                                                                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=141754.79..141755.09 rows=10 width=90) (actual time=733.653..733.667 rows=10 loops=1)
   ->  WindowAgg  (cost=141754.79..141760.76 rows=199 width=90) (actual time=733.644..733.649 rows=10 loops=1)
         ->  Unique  (cost=141754.79..141756.29 rows=199 width=82) (actual time=728.943..732.357 rows=5274 loops=1)
               ->  Sort  (cost=141754.79..141755.29 rows=199 width=82) (actual time=728.940..730.170 rows=5336 loops=1)
                     Sort Key: crates.downloads DESC, crates.name
                     Sort Method: quicksort  Memory: 945kB
                     ->  Nested Loop  (cost=86852.68..141747.19 rows=199 width=82) (actual time=615.490..718.231 rows=5336 loops=1)
                           ->  Nested Loop  (cost=86852.40..140907.52 rows=199 width=71) (actual time=615.480..698.636 rows=5336 loops=1)
                                 ->  Subquery Scan on versions  (cost=86851.97..134424.41 rows=842 width=8) (actual time=615.457..679.089 rows=5447 loops=1)
                                       Filter: (versions.rn = 1)
                                       Rows Removed by Filter: 42083
                                       ->  WindowAgg  (cost=86851.97..132319.43 rows=168398 width=173) (actual time=615.454..674.191 rows=47530 loops=1)
                                             ->  Sort  (cost=86851.97..87272.97 rows=168398 width=40) (actual time=615.402..626.794 rows=47530 loops=1)
                                                   Sort Key: versions_1.crate_id, (to_semver_no_prerelease((versions_1.num)::text)) DESC NULLS LAST
                                                   Sort Method: external merge  Disk: 2384kB
                                                   ->  Hash Join  (cost=18816.05..67627.75 rows=168398 width=40) (actual time=197.830..507.063 rows=47530 loops=1)
                                                         Hash Cond: (versions_1.crate_id = versions_2.crate_id)
                                                         ->  Seq Scan on versions versions_1  (cost=0.00..4396.73 rows=168398 width=14) (actual time=0.091..32.748 rows=166778 loops=1)
                                                               Filter: (NOT yanked)
                                                               Rows Removed by Filter: 5952
                                                         ->  Hash  (cost=18593.68..18593.68 rows=17789 width=4) (actual time=195.636..195.636 rows=5537 loops=1)
                                                               Buckets: 32768  Batches: 1  Memory Usage: 451kB
                                                               ->  HashAggregate  (cost=18415.79..18593.68 rows=17789 width=4) (actual time=193.655..194.582 rows=5537 loops=1)
                                                                     Group Key: versions_2.crate_id
                                                                     ->  Hash Join  (cost=8007.28..18316.29 rows=39802 width=4) (actual time=102.123..184.939 rows=38157 loops=1)
                                                                           Hash Cond: (dependencies_1.version_id = versions_2.id)
                                                                           ->  Bitmap Heap Scan on dependencies dependencies_1  (cost=748.89..9959.42 rows=39802 width=4) (actual time=13.055..54.528 rows=38157 loops=1)
                                                                                 Recheck Cond: (crate_id = 463)
                                                                                 Heap Blocks: exact=8146
                                                                                 ->  Bitmap Index Scan on index_dependencies_crate_id  (cost=0.00..738.94 rows=39802 width=0) (actual time=10.257..10.257 rows=38157 loops=1)
                                                                                       Index Cond: (crate_id = 463)
                                                                           ->  Hash  (cost=4396.73..4396.73 rows=174373 width=8) (actual time=87.490..87.490 rows=172730 loops=1)
                                                                                 Buckets: 131072  Batches: 4  Memory Usage: 2711kB
                                                                                 ->  Seq Scan on versions versions_2  (cost=0.00..4396.73 rows=174373 width=8) (actual time=0.076..43.125 rows=172730 loops=1)
                                 ->  Index Scan using dependencies_crate_id_version_id_idx on dependencies  (cost=0.42..7.69 rows=1 width=67) (actual time=0.003..0.003 rows=1 loops=5447)
                                       Index Cond: ((crate_id = 463) AND (version_id = versions.id))
                           ->  Index Scan using packages_pkey on crates  (cost=0.29..4.22 rows=1 width=19) (actual time=0.003..0.003 rows=1 loops=5336)
                                 Index Cond: (id = versions.crate_id)
 Planning Time: 3.351 ms
 Execution Time: 739.230 ms
(40 rows)

@jtgeibel
Copy link
Member

jtgeibel commented Oct 2, 2019

This looks good to me.

@bors r+

@bors
Copy link
Contributor

bors commented Oct 2, 2019

📌 Commit 5b42015 has been approved by jtgeibel

@bors
Copy link
Contributor

bors commented Oct 2, 2019

⌛ Testing commit 5b42015 with merge ceb8c82...

bors added a commit that referenced this pull request Oct 2, 2019
Support u64 version parts in SQL function semver_no_prerelease

Fixes #1839.

So the reverse dependencies API request for crc32fast started failing because the susu crate depends on crc32fast and has [published versions](https://crates.io/crates/susu/versions) like 0.1.20190509190436.

In the logs for the request that were responding with 500, I saw a message that said `value "20190409163015" is out of range for type integer` 😰

So I went digging, and after isolating which query was the problem and then which part of the query was the problem, I figured out it was [this call to `to_semver_no_prerelease`](https://github.com/rust-lang/crates.io/blob/12c64bc716a4a7d1fd25f2a658f80cbdcbdcbf08/src/models/krate_reverse_dependencies.sql#L15).

That function was defined in [this migration](https://github.com/rust-lang/crates.io/blob/a27c704faa2982ddd75a3dc564da85c0217b950e/migrations/20170308140537_create_to_semver_no_prerelease/up.sql), which defines a type `semver_triple` as PostgreSQL type `int4`, which corresponds to Rust type `i32`. And 20,190,409,163,015 > 2,147,483,647 😰

The SemVer spec [doesn't actually specify](https://semver.org/#does-semver-have-a-size-limit-on-the-version-string) the size of integers that should be used for the version parts.

The `semver` crate, that we use in the Rust structures, [uses `u64`](https://github.com/steveklabnik/semver/blob/15189a1d97911abc7e22fb2e7959df12d5d6f462/src/version.rs#L115-L121). Cargo uses the `semver` crate as well, and `cargo build` doesn't work if you give the current crate a version number with one part set to the value of `std::u64::MAX` + 1.

[PostgreSQL's integer types](https://www.postgresql.org/docs/9.1/datatype-numeric.html) include an `int8` type, but it's a signed integer, so that wouldn't support values between `i64::MAX` and `u64::MAX` that we need to support. So we have to use the `numeric` type.

[Here's a travis build showing the test failing](https://travis-ci.com/rust-lang/crates.io/jobs/237375256#L833)
@bors
Copy link
Contributor

bors commented Oct 2, 2019

☀️ Test successful - checks-travis
Approved by: jtgeibel
Pushing ceb8c82 to master...

@bors bors merged commit 5b42015 into rust-lang:master Oct 2, 2019
@sgrif
Copy link
Contributor

sgrif commented Oct 2, 2019 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reverse dependency graph returns internal server error
5 participants