You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
1312: Switch most recently downloaded summary query to be faster r=jtgeibel
Related to #1304.
Because we only want to show the top 10 for /api/v1/summary, move the
LIMIT within the query for the top downloads rather than joining on
everything and then limiting.
This should help alleviate at least some of the slowdowns we were seeing today with the /api/v1/summary call; there are other queries that are taking up some load that will need other mitigation strategies though :(
Here's an explain analyze from the old query on a production dump from today:
```
explain analyze SELECT "crates"."id", "crates"."name", "crates"."updated_at", "crates"."created_at", "crates"."downloads", "crates"."description", "crates"."homepage", "crates"."documentation", "crates"."readme", "crates"."readme_file", "crates"."license", "crates"."repository", "crates"."max_upload_size" FROM (
"crates" LEFT OUTER JOIN "crate_downloads" ON "crates"."id" = "crate_downloads"."crate_id" AND "crate_downloads"."date" > date(CURRENT_TIMESTAMP - INTERVAL '90 DAYS')
) GROUP BY "crates"."id" ORDER BY SUM(crate_downloads.downloads) DESC NULLS LAST LIMIT 10;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=81546.14..81546.17 rows=10 width=858) (actual time=546.007..546.009 rows=10 loops=1)
-> Sort (cost=81546.14..81583.65 rows=15004 width=858) (actual time=546.006..546.007 rows=10 loops=1)
Sort Key: (sum(crate_downloads.downloads)) DESC NULLS LAST
Sort Method: top-N heapsort Memory: 37kB
-> GroupAggregate (cost=65223.31..81221.91 rows=15004 width=858) (actual time=273.398..539.672 rows=15004 loops=1)
Group Key: crates.id
-> Merge Left Join (cost=65223.31..78880.53 rows=438269 width=854) (actual time=273.331..485.878 rows=435230 loops=1)
Merge Cond: (crates.id = crate_downloads.crate_id)
-> Index Scan using packages_pkey on crates (cost=0.29..5950.28 rows=15004 width=850) (actual time=0.009..9.989 rows=15004 loops=1)
-> Materialize (cost=65223.03..67414.37 rows=438269 width=8) (actual time=273.317..386.191 rows=435229 loops=1)
-> Sort (cost=65223.03..66318.70 rows=438269 width=8) (actual time=273.314..332.477 rows=435229 loops=1)
Sort Key: crate_downloads.crate_id
Sort Method: external merge Disk: 7712kB
-> Index Scan using index_crate_downloads_date on crate_downloads (cost=0.44..18162.03 rows=438269 width=8) (actual time=0.017..112.539 rows=435229 loops=1)
Index Cond: (date > date((CURRENT_TIMESTAMP - '90 days'::interval)))
Planning time: 0.502 ms
Execution time: 605.679 ms
(17 rows)
```
And the new query:
```
explain analyze SELECT crates.* FROM crates JOIN ( SELECT crate_downloads.crate_id, SUM(crate_downloads.downloads) FROM crate_downloads WHERE crate_downloads.date > date(CURRENT_TIMESTAMP - INTERVAL '90 days') GROUP BY crate_downloads.crate_id ORDER BY SUM(crate_downloads.downloads) DESC NULLS LAST LIMIT 10 ) cd ON crates.id = cd.crate_id
production_crates_io_explain-# ORDER BY cd.sum DESC NULLS LAST;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=20323.24..20402.11 rows=10 width=1057) (actual time=180.585..180.682 rows=10 loops=1)
-> Limit (cost=20322.96..20322.98 rows=10 width=12) (actual time=180.537..180.539 rows=10 loops=1)
-> Sort (cost=20322.96..20350.82 rows=11146 width=12) (actual time=180.536..180.537 rows=10 loops=1)
Sort Key: (sum(crate_downloads.downloads)) DESC NULLS LAST
Sort Method: top-N heapsort Memory: 25kB
-> Finalize HashAggregate (cost=19970.64..20082.10 rows=11146 width=12) (actual time=176.709..178.856 rows=15003 loops=1)
Group Key: crate_downloads.crate_id
-> Gather (cost=17518.52..19859.18 rows=22292 width=12) (actual time=150.947..163.257 rows=44654 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial HashAggregate (cost=16518.52..16629.98 rows=11146 width=12) (actual time=146.173..149.575 rows=14885 loops=3)
Group Key: crate_downloads.crate_id
-> Parallel Index Scan using index_crate_downloads_date on crate_downloads (cost=0.44..15605.46 rows=182612 width=8) (actual time=0.045..86.019 rows=145076 loops=3)
Index Cond: (date > date((CURRENT_TIMESTAMP - '90 days'::interval)))
-> Index Scan using packages_pkey on crates (cost=0.29..7.90 rows=1 width=1049) (actual time=0.012..0.012 rows=1 loops=10)
Index Cond: (id = crate_downloads.crate_id)
Planning time: 1.253 ms
Execution time: 181.550 ms
(18 rows)
```
0 commit comments