Skip to content

summary: Fill recent_downloads fields #2452

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 3, 2020

Conversation

Turbo87
Copy link
Member

@Turbo87 Turbo87 commented Apr 26, 2020

This will allow us to show something like "3,442 downloads in the past 90 days" in the "Recent Downloads" list on the frontpage.

r? @jtgeibel

@jtgeibel
Copy link
Member

jtgeibel commented May 2, 2020

I have 2 potential concerns here, one technical and one social. On the technical side, the change looks fine to me but I haven't had a chance to look at EXPLAIN output for the query. This is already one of the slower endpoints so I wouldn't want to regress performance here.

On the social side, I'm a bit worried that displaying this information prominently could encourage people to seek to inflate their download counts. Nothing really prevents such bad behavior today, but displaying an exact value of this metric could be perceived by some as a target to be reached. (On the other hand, we already expose this metric for crates via the API so this data is already available, just not currently shown in the UI as far as I am aware.)

I'll add this to the agenda to discuss with the team.

@Turbo87
Copy link
Member Author

Turbo87 commented May 2, 2020

This is already one of the slower endpoints so I wouldn't want to regress performance here.

I basically just tried to mimic the code that was already existing for the crates endpoint. I was hoping that it was fast enough 😅

The good news is that on the frontend side we've lately added two changes that make a slightly slower response time on this request not as bad anymore. First, we cache the result of the request for the runtime of the app so that we only need to request it once and not for every visit of the frontpage. And second, we no longer block the rendering of the rest of the frontpage on the response of that request. Instead, we display placeholders for those parts of the frontpage that are still loading.

just not currently shown in the UI as far as I am aware

Bildschirmfoto 2020-05-02 um 19 34 02

we already display it in the crate rows for search pages, etc. and we have it as a sort option for most of the pages. if we didn't show the the top10 recent downloads crates on the frontpage I'd say we didn't need it, but since we do it seems strange to display a ranking, but not the actual numbers that are related to that ranking.

@jtgeibel
Copy link
Member

jtgeibel commented May 2, 2020

Okay, I'll rescind my concerns about showing recent downloads. I think this just needs a few tweaks for performance reasons. For the final query, would you switch back to an inner join and drop the call to nulls_last? In particular, it looks like NULLS LAST causes the query to use a sequential scan of all crates, instead of an index scan. (I saw no regressions in the other queries.)

Original query

EXPLAIN ANALYZE SELECT "crates"."id", "crates"."name", "crates"."updated_at", "crates"."created_at", "crates"."downloads", "crates"."description", "crates"."homepage", "crates"."documentation", "crates"."repository", "crates"."max_upload_size" FROM ("crates" INNER JOIN "recent_crate_downloads" ON "recent_crate_downloads"."crate_id" = "crates"."id") ORDER BY "recent_crate_downloads"."downloads" DESC LIMIT 10;
                                                                                           QUERY PLAN                                                                                           
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.14..14.60 rows=10 width=213) (actual time=0.070..0.335 rows=10 loops=1)
   ->  Nested Loop  (cost=0.14..57030.24 rows=39432 width=213) (actual time=0.069..0.331 rows=10 loops=1)
         ->  Index Scan Backward using index_recent_crate_downloads_by_downloads on recent_crate_downloads  (cost=0.06..1794.15 rows=39536 width=12) (actual time=0.029..0.117 rows=10 loops=1)
         ->  Index Scan using packages_pkey on crates  (cost=0.08..1.40 rows=1 width=205) (actual time=0.021..0.021 rows=1 loops=10)
               Index Cond: (id = recent_crate_downloads.crate_id)
 Planning Time: 0.701 ms
 Execution Time: 0.400 ms

Current query

EXPLAIN ANALYZE SELECT "crates"."id", "crates"."name", "crates"."updated_at", "crates"."created_at", "crates"."downloads", "crates"."description", "crates"."homepage", "crates"."documentation", "crates"."repository", "crates"."max_upload_size", "recent_crate_downloads"."downloads" FROM ("crates" LEFT OUTER JOIN "recent_crate_downloads" ON "recent_crate_downloads"."crate_id" = "crates"."id") ORDER BY "recent_crate_downloads"."downloads" DESC NULLS LAST LIMIT 10;
                                                                    QUERY PLAN                                                                     
---------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=33688.41..33688.41 rows=10 width=213) (actual time=143.095..143.101 rows=10 loops=1)
   ->  Sort  (cost=33688.41..33708.12 rows=39432 width=213) (actual time=143.093..143.094 rows=10 loops=1)
         Sort Key: recent_crate_downloads.downloads DESC NULLS LAST
         Sort Method: top-N heapsort  Memory: 29kB
         ->  Hash Left Join  (cost=725.98..33517.99 rows=39432 width=213) (actual time=12.186..129.100 rows=39537 loops=1)
               Hash Cond: (crates.id = recent_crate_downloads.crate_id)
               ->  Seq Scan on crates  (cost=0.00..32771.30 rows=39432 width=205) (actual time=0.009..100.651 rows=39537 loops=1)
               ->  Hash  (cost=587.61..587.61 rows=39536 width=12) (actual time=12.089..12.090 rows=39536 loops=1)
                     Buckets: 65536  Batches: 1  Memory Usage: 2211kB
                     ->  Seq Scan on recent_crate_downloads  (cost=0.00..587.61 rows=39536 width=12) (actual time=0.007..5.914 rows=39536 loops=1)
 Planning Time: 0.322 ms
 Execution Time: 143.150 ms

My proposed tweaks

EXPLAIN ANALYZE SELECT "crates"."id", "crates"."name", "crates"."updated_at", "crates"."created_at", "crates"."downloads", "crates"."description", "crates"."homepage", "crates"."documentation", "crates"."repository", "crates"."max_upload_size", "recent_crate_downloads"."downloads" FROM ("crates" INNER JOIN "recent_crate_downloads" ON "recent_crate_downloads"."crate_id" = "crates"."id") ORDER BY "recent_crate_downloads"."downloads" DESC LIMIT 10;
                                                                                           QUERY PLAN                                                                                           
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.14..14.60 rows=10 width=213) (actual time=0.025..0.109 rows=10 loops=1)
   ->  Nested Loop  (cost=0.14..57030.24 rows=39432 width=213) (actual time=0.023..0.106 rows=10 loops=1)
         ->  Index Scan Backward using index_recent_crate_downloads_by_downloads on recent_crate_downloads  (cost=0.06..1794.15 rows=39536 width=12) (actual time=0.009..0.032 rows=10 loops=1)
         ->  Index Scan using packages_pkey on crates  (cost=0.08..1.40 rows=1 width=205) (actual time=0.007..0.007 rows=1 loops=10)
               Index Cond: (id = recent_crate_downloads.crate_id)
 Planning Time: 0.286 ms
 Execution Time: 0.141 ms

@Turbo87
Copy link
Member Author

Turbo87 commented May 2, 2020

I'll see if I can get that to compile... 😅

@Turbo87 Turbo87 force-pushed the recent-download branch from e7d791c to 375c058 Compare May 3, 2020 07:10
@Turbo87
Copy link
Member Author

Turbo87 commented May 3, 2020

@jtgeibel done, thanks for the help! :)

@jtgeibel
Copy link
Member

jtgeibel commented May 3, 2020

LGTM!

@bors r+

@bors
Copy link
Contributor

bors commented May 3, 2020

📌 Commit 375c058 has been approved by jtgeibel

@bors
Copy link
Contributor

bors commented May 3, 2020

⌛ Testing commit 375c058 with merge dc9bdd5...

@bors
Copy link
Contributor

bors commented May 3, 2020

☀️ Test successful - checks-travis
Approved by: jtgeibel
Pushing dc9bdd5 to master...

@bors bors merged commit dc9bdd5 into rust-lang:master May 3, 2020
@Turbo87 Turbo87 deleted the recent-download branch May 5, 2020 18:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants