Skip to content

Commit f70ae1e

Browse files
committed
Auto merge of #3648 - pietroalbini:seek-crates, r=jtgeibel
Implement seek-based pagination for filterless sortless crates endpoint This PR implements seek-based pagination for parts of the `/api/v1/crates` endpoint, and adds the infrastructure to parse `seek=` parameters in paginated APIs. The reason for this PR is that we have multiple crawlers scraping the whole `/api/v1/crates` endpoint every day, and offset-based pagination is not feasible after a high number of pages. To keep the implementation simple seek-based pagination is only supported for requests matching these constraints: * **No filters:** when the request is filtered the total results count is not the same as the total number of crates, so the current implementation to gather the number of results (`SELECT COUNT(*) FROM crates;`) does not work. This can be lifted in the future by refactoring how we gather the total number of results. * **No sorting other than alphanumeric:** this is just not implemented yet, we'd need to change the condition used to seek to the correct position. Note that for anything that's not alphanumeric we'll need to sort by `($param, id)`, not just `$param`, to ensure the results are consistently ordered (there could be multiple crates with the same download count). * **No search queries:** search queries complicate the SQL query quite a bit, introducing relevance sorting and hardcoding the exact match as the first result. Not worth implementing for the time being IMO. When the endpoint is called with a request that supports seeking, no `meta.prev_page` will be provided and `meta.next_page` will include the `seek` parameter instead of the `page` parameter. `meta.total` will continue to be provided as usual. The value of `seek` is a base64-encoded JSON value. The reason base64 is used is to signal to our API consumers the seek value is an implementation detail and they shouldn't manually create one. This is what other big providers do (for example GitHub). The reason to encode the value in JSON is to support more complex seek keys in the future: right now we're just encoding a bare number, and the JSON representation is the same as the ASCII representation. This PR is fully backward compatible: if clients just rely on `meta.next_page` they will transparently start using seek-based pagination, but offset-based pagination still works. When the `page=` attribute is provided offset-based pagination will be forced, and `meta.next_page` will include `page` instead of `seek`. This way, clients manually creating `?page=` URLs will not break. One unfortunate aspect of this PR is that we still have to provide `meta.total`, as that's required by the Cargo registries API. Getting the total count requires a full scan over the table, which right now in production is done using an index-only scan in ~20ms. The PR is best reviewed commit-by-commit. r? `@jtgeibel`
2 parents d608b13 + 5cfdbca commit f70ae1e

File tree

7 files changed

+419
-106
lines changed

7 files changed

+419
-106
lines changed

src/controllers/category.rs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@ pub fn index(req: &mut dyn RequestExt) -> EndpointResult {
1111
// FIXME: There are 69 categories, 47 top level. This isn't going to
1212
// grow by an OoM. We need a limit for /summary, but we don't need
1313
// to paginate this.
14-
let options = PaginationOptions::new(req)?;
14+
let options = PaginationOptions::builder().gather(req)?;
1515
let offset = options.offset().unwrap_or_default();
1616
let sort = query.get("sort").map_or("alpha", String::as_str);
1717

0 commit comments

Comments
 (0)