Add an adapter for benchmark.oss_ci_benchmark_v3 #5921

Merged
huydhn merged 2 commits into main from upload-benchmark-results-ch
Nov 15, 2024

Contributor

@huydhn huydhn commented Nov 15, 2024

The schema comes from https://github.com/pytorch/test-infra/blob/main/torchci/clickhouse_queries/oss_ci_benchmark_v3/query.sql. An example S3 path is s3://ossci-benchmarks/v3/pytorch/pytorch/11850871071/33027181871/add_loop_eager_dynamic.json

I think we should figure out how to test changes to these replicator lambdas. Otherwise, we might lose some data if they break. Any thoughts?
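
One possible starting point, sketched under assumptions: if the replicator lambda is triggered by S3 event notifications, a unit test could feed it a synthetic event and check which (bucket, key) pairs reach the adapter. The names `extract_s3_objects` and `handle` are hypothetical and are not taken from the actual lambda; only the S3 event payload layout is standard.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


def handle(event, adapter):
    """Hypothetical handler: call the adapter once per object in the event."""
    for bucket, key in extract_s3_objects(event):
        adapter(bucket, key)


# Synthetic event built from the example S3 path in this PR description.
fake_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "ossci-benchmarks"},
                "object": {
                    "key": "v3/pytorch/pytorch/11850871071/33027181871/"
                    "add_loop_eager_dynamic.json"
                },
            }
        }
    ]
}

# Stub adapter that records its arguments instead of writing to ClickHouse.
calls = []
handle(fake_event, lambda bucket, key: calls.append((bucket, key)))
```

A stub adapter like this keeps the test offline; checking the actual insert would still need a test table or a mocked ClickHouse client.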

@huydhn huydhn requested a review from clee2000 November 15, 2024 02:39

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 15, 2024
Contributor Author

huydhn commented Nov 15, 2024

@clee2000 Do you know if the file on S3 needs to be publicly accessible for the replication to work? That is, does https://ossci-benchmarks.s3.us-east-1.amazonaws.com/v3/pytorch/pytorch/11831781500/33022921916/add_loop_eager_dynamic.json need to be public?
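
For reference, a minimal sketch of how the `s3://` path maps to the HTTPS URL above, plus an anonymous check of whether the object is readable. The helper names are hypothetical, and the `us-east-1` default is taken only from the URL in this comment.

```python
import urllib.error
import urllib.request
from urllib.parse import urlparse


def s3_uri_to_https(uri, region="us-east-1"):
    """Convert s3://bucket/key to a virtual-hosted-style HTTPS URL."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return f"https://{parsed.netloc}.s3.{region}.amazonaws.com{parsed.path}"


def is_publicly_readable(url, timeout=10):
    """Send an unauthenticated HEAD request; True only on HTTP 200.

    HTTP errors such as 403 or 404 return False. Network failures
    (urllib.error.URLError) are left to propagate to the caller.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        return False


url = s3_uri_to_https(
    "s3://ossci-benchmarks/v3/pytorch/pytorch/11831781500/33022921916/"
    "add_loop_eager_dynamic.json"
)
```

Calling `is_publicly_readable(url)` needs network access, so it is not run here.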

@clee2000
Contributor

> @clee2000 Do you know if the file on S3 needs to be public accessible for the replication to work? i.e. https://ossci-benchmarks.s3.us-east-1.amazonaws.com/v3/pytorch/pytorch/11831781500/33022921916/add_loop_eager_dynamic.json needs to be public

The file needs to be either publicly accessible or readable by the ClickHouse role. You can test whether it works by running the query in the query console or by running the script locally.
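
A rough sketch of the kind of query that could be pasted into the query console, assuming ClickHouse's `s3` table function with schema inference and a publicly readable object. The helper names `quote_literal` and `build_s3_select` are made up for illustration; a private bucket would need credentials or a role that this sketch does not cover.

```python
def quote_literal(value):
    """Quote a Python string as a ClickHouse string literal."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_s3_select(url, fmt="JSONEachRow", limit=10):
    """Build a SELECT that reads a few rows straight from the S3 object."""
    return (
        f"SELECT * FROM s3({quote_literal(url)}, {quote_literal(fmt)}) "
        f"LIMIT {int(limit)}"
    )


query = build_s3_select(
    "https://ossci-benchmarks.s3.us-east-1.amazonaws.com/v3/pytorch/pytorch/"
    "11831781500/33022921916/add_loop_eager_dynamic.json"
)
print(query)
```

If the query returns rows, ClickHouse can read the object; an access error would suggest that neither public access nor the role's permissions are in place.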

Contributor Author

huydhn commented Nov 15, 2024

Yeah, I have been able to run this locally and add some records to that table from S3.

extra_info Map(String, String)
),
"""
general_adapter(table, bucket, key, schema, ["none"], "JSONEachRow")
Contributor

Don't you use gzip in the upload?

Contributor Author

OK, I thought this argument described the format of the source files on S3, which aren't gzipped, but let me add gzip here.
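
As a quick way to settle whether a given file is gzipped, one could check its first two bytes against the gzip magic number. This is only a sketch: `detect_compression` is a hypothetical helper, and its return values are chosen to mirror the `"none"` and `"gzip"` compression names discussed in this thread.

```python
import gzip

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def detect_compression(head):
    """Return "gzip" if the leading bytes look like a gzip stream, else "none"."""
    return "gzip" if head[:2] == GZIP_MAGIC else "none"


plain = b'{"name": "add_loop_eager_dynamic"}\n'
compressed = gzip.compress(plain)

print(detect_compression(plain))
print(detect_compression(compressed))
```

In practice the `head` bytes would come from the first few bytes of the S3 object, for example via a ranged read.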

@huydhn huydhn merged commit c2d5609 into main Nov 15, 2024
7 checks passed
@huydhn huydhn deleted the upload-benchmark-results-ch branch November 15, 2024 19:38
3 participants