-
Notifications
You must be signed in to change notification settings - Fork 916
Refactor s3-benchmarks and include v1 transfermanager tests #2575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor s3-benchmarks and include v1 transfermanager tests #2575
Conversation
b7c297f
to
59f1b88
Compare
59f1b88
to
a7e4a94
Compare
SonarCloud Quality Gate failed. |
transferManager = S3TransferManager.builder() | ||
.s3ClientConfiguration(b -> b.targetThroughputInGbps(config.targetThroughput()) | ||
.minimumPartSizeInBytes(partSizeInMb)) | ||
.build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed S3TransferManager and used the internal S3CrtClient for testing because S3TransferManager doesn't allow customizing S3 client , and we can't reuse the same S3 crt client.
} | ||
|
||
private void warmUp() throws InterruptedException { | ||
logger.info(() -> "Starting to warm up"); | ||
|
||
for (int i = 0; i < WARMUP_ITERATIONS; i++) { | ||
for (int i = 0; i < PRE_WARMUP_ITERATIONS; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"warmup" or "pre-warmup"? The latter seems to imply:
- pre-warmup
- warmup
- actual run
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually pre-warmup. I'll extract this logic out to make it more clear. Pre-warmup is just jvm/sdk warming up, sending downloading and uploading requests for smaller object. Warmup is to send the same request as the actual run. Warmup is needed for crt based s3 client because it takes sometime to resolve the IP addresses and create new connections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for the sake of simplicity, I would still consider merging all of this into the pure "warmup" phase, and running more iterations if needed.
import java.util.List; | ||
import software.amazon.awssdk.utils.Logger; | ||
|
||
public class BenchmarkUtils { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: final
.average() | ||
.orElse(0.0); | ||
|
||
double lowestLatency = metrics.stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Percentiles might be interesting here as well, rather than just min/max.
https://guava.dev/releases/23.0/api/docs/com/google/common/math/Quantiles.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah, good idea
|
||
public TransferManagerDownloadBenchmark(TransferManagerBenchmarkConfig config) { | ||
super(config); | ||
this.contentLength = s3Sync.headObject(b -> b.bucket(bucket).key(key)).contentLength(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should parameterize (i.e., @Param
) the file/object size. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, the CLI is designed to only download or upload a single object based on the bucket and key in the input params. I think it would make more sense to parameterize it if we decide to run this as part of the release pipeline, and we can create a workflow for it, something like uploading objects of size 512MB, 1GB, xxx
and then downloading them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think different object sizes may have very different performance profiles, and it would be helpful to be able to use this benchmark to determine at what thresholds the CRT/v1 behave differently, or when TransferManager becomes faster than a standard getObject request (TransferManager may be slower for very small objects, for example). The CRT input params should not be a constraint here. We can do something like this:
// 1KB 1MB 10MB
@Param({"1024", "1048576", "10485760"})
public long objectSize;
@Setup(Level.Invocation)
public void setUp() {
// put object with objectSize
}
@Benchmark
public void benchmark() {
// get object
}
.destination(downloadPath)); | ||
download.completionFuture().join(); | ||
|
||
s3.getObject(b -> b.bucket(bucket).key(key), AsyncResponseTransformer.toFile(downloadPath)).join(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we, or should we, delete this file as part of clean up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the file gets deleted on line 94. I'll add try-finally block and move it there
s3Client.shutdown(); | ||
} | ||
|
||
private void warmUp() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we not using JMH and its native @Warmup
support here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, because I feel we need more customizations in s3 perf tests, for example, we send different requests (small object download/upload) for warmup here and we also have additional warmup steps for other tests.
|
||
private void downloadOnceToFile(List<Double> latencies) { | ||
Path downloadPath = new File(this.sourcePath).toPath(); | ||
long start = System.currentTimeMillis(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Consider Instant.now()
and Duration
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are mainly measuring the elapsed time here. Is there any benefit of using Instant or Duration? I guess we could use System.nanoTime()
, which provides nanosecond precision, but seems a bit overkill as well to me since this is not a microbenchmark.
Closing this PR since it's not relevant anymore |
…f6f0fc458 Pull request: release <- staging/7b421d5b-857a-4a15-b619-9eaf6f0fc458
Description
Refactor s3-benchmarks and include v1 transfermanager tests
Motivation and Context
Testing
Screenshots (if appropriate)
Types of changes
Checklist
mvn install
succeedsLicense