[benchmark] Add ReplaceSubrange benchmark #25310
Changes from 8 commits
0f6d56c
e67c8b1
459861b
4d86f3f
082285f
0de61f0
4c0ea56
b82f6c5
b76169c
a9cbe2b
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,116 @@ | ||
//===--- StringReplaceSubrange.swift -------------------------------------------===// | ||
// | ||
// This source file is part of the Swift.org open source project | ||
// | ||
// Copyright (c) 2014 - 2019 Apple Inc. and the Swift project authors | ||
// Licensed under Apache License v2.0 with Runtime Library Exception | ||
// | ||
// See https://swift.org/LICENSE.txt for license information | ||
// See https://swift.org/CONTRIBUTORS.txt for the list of Swift project authors | ||
// | ||
//===----------------------------------------------------------------------===// | ||
|
||
import TestsUtils | ||
|
||
let tags: [BenchmarkCategory] = [.validation, .api, .String] | ||
|
||
public let StringReplaceSubrange = [ | ||
BenchmarkInfo( | ||
name: "Str.replaceSubrange.SmallLiteral.String", | ||
runFunction: { replaceSubrange($0, "coffee", with: "t") }, | ||
tags: tags | ||
), | ||
BenchmarkInfo( | ||
name: "Str.replaceSubrange.LargeManaged.String", | ||
runFunction: { replaceSubrange($0, largeManagedString, with: "t") }, | ||
tags: tags, | ||
setUpFunction: setupLargeManagedString | ||
), | ||
BenchmarkInfo( | ||
name: "Str.replaceSubrange.SmallLiteral.Substr", | ||
runFunction: { replaceSubrange($0, "coffee", with: getSubstring("t")) }, | ||
We don't need to put an optimization barrier (by calling the For an implementation symmetry, I'd also extract the "coffee" into
|
||
tags: tags | ||
), | ||
BenchmarkInfo( | ||
name: "Str.replaceSubrange.LargeManaged.Substr", | ||
runFunction: { replaceSubrange($0, largeManagedString, with: getSubstring("t")) }, | ||
tags: tags, | ||
setUpFunction: setupLargeManagedString | ||
), | ||
BenchmarkInfo( | ||
name: "Str.replaceSubrange.SmallLiteral.ArrChar", | ||
runFunction: { replaceSubrange($0, "coffee", with: getArrayCharacter(Array<Character>(["t"]))) }, | ||
tags: tags | ||
), | ||
BenchmarkInfo( | ||
name: "Str.replaceSubrange.LargeManaged.ArrChar", | ||
runFunction: { replaceSubrange($0, largeManagedString, with: getArrayCharacter(Array<Character>(["t"]))) }, | ||
tags: tags, | ||
setUpFunction: setupLargeManagedString | ||
), | ||
BenchmarkInfo( | ||
name: "Str.replaceSubrange.SmallLiteral.RepeatedChar", | ||
runFunction: { replaceSubrange($0, "coffee", with: getRepeatedCharacter(repeatedCharacter)) }, | ||
tags: tags | ||
The benchmark name "Str.replaceSubrange.SmallLiteral.RepeatedChar" is longer than 40 characters, but I couldn't think of a better name that fits in 40. Maybe it could be something like "Str.replaceSubrange.LargeManagedRepChar", but I was concerned that "RepChar" makes it a little hard to understand that it means

Looking at what @milseman writes in SR-8905:
I'd say the naming convention calls for base name of
The longest one is

That's an awesome naming idea. I will use them. Thanks for your suggestion! |
||
), | ||
BenchmarkInfo( | ||
name: "Str.replaceSubrange.LargeManaged.RepeatedChar", | ||
runFunction: { replaceSubrange($0, largeManagedString, with: getRepeatedCharacter(repeatedCharacter)) }, | ||
tags: tags, | ||
setUpFunction: setupLargeManagedString | ||
), | ||
] | ||
|
||
// MARK: - Privates for String | ||
|
||
private var largeManagedString: String = { | ||
return getString("coffee\u{301}coffeecoffeecoffeecoffee") | ||
}() | ||
I'd guess it was so that the string is in a particular normalization form. @milseman Do you also want to vary the benchmarks for different normalization forms? SR-8905 doesn't mention that…

I was wondering whether the result of testing with the string "coffeécoffeecoffeecoffeecoffee" would be different from the one with "coffeecoffeecoffeecoffeecoffee". If there is a distinct difference, maybe we could add two benchmarks: one with the acute accent character and one without it?

Grapheme segmentation is relevant to this benchmark, but not normalization (since there's no comparison). The difference is that the precomposed representation is a single scalar per grapheme cluster, while the decomposed (multi-scalar) form is not. The single-scalar one will hit our grapheme breaking fast-paths while the multi-scalar one will call out to ICU. Alternatively, you could use other kinds of multi-scalar grapheme clusters, such as complex emoji. I just mentioned |
||
|
||
private func setupLargeManagedString() { | ||
_ = largeManagedString | ||
} | ||
Should

Given that in the

That makes sense. Thanks for explaining! |
||
|
||
// MARK: - Privates for Repeated<Character> | ||
|
||
private let repeatedCharacter: Repeated<Character> = { | ||
let character = Character("c") | ||
return repeatElement(character, count: 1) | ||
}() | ||
|
||
|
||
@inline(never) | ||
private func replaceSubrange(_ N: Int, _ string: String, with replacingString: String) { | ||
var copy = getString(string) | ||
let range = string.startIndex..<string.index(after: string.startIndex) | ||
for _ in 0 ..< 5_000 * N { | ||
copy.replaceSubrange(range, with: replacingString) | ||
} | ||
What are the criteria for choosing a multiplier like 5000? Does it depend on the benchmark's run time?

Correct. We are just trying to size the workload to run in 20–1000 μs, so that it is in a measurement sweet spot for our system.

Thanks for explaining! |
||
} | ||
|
||
@inline(never) | ||
private func replaceSubrange(_ N: Int, _ string: String, with replacingSubstring: Substring) { | ||
If I understand @milseman's intention from SR-8905 correctly, we are designing a benchmark for the generic Therefore we should be able to define a single shared generic test function and vary the parameter in For an example of such benchmarks, see @milseman Any thoughts on keeping or dropping the

You could refactor this to a generic |
||
var copy = getString(string) | ||
let range = string.startIndex..<string.index(after: string.startIndex) | ||
for _ in 0 ..< 5_000 * N { | ||
copy.replaceSubrange(range, with: replacingSubstring) | ||
} | ||
} | ||
|
||
@inline(never) | ||
private func replaceSubrange(_ N: Int, _ string: String, with replacingArrayCharacter: Array<Character>) { | ||
var copy = getString(string) | ||
let range = string.startIndex..<string.index(after: string.startIndex) | ||
for _ in 0 ..< 5_000 * N { | ||
copy.replaceSubrange(range, with: replacingArrayCharacter) | ||
} | ||
} | ||
|
||
@inline(never) | ||
private func replaceSubrange(_ N: Int, _ string: String, with replacingRepeatedCharacter: Repeated<Character>) { | ||
var copy = getString(string) | ||
let range = string.startIndex..<string.index(after: string.startIndex) | ||
for _ in 0 ..< 5_000 * N { | ||
copy.replaceSubrange(range, with: replacingRepeatedCharacter) | ||
} | ||
} |
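The four `replaceSubrange` overloads in this file differ only in the type of the replacement argument, which is what the generic refactor raised in the review would remove. A minimal sketch of one possible shape, assuming the replacement only needs to be a `Collection` of `Character`; the name `replaceSubrangeGeneric` is hypothetical, and unlike the functions in the diff it returns the mutated string so the result can be inspected:

```swift
// Hypothetical generic replacement for the four overloads in this diff.
// `replaceSubrangeGeneric` is an illustrative name, not part of the PR.
// Assumes `string` is non-empty, as in the benchmarks above.
@inline(never)
func replaceSubrangeGeneric<C: Collection>(
  _ N: Int, _ string: String, with newElements: C
) -> String where C.Element == Character {
  var copy = string
  // Range covering the first character, computed once as in the original code.
  let range = string.startIndex..<string.index(after: string.startIndex)
  for _ in 0 ..< 5_000 * N {
    copy.replaceSubrange(range, with: newElements)
  }
  // Returned only so that callers (or tests) can look at the result.
  return copy
}

// The same function accepts each replacement type used in the benchmarks.
let fromString = replaceSubrangeGeneric(1, "coffee", with: "t")
let fromSubstring = replaceSubrangeGeneric(1, "coffee", with: Substring("t"))
let fromArray = replaceSubrangeGeneric(1, "coffee", with: [Character("t")])
let fromRepeated = replaceSubrangeGeneric(
  1, "coffee", with: repeatElement(Character("t"), count: 1))
```

Each `BenchmarkInfo` entry would then vary only the argument passed in its `runFunction` closure.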
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -322,3 +322,11 @@ public func getString(_ s: String) -> String { return s } | |
// The same for Substring. | ||
@inline(never) | ||
public func getSubstring(_ s: Substring) -> Substring { return s } | ||
|
||
I don't think we need to define any new optimization barrier functions... |
||
// The same for Array<Character>. | ||
@inline(never) | ||
public func getArrayCharacter(_ a: Array<Character>) -> Array<Character> { return a } | ||
|
||
// The same for Repeated<Character>. | ||
@inline(never) | ||
public func getRepeatedCharacter(_ r: Repeated<Character>) -> Repeated<Character> { return r } |
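The objection above is to adding one barrier function per argument type. If a barrier is wanted at all, a single generic function could cover every type used in these benchmarks. A minimal sketch, where `passThrough` is a hypothetical name chosen for illustration and not a claim about what `TestsUtils` provides:

```swift
// Hypothetical generic optimization barrier: because the function is never
// inlined, the optimizer cannot see through the call to the literal argument.
@inline(never)
func passThrough<T>(_ value: T) -> T { return value }

// One function serves all four replacement types from the benchmarks.
let s = passThrough("t")                                         // String
let sub = passThrough(Substring("t"))                            // Substring
let arr = passThrough([Character("t")])                          // Array<Character>
let rep = passThrough(repeatElement(Character("t"), count: 1))   // Repeated<Character>
```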
Would you mind explaining the difference between the small literal string and the large managed string? Is this related to the small string optimization? If a string fits within 15 ASCII characters, is it not allocated on the heap?
Correct. See `_SmallString` and `_StringGuts` for implementation details if you're interested.
Thanks for the links! I will take a look at them.
Small form accommodates 15 UTF-8 code units in length (not just ASCII).
Thank you for clarifying the length of the small string, Michael.
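To make the small versus large distinction concrete, the two benchmark inputs can be measured in UTF-8 code units. This is a sketch that only counts code units, scalars, and characters; it takes the 15-code-unit limit from the comment above as given and does not inspect the actual storage representation. The constant `smallFormLimit` is an illustrative name.

```swift
// The two inputs used by the benchmarks in this PR.
let smallInput = "coffee"
let largeInput = "coffee\u{301}coffeecoffeecoffeecoffee"

// Limit quoted in the review thread; treated here as an assumption.
let smallFormLimit = 15

print(smallInput.utf8.count)            // 6
print(largeInput.utf8.count)            // 32 (U+0301 takes 2 UTF-8 code units)
print(largeInput.unicodeScalars.count)  // 31
print(largeInput.count)                 // 30 ("e" + U+0301 is one Character)

// Only the first input is within the quoted small-form limit.
let smallFits = smallInput.utf8.count <= smallFormLimit
let largeFits = largeInput.utf8.count <= smallFormLimit
print(smallFits, largeFits)             // true false
```

The gap between the scalar count and the character count also shows the multi-scalar grapheme cluster discussed earlier in the review: the sixth character of the large input is built from two scalars.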