Fix parallel mode on larger projects #421
Conversation
Since parallel mode requires the count of the files it's going to process, the application of `openAndPrepareFile` was not lazy. This meant that swift-format immediately opened every file in the project, which, because of macOS's low open-file limit, quickly results in "too many open files" errors. This changes the logic to open each file only as needed. File discovery shouldn't behave any differently than before, since we were already resolving everything to fetch the count.
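To illustrate the eager-vs-lazy difference the description refers to, here is a minimal stand-in sketch (hypothetical names; `openAndPrepareFile` here is just a counter, not the real swift-format function):

```swift
// Stand-in for the real function: counts how many "files" were opened.
var opened = 0
func openAndPrepareFile(_ path: String) -> String? {
    opened += 1  // stands in for acquiring a file descriptor
    return path
}

let urls = (0..<1000).map { "file\($0).swift" }

// Eager: materializing the whole sequence "opens" every file up front.
let allFiles = urls.compactMap(openAndPrepareFile)
// opened == 1000 at this point

// Lazy: files are only "opened" as each element is actually consumed.
opened = 0
for _ in urls.lazy.compactMap(openAndPrepareFile).prefix(10) {
    // process the file here
}
// opened == 10: only the consumed prefix was opened
```

The eager version is what taking a count up front forces, and why the descriptor limit is hit as soon as the project is large enough.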
This is how this logic used to work, but it changed in 6d500ea. @p4checo, based on your description I don't see why this specific piece changed there; can you verify I'm not breaking something here?

To repro what this is fixing:

Before this change you see many errors like:
Maybe this reintroduces the issue, since configurations are now created in parallel? If so, we'll have to think more about how to solve both of these issues at once.
Hi @keith 👋🏼 The reasoning for the change can be found in the commit message and PR:
So, I think that unfortunately your changes do reintroduce the issue. To work around the OS's open file limit while still avoiding serializing everything, something along these lines could work:

```swift
if parallel {
  // Process files in chunks to avoid hitting the OS's max open file limit.
  let chunkSize = 1000  // TODO: define at runtime, or use a "reasonable" value.
  let fileChunks = Array(FileIterator(urls: urls)).chunks(size: chunkSize)
  fileChunks.forEach { files in
    let filesToProcess = files.compactMap(openAndPrepareFile)
    DispatchQueue.concurrentPerform(iterations: filesToProcess.count) { index in
      processFile(filesToProcess[index])
    }
  }
} else {
  FileIterator(urls: urls).lazy.compactMap(openAndPrepareFile).forEach(processFile)
}

private extension Array {
  func chunks(size: Int) -> [[Element]] {
    stride(from: 0, to: count, by: size).map {
      Array(self[$0..<Swift.min($0 + size, count)])
    }
  }
}
```

I'm not sure if there's a programmatic way of knowing the current OS's maximum open file limit, to determine an optimal chunk size. From a quick google search I found multiple commands and ran them on my machine (running macOS 12.6):

Interestingly, 256 files seems way too low, and 122880 seems way too high, compared with the roughly 3000 files your example says are enough to cause the issue. Any thoughts on what the actual value might be?

Hope this helps! 🙏🏼
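(An aside, not from the thread: there is a programmatic way to ask. POSIX `getrlimit(2)` reports the current process's open-file limits, and the gap between the numbers above is likely soft limit vs. system-wide ceiling. A minimal sketch:)

```swift
import Darwin  // use `import Glibc` on Linux

// getrlimit(2) with RLIMIT_NOFILE reports the open-file limits for the
// current process: rlim_cur is the soft limit the process actually hits
// ("too many open files"); rlim_max is the hard ceiling it could raise
// the soft limit to.
var limits = rlimit()
if getrlimit(RLIMIT_NOFILE, &limits) == 0 {
    print("soft open-file limit:", limits.rlim_cur)  // often 256 in a stock macOS shell
    print("hard open-file limit:", limits.rlim_max)
}
```

If the soft limit is what the process trips over, the 256 figure may be the operative one, which would also explain why a few thousand files are enough to reproduce the error.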
I'm not sure what the exact value is, but I imagine our repro case still has to be a bit above it, because if we open/close the files fast enough they won't overlap. Also, knowing that number would only help so much, since there could be many competing processes with many open files. Realistically, I think what we'd semantically want is "open just what we're processing, and maybe a few we're about to process", but implementing that feels like a lot of overhead for this case.

I'm assuming that, outside of this change, a nice way to deal with the Configuration thread-safety issue would be to make it an …

Given these issues I don't think I'm going to work on this anymore right now, so I'll just leave it broken until someone is motivated to pick this up. Thanks for reviewing!
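For what it's worth, the "open just what we're processing, plus a few we're about to" idea could be sketched with a counting semaphore bounding the in-flight opens. This is a hypothetical sketch, not what the PR implements, and `maxOpenFiles` is an assumed tunable:

```swift
import Dispatch

// Hypothetical sketch: a counting semaphore caps how many files are open
// at once, regardless of the total file count, so the process never
// approaches the descriptor limit even on huge projects.
let maxOpenFiles = 64
let gate = DispatchSemaphore(value: maxOpenFiles)
let paths: [String] = []  // stand-in for the discovered file list

DispatchQueue.concurrentPerform(iterations: paths.count) { index in
    gate.wait()              // blocks once maxOpenFiles opens are in flight
    defer { gate.signal() }  // frees a slot when this file is finished
    // openAndPrepareFile(paths[index]) followed by processFile(...) would
    // run here, keeping at most maxOpenFiles descriptors open at a time.
}
```

Compared with fixed-size chunking, this keeps all cores busy continuously instead of draining each chunk before starting the next, at the cost of a little synchronization.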