src/vision/status_quo/barbara_simulates_hydrodynamics.md
12 additions & 12 deletions
@@ -8,20 +8,20 @@ If you would like to expand on this story, or adjust the answers to the FAQ, fee
8
8
9
9
## The story
10
10
### Problem
11
-
Barbara is a professor of physics at the University of Rustville. She needed to build a tool to solve hydrodynamics simulations; there is a common method for this that subdivides a region into a grid and computes the solution for each grid patch. All the patches in a grid for a point in time are independent and can be computed in parallel, but they are dependent on neighboring patches in the previously computed frame in time. This is a well known computational model and the patterns for basic parallelization are well established.
11
+
Niklaus is a professor of physics at the University of Rustville. He needed to build a tool to solve hydrodynamics simulations; there is a common method for this that subdivides a region into a grid and computes the solution for each grid patch. All the patches in a grid for a point in time are independent and can be computed in parallel, but they are dependent on neighboring patches in the previously computed frame in time. This is a well known computational model and the patterns for basic parallelization are well established.
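The computational model described above can be sketched with the standard library alone. This is a minimal illustration, not the solver from the story: the one-dimensional grid, the averaging stencil in `solve_patch`, and the `step` function with its `workers` parameter are all hypothetical stand-ins. It shows the key property that every patch in the new frame reads only the previous frame, so patches can be computed in parallel. It assumes Rust 1.63 or later for `std::thread::scope`.

```rust
use std::thread;

/// Hypothetical stencil: a patch's next value is the mean of itself and its
/// immediate neighbors in the previous frame (edge patches reuse their own
/// value in place of the missing neighbor).
fn solve_patch(prev: &[f64], i: usize) -> f64 {
    let left = if i == 0 { prev[i] } else { prev[i - 1] };
    let right = if i + 1 == prev.len() { prev[i] } else { prev[i + 1] };
    (left + prev[i] + right) / 3.0
}

/// Advance the grid by one frame using at most `workers` threads.
/// Each thread writes a disjoint chunk of `next` and only reads `prev`,
/// so no synchronization between patches is needed within a frame.
fn step(prev: &[f64], workers: usize) -> Vec<f64> {
    let mut next = vec![0.0; prev.len()];
    if prev.is_empty() {
        return next;
    }
    let workers = workers.max(1);
    // Ceiling division: number of patches handled by each worker.
    let chunk = (prev.len() + workers - 1) / workers;
    thread::scope(|s| {
        for (c, out) in next.chunks_mut(chunk).enumerate() {
            s.spawn(move || {
                for (j, slot) in out.iter_mut().enumerate() {
                    *slot = solve_patch(prev, c * chunk + j);
                }
            });
        }
    });
    next
}
```

Calling `step(&frame, 4)` repeatedly, feeding each result back in as the next `prev`, advances the simulation frame by frame. The `workers` argument, rather than the number of patches, bounds the number of threads.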
12
12
13
-
Barbara wanted to write a performant tool to compute the solutions to the simulations of her research. She chose Rust because she needed high performance but she also wanted something that could be maintained by her students, who are not professional programmers. Rust's safety guarantees give her confidence that her results are not going to be corrupted by data races or other programming errors. After implementing the core mathematical formulas, Barbara began implementing the parallelization architecture.
13
+
Niklaus wanted to write a performant tool to compute the solutions to the simulations of his research. He chose Rust because he needed high performance but he also wanted something that could be maintained by his students, who are not professional programmers. Rust's safety guarantees give him confidence that his results are not going to be corrupted by data races or other programming errors. After implementing the core mathematical formulas, Niklaus began implementing the parallelization architecture.
14
14
15
-
Her first attempt was to emulate a common CFD design pattern: using message passing to communicate between processes that are each assigned a specific patch in the grid. So she assigned one thread to each patch and used messages to communicate solution state to dependent patches. With one thread per patch, this usually meant that there were 5-10x more threads than CPU cores.
15
+
His first attempt was to emulate a common CFD design pattern: using message passing to communicate between processes that are each assigned a specific patch in the grid. So he assigned one thread to each patch and used messages to communicate solution state to dependent patches. With one thread per patch, this usually meant that there were 5-10x more threads than CPU cores.
16
16
17
-
This solution worked, but Barbara had two problems with it. First, it gave her no control over CPU usage, so the solution would greedily use all available CPU resources. Second, using messages to communicate solution values between patches did not scale when her team added a new feature (tracer particles) that added additional solution data; the additional messages caused by this change created so much overhead that parallel processing was no faster than serial. So, Barbara decided to find a better solution.
17
+
This solution worked, but Niklaus had two problems with it. First, it gave him no control over CPU usage, so the solution would greedily use all available CPU resources. Second, using messages to communicate solution values between patches did not scale when his team added a new feature (tracer particles); the additional messages caused by this change created so much overhead that parallel processing was no faster than serial. So, Niklaus decided to find a better solution.
18
18
19
19
### Solution Path
20
-
To address the first problem, Barbara would decouple the work that needed to be done (solving each patch) from the workers (threads); this would allow her to more finely control how many resources were used. So, she began looking for a tool in Rust that would meet this design pattern. When she read about `async` and how it allowed the user to define units of work, called tasks, and send those to an executor which would manage the execution of those tasks across a set of workers, she thought she'd found exactly what she needed. Further reading indicated that `tokio` was the runtime of choice for `async` in the community and so she began building a new CFD tool with `async` and `tokio`. She also wanted to move away from the message passing design, because the number of messages being passed was proportional to the number of tracer particles being traced.
20
+
To address the first problem, Niklaus would decouple the work that needed to be done (solving each patch) from the workers (threads); this would allow him to more finely control how many resources were used. So, he began looking for a tool in Rust that would meet this design pattern. When he read about `async` and how it allowed the user to define units of work, called tasks, and send those to an executor which would manage the execution of those tasks across a set of workers, he thought he'd found exactly what he needed. Further reading indicated that `tokio` was the runtime of choice for `async` in the community and, so, he began building a new CFD tool with `async` and `tokio`.
21
21
22
-
As Barbara began working on her new design with `tokio`, her use of `async` went from a general (from the textbook) use of basic `async` features to a more specific implementation leveraging exactly the features that were most suited for her needs. At first, Barbara was under a false impression about what async executors do. She had assumed that a multi-threaded executor could automatically move the execution of an async block to a worker thread. When this turned out to be wrong, she went to Stack Overflow and learned that async tasks must be explicitly spawned into a thread pool if they are to be executed on a worker thread. This meant that the algorithm to be parallelized became strongly coupled to both the spawner and the executor. Code that used to cleanly express a physics algorithm now had interspersed references to the task spawner, not only making it harder to understand, but also making it impossible to try different execution strategies, since with Tokio the spawner and executor are the same object (the Tokio runtime). Barbara felt that a better design for data parallelism would enable better separation of concerns: a group of interdependent compute tasks, and a strategy to execute them in parallel.
22
+
As Niklaus began working on his new design with `tokio`, his use of `async` went from a general (from the textbook) use of basic `async` features to a more specific implementation leveraging exactly the features that were most suited for his needs. At first, Niklaus was under a false impression about what `async` executors do. He had assumed that a multi-threaded executor could automatically move the execution of an `async` block to a worker thread. When this turned out to be wrong, he went to Stack Overflow and learned that async tasks must be explicitly spawned into a thread pool if they are to be executed on a worker thread. This meant that the algorithm to be parallelized became strongly coupled to both the spawner and the executor. Code that used to cleanly express a physics algorithm now had interspersed references to the task spawner, not only making it harder to understand, but also making it impossible to try different execution strategies, since with Tokio the spawner and executor are the same object (the Tokio runtime). Niklaus felt that a better design for data parallelism would enable better separation of concerns: a group of interdependent compute tasks, and a strategy to execute them in parallel.
23
23
24
-
With the move to `async`, Barbara saw an opportunity to solve her second problem. Rather than using message passing to coordinate patch computation, she used the `async` API to define dependencies between patches so that a patch would only begin computing its solution when its neighboring patches had completed. She set up a shared data structure to track the solutions for each patch now that messages would not be passing that data. Learning how to properly use shared data with `async` was a new challenge. The initial design:
24
+
With the move to `async`, Niklaus saw an opportunity to solve his second problem. Rather than using message passing to coordinate patch computation, he used the `async` API to define dependencies between patches so that a patch would only begin computing its solution when its neighboring patches had completed. He set up a shared data structure to track the solutions for each patch now that messages would not be passing that data. Learning how to properly use shared data with `async` was a new challenge. The initial design:
lacked performance because she needed to clone the value for every task. So, Barbara switched over to using `Arc` to keep a thread-safe reference-counted pointer to the shared data. But this change introduced a lot of `.map` and `.unwrap` function calls, making the code much harder to read. She realized that managing the dependency graph was not intuitive when using `async` for concurrency.
35
+
lacked performance because he needed to clone the value for every task. So, Niklaus switched over to using `Arc` to keep a thread-safe reference-counted pointer to the shared data. But this change introduced a lot of `.map` and `.unwrap` function calls, making the code much harder to read. He realized that managing the dependency graph was not intuitive when using `async` for concurrency.
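The difference between cloning the data for every task and sharing it through `Arc` can be sketched as follows. This is not the code from the story (the original listing is not reproduced here); `sum_in_parallel` is a hypothetical helper, and plain `std::thread` workers stand in for async tasks so that the example needs no external runtime. `Arc::clone` copies only the pointer and increments a reference count, so the underlying `Vec` is never duplicated.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: sums `solutions` using `tasks` workers that all read
/// the same allocation through an `Arc`, instead of each owning a deep copy.
fn sum_in_parallel(solutions: Vec<f64>, tasks: usize) -> f64 {
    let shared = Arc::new(solutions);
    let handles: Vec<_> = (0..tasks)
        .map(|t| {
            // Cheap: bumps the reference count, does not copy the vector.
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                // Worker `t` handles elements t, t + tasks, t + 2 * tasks, ...
                shared.iter().skip(t).step_by(tasks).sum::<f64>()
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .sum()
}
```

For example, `sum_in_parallel(vec![1.0, 2.0, 3.0, 4.0], 2)` splits the work into the partial sums `1.0 + 3.0` and `2.0 + 4.0`. Read-only sharing like this needs no `.map` or `.unwrap` chains; those tend to appear once the shared state must also be mutated or is optionally present.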
36
36
37
-
A new problem arose during the move to `async`: a steep learning curve with error handling. The initial version of her design used `panic!`s to fail the program if an error was encountered, but the stack traces were almost unreadable. She asked her teammate Grace to migrate over to using the `Result` idiom for error handling and Grace found a major inconvenience. The Rust type inference inconsistently breaks when propagating `Result` in `async` blocks. Grace frequently found that she had to specify the type of the error when creating a result value:
37
+
A new problem arose during the move to `async`: a steep learning curve with error handling. The initial version of his design used `panic!`s to fail the program if an error was encountered, but the stack traces were almost unreadable. He asked his teammate Grace to migrate over to using the `Result` idiom for error handling and Grace found a major inconvenience. The Rust type inference inconsistently breaks when propagating `Result` in `async` blocks. Grace frequently found that she had to specify the type of the error when creating a result value:
And she could not figure out why she had to add the `::<_, HydroError>` to some of the `Result` values.
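One likely explanation, offered here as a sketch rather than a diagnosis of Grace's code: an `async` block has no signature in which to write a return type, and the `?` operator may convert an error into any type that implements `From` for it. When nothing else in the surrounding code pins the error type down, the turbofish on `Ok` is the place to state it. In the example below, only `HydroError` comes from the story, and its definition here is assumed; `parse_density`, `run_patch`, and the tiny `block_on` executor are hypothetical helpers so the example runs without an external runtime. It assumes Rust 1.68 or later for `std::pin::pin!`.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Assumed shape of the error type named in the story.
#[derive(Debug, PartialEq)]
struct HydroError(String);

/// Hypothetical fallible step: parse a density value from text.
fn parse_density(text: &str) -> Result<f64, HydroError> {
    text.trim()
        .parse::<f64>()
        .map_err(|e| HydroError(e.to_string()))
}

/// Waker that does nothing; enough for futures that never actually wait.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, for illustration only.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

fn run_patch(text: &str) -> Result<f64, HydroError> {
    let fut = async {
        let rho = parse_density(text)?;
        // The block has no signature to name its error type, so the turbofish
        // states it explicitly. Whether rustc could infer it without this
        // annotation depends on the surrounding code.
        Ok::<_, HydroError>(2.0 * rho)
    };
    block_on(fut)
}
```

An `async fn` avoids the annotation because its return type is written in the signature; the issue is specific to `async` blocks and closures whose result type is otherwise unconstrained.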
42
42
43
-
Finally, once her team began using the new `async` design for their simulations, they noticed an important issue that impacted productivity: compilation time had now increased to between 30 and 60 seconds. The nature of their work requires frequent changes to code and recompilation and 30-60 seconds is long enough to have a noticeable impact on their quality of life. What she and her team want is for compilation to be 2 to 3 seconds. Barbara believes that the use of `async` is a major contributor to the long compilation times.
43
+
Finally, once Niklaus' team began using the new `async` design for their simulations, they noticed an important issue that impacted productivity: compilation time had now increased to between 30 and 60 seconds. The nature of their work requires frequent changes to code and recompilation and 30-60 seconds is long enough to have a noticeable impact on their quality of life. What he and his team want is for compilation to be 2 to 3 seconds. Niklaus believes that the use of `async` is a major contributor to the long compilation times.
44
44
45
-
This new solution works, but Barbara is not satisfied with how complex her code became after the move to `async` and that compilation time is now 30-60 seconds. The state sharing added a large amount of cruft with `Arc`, and `async` is not well suited for using a dependency graph to schedule tasks, so implementing this solution created a key component of her program that was difficult to understand and pervasive. Ultimately, her conclusion was that `async` is not appropriate for parallelizing computational tasks. She will be trying a new design based upon Rayon in the next version of her application.
45
+
This new solution works, but Niklaus is not satisfied with how complex his code became after the move to `async` and that compilation time is now 30-60 seconds. The state sharing added a large amount of cruft with `Arc`, and `async` is not well suited for using a dependency graph to schedule tasks, so implementing this solution created a key component of his program that was difficult to understand and pervasive. Ultimately, his conclusion was that `async` is not appropriate for parallelizing computational tasks. He will be trying a new design based upon Rayon in the next version of his application.
46
46
47
47
## 🤔 Frequently Asked Questions
48
48
@@ -55,7 +55,7 @@ This new solution works, but Barbara is not satisfied with how complex her code
55
55
This story is based on the experience of building the [kilonova](https://github.com/clemson-cal/app-kilonova) hydrodynamics simulation solver.
56
56
57
57
### **Why did you choose Barbara and Grace to tell this story?**
58
-
I chose Barbara as the primary character in this story because this work was driven by someone who has experience in Rust specifically but does not have much systems level experience. Grace was chosen as a supporting character because of that person's experience with C/C++ programming and to avoid repeating characters.
58
+
I chose Niklaus as the primary character in this story because this work was driven by someone who only uses programming for a small part of their work. Grace was chosen as a supporting character because of that person's experience with C/C++ programming and to avoid repeating characters.
59
59
60
60
### **How would this story have played out differently for the other characters?**
61
61
- Alan: there's a good chance he would have already had experience working with either async workflows in another language or doing parallelization of compute bound tasks; and so would already know from experience that `async` was not the right place to start.