Skip to content

Commit a5a76f7

Browse files
SS-JIAfacebook-github-bot
authored andcommitted
Introduce virtual_clone API to support view of view use cases + fix synchronization hazard with view tensors (#5753)
Summary: Pull Request resolved: #5753 ## Context This diff fixes some hazards (not necessarily) bugs with view tensors. ### `virtual_clone` API Consider the following sequence of calls which may be common in the view of view use case. ``` // Note the below does not reflect accurate API usage; it is mostly pseudocode. t1 = graph.add_tensor(...); // t2 will have the same metadata as t1 t2 = graph.add_tensor_view(t1); // t3 will also have the same metadata as t2 at this point. t3 = graph.add_tensor_view(t2); // t2 metadata will be updated correctly. t2 = add_transpose_view_node(t1, 0, 1, t2); // Unfortunately, this node will have an assumption that t3 has the same metadata as t2 to start. However, this is not true. // As a result, t3 will have incorrect metadata after this node. t3 = add_transpose_view_node(t2, 1, 2, t3); ``` To address this, the `virtual_clone` API is introduced which will allow view nodes to set the metadata of the output equal to the input before modifying the output. ### WAW synchronization hazards `vTensorStorage` maintains a `last_access` state which facilitates inserting the correct memory barriers for the underlying `vkImage` or `vkBuffer`. However, when we create a tensor view, `last_access` is not shared between `vTensor` instances that use the same resource. As as result, writing into a `vTensor` will not update the `last_access` of its views, and vice versa. Therefore, sebsequent accesses of the other tensor that references the same resource will result in a synchronization hazard. This diff fixes this hazard in a bit of a crude way; if the `vTensor` is a copy, or has copies, then cowardly assume that it has been written to before the current access so that appropriate memory barriers are inserted. This was the selected solution because I thought that adding a map to track last access of tensors that share resources is a bit overkill when the assumption that the underlying resource has been written to before the current access should hold most of the time. ghstack-source-id: 245517197 exported-using-ghexport Reviewed By: jorgep31415 Differential Revision: D63642092 fbshipit-source-id: 5ca76347946fa2b2f1b07709cbc7d9afa3aaf00d
1 parent 6ff52cc commit a5a76f7

File tree

4 files changed

+131
-8
lines changed

4 files changed

+131
-8
lines changed

backends/vulkan/runtime/api/containers/Tensor.cpp

Lines changed: 40 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -277,10 +277,11 @@ vTensorStorage::vTensorStorage(
277277
storage_type_,
278278
dtype,
279279
allocate_memory)),
280-
last_access_{} {}
280+
last_access_{},
281+
has_copies_{false} {}
281282

282283
vTensorStorage::vTensorStorage(
283-
const vTensorStorage& other,
284+
vTensorStorage& other,
284285
const int64_t buffer_offset)
285286
: context_(other.context_),
286287
storage_type_{other.storage_type_},
@@ -289,7 +290,10 @@ vTensorStorage::vTensorStorage(
289290
buffer_offset_{buffer_offset},
290291
image_(other.image_),
291292
buffer_(other.buffer_, buffer_offset),
292-
last_access_{other.last_access_} {}
293+
last_access_{other.last_access_},
294+
has_copies_{false} {
295+
other.has_copies_ = true;
296+
}
293297

294298
vTensorStorage::~vTensorStorage() {
295299
flush();
@@ -312,6 +316,21 @@ void vTensorStorage::transition(
312316
vkapi::PipelineStageFlags prev_stage = last_access_.stage;
313317
vkapi::MemoryAccessFlags prev_access = last_access_.access;
314318

319+
// If the underlying resource is a copy of another tensor's resource the
320+
// last_access may not be accurate, since the original storage may have been
321+
// written to as part of the original tensor. Likewise, if the underlying
322+
// resource has copies, then the resource may have been updated as part of the
323+
// view tensors.
324+
//
325+
// If the resource is a copy, or has copies of it, then cowardly assume that
326+
// it has previously been written to as part of a compute shader before the
327+
// current access event so that the appropriate memory barriers may be
328+
// inserted.
329+
if (is_copy() || has_copies_) {
330+
prev_stage = vkapi::PipelineStage::COMPUTE;
331+
prev_access = vkapi::kWrite;
332+
}
333+
315334
const bool prev_written = (prev_access & vkapi::MemoryAccessType::WRITE) != 0;
316335

317336
VkImageLayout cur_layout = VK_IMAGE_LAYOUT_UNDEFINED;
@@ -358,6 +377,13 @@ void vTensorStorage::transition(
358377
last_access_.access = cur_access;
359378
}
360379

380+
bool vTensorStorage::is_copy() const {
381+
if (storage_type_ == utils::kBuffer) {
382+
return buffer_.is_copy();
383+
}
384+
return image_.is_copy();
385+
}
386+
361387
bool vTensorStorage::is_copy_of(const vTensorStorage& other) const {
362388
if (storage_type_ == utils::kBuffer) {
363389
return buffer_.is_copy_of(other.buffer_);
@@ -418,7 +444,8 @@ vTensor::vTensor(
418444
}
419445
}
420446

421-
vTensor::vTensor(const vTensor& other)
447+
// NOLINTNEXTLINE
448+
vTensor::vTensor(vTensor& other)
422449
: dtype_(other.dtype_),
423450
// Copy tensor size metadata
424451
sizes_(other.sizes_.begin(), other.sizes_.end()),
@@ -443,7 +470,7 @@ vTensor::vTensor(const vTensor& other)
443470
storage_(other.storage_) {}
444471

445472
vTensor::vTensor(
446-
const vTensor& other,
473+
vTensor& other,
447474
const std::vector<int64_t>& sizes,
448475
const std::vector<int64_t>& dim_order,
449476
const int64_t offset_numel)
@@ -671,6 +698,14 @@ void vTensor::virtual_reconfigure(
671698
update_metadata();
672699
}
673700

701+
void vTensor::virtual_clone(const vTensor& other) {
702+
VK_CHECK_COND(is_view_of(other));
703+
sizes_ = other.sizes_;
704+
dim_order_ = other.dim_order_;
705+
axis_map_ = other.axis_map_;
706+
packed_dim_ = other.packed_dim_;
707+
}
708+
674709
void vTensor::virtual_resize(const std::vector<int64_t>& new_sizes) {
675710
VK_CHECK_COND(
676711
new_sizes.size() == dim_order_.size(),

backends/vulkan/runtime/api/containers/Tensor.h

Lines changed: 17 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -104,7 +104,7 @@ class vTensorStorage final {
104104
* because this behaviour is unsafe, since the original tensor may be
105105
* destroyed before the copy is destroyed.
106106
*/
107-
vTensorStorage(const vTensorStorage& other, const int64_t buffer_offset = 0);
107+
vTensorStorage(vTensorStorage& other, const int64_t buffer_offset = 0);
108108

109109
public:
110110
// To discourage creating copies, the assignment operator is still deleted.
@@ -134,6 +134,8 @@ class vTensorStorage final {
134134

135135
// Last Access - used to insert memory barriers
136136
LastAccess last_access_;
137+
// Indicates whether copies of this vTensorStorage have been made
138+
bool has_copies_;
137139

138140
private:
139141
// Registers underlying memory for cleanup
@@ -153,6 +155,11 @@ class vTensorStorage final {
153155
return image_.format();
154156
}
155157

158+
/*
159+
* Check if the underlying resource is a copy of another resource
160+
*/
161+
bool is_copy() const;
162+
156163
/*
157164
* Used for checking if this vTensorStorage is a copy of another instance
158165
*/
@@ -176,6 +183,8 @@ class vTensor final {
176183
const utils::GPUMemoryLayout memory_layout = utils::kChannelsPacked,
177184
const bool allocate_memory = true);
178185

186+
vTensor(const vTensor& other) = delete;
187+
179188
/*
180189
* This constructor allows for the creation of a vTensor that references the
181190
* same buffer resource of another vTensor, with the same sizes and strides
@@ -185,7 +194,7 @@ class vTensor final {
185194
* Once created, the sizes and strides of the aliased vTensor can be changed
186195
* using the `virtual_reconfigure` member function.
187196
*/
188-
vTensor(const vTensor& other);
197+
vTensor(vTensor& other);
189198

190199
/*
191200
* This constructor allows for the creation of a vTensor that references the
@@ -202,7 +211,7 @@ class vTensor final {
202211
* buffer.
203212
*/
204213
vTensor(
205-
const vTensor& other,
214+
vTensor& other,
206215
const std::vector<int64_t>& sizes,
207216
const std::vector<int64_t>& dim_order,
208217
const int64_t offset_numel = 0);
@@ -511,6 +520,11 @@ class vTensor final {
511520
const std::vector<int64_t>& new_sizes,
512521
const std::vector<int64_t>& new_dim_order);
513522

523+
/*
524+
* Set all metadata of this tensor to match the metadata of another tensor.
525+
*/
526+
void virtual_clone(const vTensor& other);
527+
514528
/*
515529
* Perform a virtual resize of the vTensor by modifying the size metadata that
516530
* gets used in compute shaders. This allows the shader to treat the

backends/vulkan/runtime/graph/ops/impl/Transpose.cpp

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -62,6 +62,8 @@ void add_transpose_view_node(
6262
const int64_t dim1 = graph.extract_scalar<int64_t>(dim1_ref);
6363

6464
check_transpose_view_args(graph, input_ref, dim0, dim1, out_ref);
65+
const vTensorPtr in = graph.get_tensor(input_ref);
66+
graph.get_tensor(out_ref)->virtual_clone(*in);
6567
graph.get_tensor(out_ref)->virtual_transpose(dim0, dim1);
6668

6769
graph.execute_nodes().emplace_back(new ExecuteNode(

backends/vulkan/test/vulkan_compute_api_test.cpp

Lines changed: 72 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -301,6 +301,35 @@ TEST_F(VulkanComputeAPITest, virtual_transpose_test) {
301301
}
302302
}
303303

304+
TEST_F(VulkanComputeAPITest, view_of_view_test) {
305+
constexpr int N = 3;
306+
constexpr int C = 5;
307+
constexpr int H = 17;
308+
constexpr int W = 19;
309+
310+
std::vector<int64_t> sizes = {N, C, H, W};
311+
312+
vTensor t1 = vTensor(
313+
context(), sizes, vkapi::kFloat, utils::kTexture3D, utils::kWidthPacked);
314+
315+
vTensor t2 = vTensor(t1);
316+
EXPECT_TRUE(t2.sizes() == sizes);
317+
vTensor t3 = vTensor(t2);
318+
EXPECT_TRUE(t2.sizes() == sizes);
319+
320+
t2.virtual_transpose(1, 2);
321+
std::vector<int64_t> expected_t2_sizes = {N, H, C, W};
322+
EXPECT_TRUE(t2.sizes() == expected_t2_sizes);
323+
324+
// Because t3 was created before t2's metadata was updated, we need to first
325+
// update t3's metadata to match t2's metadata. Then the transpose will yield
326+
// the correct metadata.
327+
t3.virtual_clone(t2);
328+
t3.virtual_transpose(2, 3);
329+
std::vector<int64_t> expected_t3_sizes = {N, H, W, C};
330+
EXPECT_TRUE(t3.sizes() == expected_t3_sizes);
331+
}
332+
304333
utils::ivec3 make_temp_ivec3(int x, int y, int z) {
305334
return utils::ivec3{x, y, z};
306335
}
@@ -1274,6 +1303,49 @@ TEST(VulkanComputeGraphTest, test_simple_graph_with_view) {
12741303
}
12751304
}
12761305

1306+
TEST(VulkanComputeGraphTest, test_graph_view_of_view) {
1307+
GraphConfig config;
1308+
config.set_storage_type_override(utils::kTexture3D);
1309+
ComputeGraph graph(config);
1310+
1311+
constexpr int N = 3;
1312+
constexpr int C = 5;
1313+
constexpr int H = 17;
1314+
constexpr int W = 19;
1315+
1316+
std::vector<int64_t> orig_sizes = {N, C, H, W};
1317+
1318+
// Test a common view of view usage pattern. In delegate execution, the values
1319+
// of the graph are created first; then operators are added. As a result,
1320+
// creating views of views is a bit tricky because metadata updates to a view
1321+
// does not update the metadata of the view's views. Nonetheless, view
1322+
// operators have an implicit assumption that the metadata of the output is
1323+
// equivalent to the metadata of the input. Therefore, view operators must
1324+
// account for unseen updates to the input view by first calling
1325+
// `virtual_clone()` to make the output equivalent to the input before.
1326+
// modifying metadata.
1327+
1328+
ValueRef t1 = graph.add_tensor(orig_sizes, vkapi::kFloat);
1329+
ValueRef t2 = graph.add_tensor_view(t1);
1330+
ValueRef t3 = graph.add_tensor_view(t2);
1331+
1332+
ValueRef channels = graph.add_scalar<int64_t>(1);
1333+
ValueRef height = graph.add_scalar<int64_t>(2);
1334+
ValueRef width = graph.add_scalar<int64_t>(3);
1335+
1336+
auto opFn = VK_GET_OP_FN("aten.transpose.int");
1337+
1338+
opFn(graph, {t1, channels, height, t2});
1339+
std::vector<int64_t> t2_sizes = graph.sizes_of(t2);
1340+
std::vector<int64_t> expected_t2_sizes = {N, H, C, W};
1341+
EXPECT_TRUE(t2_sizes == expected_t2_sizes);
1342+
1343+
opFn(graph, {t2, height, width, t3});
1344+
std::vector<int64_t> t3_sizes = graph.sizes_of(t3);
1345+
std::vector<int64_t> expected_t3_sizes = {N, H, W, C};
1346+
EXPECT_TRUE(t3_sizes == expected_t3_sizes);
1347+
}
1348+
12771349
TEST(VulkanComputeGraphTest, test_simple_graph) {
12781350
GraphConfig config;
12791351
ComputeGraph graph(config);

0 commit comments

Comments
 (0)