Commit 5950611

[ET-VK] Introduce copy constructor for vTensor to allow for zero-copy operators (#4791)
## Context

For buffer-backed tensors, orchestration operators such as slicing, transposition, views, etc. can be implemented by creating a new tensor that uses the same storage as another tensor, but with different metadata (i.e. sizes and strides). This diff implements copy constructors for the `Allocation`, `VulkanBuffer`, and `vTensor` classes which enable the aforementioned behaviour.

Class instances created from copy constructors do not own the underlying memory resource, hence the resource will not be freed upon destruction of the class instance. Note that this behaviour is similar to copying a pointer in C/C++, and is inherently unsafe because the original resource may be destroyed before the copy.

In practice this is not much of a concern, because tensors must be kept alive for the duration of inference, thus all tensors created during model inference will have the same lifetime. It does, however, pose a problem for memory planned tensors: from the memory planner's perspective the lifetime of the original tensor may be shorter than that of the aliased tensor, thus the shared memory may be overwritten by other tensors using the same allocation. **Therefore this behaviour is not yet safe to use when memory planning is enabled; additional work will be needed on the export side to make sure aliased tensors have the same lifetime as the original tensor**.

## Why not use shared_ptr?

In the past, this behaviour was enabled by `vTensor` instances storing their `vTensorStorage` classes via a `shared_ptr`. This was a safer design, since `shared_ptr` would handle resource management of the underlying buffer or texture resource. However, I decided not to go with the `shared_ptr` design because of the overhead involved in making a heap allocation whenever a `vTensor` is constructed, and the subsequent pointer chasing required whenever data is accessed from a `vTensor`.

It seemed too big a cost to pay, especially considering tensor aliasing only really makes sense for buffer-backed tensors (thus it is not expected to be a common occurrence). Also, as mentioned above, all created `vTensor` instances tend to have the same lifetime in practice, especially in the context of the `ComputeGraph` class. Finally, the `shared_ptr` design would still encounter the problem with memory planning.

Differential Revision: [D61417569](https://our.internmc.facebook.com/intern/diff/D61417569/)
[ghstack-poisoned]
Co-authored-by: Stephen Jia <[email protected]>
Pull Request resolved: #4769
1 parent 80b4a72 commit 5950611
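
The ownership rule described in the commit message can be illustrated with a small standalone sketch. It is not the code from this commit: `FakeResource`, `Handle`, and `frees_after_owner_and_alias` are hypothetical names, and a counter stands in for the real Vulkan/VMA free call. The idea shown is only that a copy shares the resource but is flagged so that its destructor does not release it.

```cpp
#include <cassert>

// Hypothetical stand-in for a GPU resource; counts how often it is "freed".
struct FakeResource {
  int free_count = 0;
};

// Hypothetical handle type, assumed for illustration only.
class Handle {
 public:
  explicit Handle(FakeResource* res) : res_(res), is_copy_(false) {}

  // Aliasing copy: shares the resource but does not take ownership.
  Handle(const Handle& other) : res_(other.res_), is_copy_(true) {}

  // Assignment stays deleted to discourage casual copying.
  Handle& operator=(const Handle&) = delete;

  // Moving transfers whatever ownership status the source had.
  Handle(Handle&& other) noexcept
      : res_(other.res_), is_copy_(other.is_copy_) {
    other.res_ = nullptr;
  }

  ~Handle() {
    // Only the original (non-copy) instance releases the resource.
    if (res_ != nullptr && !is_copy_) {
      res_->free_count++;
    }
  }

  bool is_copy() const {
    return is_copy_;
  }

 private:
  FakeResource* res_;
  bool is_copy_;
};

// Creates an owner and an alias, lets both go out of scope, and returns how
// many times the resource was freed.
int frees_after_owner_and_alias() {
  FakeResource res;
  {
    Handle owner(&res);
    Handle alias(owner);
    assert(alias.is_copy() && !owner.is_copy());
  }
  return res.free_count;
}
```

Here the alias is destroyed before the owner, which is the safe order. Nothing in the sketch prevents the opposite order, which is the hazard the commit message compares to copying a raw pointer.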

12 files changed: +544 -9 lines changed

backends/vulkan/runtime/api/containers/Tensor.cpp

Lines changed: 81 additions & 0 deletions
@@ -13,6 +13,44 @@
 namespace vkcompute {
 namespace api {

+/*
+ * Given the strides of a buffer-backed tensor, find the index of the "fastest
+ * moving" dimension in WHCN dimension order. If multiple dims have the lowest
+ * stride, then the "earlier" dim is assumed to be the fastest moving (width is
+ * "earlier" than height).
+ */
+int32_t find_fastest_whcn_dim(const std::vector<int64_t>& strides) {
+  if (strides.size() == 0) {
+    return 0;
+  }
+  int32_t fastest_dim = 0;
+  int64_t min_stride = strides.at(0);
+  for (int d = strides.size() - 1; d >= 0; --d) {
+    if (strides.at(d) < min_stride) {
+      fastest_dim = d;
+      min_stride = strides.at(d);
+    }
+  }
+  return (strides.size() - 1 - fastest_dim);
+}
+
+/*
+ * Given the strides of a buffer-backed tensor, estimate the equivalent memory
+ * layout enum value by identifying the fastest moving dimension.
+ */
+utils::GPUMemoryLayout estimate_memory_layout(
+    const std::vector<int64_t>& strides) {
+  int32_t fastest_dim = find_fastest_whcn_dim(strides);
+  if (fastest_dim <= 3) {
+    return utils::GPUMemoryLayout(fastest_dim);
+  }
+
+  // TODO(ssjia) find a way to gracefully recover from this case, e.g. by adding
+  // an UNKNOWN GPUMemoryLayout. This is not high priority though because we don't
+  // expect this to ever come up in practice.
+  VK_THROW("No compatible GPUMemoryLayout value");
+}
+
 std::vector<int64_t> calculate_strides(
     const std::vector<int64_t>& sizes,
     const utils::GPUMemoryLayout memory_layout) {
@@ -166,6 +204,34 @@ vTensor::vTensor(
   }
 }

+vTensor::vTensor(
+    const vTensor& other,
+    const std::vector<int64_t>& sizes,
+    const std::vector<int64_t>& strides,
+    const size_t offset_numel)
+    : dtype_(other.dtype_),
+      memory_layout_(estimate_memory_layout(strides)),
+      // Copy tensor size metadata
+      sizes_(sizes.begin(), sizes.end()),
+      strides_(strides.begin(), strides.end()),
+      numel_(utils::multiply_integers(sizes_)),
+      padded_sizes_{calculate_padded_sizes(sizes, memory_layout_)},
+      unsqueezed_strides_{unsqueeze_strides(strides_, numel_)},
+      padded_numel_(utils::multiply_integers(padded_sizes_)),
+      texture_limits_{{0, 0, 0}},
+      // Empty initialize Utility Uniform Buffers
+      sizes_uniform_(),
+      strides_uniform_(),
+      numel_uniform_(),
+      texture_limits_uniform_(),
+      // Copy Tensor storage
+      storage_(other.storage_, vkapi::element_size(dtype_) * offset_numel) {
+  VK_CHECK_COND(
+      offset_numel + numel_ <= other.numel(),
+      "Tensor alias cannot access more elements than available in the original "
+      "tensor");
+}
+
 vkapi::VulkanImage& vTensor::image(
     vkapi::PipelineBarrier& pipeline_barrier,
     const vkapi::PipelineStageFlags stage) & {
@@ -428,6 +494,21 @@ vTensorStorage::vTensorStorage(
           allocate_memory)),
       last_access_{} {}

+vTensorStorage::vTensorStorage(
+    const vTensorStorage& other,
+    const size_t buffer_offset)
+    : context_(other.context_),
+      storage_type_{other.storage_type_},
+      image_extents_(other.image_extents_),
+      buffer_length_{other.buffer_length_},
+      image_(),
+      buffer_(other.buffer_, buffer_offset),
+      last_access_{other.last_access_} {
+  if (other.storage_type_ != utils::kBuffer) {
+    VK_THROW("Tensors with texture storage cannot be copied!");
+  }
+}
+
 vTensorStorage::~vTensorStorage() {
   flush();
 }
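
The tie-breaking rule in the comment on `find_fastest_whcn_dim` ("width is earlier than height") can be checked with a standalone sketch. This is a variant written for illustration, not the function from the diff: `fastest_whcn_dim_sketch` is a hypothetical name, and it seeds the search from the innermost (last) dimension. As far as I can trace, the version in the diff seeds from dimension 0, so a tie that includes dimension 0 (for example strides `{1, 1}`) appears to resolve to the outermost dimension rather than to width; treat that reading as an assumption to verify.

```cpp
#include <cstdint>
#include <vector>

// Sketch only. `strides` is assumed to be in NCHW order (outermost first).
// Returns the WHCN index (0 = width) of the dimension with the smallest
// stride. On ties, the innermost dimension wins, i.e. width beats height.
int32_t fastest_whcn_dim_sketch(const std::vector<int64_t>& strides) {
  if (strides.empty()) {
    return 0;
  }
  const int32_t ndim = static_cast<int32_t>(strides.size());
  // Seed with the innermost dimension so that equal strides never displace it.
  int32_t fastest = ndim - 1;
  int64_t min_stride = strides[fastest];
  for (int32_t d = ndim - 2; d >= 0; --d) {
    if (strides[d] < min_stride) {
      fastest = d;
      min_stride = strides[d];
    }
  }
  // Convert the NCHW position into a WHCN index.
  return ndim - 1 - fastest;
}
```

For a contiguous CHW tensor of sizes `{2, 3, 4}` the strides are `{12, 4, 1}` and the sketch reports index 0 (width). For a channels-fastest arrangement of the same tensor, strides `{1, 8, 2}`, it reports index 2 (channels).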

backends/vulkan/runtime/api/containers/Tensor.h

Lines changed: 29 additions & 1 deletion
@@ -87,7 +87,19 @@ class vTensorStorage final {
       const vkapi::ScalarType dtype,
       const bool allocate_memory = true);

-  vTensorStorage(const vTensorStorage& other) = delete;
+ protected:
+  /*
+   * This allows for creation of tensors that use the same underlying storage
+   * as another tensor. Note that this functionality is currently enabled for
+   * tensors that have buffer storage only. The created tensor will not have
+   * ownership of the underlying VkBuffer. This constructor is marked protected
+   * because this behaviour is unsafe, since the original tensor may be
+   * destroyed before the copy is destroyed.
+   */
+  vTensorStorage(const vTensorStorage& other, const size_t buffer_offset = 0);
+
+ public:
+  // To discourage creating copies, the assignment operator is still deleted.
   vTensorStorage& operator=(const vTensorStorage& other) = delete;

   vTensorStorage(vTensorStorage&& other) = default;
@@ -158,6 +170,22 @@ class vTensor final {
   vTensor(const vTensor& other) = delete;
   vTensor& operator=(const vTensor& other) = delete;

+  /*
+   * This constructor allows for the creation of a vTensor that references the
+   * same buffer resource of another vTensor, but with different sizes and
+   * strides metadata. The created vTensor will not own the underlying
+   * resource. This is only applicable for buffer backed tensors at the moment.
+   *
+   * The offset_numel argument allows the aliased tensor's memory region to
+   * begin at an offset of N elements from the start of the original tensor's
+   * buffer.
+   */
+  vTensor(
+      const vTensor& other,
+      const std::vector<int64_t>& sizes,
+      const std::vector<int64_t>& strides,
+      const size_t offset_numel = 0);
+
   vTensor(vTensor&& other) = default;
   vTensor& operator=(vTensor&& other) = default;

backends/vulkan/runtime/graph/ComputeGraph.cpp

Lines changed: 11 additions & 0 deletions
@@ -203,6 +203,17 @@ ValueRef ComputeGraph::add_tensor(
       sizes, dtype, suggested_memory_layout(sizes), shared_object_idx);
 }

+ValueRef ComputeGraph::add_tensor_view(
+    const ValueRef vref,
+    const std::vector<int64_t>& sizes,
+    const std::vector<int64_t>& strides,
+    const size_t offset_numel) {
+  const vTensorPtr t = get_tensor(vref);
+  ValueRef idx(static_cast<int>(values_.size()));
+  values_.emplace_back(api::vTensor(*t, sizes, strides, offset_numel));
+  return idx;
+}
+
 ValueRef ComputeGraph::add_tensorref(
     const std::vector<int64_t>& sizes,
     const vkapi::ScalarType dtype,

backends/vulkan/runtime/graph/ComputeGraph.h

Lines changed: 11 additions & 0 deletions
@@ -351,6 +351,17 @@ class ComputeGraph final {
       const ValueRef vref,
       const utils::GPUMemoryLayout memory_layout);

+  /*
+   * Use the copy constructor of `api::vTensor` to create a "view" of the
+   * `vTensor` value at `vref`. See the copy constructor of `api::vTensor` for
+   * more details.
+   */
+  ValueRef add_tensor_view(
+      const ValueRef vref,
+      const std::vector<int64_t>& sizes,
+      const std::vector<int64_t>& strides,
+      const size_t offset_numel = 0);
+
   /*
    * Add a `TensorRef` value to the graph with the specific properties. A
    * `TensorRef` is a reference to a `api::vTensor` whose data is stored in an

backends/vulkan/runtime/vk_api/memory/Allocation.cpp

Lines changed: 18 additions & 4 deletions
@@ -29,7 +29,8 @@ Allocation::Allocation()
     : memory_requirements{},
       create_info{},
       allocator(VK_NULL_HANDLE),
-      allocation(VK_NULL_HANDLE) {}
+      allocation(VK_NULL_HANDLE),
+      is_copy_(false) {}

 Allocation::Allocation(
     VmaAllocator vma_allocator,
@@ -38,16 +39,25 @@ Allocation::Allocation(
     : memory_requirements(mem_props),
       create_info(create_info),
       allocator(vma_allocator),
-      allocation(VK_NULL_HANDLE) {
+      allocation(VK_NULL_HANDLE),
+      is_copy_(false) {
   VK_CHECK(vmaAllocateMemory(
       allocator, &memory_requirements, &create_info, &allocation, nullptr));
 }

+Allocation::Allocation(const Allocation& other) noexcept
+    : memory_requirements(other.memory_requirements),
+      create_info(other.create_info),
+      allocator(other.allocator),
+      allocation(other.allocation),
+      is_copy_(true) {}
+
 Allocation::Allocation(Allocation&& other) noexcept
     : memory_requirements(other.memory_requirements),
       create_info(other.create_info),
       allocator(other.allocator),
-      allocation(other.allocation) {
+      allocation(other.allocation),
+      is_copy_(other.is_copy_) {
   other.allocation = VK_NULL_HANDLE;
 }

@@ -58,14 +68,18 @@ Allocation& Allocation::operator=(Allocation&& other) noexcept {
   create_info = other.create_info;
   allocator = other.allocator;
   allocation = other.allocation;
+  is_copy_ = other.is_copy_;

   other.allocation = tmp_allocation;

   return *this;
 }

 Allocation::~Allocation() {
-  if (VK_NULL_HANDLE != allocation) {
+  // Do not destroy the VmaAllocation if this class instance is a copy of some
+  // other class instance, since this means that this class instance does not
+  // have ownership of the underlying resource.
+  if (VK_NULL_HANDLE != allocation && !is_copy_) {
     vmaFreeMemory(allocator, allocation);
   }
 }

backends/vulkan/runtime/vk_api/memory/Allocation.h

Lines changed: 29 additions & 1 deletion
@@ -31,7 +31,23 @@ struct Allocation final {
       const VkMemoryRequirements&,
       const VmaAllocationCreateInfo&);

-  Allocation(const Allocation&) = delete;
+ protected:
+  /*
+   * The copy constructor allows for creation of class instances that are
+   * "aliases" of another class instance. The resulting class instance will not
+   * have ownership of the underlying VmaAllocation.
+   *
+   * This behaviour is analogous to creating a copy of a pointer, thus it is
+   * unsafe, as the original class instance may be destroyed before the copy.
+   * These constructors are therefore marked protected so that they may be used
+   * only in situations where the lifetime of the original class instance is
+   * guaranteed to exceed, or at least be the same as, the lifetime of the
+   * copied class instance.
+   */
+  Allocation(const Allocation&) noexcept;
+
+ public:
+  // To discourage creating copies, the assignment operator is still deleted.
   Allocation& operator=(const Allocation&) = delete;

   Allocation(Allocation&&) noexcept;
@@ -47,9 +63,21 @@ struct Allocation final {
   // Handles to the allocated memory
   VmaAllocation allocation;

+ private:
+  // Indicates whether this class instance is a copy of another class instance,
+  // in which case it does not have ownership of the underlying VmaAllocation
+  bool is_copy_;
+
+ public:
   operator bool() const {
     return (allocation != VK_NULL_HANDLE);
   }
+
+  inline bool is_copy() const {
+    return is_copy_;
+  }
+
+  friend class VulkanBuffer;
 };

 } // namespace vkapi

backends/vulkan/runtime/vk_api/memory/Buffer.cpp

Lines changed: 25 additions & 1 deletion
@@ -20,6 +20,7 @@ VulkanBuffer::VulkanBuffer()
       allocator_(VK_NULL_HANDLE),
       memory_{},
       owns_memory_(false),
+      is_copy_(false),
       handle_(VK_NULL_HANDLE) {}

 VulkanBuffer::VulkanBuffer(
@@ -37,6 +38,7 @@ VulkanBuffer::VulkanBuffer(
       allocator_(vma_allocator),
       memory_{},
       owns_memory_(allocate_memory),
+      is_copy_(false),
       handle_(VK_NULL_HANDLE) {
   // If the buffer size is 0, allocate a buffer with a size of 1 byte. This is
   // to ensure that there will be some resource that can be bound to a shader.
@@ -74,11 +76,29 @@ VulkanBuffer::VulkanBuffer(
   }
 }

+VulkanBuffer::VulkanBuffer(
+    const VulkanBuffer& other,
+    const VkDeviceSize offset,
+    const VkDeviceSize range) noexcept
+    : buffer_properties_(other.buffer_properties_),
+      allocator_(other.allocator_),
+      memory_(other.memory_),
+      owns_memory_(other.owns_memory_),
+      is_copy_(true),
+      handle_(other.handle_) {
+  // TODO: set the offset and range appropriately
+  buffer_properties_.mem_offset = other.buffer_properties_.mem_offset + offset;
+  if (range != VK_WHOLE_SIZE) {
+    buffer_properties_.mem_range = range;
+  }
+}
+
 VulkanBuffer::VulkanBuffer(VulkanBuffer&& other) noexcept
     : buffer_properties_(other.buffer_properties_),
       allocator_(other.allocator_),
       memory_(std::move(other.memory_)),
       owns_memory_(other.owns_memory_),
+      is_copy_(other.is_copy_),
       handle_(other.handle_) {
   other.handle_ = VK_NULL_HANDLE;
 }
@@ -91,6 +111,7 @@ VulkanBuffer& VulkanBuffer::operator=(VulkanBuffer&& other) noexcept {
   allocator_ = other.allocator_;
   memory_ = std::move(other.memory_);
   owns_memory_ = other.owns_memory_;
+  is_copy_ = other.is_copy_;
   handle_ = other.handle_;

   other.handle_ = tmp_buffer;
@@ -100,7 +121,10 @@ VulkanBuffer& VulkanBuffer::operator=(VulkanBuffer&& other) noexcept {
 }

 VulkanBuffer::~VulkanBuffer() {
-  if (VK_NULL_HANDLE != handle_) {
+  // Do not destroy the VkBuffer if this class instance is a copy of another
+  // class instance, since this means that this class instance does not have
+  // ownership of the underlying resource.
+  if (VK_NULL_HANDLE != handle_ && !is_copy_) {
     if (owns_memory_) {
       vmaDestroyBuffer(allocator_, handle_, memory_.allocation);
     } else {