-
Notifications
You must be signed in to change notification settings - Fork 788
[SYCL][Docs] Add sycl_ext_oneapi_virtual_mem extension and implementation #8954
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
10c344e
81703b6
ce0829c
0fef874
294dd96
053cbdc
1129750
7b1b5ec
ac8843f
f8fad93
0a6e5f9
d0b3229
27c7200
92fe6f5
d2c92f0
d6745aa
eb21fc3
f268d37
5f5b2f1
1f2d527
99b7e9c
14e64af
d88e141
0df58a4
f7004f1
ffb2982
a7b067d
bd3a7e7
c75d8ee
4fdf659
c1f51e5
8c8955a
c763606
452ff19
d3f58d5
9cb8dbe
2db3ac8
17a1f3b
7e2844c
99ee1f8
6d8aede
8c5b692
62f5dba
358b083
be0f060
391116c
6b62bec
2385c0b
633184e
d1d129f
77d0232
8278db7
3eb4f9b
3acb43a
58ab3cb
2f5638b
2e8b031
5d889c3
41cb1e6
d7f720e
1045a5c
3c05124
9d3529d
93d7368
fca62eb
47ba688
5e72a00
219ad30
3c61360
7c87219
81286e4
8b21ef4
1e1fe34
7dcb46c
5e49a0a
7c882dd
0a564f1
394f8ed
aff695d
08553af
b7f91ae
331fa45
150b9dc
538af56
0f29473
2879161
d812df0
c7d22eb
d14ea5f
05f25b1
586b3e2
774a5a5
337634f
851153c
3996b52
4215fa2
a5739f1
3cc68a4
260ab05
21397f0
09fb2fe
08bbd83
d3ae658
d75bacd
e7b2635
3d8261e
9c252ed
55f015a
6ea3b4e
0826458
9b7282b
1916da3
1fa82c0
997cc39
3be01f4
743eb2e
b34d068
93db42b
eeaaad0
e8c98b5
e5483be
290401c
6630795
5c7330d
187b0f7
8dd8cb3
327687b
639aab3
42cbcde
95cdcbb
56ff29c
24dea7a
f85146f
6df0773
0bd2ce3
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,330 @@ | ||
= sycl_ext_oneapi_virtual_mem | ||
|
||
:source-highlighter: coderay | ||
:coderay-linenums-mode: table | ||
|
||
// This section needs to be after the document title. | ||
:doctype: book | ||
:toc2: | ||
:toc: left | ||
:encoding: utf-8 | ||
:lang: en | ||
:dpcpp: pass:[DPC++] | ||
|
||
// Set the default source code type in this document to C++, | ||
// for syntax highlighting purposes. This is needed because | ||
// docbook uses c++ and html5 uses cpp. | ||
:language: {basebackend@docbook:c++:cpp} | ||
|
||
|
||
== Notice | ||
|
||
[%hardbreaks] | ||
Copyright (C) 2023-2023 Intel Corporation. All rights reserved. | ||
|
||
Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks | ||
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by | ||
permission by Khronos. | ||
|
||
|
||
== Contact | ||
|
||
To report problems with this extension, please open a new issue at: | ||
|
||
https://github.com/intel/llvm/issues | ||
|
||
|
||
== Dependencies | ||
|
||
This extension is written against the SYCL 2020 revision 6 specification. All | ||
steffenlarsen marked this conversation as resolved.
Show resolved
Hide resolved
|
||
references below to the "core SYCL specification" or to section numbers in the | ||
SYCL specification refer to that revision. | ||
|
||
|
||
== Status | ||
|
||
This is an experimental extension specification, intended to provide early | ||
access to features and gather community feedback. Interfaces defined in this | ||
specification are implemented in {dpcpp}, but they are not finalized and may | ||
change incompatibly in future versions of {dpcpp} without prior notice. | ||
*Shipping software products should not rely on APIs defined in this | ||
specification.* | ||
|
||
|
||
== Backend support status | ||
|
||
The APIs in this extension may be used only on a device that has | ||
`aspect::ext_oneapi_virtual_mem`. The application must check that the devices | ||
in the corresponding context have this aspect before using any of the APIs | ||
introduced in this extension. If the application fails to do this, the | ||
implementation throws a synchronous exception with the | ||
`errc::feature_not_supported` error code. | ||
|
||
== Overview | ||
|
||
This extension adds the notion of "virtual memory ranges" to SYCL, introducing | ||
a way to map an address range onto multiple allocations of physical memory, | ||
allowing users to avoid expensive reallocations and potentially running out of | ||
device memory while relocating the corresponding memory. | ||
gmlueck marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
|
||
== Specification | ||
|
||
=== Feature test macro | ||
|
||
This extension provides a feature-test macro as described in the core SYCL | ||
specification. An implementation supporting this extension must predefine the | ||
macro `SYCL_EXT_ONEAPI_VIRTUAL_MEM` to one of the values defined in the table | ||
below. Applications can test for the existence of this macro to determine if | ||
the implementation supports this feature, or applications can test the macro's | ||
value to determine which of the extension's features the implementation | ||
supports. | ||
|
||
[%header,cols="1,5"] | ||
|=== | ||
|Value | ||
|Description | ||
|
||
|1 | ||
|The APIs of this experimental extension are not versioned, so the | ||
feature-test macro always has this value. | ||
|=== | ||
|
||
|
||
gmlueck marked this conversation as resolved.
Show resolved
Hide resolved
|
||
=== Memory granularity | ||
|
||
Working with virtual address ranges and the underlying physical memory requires | ||
KseniyaTikhomirova marked this conversation as resolved.
Show resolved
Hide resolved
|
||
the user to align and adjust in accordance with a specified minimum granularity. | ||
In addition, devices can have a recommended granularity which may different from | ||
the minimum granularity and can be used instead of the minimum granularity. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I would rephrase this to make it clearer. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tried rephrasing it. Is this what you had in mind? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. maybe: -> In addition, adapters may return a recommended granularity to potentially achieve higher performance. Distinction between minimum and recommended is adapters-specific and may vary between devices. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I like that. I've changed it a little to refer to common SYCL concepts but otherwise I'm good with the message. Thanks! |
||
|
||
The interfaces for querying the these granularities are defined as: | ||
steffenlarsen marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
```c++ | ||
namespace sycl::ext::oneapi::experimental { | ||
|
||
size_t get_minimum_mem_granularity(size_t numBytes, const device &syclDevice, const context &syclContext); | ||
size_t get_minimum_mem_granularity(size_t numBytes, const queue &syclQueue); | ||
size_t get_minimum_mem_granularity(size_t numBytes, const physical_mem &syclPhysicalMem); | ||
|
||
size_t get_recommended_mem_granularity(size_t numBytes, const device &syclDevice, const context &syclContext); | ||
size_t get_recommended_mem_granularity(size_t numBytes, const queue &syclQueue); | ||
size_t get_recommended_mem_granularity(size_t numBytes, const physical_mem &syclPhysicalMem); | ||
|
||
steffenlarsen marked this conversation as resolved.
Show resolved
Hide resolved
|
||
} // namespace sycl::ext::oneapi::experimental | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there a guarantee that the returned granularity is >= the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't believe we can make any such guarantees. The granularity is some value the user must align both the pointer and the size based on. As an example, consider a backend/device that always returns 1024 (note: CUDA doesn't care about There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This makes sense. I think we should just clarify this for There's something about Another weird thing is the word "minimum". This word implies that the application could also choose a larger granularity, but that's not the case. Presumably, the device supports a fixed set of granularities, and the application must choose one of them. This makes me wonder if the API should instead just return a list of all the supported granularities like:
If the application wants the minimum one, they can just use the first element in the returned vector. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Currently, the L0 interface corresponding to the granularity query (zeVirtualMemQueryPageSize) does not differentiate between recommended and minimum, but does take the size. Conversely, the CUDA query (cuMemGetAllocationGranularity) doesn't have the size argument, but has both a minimum and recommended mode. Maybe we can remove the size from the minimum query by passing As for being able to return a list of valid granularities, I don't see how we can do that with the current L0 interfaces. For CUDA would have one or two elements (minimum and recommended, or one if they are the same.) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. right, @steffenlarsen . cuMemGetAllocationGranularity doesnt have a size parameter, so what that API does is: "Here's the minimum and recommended granularities, please adjust your requested size to it" While zeVirtualMemQueryPageSize says: "For the size you want, here's the minimum granularity you should use for functionality and performance". So the semantics of both APIs is different: CUDA's always returns the same numbers for a type of allocation, user needs to adjust the size, L0's already returns the granularity adjusted to the size. Now, the SYCL APIs proposed here are accepting a size, represented by numBytes sycl::get_recommended_mem_granularity(size_t numBytes...) So I guess there's an expectation that the granularity returned by sycl::get_recommended_mem_granularity should take into account that size, which is what L0 is doing. So I dont think we should pass 1 to L0 API. What I think we should do is to modify cuda_piextVirtualMemGranularityGetInfo to not ignore the mem_size parameter, and instead, returned the granularity based on that size, something like: pi_result cuda_piextVirtualMemGranularityGetInfo(
pi_context context, pi_device device, size_t mem_size,
pi_virtual_mem_granularity_info param_name, size_t param_value_size,
void *param_value, size_t *param_value_size_ret) {
...
size_t granularity;
result = PI_CHECK_ERROR(
cuMemGetAllocationGranularity(&granularity, &alloc_props, flags));
granularity = ROUND_UP(mem_size, granularity);
... Now, if the intention of sycl::get_recommended_mem_granularity is to return a set of granularities, and have the user adjust the size, then current implementation in CUDA backend is ok, and for L0, a size of 1 could be passed. So the main question here is : what is the intention of sycl::get_recommended_mem_granularity? is it to return the granularity based on the size passed (which is what L0 does, and for which we will need changes in the CUDA backend) or to ignore the size and return standard granularities (on which case it would be better to remove num_bytes from sycl::get_recommended_mem_granularity and to pass 1 to L0 API). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This latest change by @steffenlarsen aligns the SYCL API with CUDA, which makes migration easy, so that's good. Does this cause us to lose some performance on Level Zero, though? Let's say the user wants a moderately big (1Mb) address range. With the current API, we'll call Level Zero There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. An option is to add another PI function asking the backend to align for us. For CUDA we would just be using the recommended granularity and then L0 could work its magic. It means we would have somewhat similar APIs, but we get the best of both worlds as new users could just leverage this instead of doing their own aligning while people translating code have their 1:1 mapping in the existing functions. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think there's any value in adding an API that just applies the alignment to the user's size. It's easy enough for the application to do that themselves. I'm wondering if Level Zero chooses a different recommended alignment for big vs. small sizes, for example. As a purely hypothetical example, let's say the h/w supports both small and big page sizes. In such a case, it would be better to allocate small data blocks using small pages and large data blocks using big pages. However, each page size would have a different alignment requirement. Is that what's going on with the Level Zero API? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This conversation seems to have stalled out waiting for a response from someone on the Level Zero team. Removing the "size" parameter to I'm just a little worried that there will be some negative impact if we always pass a size of "1" to the Level Zero There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For PVC and ATS-M, zeVirtualMemQueryPageSize will return 64KBytes for any size less than 2MBytes, and will return 2MBytes for any size equal to or greater than 2MBytes. Given this, entering size of 1 should probably not be an issue. |
||
``` | ||
|
||
[frame="topbot",options="header,footer"] | ||
|===================== | ||
|Function |Description | ||
|
||
|`size_t get_minimum_mem_granularity(size_t numBytes, const device &syclDevice, const context &syclContext)` | | ||
Returns the minimum granularity of physical and virtual memory allocations of | ||
byte size `numBytes`. | ||
|
||
If `syclDevice` does not have `aspect::ext_oneapi_virtual_mem` the call throws | ||
an exception with `errc::feature_not_supported`. | ||
|
||
|`size_t get_minimum_mem_granularity(size_t numBytes, const queue &syclQueue)` | | ||
Same as `get_minimum_mem_granularity(numBytes, syclQueue.get_device(), syclQueue.get_context())`. | ||
|
||
|`size_t get_minimum_mem_granularity(size_t numBytes, const physical_mem &syclPhysicalMem)` | | ||
Same as `get_minimum_mem_granularity(numBytes, syclPhysicalMem.get_device(), syclPhysicalMem.get_context())`. | ||
|
||
|`size_t get_recommended_mem_granularity(size_t numBytes, const device &syclDevice, const context &syclContext)` | | ||
Returns the recommended granularity of physical and virtual memory allocations | ||
of byte size `numBytes`. | ||
|
||
If `syclDevice` does not have `aspect::ext_oneapi_virtual_mem` the call throws | ||
an exception with `errc::feature_not_supported`. | ||
|
||
|`size_t get_recommended_mem_granularity(size_t numBytes, const queue &syclQueue)` | | ||
Same as `get_recommended_mem_granularity(numBytes, syclQueue.get_device(), syclQueue.get_context())`. | ||
|
||
|`size_t get_recommended_mem_granularity(size_t numBytes, const physical_mem &syclPhysicalMem)` | | ||
Same as `get_recommended_mem_granularity(numBytes, syclPhysicalMem.get_device(), syclPhysicalMem.get_context())`. | ||
|
||
|===================== | ||
|
||
=== Reserving virtual address ranges | ||
|
||
Virtual address ranges are represented by a pointer and a number of bytes | ||
gmlueck marked this conversation as resolved.
Show resolved
Hide resolved
|
||
reserved for it. The pointer must be aligned in accordance with the minimum | ||
granularity, as queried through `get_minimum_mem_granularity`, and likewise the | ||
number of bytes must be a multiple of this granularity. It is the responsibility | ||
of the user to manage the constituents of any virtual address range they | ||
reserve. | ||
|
||
The interfaces for reserving, freeing, and manipulating the access mode of a | ||
virtual address range are defined as: | ||
|
||
```c++ | ||
namespace sycl::ext::oneapi::experimental { | ||
|
||
void *reserve_virtual_mem(const void *start, size_t numBytes, const context &syclContext); | ||
void *reserve_virtual_mem(size_t numBytes, const context &syclContext); | ||
|
||
void free_virtual_mem(const void* ptr, size_t numBytes, const context &syclContext); | ||
|
||
void set_access_mode(const void *ptr, size_t numBytes, access_mode mode, const context &syclContext); | ||
|
||
void set_inaccessible(const void *ptr, size_t numBytes, const context &syclContext); | ||
|
||
std::optional<access_mode> get_access_mode(const void *ptr, size_t numBytes, const context &syclContext); | ||
|
||
} // namespace sycl::ext::oneapi::experimental | ||
``` | ||
gmlueck marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
[frame="topbot",options="header,footer"] | ||
|===================== | ||
|Function |Description | ||
|
||
|`void *reserve_virtual_mem(const void *start, size_t numBytes, const context &syclContext)` | | ||
Reserves a virtual memory range in `syclContext` with `numBytes` bytes. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I just realized that there is no If we decide there is not a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The CUDA interface has the restriction
while the Level Zero interface mentions the page size (used here as the granularity) as
Neither takes a device, despite the granularity queries taking a device in both interfaces. I am not sure if either backend will ever return different minimums, I suspect the actual requirement for the alignment and size comes into play when you map them onto physical memory, which are allocated on specific devices. Depending on how we should read the Level Zero requirement here, we could rephrase the reservation interface requirement to be that it must be aligned in accordance with the granularity of any device it will be mapped to. Arguably, this is more of an implicit requirement from the map function though. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since neither the Level Zero nor the CUDA API takes a device handle, I assume that both APIs must be reserving the address range in all devices contained by the context. Would you agree? In that case, wouldn't it make sense to remove the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would be okay with it, but what would it do if the devices report different granularity? I assume the best solution would be to try and find the smallest value that is a multiple of all the reported granularities. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let's ask @jandres742 about the Level Zero API. The documentation for zeVirtualMemReserve says:
However, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this a reasonable restriction to have for other backends? Are page sizes on the Intel GPUs always a multiple of the host page size anyways? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @gmlueck :
Both, as as mentioned in the spec: "The starting address and size must be page aligned. See zeVirtualMemQueryPageSize." "The virtual start address and size must be page aligned. See zeVirtualMemQueryPageSize." There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This conversation never got resolved. I think the core problem is that the following statement is unclear:
The function I think the solution might be to remove the However, this probably requires a change to Level Zero because There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Examining Level Zero code: the zeVirtualMemQueryPageSize() API calls the following internal function: ze_result_t ContextImp::queryVirtualMemPageSize(ze_device_handle_t hDevice, So, the hDevice handle is not used. Going down the call tree, the following is eventually called: size_t DrmMemoryManager::selectAlignmentAndHeap(size_t size, HeapIndex *heap) {
So all detected devices are cycled in this loop. I verified this to be the case using a board with 4x ATS-M devices and verified that all 4 devices were looped. Based upon this, only one call to zeVirtualMemQueryPageSize() should be required for the given driver. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Any chance we can change the parameters of |
||
|
||
`start` specifies the requested start of the new virtual memory range | ||
reservation. If the implementation is unable to reserve the virtual memory range | ||
at the specified address, the implementation will pick another suitable address. | ||
|
||
`start` must be aligned in accordance with the minimum granularity, as returned | ||
by a call to `get_minimum_mem_granularity`. Likewise, `numBytes` must be a | ||
multiple of the granularity. Attempting to call this function without meeting | ||
these requirements results in undefined behavior. | ||
|
||
If any of the devices in `syclContext` does not have | ||
steffenlarsen marked this conversation as resolved.
Show resolved
Hide resolved
|
||
`aspect::ext_oneapi_virtual_mem` the call throws an exception with | ||
`errc::feature_not_supported`. | ||
|
||
|`void *reserve_virtual_mem(size_t numBytes, const device &syclDevice, const context &syclContext)` | | ||
Same as `reserve_virtual_mem(nullptr, numBytes, syclDevice, syclContext)`. | ||
|
||
|`void free_virtual_mem(const void* ptr, size_t numBytes, const context &syclContext)` | | ||
Frees a virtual memory range specified by `ptr` and `numBytes`. `ptr` must be | ||
the same as returned by a call to `reserve_virtual_mem` and `numBytes` must be | ||
the same as the size of the range specified in the reservation call. | ||
|
||
|`void set_access_mode(const void *ptr, size_t numBytes, access_mode mode, const context &syclContext)` | | ||
Sets the access mode of a virtual memory range specified by `ptr` and | ||
`numBytes`. `mode` must either be `access_mode::read` or | ||
`access_mode::read_write`. | ||
|
||
Writing to any address in the virtual memory range with access mode set to | ||
`access_mode::read` results in undefined behavior. | ||
|
||
|`void set_inaccessible(const void *ptr, size_t numBytes, const context &syclContext)` | | ||
Sets a virtual memory range, specified by `ptr` and `numBytes`, as inaccessible. | ||
Accessing an address in an inaccessible virtual memory range results in | ||
undefined behavior. | ||
|
||
|`std::optional<access_mode> get_access_mode(const void *ptr, size_t numBytes, const context &syclContext)` | | ||
Returns the access mode of the virtual memory range specified by `ptr` and | ||
`numBytes`. If the virtual memory range is inaccessible `std::nullopt` is | ||
returned. | ||
|
||
|===================== | ||
|
||
|
||
=== Physical memory representation | ||
|
||
To represent the underlying physical device memory a virtual address is mapped, | ||
steffenlarsen marked this conversation as resolved.
Show resolved
Hide resolved
|
||
the `physical_mem` class is added. This new class is defined as: | ||
|
||
```c++ | ||
namespace sycl::ext::oneapi::experimental { | ||
|
||
class physical_mem { | ||
public: | ||
physical_mem(const device &syclDevice, const context &syclContext, size_t numBytes); | ||
physical_mem(const queue &syclQueue, size_t numBytes); | ||
|
||
/* -- common interface members -- */ | ||
|
||
void map(const void *ptr, size_t numBytes, size_t offset) const; | ||
void map(const void *ptr, size_t numBytes, size_t offset, access_mode mode) const; | ||
|
||
context get_context() const; | ||
device get_device() const; | ||
|
||
size_t size() const noexcept; | ||
}; | ||
|
||
} // namespace sycl::ext::oneapi::experimental | ||
``` | ||
|
||
`physical_mem` has common reference semantics, as described in | ||
[section 4.5.2. Common reference semantics](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:reference-semantics). | ||
steffenlarsen marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
[frame="topbot",options="header,footer"] | ||
|============================ | ||
|Member function |Description | ||
|
||
|`physical_mem(const device &syclDevice, const context &syclContext, size_t numBytes)` | | ||
Constructs a `physical_mem` instance using the `syclDevice` provided. This | ||
device must either be contained by syclContext or it must be a descendent device | ||
steffenlarsen marked this conversation as resolved.
Show resolved
Hide resolved
|
||
of some device that is contained by that context, otherwise this function throws | ||
a synchronous exception with the errc::invalid error code. | ||
steffenlarsen marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
This will allocate `numBytes` of physical memory on the device. `numBytes` must | ||
be a multiple of the minimum granularity, as returned by a call to | ||
`get_minimum_mem_granularity` | ||
steffenlarsen marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
|`physical_mem(const queue &syclQueue, size_t numBytes)` | | ||
Same as `physical_mem(syclQueue.get_device(), syclQueue.get_context, numBytes)`. | ||
|
||
|`void map(const void *ptr, size_t numBytes, size_t offset)` | | ||
Maps a virtual memory range, specified by `ptr` and `numBytes`, to the physical | ||
memory corresponding to the corresponding instance of `physical_mem`. | ||
|
||
The virtual memory range is inaccessible after this call and can be made | ||
accessible through a call to `set_access_mode`. Accessing an address in an | ||
inaccessible virtual memory range results in undefined behavior. | ||
|
||
|`void map(const void *ptr, size_t numBytes, size_t offset, access_mode mode)` | | ||
Maps a virtual memory range, specified by `ptr` and `numBytes`, to the physical | ||
memory corresponding to the corresponding instance of `physical_mem`. | ||
steffenlarsen marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
After this call the virtual memory range is accessible on the corresponding | ||
device in the access mode specified by `mode`. `mode` must either be | ||
`access_mode::read` or `access_mode::read_write`. | ||
|
||
Writing to any address in the virtual memory range with access mode set to | ||
`access_mode::read` results in undefined behavior. | ||
|
||
|`context get_context() const` | | ||
Returns the SYCL context associated with the instance of `physical_mem`. | ||
|
||
|`device get_device() const` | | ||
Returns the SYCL device associated with the instance of `physical_mem`. | ||
|
||
|`size_t size() const` | | ||
Returns the size of the corresponding physical memory in bytes. | ||
|
||
|============================ | ||
gmlueck marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
Virtual memory address ranges are mapped to the a `physical_mem` through the | ||
`map` member functions. However, to unmap the virtual memory range the user | ||
only needs to know the context associated with the `physical_mem` the address | ||
range was mapped to. As such, the corresponding `unmap` is a free function | ||
defined as: | ||
|
||
```c++ | ||
namespace sycl::ext::oneapi::experimental { | ||
|
||
void unmap(const void *ptr, size_t numBytes, const context &syclContext); | ||
|
||
} // namespace sycl::ext::oneapi::experimental | ||
``` | ||
|
||
[frame="topbot",options="header,footer"] | ||
|===================== | ||
|Function |Description | ||
|
||
|`void unmap(const void *ptr, size_t numBytes, const device &syclDevice, const context &syclContext)` | | ||
KseniyaTikhomirova marked this conversation as resolved.
Show resolved
Hide resolved
|
||
Unmaps the range specified by `ptr` and `numBytes`. The range must have been | ||
mapped through a call to `physical_mem::map()` prior to calling this. The range | ||
must not be a proper sub-range of a previously mapped range. | ||
steffenlarsen marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
After this call, the range will again be ready to be mapped through a call to | ||
`physical_mem::map()`. | ||
|
||
|===================== |
Uh oh!
There was an error while loading. Please reload this page.