Skip to content

Commit 3613ca3

Browse files
authored
[SYCL][Doc]Comparisons in "if_architecture_is" (#12259)
Update the "if_architecture_is" extension spec to allow greater-than and less-than comparisons between architecture values.
1 parent bdd5063 commit 3613ca3

File tree

1 file changed

+233
-17
lines changed

1 file changed

+233
-17
lines changed

sycl/doc/extensions/experimental/sycl_ext_oneapi_device_architecture.asciidoc

Lines changed: 233 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -617,6 +617,22 @@ amd_gpu_gfx1201
617617
|AMD RDNA 4 architecture.
618618
|===
619619

620+
The enumerators are guaranteed to be partially ordered, which means that some
621+
comparison operations (e.g. `<`, `>`) are meaningful.
622+
Specifically, the following guarantees are provided:
623+
624+
* When an enumerator's value is defined in the table above as equal to some
625+
other enumerator, the values of the two enumerators are guaranteed to be the
626+
same.
627+
628+
* The enumerators within a "family" (e.g. the Intel GPU family) are guaranteed
629+
to have ascending values in the order that they are defined in the table
630+
above.
631+
(Except, of course, for the enumerators that are defined to have a value that
632+
is equal to some other enumerator.)
633+
634+
Enumerators from different families have no guaranteed relative order.
635+
620636
[_Note:_ An "alias" enumerator is generally added for new Intel GPU devices
621637
only after hardware has finalized and the exact version is known.
622638
_{endnote}_]
@@ -628,9 +644,65 @@ of the device, and `if_architecture_is` can be used similarly to the
628644
`+__CUDA_ARCH__+` macro in CUDA.
629645
_{endnote}_]
630646

631-
=== New `if_architecture_is` free function
647+
=== New enumeration of architecture categories
648+
649+
This extension adds a new enumeration of various categories of device
650+
architectures.
651+
652+
[source]
653+
----
654+
namespace sycl::ext::oneapi::experimental {
655+
656+
enum class arch_category : /* unspecified */ {
657+
// See table below for list of enumerators
658+
};
659+
660+
} // namespace sycl::ext::oneapi::experimental
661+
----
632662

633-
This extension adds the following new free function which may be called from
663+
The following table specifies the enumerators that are available and tells
664+
which version of this extension first included each of these enumerators.
665+
666+
[%header,cols="5,1,5"]
667+
|===
668+
|Enumerator name
669+
|Added in version
670+
|Description
671+
672+
a|
673+
[source]
674+
----
675+
intel_gpu
676+
----
677+
|-
678+
|
679+
Any Intel GPU device.
680+
This category includes all device architectures in the Intel GPU family.
681+
682+
a|
683+
[source]
684+
----
685+
nvidia_gpu
686+
----
687+
|-
688+
|
689+
Any Nvidia GPU device.
690+
This category includes all device architectures in the Nvidia GPU family.
691+
692+
a|
693+
[source]
694+
----
695+
amd_gpu
696+
----
697+
|-
698+
|
699+
Any AMD GPU device.
700+
This category includes all device architectures in the AMD GPU family.
701+
|===
702+
703+
=== New free functions to query the architecture in device code
704+
705+
This extension adds the following new free functions which may be called from
634706
device code.
635707

636708
|====
@@ -642,27 +714,74 @@ a!
642714
----
643715
namespace sycl::ext::oneapi::experimental {
644716
645-
template<architecture ...Archs, typename T>
717+
template<architecture ...Archs, typename T> (1)
718+
/* unspecified */ if_architecture_is(T fn);
719+
720+
template<arch_category ...Categories, typename T> (2)
646721
/* unspecified */ if_architecture_is(T fn);
647722
723+
template<architecture Arch, typename T> (3)
724+
/* unspecified */ if_architecture_is_lt(T fn);
725+
726+
template<architecture Arch, typename T> (4)
727+
/* unspecified */ if_architecture_is_le(T fn);
728+
729+
template<architecture Arch, typename T> (5)
730+
/* unspecified */ if_architecture_is_gt(T fn);
731+
732+
template<architecture Arch, typename T> (6)
733+
/* unspecified */ if_architecture_is_ge(T fn);
734+
735+
template<architecture Arch1, architecture Arch2, typename T> (7)
736+
/* unspecified */ if_architecture_is_between(T fn);
737+
648738
} // namespace sycl::ext::oneapi::experimental
649739
----
650740
!====
651741

652742
_Constraints:_ The type `T` must be a {cpp} `Callable` type which is invocable
653743
with an empty parameter list.
654744

745+
_Mandates (7):_ The architecture `Arch1` must be in the same family as `Arch2`.
746+
655747
_Preconditions:_ This function must be called from device code.
656748

657-
_Effects:_ The `Archs` parameter pack identifies the condition that gates
658-
execution of the callable object `fn`.
659-
This condition is `true` only if the device which executes the
660-
`if_architecture_is` function has any one of the architectures listed in this
661-
pack.
749+
_Effects:_ The template parameters to these functions identify a condition that
750+
gates execution of the callable object `fn`.
751+
If the condition is `true`, the object `fn` is called.
662752
Otherwise, the function `fn` is potentially discarded as described in the
663753
link:../proposed/sycl_ext_oneapi_device_if.asciidoc[sycl_ext_oneapi_device_if]
664754
extension.
665755

756+
For (1), the condition is `true` only if the device which executes the
757+
`if_architecture_is` function has any one of the architectures listed in the
758+
`Archs` pack.
759+
760+
For (2), the condition is `true` only if the device which executes the
761+
`if_architecture_is` function has an architecture that is in any one of the
762+
categories listed in the `Categories` pack.
763+
764+
For (3), the condition is `true` only if the device which executes the
765+
`if_architecture_is_lt` function has an architecture that is in the same
766+
family as `Arch` and compares less than `Arch`.
767+
768+
For (4), the condition is `true` only if the device which executes the
769+
`if_architecture_is_le` function has an architecture that is in the same
770+
family as `Arch` and compares less than or equal to `Arch`.
771+
772+
For (5), the condition is `true` only if the device which executes the
773+
`if_architecture_is_gt` function has an architecture that is in the same
774+
family as `Arch` and compares greater than `Arch`.
775+
776+
For (6), the condition is `true` only if the device which executes the
777+
`if_architecture_is_ge` function has an architecture that is in the same
778+
family as `Arch` and compares greater than or equal to `Arch`.
779+
780+
For (7), the condition is `true` only if the device which executes the
781+
`if_architecture_is_between` function has an architecture that is in the same
782+
family as `Arch1` and is greater than or equal to `Arch1` and is less than or
783+
equal to `Arch2`.
784+
666785
_Returns:_ An object _F_ of the unnamed "else" class, which can be used to
667786
perform if-then-else chains.
668787
|====
@@ -702,24 +821,77 @@ a|
702821
a!
703822
[source]
704823
----
705-
template<architecture ...Archs, typename T>
824+
template<architecture ...Archs, typename T> (1)
825+
/* unspecified */ else_if_architecture_is(T fn);
826+
827+
template<arch_category ...Categories, typename T> (2)
706828
/* unspecified */ else_if_architecture_is(T fn);
829+
830+
template<architecture Arch, typename T> (3)
831+
/* unspecified */ else_if_architecture_is_lt(T fn);
832+
833+
template<architecture Arch, typename T> (4)
834+
/* unspecified */ else_if_architecture_is_le(T fn);
835+
836+
template<architecture Arch, typename T> (5)
837+
/* unspecified */ else_if_architecture_is_gt(T fn);
838+
839+
template<architecture Arch, typename T> (6)
840+
/* unspecified */ else_if_architecture_is_ge(T fn);
841+
842+
template<architecture Arch1, architecture Arch2, typename T> (7)
843+
/* unspecified */ else_if_architecture_is_between(T fn);
707844
----
708845
!====
709846

710847
_Constraints:_ The type `T` must be a {cpp} `Callable` type which is invocable
711848
with an empty parameter list.
712849

713-
_Effects:_ This function has an associated condition that gates execution of
850+
_Mandates (7):_ The architecture `Arch1` must be in the same family as `Arch2`.
851+
852+
_Effects:_ These functions have an associated condition that gates execution of
714853
the callable object `fn`.
715-
This condition is `true` only if the object _F_ comes from a previous call
716-
whose associated condition is `false` *and* if the device calling
717-
`else_if_architecture_is` has any one of the architectures in the `Archs`
718-
parameter pack.
854+
If the condition is `true`, the object `fn` is called.
719855
Otherwise, the function `fn` is potentially discarded as described in the
720856
link:../proposed/sycl_ext_oneapi_device_if.asciidoc[sycl_ext_oneapi_device_if]
721857
extension.
722858

859+
For (1), the condition is `true` only if the object _F_ comes from a previous
860+
call whose associated condition is `false` *and* if the device which executes
861+
the `else_if_architecture_is` function has any one of the architectures listed
862+
in the `Archs` parameter pack.
863+
864+
For (2), the condition is `true` only if the object _F_ comes from a previous
865+
call whose associated condition is `false` *and* if the device which executes
866+
the `else_if_architecture_is` function has an architecture that is in any one
867+
of the categories listed in the `Categories` pack.
868+
869+
For (3), the condition is `true` only if the object _F_ comes from a previous
870+
call whose associated condition is `false` *and* if the device which executes
871+
the `else_if_architecture_is_lt` function has an architecture that is in the
872+
same family as `Arch` and compares less than `Arch`.
873+
874+
For (4), the condition is `true` only if the object _F_ comes from a previous
875+
call whose associated condition is `false` *and* if the device which executes
876+
the `else_if_architecture_is_le` function has an architecture that is in the
877+
same family as `Arch` and compares less than or equal to `Arch`.
878+
879+
For (5), the condition is `true` only if the object _F_ comes from a previous
880+
call whose associated condition is `false` *and* if the device which executes
881+
the `else_if_architecture_is_gt` function has an architecture that is in the
882+
same family as `Arch` and compares greater than `Arch`.
883+
884+
For (6), the condition is `true` only if the object _F_ comes from a previous
885+
call whose associated condition is `false` *and* if the device which executes
886+
the `else_if_architecture_is_ge` function has an architecture that is in the
887+
same family as `Arch` and compares greater than or equal to `Arch`.
888+
889+
For (7), the condition is `true` only if the object _F_ comes from a previous
890+
call whose associated condition is `false` *and* if the device which executes
891+
the `else_if_architecture_is_between` function has an architecture that is in
892+
the same family as `Arch1` and is greater than or equal to `Arch1` and is less
893+
than or equal to `Arch2`.
894+
723895
_Returns:_ An object _F_ of the unnamed "else" class, which can be used to
724896
perform if-then-else chains.
725897
|====
@@ -738,16 +910,22 @@ a!
738910
namespace sycl {
739911
740912
class device {
741-
bool ext_oneapi_architecture_is(
913+
bool ext_oneapi_architecture_is( (1)
742914
ext::oneapi::experimental::architecture arch);
915+
916+
bool ext_oneapi_architecture_is( (2)
917+
ext::oneapi::experimental::arch_category category);
743918
};
744919
745920
} // namespace sycl
746921
----
747922
!====
748923

749-
_Returns:_ The value `true` only if the device's architecture is equal to
924+
_Returns (1):_ The value `true` only if the device's architecture is equal to
750925
`arch`.
926+
927+
_Returns (2):_ The value `true` only if the device's architecture is in the
928+
category `category`.
751929
|====
752930

753931
=== New device information descriptor
@@ -790,13 +968,28 @@ static constexpr size_t N = 1000;
790968
int main() {
791969
sycl::queue q;
792970
971+
// Testing for a specific architecture.
793972
q.parallel_for({N}, [=](auto i) {
794973
syclex::if_architecture_is<syclex::architecture::intel_gpu_pvc>([&]{
795974
// Code for PVC
796975
}).otherwise([&]{
797976
// Fallback code
798977
});
799978
});
979+
980+
// Testing for an architecture category or a range of architectures.
981+
q.parallel_for({N}, [=](auto i) {
982+
syclex::if_architecture_is<syclex::arch_category::intel_gpu>([&]{
983+
// Code for an Intel GPU
984+
}).else_if_architecture_ge<syclex::architecture::nvidia_gpu_sm80>([&]{
985+
// Code Nvidia compute capability >= 8.x
986+
}).else_if_architecture_is_between<syclex::architecture::amd_gpu_gfx1010,
987+
syclex::architecture::amd_gpu_gfx1013>([&]{
988+
// Code for AMD devices between gfx1010 and gfx1013 (inclusive)
989+
}).otherwise([&]{
990+
// Fallback code
991+
});
992+
});
800993
}
801994
----
802995

@@ -845,6 +1038,28 @@ They currently exist only for use with the
8451038
link:sycl_ext_matrix/sycl_ext_oneapi_matrix.asciidoc[sycl_ext_oneapi_matrix]
8461039
extension.
8471040

1041+
1042+
== Implementation notes
1043+
1044+
Some planning is required when choosing the values for the `architecture`
1045+
enumerators because applications will expect comparisons to reflect the
1046+
features that are available.
1047+
For example, an application might assume that
1048+
`arch >= architecture::intel_gpu_pvc` tests for an Intel GPU that is PVC or
1049+
newer.
1050+
For the Intel GPUs, the order of the enumerators should be the same as the
1051+
order of the device's 32-bit GMDID values.
1052+
One solution is to use the GMDID value as the value of the enumerator, but we
1053+
must ensure that the value does not accidentally collide with a value from
1054+
another architecture family.
1055+
We could potentially use the top bits to distinguish between architecture
1056+
families, but this could cause problems if future GMDID values start using
1057+
those top bits.
1058+
Another option is to use a 64-bit base type for the enumeration.
1059+
Whatever strategy we choose, we should not need to renumber the enumerators
1060+
whenever a new one is added because this would constitute an ABI break.
1061+
1062+
8481063
== Future direction
8491064

8501065
This experimental extension is still evolving.
@@ -883,5 +1098,6 @@ features that are available on devices with the given architecture list but may
8831098
not be available on devices with other architectures.
8841099
--
8851100

886-
* Additional enumerators in the `architecture` enumeration.
1101+
* Additional enumerators in the `architecture` and `arch_category`
1102+
enumerations.
8871103
This could include entries for different x86_64 architectures.

0 commit comments

Comments
 (0)