@@ -617,6 +617,22 @@ amd_gpu_gfx1201
617
617
|AMD RDNA 4 architecture.
618
618
|===
619
619
620
+ The enumerators are guaranteed to be partially ordered, which means that some
621
+ comparison operations (e.g. `<`, `>`) are meaningful.
622
+ Specifically, the following guarantees are provided:
623
+
624
+ * When an enumerator's value is defined in the table above as equal to some
625
+ other enumerator, the values of the two enumerators are guaranteed to be the
626
+ same.
627
+
628
+ * The enumerators within a "family" (e.g. the Intel GPU family) are guaranteed
629
+ to have ascending values in the order that they are defined in the table
630
+ above.
631
+ (Except, of course, for the enumerators that are defined to have a value that
632
+ is equal to some other enumerator.)
633
+
634
+ Enumerators from different families have no guaranteed relative order.
635
+
620
636
[_Note:_ An "alias" enumerator is generally added for new Intel GPU devices
621
637
only after hardware has finalized and the exact version is known.
622
638
_{endnote}_]
@@ -628,9 +644,65 @@ of the device, and `if_architecture_is` can be used similarly to the
628
644
`+__CUDA_ARCH__+` macro in CUDA.
629
645
_{endnote}_]
630
646
631
- === New `if_architecture_is` free function
647
+ === New enumeration of architecture categories
648
+
649
+ This extension adds a new enumeration of various categories of device
650
+ architectures.
651
+
652
+ [source]
653
+ ----
654
+ namespace sycl::ext::oneapi::experimental {
655
+
656
+ enum class arch_category : /* unspecified */ {
657
+ // See table below for list of enumerators
658
+ };
659
+
660
+ } // namespace sycl::ext::oneapi::experimental
661
+ ----
632
662
633
- This extension adds the following new free function which may be called from
663
+ The following table specifies the enumerators that are available and tells
664
+ which version of this extension first included each of these enumerators.
665
+
666
+ [%header,cols="5,1,5"]
667
+ |===
668
+ |Enumerator name
669
+ |Added in version
670
+ |Description
671
+
672
+ a|
673
+ [source]
674
+ ----
675
+ intel_gpu
676
+ ----
677
+ |-
678
+ |
679
+ Any Intel GPU device.
680
+ This category includes all device architectures in the Intel GPU family.
681
+
682
+ a|
683
+ [source]
684
+ ----
685
+ nvidia_gpu
686
+ ----
687
+ |-
688
+ |
689
+ Any Nvidia GPU device.
690
+ This category includes all device architectures in the Nvidia GPU family.
691
+
692
+ a|
693
+ [source]
694
+ ----
695
+ amd_gpu
696
+ ----
697
+ |-
698
+ |
699
+ Any AMD GPU device.
700
+ This category includes all device architectures in the AMD GPU family.
701
+ |===
702
+
703
+ === New free functions to query the architecture in device code
704
+
705
+ This extension adds the following new free functions which may be called from
634
706
device code.
635
707
636
708
|====
642
714
----
643
715
namespace sycl::ext::oneapi::experimental {
644
716
645
- template<architecture ...Archs, typename T>
717
+ template<architecture ...Archs, typename T> (1)
718
+ /* unspecified */ if_architecture_is(T fn);
719
+
720
+ template<arch_category ...Categories, typename T> (2)
646
721
/* unspecified */ if_architecture_is(T fn);
647
722
723
+ template<architecture Arch, typename T> (3)
724
+ /* unspecified */ if_architecture_is_lt(T fn);
725
+
726
+ template<architecture Arch, typename T> (4)
727
+ /* unspecified */ if_architecture_is_le(T fn);
728
+
729
+ template<architecture Arch, typename T> (5)
730
+ /* unspecified */ if_architecture_is_gt(T fn);
731
+
732
+ template<architecture Arch, typename T> (6)
733
+ /* unspecified */ if_architecture_is_ge(T fn);
734
+
735
+ template<architecture Arch1, architecture Arch2, typename T> (7)
736
+ /* unspecified */ if_architecture_is_between(T fn);
737
+
648
738
} // namespace sycl::ext::oneapi::experimental
649
739
----
650
740
!====
651
741
652
742
_Constraints:_ The type `T` must be a {cpp} `Callable` type which is invocable
653
743
with an empty parameter list.
654
744
745
+ _Mandates (7):_ The architecture `Arch1` must be in the same family as `Arch2`.
746
+
655
747
_Preconditions:_ This function must be called from device code.
656
748
657
- _Effects:_ The `Archs` parameter pack identifies the condition that gates
658
- execution of the callable object `fn`.
659
- This condition is `true` only if the device which executes the
660
- `if_architecture_is` function has any one of the architectures listed in this
661
- pack.
749
+ _Effects:_ The template parameters to these functions identify a condition that
750
+ gates execution of the callable object `fn`.
751
+ If the condition is `true`, the object `fn` is called.
662
752
Otherwise, the function `fn` is potentially discarded as described in the
663
753
link:../proposed/sycl_ext_oneapi_device_if.asciidoc[sycl_ext_oneapi_device_if]
664
754
extension.
665
755
756
+ For (1), the condition is `true` only if the device which executes the
757
+ `if_architecture_is` function has any one of the architectures listed in the
758
+ `Archs` pack.
759
+
760
+ For (2), the condition is `true` only if the device which executes the
761
+ `if_architecture_is` function has an architecture that is in any one of the
762
+ categories listed in the `Categories` pack.
763
+
764
+ For (3), the condition is `true` only if the device which executes the
765
+ `if_architecture_is_lt` function has an architecture that is in the same
766
+ family as `Arch` and compares less than `Arch`.
767
+
768
+ For (4), the condition is `true` only if the device which executes the
769
+ `if_architecture_is_le` function has an architecture that is in the same
770
+ family as `Arch` and compares less than or equal to `Arch`.
771
+
772
+ For (5), the condition is `true` only if the device which executes the
773
+ `if_architecture_is_gt` function has an architecture that is in the same
774
+ family as `Arch` and compares greater than `Arch`.
775
+
776
+ For (6), the condition is `true` only if the device which executes the
777
+ `if_architecture_is_ge` function has an architecture that is in the same
778
+ family as `Arch` and compares greater than or equal to `Arch`.
779
+
780
+ For (7), the condition is `true` only if the device which executes the
781
+ `if_architecture_is_between` function has an architecture that is in the same
782
+ family as `Arch1` and is greater than or equal to `Arch1` and is less than or
783
+ equal to `Arch2`.
784
+
666
785
_Returns:_ An object _F_ of the unnamed "else" class, which can be used to
667
786
perform if-then-else chains.
668
787
|====
702
821
a!
703
822
[source]
704
823
----
705
- template<architecture ...Archs, typename T>
824
+ template<architecture ...Archs, typename T> (1)
825
+ /* unspecified */ else_if_architecture_is(T fn);
826
+
827
+ template<arch_category ...Categories, typename T> (2)
706
828
/* unspecified */ else_if_architecture_is(T fn);
829
+
830
+ template<architecture Arch, typename T> (3)
831
+ /* unspecified */ else_if_architecture_is_lt(T fn);
832
+
833
+ template<architecture Arch, typename T> (4)
834
+ /* unspecified */ else_if_architecture_is_le(T fn);
835
+
836
+ template<architecture Arch, typename T> (5)
837
+ /* unspecified */ else_if_architecture_is_gt(T fn);
838
+
839
+ template<architecture Arch, typename T> (6)
840
+ /* unspecified */ else_if_architecture_is_ge(T fn);
841
+
842
+ template<architecture Arch1, architecture Arch2, typename T> (7)
843
+ /* unspecified */ else_if_architecture_is_between(T fn);
707
844
----
708
845
!====
709
846
710
847
_Constraints:_ The type `T` must be a {cpp} `Callable` type which is invocable
711
848
with an empty parameter list.
712
849
713
- _Effects:_ This function has an associated condition that gates execution of
850
+ _Mandates (7):_ The architecture `Arch1` must be in the same family as `Arch2`.
851
+
852
+ _Effects:_ These functions have an associated condition that gates execution of
714
853
the callable object `fn`.
715
- This condition is `true` only if the object _F_ comes from a previous call
716
- whose associated condition is `false` *and* if the device calling
717
- `else_if_architecture_is` has any one of the architectures in the `Archs`
718
- parameter pack.
854
+ If the condition is `true`, the object `fn` is called.
719
855
Otherwise, the function `fn` is potentially discarded as described in the
720
856
link:../proposed/sycl_ext_oneapi_device_if.asciidoc[sycl_ext_oneapi_device_if]
721
857
extension.
722
858
859
+ For (1), the condition is `true` only if the object _F_ comes from a previous
860
+ call whose associated condition is `false` *and* if the device which executes
861
+ the `else_if_architecture_is` function has any one of the architectures listed
862
+ in the `Archs` parameter pack.
863
+
864
+ For (2), the condition is `true` only if the object _F_ comes from a previous
865
+ call whose associated condition is `false` *and* if the device which executes
866
+ the `else_if_architecture_is` function has an architecture that is in any one
867
+ of the categories listed in the `Categories` pack.
868
+
869
+ For (3), the condition is `true` only if the object _F_ comes from a previous
870
+ call whose associated condition is `false` *and* if the device which executes
871
+ the `else_if_architecture_is_lt` function has an architecture that is in the
872
+ same family as `Arch` and compares less than `Arch`.
873
+
874
+ For (4), the condition is `true` only if the object _F_ comes from a previous
875
+ call whose associated condition is `false` *and* if the device which executes
876
+ the `else_if_architecture_is_le` function has an architecture that is in the
877
+ same family as `Arch` and compares less than or equal to `Arch`.
878
+
879
+ For (5), the condition is `true` only if the object _F_ comes from a previous
880
+ call whose associated condition is `false` *and* if the device which executes
881
+ the `else_if_architecture_is_gt` function has an architecture that is in the
882
+ same family as `Arch` and compares greater than `Arch`.
883
+
884
+ For (6), the condition is `true` only if the object _F_ comes from a previous
885
+ call whose associated condition is `false` *and* if the device which executes
886
+ the `else_if_architecture_is_ge` function has an architecture that is in the
887
+ same family as `Arch` and compares greater than or equal to `Arch`.
888
+
889
+ For (7), the condition is `true` only if the object _F_ comes from a previous
890
+ call whose associated condition is `false` *and* if the device which executes
891
+ the `else_if_architecture_is_between` function has an architecture that is in
892
+ the same family as `Arch1` and is greater than or equal to `Arch1` and is less
893
+ than or equal to `Arch2`.
894
+
723
895
_Returns:_ An object _F_ of the unnamed "else" class, which can be used to
724
896
perform if-then-else chains.
725
897
|====
738
910
namespace sycl {
739
911
740
912
class device {
741
- bool ext_oneapi_architecture_is(
913
+ bool ext_oneapi_architecture_is( (1)
742
914
ext::oneapi::experimental::architecture arch);
915
+
916
+ bool ext_oneapi_architecture_is( (2)
917
+ ext::oneapi::experimental::arch_category category);
743
918
};
744
919
745
920
} // namespace sycl
746
921
----
747
922
!====
748
923
749
- _Returns:_ The value `true` only if the device's architecture is equal to
924
+ _Returns (1) :_ The value `true` only if the device's architecture is equal to
750
925
`arch`.
926
+
927
+ _Returns (2):_ The value `true` only if the device's architecture is in the
928
+ category `category`.
751
929
|====
752
930
753
931
=== New device information descriptor
@@ -790,13 +968,28 @@ static constexpr size_t N = 1000;
790
968
int main() {
791
969
sycl::queue q;
792
970
971
+ // Testing for a specific architecture.
793
972
q.parallel_for({N}, [=](auto i) {
794
973
syclex::if_architecture_is<syclex::architecture::intel_gpu_pvc>([&]{
795
974
// Code for PVC
796
975
}).otherwise([&]{
797
976
// Fallback code
798
977
});
799
978
});
979
+
980
+ // Testing for an architecture category or a range of architectures.
981
+ q.parallel_for({N}, [=](auto i) {
982
+ syclex::if_architecture_is<syclex::arch_category::intel_gpu>([&]{
983
+ // Code for an Intel GPU
984
+ }).else_if_architecture_ge<syclex::architecture::nvidia_gpu_sm80>([&]{
985
+ // Code Nvidia compute capability >= 8.x
986
+ }).else_if_architecture_is_between<syclex::architecture::amd_gpu_gfx1010,
987
+ syclex::architecture::amd_gpu_gfx1013>([&]{
988
+ // Code for AMD devices between gfx1010 and gfx1013 (inclusive)
989
+ }).otherwise([&]{
990
+ // Fallback code
991
+ });
992
+ });
800
993
}
801
994
----
802
995
@@ -845,6 +1038,28 @@ They currently exist only for use with the
845
1038
link:sycl_ext_matrix/sycl_ext_oneapi_matrix.asciidoc[sycl_ext_oneapi_matrix]
846
1039
extension.
847
1040
1041
+
1042
+ == Implementation notes
1043
+
1044
+ Some planning is required when choosing the values for the `architecture`
1045
+ enumerators because applications will expect comparisons to reflect the
1046
+ features that are available.
1047
+ For example, an application might assume that
1048
+ `arch >= architecture::intel_gpu_pvc` tests for an Intel GPU that is PVC or
1049
+ newer.
1050
+ For the Intel GPUs, the order of the enumerators should be the same as the
1051
+ order of the device's 32-bit GMDID values.
1052
+ One solution is to use the GMDID value as the value of the enumerator, but we
1053
+ must ensure that the value does not accidentally collide with a value from
1054
+ another architecture family.
1055
+ We could potentially use the top bits to distinguish between architecture
1056
+ families, but this could cause problems if future GMDID values start using
1057
+ those top bits.
1058
+ Another option is to use a 64-bit base type for the enumeration.
1059
+ Whatever strategy we choose, we should not need to renumber the enumerators
1060
+ whenever a new one is added because this would constitute an ABI break.
1061
+
1062
+
848
1063
== Future direction
849
1064
850
1065
This experimental extension is still evolving.
@@ -883,5 +1098,6 @@ features that are available on devices with the given architecture list but may
883
1098
not be available on devices with other architectures.
884
1099
--
885
1100
886
- * Additional enumerators in the `architecture` enumeration.
1101
+ * Additional enumerators in the `architecture` and `arch_category`
1102
+ enumerations.
887
1103
This could include entries for different x86_64 architectures.
0 commit comments