Skip to content

Commit b6a11a2

Browse files
authored
Merge 852f7a3 into 898b6f2
2 parents 898b6f2 + 852f7a3 commit b6a11a2

File tree

4 files changed

+469
-102
lines changed

4 files changed

+469
-102
lines changed

dpnp/dpnp_iface_statistics.py

Lines changed: 149 additions & 39 deletions
Original file line numberDiff line numberDiff line change
@@ -54,7 +54,7 @@
5454
to_supported_dtypes,
5555
)
5656

57-
from .dpnp_utils import call_origin, get_usm_allocations
57+
from .dpnp_utils import get_usm_allocations
5858
from .dpnp_utils.dpnp_utils_reduction import dpnp_wrap_reduction_call
5959
from .dpnp_utils.dpnp_utils_statistics import dpnp_cov, dpnp_median
6060

@@ -748,61 +748,171 @@ def cov(
748748
749749
For full documentation refer to :obj:`numpy.cov`.
750750
751+
Parameters
752+
----------
753+
m : {dpnp.ndarray, usm_ndarray}
754+
A 1-D or 2-D array containing multiple variables and observations.
755+
Each row of `m` represents a variable, and each column a single
756+
observation of all those variables. Also see `rowvar` below.
757+
y : {None, dpnp.ndarray, usm_ndarray}, optional
758+
An additional set of variables and observations. `y` has the same form
759+
as that of `m`.
760+
761+
Default: ``None``.
762+
rowvar : bool, optional
763+
If `rowvar` is ``True``, then each row represents a variable, with
764+
observations in the columns. Otherwise, the relationship is transposed:
765+
each column represents a variable, while the rows contain observations.
766+
767+
Default: ``True``.
768+
bias : bool, optional
769+
Default normalization is by ``(N - 1)``, where ``N`` is the number of
770+
observations given (unbiased estimate). If `bias` is ``True``, then
771+
normalization is by ``N``. These values can be overridden by using the
772+
keyword `ddof`.
773+
774+
Default: ``False``.
775+
ddof : {None, int}, optional
776+
If not ``None`` the default value implied by `bias` is overridden. Note
777+
that ``ddof=1`` will return the unbiased estimate, even if both
778+
`fweights` and `aweights` are specified, and ``ddof=0`` will return the
779+
simple average. See the notes for the details.
780+
781+
Default: ``None``.
782+
fweights : {None, dpnp.ndarray, usm_ndarray}, optional
783+
1-D array of integer frequency weights; the number of times each
784+
observation vector should be repeated.
785+
It is required that ``fweights >= 0``. However, the function will not
786+
error when ``fweights < 0`` for performance reasons.
787+
788+
Default: ``None``.
789+
aweights : {None, dpnp.ndarray, usm_ndarray}, optional
790+
1-D array of observation vector weights. These relative weights are
791+
typically large for observations considered "important" and smaller for
792+
observations considered less "important". If ``ddof=0`` the array of
793+
weights can be used to assign probabilities to observation vectors.
794+
It is required that ``aweights >= 0``. However, the function will not
795+
error when ``aweights < 0`` for performance reasons.
796+
797+
Default: ``None``.
798+
dtype : data-type, optional
799+
Data-type of the result. By default, the return data-type will have
800+
at least floating point type based on the capabilities of the device on
801+
which the input arrays reside.
802+
803+
Default: ``None``.
804+
751805
Returns
752806
-------
753807
out : dpnp.ndarray
754808
The covariance matrix of the variables.
755809
756-
Limitations
757-
-----------
758-
Input array ``m`` is supported as :obj:`dpnp.ndarray`.
759-
Dimension of input array ``m`` is limited by ``m.ndim <= 2``.
760-
Size and shape of input arrays are supported to be equal.
761-
Parameter `y` is supported only with default value ``None``.
762-
Parameter `bias` is supported only with default value ``False``.
763-
Parameter `ddof` is supported only with default value ``None``.
764-
Parameter `fweights` is supported only with default value ``None``.
765-
Parameter `aweights` is supported only with default value ``None``.
766-
Otherwise the function will be executed sequentially on CPU.
767-
Input array data types are limited by supported DPNP :ref:`Data types`.
768-
769810
See Also
770811
--------
771-
:obj:`dpnp.corrcoef` : Normalized covariance matrix
812+
:obj:`dpnp.corrcoef` : Normalized covariance matrix.
813+
814+
Notes
815+
-----
816+
Assume that the observations are in the columns of the observation array `m`
817+
and let ``f = fweights`` and ``a = aweights`` for brevity. The steps to
818+
compute the weighted covariance are as follows::
819+
820+
>>> import dpnp as np
821+
>>> m = np.arange(10, dtype=np.float32)
822+
>>> f = np.arange(10) * 2
823+
>>> a = np.arange(10) ** 2.0
824+
>>> ddof = 1
825+
>>> w = f * a
826+
>>> v1 = np.sum(w)
827+
>>> v2 = np.sum(w * a)
828+
>>> m -= np.sum(m * w, axis=None, keepdims=True) / v1
829+
>>> cov = np.dot(m * w, m.T) * v1 / (v1**2 - ddof * v2)
830+
831+
Note that when ``a == 1``, the normalization factor
832+
``v1 / (v1**2 - ddof * v2)`` goes over to ``1 / (np.sum(f) - ddof)``
833+
as it should.
772834
773835
Examples
774836
--------
775837
>>> import dpnp as np
776838
>>> x = np.array([[0, 2], [1, 1], [2, 0]]).T
777-
>>> x.shape
778-
(2, 3)
779-
>>> [i for i in x]
780-
[0, 1, 2, 2, 1, 0]
781-
>>> out = np.cov(x)
782-
>>> out.shape
783-
(2, 2)
784-
>>> [i for i in out]
785-
[1.0, -1.0, -1.0, 1.0]
839+
>>> x
840+
array([[0, 1, 2],
841+
[2, 1, 0]])
842+
843+
Note how :math:`x_0` increases while :math:`x_1` decreases. The covariance
844+
matrix shows this clearly:
845+
846+
>>> np.cov(x)
847+
array([[ 1., -1.],
848+
[-1., 1.]])
849+
850+
Note that element :math:`C_{0,1}`, which shows the correlation between
851+
:math:`x_0` and :math:`x_1`, is negative.
852+
853+
Further, note how `x` and `y` are combined:
854+
855+
>>> x = np.array([-2.1, -1, 4.3])
856+
>>> y = np.array([3, 1.1, 0.12])
857+
>>> X = np.stack((x, y), axis=0)
858+
>>> np.cov(X)
859+
array([[11.71 , -4.286 ], # may vary
860+
[-4.286 , 2.14413333]])
861+
>>> np.cov(x, y)
862+
array([[11.71 , -4.286 ], # may vary
863+
[-4.286 , 2.14413333]])
864+
>>> np.cov(x)
865+
array(11.71)
786866
787867
"""
788868

789-
if not dpnp.is_supported_array_type(m):
790-
pass
791-
elif m.ndim > 2:
792-
pass
793-
elif bias:
794-
pass
795-
elif ddof is not None:
796-
pass
797-
elif fweights is not None:
798-
pass
799-
elif aweights is not None:
800-
pass
869+
arrays = [m]
870+
if y is not None:
871+
arrays.append(y)
872+
dpnp.check_supported_arrays_type(*arrays)
873+
874+
if m.ndim > 2:
875+
raise ValueError("m has more than 2 dimensions")
876+
877+
if y is not None:
878+
if y.ndim > 2:
879+
raise ValueError("y has more than 2 dimensions")
880+
881+
if ddof is not None:
882+
if not isinstance(ddof, int):
883+
raise ValueError("ddof must be integer")
801884
else:
802-
return dpnp_cov(m, y=y, rowvar=rowvar, dtype=dtype)
885+
ddof = 0 if bias else 1
886+
887+
def_float = dpnp.default_float_type(m.sycl_queue)
888+
if dtype is None:
889+
dtype = dpnp.result_type(*arrays, def_float)
890+
891+
if fweights is not None:
892+
dpnp.check_supported_arrays_type(fweights)
893+
if not dpnp.issubdtype(fweights.dtype, numpy.integer):
894+
raise TypeError("fweights must be integer")
895+
896+
if fweights.ndim > 1:
897+
raise ValueError("cannot handle multidimensional fweights")
898+
899+
fweights = dpnp.astype(fweights, dtype=def_float)
900+
901+
if aweights is not None:
902+
dpnp.check_supported_arrays_type(aweights)
903+
if aweights.ndim > 1:
904+
raise ValueError("cannot handle multidimensional aweights")
905+
906+
aweights = dpnp.astype(aweights, dtype=def_float)
803907

804-
return call_origin(
805-
numpy.cov, m, y, rowvar, bias, ddof, fweights, aweights, dtype=dtype
908+
return dpnp_cov(
909+
m,
910+
y=y,
911+
rowvar=rowvar,
912+
ddof=ddof,
913+
dtype=dtype,
914+
fweights=fweights,
915+
aweights=aweights,
806916
)
807917

808918

dpnp/dpnp_utils/dpnp_utils_statistics.py

Lines changed: 59 additions & 49 deletions
Original file line numberDiff line numberDiff line change
@@ -32,7 +32,6 @@
3232

3333
import dpnp
3434
from dpnp.dpnp_array import dpnp_array
35-
from dpnp.dpnp_utils import get_usm_allocations, map_dtype_to_device
3635

3736
__all__ = ["dpnp_cov", "dpnp_median"]
3837

@@ -119,66 +118,77 @@ def _flatten_array_along_axes(a, axes_to_flatten, overwrite_input):
119118
return a_flatten, overwrite_input
120119

121120

122-
def dpnp_cov(m, y=None, rowvar=True, dtype=None):
121+
def dpnp_cov(
122+
m, y=None, rowvar=True, ddof=1, dtype=None, fweights=None, aweights=None
123+
):
123124
"""
124-
dpnp_cov(m, y=None, rowvar=True, dtype=None)
125-
126125
Estimate a covariance matrix based on passed data.
127-
No support for given weights is provided now.
128126
129-
The implementation is done through existing dpnp and dpctl methods
130-
instead of separate function call of dpnp backend.
127+
The implementation is done through existing dpnp functions.
131128
132129
"""
133130

134-
def _get_2dmin_array(x, dtype):
135-
"""
136-
Transform an input array to a form required for building a covariance matrix.
137-
138-
If applicable, it reshapes the input array to have 2 dimensions or greater.
139-
If applicable, it transposes the input array when 'rowvar' is False.
140-
It casts to another dtype, if the input array differs from requested one.
141-
142-
"""
143-
if x.ndim == 0:
144-
x = x.reshape((1, 1))
145-
elif x.ndim == 1:
146-
x = x[dpnp.newaxis, :]
147-
148-
if not rowvar and x.ndim != 1:
149-
x = x.T
150-
151-
if x.dtype != dtype:
152-
x = dpnp.astype(x, dtype)
153-
return x
131+
# need to create a copy of input, since it will be modified in-place
132+
x = dpnp.array(m, ndmin=2, dtype=dtype)
133+
if not rowvar and m.ndim != 1:
134+
x = x.T
154135

155-
# input arrays must follow CFD paradigm
156-
_, queue = get_usm_allocations((m,) if y is None else (m, y))
136+
if x.shape[0] == 0:
137+
return dpnp.empty_like(
138+
x, shape=(0, 0), dtype=dpnp.default_float_type(m.sycl_queue)
139+
)
157140

158-
# calculate a type of result array if not passed explicitly
159-
if dtype is None:
160-
dtypes = [m.dtype, dpnp.default_float_type(sycl_queue=queue)]
161-
if y is not None:
162-
dtypes.append(y.dtype)
163-
dtype = dpnp.result_type(*dtypes)
164-
# TODO: remove when dpctl.result_type() is returned dtype based on fp64
165-
dtype = map_dtype_to_device(dtype, queue.sycl_device)
166-
167-
X = _get_2dmin_array(m, dtype)
168141
if y is not None:
169-
y = _get_2dmin_array(y, dtype)
170-
171-
X = dpnp.concatenate((X, y), axis=0)
172-
173-
avg = X.mean(axis=1)
142+
y_ndim = y.ndim
143+
y = dpnp.array(y, copy=None, ndmin=2, dtype=dtype)
144+
if not rowvar and y_ndim != 1:
145+
y = y.T
146+
x = dpnp.concatenate((x, y), axis=0)
147+
148+
# get the product of frequencies and weights
149+
w = None
150+
if fweights is not None:
151+
if fweights.shape[0] != x.shape[1]:
152+
raise ValueError("incompatible numbers of samples and fweights")
153+
154+
w = fweights
155+
156+
if aweights is not None:
157+
if aweights.shape[0] != x.shape[1]:
158+
raise ValueError("incompatible numbers of samples and aweights")
159+
160+
if w is None:
161+
w = aweights
162+
else:
163+
w *= aweights
164+
165+
avg, w_sum = dpnp.average(x, axis=1, weights=w, returned=True)
166+
w_sum = w_sum[0]
167+
168+
# determine the normalization
169+
if w is None:
170+
fact = x.shape[1] - ddof
171+
elif ddof == 0:
172+
fact = w_sum
173+
elif aweights is None:
174+
fact = w_sum - ddof
175+
else:
176+
fact = w_sum - ddof * dpnp.sum(w * aweights) / w_sum
174177

175-
fact = X.shape[1] - 1
176-
X -= avg[:, None]
178+
if fact <= 0:
179+
warnings.warn(
180+
"Degrees of freedom <= 0 for slice", RuntimeWarning, stacklevel=2
181+
)
182+
fact = 0.0
177183

178-
c = dpnp.dot(X, X.T.conj())
179-
c *= 1 / fact if fact != 0 else dpnp.nan
184+
x -= avg[:, None]
185+
if w is None:
186+
x_t = x.T
187+
else:
188+
x_t = (x * w).T
180189

181-
return dpnp.squeeze(c)
190+
c = dpnp.dot(x, x_t.conj()) / fact
191+
return c.squeeze()
182192

183193

184194
def dpnp_median(

0 commit comments

Comments
 (0)