def cut(x, bins, right=True, labels=None, retbins=False, precision=3,
include_lowest=False):
"""
- Return indices of half-open bins to which each value of `x` belongs.
+ Return indices of half-open `bins` to which each value of `x` belongs.
+
+ Use `cut` when you need to segment and sort data values into bins or
+ buckets of data. This function is also useful for going from a continuous
+ variable to a categorical variable. For example, `cut` could convert ages
+ to groups of age ranges.

Parameters
----------
x : array-like
Input array to be binned. It has to be 1-dimensional.
- bins : int, sequence of scalars, or IntervalIndex
- If `bins` is an int, it defines the number of equal-width bins in the
- range of `x`. However, in this case, the range of `x` is extended
- by .1% on each side to include the min or max values of `x`. If
- `bins` is a sequence it defines the bin edges allowing for
- non-uniform bin width. No extension of the range of `x` is done in
- this case.
- right : bool, optional
- Indicates whether the bins include the rightmost edge or not. If
- right == True (the default), then the bins [1,2,3,4] indicate
+ bins : int, sequence of scalars, or pandas.IntervalIndex
+ If `bins` is an int, defines the number of equal-width bins in the
+ range of `x`. The range of `x` is extended by .1% on each side to
+ include the min or max values of `x`.
+ If `bins` is a sequence, defines the bin edges allowing for
+ non-uniform bin width. No extension of the range of `x` is done.
+ right : bool, optional, default 'True'
+ Indicates whether the `bins` include the rightmost edge or not. If
+ `right == True` (the default), then the `bins` [1,2,3,4] indicate
(1,2], (2,3], (3,4].
- labels : array or boolean, default None
- Used as labels for the resulting bins. Must be of the same length as
- the resulting bins. If False, return only integer indicators of the
- bins.
- retbins : bool, optional
- Whether to return the bins or not. Can be useful if bins is given
+ labels : array or bool, optional
+ Used as labels for the resulting `bins`. Must be of the same length as
+ the resulting `bins`. If False, returns only integer indicators of the
+ `bins`.
+ retbins : bool, optional, default 'False'
+ Whether to return the `bins` or not. Useful when `bins` is provided
as a scalar.
- precision : int, optional
- The precision at which to store and display the bins labels
- include_lowest : bool, optional
+ precision : int, optional, default '3'
+ The precision at which to store and display the `bins` labels.
+ include_lowest : bool, optional, default 'False'
Whether the first interval should be left-inclusive or not.

Returns
-------
- out : Categorical or Series or array of integers if labels is False
- The return type (Categorical or Series) depends on the input: a Series
- of type category if input is a Series else Categorical. Bins are
- represented as categories when categorical data is returned.
- bins : ndarray of floats
- Returned only if `retbins` is True.
+ out : pandas.Categorical or Series, or array of int if `labels` is 'False'
+ The return type depends on the input.
+ If the input is a Series, a Series of type category is returned.
+ Else - pandas.Categorical is returned. `Bins` are represented as
+ categories when categorical data is returned.
+ bins : numpy.ndarray of floats
+ Returned only if `retbins` is 'True'.
+
+ See Also
+ --------
+ qcut : Discretize variable into equal-sized buckets based on rank
+ or based on sample quantiles.
+ pandas.Categorical : Represents a categorical variable in
+ classic R / S-plus fashion.
+ Series : One-dimensional ndarray with axis labels (including time series).
+ pandas.IntervalIndex : Immutable Index implementing an ordered,
+ sliceable set. IntervalIndex represents an Index of intervals that
+ are all closed on the same side.

Notes
-----
- The `cut` function can be useful for going from a continuous variable to
- a categorical variable. For example, `cut` could convert ages to groups
- of age ranges.
-
- Any NA values will be NA in the result. Out of bounds values will be NA in
- the resulting Categorical object
-
+ Any NA values will be NA in the result. Out of bounds values will be NA in
+ the resulting pandas.Categorical object.

Examples
--------
@@ -88,7 +99,7 @@ def cut(x, bins, right=True, labels=None, retbins=False, precision=3,
Categories (3, object): [good < medium < bad]

>>> pd.cut(np.ones(5), 4, labels=False)
- array([1, 1, 1, 1, 1])
+ array([1, 1, 1, 1, 1], dtype=int64)
"""
# NOTE: this binning code is changed a bit from histogram for var(x) == 0
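The parameters documented in the docstring above can be exercised with a short script. This is a minimal sketch and not part of the commit: the `ages` values, the `edges` list, and the label names are made up for illustration, and it assumes a reasonably recent pandas and NumPy are installed.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data and bin edges, chosen only for illustration.
ages = np.array([5, 18, 30, 65, 90])
edges = [0, 18, 65, 100]

# right=True (the default): intervals are (0, 18], (18, 65], (65, 100].
codes_right = pd.cut(ages, edges, labels=False)

# right=False: intervals are [0, 18), [18, 65), [65, 100).
codes_left = pd.cut(ages, edges, right=False, labels=False)

# Custom labels must have the same length as the resulting bins (3 here).
groups = pd.cut(ages, edges, labels=["child", "adult", "senior"])

# A Series input yields a Series of dtype category.
as_series = pd.cut(pd.Series(ages), edges)

# An integer `bins` with retbins=True also returns the computed edges.
_, computed_edges = pd.cut(ages, 4, retbins=True)

# Out-of-bounds values become NaN. With right=True the lowest edge is
# excluded unless include_lowest=True is passed.
sample = np.array([0, 50, 200])
without_lowest = pd.cut(sample, [0, 100], labels=False)
with_lowest = pd.cut(sample, [0, 100], labels=False, include_lowest=True)

print(codes_right, codes_left)
print(list(groups))
print(without_lowest, with_lowest)
```

Comparing `codes_right` with `codes_left` shows how the values 18 and 65, which sit exactly on an edge, move to the next bin when `right=False`.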