@@ -5,7 +5,7 @@ SLEP006: Metadata Routing
=========================

:Author: Joel Nothman, Adrin Jalali, Alex Gramfort, Thomas J. Fan
- :Status: Under Review
+ :Status: Accepted
:Type: Standards Track
:Created: 2019-03-07

@@ -56,18 +56,25 @@ This SLEP proposes to add

* `get_metadata_routing` to all **consumers** and **routers**
(i.e. all estimators, scorers, and splitters supporting this API)
- * `*_requests` to consumers (including estimators, scorers, and CV splitters),
- where `*` is a method that requires metadata. (e.g. `fit_requests`,
- `score_requests`, `transform_requests`, etc.)
+ * `set_*_request` to consumers (including estimators, scorers, and CV
+ splitters), where `*` is a method that requires metadata. (e.g.
+ `set_fit_request`, `set_score_request`, `set_transform_request`, etc.)

- For example, `fit_requests` configures an estimator to request metadata::
+ For example, `set_fit_request` configures an estimator to request metadata::

- >>> log_reg = LogisticRegression().fit_requests(sample_weight=True)
+ >>> log_reg = LogisticRegression().set_fit_request(sample_weight=True)

`get_metadata_routing` are used by **routers** to inspect the metadata needed
by **consumers**. `get_metadata_routing` returns a `MetadataRouter` or a
- `MetadataRequest` object that stores and handles metadata routing. See the
- draft implementation for more implementation details.
+ `MetadataRequest` object that stores and handles metadata routing.
+ `get_metadata_routing` returns enough information for a router to know what
+ metadata is requested, and whether the metadata is sample aligned or not. See
+ the draft implementation for more implementation details.
+
+ Note that in the core library nothing is requested by default, except
+ ``groups`` in ``Group*CV`` objects which request the ``groups`` metadata. At
+ the time of writing this proposal, all metadata requested in the core library
+ are sample aligned.

Detailed description
--------------------
@@ -85,11 +92,11 @@ requests `groups` by default::

>>> weighted_acc = make_scorer(accuracy_score).score_request(sample_weight=True)
>>> log_reg = (LogisticRegressionCV(cv=GroupKFold(), scoring=weighted_acc)
- ... .fit_requests(sample_weight=True))
+ ... .set_fit_request(sample_weight=True))
>>> cv_results = cross_validate(
... log_reg, X, y,
... cv=GroupKFold(),
- ... props={"sample_weight": my_weights, "groups": my_groups},
+ ... metadata={"sample_weight": my_weights, "groups": my_groups},
... scoring=weighted_acc)

To support unweighted fitting and weighted scoring, metadata is set to `False`
@@ -100,7 +107,7 @@ in `fit_request`::
>>> cross_validate(
... log_reg, X, y,
... cv=GroupKFold(),
- ... props={'sample_weight': weights, 'groups': groups},
+ ... metadata={'sample_weight': weights, 'groups': groups},
... scoring=weighted_acc)

Unweighted Feature selection
@@ -110,7 +117,7 @@ Unweighted Feature selection
will _not_ be routed weights::

>>> log_reg = (LogisticRegressionCV(cv=GroupKFold(), scoring=weighted_acc)
- ... .fit_requests(sample_weight=True))
+ ... .set_fit_request(sample_weight=True))
>>> sel = SelectKBest(k=2)
>>> pipe = make_pipeline(sel, log_reg)
>>> pipe.fit(X, y, sample_weight=weights, groups=groups)
@@ -128,13 +135,13 @@ this example, `scoring_weight` is passed to the scoring and `fitting_weight`
is passed to `LogisticRegressionCV`::

>>> weighted_acc = (make_scorer(accuracy_score)
- ... .score_requests(sample_weight="scoring_weight"))
+ ... .set_score_request(sample_weight="scoring_weight"))
>>> log_reg = (LogisticRegressionCV(cv=GroupKFold(), scoring=weighted_acc)
- ... .fit_requests(sample_weight="fitting_weight"))
+ ... .set_fit_request(sample_weight="fitting_weight"))
>>> cv_results = cross_validate(
... log_reg, X, y,
... cv=GroupKFold(),
- ... props={"scoring_weight": my_weights,
+ ... metadata={"scoring_weight": my_weights,
... "fitting_weight": my_other_weights,
... "groups": my_groups},
... scoring=weighted_acc)
@@ -155,7 +162,7 @@ and the inner random search::
>>> cv_results = cross_validate(
... log_reg, X, y,
... cv=GroupKFold(),
- ... props={"groups": my_groups})
+ ... metadata={"groups": my_groups})

Implementation
--------------
@@ -164,7 +171,7 @@ This SLEP has a draft implementation at :pr:`22083` by :user:`adrinjalali`. The
implementation provides developer utilities that are used by scikit-learn and
available to third-party estimators for adopting this SLEP. Specifically, the
draft implementation makes it easier to define `get_metadata_routing` and
- `*_requests` for **consumers** and **routers**.
+ `set_*_request` for **consumers** and **routers**.

Backward compatibility
----------------------
@@ -184,7 +191,9 @@ a deprecation warning is raised::
To avoid the warning, one would need to specify the request in
`LogisticRegression`::

- >>> grid = GridSearchCV(LogisticRegression().fit_requests(sample_weight=True), ...)
+ >>> grid = GridSearchCV(
+ ... LogisticRegression().set_fit_request(sample_weight=True), ...
+ ... )
>>> grid.fit(X, y, sample_weight=sw)

Meta-estimators such as `GridSearchCV` will check which metadata is requested,
@@ -200,23 +209,26 @@ not configured to request it::
>>> # `grid.fit`.
>>> grid.fit(X, y, sample_weight=sw)

- To avoid the error, `LogisticRegression` must specify its metadata request by calling
- `fit_requests`::
+ To avoid the error, `LogisticRegression` must specify its metadata request by
+ calling `set_fit_request`::

>>> # Request sample weights
- >>> log_reg_weights = LogisticRegression().fit_requests(sample_weight=True)
+ >>> log_reg_weights = LogisticRegression().set_fit_request(sample_weight=True)
>>> grid = GridSearchCV(log_reg_with_weights, ...)
>>> grid.fit(X, y, sample_weight=sw)
>>>
>>> # Do not request sample_weights
- >>> log_reg_no_weights = LogisticRegression().fit_requests(sample_weight=False)
+ >>> log_reg_no_weights = LogisticRegression().set_fit_request(sample_weight=False)
>>> grid = GridSearchCV(log_reg_no_weights, ...)
>>> grid.fit(X, y, sample_weight=sw)

- Third-party estimators will need to adopt this SLEP in order to support metadata
- routing, while the dunder syntax is deprecated. Our implementation will provide
- developer APIs to trigger warnings and errors as described above to help with
- adopting this SLEP.
+ Note that a meta-estimator will raise an error if the user passes a metadata
+ which is not requested by any of the child objects of the meta-estimator.
+
+ Third-party estimators will need to adopt this SLEP in order to support
+ metadata routing, while the dunder syntax is deprecated. Our implementation
+ will provide developer APIs to trigger warnings and errors as described above
+ to help with adopting this SLEP.

Alternatives
------------
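
The request-and-route behaviour described in this diff can be illustrated with a toy sketch in plain Python. `ToyConsumer` and `ToyRouter` are hypothetical names invented for this example. They are not scikit-learn's `MetadataRequest` or `MetadataRouter`, and the draft implementation may differ in many details. The sketch only assumes what the text states: a consumer declares requests through `set_fit_request`, a router inspects them through `get_metadata_routing`, a string request acts as an alias, and the router raises an error when metadata is passed that no child requests.

```python
class ToyConsumer:
    """Hypothetical consumer that records which fit metadata it requests."""

    def __init__(self, name):
        self.name = name
        # Maps a parameter name to True, False, or an alias string.
        self._fit_request = {}
        self.received = None

    def set_fit_request(self, **requests):
        self._fit_request.update(requests)
        return self  # returning self allows chaining, as in the diff's examples

    def get_metadata_routing(self):
        # A plain dict stands in for a request object in this sketch.
        return dict(self._fit_request)

    def fit(self, X, y, **metadata):
        self.received = metadata
        return self


class ToyRouter:
    """Hypothetical router that forwards only the metadata each child requests."""

    def __init__(self, *children):
        self.children = children

    def fit(self, X, y, **metadata):
        used = set()
        plans = []
        for child in self.children:
            routed = {}
            for param, request in child.get_metadata_routing().items():
                if request is False:
                    continue  # explicitly not requested
                # True means "same name"; a string is an alias to look up.
                key = param if request is True else request
                if key in metadata:
                    routed[param] = metadata[key]
                    used.add(key)
            plans.append((child, routed))

        unrequested = sorted(set(metadata) - used)
        if unrequested:
            raise TypeError(f"metadata not requested by any child: {unrequested}")

        for child, routed in plans:
            child.fit(X, y, **routed)
        return self


# Usage: the selector requests nothing, the classifier requests an alias.
sel = ToyConsumer("sel")
clf = ToyConsumer("clf").set_fit_request(sample_weight="fitting_weight")
ToyRouter(sel, clf).fit(None, None, fitting_weight=[1, 2])

try:
    ToyRouter(sel, clf).fit(None, None, groups=[0, 1])
    raised = False
except TypeError:
    raised = True
```

In this sketch the selector receives no metadata, the classifier receives the aliased values under the name `sample_weight`, and passing `groups`, which no child requests, raises an error.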