-
-
Notifications
You must be signed in to change notification settings - Fork 34
SLEP013: n_features_out_ #29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
NicolasHug
merged 9 commits into
scikit-learn:master
from
adrinjalali:slep12/n_feature_out_
Feb 19, 2020
Merged
Changes from 5 commits
Commits
Show all changes
9 commits
Select commit
Hold shift + click to select a range
09e1275
SLEP012: n_features_out_
adrinjalali 5d9f404
mention n_features_in_
adrinjalali 2d5e88f
add link to n_features_in_ and mention the new test
adrinjalali 01bab58
more Nicolas's changes
adrinjalali da14bcd
most -> some
adrinjalali a4a84eb
most -> many
adrinjalali 58354ba
rename to slep13
adrinjalali 852d0a8
merge upstream/master
adrinjalali 3e7b460
missed slep12->slep13
adrinjalali File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -55,3 +55,6 @@ docs/_build/ | |
|
||
# PyBuilder | ||
target/ | ||
|
||
# Editors | ||
.vscode |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,71 @@ | ||
.. _slep_012: | ||
|
||
====================================== | ||
SLEP012: ``n_features_out_`` attribute | ||
====================================== | ||
|
||
:Author: Adrin Jalali | ||
:Status: Under Review | ||
:Type: Standards Track | ||
:Created: 2020-02-12 | ||
|
||
Abstract | ||
######## | ||
|
||
This SLEP proposes the introduction of a public ``n_features_out_`` attribute | ||
for most transformers (where relevant). | ||
|
||
Motivation | ||
########## | ||
|
||
Knowing the number of features that a transformer outputs is useful for | ||
inspection purposes. This is in conjunction with `*SLEP010: ``n_features_in_``* | ||
<https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep010/proposal.html>`_. | ||
|
||
Solution | ||
######## | ||
|
||
The proposed solution is for the ``n_features_out_`` attribute to be set once a | ||
call to ``fit`` is done. In most cases the value of ``n_features_out_`` is the | ||
same as some other attribute stored in the transformer, *e.g.* | ||
``n_components_``, and in these cases a ``Mixin`` such as a ``ComponentsMixin`` | ||
can delegate ``n_features_out_`` to those attributes. | ||
NicolasHug marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
Testing | ||
------- | ||
|
||
A test to the common tests is added to ensure the presence of the attribute or | ||
property after calling ``fit``. | ||
|
||
Considerations | ||
############## | ||
|
||
The main consideration is that the addition of the common test means that | ||
existing estimators in downstream libraries will not pass our test suite, | ||
adrinjalali marked this conversation as resolved.
Show resolved
Hide resolved
|
||
unless the estimators also have the ``n_features_out_`` attribute. | ||
|
||
The newly introduced checks will only raise a warning instead of an exception | ||
for the next 2 releases, so this will give more time for downstream packages | ||
to adjust. | ||
|
||
There are other minor considerations: | ||
|
||
- In some meta-estimators, this is delegated to the | ||
sub-estimator(s). The ``n_features_out_`` attribute of the meta-estimator is | ||
thus explicitly set to that of the sub-estimator, either via a ``@property``, | ||
or directly in ``fit()``. | ||
- Some transformers such as ``FunctionTransformer`` may not know the number | ||
of output features since arbitrary arrays can be passed to `transform`. In | ||
such cases ``n_features_out_`` is set to ``None``. | ||
|
||
Copyright | ||
--------- | ||
|
||
This document has been placed in the public domain. [1]_ | ||
|
||
References and Footnotes | ||
------------------------ | ||
|
||
.. [1] _Open Publication License: https://www.opencontent.org/openpub/ | ||
|
||
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,11 +1,11 @@ | ||
SLEPs under review | ||
================== | ||
|
||
No SLEP is currently under review. | ||
.. No SLEP is currently under review. | ||
|
||
.. Uncomment below when a SLEP is under review | ||
|
||
.. .. toctree:: | ||
.. :maxdepth: 1 | ||
.. toctree:: | ||
:maxdepth: 1 | ||
|
||
.. slepXXX/proposal | ||
slep012/proposal |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"in most cases": is that true? I can think of many transformers where this doesn't apply, like StandardScaler, LogTransformer, OneHotEncoder etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got the most cases feeling from looking and andy's PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe change it for In some cases?
This seems to apply to the dimensionality reduction estimators and the projectors, but not to most imputers / pre-processors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems nitpicky ;) How about "in many cases"? But any of the three variants seem fine to me. Usually it corresponds to some attribute, the scalers often have
scale
for example and you're saying e.g. not i.e. so it's only an example...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe this to be nitpicky. "most" and "some" have significantly different meanings and they do not imply the same things.
Either "most" applies, and in that case, I'm only trying to understand what I'm missing. Or it does not, and I'm suggesting an alternative that would not cause the confusion I'm having.
I like "many".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whatever the reason, such review comments make me want to just close the PR. They may be constructive from an objective point of view, but they're not when it comes to my feelings when working on a PR, and I know I'm not alone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry to hear that, really. It would help me if you could tell me what you would have expected from me here. I was making a comment where I tried to i) better understand the SLEP and ii) potentially improve the document. Would you prefer me not making such comments? Or prefix it to make my intentions clearer? Something else?
Meanwhile, and this is a genuine question, not a taunt: is it possible that you've been reading more than there is into my comments? Occurrences like this one or e.g. #17 (comment) give me the impression that this might be the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, none of your comments on their own would be somethind I'd mind even the slightest. It's the accumulation of them which makes me break.
I may fail at this, but what I try to do reviewing is to focus on the main issues first, and do my best to understand what the contributor wants to say/do. I always first assume it's on me if I have a hard time understanding somebody's contribution, unless it passes a certain threshold where I may be kinda certain that the work would be very hard to be understood by most people.
From one perspective, you could say that I'm okay with contributions as long as they improve the status quo, even if they're not perfect.
And when I see reviews on my work. I'm happy to discuss the main ideas, but when it comes to the details such as word choices, especially on SLEPs, my position is "who cares" lol. It's a document for us to come to a consensus, and that's the main purpose of it. As long as it achieves that purpose, it doesn't matter if I include a funny gif in it or if I write poetry. For instance, the comment you refer to, was something which comes from me playing with the SLEP in my mind. When I'm writing it, I bargain with it, it's something I talk to, and I give life to it. From that perspective, of course it could imagine something, cause I'm not the one saying it, it's the SLEP suggesting something :P Or when I use the word lubricate, of course I know the connotation of the word, but I know (or at least in my experience) that most people kinda laugh when they see that, and yet they understand what the text means; again that's me playing around and having fun writing the SLEP.
When it comes to language issues, I know I'm not a native speaker and I'm pretty sure if a native speaker goes through any text I write, they'll find tons of minor issues. Take the example of propositions, I'm sure I have a ton of mistakes just in this single comment. But as long as people understand, we're okay with it as a community, for the sake of not exhausting people's patience; or at least that's what I'd like to believe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your reply.
I think I understand your point of view about good-enough PRs that aren't absolutely perfect, and for the most part, I share this mindset.
My approach differs in minor aspects, which I'll try to explain. Basically my line of thought is: if it's likely to improve the document and if it takes a few seconds to address, it's worth making a comment. It may save time/work for future reviewers and readers (this is even more true when it's about the docs where a small clarification can save many, many hours when you add up all the users). Of course, I don't apply the same criteria depending on who I'm reviewing: beginners and non-paid contributors don't have the same resources.
Now, we might disagree on what defines an actual improvement, and that's fine. I often find myself thinking "meh, I don't get it but ok" on some comments being made to my PRs, but I address anyway. And I'm probably more picky than the average reviewer, I'll definitely give you that, especially when it comes to the docs. I believe that typos and language clarifications are worth pointing out.
I just looked back at the comments that I made on this PR. One is a legit comment for something that was missing (#29 (comment)), one is a nitpick (#29 (comment)), one is a suggestion to something I didn't get at first and that you clarified (#29 (comment)), and the rest are questions, sometimes followed by a suggestion. These suggestions all follow my approach detailed above: they're low effort to address, and they're likely to improve the doc (IMHO), hopefully being beneficial to others.
I believe that what didn't help is that I was reviewing your PRs only a few minutes after they were posted. It might have given you the impression that I was eager to criticize your work, and maybe that I was on a personal mission. As I said on my gitter message from last week: that's not the case. I was reviewing early because I happened to be available at the time, and because I believe that timely feedback is good for productivity.
In any case, I'm not saying you're wrong for feeling the way you do. I'm just trying to share my perspective and clarify my intents when I review.