SLEP013: n_features_out_ #29

Merged
merged 9 commits on Feb 19, 2020
Changes from 5 commits
3 changes: 3 additions & 0 deletions .gitignore
@@ -55,3 +55,6 @@ docs/_build/

# PyBuilder
target/

# Editors
.vscode
71 changes: 71 additions & 0 deletions slep012/proposal.rst
@@ -0,0 +1,71 @@
.. _slep_012:

======================================
SLEP012: ``n_features_out_`` attribute
======================================

:Author: Adrin Jalali
:Status: Under Review
:Type: Standards Track
:Created: 2020-02-12

Abstract
########

This SLEP proposes the introduction of a public ``n_features_out_`` attribute
for most transformers (where relevant).

Motivation
##########

Knowing the number of features that a transformer outputs is useful for
inspection purposes. This proposal complements `SLEP010: n_features_in_
<https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep010/proposal.html>`_.

Solution
########

The proposed solution is for the ``n_features_out_`` attribute to be set once a
call to ``fit`` is done. In most cases the value of ``n_features_out_`` is the
Member:

"in most cases": is that true? I can think of many transformers where this doesn't apply, like StandardScaler, LogTransformer, OneHotEncoder etc

Member Author:

I got the most cases feeling from looking and andy's PR.

Member:

Maybe change it to "In some cases"?

This seems to apply to the dimensionality reduction estimators and the projectors, but not to most imputers / pre-processors

Member:

Seems nitpicky ;) How about "in many cases"? But any of the three variants seems fine to me. Usually it corresponds to some attribute (the scalers often have scale_, for example), and you're saying e.g., not i.e., so it's only an example...

Member:

I don't believe this to be nitpicky. "most" and "some" have significantly different meanings and do not imply the same things.
Either "most" applies, in which case I'm only trying to understand what I'm missing, or it does not, in which case I'm suggesting an alternative that would not cause the confusion I'm having.

I like "many".

Member Author:

Whatever the reason, such review comments make me want to just close the PR. They may be constructive from an objective point of view, but they're not when it comes to my feelings while working on a PR, and I know I'm not alone.

Member:

I'm sorry to hear that, really. It would help me if you could tell me what you would have expected from me here. I was making a comment where I tried to i) better understand the SLEP and ii) potentially improve the document. Would you prefer me not to make such comments? Or to prefix them to make my intentions clearer? Something else?

Meanwhile, and this is a genuine question, not a taunt: is it possible that you've been reading more into my comments than there is? Occurrences like this one or e.g. #17 (comment) give me the impression that this might be the case.

Member Author:

Basically, none of your comments on their own would be something I'd mind even the slightest. It's the accumulation of them which makes me break.

I may fail at this, but what I try to do when reviewing is to focus on the main issues first, and do my best to understand what the contributor wants to say/do. I always first assume it's on me if I have a hard time understanding somebody's contribution, unless it passes a certain threshold where I'm kinda certain that the work would be very hard for most people to understand.

From one perspective, you could say that I'm okay with contributions as long as they improve the status quo, even if they're not perfect.

And when I see reviews on my work, I'm happy to discuss the main ideas, but when it comes to details such as word choices, especially on SLEPs, my position is "who cares" lol. It's a document for us to come to a consensus, and that's its main purpose. As long as it achieves that purpose, it doesn't matter if I include a funny gif in it or if I write poetry. For instance, the comment you refer to came from me playing with the SLEP in my mind. When I'm writing it, I bargain with it, it's something I talk to, and I give life to it. From that perspective, of course it could imagine something, 'cause I'm not the one saying it, it's the SLEP suggesting something :P Or when I use the word lubricate, of course I know the connotation of the word, but I know (or at least in my experience) that most people kinda laugh when they see it, and yet they understand what the text means; again, that's me playing around and having fun writing the SLEP.

When it comes to language issues, I know I'm not a native speaker and I'm pretty sure that if a native speaker goes through any text I write, they'll find tons of minor issues. Take the example of prepositions: I'm sure I have a ton of mistakes just in this single comment. But as long as people understand, we're okay with it as a community, for the sake of not exhausting people's patience; or at least that's what I'd like to believe.

Member:

Thanks for your reply.

I think I understand your point of view about good-enough PRs that aren't absolutely perfect, and for the most part, I share this mindset.

My approach differs in minor aspects, which I'll try to explain. Basically my line of thought is: if it's likely to improve the document and it takes a few seconds to address, it's worth making a comment. It may save time/work for future reviewers and readers (this is even more true for the docs, where a small clarification can save many, many hours when you add up all the users). Of course, I adjust the criteria depending on who I'm reviewing: beginners and non-paid contributors don't have the same resources.

Now, we might disagree on what defines an actual improvement, and that's fine. I often find myself thinking "meh, I don't get it, but ok" about some comments made on my PRs, but I address them anyway. And I'm probably more picky than the average reviewer, I'll definitely give you that, especially when it comes to the docs. I believe that typos and language clarifications are worth pointing out.

I just looked back at the comments that I made on this PR. One is a legit comment for something that was missing (#29 (comment)), one is a nitpick (#29 (comment)), one is a suggestion to something I didn't get at first and that you clarified (#29 (comment)), and the rest are questions, sometimes followed by a suggestion. These suggestions all follow my approach detailed above: they're low effort to address, and they're likely to improve the doc (IMHO), hopefully being beneficial to others.

I believe that what didn't help is that I was reviewing your PRs only a few minutes after they were posted. It might have given you the impression that I was eager to criticize your work, and maybe that I was on a personal mission. As I said in my gitter message last week: that's not the case. I was reviewing early because I happened to be available at the time, and because I believe that timely feedback is good for productivity.

In any case, I'm not saying you're wrong for feeling the way you do. I'm just trying to share my perspective and clarify my intents when I review.

same as some other attribute stored in the transformer, *e.g.*
``n_components_``, and in these cases a ``Mixin`` such as a ``ComponentsMixin``
can delegate ``n_features_out_`` to those attributes.
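
For illustration, a mixin along these lines could implement the delegation (a
minimal sketch; only the ``ComponentsMixin`` name comes from the paragraph
above, the body is an assumption, not the actual implementation)::

    class ComponentsMixin:
        @property
        def n_features_out_(self):
            # delegate to the fitted attribute that holds the output
            # dimensionality, e.g. n_components_ for PCA-like transformers
            return self.n_components_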

Testing
-------

A test is added to the common tests to ensure the presence of the attribute or
property after calling ``fit``.
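
Such a check could look roughly like the following (the name
``check_n_features_out`` and the toy data are illustrative assumptions, not
the actual test-suite code)::

    def check_n_features_out(name, transformer):
        # fit on a small toy dataset and verify the attribute is present
        X = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]
        y = [0, 1, 0]
        transformer.fit(X, y)
        assert hasattr(transformer, "n_features_out_"), (
            f"{name} does not set n_features_out_ after fit"
        )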

Considerations
##############

The main consideration is that the addition of the common test means that
existing estimators in downstream libraries will not pass our test suite,
unless the estimators also have the ``n_features_out_`` attribute.

The newly introduced checks will only raise a warning instead of an exception
for the next 2 releases, giving downstream packages more time to adjust.
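
During that transition period, the check could for instance emit a
``FutureWarning`` rather than fail outright (again only a sketch; the check
name and toy data are assumptions)::

    import warnings

    def check_n_features_out_warns(name, transformer):
        transformer.fit([[0.0, 1.0], [1.0, 0.0]], [0, 1])
        if not hasattr(transformer, "n_features_out_"):
            # only warn for now; this becomes an error after 2 releases
            warnings.warn(
                f"{name} does not set n_features_out_ after fit.",
                FutureWarning,
            )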

There are other minor considerations (a sketch illustrating both points
follows the list):

- In some meta-estimators, this is delegated to the
sub-estimator(s). The ``n_features_out_`` attribute of the meta-estimator is
thus explicitly set to that of the sub-estimator, either via a ``@property``,
or directly in ``fit()``.
- Some transformers such as ``FunctionTransformer`` may not know the number
  of output features since arbitrary arrays can be passed to ``transform``. In
  such cases ``n_features_out_`` is set to ``None``.
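
To illustrate both points, here is a minimal sketch (``MetaTransformer`` and
``ArbitraryFunctionTransformer`` are hypothetical stand-ins; only
``FunctionTransformer`` and ``clone`` are existing scikit-learn names)::

    from sklearn.base import clone

    class MetaTransformer:
        """Hypothetical meta-estimator wrapping a single sub-estimator."""

        def __init__(self, estimator):
            self.estimator = estimator

        def fit(self, X, y=None):
            self.estimator_ = clone(self.estimator).fit(X, y)
            return self

        @property
        def n_features_out_(self):
            # delegated to the fitted sub-estimator
            return self.estimator_.n_features_out_

    class ArbitraryFunctionTransformer:
        """Sketch of the FunctionTransformer case: output size unknown."""

        def fit(self, X, y=None):
            self.n_features_out_ = None  # arbitrary arrays may be returned
            return self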

Copyright
---------

This document has been placed in the public domain. [1]_

References and Footnotes
------------------------

.. [1] _Open Publication License: https://www.opencontent.org/openpub/


8 changes: 4 additions & 4 deletions under_review.rst
@@ -1,11 +1,11 @@
SLEPs under review
==================

-No SLEP is currently under review.
+.. No SLEP is currently under review.

.. Uncomment below when a SLEP is under review

-.. .. toctree::
-..    :maxdepth: 1
+.. toctree::
+   :maxdepth: 1

-.. slepXXX/proposal
+   slep012/proposal