SLEP013: n_features_out_ #29


Merged: 9 commits merged into scikit-learn:master on Feb 19, 2020

Conversation

adrinjalali (Member):

n_features_out_ is closely related to n_features_in_, which is accepted and close to being merged.

scikit-learn/scikit-learn#14241 by @amueller is the implementation of this SLEP.

This is short, but it makes me feel I'm missing some obvious important issue...
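For concreteness, here is a minimal sketch of what the SLEP proposes, assuming the semantics carry over from the accepted ``n_features_in_`` SLEP: after ``fit``, a transformer exposes the width of the array that ``transform`` will produce. ``TopKSelector`` is a hypothetical example, not an estimator from the implementation PR.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class TopKSelector(BaseEstimator, TransformerMixin):
    """Hypothetical transformer keeping the k highest-variance columns."""

    def __init__(self, k=2):
        self.k = k

    def fit(self, X, y=None):
        X = np.asarray(X)
        self.n_features_in_ = X.shape[1]   # from the accepted n_features_in_ SLEP
        self.support_ = np.argsort(X.var(axis=0))[-self.k:]
        self.n_features_out_ = self.k      # the attribute proposed here
        return self

    def transform(self, X):
        return np.asarray(X)[:, self.support_]


est = TopKSelector(k=2).fit(np.random.RandomState(0).rand(10, 5))
print(est.n_features_in_, est.n_features_out_)  # 5 2
```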

NicolasHug (Member) left a comment:

Thanks Adrin.

There should be a link to the ``n_features_in_`` SLEP somewhere, for reference.

What's the interaction with n_features_in_ in pipelines?
Let's say after fit, step 2 of a pipeline has inferred n_features_in_ to be 3 but n_features_out_ of step 1 claims it's 4. One of them is wrong, do we error?

Not convinced about the Mixins yet, but I'll check out the discussions on the PR.
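On the Mixins point: the actual design in scikit-learn/scikit-learn#14241 may differ, but one plausible shape (an assumption, for illustration only) is a pair of mixins that derive ``n_features_out_`` from attributes estimators already store:

```python
class ComponentsMixin:
    """Sketch: output width taken from a fitted ``components_`` attribute,
    as in PCA, whose ``components_`` has shape (n_components, n_features)."""

    @property
    def n_features_out_(self):
        return self.components_.shape[0]


class OneToOneMixin:
    """Sketch: output width equals input width (scalers, imputers, ...)."""

    @property
    def n_features_out_(self):
        return self.n_features_in_
```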

########

The proposed solution is for the ``n_features_out_`` attribute to be set once a
call to ``fit`` is done. In most cases the value of ``n_features_out_`` is the
Member:

"in most cases": is that true? I can think of many transformers where this doesn't apply, like StandardScaler, LogTransformer, OneHotEncoder etc

Member Author:

I got the "most cases" feeling from looking at Andy's PR.

Member:

Maybe change it to "in some cases"?

This seems to apply to the dimensionality reduction estimators and the projectors, but not to most imputers / pre-processors

Member:

Seems nitpicky ;) How about "in many cases"? But any of the three variants seems fine to me. Usually it corresponds to some attribute; the scalers often have ``scale_``, for example. And you're saying "e.g.", not "i.e.", so it's only an example...

Member:

I don't believe this to be nitpicky. "Most" and "some" have significantly different meanings and do not imply the same things.
Either "most" applies, in which case I'm only trying to understand what I'm missing; or it does not, and I'm suggesting an alternative that would avoid the confusion I'm having.

I like "many".

Member Author:

Whatever the reason, such review comments make me want to just close the PR. They may be constructive from an objective point of view, but they're not when it comes to my feelings while working on a PR, and I know I'm not alone.

Member:

I'm sorry to hear that, really. It would help me if you could tell me what you would have expected from me here. I was making a comment where I tried to (i) better understand the SLEP and (ii) potentially improve the document. Would you prefer me not to make such comments? Or that I prefix them to make my intentions clearer? Something else?

Meanwhile, and this is a genuine question, not a taunt: is it possible that you've been reading more into my comments than there is? Occurrences like this one or e.g. #17 (comment) give me the impression that this might be the case.

Member Author:

Basically, none of your comments on their own would be something I'd mind even the slightest. It's the accumulation of them that makes me break.

I may fail at this, but what I try to do when reviewing is to focus on the main issues first, and do my best to understand what the contributor wants to say or do. I always first assume it's on me if I have a hard time understanding somebody's contribution, unless it passes a certain threshold where I can be fairly certain the work would be very hard for most people to understand.

From one perspective, you could say that I'm okay with contributions as long as they improve the status quo, even if they're not perfect.

And when I see reviews on my work, I'm happy to discuss the main ideas, but when it comes to details such as word choices, especially on SLEPs, my position is "who cares" lol. It's a document for us to come to a consensus, and that's its main purpose. As long as it achieves that purpose, it doesn't matter if I include a funny gif in it or if I write poetry. For instance, the comment you refer to came from me playing with the SLEP in my mind. When I'm writing it, I bargain with it, it's something I talk to, and I give life to it. From that perspective, of course it could imagine something, because I'm not the one saying it, it's the SLEP suggesting something :P Or when I use the word "lubricate", of course I know the connotation of the word, but I know (at least in my experience) that most people kinda laugh when they see that, and yet they understand what the text means; again, that's me playing around and having fun writing the SLEP.

When it comes to language issues, I know I'm not a native speaker, and I'm pretty sure that if a native speaker went through any text I write, they'd find tons of minor issues. Take the example of prepositions: I'm sure I have a ton of mistakes just in this single comment. But as long as people understand, we're okay with it as a community, for the sake of not exhausting people's patience; or at least that's what I'd like to believe.

Member:

Thanks for your reply.

I think I understand your point of view about good-enough PRs that aren't absolutely perfect, and for the most part, I share this mindset.

My approach differs in minor aspects, which I'll try to explain. Basically, my line of thought is: if it's likely to improve the document and takes a few seconds to address, it's worth making a comment. It may save time and work for future reviewers and readers (this is even more true for the docs, where a small clarification can save many, many hours once you add up all the users). Of course, I don't apply the same criteria to everyone I review: beginners and non-paid contributors don't have the same resources.

Now, we might disagree on what defines an actual improvement, and that's fine. I often find myself thinking "meh, I don't get it but ok" about some comments made on my PRs, but I address them anyway. And I'm probably pickier than the average reviewer, I'll definitely give you that, especially when it comes to the docs. I believe that typos and language clarifications are worth pointing out.

I just looked back at the comments I made on this PR. One is a legit comment about something that was missing (#29 (comment)), one is a nitpick (#29 (comment)), one is a suggestion about something I didn't get at first and that you clarified (#29 (comment)), and the rest are questions, sometimes followed by a suggestion. These suggestions all follow the approach detailed above: they're low effort to address, and (IMHO) they're likely to improve the doc, hopefully benefiting others.

I believe that what didn't help is that I was reviewing your PRs only a few minutes after they were posted. That might have given you the impression that I was eager to criticize your work, and maybe that I was on a personal mission. As I said in my Gitter message last week: that's not the case. I was reviewing early because I happened to be available at the time, and because I believe timely feedback is good for productivity.

In any case, I'm not saying you're wrong for feeling the way you do. I'm just trying to share my perspective and clarify my intents when I review.

adrinjalali (Member Author):

> What's the interaction with n_features_in_ in pipelines?
> Let's say after fit, step 2 of a pipeline has inferred n_features_in_ to be 3 but n_features_out_ of step 1 claims it's 4. One of them is wrong, do we error?

I think we either try to enforce consistency between ``fit`` and ``transform`` in the estimators themselves, or some operation raises an error. Checking the consistency could be added; I'm not sure we should put that in this SLEP. To me it's kind of a next step.

NicolasHug (Member):

> Checking the consistency could be added; I'm not sure we should put that in this SLEP. To me it's kind of a next step.

OK, agreed. That can get tricky because the discrepancy might come from a user-input issue, or from a wrong computation on our side, in which case we don't want to error.
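For illustration, the kind of consistency check being deferred here might look like the following sketch (hypothetical; ``Pipeline`` does not perform this validation as part of this SLEP):

```python
def check_pipeline_widths(pipeline):
    """Raise if a step's advertised output width disagrees with the
    input width the next step observed during fit (a sketch)."""
    estimators = [est for _, est in pipeline.steps]
    for prev, nxt in zip(estimators, estimators[1:]):
        n_out = getattr(prev, "n_features_out_", None)
        n_in = getattr(nxt, "n_features_in_", None)
        if n_out is not None and n_in is not None and n_out != n_in:
            raise ValueError(
                f"{type(prev).__name__} claims {n_out} output features, "
                f"but {type(nxt).__name__} saw {n_in} input features."
            )
```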

amueller (Member) left a comment:

Looks good, thanks @adrinjalali


adrinjalali (Member Author):

Seems like this one is going to be an easy one? I almost can't believe it :P Should we merge and take a vote then?

NicolasHug (Member) left a comment:

12 vs. 13, but LGTM otherwise; feel free to merge once the numbering is fixed.

########

.. _slep_013:

======================================
SLEP012: ``n_features_out_`` attribute
Member:

SLEP012 but ref and file say 13 ;)

Member Author:

oops, forgot this one.

Member:

(PR title too BTW)

Member Author:

done

@adrinjalali adrinjalali changed the title SLEP012: n_features_out_ SLEP013: n_features_out_ Feb 19, 2020
@NicolasHug NicolasHug merged commit 8445d6c into scikit-learn:master Feb 19, 2020
NicolasHug (Member):

Thanks Adrin. Wanna call for a vote? We can ask for votes to be made on a PR / issue to avoid flooding the mailing list, as suggested.
