Clarify use of contractions in diagnostic messages #116803

Merged

AaronBallman
Collaborator

This dissuades contributors from using contractions when writing diagnostic wording for Clang. Contractions should be avoided because of the potential for visual confusion with single quoting syntactic constructs and because they can be harder to understand for non-native English speakers.

@AaronBallman added the documentation, clang, and clang:diagnostics labels on Nov 19, 2024
@llvmbot
Member

llvmbot commented Nov 19, 2024

@llvm/pr-subscribers-clang

Author: Aaron Ballman (AaronBallman)

Changes

This dissuades contributors from using contractions when writing diagnostic wording for Clang. Contractions should be avoided because of the potential for visual confusion with single quoting syntactic constructs and because they can be harder to understand for non-native English speakers.


Full diff: https://github.com/llvm/llvm-project/pull/116803.diff

1 Files Affected:

  • (modified) clang/docs/InternalsManual.rst (+4)
diff --git a/clang/docs/InternalsManual.rst b/clang/docs/InternalsManual.rst
index f189cb4e6a2ac3..39d389b816f129 100644
--- a/clang/docs/InternalsManual.rst
+++ b/clang/docs/InternalsManual.rst
@@ -160,6 +160,10 @@ wording a diagnostic.
   named in a diagnostic message. e.g., prefer wording like ``'this' pointer
   cannot be null in well-defined C++ code`` over wording like ``this pointer
   cannot be null in well-defined C++ code``.
+* Prefer diagnostic wording without contractions whenever possible. The single
+  quote in a contraction can be visually distracting due to its use with
+  syntactic constructs and contractions can be harder to understand for non-
+  native English speakers.
 
 The Format String
 ^^^^^^^^^^^^^^^^^

* Prefer diagnostic wording without contractions whenever possible. The single
quote in a contraction can be visually distracting due to its use with
syntactic constructs and contractions can be harder to understand for non-
native English speakers.
Collaborator

Perhaps add the special case of cannot vs can not? Or is that already here somewhere?

Member

the special case of cannot vs can not?

As in?

Collaborator

cannot is a formally 'correct' way of saying it, and we just had a PR committed that changed our uses.

Member

@Sirraide Sirraide Nov 19, 2024

cannot is a formally 'correct' way of saying it

Well, ‘cannot’ and ‘can not’ mean different things, and yeah, usually, ‘cannot’ is what you want. I don’t think ‘can not’ would be too common in a diagnostic because those are typically not about something you’re allowed not to do...

Collaborator

As a native speaker (and looking in a dictionary), they have identical meanings (same as can't).

We DID have plenty of can not in both comments and diagnostics, but they were recently changed.

Member

they have identical meanings (same as can't).

So ‘cannot’ is identical to ‘can’t’, yes. ‘can not’ is a bit different in that the ‘can’ itself isn’t negated, but rather, the verb after it is, e.g. ‘I can not do that’ == ‘I am able / allowed to not do that’—which, arguably, this doesn’t come up too often because it’s a bit of an unusual thing to say in most circumstances, but if that meaning is intended, you’re supposed to write ‘can not’ and not ‘cannot’ (of course, from a descriptive point of view, one could argue that if people keep mistaking one for the other, there isn’t much of a point of differentiating the two, but I’m not sure we’re quite there yet).

Sorry for the rambling, but I like linguistics too much to be able to stop myself whenever topics like these come up. ;Þ

We DID have plenty of can not in both comments and diagnostics, but they were recently changed.

I definitely believe that most of those should probably have been ‘cannot’, yeah. ‘can not’ is often a typo for ‘cannot’, but it is a valid syntactic construct—provided that that’s what the writer actually intended to write, of course.

* Prefer diagnostic wording without contractions whenever possible. The single
quote in a contraction can be visually distracting due to its use with
syntactic constructs and contractions can be harder to understand for non-
native English speakers.
Collaborator

They are also much easier to 'mis'/'mistake', so IDK if we want to point that out?

@Sirraide
Member

Sirraide commented Nov 19, 2024

I don’t have a very strong opinion on this if the consensus is that this is a change for the better, but as someone with a background in linguistics, I’d argue that this seems like a weird thing to discourage—I don’t think the single quote is really distracting at all if it occurs in a common contraction (e.g. isn’t, aren’t, don’t, doesn’t, etc.), because you simply parse that as one word. Of course, I don’t think we should start writing ‘you’dn’t’ve’ or anything absurd like that, but I don’t think there’s anything wrong w/ normal contractions.

they can be harder to understand for non-native English speakers.

I also don’t think this is true: simple contractions are one of the first things we teach people, and I’ve yet to meet someone whose first language isn’t English and doesn’t know what e.g. ‘isn’t’ is supposed to mean. Is there any actual precedent for anyone being confused about this?

@AaronBallman
Collaborator Author

I don’t have a very strong opinion on this if the consensus is that this is a change for the better, but as someone with a background in linguistics, I’d argue that this seems like a weird thing to discourage—I don’t think the single quote is really distracting at all if it occurs in a common contraction (e.g. isn’t, aren’t, don’t, doesn’t, etc.), because you simply parse that as one word. Of course, I don’t think we should start writing ‘you’dn’t’ve’ or anything absurd like that, but I don’t think there’s anything wrong w/ normal contractions.

It's not a significant problem; it's more a "scanning the line to see where the syntax is" problem, in that it's a visual distraction if you're trying to find the variable name being diagnosed for a complex expression and there are contractions in the wording.

they can be harder to understand for non-native English speakers.

I also don’t think this is true: simple contractions are one of the first things we teach people, and I’ve yet to meet someone whose first language isn’t English and doesn’t know what e.g. ‘isn’t’ is supposed to mean. Is there any actual precedent for anyone being confused about this?

I am not a linguist, but this is something I've heard many times over the years when talking about writing to a multilingual audience. e.g.,

https://techcomm.nz/Story?Action=View&Story_id=394
https://www.wikihow.life/Communicate-with-a-Non-Native-English-Speaker
https://hodigital.blog.gov.uk/2015/12/29/tips-for-writing-for-non-native-english-speakers/
and others

That said, I would not be surprised if we could find plenty of sources saying the opposite.

@Sirraide
Member

this is something I've heard many times over the years when talking about writing to a multilingual audience.

Hmm, I don’t know if there are any linguistic studies about this off the top of my head (I can only speak from my personal experience of never having encountered someone who’s had a problem w/ contractions, despite having talked to a lot of people whose first language wasn’t English), but my reaction to stuff like that is that there is a lot of nonsensical linguistic ‘advice’ out there... (nonsense in that it is not at all based on how language actually works or on how people actually talk; think things like ‘you shouldn’t end a sentence with a preposition’, which is, to put it bluntly, abject nonsense perpetrated by would-be grammarians who thought it sensible to apply Latin grammar to English, despite the two being completely different languages that diverged millennia ago)–sorry if I sound a bit mean here (also not talking about you here btw; that was mostly directed towards English teachers who don’t actually know English grammar...), but anyone with a background in linguistics will tell you that we have to put up w/ a lot of nonsense...

So basically, if we actually get complaints from people that our diagnostics (or documentation, etc.) are confusing because they contain contractions, then sure, it’d make perfect sense to do something about it, but I have a feeling the confusing part about C++ compiler diagnostics is generally not the contractions ;Þ

@kparzysz
Contributor

Even if there was some subtle distinction between can not and cannot, the message should be worded in such a way that it does not depend on that.

Second, using consistent wording maintains a certain "look and feel". Maybe it's my personal experience only, but for me it looks more polished (as in "higher quality", or "more professional").

@AaronBallman
Collaborator Author

this is something I've heard many times over the years when talking about writing to a multilingual audience.

Hmm, I don’t know if there are any linguistic studies about this off the top of my head (I can only speak from my personal experience of never having encountered someone who’s had a problem w/ contractions, despite having talked to a lot of people whose first language wasn’t English), but my reaction to stuff like that is that there is a lot of nonsensical linguistic ‘advice’ out there... (nonsense in that it is not at all based on how language actually works or on how people actually talk; think things like ‘you shouldn’t end a sentence with a preposition’, which is, to put it bluntly, abject nonsense perpetrated by would-be grammarians who thought it sensible to apply Latin grammar to English, despite the two being completely different languages that diverged millennia ago)–sorry if I sound a bit mean here (also not talking about you here btw; that was mostly directed towards English teachers who don’t actually know English grammar...), but anyone with a background in linguistics will tell you that we have to put up w/ a lot of nonsense...

So basically, if we actually get complaints from people that our diagnostics (or documentation, etc.) are confusing because they contain contractions, then sure, it’d make perfect sense to do something about it, but I have a feeling the confusing part about C++ compiler diagnostics is generally not the contractions ;Þ

I don't think contractions are the confusing part of diagnostics, but I do think we want consistency between our diagnostics as much as possible and we use a mixture of both contractions and no contractions inconsistently (though that's improving). I fall on the side of avoiding contractions rather than including them.

Do you have strong opinions on using contractions? Would you recommend we go the other direction and switch to consistently using contractions?

@Sirraide
Member

I don't think contractions are the confusing part of diagnostics, but I do think we want consistency between our diagnostics as much as possible and we use a mixture of both contractions and no contractions inconsistently (though that's improving). I fall on the side of avoiding contractions rather than including them.

I guess that makes sense yeah (I personally don’t care that much about consistency wrt diagnostic wording, but I can also see why that’s something we’d want).

Do you have strong opinions on using contractions? Would you recommend we go the other direction and switch to consistently using contractions?

I don’t have strong opinions about this, no; linguistically, imo either way is fine (I’d just be a bad linguist if I didn’t argue against prescriptivism whenever it comes up ;Þ), but I don’t have a problem w/ picking one over the other for non-linguistic reasons. I mean, I would probably prefer it if we could write diagnostic messages w/o having to think too hard as to what the correct style is wrt things like these (because it’s what I think people will just naturally do), but if it’s just a matter of ‘we want to be consistent, so let’s always do X, even though that choice is more or less arbitrary’, then that’s equally valid.

So in sum, enforcing one over the other is not what I’d want to do (and I just don’t think it’s all that necessary), but if we decide to go that route, then I’m fine w/ that too ;Þ

@kparzysz
Contributor

Here's another thing---could there be tools that try to parse the messages (e.g. something that runs clang and presents the messages to the user in some form)? Having a policy such as "single quotes only come in pairs" could make it easier. I don't know if that's something we should even take into consideration, it's just a thought.
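
To make the concern concrete, here is a minimal sketch of how a wrapper tool might pair single quotes naively, and how one contraction throws the pairing off. The `quoted_entities` helper and both sample messages are hypothetical illustrations, not Clang code or captured Clang output.

```python
import re

# Hypothetical helper, not part of Clang: extract 'quoted' entities by
# pairing single quotes left to right, as a simple wrapper tool might.
QUOTED = re.compile(r"'([^']*)'")


def quoted_entities(message: str) -> list[str]:
    """Return the text found between each consecutive pair of single quotes."""
    return QUOTED.findall(message)


# Made-up messages for illustration only.
clean = "'this' pointer cannot be null in well-defined C++ code"
contracted = "variable 'x' isn't used, but 'y' is"

print(quoted_entities(clean))       # quotes pair up as intended
print(quoted_entities(contracted))  # the apostrophe in "isn't" shifts the pairing
```

With the contraction present, the apostrophe is treated as an opening quote, so the second "entity" is a fragment of prose and `y` is never reported.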

@Sirraide
Member

could there be tools that try to parse the messages

Hmm, I think we have other formats that are better suited for that (don’t we have a flag that makes us print JSON diagnostics?), so I’d hope that no-one tries to just parse the diagnostics from the terminal, and even then, you could definitely hard-code common contractions imo, but that is an interesting question nonetheless.

@AaronBallman
Collaborator Author

could there be tools that try to parse the messages

Hmm, I think we have other formats that are better suited for that (don’t we have a flag that makes us print JSON diagnostics?), so I’d hope that no-one tries to just parse the diagnostics from the terminal, and even then, you could definitely hard-code common contractions imo, but that is an interesting question nonetheless.

Yeah, I think we'd want to push folks towards using -fdiagnostics-format which supports interchange formats like SARIF.
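
As a rough illustration of why a structured format sidesteps the quoting question, the sketch below reads a SARIF-shaped log and takes the message and location from fields instead of scraping terminal text. The `SAMPLE` document is hand-written to follow the general SARIF 2.1.0 layout (`runs`, `results`, `message.text`, `locations`); it is an assumption for illustration, not output captured from Clang, and `iter_results` is a hypothetical helper.

```python
import json

# Hand-written, SARIF-shaped sample; NOT captured Clang output.
SAMPLE = """
{
  "version": "2.1.0",
  "runs": [{
    "tool": {"driver": {"name": "example-tool"}},
    "results": [{
      "level": "warning",
      "message": {"text": "variable 'x' isn't used"},
      "locations": [{
        "physicalLocation": {
          "artifactLocation": {"uri": "file:///tmp/a.c"},
          "region": {"startLine": 3, "startColumn": 7}
        }
      }]
    }]
  }]
}
"""


def iter_results(sarif_text: str):
    """Yield one small dict per result found in a SARIF-shaped JSON log."""
    log = json.loads(sarif_text)
    for run in log.get("runs", []):
        for result in run.get("results", []):
            # Use the first location if present; tolerate results without one.
            locations = result.get("locations") or [{}]
            phys = locations[0].get("physicalLocation", {})
            yield {
                "level": result.get("level"),
                "text": result.get("message", {}).get("text", ""),
                "uri": phys.get("artifactLocation", {}).get("uri"),
                "line": phys.get("region", {}).get("startLine"),
            }


for item in iter_results(SAMPLE):
    print(item["level"], item["uri"], item["line"], item["text"])
```

Because the location arrives as data, the consumer never has to guess which apostrophes delimit syntax and which belong to a contraction.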

I guess that makes sense yeah (I personally don’t care that much about consistency wrt diagnostic wording, but I can also see why that’s something we’d want).

While it is annoying to have to remember a list of rules about diagnostic messages, I think it's important that we aim for consistency because I think we want there to be one "voice" to things like diagnostics, documentation, and other communications with the user. (The docs don't have to be consistent with the diagnostics, but should be consistent with other documentation in Clang, etc.) I think that provides a better user experience than having multiple "voices" throughout the product.

Here's where we're at currently for contractions vs long form (looking at sema, parse, and common diagnostics):
can't: 0 contractions vs 795 long
isn't: 9 contractions vs 352 long
doesn't: 3 contractions vs 190 long
aren't: 3 contractions vs 41 long
shouldn't: 0 contractions vs 26 long
don't: 2 contractions vs 15 long
won't: 0 contractions vs 13 long
wasn't: 0 contractions vs 10 long
couldn't: 0 contractions vs 9 long
hasn't: 0 contractions vs 2 long
didn't: 0 contractions vs 0 long

so I think we have a general preference for long form over contractions. From spot-checking the uses of contractions, it seems that all uses could pretty easily be written just as clearly as the long form and it wouldn't be much churn (about 15-20 messages in total).
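
For anyone who wants to reproduce a tally like this, here is a minimal sketch of one way to count contractions against their long forms in a block of text. How the numbers above were actually gathered is not stated, so the `FORMS` table, the `count_forms` helper, and the choice to count both "cannot" and "can not" as the long form of "can't" are all assumptions.

```python
import re

# Contraction -> long form(s) to compare against. This mapping is an
# assumption for illustration; extend it as needed.
FORMS = {
    "can't": ["cannot", "can not"],
    "isn't": ["is not"],
    "doesn't": ["does not"],
    "aren't": ["are not"],
    "don't": ["do not"],
    "won't": ["will not"],
}


def count_occurrences(phrase: str, text: str) -> int:
    """Count whole-word occurrences of phrase in text."""
    return len(re.findall(r"\b" + re.escape(phrase) + r"\b", text))


def count_forms(text: str) -> dict[str, tuple[int, int]]:
    """Return {contraction: (contraction_count, long_form_count)}."""
    # Normalize typographic apostrophes and case before matching.
    text = text.replace("\u2019", "'").lower()
    return {
        short: (
            count_occurrences(short, text),
            sum(count_occurrences(long, text) for long in longs),
        )
        for short, longs in FORMS.items()
    }


# Made-up sample text; in practice this would be the contents of the
# diagnostic definition files being surveyed.
sample = "x can't be y; z cannot be w; a is not b; c isn't d; this cannot work"
for short, (n_short, n_long) in count_forms(sample).items():
    print(f"{short}: {n_short} contractions vs {n_long} long")
```

The word-boundary anchors keep "is not" from matching inside "this not", which matters when scanning free-form message text.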

So in sum, enforcing one over the other is not what I’d want to do (and I just don’t think it’s all that necessary), but if we decide to go that route, then I’m fine w/ that too ;Þ

I don't see much benefit to having such a lopsided approach as we currently have. That said, the proposal is to "prefer", so it's guiding rather than purely prescriptive. Can you live with that?

Member

@Sirraide Sirraide left a comment

I think that’s fine, yeah

@AaronBallman AaronBallman merged commit f710e4c into llvm:main Nov 20, 2024
13 checks passed
@AaronBallman AaronBallman deleted the aballman-diagnostic-wording-contractions branch November 20, 2024 13:33