Skip to content

[-Wunsafe-buffer-usage][BoundsSafety] support __null_terminated in the interop mode #10060

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Feb 23, 2025

Conversation

ziqingluo-90
Copy link

For a __null_terminated pointer 'p', only a __null_terminated pointer, a string literal or a call to std::string::c_str can be assigned to (/passed to/used to initialize) 'p'. Otherwise, in the interop mode, an -Wunsafe-buffer-usage warning will be reported.

Similar to how __counted_by is handled, incompatibility in nested __null_terminated pointers (e.g., input/output pointer type arguments) is reported as Bounds Safety type checking errors.

(rdar://138798346)

@ziqingluo-90
Copy link
Author

@swift-ci test

@ziqingluo-90
Copy link
Author

CC: @dtarditi

@ziqingluo-90 ziqingluo-90 force-pushed the dev/ziqing/PR-138798346 branch from 0be9793 to 6a7f6c7 Compare February 19, 2025 01:49
@ziqingluo-90
Copy link
Author

@swift-ci test

…e interop mode

For a __null_terminated pointer 'p', only a __null_terminated pointer,
a string literal or a call to `std::string::c_str` can be assigned to
(/passed to/used to initialize) 'p'.  Otherwise, in the interop mode,
an -Wunsafe-buffer-usage warning will be reported.

Similar to how __counted_by is handled, incompatibility in nested
__null_terminated pointers (e.g., input/output pointer type arguments)
is reported as Bounds Safety type checking errors.

(rdar://138798346)
@ziqingluo-90 ziqingluo-90 force-pushed the dev/ziqing/PR-138798346 branch from 6a7f6c7 to 1e578f1 Compare February 19, 2025 19:25
@ziqingluo-90
Copy link
Author

@swift-ci test

Copy link

@dtarditi dtarditi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some suggestions.


#include <ptrcheck.h>
typedef unsigned size_t;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these new tests look unrelated to this change. Please consider splitting them out to a separate PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. These are things left un-upstreamed.

"initialization of a pointer of|"
"assignment to a pointer of|"
"casting to}2 '__null_terminated' type; "
"only '__null_terminated' pointers, string literals, and 'std::string::c_str' calls are compatible with '__null_terminated' pointers">,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be simpler to say that. the source pointer may not point to null-terminated data. I could imagine that we eventually we will increase the number of cases that we allow. I don't have super strong opinions about this - it is nice to tell people what is allowed, but it doesn't tell them what the error is.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Yeah, I should explain what the error is. The convention for warning messages (at least for -Wunsafe-buffer-usage) is that we have a short but clear enough sentence to just say "something is unsafe", then put necessary explanation in notes. My message here might already be lengthy. I will explain what the error is in a note and also move "only '__null_terminated' pointers, ..." (sort of the suggestion) to a note too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I maybe misunderstood part of your comment here.
You were suggesting to simply saying "source pointer may not point to null-terminated data." instead of breaking into cases. I'm not against the idea of making the warning message simpler. My only concern is that warning messages under -Wunsafe-buffer-usage all have the forms "unsafe [some operation]" or "[some operation] is unsafe". I'd like this message to follow that.

Different cases are determined by AssignmentAction, which is handled by the existing framework in Sema. I assume it is stable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe "unsafe conversion to __null_terminated type; source pointer may not point to null-terminated data." ? If "conversion" is a appropriate umbrella term for these operations.

"only '__null_terminated' pointers, string literals, and 'std::string::c_str' calls are compatible with '__null_terminated' pointers">,
InGroup<UnsafeBufferUsage>, DefaultIgnore;
def warn_unsafe_named_cast_to_null_terminated_pointer : Warning<
"C++ named cast to '__null_terminated' type is unsafe">,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be overly specific: any cast to __null_terminated is unsafe. If I got this error message I'd try doing a C-style cast instead, which would not help.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, looking at the test cases it seems like C-style casts are deliberately allowed. What's the rationale?

Copy link
Author

@ziqingluo-90 ziqingluo-90 Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My original idea is that it should be rare for C++ programmers to cast null_terminated pointers with named casts. And, Bounds Safety's type checking for null_terminated can only be re-used for C-Style casting. So why not force C++ programmers to just use C-Style casts for null_terminated pointers?

But now I changed my mind. I think it is too strict. Moreover, it will be difficult to relax the restriction once this warning is added. On the other hand, we are allowed to have some false negatives for some rare use cases and make warnings for them later. So I think I will remove this warning. @rapidsna @patrykstefanski WDYT?

Thanks for bringing this up!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree that it's easier to add warnings later than relaxing later, since relaxing the rules doesn't disrupt existing adopters while new warnings do. But that said, if there's no reason for preferring one cast over the other it seems reasonable to remove it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where's the test case to allow C-Style casts?


#ifndef INTEROP_MODE

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the code isn't actually shared for INTEROP_MODE vs !INTEROP_MODE. Shouldn't it be just a separate test file?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. These are things left un-upstreamed.

@@ -238,6 +238,17 @@ namespace {
const auto *DstPointerType = DestType->getAs<ValueTerminatedType>();
IsDstNullTerm =
DstPointerType->getTerminatorValue(Self.Context).isZero();
if (Self.LangOpts.isBoundsSafetyAttributeOnlyMode() &&
Self.LangOpts.CPlusPlus &&
!Self.Diags.isIgnored(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it controlled by diag::warn_unsafe_assign_to_null_terminated_pointer, not by the UnsafeBufferUsage?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually, we use diag::warn_unsafe_buffer_operation to indicate whether -Wunsafe-buffer-usage is on for a source location. (For example: here.)

The diag::warn_unsafe_assign_to_null_terminated_pointer is under the UnsafeBufferUsage warning group and is only controlled by the -Wunsafe-buffer-usage flag. So functionally it is equivalent to using diag::warn_unsafe_buffer_operation. I originally thought it is more specific but now I feel like using diag::warn_unsafe_buffer_operation is more consistent and probably more readable. I will change it to diag::warn_unsafe_buffer_operation, if this is your concern.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, consistently using diag::warn_unsafe_buffer_operation sounds a more robust implementation rather than relying on different warnings for each individual set of warnings.

Nit: You might also want to have a helper function like IsSafeBufferMode and use that instead to avoid mistakes and duplication.

// In C++ Safe Buffers/Bounds Safety interop mode, the compiler
// reports a warning under -Wunsafe-buffer-usage:
DiagKind = diag::warn_unsafe_assign_to_null_terminated_pointer;
isInvalid = true;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be isInvalid = false because this is a warning. Then, it's controlled by -Werror if the compiler still wants to fail the build.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it is a mistake!

if (DestType->getAs<ValueTerminatedType>()) {
Diag(E->getBeginLoc(),
diag::warn_unsafe_named_cast_to_null_terminated_pointer);
return ExprError();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, it's weird that it returns ExprError() on a warning. It should either an error or just pass through.

Copy link
Author

@ziqingluo-90 ziqingluo-90 Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove it.

@@ -576,6 +587,17 @@ Sema::BuildCXXNamedCast(SourceLocation OpLoc, tok::TokenKind Kind,

Op.checkQualifiedDestType();

// In C++ Safe Buffers/Bounds Safety interop mode, forbidding for now all
// named cast to __null_terminated pointer type.
if (DestType->isPointerType() && LangOpts.hasBoundsSafetyAttributes() &&

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean Self.LangOpts.isBoundsSafetyAttributeOnlyMode()?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My general idea is that the interoperation work should be guarded by LangOpts.hasBoundsSafetyAttributes combined with the testing of whether-Wunsafe-buffer-usage is enabled for most places. We only use LangOpts.isBoundsSafetyAttributeOnlyMode() to guard against the work that otherwise can affect -fbounds-safety. Let me know if this is not the best way of thinking.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LangOpts.hasBoundsSafetyAttributes is enabled in both the annotation only mode and the full firebloom mode. I was asking because you may want an error and may be a different diagnostics in the firebloom mode, which we didn't design yet. The current way of doing it is still fine for this patch and we can adjust it later for C++ firebloom mode.

// In C++ Safe Buffers/Bounds Safety interop mode, forbidding for now all
// named cast to __null_terminated pointer type.
if (DestType->isPointerType() && LangOpts.hasBoundsSafetyAttributes() &&
!Diags.isIgnored(diag::warn_unsafe_named_cast_to_null_terminated_pointer,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, why it's guarded by this specific diag, not the unsafe buffer usage flag?

@@ -13603,6 +13603,19 @@ def note_unsafe_count_attributed_pointer_argument : Note<
def warn_unsafe_single_pointer_argument : Warning<
"unsafe assignment to function parameter of __single pointer type">,
InGroup<UnsafeBufferUsage>, DefaultIgnore;
def warn_unsafe_assign_to_null_terminated_pointer : Warning<
Copy link

@rapidsna rapidsna Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably almost a duplicate of an existing one warn_bounds_safety_incompatible_non_terminated_by_to_terminated_by. I guess you introduced a new one because you want something in the UsafeBufferUsage group? Okay, I see the diagnostic message at the end is different.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warn_bounds_safety_incompatible_non_terminated_by_to_terminated_by mentions "indexable" and "unsafe_forge", which do not exist in the interop mode. This is the main reason for having another diagnostic message.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could move those "indexable" and "unsafe_forge" to %select in warn_bounds_safety_incompatible_non_terminated_by_to_terminated_by and not print them in interop.

- Make the diagnostic message simply saying "unsafe [conversion
  operation] of '__null_terminated' type"; further explanation is put
  in notes.
- Use `warn_unsafe_buffer_operation` to consistently test if CXX Safe
  Buffer is enabled.  Add an helper function for the test.
- Remove the warning on using C++ named cast to obtain
  __null_terminated pointers.
- Added tests for good cases of C-Style casts.
- Consistently use isBoundsSafetyAttributeOnlyMode for code involving
  bounds attributes but not shared with full Bounds Safety mode.
@ziqingluo-90 ziqingluo-90 force-pushed the dev/ziqing/PR-138798346 branch from 17a8a92 to 453073f Compare February 21, 2025 00:02
@@ -205,7 +205,7 @@ namespace {
/// int *__bidi_indexable p1;
/// int *__indexable p2 = (int *__indexable)p1;
/// \endcode
if (Self.getLangOpts().BoundsSafety) {
if (Self.getLangOpts().hasBoundsSafetyAttributes()) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We want to remove hasBoundsSafetyAttributes(), since it is not needed anymore. Self.getLangOpts().hasBoundsSafetyAttributes() -> Self.getLangOpts().BoundsSafetyAttributes.

@@ -13603,6 +13603,19 @@ def note_unsafe_count_attributed_pointer_argument : Note<
def warn_unsafe_single_pointer_argument : Warning<
"unsafe assignment to function parameter of __single pointer type">,
InGroup<UnsafeBufferUsage>, DefaultIgnore;
def warn_unsafe_assign_to_null_terminated_pointer : Warning<

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could move those "indexable" and "unsafe_forge" to %select in warn_bounds_safety_incompatible_non_terminated_by_to_terminated_by and not print them in interop.

@@ -1560,6 +1560,11 @@ class Sema final : public SemaBase {
/// be controlled with \#pragma clang abi_ptr_attr set(*ATTR*). Defaults to
/// __single in user code and __unsafe_indexable in system headers.
BoundsSafetyPointerAttributes CurPointerAbi;

/// \returns true iff `-Wunsafe-buffer-usage` is enabled for `Loc`.
bool isCXXSafeBuffersEnabledAt(SourceLocation Loc) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Can this method be const?

// match fails:
ObjTypeII = RecordDecl->getIdentifier();
if (MethodDeclII && ObjTypeII &&
MethodDeclII->getName() == "c_str" &&

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there a reason we support std::string::c_str() but not std::string::data()?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::string::data() is not null-termination guaranteed as I remember.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://en.cppreference.com/w/cpp/string/basic_string/c_str
c_str() and data() perform the same function. (since C++11)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, it's changed since C++11.

if (MethodDeclII && ObjTypeII &&
MethodDeclII->getName() == "c_str" &&
ObjTypeII->getName() == "basic_string")
return Sema::Compatible;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work if LHS is terminated by a value different than 0, e.g. const char *__terminated_by(42)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Need to check it is 0-terminator.

DiagKind = diag::warn_unsafe_assign_to_null_terminated_pointer;
isInvalid = false;
break;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, we don't check if the terminator is 0.

Copy link
Author

@ziqingluo-90 ziqingluo-90 Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed it to re-use warn_bounds_safety_incompatible_non_terminated_by_to_terminated_by. We do not need to check terminator there now.

- Change LangOpts.hasBoundsSafetyAttributes() to LangOpts.BoundsSafetyAttributes
- Correct that std::string::c_str() should be not compatible with (non-0)-terminated pointers
Reuse the existing warn_bounds_safety_incompatible_non_terminated_by_to_terminated_by
@ziqingluo-90
Copy link
Author

ziqingluo-90 commented Feb 21, 2025

I have addressed comments by:

  • Remove irrelevant changes in test/SemaCXX/warn-unsafe-buffer-usage-libc-functions.cpp
  • Reuse warn_bounds_safety_incompatible_non_terminated_by_to_terminated_by for interop's diagnostic. Hide irrelevant texts in "%select".
  • Use warn_unsafe_buffer_operation to consistently test if CXX Safe Buffer is enabled.
  • Remove the warning on __null_terminated casting with C++ named casts.
  • Add tests for OK cases of C-Style casts.
  • Consistently use isBoundsSafetyAttributeOnlyMode to guard the code that involves bounds attributes but is not shared with full Bounds Safety mode.
  • Change LangOpts.hasBoundsSafetyAttributes() to LangOpts.BoundsSafetyAttributes
  • std::string::c_str/data() should only be compatible with 0-terminated pointers

@patrykstefanski @rapidsna @hnrklssn @dtarditi please take another look

@ziqingluo-90
Copy link
Author

@swift-ci test

@@ -20694,6 +20720,9 @@ bool Sema::DiagnoseAssignmentResult(AssignConvertType ConvTy,
AssignedVarFixIts;
VarDecl *BoundsSafetyIncompatNestedPtrLocalVarToFix = nullptr;
FixItHint BoundsSafetyIncompatNestedPtrFixIt;
bool InCXXBoundsSafetyMode = LangOpts.CPlusPlus &&
Copy link

@rapidsna rapidsna Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This seems to be duplicated too in another Sema file. So it's another candidate to get a helper function that tales Loc.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌

/// BoundsSafety: Such case should be handled as BoundsSafetyPointerCast.
/// \code
/// int *__bidi_indexable p1;
/// int *__indexable p2 = (int *__indexable)p1;
/// \endcode
if (Self.getLangOpts().BoundsSafety) {
if (Self.getLangOpts().BoundsSafetyAttributes) {
unsigned DiagKind = 0;
bool isInvalid = false;
// The type error may be nested, so any pointer can result in VT errors

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to assert that Self.getLangOpts().BoundsSafety || ConvertTy == Sema::Compatible || (ConvertTy == Sema::IncompatibleNonValueTerminatedToValueTerminatedPointer && InCXXBoundsSafetyMode)? Especially since CheckValueTerminatedAssignmentConstraints used to only run under BoundsSafety, but now it runs under BoundsSafetyAttributes even for C and in C++ without safe buffers mode. We really don't want it to accidentally start returning errors in these modes. Otherwise LGTM

Copy link
Author

@ziqingluo-90 ziqingluo-90 Feb 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. But maybe the following is enough?
Self.getLangOpts().BoundsSafety || Self.isCXXSafeBuffersBoundsSafetyInteropEnabledAt(SrcExpr.get()->getBeginLoc()) ,
where isCXXSafeBuffersBoundsSafetyInteropEnabledAt is defined by

  bool isCXXSafeBuffersBoundsSafetyInteropEnabledAt(SourceLocation Loc) const {
    return LangOpts.CPlusPlus && LangOpts.isBoundsSafetyAttributeOnlyMode() &&
           isCXXSafeBuffersEnabledAt(Loc);
  }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That works for me 👍

- Refactor interop mode check to a helper function
- Guard the type checking for C-style cast so that it runs only in full bounds safety or interop only mode
@ziqingluo-90
Copy link
Author

I plan to merge this PR this weekend. @hnrklssn @rapidsna

@ziqingluo-90 ziqingluo-90 merged commit fe73f37 into swiftlang:next Feb 23, 2025
ziqingluo-90 added a commit to ziqingluo-90/apple-llvm-project that referenced this pull request Feb 23, 2025
…e interop mode (swiftlang#10060)

For a __null_terminated pointer 'p', only a __null_terminated pointer,
a string literal or a call to `std::string::c_str/data` can be assigned to
(/passed to/used to initialize, etc.) 'p'.  Otherwise, an error will be emitted.
The implementation reuses Bounds Safety's diagnostics.

(rdar://138798346)
ziqingluo-90 added a commit that referenced this pull request Feb 25, 2025
…e interop mode (#10060) (#10103)

For a __null_terminated pointer 'p', only a __null_terminated pointer,
a string literal or a call to `std::string::c_str/data` can be assigned to
(/passed to/used to initialize, etc.) 'p'.  Otherwise, an error will be emitted.
The implementation reuses Bounds Safety's diagnostics.

(rdar://138798346)
@ziqingluo-90 ziqingluo-90 deleted the dev/ziqing/PR-138798346 branch February 25, 2025 19:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants