Skip to content

Commit 231f828

Browse files
[clang] Implement __attribute__((format_matches)) (llvm#116708)
This implements ``__attribute__((format_matches))``, as described in the RFC: https://discourse.llvm.org/t/rfc-format-attribute-attribute-format-like/83076 The ``format`` attribute only allows the compiler to check that a format string matches its arguments. If the format string is passed independently of its arguments, there is no way to have the compiler check it. ``format_matches(flavor, fmtidx, example)`` allows the compiler to check format strings against the ``example`` format string instead of against format arguments. See the changes to AttrDocs.td in this diff for more information. Implementation-wise, this change subclasses CheckPrintfHandler to allow it to collect specifiers into arrays, and implements comparing that two specifiers are equivalent. `checkFormatStringExpr` gets a new `ReferenceFormatString` argument that is piped down when calling a function with the `format_matches` attribute (and is `nullptr` otherwise); this is the string that the actual format string is compared against. Although this change does not enable -Wformat-nonliteral by default, IMO, all the pieces are now in place such that it could be. rdar://84936554
1 parent cdf2a52 commit 231f828

File tree

14 files changed

+1409
-180
lines changed

14 files changed

+1409
-180
lines changed

clang/docs/ReleaseNotes.rst

Lines changed: 44 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -639,6 +639,50 @@ Attribute Changes in Clang
639639
- Clang now supports ``[[clang::lifetime_capture_by(X)]]``. Similar to lifetimebound, this can be
640640
used to specify when a reference to a function parameter is captured by another capturing entity ``X``.
641641

642+
- There is a new ``format_matches`` attribute to complement the existing
643+
``format`` attribute. ``format_matches`` allows the compiler to verify that
644+
a format string argument is equivalent to a reference format string: it is
645+
useful when a function accepts a format string without its accompanying
646+
arguments to format. For instance:
647+
648+
.. code-block:: c
649+
650+
static int status_code;
651+
static const char *status_string;
652+
653+
void print_status(const char *fmt) {
654+
fprintf(stderr, fmt, status_code, status_string);
655+
// ^ warning: format string is not a string literal [-Wformat-nonliteral]
656+
}
657+
658+
void stuff(void) {
659+
print_status("%s (%#08x)\n");
660+
// order of %s and %x is swapped but there is no diagnostic
661+
}
662+
663+
Before the introducion of ``format_matches``, this code cannot be verified
664+
at compile-time. ``format_matches`` plugs that hole:
665+
666+
.. code-block:: c
667+
668+
__attribute__((format_matches(printf, 1, "%x %s")))
669+
void print_status(const char *fmt) {
670+
fprintf(stderr, fmt, status_code, status_string);
671+
// ^ `fmt` verified as if it was "%x %s" here; no longer triggers
672+
// -Wformat-nonliteral, would warn if arguments did not match "%x %s"
673+
}
674+
675+
void stuff(void) {
676+
print_status("%s (%#08x)\n");
677+
// warning: format specifier 's' is incompatible with 'x'
678+
// warning: format specifier 'x' is incompatible with 's'
679+
}
680+
681+
Like with ``format``, the first argument is the format string flavor and the
682+
second argument is the index of the format string parameter.
683+
``format_matches`` accepts an example valid format string as its third
684+
argument. For more information, see the Clang attributes documentation.
685+
642686
Improvements to Clang's diagnostics
643687
-----------------------------------
644688
- Clang now emits an error instead of a warning for ``-Wundefined-internal``

clang/include/clang/AST/FormatString.h

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -292,7 +292,7 @@ class ArgType {
292292
};
293293

294294
private:
295-
const Kind K;
295+
Kind K;
296296
QualType T;
297297
const char *Name = nullptr;
298298
bool Ptr = false;
@@ -338,6 +338,7 @@ class ArgType {
338338
}
339339

340340
MatchKind matchesType(ASTContext &C, QualType argTy) const;
341+
MatchKind matchesArgType(ASTContext &C, const ArgType &other) const;
341342

342343
QualType getRepresentativeType(ASTContext &C) const;
343344

clang/include/clang/Basic/Attr.td

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1770,6 +1770,16 @@ def Format : InheritableAttr {
17701770
let Documentation = [FormatDocs];
17711771
}
17721772

1773+
def FormatMatches : InheritableAttr {
1774+
let Spellings = [GCC<"format_matches">];
1775+
let Args = [IdentifierArgument<"Type">, IntArgument<"FormatIdx">, ExprArgument<"ExpectedFormat">];
1776+
let AdditionalMembers = [{
1777+
StringLiteral *getFormatString() const;
1778+
}];
1779+
let Subjects = SubjectList<[ObjCMethod, Block, HasFunctionProto]>;
1780+
let Documentation = [FormatMatchesDocs];
1781+
}
1782+
17731783
def FormatArg : InheritableAttr {
17741784
let Spellings = [GCC<"format_arg">];
17751785
let Args = [ParamIdxArgument<"FormatIdx">];

clang/include/clang/Basic/AttrDocs.td

Lines changed: 126 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3644,6 +3644,132 @@ behavior of the program is undefined.
36443644
}];
36453645
}
36463646

3647+
def FormatMatchesDocs : Documentation {
3648+
let Category = DocCatFunction;
3649+
let Content = [{
3650+
3651+
The ``format`` attribute is the basis for the enforcement of diagnostics in the
3652+
``-Wformat`` family, but it only handles the case where the format string is
3653+
passed along with the arguments it is going to format. It cannot handle the case
3654+
where the format string and the format arguments are passed separately from each
3655+
other. For instance:
3656+
3657+
.. code-block:: c
3658+
3659+
static const char *first_name;
3660+
static double todays_temperature;
3661+
static int wind_speed;
3662+
3663+
void say_hi(const char *fmt) {
3664+
printf(fmt, first_name, todays_temperature);
3665+
// ^ warning: format string is not a string literal
3666+
printf(fmt, first_name, wind_speed);
3667+
// ^ warning: format string is not a string literal
3668+
}
3669+
3670+
int main() {
3671+
say_hi("hello %s, it is %g degrees outside");
3672+
say_hi("hello %s, it is %d degrees outside!");
3673+
// ^ no diagnostic, but %d cannot format doubles
3674+
}
3675+
3676+
In this example, ``fmt`` is expected to format a ``const char *`` and a
3677+
``double``, but these values are not passed to ``say_hi``. Without the
3678+
``format`` attribute (which cannot apply in this case), the -Wformat-nonliteral
3679+
diagnostic unnecessarily triggers in the body of ``say_hi``, and incorrect
3680+
``say_hi`` call sites do not trigger a diagnostic.
3681+
3682+
To complement the ``format`` attribute, Clang also defines the
3683+
``format_matches`` attribute. Its syntax is similar to the ``format``
3684+
attribute's, but instead of taking the index of the first formatted value
3685+
argument, it takes a C string literal with the expected specifiers:
3686+
3687+
.. code-block:: c
3688+
3689+
static const char *first_name;
3690+
static double todays_temperature;
3691+
static int wind_speed;
3692+
3693+
__attribute__((__format_matches__(printf, 1, "%s %g")))
3694+
void say_hi(const char *fmt) {
3695+
printf(fmt, first_name, todays_temperature); // no dignostic
3696+
printf(fmt, first_name, wind_speed); // warning: format specifies type 'int' but the argument has type 'double'
3697+
}
3698+
3699+
int main() {
3700+
say_hi("hello %s, it is %g degrees outside");
3701+
say_hi("it is %g degrees outside, have a good day %s!");
3702+
// warning: format specifies 'double' where 'const char *' is required
3703+
// warning: format specifies 'const char *' where 'double' is required
3704+
}
3705+
3706+
The third argument to ``format_matches`` is expected to evaluate to a **C string
3707+
literal** even when the format string would normally be a different type for the
3708+
given flavor, like a ``CFStringRef`` or a ``NSString *``.
3709+
3710+
The only requirement on the format string literal is that it has specifiers
3711+
that are compatible with the arguments that will be used. It can contain
3712+
arbitrary non-format characters. For instance, for the purposes of compile-time
3713+
validation, ``"%s scored %g%% on her test"`` and ``"%s%g"`` are interchangeable
3714+
as the format string argument. As a means of self-documentation, users may
3715+
prefer the former when it provides a useful example of an expected format
3716+
string.
3717+
3718+
In the implementation of a function with the ``format_matches`` attribute,
3719+
format verification works as if the format string was identical to the one
3720+
specified in the attribute.
3721+
3722+
.. code-block:: c
3723+
3724+
__attribute__((__format_matches__(printf, 1, "%s %g")))
3725+
void say_hi(const char *fmt) {
3726+
printf(fmt, "person", 546);
3727+
// ^ warning: format specifies type 'double' but the
3728+
// argument has type 'int'
3729+
// note: format string is defined here:
3730+
// __attribute__((__format_matches__(printf, 1, "%s %g")))
3731+
// ^~
3732+
}
3733+
3734+
3735+
At the call sites of functions with the ``format_matches`` attribute, format
3736+
verification instead compares the two format strings to evaluate their
3737+
equivalence. Each format flavor defines equivalence between format specifiers.
3738+
Generally speaking, two specifiers are equivalent if they format the same type.
3739+
For instance, in the ``printf`` flavor, ``%2i`` and ``%-0.5d`` are compatible.
3740+
When ``-Wformat-signedness`` is disabled, ``%d`` and ``%u`` are compatible. For
3741+
a negative example, ``%ld`` is incompatible with ``%d``.
3742+
3743+
Do note the following un-obvious cases:
3744+
3745+
* Passing ``NULL`` as the format string does not trigger format diagnostics.
3746+
* When the format string is not NULL, it cannot _miss_ specifiers, even in
3747+
trailing positions. For instance, ``%d`` is not accepted when the required
3748+
format is ``%d %d %d``.
3749+
* While checks for the ``format`` attribute tolerate sone size mismatches
3750+
that standard argument promotion renders immaterial (such as formatting an
3751+
``int`` with ``%hhd``, which specifies a ``char``-sized integer), checks for
3752+
``format_matches`` require specified argument sizes to match exactly.
3753+
* Format strings expecting a variable modifier (such as ``%*s``) are
3754+
incompatible with format strings that would itemize the variable modifiers
3755+
(such as ``%i %s``), even if the two specify ABI-compatible argument lists.
3756+
* All pointer specifiers, modifiers aside, are mutually incompatible. For
3757+
instance, ``%s`` is not compatible with ``%p``, and ``%p`` is not compatible
3758+
with ``%n``, and ``%hhn`` is incompatible with ``%s``, even if the pointers
3759+
are ABI-compatible or identical on the selected platform. However, ``%0.5s``
3760+
is compatible with ``%s``, since the difference only exists in modifier flags.
3761+
This is not overridable with ``-Wformat-pedantic`` or its inverse, which
3762+
control similar behavior in ``-Wformat``.
3763+
3764+
At this time, clang implements ``format_matches`` only for format types in the
3765+
``printf`` family. This includes variants such as Apple's NSString format and
3766+
the FreeBSD ``kprintf``, but excludes ``scanf``. Using a known but unsupported
3767+
format silently fails in order to be compatible with other implementations that
3768+
would support these formats.
3769+
3770+
}];
3771+
}
3772+
36473773
def FlagEnumDocs : Documentation {
36483774
let Category = DocCatDecl;
36493775
let Content = [{

clang/include/clang/Basic/DiagnosticSemaKinds.td

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7688,6 +7688,7 @@ def warn_format_nonliteral_noargs : Warning<
76887688
def warn_format_nonliteral : Warning<
76897689
"format string is not a string literal">,
76907690
InGroup<FormatNonLiteral>, DefaultIgnore;
7691+
def err_format_nonliteral : Error<"format string is not a string literal">;
76917692

76927693
def err_unexpected_interface : Error<
76937694
"unexpected interface name %0: expected expression">;
@@ -10008,6 +10009,8 @@ def note_previous_declaration_as : Note<
1000810009

1000910010
def warn_printf_insufficient_data_args : Warning<
1001010011
"more '%%' conversions than data arguments">, InGroup<FormatInsufficientArgs>;
10012+
def warn_format_cmp_specifier_arity : Warning<
10013+
"%select{fewer|more}0 specifiers in format string than expected">, InGroup<FormatInsufficientArgs>;
1001110014
def warn_printf_data_arg_not_used : Warning<
1001210015
"data argument not used by format string">, InGroup<FormatExtraArgs>;
1001310016
def warn_format_invalid_conversion : Warning<
@@ -10125,6 +10128,27 @@ def note_format_fix_specifier : Note<"did you mean to use '%0'?">;
1012510128
def note_printf_c_str: Note<"did you mean to call the %0 method?">;
1012610129
def note_format_security_fixit: Note<
1012710130
"treat the string as an argument to avoid this">;
10131+
def warn_format_string_type_incompatible : Warning<
10132+
"passing '%0' format string where '%1' format string is expected">,
10133+
InGroup<Format>;
10134+
def warn_format_cmp_role_mismatch : Warning<
10135+
"format argument is %select{a value|an indirect field width|an indirect "
10136+
"precision|an auxiliary value}0, but it should be %select{a value|an indirect "
10137+
"field width|an indirect precision|an auxiliary value}1">, InGroup<Format>;
10138+
def warn_format_cmp_modifierfor_mismatch : Warning<
10139+
"format argument modifies specifier at position %0, but it should modify "
10140+
"specifier at position %1">, InGroup<Format>;
10141+
def warn_format_cmp_sensitivity_mismatch : Warning<
10142+
"argument sensitivity is %select{unspecified|private|public|sensitive}0, but "
10143+
"it should be %select{unspecified|private|public|sensitive}1">, InGroup<Format>;
10144+
def warn_format_cmp_specifier_mismatch : Warning<
10145+
"format specifier '%0' is incompatible with '%1'">, InGroup<Format>;
10146+
def warn_format_cmp_specifier_sign_mismatch : Warning<
10147+
"signedness of format specifier '%0' is incompatible with '%1'">, InGroup<Format>;
10148+
def warn_format_cmp_specifier_mismatch_pedantic : Extension<
10149+
warn_format_cmp_specifier_sign_mismatch.Summary>, InGroup<FormatPedantic>;
10150+
def note_format_cmp_with : Note<
10151+
"comparing with this %select{specifier|format string}0">;
1012810152

1012910153
def warn_null_arg : Warning<
1013010154
"null passed to a callee that requires a non-null argument">,

clang/include/clang/Sema/Sema.h

Lines changed: 47 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -2353,6 +2353,7 @@ class Sema final : public SemaBase {
23532353
FAPK_Fixed, // values to format are fixed (no C-style variadic arguments)
23542354
FAPK_Variadic, // values to format are passed as variadic arguments
23552355
FAPK_VAList, // values to format are passed in a va_list
2356+
FAPK_Elsewhere, // values to format are not passed to this function
23562357
};
23572358

23582359
// Used to grab the relevant information from a FormatAttr and a
@@ -2363,12 +2364,15 @@ class Sema final : public SemaBase {
23632364
FormatArgumentPassingKind ArgPassingKind;
23642365
};
23652366

2366-
/// Given a FunctionDecl's FormatAttr, attempts to populate the
2367-
/// FomatStringInfo parameter with the FormatAttr's correct format_idx and
2368-
/// firstDataArg. Returns true when the format fits the function and the
2369-
/// FormatStringInfo has been populated.
2370-
static bool getFormatStringInfo(const FormatAttr *Format, bool IsCXXMember,
2371-
bool IsVariadic, FormatStringInfo *FSI);
2367+
/// Given a function and its FormatAttr or FormatMatchesAttr info, attempts to
2368+
/// populate the FomatStringInfo parameter with the attribute's correct
2369+
/// format_idx and firstDataArg. Returns true when the format fits the
2370+
/// function and the FormatStringInfo has been populated.
2371+
static bool getFormatStringInfo(const Decl *Function, unsigned FormatIdx,
2372+
unsigned FirstArg, FormatStringInfo *FSI);
2373+
static bool getFormatStringInfo(unsigned FormatIdx, unsigned FirstArg,
2374+
bool IsCXXMember, bool IsVariadic,
2375+
FormatStringInfo *FSI);
23722376

23732377
// Used by C++ template instantiation.
23742378
ExprResult BuiltinShuffleVector(CallExpr *TheCall);
@@ -2390,7 +2394,10 @@ class Sema final : public SemaBase {
23902394
FST_OSLog,
23912395
FST_Unknown
23922396
};
2397+
static StringRef GetFormatStringTypeName(FormatStringType FST);
2398+
static FormatStringType GetFormatStringType(StringRef FormatFlavor);
23932399
static FormatStringType GetFormatStringType(const FormatAttr *Format);
2400+
static FormatStringType GetFormatStringType(const FormatMatchesAttr *Format);
23942401

23952402
bool FormatStringHasSArg(const StringLiteral *FExpr);
23962403

@@ -2532,6 +2539,25 @@ class Sema final : public SemaBase {
25322539
bool IsMemberFunction, SourceLocation Loc, SourceRange Range,
25332540
VariadicCallType CallType);
25342541

2542+
/// Verify that two format strings (as understood by attribute(format) and
2543+
/// attribute(format_matches) are compatible. If they are incompatible,
2544+
/// diagnostics are emitted with the assumption that \c
2545+
/// AuthoritativeFormatString is correct and
2546+
/// \c TestedFormatString is wrong. If \c FunctionCallArg is provided,
2547+
/// diagnostics will point to it and a note will refer to \c
2548+
/// TestedFormatString or \c AuthoritativeFormatString as appropriate.
2549+
bool
2550+
CheckFormatStringsCompatible(FormatStringType FST,
2551+
const StringLiteral *AuthoritativeFormatString,
2552+
const StringLiteral *TestedFormatString,
2553+
const Expr *FunctionCallArg = nullptr);
2554+
2555+
/// Verify that one format string (as understood by attribute(format)) is
2556+
/// self-consistent; for instance, that it doesn't have multiple positional
2557+
/// arguments referring to the same argument in incompatible ways. Diagnose
2558+
/// if it isn't.
2559+
bool ValidateFormatString(FormatStringType FST, const StringLiteral *Str);
2560+
25352561
/// \brief Enforce the bounds of a TCB
25362562
/// CheckTCBEnforcement - Enforces that every function in a named TCB only
25372563
/// directly calls other functions in the same TCB as marked by the
@@ -2733,11 +2759,17 @@ class Sema final : public SemaBase {
27332759
VariadicCallType CallType, SourceLocation Loc,
27342760
SourceRange Range,
27352761
llvm::SmallBitVector &CheckedVarArgs);
2762+
bool CheckFormatString(const FormatMatchesAttr *Format,
2763+
ArrayRef<const Expr *> Args, bool IsCXXMember,
2764+
VariadicCallType CallType, SourceLocation Loc,
2765+
SourceRange Range,
2766+
llvm::SmallBitVector &CheckedVarArgs);
27362767
bool CheckFormatArguments(ArrayRef<const Expr *> Args,
2737-
FormatArgumentPassingKind FAPK, unsigned format_idx,
2738-
unsigned firstDataArg, FormatStringType Type,
2739-
VariadicCallType CallType, SourceLocation Loc,
2740-
SourceRange range,
2768+
FormatArgumentPassingKind FAPK,
2769+
const StringLiteral *ReferenceFormatString,
2770+
unsigned format_idx, unsigned firstDataArg,
2771+
FormatStringType Type, VariadicCallType CallType,
2772+
SourceLocation Loc, SourceRange range,
27412773
llvm::SmallBitVector &CheckedVarArgs);
27422774

27432775
void CheckInfNaNFunction(const CallExpr *Call, const FunctionDecl *FDecl);
@@ -4727,6 +4759,11 @@ class Sema final : public SemaBase {
47274759
FormatAttr *mergeFormatAttr(Decl *D, const AttributeCommonInfo &CI,
47284760
IdentifierInfo *Format, int FormatIdx,
47294761
int FirstArg);
4762+
FormatMatchesAttr *mergeFormatMatchesAttr(Decl *D,
4763+
const AttributeCommonInfo &CI,
4764+
IdentifierInfo *Format,
4765+
int FormatIdx,
4766+
StringLiteral *FormatStr);
47304767

47314768
/// AddAlignedAttr - Adds an aligned attribute to a particular declaration.
47324769
void AddAlignedAttr(Decl *D, const AttributeCommonInfo &CI, Expr *E,

clang/lib/AST/AttrImpl.cpp

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -296,4 +296,8 @@ bool clang::isValidPointerAttrType(QualType T, bool RefOkay) {
296296
return T->isAnyPointerType() || T->isBlockPointerType();
297297
}
298298

299+
StringLiteral *FormatMatchesAttr::getFormatString() const {
300+
return cast<StringLiteral>(getExpectedFormat());
301+
}
302+
299303
#include "clang/AST/AttrImpl.inc"

0 commit comments

Comments
 (0)