[clang-format] Allow specifying the language for .h files #128122

Merged
owenca merged 2 commits into llvm:main from clang-format-language
Feb 22, 2025

Conversation

owenca
Contributor

@owenca owenca commented Feb 21, 2025

Closes #128119

@llvmbot llvmbot added the clang (Clang issues not falling into any other category) and clang-format labels Feb 21, 2025
@llvmbot
Member

llvmbot commented Feb 21, 2025

@llvm/pr-subscribers-clang

Author: Owen Pan (owenca)

Changes

Closes #128119


Full diff: https://github.com/llvm/llvm-project/pull/128122.diff

5 Files Affected:

  • (modified) clang/docs/ClangFormatStyleOptions.rst (+7-1)
  • (modified) clang/docs/ReleaseNotes.rst (+3)
  • (modified) clang/include/clang/Format/Format.h (+6-1)
  • (modified) clang/lib/Format/Format.cpp (+33)
  • (modified) clang/unittests/Format/FormatTest.cpp (+9)
diff --git a/clang/docs/ClangFormatStyleOptions.rst b/clang/docs/ClangFormatStyleOptions.rst
index bf6dd9e13915f..2d4ead76cfef2 100644
--- a/clang/docs/ClangFormatStyleOptions.rst
+++ b/clang/docs/ClangFormatStyleOptions.rst
@@ -4782,7 +4782,13 @@ the configuration (without a prefix: ``Auto``).
 .. _Language:
 
 **Language** (``LanguageKind``) :versionbadge:`clang-format 3.5` :ref:`¶ <Language>`
-  Language, this format style is targeted at.
+  The language that this format style targets.
+
+  .. note::
+
+   You can also specify the language (``Cpp`` or ``ObjC``) for ``.h`` files
+   by adding a ``// clang-format Language:`` line before the first
+   non-comment and non-empty line, e.g. ``// clang-format Language: ObjC``.
 
   Possible values:
 
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index e1c61992512b5..0e65f72623f28 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -269,6 +269,9 @@ clang-format
 - Adds ``BreakBeforeTemplateCloser`` option.
 - Adds ``BinPackLongBracedList`` option to override bin packing options in
   long (20 item or more) braced list initializer lists.
+- Allow specifying the language (C++ or Objective-C) for a ``.h`` file by adding
+  a special comment (e.g. ``// clang-format Language: ObjC``) near the top of
+  the file.
 
 libclang
 --------
diff --git a/clang/include/clang/Format/Format.h b/clang/include/clang/Format/Format.h
index 16956b4e0fbd4..55709a0261b12 100644
--- a/clang/include/clang/Format/Format.h
+++ b/clang/include/clang/Format/Format.h
@@ -3353,7 +3353,12 @@ struct FormatStyle {
   }
   bool isTableGen() const { return Language == LK_TableGen; }
 
-  /// Language, this format style is targeted at.
+  /// The language that this format style targets.
+  /// \note
+  ///  You can also specify the language (``Cpp`` or ``ObjC``) for ``.h`` files
+  ///  by adding a ``// clang-format Language:`` line before the first
+  ///  non-comment and non-empty line, e.g. ``// clang-format Language: ObjC``.
+  /// \endnote
   /// \version 3.5
   LanguageKind Language;
 
diff --git a/clang/lib/Format/Format.cpp b/clang/lib/Format/Format.cpp
index 0898b69528ebc..400f39ecc5483 100644
--- a/clang/lib/Format/Format.cpp
+++ b/clang/lib/Format/Format.cpp
@@ -4021,6 +4021,35 @@ static FormatStyle::LanguageKind getLanguageByFileName(StringRef FileName) {
   return FormatStyle::LK_Cpp;
 }
 
+static FormatStyle::LanguageKind getLanguageByComment(const Environment &Env) {
+  const auto ID = Env.getFileID();
+  const auto &SourceMgr = Env.getSourceManager();
+
+  LangOptions LangOpts;
+  LangOpts.CPlusPlus = 1;
+  LangOpts.LineComment = 1;
+
+  Lexer Lex(ID, SourceMgr.getBufferOrFake(ID), SourceMgr, LangOpts);
+  Lex.SetCommentRetentionState(true);
+
+  for (Token Tok; !Lex.LexFromRawLexer(Tok) && Tok.is(tok::comment);) {
+    auto Text = StringRef(SourceMgr.getCharacterData(Tok.getLocation()),
+                          Tok.getLength());
+    if (!Text.consume_front("// clang-format Language:"))
+      continue;
+
+    Text = Text.trim();
+    // if (Text == "C")
+    //   return FormatStyle::LK_C;
+    if (Text == "Cpp")
+      return FormatStyle::LK_Cpp;
+    if (Text == "ObjC")
+      return FormatStyle::LK_ObjC;
+  }
+
+  return FormatStyle::LK_None;
+}
+
 FormatStyle::LanguageKind guessLanguage(StringRef FileName, StringRef Code) {
   const auto GuessedLanguage = getLanguageByFileName(FileName);
   if (GuessedLanguage == FormatStyle::LK_Cpp) {
@@ -4030,6 +4059,10 @@ FormatStyle::LanguageKind guessLanguage(StringRef FileName, StringRef Code) {
     if (!Code.empty() && (Extension.empty() || Extension == ".h")) {
       auto NonEmptyFileName = FileName.empty() ? "guess.h" : FileName;
       Environment Env(Code, NonEmptyFileName, /*Ranges=*/{});
+      if (const auto Language = getLanguageByComment(Env);
+          Language != FormatStyle::LK_None) {
+        return Language;
+      }
       ObjCHeaderStyleGuesser Guesser(Env, getLLVMStyle());
       Guesser.process();
       if (Guesser.isObjC())
diff --git a/clang/unittests/Format/FormatTest.cpp b/clang/unittests/Format/FormatTest.cpp
index 132264486100d..05febf12c17ba 100644
--- a/clang/unittests/Format/FormatTest.cpp
+++ b/clang/unittests/Format/FormatTest.cpp
@@ -25136,6 +25136,15 @@ TEST_F(FormatTest, GuessLanguageWithChildLines) {
       guessLanguage("foo.h", "#define FOO ({ foo(); ({ NSString *s; }) })"));
 }
 
+TEST_F(FormatTest, GetLanguageByComment) {
+  EXPECT_EQ(FormatStyle::LK_Cpp,
+            guessLanguage("foo.h", "// clang-format Language: Cpp\n"
+                                   "int DoStuff(CGRect rect);"));
+  EXPECT_EQ(FormatStyle::LK_ObjC,
+            guessLanguage("foo.h", "// clang-format Language: ObjC\n"
+                                   "int i;"));
+}
+
 TEST_F(FormatTest, TypenameMacros) {
   std::vector<std::string> TypenameMacros = {"STACK_OF", "LIST", "TAILQ_ENTRY"};
 

@llvmbot
Member

llvmbot commented Feb 21, 2025

@llvm/pr-subscribers-clang-format


Contributor

@mydeveloperday mydeveloperday left a comment

I think this should help. I wonder if there is an equivalent of this (but for C/ObjC) that we can support:

# -*-Mode: Makefile;-*-

@mydeveloperday
Contributor

a bit old school..

/* -*- Mode: C++; -*- */
/* -*- Mode: C; -*- */
/* -*- Mode: objc; -*- */

@owenca
Contributor Author

owenca commented Feb 21, 2025

> a bit old school..
>
> /* -*- Mode: C++; -*- */
> /* -*- Mode: C; -*- */
> /* -*- Mode: objc; -*- */

We support the following now:

// clang-format off
// clang-format off: reason
// clang-format on
// clang-format on: reason
/* clang-format off */
/* clang-format on */

and in configuration:

Language: Cpp
Language: ObjC
# After adding LK_C:
# Language: C

So to make it simple and consistent, I chose `// clang-format Language: ObjC`, etc. IMO, we must include `// clang-format` as a prefix in order to not interfere with comment directives from other tools like clang-tidy and lint.
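
For illustration, the lookup this PR adds can be approximated without the clang Lexer. The sketch below is a simplified stand-in, not the code from Format.cpp: `LanguageKind`, `trim`, and `languageFromLeadingComments` are hypothetical names, and comments are found by plain string scanning (no line continuations or trigraphs). It tries to keep the two properties discussed here: the directive must start with the exact `// clang-format Language:` prefix, and scanning stops at the first token that is not a comment.

```cpp
#include <string_view>

// Hypothetical stand-in for FormatStyle::LanguageKind (only the values used here).
enum class LanguageKind { None, Cpp, ObjC };

// Strip leading and trailing whitespace from S.
static std::string_view trim(std::string_view S) {
  constexpr std::string_view WS = " \t\r\n\f\v";
  const auto First = S.find_first_not_of(WS);
  if (First == std::string_view::npos)
    return {};
  const auto Last = S.find_last_not_of(WS);
  return S.substr(First, Last - First + 1);
}

// Scan only the comments that precede the first non-comment token and
// return the language named by a "// clang-format Language:" directive.
LanguageKind languageFromLeadingComments(std::string_view Code) {
  constexpr std::string_view Prefix = "// clang-format Language:";
  constexpr std::string_view WS = " \t\r\n\f\v";

  for (;;) {
    // Skip whitespace and blank lines between comments.
    const auto Start = Code.find_first_not_of(WS);
    if (Start == std::string_view::npos)
      return LanguageKind::None; // Reached the end of the file.
    Code.remove_prefix(Start);

    std::string_view Comment;
    if (Code.substr(0, 2) == "//") {
      // A line comment runs to the end of the line (or of the file).
      Comment = Code.substr(0, Code.find('\n'));
    } else if (Code.substr(0, 2) == "/*") {
      const auto End = Code.find("*/", 2);
      if (End == std::string_view::npos)
        return LanguageKind::None; // Unterminated block comment.
      Comment = Code.substr(0, End + 2);
    } else {
      return LanguageKind::None; // First non-comment token: stop looking.
    }
    Code.remove_prefix(Comment.size());

    // Comments without the exact prefix are skipped, not treated as errors.
    if (Comment.substr(0, Prefix.size()) != Prefix)
      continue;

    const auto Value = trim(Comment.substr(Prefix.size()));
    if (Value == "Cpp")
      return LanguageKind::Cpp;
    if (Value == "ObjC")
      return LanguageKind::ObjC;
    // Unknown value: keep scanning the remaining leading comments.
  }
}
```

Under these assumptions, a header that begins with `// clang-format Language: ObjC` is reported as Objective-C even when its body is just `int i;`, which mirrors the GetLanguageByComment unit test in the diff. A comment such as `//clang-format Language: ObjC` (no space after `//`) is ignored because it lacks the exact prefix.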

@mydeveloperday
Contributor

I think

// clang-format Language: ObjC

is fine. I just wondered if we wanted to support the Emacs mode as well (maybe in a later commit if someone specifically asks).

@owenca
Contributor Author

owenca commented Feb 22, 2025

> I think
>
> // clang-format Language: ObjC
>
> is fine, I just wondered if we wanted to support the Emacs mode as well.. (maybe a later commit if someone specifically asks)

Got it, but nah.

@owenca owenca removed the clang (Clang issues not falling into any other category) label Feb 22, 2025
@owenca owenca merged commit ffc61dc into llvm:main Feb 22, 2025
12 checks passed
@owenca owenca deleted the clang-format-language branch February 22, 2025 04:46
@llvm llvm deleted a comment from llvm-ci Feb 22, 2025
Successfully merging this pull request may close these issues.

Allow specifying the language in C++ and Objective-C header files
4 participants