Skip to content

adds additional information to the ProcessInfo object for elf processes #88995

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 18, 2024

Conversation

feg208
Copy link
Contributor

@feg208 feg208 commented Apr 16, 2024

This adds some additional bits into a ProcessInfo structure that will be of use in filling structs in an elf core file. This is a demand for implementing process save-core

@feg208 feg208 requested a review from JDevlieghere as a code owner April 16, 2024 21:55
@llvmbot llvmbot added the lldb label Apr 16, 2024
@llvmbot
Copy link
Member

llvmbot commented Apr 16, 2024

@llvm/pr-subscribers-lldb

Author: Fred Grim (feg208)

Changes

This adds some additional bits into a ProcessInfo structure that will be of use in filling structs in an elf core file. This is a demand for implementing process save-core


Full diff: https://github.com/llvm/llvm-project/pull/88995.diff

3 Files Affected:

  • (modified) lldb/include/lldb/Utility/ProcessInfo.h (+71)
  • (modified) lldb/source/Host/linux/Host.cpp (+105-20)
  • (modified) lldb/unittests/Host/linux/HostTest.cpp (+6)
diff --git a/lldb/include/lldb/Utility/ProcessInfo.h b/lldb/include/lldb/Utility/ProcessInfo.h
index 7fb5b37be0f48f..e9fe71e1b851d1 100644
--- a/lldb/include/lldb/Utility/ProcessInfo.h
+++ b/lldb/include/lldb/Utility/ProcessInfo.h
@@ -139,6 +139,11 @@ class ProcessInfo {
 // to that process.
 class ProcessInstanceInfo : public ProcessInfo {
 public:
+  struct timespec {
+    time_t tv_sec = 0;
+    long int tv_usec = 0;
+  };
+
   ProcessInstanceInfo() = default;
 
   ProcessInstanceInfo(const char *name, const ArchSpec &arch, lldb::pid_t pid)
@@ -172,6 +177,66 @@ class ProcessInstanceInfo : public ProcessInfo {
     return m_parent_pid != LLDB_INVALID_PROCESS_ID;
   }
 
+  lldb::pid_t GetProcessGroupID() const { return m_process_group_id; }
+
+  void SetProcessGroupID(lldb::pid_t pgrp) { m_process_group_id = pgrp; }
+
+  bool ProcessGroupIDIsValid() const {
+    return m_process_group_id != LLDB_INVALID_PROCESS_ID;
+  }
+
+  lldb::pid_t GetProcessSessionID() const { return m_process_session_id; }
+
+  void SetProcessSessionID(lldb::pid_t session) {
+    m_process_session_id = session;
+  }
+
+  bool ProcessSessionIDIsValid() const {
+    return m_process_session_id != LLDB_INVALID_PROCESS_ID;
+  }
+
+  struct timespec GetUserTime() const { return m_user_time; }
+
+  void SetUserTime(struct timespec utime) { m_user_time = utime; }
+
+  bool UserTimeIsValid() const {
+    return m_user_time.tv_sec > 0 || m_user_time.tv_usec > 0;
+  }
+
+  struct timespec GetSystemTime() const { return m_system_time; }
+
+  void SetSystemTime(struct timespec stime) { m_system_time = stime; }
+
+  bool SystemTimeIsValid() const {
+    return m_system_time.tv_sec > 0 || m_system_time.tv_usec > 0;
+  }
+
+  struct timespec GetCumulativeUserTime() const {
+    return m_cumulative_user_time;
+  }
+
+  void SetCumulativeUserTime(struct timespec cutime) {
+    m_cumulative_user_time = cutime;
+  }
+
+  bool CumulativeUserTimeIsValid() const {
+    return m_cumulative_user_time.tv_sec > 0 ||
+           m_cumulative_user_time.tv_usec > 0;
+  }
+
+  struct timespec GetCumulativeSystemTime() const {
+    return m_cumulative_system_time;
+  }
+
+  void SetCumulativeSystemTime(struct timespec cstime) {
+    m_cumulative_system_time = cstime;
+  }
+
+  bool CumulativeSystemTimeIsValid() const {
+    return m_cumulative_system_time.tv_sec > 0 ||
+           m_cumulative_system_time.tv_sec > 0;
+  }
+
   void Dump(Stream &s, UserIDResolver &resolver) const;
 
   static void DumpTableHeader(Stream &s, bool show_args, bool verbose);
@@ -183,6 +248,12 @@ class ProcessInstanceInfo : public ProcessInfo {
   uint32_t m_euid = UINT32_MAX;
   uint32_t m_egid = UINT32_MAX;
   lldb::pid_t m_parent_pid = LLDB_INVALID_PROCESS_ID;
+  lldb::pid_t m_process_group_id = LLDB_INVALID_PROCESS_ID;
+  lldb::pid_t m_process_session_id = LLDB_INVALID_PROCESS_ID;
+  struct timespec m_user_time {};
+  struct timespec m_system_time {};
+  struct timespec m_cumulative_user_time {};
+  struct timespec m_cumulative_system_time {};
 };
 
 typedef std::vector<ProcessInstanceInfo> ProcessInstanceInfoList;
diff --git a/lldb/source/Host/linux/Host.cpp b/lldb/source/Host/linux/Host.cpp
index 6c57384aa38a13..c6490f2fc9e2f5 100644
--- a/lldb/source/Host/linux/Host.cpp
+++ b/lldb/source/Host/linux/Host.cpp
@@ -49,6 +49,29 @@ enum class ProcessState {
   TracedOrStopped,
   Zombie,
 };
+
+constexpr int task_comm_len = 16;
+
+struct StatFields {
+  ::pid_t pid = LLDB_INVALID_PROCESS_ID;
+  char comm[task_comm_len];
+  char state;
+  ::pid_t ppid = LLDB_INVALID_PROCESS_ID;
+  ::pid_t pgrp = LLDB_INVALID_PROCESS_ID;
+  ::pid_t session = LLDB_INVALID_PROCESS_ID;
+  int tty_nr;
+  int tpgid;
+  unsigned flags;
+  long unsigned minflt;
+  long unsigned cminflt;
+  long unsigned majflt;
+  long unsigned cmajflt;
+  long unsigned utime;
+  long unsigned stime;
+  long cutime;
+  long cstime;
+  // .... other things. We don't need them below
+};
 }
 
 namespace lldb_private {
@@ -60,11 +83,92 @@ static bool GetStatusInfo(::pid_t Pid, ProcessInstanceInfo &ProcessInfo,
                           ::pid_t &Tgid) {
   Log *log = GetLog(LLDBLog::Host);
 
-  auto BufferOrError = getProcFile(Pid, "status");
+  auto BufferOrError = getProcFile(Pid, "stat");
   if (!BufferOrError)
     return false;
 
   llvm::StringRef Rest = BufferOrError.get()->getBuffer();
+  if (Rest.empty())
+    return false;
+  StatFields stat_fields;
+  if (sscanf(Rest.data(),
+             "%d %s %c %d %d %d %d %d %u %lu %lu %lu %lu %lu %lu %ld %ld",
+             &stat_fields.pid, stat_fields.comm, &stat_fields.state,
+             &stat_fields.ppid, &stat_fields.pgrp, &stat_fields.session,
+             &stat_fields.tty_nr, &stat_fields.tpgid, &stat_fields.flags,
+             &stat_fields.minflt, &stat_fields.cminflt, &stat_fields.majflt,
+             &stat_fields.cmajflt, &stat_fields.utime, &stat_fields.stime,
+             &stat_fields.cutime, &stat_fields.cstime) < 0) {
+    return false;
+  }
+
+  auto convert = [sc_clk_ticks = sysconf(_SC_CLK_TCK)](auto time_in_ticks) {
+    ProcessInstanceInfo::timespec ts;
+    if (sc_clk_ticks <= 0) {
+      return ts;
+    }
+    ts.tv_sec = time_in_ticks / sc_clk_ticks;
+    double remainder =
+        (static_cast<double>(time_in_ticks) / sc_clk_ticks) - ts.tv_sec;
+    ts.tv_usec =
+        std::chrono::microseconds{std::lround(1e+6 * remainder)}.count();
+    return ts;
+  };
+
+  ProcessInfo.SetParentProcessID(stat_fields.ppid);
+  ProcessInfo.SetProcessGroupID(stat_fields.pgrp);
+  ProcessInfo.SetProcessSessionID(stat_fields.session);
+  ProcessInfo.SetUserTime(convert(stat_fields.utime));
+  ProcessInfo.SetSystemTime(convert(stat_fields.stime));
+  ProcessInfo.SetCumulativeUserTime(convert(stat_fields.cutime));
+  ProcessInfo.SetCumulativeSystemTime(convert(stat_fields.cstime));
+  switch (stat_fields.state) {
+  case 'R':
+    State = ProcessState::Running;
+    break;
+  case 'S':
+    State = ProcessState::Sleeping;
+    break;
+  case 'D':
+    State = ProcessState::DiskSleep;
+    break;
+  case 'Z':
+    State = ProcessState::Zombie;
+    break;
+  case 'X':
+    State = ProcessState::Dead;
+    break;
+  case 'P':
+    State = ProcessState::Parked;
+    break;
+  case 'W':
+    State = ProcessState::Paging;
+    break;
+  case 'I':
+    State = ProcessState::Idle;
+    break;
+  case 'T': // Stopped on a signal or (before Linux 2.6.33) trace stopped
+    [[fallthrough]];
+  case 't':
+    State = ProcessState::TracedOrStopped;
+    break;
+  default:
+    State = ProcessState::Unknown;
+    break;
+  }
+
+  if (State == ProcessState::Unknown) {
+    LLDB_LOG(log, "Unknown process state {0}", stat_fields.state);
+  }
+
+  BufferOrError = getProcFile(Pid, "status");
+  if (!BufferOrError)
+    return false;
+
+  Rest = BufferOrError.get()->getBuffer();
+  if (Rest.empty())
+    return false;
+
   while (!Rest.empty()) {
     llvm::StringRef Line;
     std::tie(Line, Rest) = Rest.split('\n');
@@ -89,25 +193,6 @@ static bool GetStatusInfo(::pid_t Pid, ProcessInstanceInfo &ProcessInfo,
 
       ProcessInfo.SetUserID(RUid);
       ProcessInfo.SetEffectiveUserID(EUid);
-    } else if (Line.consume_front("PPid:")) {
-      ::pid_t PPid;
-      Line.ltrim().consumeInteger(10, PPid);
-      ProcessInfo.SetParentProcessID(PPid);
-    } else if (Line.consume_front("State:")) {
-      State = llvm::StringSwitch<ProcessState>(Line.ltrim().take_front(1))
-                  .Case("D", ProcessState::DiskSleep)
-                  .Case("I", ProcessState::Idle)
-                  .Case("R", ProcessState::Running)
-                  .Case("S", ProcessState::Sleeping)
-                  .CaseLower("T", ProcessState::TracedOrStopped)
-                  .Case("W", ProcessState::Paging)
-                  .Case("P", ProcessState::Parked)
-                  .Case("X", ProcessState::Dead)
-                  .Case("Z", ProcessState::Zombie)
-                  .Default(ProcessState::Unknown);
-      if (State == ProcessState::Unknown) {
-        LLDB_LOG(log, "Unknown process state {0}", Line);
-      }
     } else if (Line.consume_front("TracerPid:")) {
       Line = Line.ltrim();
       Line.consumeInteger(10, TracerPid);
diff --git a/lldb/unittests/Host/linux/HostTest.cpp b/lldb/unittests/Host/linux/HostTest.cpp
index 78bbe470d69531..32b1a3678e1fa4 100644
--- a/lldb/unittests/Host/linux/HostTest.cpp
+++ b/lldb/unittests/Host/linux/HostTest.cpp
@@ -40,6 +40,12 @@ TEST_F(HostTest, GetProcessInfo) {
   ASSERT_TRUE(Info.ParentProcessIDIsValid());
   EXPECT_EQ(lldb::pid_t(getppid()), Info.GetParentProcessID());
 
+  ASSERT_TRUE(Info.ProcessGroupIDIsValid());
+  EXPECT_EQ(lldb::pid_t(getpgrp()), Info.GetProcessGroupID());
+
+  ASSERT_TRUE(Info.ProcessSessionIDIsValid());
+  EXPECT_EQ(lldb::pid_t(getsid(getpid())), Info.GetProcessSessionID());
+
   ASSERT_TRUE(Info.EffectiveUserIDIsValid());
   EXPECT_EQ(geteuid(), Info.GetEffectiveUserID());
 

@jimingham
Copy link
Collaborator

These seem like fairly POSIX-y bits of data, is there not a Posix way to get these (so other posix systems will also get this info?)

Also, it looks like you added user time and system time information, but you didn't test that those get valid values.

@feg208
Copy link
Contributor Author

feg208 commented Apr 17, 2024

These seem like fairly POSIX-y bits of data, is there not a Posix way to get these (so other posix systems will also get this info?)

Not that I know of for arbitrary processes in linux. fwiw the original code without these values got similar bits out of the proc filesystem

Also, it looks like you added user time and system time information, but you didn't test that those get valid values.

I did not. My concern here is that this would make for flaky tests. The issue is that these times are going to be very dependent on the cpus they are scheduled to, what else is running on that same machine and so on. I guess we could do a bunch of stuff to pin the unit test process in place and make it more controllable but that tends to imply permissions for the host executing the unit test. Or maybe you were thinking of another way? I am all ears if so.

@jimingham
Copy link
Collaborator

jimingham commented Apr 17, 2024 via email

@emaste
Copy link
Member

emaste commented Apr 17, 2024

is there not a Posix way to get these

Not really. On FreeBSD these generally come from sysctl(s).

Of the ones added in this pull request SetParentProcessID is already handled on FreeBSD, while SetProcessGroupID, SetProcessSessionID, SetUserTime, SetSystemTime, SetCumulativeUserTime, SetCumulativeSystemTime are not

@feg208
Copy link
Contributor Author

feg208 commented Apr 17, 2024

The user and system times should be monotonically increasing. So you could stop at a breakpoint, fetch the times, then run your program through a little spin loop to burn some CPU before hitting a second breakpoint. Then get the times again and assert that they are > the first set. You could also set a timer in the test between the first and second stop and assert that the difference in system and user time is less than or equal to the timer difference. A single threaded program can only run on one core at a time, so that should always be true.

Ok let me add a test along these lines. I'll run the test a bunch locally and in this pr to make sure I don't make a mess

Copy link

github-actions bot commented Apr 17, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@feg208
Copy link
Contributor Author

feg208 commented Apr 17, 2024

The user and system times should be monotonically increasing. So you could stop at a breakpoint, fetch the times, then run your program through a little spin loop to burn some CPU before hitting a second breakpoint. Then get the times again and assert that they are > the first set. You could also set a timer in the test between the first and second stop and assert that the difference in system and user time is less than or equal to the timer difference. A single threaded program can only run on one core at a time, so that should always be true.

I added a test for user time. System time seems really likely to be flaky in the unittest. It'll increase with kernel time. However if I use side-effect free system calls like getpid() to try and increase that counter it seems like that'll move around a lot from different kernels on different machines

@jimingham
Copy link
Collaborator

That test looks good. It seems unlikely that you would buggily always produce monotonically increasing times...

Not to be a pest, but since this info is only provided on Linux, we should maybe add a test to make sure that asking for this information on systems that don't provide it fails gracefully?

@jimingham
Copy link
Collaborator

The user and system times should be monotonically increasing. So you could stop at a breakpoint, fetch the times, then run your program through a little spin loop to burn some CPU before hitting a second breakpoint. Then get the times again and assert that they are > the first set. You could also set a timer in the test between the first and second stop and assert that the difference in system and user time is less than or equal to the timer difference. A single threaded program can only run on one core at a time, so that should always be true.

I added a test for user time. System time seems really likely to be flaky in the unittest. It'll increase with kernel time. However if I use side-effect free system calls like getpid() to try and increase that counter it seems like that'll move around a lot from different kernels on different machines

It should never decrease, however. If you were just getting garbage values, then there should be a roughly even chance the second value will be less than the first. So this still gives some confidence this isn't totally bogus...

@feg208 feg208 force-pushed the users/feg208/adds_process_info_for_core branch from 9871016 to 31c582a Compare April 17, 2024 17:36
@feg208
Copy link
Contributor Author

feg208 commented Apr 17, 2024

That test looks good. It seems unlikely that you would buggily always produce monotonically increasing times...

Not to be a pest, but since this info is only provided on Linux, we should maybe add a test to make sure that asking for this information on systems that don't provide it fails gracefully?

Not at all. I'll add it

@feg208
Copy link
Contributor Author

feg208 commented Apr 17, 2024

That test looks good. It seems unlikely that you would buggily always produce monotonically increasing times...

Not to be a pest, but since this info is only provided on Linux, we should maybe add a test to make sure that asking for this information on systems that don't provide it fails gracefully?

Actually looking at lldb/unittest/Host/CMakeLists.txt I see:

if (CMAKE_SYSTEM_NAME MATCHES "Linux|Android")
    list(APPEND FILES
      linux/HostTest.cpp
      linux/SupportTest.cpp
 )
endif()

which should ensure the test is never built nor executed outside of linux

@feg208
Copy link
Contributor Author

feg208 commented Apr 17, 2024

It should never decrease, however. If you were just getting garbage values, then there should be a roughly even chance the second value will be less than the first. So this still gives some confidence this isn't totally bogus...

For sure. My concern is that it wouldn't increase either. The granularity is microseconds but with a call like getpid() for example the kernel might just return a cached value which, I assume after enough loops, would increase the timer but what is enough? And enough will change I think as build environments get faster machines faster kernels and so on.

@jimingham
Copy link
Collaborator

jimingham commented Apr 17, 2024 via email

@feg208
Copy link
Contributor Author

feg208 commented Apr 17, 2024

Yes, you'd have to put this test into a host test for a system that doesn't support this, like use the Posix one or make one for the Windows or Darwin Hosts.

Done!

Copy link
Collaborator

@jimingham jimingham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@feg208 feg208 merged commit 4681079 into llvm:main Apr 18, 2024
@JDevlieghere
Copy link
Member

Can we get (a subset) of this information on Darwin? I understand it's only necessary for ELF core files, and from the discussion that there's no standard POSIX way to get this, but it still feels a bit unfortunate to implement all these methods but only implement this for one specific platform.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants