Add script to fetch PR review comments (#1722)

jonsimantov · google-labs-jules[bot] · web-flow · commit b96909fcbbf2 · 2025-06-09T20:08:13.000Z
* feat: Add script to fetch PR review comments

This commit introduces a new script `scripts/gha/get_pr_review_comments.py`
that allows you to fetch review comments from a specified GitHub Pull Request.
The comments are formatted to include the commenter, file path, line number,
diff hunk, and the comment body, making it easy to paste into me for review.

The script utilizes a new function `get_pull_request_review_comments`
added to the existing `scripts/gha/firebase_github.py` library. This new
function handles fetching line-specific comments from the GitHub API,
including pagination.

The script takes a PR number as a required argument and can optionally
take repository owner, repository name, and GitHub token as arguments,
with the token also being configurable via the GITHUB_TOKEN environment
variable.
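
As a rough illustration of the pagination pattern described above, here is a
minimal sketch that keeps requesting pages until a short or empty page comes
back. `fetch_all_pages`, `fetch_page`, and `fake_fetch` are hypothetical names
used only for this example; the real function issues HTTP requests instead of
calling a stand-in.

```python
from typing import Callable, List


def fetch_all_pages(fetch_page: Callable[[int, int], List[dict]],
                    per_page: int = 100) -> List[dict]:
    """Collects items from a 1-indexed paginated source.

    `fetch_page(page, per_page)` is a hypothetical callable standing in for
    the HTTP request; it returns the list of items on that page.
    """
    results = []
    page = 1
    while True:
        items = fetch_page(page, per_page)
        results.extend(items)
        # A short (or empty) page means there is nothing left to fetch.
        if len(items) < per_page:
            break
        page += 1
    return results


# Stand-in data source: 250 fake comments served 100 at a time.
fake_comments = [{"id": n} for n in range(250)]


def fake_fetch(page: int, per_page: int) -> List[dict]:
    start = (page - 1) * per_page
    return fake_comments[start:start + per_page]


all_comments = fetch_all_pages(fake_fetch)
print(len(all_comments))
```

Stopping on a short page saves one request in the common case, at the cost of
one extra (empty) request when the total is an exact multiple of `per_page`.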

* feat: Enhance PR comment script with context and filters

This commit significantly enhances the `scripts/gha/get_pr_review_comments.py`
script and its underlying library function in `scripts/gha/firebase_github.py`.

Key improvements include:
- Copyright year updated to 2025.
- Output now includes comment ID, in_reply_to_id (if applicable), and
  the comment creation timestamp.
- Comments are marked as [OUTDATED] if their diff position is no longer
  current (i.e., API 'position' field is null).
- Added a `--context-lines <N>` argument (default 10) to control the
  amount of diff hunk context displayed. N=0 shows the full hunk.
- Introduced a `--since <ISO_8601_timestamp>` argument to filter
  comments, showing only those created at or after the specified time.
  The `get_pull_request_review_comments` function in the library
  was updated to support this `since` parameter in the API call.

These changes provide more comprehensive comment information and allow
for better control over the data fetched, making it more useful for
reviewing and addressing PR feedback, especially in complex PRs with
multiple review rounds.

* fix: Correct IndentationError in get_pr_review_comments.py

This commit fixes an IndentationError in the
`scripts/gha/get_pr_review_comments.py` script. The error was
caused by a malformed comment on the final print statement within
the main loop.

The stray comment has been removed and the print statement's trailing
newline character has been restored. This resolves the syntax error
and allows the script to be parsed and executed correctly.

* fix: Correct --context-lines behavior for non-line-specific comments

This commit fixes an issue in `scripts/gha/get_pr_review_comments.py`
where the `--context-lines` argument did not correctly suppress the
full diff hunk for comments not associated with a specific line
(i.e., where the API 'position' field is null).

The `print_contextual_diff_hunk` function has been updated to:
- Print an explanatory message instead of the full diff hunk when
  `--context-lines > 0` and the comment's position is null or invalid.
- Retain the behavior of printing the full diff hunk if
  `--context-lines 0` is specified.
- A redundant line in the context calculation logic was also removed.

This ensures that setting a context limit via `--context-lines`
will not unexpectedly display full diff hunks for file-level or
other non-line-specific comments.

* feat: Simplify diff hunk display and add comment filters

This commit refactors the `scripts/gha/get_pr_review_comments.py` script
to simplify its output and add new filtering capabilities, based on your
feedback.

Key changes:
- Diff Hunk Display: The complex contextual diff hunk display has been
  removed. The script now either shows the full diff hunk (if
  `--context-lines 0`) or the last N lines of the diff hunk (if
  `--context-lines N > 0`). The `print_contextual_diff_hunk` function
  was removed, and this logic is now inline.
- Skip Outdated Comments: A new `--skip-outdated` flag allows you
  to exclude comments marked as [OUTDATED] from the output.
- Line Number Display: For [OUTDATED] comments, the script now
  prefers `original_line` for the "Line in File Diff" field, falling
  back to `line`, then "N/A".
- Metadata: Continues to display comment ID, reply ID, timestamp,
  status, user, file, URL, and body.

These changes aim to make the script easier to maintain and its output
more predictable, while still providing essential information and
filtering options for reviewing PR comments.

* refactor: Update script description and format diff hunks

This commit applies two minor updates to the
`scripts/gha/get_pr_review_comments.py` script:

1.  The script's description in the command-line help (argparse)
    has been made more generic, changing from "format for use with
    me" to "format into a simple text output".
2.  The diff hunk context displayed for each comment is now enclosed
    in triple backticks (```) to ensure it's rendered correctly
    as a preformatted code block in Markdown environments.

These changes improve the script's general usability and the
presentation of its output.

* fix: Adjust 'next command' timestamp increment to 2 seconds

This commit modifies the "suggest next command" feature in
`scripts/gha/get_pr_review_comments.py`. The time added to the
last processed comment's timestamp (for the `--since` parameter
in the suggested command) has been changed from 1 second to 2 seconds.

This adjustment provides a slightly larger buffer to more reliably
exclude already seen comments when fetching subsequent comments,
addressing potential timestamp granularity or query resolution
behavior observed with the GitHub API. The `since` parameter for
the relevant API endpoint filters by `created_at`, and this change
is a heuristic improvement for that existing logic.
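
A minimal sketch of how such a "next `--since`" value could be derived from the
last processed timestamp, assuming GitHub-style `YYYY-MM-DDTHH:MM:SSZ` strings.
The helper name `next_since` and its `buffer_seconds` parameter are
hypothetical and exist only for this illustration.

```python
from datetime import datetime, timedelta, timezone


def next_since(last_timestamp: str, buffer_seconds: int = 2) -> str:
    """Returns an ISO 8601 UTC timestamp `buffer_seconds` after the input."""
    # Replace the trailing "Z" so fromisoformat() also accepts the value on
    # Python versions older than 3.11.
    parsed = datetime.fromisoformat(last_timestamp.replace("Z", "+00:00"))
    bumped = parsed.astimezone(timezone.utc) + timedelta(seconds=buffer_seconds)
    return bumped.strftime("%Y-%m-%dT%H:%M:%SZ")


print(next_since("2025-06-09T20:08:13Z"))  # 2025-06-09T20:08:15Z
```

A larger buffer lowers the chance of re-fetching a comment already seen, but
raises the chance of missing one posted within the buffer window.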

* docs: Minor textual cleanups in PR comments script

This commit applies minor textual updates to the
`scripts/gha/get_pr_review_comments.py` script:

- Removed an explanatory comment from the `import firebase_github` line
  for a cleaner import block.
- Refined the script's description in the command-line help text for
  slightly improved conciseness (removed an article "a").

* feat: Format output as Markdown for improved readability

This commit updates the `scripts/gha/get_pr_review_comments.py` script
to format its entire output using Markdown. This significantly
improves the readability and structure of the comment data when
pasted into Markdown-aware systems.

Changes include:
- Comment attribution (user, ID, reply ID) is now an H3 heading
  with bolding and code formatting.
- Metadata (Timestamp, Status, File, Line, URL) is presented as a
  Markdown bulleted list with bold labels and appropriate formatting
  for values (code ticks, links).
- "Diff Hunk Context" and "Comment Body" are now H4 headings.
- The diff hunk itself remains wrapped in triple backticks for
  code block rendering.
- A Markdown horizontal rule (---) is used to separate individual
  comments.

These changes make the script's output more organized and easier
to parse visually.

* style: Adjust Markdown headings for structure and conciseness

This commit refines the Markdown heading structure in the output of
`scripts/gha/get_pr_review_comments.py` for improved readability
and document hierarchy.

Changes include:
- The main output title "Review Comments" is now an H1 heading.
- Each comment's attribution line (user, ID) is now an H2 heading.
- Section headings within each comment, "Context" (formerly "Diff
  Hunk Context") and "Comment" (formerly "Comment Body"), are now
  H3 headings.

These changes make the script's output more organized and easier to
navigate when rendered as Markdown.
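
For orientation, the heading layout described above could be produced along
these lines. This is only a sketch: `format_comment` is a hypothetical helper
(the script prints each piece inline), and the sample values are made up.

```python
FENCE = "`" * 3  # Markdown code fence, built here to avoid nesting fences


def format_comment(user, comment_id, status, path, line, url, hunk, body):
    """Renders one review comment as a Markdown section."""
    return "\n".join([
        f"## Comment by: **{user}** (ID: `{comment_id}`)",
        "",
        f"*   **Status**: `{status}`",
        f"*   **File**: `{path}`",
        f"*   **Line**: `{line}`",
        f"*   **URL**: <{url}>",
        "",
        "### Context:",
        FENCE,
        hunk,
        FENCE,
        "",
        "### Comment:",
        body,
        "",
        "---",
    ])


rendered = format_comment(
    user="octocat", comment_id=1, status="[CURRENT]", path="src/app.py",
    line=12, url="https://example.com/comment/1",
    hunk="@@ -1,2 +1,2 @@\n-old\n+new", body="Looks good.")
print("# Review Comments\n")
print(rendered)
```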

* style: Adjust default context lines and Markdown spacing

This commit applies final readability adjustments to the output of
`scripts/gha/get_pr_review_comments.py`:

- The default value for the `--context-lines` argument has been
  changed from 10 to 0. This means the full diff hunk will be
  displayed by default, aligning with your feedback preferring
  more context initially unless otherwise specified. The help text
  for this argument has also been updated.
- Markdown Spacing:
    - An additional newline is added after the main H1 title
      ("# Review Comments") for better separation.
    - A newline is added before the "### Context:" H3 subheading
      to separate it from the metadata list.
    - A newline is added before the "### Comment:" H3 subheading
      to separate it from the diff hunk block.

These changes further refine the script's output for clarity and
readability.

* feat: Refactor comment filtering with new status terms and flags

This commit introduces a more granular system for classifying and
filtering pull request review comments in the
`scripts/gha/get_pr_review_comments.py` script.

New Comment Statuses:
- `[IRRELEVANT]`: Comment's original diff position is lost (`position`
  is null). Displays `original_line`.
- `[OLD]`: Comment is anchored to the diff, but its line number has
  changed (`line` != `original_line`). Displays current `line`.
- `[CURRENT]`: Comment is anchored and its line number is unchanged.
  Displays current `line`.

New Command-Line Flags:
- `--exclude-old` (default False): If set, hides `[OLD]` comments.
- `--include-irrelevant` (default False): If set, shows `[IRRELEVANT]`
  comments (which are hidden by default).
- The old `--skip-outdated` flag has been removed.

Default Behavior:
- Shows `[CURRENT]` and `[OLD]` comments.
- Hides `[IRRELEVANT]` comments.

This provides you with more precise control over which comments
are displayed, improving the script's utility for various review
workflows. The "suggest next command" feature correctly interacts
with these new filters, only considering non-skipped comments for
its timestamp calculation.
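
The classification and default filtering rules above can be sketched as two
small functions. The names `classify` and `should_show` are hypothetical; the
script performs the same checks inline in its main loop.

```python
from typing import Optional


def classify(position: Optional[int], line: Optional[int],
             original_line: Optional[int]) -> str:
    """Maps the API fields of one comment to a status label."""
    if position is None:
        return "[IRRELEVANT]"  # no longer anchored to the diff
    if original_line is not None and line != original_line:
        return "[OLD]"  # still anchored, but the line number moved
    return "[CURRENT]"


def should_show(status: str, exclude_old: bool = False,
                include_irrelevant: bool = False) -> bool:
    """Applies the --exclude-old / --include-irrelevant semantics."""
    if status == "[IRRELEVANT]":
        return include_irrelevant
    if status == "[OLD]":
        return not exclude_old
    return True


for fields in [(None, 10, 10), (5, 12, 10), (5, 10, 10)]:
    status = classify(*fields)
    print(status, should_show(status))
```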

* feat: Improve context display and suggested command robustness

This commit enhances `scripts/gha/get_pr_review_comments.py` in two ways:

1.  Suggested Command: The "suggest next command" feature now
    prepends `sys.executable` to the command. This ensures that the
    suggested command uses the same Python interpreter that the script
    was originally run with, making it more robust across different
    environments or if a specific interpreter was used.

2.  Diff Hunk Context Display:
    - The default for `--context-lines` is now 10 (reverted from 0).
    - When `--context-lines > 0`, the script will first print the
      diff hunk header line (if it starts with "@@ ").
    - It will then print the last N (`args.context_lines`) lines from
      the *remainder* of the hunk. This ensures the header is shown
      for context, and then the trailing lines of that hunk are
      displayed, avoiding double-printing of the header if it would
      have naturally fallen into the "last N lines" of the full hunk.
    - If `--context-lines == 0`, the full hunk is displayed.
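
The diff hunk context rule described in item 2 can be sketched as follows.
`trim_hunk` is a hypothetical helper that returns a string instead of printing,
so its behavior for a header-only hunk differs slightly from the script (no
trailing blank line is produced).

```python
def trim_hunk(diff_hunk: str, context_lines: int) -> str:
    """Keeps the "@@ " header (if any) plus the last N lines after it."""
    if context_lines == 0:
        return diff_hunk  # 0 means "show the full hunk"
    lines = diff_hunk.split("\n")
    kept = []
    if lines and lines[0].startswith("@@ "):
        kept.append(lines[0])
        lines = lines[1:]  # the header can no longer be counted twice
    kept.extend(lines[-context_lines:])
    return "\n".join(kept)


hunk = "@@ -1,4 +1,5 @@\n a\n b\n+c\n d"
print(trim_hunk(hunk, 2))
```

Slicing with `lines[-context_lines:]` returns the whole list when N exceeds
the number of remaining lines, so no explicit bounds check is needed.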

* style: Refactor hunk printing to use join for conciseness

This commit makes a minor stylistic refactoring in the
`scripts/gha/get_pr_review_comments.py` script.

When displaying the trailing lines of a diff hunk (for
`--context-lines > 0`), the script now uses `print("\n".join(lines))`
instead of a `for` loop with `print()` for each line.

This change achieves the same visual output but is more concise
and Pythonic for joining and printing a list of strings as
multiple lines.

* fix: Align 'since' filter and next command with observed API behavior (updated_at)

This commit modifies `scripts/gha/get_pr_review_comments.py` to
correctly use `updated_at` timestamps for its `--since` filtering
and the "suggest next command" feature. This aligns with the
observed behavior of the GitHub API endpoint for listing pull
request review comments, where the `since` parameter filters by
update time rather than creation time (contrary to some initial
documentation interpretations for this specific endpoint).

Changes include:
- The "suggest next command" feature now tracks the maximum
  `updated_at` timestamp from processed comments to calculate the
  `--since` value for the next suggested command.
- The help text for the `--since` argument has been updated to
  clarify it filters by "updated at or after".
- The informational message printed to stderr when the `--since`
  filter is active now also states "updated since".
- The `created_at` timestamp continues to be displayed for each
  comment for informational purposes.
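
A minimal sketch of tracking the newest `updated_at` value across a batch of
comments, assuming each comment is a dict shaped like the API response. The
function name `latest_update` and the sample data are hypothetical.

```python
from datetime import datetime
from typing import Iterable, Optional


def latest_update(comments: Iterable[dict]) -> Optional[datetime]:
    """Returns the most recent parseable `updated_at`, or None."""
    latest = None
    for comment in comments:
        raw = comment.get("updated_at")
        if not raw:
            continue
        try:
            # "Z" is rewritten so parsing also works before Python 3.11.
            parsed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        except ValueError:
            continue  # skip malformed timestamps instead of failing
        if latest is None or parsed > latest:
            latest = parsed
    return latest


sample = [
    {"created_at": "2025-06-01T08:00:00Z", "updated_at": "2025-06-09T12:00:00Z"},
    {"created_at": "2025-06-05T08:00:00Z", "updated_at": "2025-06-05T08:00:00Z"},
    {"created_at": "2025-06-06T08:00:00Z", "updated_at": "not-a-date"},
]
print(latest_update(sample).isoformat())  # 2025-06-09T12:00:00+00:00
```

In the sample, the oldest comment by creation time is the newest by update
time, which is the case where filtering on `created_at` and `updated_at`
diverge.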

* style: Condense printing of trailing hunk lines

This commit makes a minor stylistic refactoring in the
`scripts/gha/get_pr_review_comments.py` script.

When displaying the trailing lines of a diff hunk (for
`--context-lines > 0`, after the header line is potentially
printed and removed from the `hunk_lines` list), the script
now uses `print("\n".join(hunk_lines[-args.context_lines:]))`
instead of explicitly creating a sub-list and then looping
through it with `print()` for each line.

This change achieves the same visual output (printing a newline
if `hunk_lines` becomes empty after header removal) but is more
concise.

* chore: Remove specific stale developer comments

This commit ensures that specific stale developer comments,
previously identified as artifacts of the iterative development
process, are not present in the current version of
`scripts/gha/get_pr_review_comments.py`.

The targeted comments were:
- `# Removed skip_outdated message block`
- `# is_effectively_outdated is no longer needed with the new distinct flags`

A verification step confirmed these are no longer in the script,
contributing to a cleaner codebase focused on comments relevant
only to the current state of the code.

* fix: Ensure removal of specific stale developer comments

This commit ensures that specific stale developer comments,
which were artifacts of the iterative development process,
are definitively removed from the current version of
`scripts/gha/get_pr_review_comments.py`.

The targeted comments were:
- `# Removed skip_outdated message block`
- `# is_effectively_outdated is no longer needed with the new distinct flags`

These lines were confirmed to be absent after a targeted removal
operation, contributing to a cleaner codebase.

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
diff --git a/scripts/gha/firebase_github.py b/scripts/gha/firebase_github.py
@@ -225,6 +225,49 @@ def get_reviews(token, pull_number):
   return results
 
 
+def get_pull_request_review_comments(token, pull_number, since=None):
+  """https://docs.github.com/en/rest/pulls/comments#list-review-comments-on-a-pull-request"""
+  url = f'{GITHUB_API_URL}/pulls/{pull_number}/comments'
+  headers = {'Accept': 'application/vnd.github.v3+json', 'Authorization': f'token {token}'}
+
+  page = 1
+  per_page = 100
+  results = []
+
+  # Base parameters for the API request
+  base_params = {'per_page': per_page}
+  if since:
+    base_params['since'] = since
+
+  while True: # Loop indefinitely until explicitly broken
+    current_page_params = base_params.copy()
+    current_page_params['page'] = page
+
+    try:
+      with requests_retry_session().get(url, headers=headers, params=current_page_params,
+                        stream=True, timeout=TIMEOUT) as response:
+        response.raise_for_status()
+        # Log which page and if 'since' was used for clarity
+        logging.info("get_pull_request_review_comments: %s params %s response: %s", url, current_page_params, response)
+
+        current_page_results = response.json()
+        if not current_page_results: # No more results on this page
+            break # Exit loop, no more comments to fetch
+
+        results.extend(current_page_results)
+
+        # If fewer results than per_page were returned, it's the last page
+        if len(current_page_results) < per_page:
+            break # Exit loop, this was the last page
+
+        page += 1 # Increment page for the next iteration
+
+    except requests.exceptions.RequestException as e:
+      logging.error(f"Error fetching review comments (page {page}, params: {current_page_params}) for PR {pull_number}: {e}")
+      break # Stop trying if there's an error
+  return results
+
+
 def create_workflow_dispatch(token, workflow_id, ref, inputs):
   """https://docs.github.com/en/rest/reference/actions#create-a-workflow-dispatch-event"""
   url = f'{GITHUB_API_URL}/actions/workflows/{workflow_id}/dispatches'
diff --git a/scripts/gha/get_pr_review_comments.py b/scripts/gha/get_pr_review_comments.py
@@ -0,0 +1,231 @@
+#!/usr/bin/env python3
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Fetches and formats review comments from a GitHub Pull Request."""
+
+import argparse
+import os
+import sys
+import firebase_github
+import datetime
+from datetime import timezone, timedelta
+
+
+def main():
+    STATUS_IRRELEVANT = "[IRRELEVANT]"
+    STATUS_OLD = "[OLD]"
+    STATUS_CURRENT = "[CURRENT]"
+
+    default_owner = firebase_github.OWNER
+    default_repo = firebase_github.REPO
+
+    parser = argparse.ArgumentParser(
+        description="Fetch review comments from a GitHub PR and format into simple text output.",
+        formatter_class=argparse.RawTextHelpFormatter
+    )
+    parser.add_argument(
+        "--pull_number",
+        type=int,
+        required=True,
+        help="Pull request number."
+    )
+    parser.add_argument(
+        "--owner",
+        type=str,
+        default=default_owner,
+        help=f"Repository owner. Defaults to '{default_owner}'."
+    )
+    parser.add_argument(
+        "--repo",
+        type=str,
+        default=default_repo,
+        help=f"Repository name. Defaults to '{default_repo}'."
+    )
+    parser.add_argument(
+        "--token",
+        type=str,
+        default=os.environ.get("GITHUB_TOKEN"),
+        help="GitHub token. Can also be set via GITHUB_TOKEN env var."
+    )
+    parser.add_argument(
+        "--context-lines",
+        type=int,
+        default=10,
+        help="Number of context lines from the diff hunk. 0 for full hunk. If > 0, shows header (if any) and last N lines of the remaining hunk. Default: 10."
+    )
+    parser.add_argument(
+        "--since",
+        type=str,
+        default=None,
+        help="Only show comments updated at or after this ISO 8601 timestamp (e.g., YYYY-MM-DDTHH:MM:SSZ)."
+    )
+    parser.add_argument(
+        "--exclude-old",
+        action="store_true",
+        default=False,
+        help="Exclude comments marked [OLD] (where line number has changed due to code updates but position is still valid)."
+    )
+    parser.add_argument(
+        "--include-irrelevant",
+        action="store_true",
+        default=False,
+        help="Include comments marked [IRRELEVANT] (where GitHub can no longer anchor the comment to the diff, i.e., position is null)."
+    )
+
+    args = parser.parse_args()
+
+    if not args.token:
+        sys.stderr.write("Error: GitHub token not provided. Set GITHUB_TOKEN or use --token.\n")
+        sys.exit(1)
+
+    if args.owner != firebase_github.OWNER or args.repo != firebase_github.REPO:
+        repo_url = f"https://github.com/{args.owner}/{args.repo}"
+        if not firebase_github.set_repo_url(repo_url):
+            sys.stderr.write(f"Error: Invalid repo URL: {args.owner}/{args.repo}. Expected https://github.com/owner/repo\n")
+            sys.exit(1)
+        sys.stderr.write(f"Targeting repository: {firebase_github.OWNER}/{firebase_github.REPO}\n")
+
+    sys.stderr.write(f"Fetching comments for PR #{args.pull_number} from {firebase_github.OWNER}/{firebase_github.REPO}...\n")
+    if args.since:
+        sys.stderr.write(f"Filtering comments updated since: {args.since}\n")
+
+
+    comments = firebase_github.get_pull_request_review_comments(
+        args.token,
+        args.pull_number,
+        since=args.since
+    )
+
+    if not comments:
+        sys.stderr.write(f"No review comments found for PR #{args.pull_number} (or matching filters), or an error occurred.\n")
+        return
+
+    latest_activity_timestamp_obj = None
+    processed_comments_count = 0
+    print("# Review Comments\n\n")
+    for comment in comments:
+        created_at_str = comment.get("created_at")
+
+        current_pos = comment.get("position")
+        current_line = comment.get("line")
+        original_line = comment.get("original_line")
+
+        status_text = ""
+        line_to_display = None
+
+        if current_pos is None:
+            status_text = STATUS_IRRELEVANT
+            line_to_display = original_line
+        elif original_line is not None and current_line != original_line:
+            status_text = STATUS_OLD
+            line_to_display = current_line
+        else:
+            status_text = STATUS_CURRENT
+            line_to_display = current_line
+
+        if line_to_display is None:
+            line_to_display = "N/A"
+
+        if status_text == STATUS_IRRELEVANT and not args.include_irrelevant:
+            continue
+        if status_text == STATUS_OLD and args.exclude_old:
+            continue
+
+        # Track latest 'updated_at' for '--since' suggestion; 'created_at' is for display.
+        updated_at_str = comment.get("updated_at")
+        if updated_at_str: # Check if updated_at_str is not None and not empty
+            try:
+                if sys.version_info < (3, 11):
+                    dt_str_updated = updated_at_str.replace("Z", "+00:00")
+                else:
+                    dt_str_updated = updated_at_str
+                current_comment_activity_dt = datetime.datetime.fromisoformat(dt_str_updated)
+                if latest_activity_timestamp_obj is None or current_comment_activity_dt > latest_activity_timestamp_obj:
+                    latest_activity_timestamp_obj = current_comment_activity_dt
+            except ValueError:
+                sys.stderr.write(f"Warning: Could not parse updated_at timestamp: {updated_at_str}\n")
+
+        # Get other comment details
+        user = comment.get("user", {}).get("login", "Unknown user")
+        path = comment.get("path", "N/A")
+        body = comment.get("body", "").strip()
+
+        if not body:
+            continue
+
+        processed_comments_count += 1
+
+        diff_hunk = comment.get("diff_hunk")
+        html_url = comment.get("html_url", "N/A")
+        comment_id = comment.get("id")
+        in_reply_to_id = comment.get("in_reply_to_id")
+
+        print(f"## Comment by: **{user}** (ID: `{comment_id}`){f' (In Reply To: `{in_reply_to_id}`)' if in_reply_to_id else ''}\n")
+        if created_at_str:
+            print(f"*   **Timestamp**: `{created_at_str}`")
+        print(f"*   **Status**: `{status_text}`")
+        print(f"*   **File**: `{path}`")
+        print(f"*   **Line**: `{line_to_display}`")
+        print(f"*   **URL**: <{html_url}>\n")
+
+        print("\n### Context:")
+        print("```") # Start of Markdown code block
+        if diff_hunk and diff_hunk.strip():
+            if args.context_lines == 0: # User wants the full hunk
+                print(diff_hunk)
+            else: # User wants N lines of context (args.context_lines > 0)
+                hunk_lines = diff_hunk.split('\n')
+                if hunk_lines and hunk_lines[0].startswith("@@ "):
+                    print(hunk_lines[0])
+                    hunk_lines = hunk_lines[1:] # Rebind to the lines after the header for the remaining operations
+
+                # Proceed with the (potentially modified) hunk_lines
+                # If hunk_lines is empty here (e.g. original hunk was only a header that was removed),
+                # hunk_lines[-args.context_lines:] will be [], and "\n".join([]) is "",
+                # so print("") will effectively print a newline. This is acceptable.
+                print("\n".join(hunk_lines[-args.context_lines:]))
+        else: # diff_hunk was None or empty
+            print("(No diff hunk available for this comment)")
+        print("```") # End of Markdown code block
+
+        print("\n### Comment:")
+        print(body)
+        print("\n---")
+
+    sys.stderr.write(f"\nPrinted {processed_comments_count} comments to stdout.\n")
+
+    if latest_activity_timestamp_obj:
+        try:
+            # Ensure it's UTC before adding timedelta, then format
+            next_since_dt = latest_activity_timestamp_obj.astimezone(timezone.utc) + timedelta(seconds=2)
+            next_since_str = next_since_dt.strftime('%Y-%m-%dT%H:%M:%SZ')
+
+            new_cmd_args = [sys.executable, sys.argv[0]] # Start with interpreter and script path
+            i = 1 # Start checking from actual arguments in sys.argv
+            while i < len(sys.argv):
+                if sys.argv[i] == "--since":
+                    i += 2 # Skip --since and its value
+                    continue
+                new_cmd_args.append(sys.argv[i])
+                i += 1
+
+            new_cmd_args.extend(["--since", next_since_str])
+            suggested_cmd = " ".join(new_cmd_args)
+            sys.stderr.write(f"\nTo get comments updated after the last one in this batch, try:\n{suggested_cmd}\n")
+        except Exception as e:
+            sys.stderr.write(f"\nWarning: Could not generate next command suggestion: {e}\n")
+
+if __name__ == "__main__":
+    main()