Commit b96909f

Add script to fetch PR review comments (#1722)
* feat: Add script to fetch PR review comments

  This commit introduces a new script `scripts/gha/get_pr_review_comments.py` that fetches review comments from a specified GitHub Pull Request. The comments are formatted to include the commenter, file path, line number, diff hunk, and comment body, making the output easy to paste into an AI assistant for review.

  The script uses a new function `get_pull_request_review_comments`, added to the existing `scripts/gha/firebase_github.py` library. This function fetches line-specific comments from the GitHub API, including pagination.

  The script takes a PR number as a required argument and optionally takes the repository owner, repository name, and GitHub token; the token can also be set via the GITHUB_TOKEN environment variable.

* feat: Enhance PR comment script with context and filters

  This commit significantly enhances the `scripts/gha/get_pr_review_comments.py` script and its underlying library function in `scripts/gha/firebase_github.py`.

  Key improvements include:
  - Copyright year updated to 2025.
  - Output now includes the comment ID, `in_reply_to_id` (if applicable), and the comment creation timestamp.
  - Comments are marked as [OUTDATED] if their diff position is no longer current (i.e., the API `position` field is null).
  - Added a `--context-lines <N>` argument (default 10) to control the amount of diff hunk context displayed. N=0 shows the full hunk.
  - Introduced a `--since <ISO_8601_timestamp>` argument to filter comments, showing only those created at or after the specified time. The `get_pull_request_review_comments` library function was updated to pass this `since` parameter to the API call.

  These changes provide more comprehensive comment information and better control over the data fetched, making the script more useful for reviewing and addressing PR feedback, especially in complex PRs with multiple review rounds.

* fix: Correct IndentationError in get_pr_review_comments.py

  This commit fixes an IndentationError in the `scripts/gha/get_pr_review_comments.py` script. The error was caused by a malformed comment on the final print statement within the main loop. The stray comment has been removed and the print statement's newline character is now present. This resolves the syntax error so the script can be parsed and executed correctly.

* fix: Correct --context-lines behavior for non-line-specific comments

  This commit fixes an issue in `scripts/gha/get_pr_review_comments.py` where the `--context-lines` argument did not suppress the full diff hunk for comments not associated with a specific line (i.e., where the API `position` field is null).

  The `print_contextual_diff_hunk` function has been updated to:
  - Print an explanatory message instead of the full diff hunk when `--context-lines > 0` and the comment's position is null or invalid.
  - Retain the behavior of printing the full diff hunk if `--context-lines 0` is specified.
  - Drop a redundant line in the context calculation logic.

  This ensures that setting a context limit via `--context-lines` does not unexpectedly display full diff hunks for file-level or other non-line-specific comments.

* feat: Simplify diff hunk display and add comment filters

  This commit refactors the `scripts/gha/get_pr_review_comments.py` script to simplify its output and add new filtering options, based on review feedback.

  Key changes:
  - Diff Hunk Display: The complex contextual diff hunk display has been removed. The script now shows either the full diff hunk (if `--context-lines 0`) or the last N lines of the diff hunk (if `--context-lines N > 0`). The `print_contextual_diff_hunk` function was removed, and this logic is now inline.
  - Skip Outdated Comments: A new `--skip-outdated` flag excludes comments marked as [OUTDATED] from the output.
  - Line Number Display: For [OUTDATED] comments, the script now prefers `original_line` for the "Line in File Diff" field, falling back to `line`, then "N/A".
  - Metadata: The script continues to display the comment ID, reply ID, timestamp, status, user, file, URL, and body.

  These changes make the script easier to maintain and its output more predictable, while still providing essential information and filtering options for reviewing PR comments.

* refactor: Update script description and format diff hunks

  This commit applies two minor updates to the `scripts/gha/get_pr_review_comments.py` script:
  1. The script's description in the command-line help (argparse) has been made more generic, changing from "format for use with me" to "format into a simple text output".
  2. The diff hunk context displayed for each comment is now enclosed in triple backticks (```) so that it renders as a preformatted code block in Markdown environments.

  These changes improve the script's general usability and the presentation of its output.

* fix: Adjust 'next command' timestamp increment to 2 seconds

  This commit modifies the "suggest next command" feature in `scripts/gha/get_pr_review_comments.py`. The time added to the last processed comment's timestamp (for the `--since` parameter in the suggested command) has been changed from 1 second to 2 seconds.

  This adjustment provides a slightly larger buffer to more reliably exclude already-seen comments when fetching subsequent comments, addressing possible timestamp-granularity or query-resolution behavior observed with the GitHub API. The `since` parameter for the relevant API endpoint filters by `created_at`, and this change is a heuristic improvement to that existing logic.

* docs: Minor textual cleanups in PR comments script

  This commit applies minor textual updates to the `scripts/gha/get_pr_review_comments.py` script:
  - Removed an explanatory comment from the `import firebase_github` line for a cleaner import block.
  - Refined the script's description in the command-line help text for slightly improved conciseness (removed the article "a").

* feat: Format output as Markdown for improved readability

  This commit updates the `scripts/gha/get_pr_review_comments.py` script to format its entire output as Markdown. This significantly improves the readability and structure of the comment data when pasted into Markdown-aware systems.

  Changes include:
  - Comment attribution (user, ID, reply ID) is now an H3 heading with bold and code formatting.
  - Metadata (Timestamp, Status, File, Line, URL) is presented as a Markdown bulleted list with bold labels and appropriate formatting for values (code ticks, links).
  - "Diff Hunk Context" and "Comment Body" are now H4 headings.
  - The diff hunk itself remains wrapped in triple backticks for code block rendering.
  - A Markdown horizontal rule (---) separates individual comments.

  These changes make the script's output more organized and easier to parse visually.

* style: Adjust Markdown headings for structure and conciseness

  This commit refines the Markdown heading structure in the output of `scripts/gha/get_pr_review_comments.py` for improved readability and document hierarchy.

  Changes include:
  - The main output title "Review Comments" is now an H1 heading.
  - Each comment's attribution line (user, ID) is now an H2 heading.
  - Section headings within each comment, "Context" (formerly "Diff Hunk Context") and "Comment" (formerly "Comment Body"), are now H3 headings.

  These changes make the script's output more organized and easier to navigate when rendered as Markdown.

* style: Adjust default context lines and Markdown spacing

  This commit applies final readability adjustments to the output of `scripts/gha/get_pr_review_comments.py`:
  - The default value for the `--context-lines` argument has been changed from 10 to 0. This means the full diff hunk is displayed by default, reflecting feedback that preferred more context initially unless otherwise specified. The help text for this argument has also been updated.
  - Markdown Spacing:
    - An additional newline is added after the main H1 title ("# Review Comments") for better separation.
    - A newline is added before the "### Context:" H3 subheading to separate it from the metadata list.
    - A newline is added before the "### Comment:" H3 subheading to separate it from the diff hunk block.

  These changes further refine the script's output for clarity and usability.

* feat: Refactor comment filtering with new status terms and flags

  This commit introduces a more granular system for classifying and filtering pull request review comments in the `scripts/gha/get_pr_review_comments.py` script.

  New Comment Statuses:
  - `[IRRELEVANT]`: The comment's original diff position is lost (`position` is null). Displays `original_line`.
  - `[OLD]`: The comment is anchored to the diff, but its line number has changed (`line` != `original_line`). Displays the current `line`.
  - `[CURRENT]`: The comment is anchored and its line number is unchanged. Displays the current `line`.

  New Command-Line Flags:
  - `--exclude-old` (default False): If set, hides `[OLD]` comments.
  - `--include-irrelevant` (default False): If set, shows `[IRRELEVANT]` comments (which are hidden by default).
  - The old `--skip-outdated` flag has been removed.

  Default Behavior:
  - Shows `[CURRENT]` and `[OLD]` comments.
  - Hides `[IRRELEVANT]` comments.

  This provides more precise control over which comments are displayed, improving the script's utility for various review workflows. The "suggest next command" feature interacts correctly with these new filters, only considering non-skipped comments for its timestamp calculation.

* feat: Improve context display and suggested command robustness

  This commit enhances `scripts/gha/get_pr_review_comments.py` in two ways:
  1. Suggested Command: The "suggest next command" feature now prepends `sys.executable` to the command. This ensures that the suggested command uses the same Python interpreter the script was originally run with, making it more robust across environments or when a specific interpreter was used.
  2. Diff Hunk Context Display:
     - The default for `--context-lines` is now 10 (reverted from 0).
     - When `--context-lines > 0`, the script first prints the diff hunk header line (if it starts with "@@ ").
     - It then prints the last N (`args.context_lines`) lines from the *remainder* of the hunk. This ensures the header is shown for context and the trailing lines of that hunk follow it, avoiding double-printing of the header if it would otherwise have fallen into the "last N lines" of the full hunk.
     - If `--context-lines == 0`, the full hunk is displayed.

* style: Refactor hunk printing to use join for conciseness

  This commit makes a minor stylistic refactoring in the `scripts/gha/get_pr_review_comments.py` script. When displaying the trailing lines of a diff hunk (for `--context-lines > 0`), the script now uses `print("\n".join(lines))` instead of a `for` loop with `print()` for each line.

  This change produces the same visual output but is a more concise and Pythonic way to print a list of strings as multiple lines.

* fix: Align 'since' filter and next command with observed API behavior (updated_at)

  This commit modifies `scripts/gha/get_pr_review_comments.py` to use `updated_at` timestamps for its `--since` filtering and the "suggest next command" feature. This aligns with the observed behavior of the GitHub API endpoint for listing pull request review comments, where the `since` parameter filters by update time rather than creation time (contrary to some initial documentation interpretations for this specific endpoint).

  Changes include:
  - The "suggest next command" feature now tracks the maximum `updated_at` timestamp from processed comments to calculate the `--since` value for the next suggested command.
  - The help text for the `--since` argument has been updated to clarify that it filters by "updated at or after".
  - The informational message printed to stderr when the `--since` filter is active now also says "updated since".
  - The `created_at` timestamp continues to be displayed for each comment for informational purposes.

* style: Condense printing of trailing hunk lines

  This commit makes a minor stylistic refactoring in the `scripts/gha/get_pr_review_comments.py` script. When displaying the trailing lines of a diff hunk (for `--context-lines > 0`, after the header line is potentially printed and removed from the `hunk_lines` list), the script now uses `print("\n".join(hunk_lines[-args.context_lines:]))` instead of explicitly creating a sub-list and then looping through it with `print()` for each line.

  This change produces the same visual output (printing a newline if `hunk_lines` becomes empty after header removal) but is more concise.

* chore: Remove specific stale developer comments

  This commit ensures that specific stale developer comments, previously identified as artifacts of the iterative development process, are not present in the current version of `scripts/gha/get_pr_review_comments.py`.

  The targeted comments were:
  - `# Removed skip_outdated message block`
  - `# is_effectively_outdated is no longer needed with the new distinct flags`

  A verification step confirmed these are no longer in the script, contributing to a cleaner codebase whose comments describe only the current state of the code.

* fix: Ensure removal of specific stale developer comments

  This commit ensures that specific stale developer comments, which were artifacts of the iterative development process, are definitively removed from the current version of `scripts/gha/get_pr_review_comments.py`.

  The targeted comments were:
  - `# Removed skip_outdated message block`
  - `# is_effectively_outdated is no longer needed with the new distinct flags`

  These lines were confirmed to be absent after a targeted removal operation, contributing to a cleaner codebase.

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
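The three statuses and the default filters described in the message above can be summarized as two small pure functions. This is a minimal sketch, assuming each comment is a dict shaped like a GitHub review-comment object with `position`, `line`, and `original_line` keys; `classify_comment` and `is_shown` are hypothetical names, since the script itself evaluates these rules inline in `main()`.

```python
from typing import Any, Mapping, Tuple

STATUS_IRRELEVANT = "[IRRELEVANT]"
STATUS_OLD = "[OLD]"
STATUS_CURRENT = "[CURRENT]"


def classify_comment(comment: Mapping[str, Any]) -> Tuple[str, Any]:
  """Returns (status, line_to_display) for one review-comment dict."""
  position = comment.get("position")
  line = comment.get("line")
  original_line = comment.get("original_line")
  if position is None:
    # GitHub can no longer anchor the comment to the current diff.
    status, shown = STATUS_IRRELEVANT, original_line
  elif original_line is not None and line != original_line:
    # Still anchored, but later pushes moved it to a different line.
    status, shown = STATUS_OLD, line
  else:
    status, shown = STATUS_CURRENT, line
  return status, ("N/A" if shown is None else shown)


def is_shown(status: str, exclude_old: bool = False,
             include_irrelevant: bool = False) -> bool:
  """Default filters: show [CURRENT] and [OLD], hide [IRRELEVANT]."""
  if status == STATUS_IRRELEVANT:
    return include_irrelevant
  if status == STATUS_OLD:
    return not exclude_old
  return True


if __name__ == "__main__":
  samples = [
      {"position": None, "line": None, "original_line": 12},
      {"position": 4, "line": 20, "original_line": 18},
      {"position": 4, "line": 18, "original_line": 18},
  ]
  for c in samples:
    status, shown = classify_comment(c)
    print(status, shown, is_shown(status))
```

Factoring the rules out like this would let them be unit-tested without calling the GitHub API; the sample loop prints one `[IRRELEVANT]`, one `[OLD]`, and one `[CURRENT]` line, with only the first marked as hidden under the default flags.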
1 parent a06d206 commit b96909f

2 files changed: 274 additions & 0 deletions

scripts/gha/firebase_github.py

Lines changed: 43 additions & 0 deletions
```diff
@@ -225,6 +225,49 @@ def get_reviews(token, pull_number):
   return results
 
 
+def get_pull_request_review_comments(token, pull_number, since=None):
+  """https://docs.github.com/en/rest/pulls/comments#list-review-comments-on-a-pull-request"""
+  url = f'{GITHUB_API_URL}/pulls/{pull_number}/comments'
+  headers = {'Accept': 'application/vnd.github.v3+json', 'Authorization': f'token {token}'}
+
+  page = 1
+  per_page = 100
+  results = []
+
+  # Base parameters for the API request
+  base_params = {'per_page': per_page}
+  if since:
+    base_params['since'] = since
+
+  while True:  # Loop indefinitely until explicitly broken
+    current_page_params = base_params.copy()
+    current_page_params['page'] = page
+
+    try:
+      with requests_retry_session().get(url, headers=headers, params=current_page_params,
+                                        stream=True, timeout=TIMEOUT) as response:
+        response.raise_for_status()
+        # Log which page and if 'since' was used for clarity
+        logging.info("get_pull_request_review_comments: %s params %s response: %s", url, current_page_params, response)
+
+        current_page_results = response.json()
+        if not current_page_results:  # No more results on this page
+          break  # Exit loop, no more comments to fetch
+
+        results.extend(current_page_results)
+
+        # If fewer results than per_page were returned, it's the last page
+        if len(current_page_results) < per_page:
+          break  # Exit loop, this was the last page
+
+        page += 1  # Increment page for the next iteration
+
+    except requests.exceptions.RequestException as e:
+      logging.error(f"Error fetching review comments (page {page}, params: {current_page_params}) for PR {pull_number}: {e}")
+      break  # Stop trying if there's an error
+  return results
+
+
 def create_workflow_dispatch(token, workflow_id, ref, inputs):
   """https://docs.github.com/en/rest/reference/actions#create-a-workflow-dispatch-event"""
   url = f'{GITHUB_API_URL}/actions/workflows/{workflow_id}/dispatches'
```
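`get_pull_request_review_comments` uses page-number pagination: it requests `page=1`, `page=2`, and so on with `per_page=100`, and stops on an empty page or on a page shorter than `per_page`. The sketch below isolates that stopping rule. It assumes the HTTP request can be replaced by a hypothetical `fetch_page(params)` callable, so the loop runs without network access or a token; `fetch_all_pages` is likewise an illustrative name, not part of `firebase_github.py`.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical stand-in for one HTTP GET: takes query params, returns one page.
FetchPage = Callable[[Dict[str, Any]], List[Dict[str, Any]]]


def fetch_all_pages(fetch_page: FetchPage, since: Optional[str] = None,
                    per_page: int = 100) -> List[Dict[str, Any]]:
  """Collects every page using page-number pagination."""
  base_params: Dict[str, Any] = {"per_page": per_page}
  if since:
    base_params["since"] = since  # Forwarded unchanged to every page request.
  results: List[Dict[str, Any]] = []
  page = 1
  while True:
    batch = fetch_page(dict(base_params, page=page))  # Fresh dict per request.
    if not batch:  # Empty page: nothing more to fetch.
      break
    results.extend(batch)
    if len(batch) < per_page:  # Short page: this was the last one.
      break
    page += 1
  return results


if __name__ == "__main__":
  data = [{"id": i} for i in range(250)]  # Fake backend with 250 comments.
  requested = []

  def fake_fetch(params):
    requested.append(params["page"])
    start = (params["page"] - 1) * params["per_page"]
    return data[start:start + params["per_page"]]

  comments = fetch_all_pages(fake_fetch)
  print(len(comments), requested)  # 250 [1, 2, 3]
```

GitHub also returns a `Link` response header pointing to the next page, so following that header is an alternative to counting pages. Note that the library function logs and `break`s on `requests.exceptions.RequestException`, so a failure partway through returns only the pages fetched so far, and a failure on the first page makes the script print its generic "No review comments found ... or an error occurred" message.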

scripts/gha/get_pr_review_comments.py

Lines changed: 231 additions & 0 deletions
```python
#!/usr/bin/env python3
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Fetches and formats review comments from a GitHub Pull Request."""

import argparse
import os
import sys
import firebase_github
import datetime
from datetime import timezone, timedelta


def main():
  STATUS_IRRELEVANT = "[IRRELEVANT]"
  STATUS_OLD = "[OLD]"
  STATUS_CURRENT = "[CURRENT]"

  default_owner = firebase_github.OWNER
  default_repo = firebase_github.REPO

  parser = argparse.ArgumentParser(
      description="Fetch review comments from a GitHub PR and format into simple text output.",
      formatter_class=argparse.RawTextHelpFormatter
  )
  parser.add_argument(
      "--pull_number",
      type=int,
      required=True,
      help="Pull request number."
  )
  parser.add_argument(
      "--owner",
      type=str,
      default=default_owner,
      help=f"Repository owner. Defaults to '{default_owner}'."
  )
  parser.add_argument(
      "--repo",
      type=str,
      default=default_repo,
      help=f"Repository name. Defaults to '{default_repo}'."
  )
  parser.add_argument(
      "--token",
      type=str,
      default=os.environ.get("GITHUB_TOKEN"),
      help="GitHub token. Can also be set via GITHUB_TOKEN env var."
  )
  parser.add_argument(
      "--context-lines",
      type=int,
      default=10,
      help="Number of context lines from the diff hunk. 0 for full hunk. If > 0, shows header (if any) and last N lines of the remaining hunk. Default: 10."
  )
  parser.add_argument(
      "--since",
      type=str,
      default=None,
      help="Only show comments updated at or after this ISO 8601 timestamp (e.g., YYYY-MM-DDTHH:MM:SSZ)."
  )
  parser.add_argument(
      "--exclude-old",
      action="store_true",
      default=False,
      help="Exclude comments marked [OLD] (where line number has changed due to code updates but position is still valid)."
  )
  parser.add_argument(
      "--include-irrelevant",
      action="store_true",
      default=False,
      help="Include comments marked [IRRELEVANT] (where GitHub can no longer anchor the comment to the diff, i.e., position is null)."
  )

  args = parser.parse_args()

  if not args.token:
    sys.stderr.write("Error: GitHub token not provided. Set GITHUB_TOKEN or use --token.\n")
    sys.exit(1)

  if args.owner != firebase_github.OWNER or args.repo != firebase_github.REPO:
    repo_url = f"https://github.com/{args.owner}/{args.repo}"
    if not firebase_github.set_repo_url(repo_url):
      sys.stderr.write(f"Error: Invalid repo URL: {args.owner}/{args.repo}. Expected https://github.com/owner/repo\n")
      sys.exit(1)
    sys.stderr.write(f"Targeting repository: {firebase_github.OWNER}/{firebase_github.REPO}\n")

  sys.stderr.write(f"Fetching comments for PR #{args.pull_number} from {firebase_github.OWNER}/{firebase_github.REPO}...\n")
  if args.since:
    sys.stderr.write(f"Filtering comments updated since: {args.since}\n")


  comments = firebase_github.get_pull_request_review_comments(
      args.token,
      args.pull_number,
      since=args.since
  )

  if not comments:
    sys.stderr.write(f"No review comments found for PR #{args.pull_number} (or matching filters), or an error occurred.\n")
    return

  latest_activity_timestamp_obj = None
  processed_comments_count = 0
  print("# Review Comments\n\n")
  for comment in comments:
    created_at_str = comment.get("created_at")

    current_pos = comment.get("position")
    current_line = comment.get("line")
    original_line = comment.get("original_line")

    status_text = ""
    line_to_display = None

    if current_pos is None:
      status_text = STATUS_IRRELEVANT
      line_to_display = original_line
    elif original_line is not None and current_line != original_line:
      status_text = STATUS_OLD
      line_to_display = current_line
    else:
      status_text = STATUS_CURRENT
      line_to_display = current_line

    if line_to_display is None:
      line_to_display = "N/A"

    if status_text == STATUS_IRRELEVANT and not args.include_irrelevant:
      continue
    if status_text == STATUS_OLD and args.exclude_old:
      continue

    # Track latest 'updated_at' for '--since' suggestion; 'created_at' is for display.
    updated_at_str = comment.get("updated_at")
    if updated_at_str:  # Check if updated_at_str is not None and not empty
      try:
        if sys.version_info < (3, 11):
          dt_str_updated = updated_at_str.replace("Z", "+00:00")
        else:
          dt_str_updated = updated_at_str
        current_comment_activity_dt = datetime.datetime.fromisoformat(dt_str_updated)
        if latest_activity_timestamp_obj is None or current_comment_activity_dt > latest_activity_timestamp_obj:
          latest_activity_timestamp_obj = current_comment_activity_dt
      except ValueError:
        sys.stderr.write(f"Warning: Could not parse updated_at timestamp: {updated_at_str}\n")

    # Get other comment details
    user = comment.get("user", {}).get("login", "Unknown user")
    path = comment.get("path", "N/A")
    body = comment.get("body", "").strip()

    if not body:
      continue

    processed_comments_count += 1

    diff_hunk = comment.get("diff_hunk")
    html_url = comment.get("html_url", "N/A")
    comment_id = comment.get("id")
    in_reply_to_id = comment.get("in_reply_to_id")

    print(f"## Comment by: **{user}** (ID: `{comment_id}`){f' (In Reply To: `{in_reply_to_id}`)' if in_reply_to_id else ''}\n")
    if created_at_str:
      print(f"* **Timestamp**: `{created_at_str}`")
    print(f"* **Status**: `{status_text}`")
    print(f"* **File**: `{path}`")
    print(f"* **Line**: `{line_to_display}`")
    print(f"* **URL**: <{html_url}>\n")

    print("\n### Context:")
    print("```")  # Start of Markdown code block
    if diff_hunk and diff_hunk.strip():
      if args.context_lines == 0:  # User wants the full hunk
        print(diff_hunk)
      else:  # User wants N lines of context (args.context_lines > 0)
        hunk_lines = diff_hunk.split('\n')
        if hunk_lines and hunk_lines[0].startswith("@@ "):
          print(hunk_lines[0])
          hunk_lines = hunk_lines[1:]  # Modify list in place for remaining operations

        # Proceed with the (potentially modified) hunk_lines
        # If hunk_lines is empty here (e.g. original hunk was only a header that was removed),
        # hunk_lines[-args.context_lines:] will be [], and "\n".join([]) is "",
        # so print("") will effectively print a newline. This is acceptable.
        print("\n".join(hunk_lines[-args.context_lines:]))
    else:  # diff_hunk was None or empty
      print("(No diff hunk available for this comment)")
    print("```")  # End of Markdown code block

    print("\n### Comment:")
    print(body)
    print("\n---")

  sys.stderr.write(f"\nPrinted {processed_comments_count} comments to stdout.\n")

  if latest_activity_timestamp_obj:
    try:
      # Ensure it's UTC before adding timedelta, then format
      next_since_dt = latest_activity_timestamp_obj.astimezone(timezone.utc) + timedelta(seconds=2)
      next_since_str = next_since_dt.strftime('%Y-%m-%dT%H:%M:%SZ')

      new_cmd_args = [sys.executable, sys.argv[0]]  # Start with interpreter and script path
      i = 1  # Start checking from actual arguments in sys.argv
      while i < len(sys.argv):
        if sys.argv[i] == "--since":
          i += 2  # Skip --since and its value
          continue
        new_cmd_args.append(sys.argv[i])
        i += 1

      new_cmd_args.extend(["--since", next_since_str])
      suggested_cmd = " ".join(new_cmd_args)
      sys.stderr.write(f"\nTo get comments created after the last one in this batch, try:\n{suggested_cmd}\n")
    except Exception as e:
      sys.stderr.write(f"\nWarning: Could not generate next command suggestion: {e}\n")

if __name__ == "__main__":
  main()
```
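The `--context-lines` branch of the script above keeps the `@@ ` header line and then the last N lines of the rest of the hunk. A minimal sketch of the same rule as a string-returning function follows; the name `trim_hunk` is hypothetical. It differs from the script in two assumed ways: for a header-only hunk it returns just the header (the script prints the header plus an empty line), and it rejects negative values, which the script does not validate. With `--context-lines -3`, for example, the script's slice `hunk_lines[-args.context_lines:]` becomes `hunk_lines[3:]`, which drops the first three lines instead of keeping the last three.

```python
def trim_hunk(diff_hunk, context_lines):
  """Returns the diff-hunk text to display for one comment.

  context_lines == 0: the whole hunk.
  context_lines > 0:  the "@@ " header (if present) plus the last
                      `context_lines` lines of the remaining hunk.
  """
  if context_lines < 0:
    raise ValueError("context_lines must be >= 0")
  if not diff_hunk or not diff_hunk.strip():
    return "(No diff hunk available for this comment)"
  if context_lines == 0:
    return diff_hunk
  lines = diff_hunk.split("\n")
  header = []
  if lines[0].startswith("@@ "):
    # Keep the header separately so it is never counted in the last N lines.
    header, lines = lines[:1], lines[1:]
  return "\n".join(header + lines[-context_lines:])


if __name__ == "__main__":
  hunk = "@@ -10,4 +10,4 @@ def f():\n   a = 1\n-  b = 2\n+  b = 3\n   return a + b"
  print(trim_hunk(hunk, 2))
```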

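After printing, the script suggests a follow-up command: it takes the latest `updated_at` among the comments that pass the status filters, adds a 2-second buffer, and rewrites `sys.argv` with the new `--since` value. A minimal sketch of those two steps as separate, testable functions follows (`next_since` and `rebuild_command` are hypothetical names). Two deliberate differences from the script are assumptions of this sketch: it also drops the `--since=VALUE` form, which argparse accepts but the script's argv loop does not remove, and it quotes arguments with `shlex.join` (Python 3.8+) instead of `" ".join`.

```python
import datetime
import shlex
import sys
from datetime import timedelta, timezone
from typing import Iterable, List, Optional


def parse_github_timestamp(value: str) -> datetime.datetime:
  # fromisoformat() only accepts a trailing "Z" from Python 3.11 on, so
  # normalize it to an explicit UTC offset (also valid on newer versions).
  # Assumes GitHub-style timestamps that always carry an offset or "Z".
  return datetime.datetime.fromisoformat(value.replace("Z", "+00:00"))


def next_since(updated_at_values: Iterable[str],
               buffer_seconds: int = 2) -> Optional[str]:
  """Latest parseable timestamp plus a small buffer, as YYYY-MM-DDTHH:MM:SSZ."""
  latest: Optional[datetime.datetime] = None
  for value in updated_at_values:
    if not value:
      continue
    try:
      dt = parse_github_timestamp(value)
    except ValueError:
      continue  # The script prints a warning here; the sketch just skips.
    if latest is None or dt > latest:
      latest = dt
  if latest is None:
    return None
  bumped = latest.astimezone(timezone.utc) + timedelta(seconds=buffer_seconds)
  return bumped.strftime("%Y-%m-%dT%H:%M:%SZ")


def rebuild_command(argv: List[str], since_value: str) -> str:
  """Re-creates argv with any old --since removed and the new one appended."""
  new_args = [sys.executable, argv[0]]
  i = 1
  while i < len(argv):
    if argv[i] == "--since":
      i += 2  # Skip the flag and its separate value.
      continue
    if argv[i].startswith("--since="):
      i += 1  # Skip the combined --since=VALUE form.
      continue
    new_args.append(argv[i])
    i += 1
  new_args += ["--since", since_value]
  return shlex.join(new_args)  # Quotes arguments that contain spaces.


if __name__ == "__main__":
  stamp = next_since(["2025-06-01T10:00:00Z", "2025-06-01T12:30:58Z"])
  print(stamp)  # 2025-06-01T12:31:00Z
  print(rebuild_command(["get_pr_review_comments.py", "--pull_number", "1722"], stamp))
```

Two further observations on the script itself: its hint text says "comments created after the last one in this batch", although, per the commit message, both the API's `since` filter and the tracked timestamp are based on `updated_at`; and because the suggestion repeats every original argument, a token passed with `--token` is echoed to stderr, which supplying it through `GITHUB_TOKEN` avoids.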