Make codebase.reset only reset changes made by the sdk #74

Merged
Changes from 61 commits
62 commits
4561f26
Basic test case
bagel897 Jan 24, 2025
3556f6b
WIP: update test cases
bagel897 Jan 24, 2025
03b0a86
v0 reset via diffs
bagel897 Jan 24, 2025
28bb402
Reset commited changes
bagel897 Jan 24, 2025
2d66622
Add stress test
bagel897 Jan 24, 2025
8c1beaa
Fix stress test
bagel897 Jan 24, 2025
8d34044
Use cython
bagel897 Jan 24, 2025
1f0188a
Remove extreme case
bagel897 Jan 24, 2025
f55dcc7
Handle unapplied diffs
bagel897 Jan 24, 2025
c25e900
Merge branch 'develop' into eagarwal-cg-10342-codebasereset-should-no…
eacodegen Jan 24, 2025
1ef801c
Fix non-determinism
Jan 24, 2025
df76870
Merge branch 'develop' of github.com:codegen-sh/codegen-sdk into eaga…
Jan 24, 2025
421a73b
Merge branch 'develop' of github.com:codegen-sh/codegen-sdk into eaga…
Jan 24, 2025
5ca3137
Update types
Jan 24, 2025
05694c6
Make changetype an IntEnum
Jan 27, 2025
a3818a5
Fix bug
Jan 27, 2025
e08c8c5
Add README.md for all top level codegen modules (#86)
caroljung-cg Jan 24, 2025
a0c4970
fix bg color in doc codeblocks (#87)
rushilpatel0 Jan 24, 2025
88afb3b
fix: url for widget in overview docs (#88)
rushilpatel0 Jan 24, 2025
0df6b6c
fix: remove widget border (#89)
rushilpatel0 Jan 24, 2025
c50fae0
chore(CG-10487): add back docs spell checker pre-commit (#90)
christinewangcw Jan 24, 2025
f73ee67
chore(deps): update pre-commit hook codespell-project/codespell to v2…
renovate[bot] Jan 24, 2025
26d47b2
nit: reduce gap in view source on github (#93)
christinewangcw Jan 24, 2025
e89a7ca
chore(deps): update pre-commit hook codespell-project/codespell to v2…
renovate[bot] Jan 24, 2025
44b8ffb
docs: adds function call parameter helpers (#95)
jayhack Jan 25, 2025
e86d5de
fix: highside token typo (#97)
christinewangcw Jan 25, 2025
c0954a6
CG-10491: Move skills testing + utils into tests/ (#96)
caroljung-cg Jan 25, 2025
e1811cf
Move conftest.py -> tests/conftest.py (#99)
caroljung-cg Jan 25, 2025
635d7fd
CG-10501: Remove run_string from codegen package (#100)
caroljung-cg Jan 25, 2025
70769e6
docs: act via code blog [wip] (#101)
jayhack Jan 25, 2025
6b51c6b
Add graph widget (#102)
rushilpatel0 Jan 26, 2025
88b17d1
chore(deps): update pre-commit hook fpgmaas/deptry to v0.23.0 (#103)
renovate[bot] Jan 26, 2025
33dfad2
docs: act-via-code (#105)
jayhack Jan 26, 2025
2161e1f
docs: fixed up export docs (#106)
jayhack Jan 26, 2025
afd3371
feat: fixes codegen run and codegen create (#107)
jayhack Jan 26, 2025
454f921
feat: adds system-prompt.txt (#108)
jayhack Jan 26, 2025
d42c9c2
docs: tightens up many docs (#109)
jayhack Jan 27, 2025
2e8c45d
chore(deps): lock file maintenance (#110)
renovate[bot] Jan 27, 2025
ee50184
feat: fetches system-prompt + guide (#111)
jayhack Jan 27, 2025
daf57a5
skip tests requiring auth, remove hardcoded gh install (#47)
Jan 27, 2025
c8f6948
fix: url for graph widget (#114)
rushilpatel0 Jan 27, 2025
7027a30
Adds missing return types, Styling modifications (CG-10500, CG-10499,…
jemeza-codegen Jan 27, 2025
071a0e3
Revert "skip tests requiring auth, remove hardcoded gh install" (#115)
Jan 27, 2025
8bd733d
Merge branch 'develop' into eagarwal-cg-10342-codebasereset-should-no…
eacodegen Jan 27, 2025
f56dbc5
Merge branch 'develop' of github.com:codegen-sh/codegen-sdk into eaga…
bagel897 Jan 27, 2025
eaaf7e9
Fix bot commit issue
bagel897 Jan 27, 2025
eb5b90c
rename
bagel897 Jan 27, 2025
0cca4de
Fix bugs
bagel897 Jan 27, 2025
ceb5e80
Clear all syncs
bagel897 Jan 27, 2025
e73a8af
Fix codemod testing
bagel897 Jan 27, 2025
2b33294
Merge branch 'develop' of github.com:codegen-sh/codegen-sdk into eaga…
bagel897 Jan 27, 2025
39bcdb9
Update bot commit behavior
bagel897 Jan 27, 2025
bfca93b
Fix reset
bagel897 Jan 27, 2025
d621525
Set group
bagel897 Jan 27, 2025
acd359a
reorganize
bagel897 Jan 27, 2025
7e4684c
Add benchmarking
bagel897 Jan 27, 2025
9dc444c
Add loguru
bagel897 Jan 27, 2025
8aa8547
Add loguru
bagel897 Jan 27, 2025
8790623
Merge branch 'develop' of github.com:codegen-sh/codegen-sdk into eaga…
bagel897 Jan 27, 2025
ac6d731
don't create parent dir
bagel897 Jan 27, 2025
1bb194d
Merge branch 'develop' into eagarwal-cg-10342-codebasereset-should-no…
eacodegen Jan 27, 2025
fe56300
Merge branch 'develop' of github.com:codegen-sh/codegen-sdk into eaga…
bagel897 Jan 27, 2025
1 change: 1 addition & 0 deletions .gitignore
@@ -65,3 +65,4 @@ graph-sitter-types/typings/**
 coverage.json
 tests/integration/verified_codemods/codemod_data/repo_commits.json
 .codegen/*
+.benchmarks/*
4 changes: 3 additions & 1 deletion pyproject.toml
@@ -107,6 +107,8 @@ dev-dependencies = [
     "black>=24.8.0",
     "isort>=5.13.2",
     "emoji>=2.14.0",
+    "pytest-benchmark[histogram]>=5.1.0",
+    "loguru>=0.7.3",
 ]
 keyring-provider = "subprocess"
 #extra-index-url = ["https://[email protected]/pypi/codegen/simple/"]
@@ -197,7 +199,7 @@ pythonpath = "."
 norecursedirs = "repos expected"
 # addopts = -v --cov=app --cov-report=term

-addopts = "--dist=loadgroup --junitxml=build/test-results/test/TEST.xml --strict-config --import-mode=importlib --cov-context=test --cov-report=json --cov-config=pyproject.toml -p no:doctest"
+addopts = "--dist=loadgroup --junitxml=build/test-results/test/TEST.xml --strict-config --import-mode=importlib --cov-context=test --cov-report=json --cov-config=pyproject.toml -p no:doctest --benchmark-autosave"
 filterwarnings ="""
 ignore::DeprecationWarning:botocore.*:
 ignore::DeprecationWarning:sqlalchemy.*:
2 changes: 1 addition & 1 deletion src/codegen/git/repo_operator/local_repo_operator.py
@@ -72,8 +72,8 @@ def create_from_files(cls, repo_path: str, files: dict[str, str], bot_commit: bo
     def create_from_commit(cls, repo_path: str, commit: str, url: str) -> Self:
         """Do a shallow checkout of a particular commit to get a repository from a given remote URL."""
         op = cls(repo_config=BaseRepoConfig(), repo_path=repo_path, bot_commit=False)
-        op.discard_changes()
         if op.get_active_branch_or_commit() != commit:
+            op.discard_changes()
             op.create_remote("origin", url)
             op.git_cli.remotes["origin"].fetch(commit, depth=1)
             op.checkout_commit(commit)
3 changes: 2 additions & 1 deletion src/codegen/git/repo_operator/remote_repo_operator.py
@@ -42,8 +42,9 @@ def __init__(
         setup_option: SetupOption = SetupOption.PULL_OR_CLONE,
         shallow: bool = True,
         github_type: GithubType = GithubType.GithubEnterprise,
+        bot_commit: bool = True,
     ) -> None:
-        super().__init__(repo_config=repo_config, base_dir=base_dir)
+        super().__init__(repo_config=repo_config, base_dir=base_dir, bot_commit=bot_commit)
         self.github_type = github_type
         self.setup_repo_dir(setup_option=setup_option, shallow=shallow)

15 changes: 12 additions & 3 deletions src/codegen/git/repo_operator/repo_operator.py
@@ -67,12 +67,21 @@ def viz_file_path(self) -> str:
     def git_cli(self) -> GitCLI:
         """Note: this is recursive, may want to look out"""
         git_cli = GitCLI(self.repo_path)
+        has_username = False
+        has_email = False
+        with git_cli.config_reader(None) as reader:
+            if reader.has_option("user", "name"):
+                has_username = True
+            if reader.has_option("user", "email"):
+                has_email = True
         with git_cli.config_writer("repository") as writer:
-            if self.bot_commit:
+            if not has_username or not has_email or self.bot_commit:
                 if not writer.has_section("user"):
                     writer.add_section("user")
-                writer.set("user", "name", CODEGEN_BOT_NAME)
-                writer.set("user", "email", CODEGEN_BOT_EMAIL)
+                if not has_username or self.bot_commit:
+                    writer.set("user", "name", CODEGEN_BOT_NAME)
+                if not has_email or self.bot_commit:
+                    writer.set("user", "email", CODEGEN_BOT_EMAIL)
         return git_cli

     @property
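The new condition in `git_cli` reads as a small truth table: a missing identity field is always filled in, while an existing one is overwritten only when `bot_commit` is set. The sketch below restates that logic as a pure function so the cases can be checked in isolation. `IdentityPlan` and `plan_identity_writes` are hypothetical names introduced here for illustration; they are not part of the SDK.

```python
from typing import NamedTuple


class IdentityPlan(NamedTuple):
    """Which git identity fields would be written (hypothetical helper type)."""

    set_name: bool
    set_email: bool


def plan_identity_writes(has_username: bool, has_email: bool, bot_commit: bool) -> IdentityPlan:
    """Restate the conditions from the hunk: fill gaps always, overwrite only for bot commits."""
    return IdentityPlan(
        set_name=(not has_username) or bot_commit,
        set_email=(not has_email) or bot_commit,
    )


if __name__ == "__main__":
    # Print the full truth table for the three boolean inputs.
    for has_username in (False, True):
        for has_email in (False, True):
            for bot_commit in (False, True):
                plan = plan_identity_writes(has_username, has_email, bot_commit)
                print(has_username, has_email, bot_commit, "->", plan)
```

Under this reading, a repository that already has both `user.name` and `user.email` configured is left untouched when `bot_commit=False`, which matches the test fixture change later in this diff.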
45 changes: 42 additions & 3 deletions src/codegen/sdk/codebase/codebase_graph.py
@@ -30,6 +30,7 @@
 from codegen.sdk.core.interfaces.importable import Importable
 from codegen.sdk.core.node_id_factory import NodeId
 from codegen.sdk.enums import Edge, EdgeType, NodeType, ProgrammingLanguage
+from codegen.sdk.extensions.io import write_changes
 from codegen.sdk.extensions.sort import sort_editables
 from codegen.sdk.extensions.utils import uncache_all
 from codegen.sdk.typescript.external.ts_declassify.ts_declassify import TSDeclassify
@@ -107,6 +108,7 @@ class CodebaseGraph:
     flags: Flags
     session_options: SessionOptions = SessionOptions()
     projects: list[ProjectConfig]
+    unapplied_diffs: list[DiffLite]

     def __init__(
         self,
@@ -161,6 +163,7 @@ def __init__(
         self.synced_commit = None
         self.pending_syncs = []
         self.all_syncs = []
+        self.unapplied_diffs = []
         self.pending_files = set()
         self.flags = Flags()

@@ -232,9 +235,40 @@ def apply_diffs(self, diff_list: list[DiffLite]) -> None:
         self.generation += 1
         self._process_diff_files(by_sync_type)

+    def _reset_files(self, syncs: list[DiffLite]) -> None:
+        files_to_write = []
+        files_to_remove = []
+        modified_files = set()
+        for sync in syncs:
+            if sync.path in modified_files:
+                continue
+            if sync.change_type == ChangeType.Removed:
+                files_to_write.append((sync.path, sync.old_content))
+                modified_files.add(sync.path)
+                logger.info(f"Removing {sync.path} from disk")
+            elif sync.change_type == ChangeType.Modified:
+                files_to_write.append((sync.path, sync.old_content))
+                modified_files.add(sync.path)
+            elif sync.change_type == ChangeType.Renamed:
+                files_to_write.append((sync.rename_from, sync.old_content))
+                files_to_remove.append(sync.rename_to)
+                modified_files.add(sync.rename_from)
+                modified_files.add(sync.rename_to)
+            elif sync.change_type == ChangeType.Added:
+                files_to_remove.append(sync.path)
+                modified_files.add(sync.path)
+        logger.info(f"Writing {len(files_to_write)} files to disk and removing {len(files_to_remove)} files")
+        write_changes(files_to_remove, files_to_write)
+
+    @stopwatch
+    def reset_codebase(self) -> None:
+        self._reset_files(self.all_syncs + self.pending_syncs + self.unapplied_diffs)
+        self.unapplied_diffs.clear()
+
     @stopwatch
     def undo_applied_diffs(self) -> None:
         self.transaction_manager.clear_transactions()
+        self.reset_codebase()
         self.check_changes()
         self.pending_syncs.clear() # Discard pending changes
         if len(self.all_syncs) > 0:
@@ -256,6 +290,9 @@ def _revert_diffs(self, diff_list: list[DiffLite]) -> None:

     def save_commit(self, commit: GitCommit) -> None:
         if commit is not None:
+            logger.info(f"Saving commit {commit.hexsha} to graph")
+            self.all_syncs.clear()
+            self.unapplied_diffs.clear()
             self.synced_commit = commit
             if self.config.feature_flags.verify_graph:
                 self.old_graph = self._graph.copy()
@@ -630,9 +667,11 @@ def commit_transactions(self, sync_graph: bool = True, sync_file: bool = True, f
         # Commit transactions for all contexts
         files_to_lock = self.transaction_manager.to_commit(files)
         diffs = self.transaction_manager.commit(files_to_lock)
-        # Filter diffs to only include files that are still in the graph
-        diffs = [diff for diff in diffs if self.get_file(diff.path) is not None]
-        self.pending_syncs.extend(diffs)
+        for diff in diffs:
+            if self.get_file(diff.path) is None:
+                self.unapplied_diffs.append(diff)
+            else:
+                self.pending_syncs.append(diff)

         # Write files if requested
         if sync_file:
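The `_reset_files` hunk above undoes only what the SDK recorded: each diff carries the file's previous content, and the oldest diff for a path wins. The following is a minimal, self-contained sketch of that idea, not the SDK's implementation. `Diff`, `plan_reset`, and `apply_reset` are hypothetical stand-ins for `DiffLite`, `_reset_files`, and `write_changes`, and the sketch writes files sequentially rather than through a thread pool.

```python
import os
import tempfile
from enum import IntEnum, auto
from pathlib import Path
from typing import NamedTuple, Optional


class ChangeType(IntEnum):
    Modified = auto()
    Removed = auto()
    Renamed = auto()
    Added = auto()


class Diff(NamedTuple):
    """Simplified, hypothetical stand-in for DiffLite."""

    change_type: ChangeType
    path: Path
    rename_from: Optional[Path] = None
    rename_to: Optional[Path] = None
    old_content: Optional[bytes] = None


def plan_reset(diffs):
    """Return (files_to_write, files_to_remove); the oldest diff per path wins."""
    to_write, to_remove, seen = [], [], set()
    for diff in diffs:
        if diff.path in seen:
            continue  # an older diff already holds the content from before the first change
        if diff.change_type in (ChangeType.Modified, ChangeType.Removed):
            to_write.append((diff.path, diff.old_content))
            seen.add(diff.path)
        elif diff.change_type == ChangeType.Renamed:
            to_write.append((diff.rename_from, diff.old_content))
            to_remove.append(diff.rename_to)
            seen.update((diff.rename_from, diff.rename_to))
        elif diff.change_type == ChangeType.Added:
            to_remove.append(diff.path)
            seen.add(diff.path)
    return to_write, to_remove


def apply_reset(to_write, to_remove):
    """Apply the plan sequentially; the existence check is an addition of this sketch."""
    for path in to_remove:
        if path.exists():
            os.remove(path)
    for path, content in to_write:
        path.write_bytes(content)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        edited, added, manual = root / "a.py", root / "b.py", root / "notes.txt"
        edited.write_bytes(b"x = 2\n")    # recorded edit; the original was b"x = 1\n"
        added.write_bytes(b"new\n")       # recorded file creation
        manual.write_bytes(b"keep me\n")  # user change that was never recorded as a diff
        diffs = [
            Diff(ChangeType.Modified, edited, old_content=b"x = 1\n"),
            Diff(ChangeType.Added, added),
        ]
        apply_reset(*plan_reset(diffs))
        print(edited.read_bytes(), added.exists(), manual.read_bytes())
```

The file that was never recorded as a diff survives the reset, which is the behavior the PR title describes; a `git`-level discard would have removed it.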
19 changes: 12 additions & 7 deletions src/codegen/sdk/codebase/diff_lite.py
@@ -1,4 +1,4 @@
-from enum import Enum, auto
+from enum import IntEnum, auto
 from os import PathLike
 from pathlib import Path
 from typing import NamedTuple, Self
@@ -7,7 +7,7 @@
 from watchfiles import Change


-class ChangeType(Enum):
+class ChangeType(IntEnum):
     Modified = auto()
     Removed = auto()
     Renamed = auto()
@@ -40,8 +40,9 @@ class DiffLite(NamedTuple):

     change_type: ChangeType
     path: Path
-    rename_from: str | None = None
-    rename_to: str | None = None
+    rename_from: Path | None = None
+    rename_to: Path | None = None
+    old_content: bytes | None = None

     @classmethod
     def from_watch_change(cls, change: Change, path: PathLike) -> Self:
@@ -52,11 +53,15 @@ def from_watch_change(cls, change: Change, path: PathLike) -> Self:

     @classmethod
     def from_git_diff(cls, git_diff: Diff):
+        old = None
+        if git_diff.a_blob:
+            old = git_diff.a_blob.data_stream.read()
         return cls(
             change_type=ChangeType.from_git_change_type(git_diff.change_type),
-            path=Path(git_diff.a_path),
-            rename_from=git_diff.rename_from,
-            rename_to=git_diff.rename_to,
+            path=Path(git_diff.a_path) if git_diff.a_path else None,
+            rename_from=Path(git_diff.rename_from) if git_diff.rename_from else None,
+            rename_to=Path(git_diff.rename_to) if git_diff.rename_to else None,
+            old_content=old,
         )

     @classmethod
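The PR does not say why `ChangeType` became an `IntEnum`. One observable difference, shown in the sketch below, is that `IntEnum` members behave like integers and can therefore be compared and sorted, while plain `Enum` members cannot. `PlainChangeType` and `can_sort` are hypothetical names used only for the comparison, and the `Added` member is assumed from its use in `codebase_graph.py`.

```python
from enum import Enum, IntEnum, auto


class PlainChangeType(Enum):
    """Hypothetical copy of the pre-change definition, for comparison only."""

    Modified = auto()
    Removed = auto()


class ChangeType(IntEnum):
    Modified = auto()
    Removed = auto()
    Renamed = auto()
    Added = auto()


def can_sort(values) -> bool:
    """Report whether the values support ordering comparisons."""
    try:
        sorted(values)
    except TypeError:
        return False
    return True


if __name__ == "__main__":
    print(can_sort(list(PlainChangeType)))  # plain Enum members define no ordering
    print(can_sort(list(ChangeType)))       # IntEnum members order by integer value
    print(sorted([ChangeType.Added, ChangeType.Modified]))
```

Whether ordering was the motivation here is an assumption; the commit list only records "Make changetype an IntEnum" shortly after "Fix non-determinism".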
19 changes: 13 additions & 6 deletions src/codegen/sdk/codebase/transaction_manager.py
@@ -3,7 +3,7 @@
 from pathlib import Path
 from typing import TYPE_CHECKING

-from codegen.sdk.codebase.diff_lite import DiffLite
+from codegen.sdk.codebase.diff_lite import ChangeType, DiffLite
 from codegen.sdk.codebase.transactions import (
     EditTransaction,
     FileAddTransaction,
@@ -163,16 +163,16 @@ def to_commit(self, files: set[Path] | None = None) -> set[Path]:
             return set(self.queued_transactions.keys())
         return files.intersection(self.queued_transactions)

-    def commit(self, files: set[Path]) -> set[DiffLite]:
+    def commit(self, files: set[Path]) -> list[DiffLite]:
         """Execute transactions in bulk for each file, in reverse order of start_byte.
-        Returns the set of diffs that were committed.
+        Returns the list of diffs that were committed.
         """
         if self._commiting:
             logger.warn("Skipping commit, already committing")
-            return set()
+            return []
         self._commiting = True
         try:
-            diffs: set[DiffLite] = set()
+            diffs: list[DiffLite] = []
             if not self.queued_transactions or len(self.queued_transactions) == 0:
                 return diffs

@@ -187,9 +187,16 @@ def commit(self, files: set[Path]) -> set[DiffLite]:
                 logger.info(f"Committing {len(self.queued_transactions[file])} transactions for {file}")
             for file_path in files:
                 file_transactions = self.queued_transactions.pop(file_path, [])
+                modified = False
                 for transaction in file_transactions:
                     # Add diff IF the file is a source file
-                    diffs.add(transaction.get_diff())
+                    diff = transaction.get_diff()
+                    if diff.change_type == ChangeType.Modified:
+                        if not modified:
+                            modified = True
+                            diffs.append(diff)
+                    else:
+                        diffs.append(diff)
                     transaction.execute()
             return diffs
         finally:
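In the last hunk above, `get_diff()` is called before `execute()` on each transaction, so only the first `Modified` diff for a file can be expected to carry the content from before any edit ran. That is a reading of the diff rather than something the PR states. The sketch below isolates the filtering rule; `Diff` and `first_modified_only` are hypothetical names, and unlike the hunk (which tracks one flag per file inside the per-file loop) the sketch keys on the path so it can take a flat list.

```python
from enum import IntEnum, auto
from typing import NamedTuple, Optional


class ChangeType(IntEnum):
    Modified = auto()
    Removed = auto()
    Renamed = auto()
    Added = auto()


class Diff(NamedTuple):
    """Simplified, hypothetical stand-in for DiffLite."""

    change_type: ChangeType
    path: str
    old_content: Optional[bytes] = None


def first_modified_only(diffs):
    """Keep every non-Modified diff, but only the first Modified diff per path."""
    kept = []
    modified_paths = set()
    for diff in diffs:
        if diff.change_type == ChangeType.Modified:
            if diff.path in modified_paths:
                continue  # a later edit would record already-edited content
            modified_paths.add(diff.path)
        kept.append(diff)
    return kept


if __name__ == "__main__":
    recorded = [
        Diff(ChangeType.Modified, "a.py", b"v0"),
        Diff(ChangeType.Modified, "a.py", b"v1"),
        Diff(ChangeType.Removed, "a.py", b"v2"),
    ]
    for diff in first_modified_only(recorded):
        print(diff)
```

Returning a list instead of a set also preserves the order in which diffs were produced, which the reset logic relies on when it treats the earliest diff for a path as authoritative.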
10 changes: 5 additions & 5 deletions src/codegen/sdk/codebase/transactions.py
@@ -127,7 +127,7 @@ def execute(self) -> None:

     def get_diff(self) -> DiffLite:
         """Gets the diff produced by this transaction"""
-        return DiffLite(ChangeType.Modified, self.file_path)
+        return DiffLite(ChangeType.Modified, self.file_path, old_content=self.file.content_bytes)

     def diff_str(self) -> str:
         """Human-readable string representation of the change"""
@@ -170,7 +170,7 @@ def execute(self) -> None:

     def get_diff(self) -> DiffLite:
         """Gets the diff produced by this transaction"""
-        return DiffLite(ChangeType.Modified, self.file_path)
+        return DiffLite(ChangeType.Modified, self.file_path, old_content=self.file.content_bytes)

     def diff_str(self) -> str:
         """Human-readable string representation of the change"""
@@ -205,7 +205,7 @@ def execute(self) -> None:

     def get_diff(self) -> DiffLite:
         """Gets the diff produced by this transaction"""
-        return DiffLite(ChangeType.Modified, self.file_path)
+        return DiffLite(ChangeType.Modified, self.file_path, old_content=self.file.content_bytes)

     def diff_str(self) -> str:
         """Human-readable string representation of the change"""
@@ -269,7 +269,7 @@ def execute(self) -> None:

     def get_diff(self) -> DiffLite:
         """Gets the diff produced by this transaction"""
-        return DiffLite(ChangeType.Renamed, self.file_path, self.file_path, self.new_file_path)
+        return DiffLite(ChangeType.Renamed, self.file_path, self.file_path, self.new_file_path, old_content=self.file.content_bytes)

     def diff_str(self) -> str:
         """Human-readable string representation of the change"""
@@ -294,7 +294,7 @@ def execute(self) -> None:

     def get_diff(self) -> DiffLite:
         """Gets the diff produced by this transaction"""
-        return DiffLite(ChangeType.Removed, self.file_path)
+        return DiffLite(ChangeType.Removed, self.file_path, old_content=self.file.content_bytes)

     def diff_str(self) -> str:
         """Human-readable string representation of the change"""
15 changes: 9 additions & 6 deletions src/codegen/sdk/core/codebase.py
@@ -738,7 +738,7 @@ def current_commit(self) -> GitCommit | None:
         return self._op.git_cli.head.commit

     @stopwatch
-    def reset(self) -> None:
+    def reset(self, git_reset: bool = False) -> None:
         """Resets the codebase by:
         - Discarding any staged/unstaged changes
         - Resetting stop codemod limits: (max seconds, max transactions, max AI requests)
@@ -751,7 +751,8 @@ def reset(self) -> None:
         - .ipynb files (Jupyter notebooks, where you are likely developing)
         """
         logger.info("Resetting codebase ...")
-        self._op.discard_changes() # Discard any changes made to the raw file state
+        if git_reset:
+            self._op.discard_changes() # Discard any changes made to the raw file state
         self._num_ai_requests = 0
         self.reset_logs()
         self.G.undo_applied_diffs()
@@ -818,12 +819,14 @@ def get_diffs(self, base: str | None = None) -> list[Diff]:
         return self._op.get_diffs(base)

     @noapidoc
-    def get_diff(self, base: str | None = None) -> str:
+    def get_diff(self, base: str | None = None, stage_files: bool = False) -> str:
         """Produce a single git diff for all files."""
-        self._op.git_cli.git.add(A=True) # add all changes to the index so untracked files are included in the diff
+        if stage_files:
+            self._op.git_cli.git.add(A=True) # add all changes to the index so untracked files are included in the diff
         if base is None:
-            return self._op.git_cli.git.diff(patch=True, full_index=True, staged=True)
-        return self._op.git_cli.git.diff(base, full_index=True)
+            diff = self._op.git_cli.git.diff("HEAD", patch=True, full_index=True)
+            return diff
+        return self._op.git_cli.git.diff(base, patch=True, full_index=True)

     @noapidoc
     def clean_repo(self):
12 changes: 12 additions & 0 deletions src/codegen/sdk/extensions/io.pyx
@@ -0,0 +1,12 @@
+from pathlib import Path
+from concurrent.futures import ThreadPoolExecutor
+import os
+
+
+def write_changes(files_to_remove: list[Path], files_to_write: list[tuple[Path, bytes]]):
+    # Start at the oldest sync and then apply non-conflicting newer changes
+    with ThreadPoolExecutor() as executor:
+        for file_to_remove in files_to_remove:
+            executor.submit(os.remove, file_to_remove)
+        for file_to_write, content in files_to_write:
+            executor.submit(file_to_write.write_bytes, content)
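One property of `write_changes` worth noting: `executor.submit` returns a `Future`, and an exception raised inside a worker is stored on that future instead of being propagated, so a failed `os.remove` or `write_bytes` would pass silently in the version above. The variant below is a hedged sketch of how failures could be surfaced; `write_changes_checked` is a hypothetical name and is not part of this PR.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_changes_checked(files_to_remove, files_to_write):
    """Like write_changes in the diff, but re-raise any exception from a worker."""
    with ThreadPoolExecutor() as executor:
        futures = [executor.submit(os.remove, path) for path in files_to_remove]
        futures += [executor.submit(path.write_bytes, content) for path, content in files_to_write]
    # Leaving the with-block waits for every submitted task to finish.
    for future in futures:
        future.result()  # raises the exception captured in the worker, if any


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        stale = root / "stale.py"
        stale.write_bytes(b"old")
        write_changes_checked([stale], [(root / "restored.py", b"content")])
        print(stale.exists(), (root / "restored.py").read_bytes())
        try:
            write_changes_checked([root / "missing.py"], [])
        except FileNotFoundError as exc:
            print("surfaced:", type(exc).__name__)
```

Whether silent failure is acceptable for a reset is a design choice; the sketch only shows the mechanism for making it visible.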
@@ -12,7 +12,8 @@

 @pytest.fixture
 def op(repo_config, request, tmpdir):
-    yield RemoteRepoOperator(repo_config, shallow=request.param, base_dir=tmpdir)
+    op = RemoteRepoOperator(repo_config, shallow=request.param, base_dir=tmpdir, bot_commit=False)
+    yield op


 @pytest.mark.parametrize("op", shallow_options, ids=lambda x: f"shallow={x}", indirect=True)
3 changes: 2 additions & 1 deletion tests/integration/codemod/conftest.py
@@ -156,8 +156,9 @@ def _codebase(repo: Repo, op: RepoOperator, request) -> YieldFixture[Codebase]:
     projects = [ProjectConfig(repo_operator=op, programming_language=repo.language, subdirectories=repo.subdirectories, base_path=repo.base_path)]
     Codebases[repo.name] = Codebase(projects=projects, config=CodebaseConfig(feature_flags=feature_flags))
     codebase = Codebases[repo.name]
-    codebase.reset()
+    codebase.reset(git_reset=True)
     yield codebase
+    codebase.reset(git_reset=True)


 @pytest.fixture
2 changes: 1 addition & 1 deletion tests/shared/codemod/codebase_comparison_utils.py
@@ -76,7 +76,7 @@ def compare_codebase_diff(
     diff = codebase.get_diff() + "\n"
     if not snapshot._snapshot_update:
         modified = gather_modified_files(codebase)
-        codebase.reset()
+        codebase.reset(git_reset=True)
         logger.info("Converting diff file to expected repository")
         if convert_diff_to_repo(expected_dir, expected_diff, codebase):
             return compare_codebase_with_snapshot(codebase, expected_dir, diff_path, snapshot, modified)