CG-10508: Docs explain differences between SourceFile and File types #138

Merged
merged 9 commits into from
Jan 28, 2025
146 changes: 97 additions & 49 deletions docs/building-with-codegen/files-and-directories.mdx
@@ -5,12 +5,15 @@ icon: "folder-tree"
iconType: "solid"
---

Codegen provides two primary abstractions for working with your codebase's file structure:
Codegen provides three primary abstractions for working with your codebase's file structure:

- [File](/api-reference/core/File)
- [Directory](/api-reference/core/Directory)
- [File](/api-reference/core/File) - Represents a file in the codebase (e.g. README.md, package.json, etc.)
- [SourceFile](/api-reference/core/SourceFile) - Represents a source code file (e.g. Python, TypeScript, React, etc.)
- [Directory](/api-reference/core/Directory) - Represents a directory in the codebase

Both of these expose a rich API for accessing and manipulating their contents.
<Info>
[SourceFile](/api-reference/core/SourceFile) is a subclass of [File](/api-reference/core/File) that provides additional functionality for source code files.
</Info>

This guide explains how to effectively use these classes to manage your codebase.
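The subclass relationship can be illustrated with a minimal sketch. The `File` and `SourceFile` classes below are hypothetical dataclass stand-ins, not the codegen-sdk classes, and `functions` is simplified to a list of names. The only point is that every `SourceFile` is also a `File`, while a plain `File` lacks the code-specific data.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for illustration only; NOT the codegen-sdk classes.
@dataclass
class File:
    filepath: str
    content: str = ""


@dataclass
class SourceFile(File):
    # Extra, code-specific data that plain File objects do not carry.
    functions: list[str] = field(default_factory=list)


readme = File("README.md", "# My project")
module = SourceFile("app/main.py", "def main(): ...", functions=["main"])

# Every SourceFile is also a File, but not the other way around.
print(isinstance(module, File))        # True
print(isinstance(readme, SourceFile))  # False
print(hasattr(readme, "functions"))    # False
```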

@@ -31,8 +34,10 @@ for file in codebase.files:

# Check if a file exists
exists = codebase.has_file("path/to/file.py")

```


The [Directory](/api-reference/core/Directory) class offers analogous methods for accessing files and subdirectories.

```python
@@ -50,61 +55,58 @@ dir = file.directory
exists = codebase.has_directory("path/to/dir")
```

## Working with Non-Code Files (README, JSON, etc.)
## Differences between SourceFile and File

By default, Codegen focuses on source code files (Python, TypeScript, etc.). However, you can access every file in your codebase, including documentation, configuration, and other non-code files such as README.md, package.json, or .env:
- [File](/api-reference/core/File) - a general purpose class that represents any file in the codebase including non-code files like README.md, .env, .json, image files, etc.
- [SourceFile](/api-reference/core/SourceFile) - a subclass of [File](/api-reference/core/File) that provides additional functionality for source code files written in languages supported by the [codegen-sdk](/introduction/overview) (Python, TypeScript, JavaScript, React).

```python
# Get all files in the codebase (including README, docs, config files)
files = codebase.files(extensions="*")
Most use cases involve only [SourceFile](/api-reference/core/SourceFile) objects, since they contain code that the [codegen-sdk](/introduction/overview) can parse and manipulate. When you do need to work with non-code files, use the [File](/api-reference/core/File) class.

# Print files that are not source code (documentation, config, etc)
for file in files:
    if not file.filepath.endswith(('.py', '.ts', '.js')):
        print(f"📄 Non-code file: {file.filepath}")
```

You can also filter for specific file types:
By default, `codebase.files` returns only [SourceFile](/api-reference/core/SourceFile) objects. To include non-code files, pass `extensions="*"`.

```python
# Get only markdown documentation files
docs = codebase.files(extensions=[".md", ".mdx"])
# Get all source files in the codebase
source_files = codebase.files

# Get configuration files
config_files = codebase.files(extensions=[".json", ".yaml", ".toml"])
# Get all files in the codebase (including non-code files)
all_files = codebase.files(extensions="*")
```
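The default-versus-`"*"` behavior described above can be modeled in a few lines. This is a hypothetical sketch: `list_files`, `File`, and `SourceFile` here are illustrative stand-ins, not the codegen-sdk implementation, and only the two documented cases (no argument, and `extensions="*"`) are modeled.

```python
from dataclasses import dataclass


# Hypothetical stand-ins; NOT the codegen-sdk classes.
@dataclass
class File:
    filepath: str


class SourceFile(File):
    pass


def list_files(all_files, extensions=None):
    """Mimic the documented default: only SourceFiles unless extensions="*"."""
    if extensions == "*":
        return list(all_files)
    return [f for f in all_files if isinstance(f, SourceFile)]


repo = [SourceFile("app/main.py"), File("README.md"), File("package.json")]
print([f.filepath for f in list_files(repo)])                  # ['app/main.py']
print([f.filepath for f in list_files(repo, extensions="*")])  # ['app/main.py', 'README.md', 'package.json']
```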

The [Directory](/api-reference/core/Directory) class offers analogous methods for accessing files and subdirectories.

## Raw Content and Metadata
When you get a file with `codebase.get_file`, files ending in `.py`, `.js`, `.ts`, `.jsx`, or `.tsx` are returned as [SourceFile](/api-reference/core/SourceFile) objects, while all other files are returned as [File](/api-reference/core/File) objects.
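The extension-based rule can be sketched as a small factory. `make_file`, `File`, and `SourceFile` below are hypothetical stand-ins, and the sketch only encodes the extension list stated here; it ignores anything else the real `codebase.get_file` may consider, such as which language the codebase was parsed as.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


# Hypothetical stand-ins; NOT the codegen-sdk classes.
@dataclass
class File:
    filepath: str


class SourceFile(File):
    pass


# Extensions the docs list as producing SourceFile objects.
SOURCE_EXTENSIONS = {".py", ".js", ".ts", ".jsx", ".tsx"}


def make_file(filepath: str) -> File:
    """Pick SourceFile or File based only on the path's extension."""
    suffix = PurePosixPath(filepath).suffix
    cls = SourceFile if suffix in SOURCE_EXTENSIONS else File
    return cls(filepath)


for path in ["src/app.tsx", "docs/intro.mdx", "setup.py", "package.json"]:
    print(path, "->", type(make_file(path)).__name__)
```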

You can also use Python's built-in `isinstance` function to check whether a file is a [SourceFile](/api-reference/core/SourceFile):

```python
# Grab raw file string content
content = file.content # For text files
print('Length:', len(content))
print('# of functions:', len(file.functions))
py_file = codebase.get_file("path/to/file.py")
if isinstance(py_file, SourceFile):
    print(f"File {py_file.filepath} is a source file")

# Access file metadata
name = file.name # Base name without extension
extension = file.extension # File extension with dot
filepath = file.filepath # Full relative path
dir = file.directory # Parent directory
# prints: `File path/to/file.py is a source file`

# Access directory metadata
name = dir.name # Base name without extension
path = dir.path # Full relative path from repository root
parent = dir.parent # Parent directory
mdx_file = codebase.get_file("path/to/file.mdx")
if not isinstance(mdx_file, SourceFile):
    print(f"File {mdx_file.filepath} is a non-code file")

# prints: `File path/to/file.mdx is a non-code file`
```

<Note>
Currently, the codebase object can only parse source code files of one language at a time. This means that if you want to work with both Python and TypeScript files, you will need to create two separate codebase objects.
</Note>

## Accessing Code

Files and Directories provide several APIs for accessing and iterating over their code.
[SourceFiles](/api-reference/core/SourceFile) and [Directories](/api-reference/core/Directory) provide several APIs for accessing and iterating over their code.

See, for example:

- `.functions` ([File](/api-reference/core/File#functions) / [Directory](/api-reference/core/Directory#functions)) - All [Functions](../api-reference/core/Function) in the file/directory
- `.classes` ([File](/api-reference/core/File#classes) / [Directory](/api-reference/core/Directory#classes)) - All [Classes](../api-reference/core/Class) in the file/directory
- `.imports` ([File](/api-reference/core/File#imports) / [Directory](/api-reference/core/Directory#imports)) - All [Imports](../api-reference/core/Import) in the file/directory
- `.functions` ([SourceFile](/api-reference/core/SourceFile#functions) / [Directory](/api-reference/core/Directory#functions)) - All [Functions](/api-reference/core/Function) in the file/directory
- `.classes` ([SourceFile](/api-reference/core/SourceFile#classes) / [Directory](/api-reference/core/Directory#classes)) - All [Classes](/api-reference/core/Class) in the file/directory
- `.imports` ([SourceFile](/api-reference/core/SourceFile#imports) / [Directory](/api-reference/core/Directory#imports)) - All [Imports](/api-reference/core/Import) in the file/directory
- `.get_function(...)` ([SourceFile](/api-reference/core/SourceFile#get-function) / [Directory](/api-reference/core/Directory#get-function)) - Get a specific function by name
- `.get_class(...)` ([SourceFile](/api-reference/core/SourceFile#get-class) / [Directory](/api-reference/core/Directory#get-class)) - Get a specific class by name
- `.get_global_var(...)` ([SourceFile](/api-reference/core/SourceFile#get-global-var) / [Directory](/api-reference/core/Directory#get-global-var)) - Get a specific global variable by name


```python
@@ -142,22 +144,68 @@ if main_function:
print(f"Local var: {var.name} = {var.value}")
```
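To build intuition for what file-level `.functions`, `.classes`, and `.imports` enumerate, here is a rough analogue using only Python's standard `ast` module on a tiny Python source string. This is not how codegen parses files, and codegen's own definitions of these properties may differ (for example in how nested definitions are treated); the sketch just lists top-level definitions.

```python
import ast

source = '''
import os
from typing import List

class Greeter:
    def greet(self) -> str:
        return "hi"

def main() -> None:
    print(Greeter().greet())
'''

tree = ast.parse(source)

# Only look at top-level statements, roughly like a file-level listing.
functions = [n.name for n in tree.body if isinstance(n, ast.FunctionDef)]
classes = [n.name for n in tree.body if isinstance(n, ast.ClassDef)]
imports = [ast.unparse(n) for n in tree.body if isinstance(n, (ast.Import, ast.ImportFrom))]

print(functions)  # ['main']
print(classes)    # ['Greeter']
print(imports)    # ['import os', 'from typing import List']
```

Note that `greet` does not appear in `functions`, because it is a method nested inside `Greeter` rather than a top-level statement. `ast.unparse` requires Python 3.9 or newer.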

## Working with Non-Code Files (README, JSON, etc.)

By default, Codegen focuses on source code files (Python, TypeScript, etc.). However, you can access every file in your codebase, including documentation, configuration, and other non-code [files](/api-reference/core/File) such as README.md, package.json, or .env:

```python
# Get all files in the codebase (including README, docs, config files)
files = codebase.files(extensions="*")

# Print files that are not source code (documentation, config, etc)
for file in files:
    # Skip the extensions that are returned as SourceFile objects
    if not file.filepath.endswith(('.py', '.js', '.ts', '.jsx', '.tsx')):
        print(f"📄 Non-code file: {file.filepath}")
```

You can also filter for specific file types:

```python
# Get only markdown documentation files
docs = codebase.files(extensions=[".md", ".mdx"])

# Get configuration files
config_files = codebase.files(extensions=[".json", ".yaml", ".toml"])
```
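Conceptually, passing a list to `extensions` keeps files whose extension is in that list. The helper below is a hypothetical, stdlib-only sketch of that filtering on plain path strings; it is not the codegen-sdk implementation, and it matches extensions case-sensitively by choice.

```python
from pathlib import PurePosixPath


def filter_by_extension(paths, extensions):
    """Keep paths whose suffix (e.g. '.md') is in `extensions` (hypothetical helper)."""
    wanted = set(extensions)
    return [p for p in paths if PurePosixPath(p).suffix in wanted]


paths = ["README.md", "docs/guide.mdx", "package.json", "config/app.yaml", "src/main.py"]
print(filter_by_extension(paths, [".md", ".mdx"]))            # ['README.md', 'docs/guide.mdx']
print(filter_by_extension(paths, [".json", ".yaml", ".toml"]))  # ['package.json', 'config/app.yaml']
```

One caveat of suffix-based matching: `PurePosixPath(".env").suffix` is an empty string, so dotfiles such as `.env` need separate handling in this sketch. How codegen itself treats them is not stated here.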

The [Directory](/api-reference/core/Directory) class offers analogous methods for accessing files and subdirectories.

## Raw Content and Metadata

```python
# Grab raw file string content
content = file.content # For text files
print('Length:', len(content))
print('# of functions:', len(file.functions))  # Only available on SourceFile objects

# Access file metadata
name = file.name # Base name without extension
extension = file.extension # File extension with dot
filepath = file.filepath # Full relative path
dir = file.directory # Parent directory

# Access directory metadata
name = dir.name # Directory name
path = dir.path # Full relative path from repository root
parent = dir.parent # Parent directory
```
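The metadata comments above describe each attribute in path terms. As a rough analogy only, Python's standard `pathlib` computes the same pieces for a relative path; this is not how codegen implements `name`, `extension`, or `directory`, and the example path is made up.

```python
from pathlib import PurePosixPath

filepath = "src/utils/helpers.py"
path = PurePosixPath(filepath)

name = path.stem         # base name without extension
extension = path.suffix  # extension including the dot
directory = path.parent  # parent directory

print(name, extension, directory)        # helpers .py src/utils
print(directory.name, directory.parent)  # utils src
```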

## Editing Files Directly

Files themselves are [`Editable`](../api-reference/core/Editable.mdx) objects, just like Functions and Classes.
Files themselves are [`Editable`](/api-reference/core/Editable.mdx) objects, just like Functions and Classes.

<Tip>
Learn more about the [Editable API](/building-with-codegen/the-editable-api).
</Tip>

This means they expose many useful operations, including:

- [`File.search`](../api-reference/core/File#search) - Search the file's contents (e.g. for every occurrence of "main")
- [`File.edit`](../api-reference/core/Editable#edit) - Edit the file
- [`File.replace`](../api-reference/core/File#replace) - Replace all instances of a string with another string
- [`File.insert_before`](../api-reference/core/File#insert-before) - Insert text before a specific string
- [`File.insert_after`](../api-reference/core/File#insert-after) - Insert text after a specific string
- [`File.remove`](../api-reference/core/File#remove) - Remove a specific string
- [`File.search`](/api-reference/core/File#search) - Search the file's contents (e.g. for every occurrence of "main")
- [`File.edit`](/api-reference/core/File#edit) - Edit the file
- [`File.replace`](/api-reference/core/File#replace) - Replace all instances of a string with another string
- [`File.insert_before`](/api-reference/core/File#insert-before) - Insert text before a specific string
- [`File.insert_after`](/api-reference/core/File#insert-after) - Insert text after a specific string
- [`File.remove`](/api-reference/core/File#remove) - Remove a specific string

```python
# Get a file
@@ -183,7 +231,7 @@ file.insert_after("def end():\npass")
file.remove()
```

You can frequently do bulk modifications via the [`.edit(...)`](../api-reference/core/Editable#edit) or [`.replace(...)`](../api-reference/core/File#replace) methods.
You can frequently do bulk modifications via the [`.edit(...)`](/api-reference/core/Editable#edit) or [`.replace(...)`](/api-reference/core/File#replace) methods.
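To show the difference between a whole-file edit and a replace-all, here is a hypothetical in-memory `TextFile` class. It is not codegen's `File` class, and having `replace` return the number of replacements is a choice made for this sketch only.

```python
class TextFile:
    """Hypothetical in-memory file; NOT codegen's File class."""

    def __init__(self, filepath: str, content: str) -> None:
        self.filepath = filepath
        self.content = content

    def edit(self, new_content: str) -> None:
        # Overwrite the whole file content in one operation.
        self.content = new_content

    def replace(self, old: str, new: str) -> int:
        # Replace every occurrence and report how many were changed.
        count = self.content.count(old)
        self.content = self.content.replace(old, new)
        return count


f = TextFile("app.py", "old_name()\nold_name()\n")
changed = f.replace("old_name", "new_name")
f.edit("# header\n" + f.content)

print(changed)    # 2
print(f.content)
```

Because `str.replace` is purely textual, replacing `old_name` would also rewrite `old_name_v2` or a mention inside a comment. That bluntness is one reason to prefer purpose-built APIs when they exist.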

<Note>
Most useful operations will have bespoke APIs that handle edge cases, update
@@ -192,7 +240,7 @@ You can frequently do bulk modifictions via the [`.edit(...)`](../api-reference/

## Moving and Renaming Files

Files can be manipulated through methods like [`File.update_filepath()`](../api-reference/core/File#update-filepath), [`File.rename()`](../api-reference/core/File#rename), and [`File.remove()`](../api-reference/core/File#remove):
Files can be manipulated through methods like [`File.update_filepath()`](/api-reference/core/File#update-filepath), [`File.rename()`](/api-reference/core/File#rename), and [`File.remove()`](/api-reference/core/File#remove):

```python
# Move/rename a file
@@ -216,7 +264,7 @@ for file in codebase.files:

## Directories

[`Directories`](/api-reference/core/Directory) expose a similar API to the [File](../api-reference/core/File.mdx) class, with the addition of the `subdirectories` property.
[`Directories`](/api-reference/core/Directory) expose a similar API to the [File](/api-reference/core/File.mdx) class, with the addition of the `subdirectories` property.
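The `subdirectories` property makes it possible to walk a directory tree recursively. The sketch below uses a hypothetical `Directory` dataclass and an `all_files` helper, not the codegen-sdk classes; whether the real `Directory.files` already includes files from nested directories is not stated here, so the sketch only shows how subdirectories compose.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Directory:
    """Hypothetical stand-in with files and subdirectories."""
    path: str
    files: list[str] = field(default_factory=list)
    subdirectories: list[Directory] = field(default_factory=list)


def all_files(directory: Directory) -> list[str]:
    """Collect file paths from this directory and every nested subdirectory."""
    collected = list(directory.files)
    for sub in directory.subdirectories:
        collected.extend(all_files(sub))
    return collected


src = Directory(
    "src",
    files=["src/main.py"],
    subdirectories=[Directory("src/utils", files=["src/utils/helpers.py"])],
)
print(all_files(src))  # ['src/main.py', 'src/utils/helpers.py']
```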

```python
# Get a directory
2 changes: 1 addition & 1 deletion docs/building-with-codegen/the-editable-api.mdx
@@ -17,7 +17,7 @@ Every Editable provides:
- [source](../api-reference/core/Editable#source) - the text content of the Editable
- [extended_source](../api-reference/core/Editable#extended_source) - includes relevant content like decorators, comments, etc.
- Information about the file that contains the Editable:
- [file](../api-reference/core/Editable#file) - the [File](../api-reference/core/File) that contains this Editable
- [file](../api-reference/core/Editable#file) - the [SourceFile](../api-reference/core/SourceFile) that contains this Editable
- Relationship tracking
- [parent_class](../api-reference/core/Editable#parent-class) - the [Class](../api-reference/core/Class) that contains this Editable
- [parent_function](../api-reference/core/Editable#parent-function) - the [Function](../api-reference/core/Function) that contains this Editable
2 changes: 1 addition & 1 deletion docs/tutorials/flask-to-fastapi.mdx
@@ -190,7 +190,7 @@ python run.py

The script will:

1. Process all Python files in your codebase
1. Process all Python [files](/api-reference/python/PyFile) in your codebase
2. Apply the transformations in the correct order
3. Maintain your code's functionality while updating to FastAPI patterns

2 changes: 1 addition & 1 deletion docs/tutorials/python2-to-python3.mdx
@@ -227,7 +227,7 @@ python run.py

The script will:

1. Process all Python files in your codebase
1. Process all Python [files](/api-reference/python/PyFile) in your codebase
2. Apply the transformations in the correct order
3. Maintain your code's functionality while updating to Python 3 syntax

2 changes: 1 addition & 1 deletion docs/tutorials/training-data.mdx
@@ -171,7 +171,7 @@ This will:

<Tip>
You can use any Git repository as your source codebase by passing the repo URL
to [Codebase.from_repo(...)](/api-reference/core/codebase#from-repo).
to [Codebase.from_repo(...)](/api-reference/core/Codebase#from-repo).
</Tip>

## Using the Training Data