Add Comprehensive Codebase Analyzer #3

Merged 3 commits on Apr 29, 2025
161 changes: 83 additions & 78 deletions README.md
@@ -1,117 +1,122 @@
<br />
# Comprehensive Codebase Analyzer

<p align="center">
<a href="https://docs.codegen.com">
<img src="https://i.imgur.com/6RF9W0z.jpeg" />
</a>
</p>
A powerful static code analysis system that provides extensive information about your codebase using the Codegen SDK.

<h2 align="center">
Scriptable interface to a powerful, multi-lingual language server.
</h2>
## Features

<div align="center">
This analyzer provides comprehensive analysis of your codebase, including:

[![PyPI](https://img.shields.io/badge/PyPi-codegen-gray?style=flat-square&color=blue)](https://pypi.org/project/codegen/)
[![Documentation](https://img.shields.io/badge/Docs-docs.codegen.com-purple?style=flat-square)](https://docs.codegen.com)
[![Slack Community](https://img.shields.io/badge/Slack-Join-4A154B?logo=slack&style=flat-square)](https://community.codegen.com)
[![License](https://img.shields.io/badge/Code%20License-Apache%202.0-gray?&color=gray)](https://github.com/codegen-sh/codegen-sdk/tree/develop?tab=Apache-2.0-1-ov-file)
[![Follow on X](https://img.shields.io/twitter/follow/codegen?style=social)](https://x.com/codegen)
### 1. Codebase Structure Analysis

</div>
- File Statistics (count, language, size)
- Symbol Tree Analysis
- Import/Export Analysis
- Module Organization

<br />
### 2. Symbol-Level Analysis

[Codegen](https://docs.codegen.com) is a Python library for manipulating codebases.
- Function Analysis (parameters, return types, complexity)
- Class Analysis (methods, attributes, inheritance)
- Variable Analysis
- Type Analysis

```python
from codegen import Codebase
### 3. Dependency and Flow Analysis

# Codegen builds a complete graph connecting
# functions, classes, imports and their relationships
codebase = Codebase("./")
- Call Graph Generation
- Data Flow Analysis
- Control Flow Analysis
- Symbol Usage Analysis

# Work with code without dealing with syntax trees or parsing
for function in codebase.functions:
# Comprehensive static analysis for references, dependencies, etc.
if not function.usages:
# Auto-handles references and imports to maintain correctness
function.move_to_file("deprecated.py")
```
### 4. Code Quality Analysis

Write code that transforms code. Codegen combines the parsing power of [Tree-sitter](https://tree-sitter.github.io/tree-sitter/) with the graph algorithms of [rustworkx](https://github.com/Qiskit/rustworkx) to enable scriptable, multi-language code manipulation at scale.
- Unused Code Detection
- Code Duplication Analysis
- Complexity Metrics
- Style and Convention Analysis

## Installation and Usage
### 5. Visualization Capabilities

We support
- Dependency Graphs
- Call Graphs
- Symbol Trees
- Heat Maps

- Running Codegen in Python 3.12 - 3.13 (recommended: Python 3.13+)
- macOS and Linux
- macOS is supported
- Linux is supported on x86_64 and aarch64 with glibc 2.34+
- Windows is supported via WSL. See [here](https://docs.codegen.com/building-with-codegen/codegen-with-wsl) for more details.
- Python, TypeScript, JavaScript and React codebases
### 6. Language-Specific Analysis

```
# Install inside existing project
uv pip install codegen
- Python-Specific Analysis
- TypeScript-Specific Analysis

# Install global CLI
uv tool install codegen --python 3.13
### 7. Code Metrics

# Create a codemod for a given repo
cd path/to/repo
codegen init
codegen create test-function
- Monthly Commits
- Cyclomatic Complexity
- Halstead Volume
- Maintainability Index

# Run the codemod
codegen run test-function
## Installation

# Create an isolated venv with codegen => open jupyter
codegen notebook
```
1. Clone the repository:

## Usage
```bash
git clone https://github.com/yourusername/codebase-analyzer.git
cd codebase-analyzer
```

See [Getting Started](https://docs.codegen.com/introduction/getting-started) for a full tutorial.
2. Install dependencies:

```
from codegen import Codebase
```bash
pip install -r requirements.txt
```

## Troubleshooting
## Usage

Having issues? Here are some common problems and their solutions:
### Analyzing a Repository

- **I'm hitting a uv error related to `[[ packages ]]`**: This means you're likely using an outdated version of uv. Try updating to the latest version with `uv self update`.
- **I'm hitting an error about `No module named 'codegen.sdk.extensions.utils'`**: The compiled Cython extensions are out of sync. Update them with `uv sync --reinstall-package codegen`.
- **I'm hitting a `RecursionError: maximum recursion depth exceeded` error while parsing my codebase**: If you are using Python 3.12, try upgrading to 3.13. If you are already on 3.13, try raising the recursion limit with `sys.setrecursionlimit(10000)`.
```bash
# Analyze from URL
python codebase_analyzer.py --repo-url https://github.com/username/repo

If you run into additional issues not listed here, please [join our slack community](https://community.codegen.com) and we'll help you out!
# Analyze local repository
python codebase_analyzer.py --repo-path /path/to/repo

## Resources
# Specify language
python codebase_analyzer.py --repo-url https://github.com/username/repo --language python

- [Docs](https://docs.codegen.com)
- [Getting Started](https://docs.codegen.com/introduction/getting-started)
- [Contributing](CONTRIBUTING.md)
- [Contact Us](https://codegen.com/contact)
# Analyze specific categories
python codebase_analyzer.py --repo-url https://github.com/username/repo --categories codebase_structure code_quality
```

## Why Codegen?
### Output Formats

Software development is fundamentally programmatic. Refactoring a codebase, enforcing patterns, or analyzing control flow - these are all operations that can (and should) be expressed as programs themselves.
```bash
# Output as JSON
python codebase_analyzer.py --repo-url https://github.com/username/repo --output-format json --output-file analysis.json

We built Codegen backwards from real-world refactors performed on enterprise codebases. Instead of starting with theoretical abstractions, we focused on creating APIs that match how developers actually think about code changes:
# Generate HTML report
python codebase_analyzer.py --repo-url https://github.com/username/repo --output-format html --output-file report.html

- **Natural mental model**: Write transforms that read like your thought process - "move this function", "rename this variable", "add this parameter". No more wrestling with ASTs or manual import management.
# Print to console (default)
python codebase_analyzer.py --repo-url https://github.com/username/repo --output-format console
```
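
The README shows the flags but not how `codebase_analyzer.py` declares them. As a rough sketch only, the flags used in the commands above could be wired up with `argparse` as follows. The function name `build_parser`, the mutually exclusive repo group, and the "no `--categories` means all categories" default are assumptions for illustration, not taken from the script itself; the category names come from the "Available Analysis Categories" list.

```python
import argparse

# Category names as listed under "Available Analysis Categories".
CATEGORIES = [
    "codebase_structure",
    "symbol_level",
    "dependency_flow",
    "code_quality",
    "visualization",
    "language_specific",
    "code_metrics",
]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the README examples."""
    parser = argparse.ArgumentParser(prog="codebase_analyzer.py")

    # Assumption: exactly one source (URL or local path) must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--repo-url", help="URL of the repository to analyze")
    source.add_argument("--repo-path", help="Path to a local repository")

    parser.add_argument("--language", help="Language hint, e.g. python")

    # Assumption: omitting --categories (None) means "run every category".
    parser.add_argument(
        "--categories",
        nargs="+",
        choices=CATEGORIES,
        default=None,
        help="One or more analysis categories to run",
    )
    parser.add_argument(
        "--output-format",
        choices=["json", "html", "console"],
        default="console",  # the README marks console as the default
    )
    parser.add_argument("--output-file", help="Where to write json/html output")
    return parser
```

With this sketch, `build_parser().parse_args(["--repo-path", ".", "--output-format", "json"])` yields a namespace whose `repo_path`, `output_format`, and `output_file` attributes match the flags above; an unknown category name is rejected by `argparse` before any analysis runs.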

- **Battle-tested on complex codebases**: Handle Python, TypeScript, and React codebases with millions of lines of code.
## Available Analysis Categories

- **Built for advanced intelligences**: As AI developers become more sophisticated, they need expressive yet precise tools to manipulate code. Codegen provides a programmatic interface that both humans and AI can use to express complex transformations through code itself.
- `codebase_structure`: File statistics, symbol tree, import/export analysis, module organization
- `symbol_level`: Function, class, variable, and type analysis
- `dependency_flow`: Call graphs, data flow, control flow, symbol usage
- `code_quality`: Unused code, duplication, complexity, style
- `visualization`: Dependency graphs, call graphs, symbol trees, heat maps
- `language_specific`: Language-specific analysis features
- `code_metrics`: Commits, complexity, volume, maintainability
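
The README does not say how the complexity figure in `code_quality` and `code_metrics` is computed. For orientation, here is a minimal sketch of one common approximation of cyclomatic complexity for Python source, using only the standard-library `ast` module rather than the Codegen SDK. The function name `cyclomatic_complexity` and the exact set of counted nodes are assumptions; the analyzer may count differently.

```python
import ast

# Node types that each add one decision point in this approximation.
_BRANCH_NODES = (
    ast.If,
    ast.For,
    ast.AsyncFor,
    ast.While,
    ast.ExceptHandler,
    ast.IfExp,
)


def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity of a whole Python source string.

    Starts at 1 (a single straight-line path) and adds one per branch point.
    This counts over the entire module, not per function.
    """
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` has two short-circuit points.
            complexity += len(node.values) - 1
        elif isinstance(node, ast.comprehension):
            # One for the loop itself, plus one per `if` filter.
            complexity += 1 + len(node.ifs)
    return complexity
```

To get per-function numbers, the same counting can be applied to each `ast.FunctionDef` subtree instead of the whole module.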

## Contributing
## Requirements

Please see our [Contributing Guide](CONTRIBUTING.md) for instructions on how to set up the development environment and submit contributions.
- Python 3.8+
- Codegen SDK
- NetworkX
- Matplotlib
- Rich

## Enterprise
## License

For more information on enterprise engagements, please [contact us](https://codegen.com/contact) or [request a demo](https://codegen.com/request-demo).
MIT