Commit 54e83bf

Merge pull request #3 from Zeeeepa/codegen-bot/comprehensive-codebase-analyzer
Add Comprehensive Codebase Analyzer
2 parents b36c180 + 503da96 commit 54e83bf

4 files changed: +1957 -87 lines changed

README.md

Lines changed: 83 additions & 78 deletions
@@ -1,117 +1,122 @@
1-
<br />
1+
# Comprehensive Codebase Analyzer
22

3-
<p align="center">
4-
<a href="https://docs.codegen.com">
5-
<img src="https://i.imgur.com/6RF9W0z.jpeg" />
6-
</a>
7-
</p>
3+
A powerful static code analysis system that provides extensive information about your codebase using the Codegen SDK.
84

9-
<h2 align="center">
10-
Scriptable interface to a powerful, multi-lingual language server.
11-
</h2>
5+
## Features
126

13-
<div align="center">
7+
This analyzer provides comprehensive analysis of your codebase, including:
148

15-
[![PyPI](https://img.shields.io/badge/PyPi-codegen-gray?style=flat-square&color=blue)](https://pypi.org/project/codegen/)
16-
[![Documentation](https://img.shields.io/badge/Docs-docs.codegen.com-purple?style=flat-square)](https://docs.codegen.com)
17-
[![Slack Community](https://img.shields.io/badge/Slack-Join-4A154B?logo=slack&style=flat-square)](https://community.codegen.com)
18-
[![License](https://img.shields.io/badge/Code%20License-Apache%202.0-gray?&color=gray)](https://github.com/codegen-sh/codegen-sdk/tree/develop?tab=Apache-2.0-1-ov-file)
19-
[![Follow on X](https://img.shields.io/twitter/follow/codegen?style=social)](https://x.com/codegen)
9+
### 1. Codebase Structure Analysis
2010

21-
</div>
11+
- File Statistics (count, language, size)
12+
- Symbol Tree Analysis
13+
- Import/Export Analysis
14+
- Module Organization
2215

23-
<br />
16+
### 2. Symbol-Level Analysis
2417

25-
[Codegen](https://docs.codegen.com) is a python library for manipulating codebases.
18+
- Function Analysis (parameters, return types, complexity)
19+
- Class Analysis (methods, attributes, inheritance)
20+
- Variable Analysis
21+
- Type Analysis
2622

27-
```python
28-
from codegen import Codebase
23+
### 3. Dependency and Flow Analysis
2924

30-
# Codegen builds a complete graph connecting
31-
# functions, classes, imports and their relationships
32-
codebase = Codebase("./")
25+
- Call Graph Generation
26+
- Data Flow Analysis
27+
- Control Flow Analysis
28+
- Symbol Usage Analysis
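
To illustrate what call graph generation involves, here is a minimal sketch. It is not the analyzer's implementation: the README says the analyzer is built on the Codegen SDK, while this sketch swaps in Python's standard `ast` module so it runs on its own. The function name `build_call_graph` is hypothetical.

```python
import ast
from typing import Dict, Set


def build_call_graph(source: str) -> Dict[str, Set[str]]:
    """Map each function name to the plain names it calls.

    Assumption: only direct calls such as `foo()` are recorded.
    Attribute calls such as `obj.method()` are skipped, and calls made
    inside a nested function are also attributed to the enclosing one.
    """
    graph: Dict[str, Set[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            callees = graph.setdefault(node.name, set())
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    callees.add(inner.func.id)
    return graph


if __name__ == "__main__":
    example = "def load():\n    return parse()\n\ndef parse():\n    return 1\n"
    for caller, callees in build_call_graph(example).items():
        print(caller, "->", sorted(callees))
```

A function that never appears as a callee in this mapping is a candidate for unused code detection, although a real analysis would also need to account for imports, methods, and dynamic calls.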
3329

34-
# Work with code without dealing with syntax trees or parsing
35-
for function in codebase.functions:
36-
# Comprehensive static analysis for references, dependencies, etc.
37-
if not function.usages:
38-
# Auto-handles references and imports to maintain correctness
39-
function.move_to_file("deprecated.py")
40-
```
30+
### 4. Code Quality Analysis
4131

42-
Write code that transforms code. Codegen combines the parsing power of [Tree-sitter](https://tree-sitter.github.io/tree-sitter/) with the graph algorithms of [rustworkx](https://github.com/Qiskit/rustworkx) to enable scriptable, multi-language code manipulation at scale.
32+
- Unused Code Detection
33+
- Code Duplication Analysis
34+
- Complexity Metrics
35+
- Style and Convention Analysis
4336

44-
## Installation and Usage
37+
### 5. Visualization Capabilities
4538

46-
We support
39+
- Dependency Graphs
40+
- Call Graphs
41+
- Symbol Trees
42+
- Heat Maps
4743

48-
- Running Codegen in Python 3.12 - 3.13 (recommended: Python 3.13+)
49-
- macOS and Linux
50-
- macOS is supported
51-
- Linux is supported on x86_64 and aarch64 with glibc 2.34+
52-
- Windows is supported via WSL. See [here](https://docs.codegen.com/building-with-codegen/codegen-with-wsl) for more details.
53-
- Python, Typescript, Javascript and React codebases
44+
### 6. Language-Specific Analysis
5445

55-
```
56-
# Install inside existing project
57-
uv pip install codegen
46+
- Python-Specific Analysis
47+
- TypeScript-Specific Analysis
5848

59-
# Install global CLI
60-
uv tool install codegen --python 3.13
49+
### 7. Code Metrics
6150

62-
# Create a codemod for a given repo
63-
cd path/to/repo
64-
codegen init
65-
codegen create test-function
51+
- Monthly Commits
52+
- Cyclomatic Complexity
53+
- Halstead Volume
54+
- Maintainability Index
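
For reference, Halstead Volume and the Maintainability Index are commonly defined as in the sketch below. Whether the analyzer uses exactly these formulas is an assumption, since the README does not say. The function names are hypothetical, and the rescaling to a 0 to 100 range is one widely used variant.

```python
import math


def halstead_volume(distinct_operators: int, distinct_operands: int,
                    total_operators: int, total_operands: int) -> float:
    """Halstead volume V = N * log2(n), where N is the program length
    (total operators + operands) and n is the vocabulary (distinct ones)."""
    vocabulary = distinct_operators + distinct_operands
    length = total_operators + total_operands
    if vocabulary == 0:
        return 0.0
    return length * math.log2(vocabulary)


def maintainability_index(volume: float, cyclomatic_complexity: int,
                          lines_of_code: int) -> float:
    """Classic index 171 - 5.2 ln(V) - 0.23 G - 16.2 ln(LOC),
    rescaled to 0..100. Inputs below 1 are clamped to 1 so the
    logarithms stay defined (a choice made for this sketch)."""
    raw = (171
           - 5.2 * math.log(max(volume, 1.0))
           - 0.23 * cyclomatic_complexity
           - 16.2 * math.log(max(lines_of_code, 1)))
    return max(0.0, min(100.0, raw * 100 / 171))


if __name__ == "__main__":
    v = halstead_volume(2, 2, 4, 4)
    print(v, maintainability_index(v, cyclomatic_complexity=1, lines_of_code=10))
```

Higher values indicate code that is easier to maintain. Long, branch-heavy functions pull the index down through the `ln(LOC)` and complexity terms.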
6655

67-
# Run the codemod
68-
codegen run test-function
56+
## Installation
6957

70-
# Create an isolated venv with codegen => open jupyter
71-
codegen notebook
72-
```
58+
1. Clone the repository:
7359

74-
## Usage
60+
```bash
61+
git clone https://github.com/yourusername/codebase-analyzer.git
62+
cd codebase-analyzer
63+
```
7564

76-
See [Getting Started](https://docs.codegen.com/introduction/getting-started) for a full tutorial.
65+
2. Install dependencies:
7766

78-
```
79-
from codegen import Codebase
67+
```bash
68+
pip install -r requirements.txt
8069
```
8170

82-
## Troubleshooting
71+
## Usage
8372

84-
Having issues? Here are some common problems and their solutions:
73+
### Analyzing a Repository
8574

86-
- **I'm hitting an UV error related to `[[ packages ]]`**: This means you're likely using an outdated version of UV. Try updating to the latest version with: `uv self update`.
87-
- **I'm hitting an error about `No module named 'codegen.sdk.extensions.utils'`**: The compiled cython extensions are out of sync. Update them with `uv sync --reinstall-package codegen`.
88-
- **I'm hitting a `RecursionError: maximum recursion depth exceeded` error while parsing my codebase**: If you are using python 3.12, try upgrading to 3.13. If you are already on 3.13, try upping the recursion limit with `sys.setrecursionlimit(10000)`.
75+
```bash
76+
# Analyze from URL
77+
python codebase_analyzer.py --repo-url https://github.com/username/repo
8978

90-
If you run into additional issues not listed here, please [join our slack community](https://community.codegen.com) and we'll help you out!
79+
# Analyze local repository
80+
python codebase_analyzer.py --repo-path /path/to/repo
9181

92-
## Resources
82+
# Specify language
83+
python codebase_analyzer.py --repo-url https://github.com/username/repo --language python
9384

94-
- [Docs](https://docs.codegen.com)
95-
- [Getting Started](https://docs.codegen.com/introduction/getting-started)
96-
- [Contributing](CONTRIBUTING.md)
97-
- [Contact Us](https://codegen.com/contact)
85+
# Analyze specific categories
86+
python codebase_analyzer.py --repo-url https://github.com/username/repo --categories codebase_structure code_quality
87+
```
9888

99-
## Why Codegen?
89+
### Output Formats
10090

101-
Software development is fundamentally programmatic. Refactoring a codebase, enforcing patterns, or analyzing control flow - these are all operations that can (and should) be expressed as programs themselves.
91+
```bash
92+
# Output as JSON
93+
python codebase_analyzer.py --repo-url https://github.com/username/repo --output-format json --output-file analysis.json
10294

103-
We built Codegen backwards from real-world refactors performed on enterprise codebases. Instead of starting with theoretical abstractions, we focused on creating APIs that match how developers actually think about code changes:
95+
# Generate HTML report
96+
python codebase_analyzer.py --repo-url https://github.com/username/repo --output-format html --output-file report.html
10497

105-
- **Natural mental model**: Write transforms that read like your thought process - "move this function", "rename this variable", "add this parameter". No more wrestling with ASTs or manual import management.
98+
# Print to console (default)
99+
python codebase_analyzer.py --repo-url https://github.com/username/repo --output-format console
100+
```
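
The flags used in the commands above could be declared with `argparse` roughly as follows. This is a sketch, not the actual contents of `codebase_analyzer.py`: the mutually exclusive source group, the `choices` lists, and the defaults (other than `console` output, which the README states is the default) are assumptions. The category names are the ones this README lists.

```python
import argparse

# Category names as listed in this README.
CATEGORIES = [
    "codebase_structure", "symbol_level", "dependency_flow", "code_quality",
    "visualization", "language_specific", "code_metrics",
]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="codebase_analyzer.py")

    # Assumption: exactly one of --repo-url / --repo-path must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--repo-url", help="URL of the repository to analyze")
    source.add_argument("--repo-path", help="Path to a local repository")

    parser.add_argument("--language", help="Language of the codebase, e.g. python")
    # Assumption: omitting --categories runs every category.
    parser.add_argument("--categories", nargs="+", choices=CATEGORIES,
                        default=CATEGORIES, help="Categories to analyze")
    parser.add_argument("--output-format", choices=["json", "html", "console"],
                        default="console", help="Report format")
    parser.add_argument("--output-file", help="Where to write json/html output")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With `nargs="+"`, several categories can be passed after a single `--categories` flag, which matches the `--categories codebase_structure code_quality` form shown in the usage examples.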
106101

107-
- **Battle-tested on complex codebases**: Handle Python, TypeScript, and React codebases with millions of lines of code.
102+
## Available Analysis Categories
108103

109-
- **Built for advanced intelligences**: As AI developers become more sophisticated, they need expressive yet precise tools to manipulate code. Codegen provides a programmatic interface that both humans and AI can use to express complex transformations through code itself.
104+
- `codebase_structure`: File statistics, symbol tree, import/export analysis, module organization
105+
- `symbol_level`: Function, class, variable, and type analysis
106+
- `dependency_flow`: Call graphs, data flow, control flow, symbol usage
107+
- `code_quality`: Unused code, duplication, complexity, style
108+
- `visualization`: Dependency graphs, call graphs, symbol trees, heat maps
109+
- `language_specific`: Language-specific analysis features
110+
- `code_metrics`: Commits, complexity, volume, maintainability
110111

111-
## Contributing
112+
## Requirements
112113

113-
Please see our [Contributing Guide](CONTRIBUTING.md) for instructions on how to set up the development environment and submit contributions.
114+
- Python 3.8+
115+
- Codegen SDK
116+
- NetworkX
117+
- Matplotlib
118+
- Rich
114119

115-
## Enterprise
120+
## License
116121

117-
For more information on enterprise engagements, please [contact us](https://codegen.com/contact) or [request a demo](https://codegen.com/request-demo).
122+
MIT
