@@ -1,117 +1,122 @@
-<br />
+# Comprehensive Codebase Analyzer
 
-<p align="center">
-  <a href="https://docs.codegen.com">
-    <img src="https://i.imgur.com/6RF9W0z.jpeg" />
-  </a>
-</p>
+A powerful static code analysis system that provides extensive information about your codebase using the Codegen SDK.
 
-<h2 align="center">
-  Scriptable interface to a powerful, multi-lingual language server.
-</h2>
+## Features
 
-<div align="center">
+This analyzer provides comprehensive analysis of your codebase, including:
 
-[](https://pypi.org/project/codegen/)
-[](https://docs.codegen.com)
-[](https://community.codegen.com)
-[](https://github.com/codegen-sh/codegen-sdk/tree/develop?tab=Apache-2.0-1-ov-file)
-[](https://x.com/codegen)
+### 1. Codebase Structure Analysis
 
-</div>
+- File Statistics (count, language, size)
+- Symbol Tree Analysis
+- Import/Export Analysis
+- Module Organization
 
-<br />
+### 2. Symbol-Level Analysis
 
-[Codegen](https://docs.codegen.com) is a python library for manipulating codebases.
+- Function Analysis (parameters, return types, complexity)
+- Class Analysis (methods, attributes, inheritance)
+- Variable Analysis
+- Type Analysis
 
-```python
-from codegen import Codebase
+### 3. Dependency and Flow Analysis
 
-# Codegen builds a complete graph connecting
-# functions, classes, imports and their relationships
-codebase = Codebase("./")
+- Call Graph Generation
+- Data Flow Analysis
+- Control Flow Analysis
+- Symbol Usage Analysis
 
-# Work with code without dealing with syntax trees or parsing
-for function in codebase.functions:
-    # Comprehensive static analysis for references, dependencies, etc.
-    if not function.usages:
-        # Auto-handles references and imports to maintain correctness
-        function.move_to_file("deprecated.py")
-```
+### 4. Code Quality Analysis
 
-Write code that transforms code. Codegen combines the parsing power of [Tree-sitter](https://tree-sitter.github.io/tree-sitter/) with the graph algorithms of [rustworkx](https://github.com/Qiskit/rustworkx) to enable scriptable, multi-language code manipulation at scale.
+- Unused Code Detection
+- Code Duplication Analysis
+- Complexity Metrics
+- Style and Convention Analysis
 
-## Installation and Usage
+### 5. Visualization Capabilities
 
-We support
+- Dependency Graphs
+- Call Graphs
+- Symbol Trees
+- Heat Maps
 
-- Running Codegen in Python 3.12 - 3.13 (recommended: Python 3.13+)
-- macOS and Linux
-  - macOS is supported
-  - Linux is supported on x86_64 and aarch64 with glibc 2.34+
-  - Windows is supported via WSL. See [here](https://docs.codegen.com/building-with-codegen/codegen-with-wsl) for more details.
-- Python, Typescript, Javascript and React codebases
+### 6. Language-Specific Analysis
 
-```
-# Install inside existing project
-uv pip install codegen
+- Python-Specific Analysis
+- TypeScript-Specific Analysis
 
-# Install global CLI
-uv tool install codegen --python 3.13
+### 7. Code Metrics
 
-# Create a codemod for a given repo
-cd path/to/repo
-codegen init
-codegen create test-function
+- Monthly Commits
+- Cyclomatic Complexity
+- Halstead Volume
+- Maintainability Index
 
-# Run the codemod
-codegen run test-function
+## Installation
 
-# Create an isolated venv with codegen => open jupyter
-codegen notebook
-```
+1. Clone the repository:
 
-## Usage
+```bash
+git clone https://github.com/yourusername/codebase-analyzer.git
+cd codebase-analyzer
+```
 
-See [Getting Started](https://docs.codegen.com/introduction/getting-started) for a full tutorial.
+2. Install dependencies:
 
-```
-from codegen import Codebase
+```bash
+pip install -r requirements.txt
 ```
 
-## Troubleshooting
+## Usage
 
-Having issues? Here are some common problems and their solutions:
+### Analyzing a Repository
 
-- **I'm hitting an UV error related to `[[ packages ]]`**: This means you're likely using an outdated version of UV. Try updating to the latest version with: `uv self update`.
-- **I'm hitting an error about `No module named 'codegen.sdk.extensions.utils'`**: The compiled cython extensions are out of sync. Update them with `uv sync --reinstall-package codegen`.
-- **I'm hitting a `RecursionError: maximum recursion depth exceeded` error while parsing my codebase**: If you are using python 3.12, try upgrading to 3.13. If you are already on 3.13, try upping the recursion limit with `sys.setrecursionlimit(10000)`.
+```bash
+# Analyze from URL
+python codebase_analyzer.py --repo-url https://github.com/username/repo
 
-If you run into additional issues not listed here, please [join our slack community](https://community.codegen.com) and we'll help you out!
+# Analyze local repository
+python codebase_analyzer.py --repo-path /path/to/repo
 
-## Resources
+# Specify language
+python codebase_analyzer.py --repo-url https://github.com/username/repo --language python
 
-- [Docs](https://docs.codegen.com)
-- [Getting Started](https://docs.codegen.com/introduction/getting-started)
-- [Contributing](CONTRIBUTING.md)
-- [Contact Us](https://codegen.com/contact)
+# Analyze specific categories
+python codebase_analyzer.py --repo-url https://github.com/username/repo --categories codebase_structure code_quality
+```
 
-## Why Codegen?
+### Output Formats
 
-Software development is fundamentally programmatic. Refactoring a codebase, enforcing patterns, or analyzing control flow - these are all operations that can (and should) be expressed as programs themselves.
+```bash
+# Output as JSON
+python codebase_analyzer.py --repo-url https://github.com/username/repo --output-format json --output-file analysis.json
 
-We built Codegen backwards from real-world refactors performed on enterprise codebases. Instead of starting with theoretical abstractions, we focused on creating APIs that match how developers actually think about code changes:
+# Generate HTML report
+python codebase_analyzer.py --repo-url https://github.com/username/repo --output-format html --output-file report.html
 
-- **Natural mental model**: Write transforms that read like your thought process - "move this function", "rename this variable", "add this parameter". No more wrestling with ASTs or manual import management.
+# Print to console (default)
+python codebase_analyzer.py --repo-url https://github.com/username/repo --output-format console
+```
 
-- **Battle-tested on complex codebases**: Handle Python, TypeScript, and React codebases with millions of lines of code.
+## Available Analysis Categories
 
-- **Built for advanced intelligences**: As AI developers become more sophisticated, they need expressive yet precise tools to manipulate code. Codegen provides a programmatic interface that both humans and AI can use to express complex transformations through code itself.
+- `codebase_structure`: File statistics, symbol tree, import/export analysis, module organization
+- `symbol_level`: Function, class, variable, and type analysis
+- `dependency_flow`: Call graphs, data flow, control flow, symbol usage
+- `code_quality`: Unused code, duplication, complexity, style
+- `visualization`: Dependency graphs, call graphs, symbol trees, heat maps
+- `language_specific`: Language-specific analysis features
+- `code_metrics`: Commits, complexity, volume, maintainability
 
-## Contributing
+## Requirements
 
-Please see our [Contributing Guide](CONTRIBUTING.md) for instructions on how to set up the development environment and submit contributions.
+- Python 3.8+
+- Codegen SDK
+- NetworkX
+- Matplotlib
+- Rich
 
-## Enterprise
+## License
 
-For more information on enterprise engagements, please [contact us](https://codegen.com/contact) or [request a demo](https://codegen.com/request-demo).
+MIT
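
The added Usage section documents command-line flags for `codebase_analyzer.py`, but the script itself is not part of this diff. As a rough sketch of a parser that would accept the documented invocations, assuming the script uses `argparse`: the function name `build_parser`, the mutual exclusion of `--repo-url` and `--repo-path`, and the default for `--categories` are all assumptions made for illustration, not details taken from the PR.

```python
import argparse

# Category names copied from the "Available Analysis Categories" list in the diff.
CATEGORIES = [
    "codebase_structure",
    "symbol_level",
    "dependency_flow",
    "code_quality",
    "visualization",
    "language_specific",
    "code_metrics",
]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the README's Usage section."""
    parser = argparse.ArgumentParser(prog="codebase_analyzer.py")

    # Assumption: exactly one source of code must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--repo-url", help="URL of the repository to analyze")
    source.add_argument("--repo-path", help="Path to a local repository")

    parser.add_argument("--language", help="Language of the codebase, e.g. python")
    parser.add_argument(
        "--categories",
        nargs="+",
        choices=CATEGORIES,
        default=CATEGORIES,  # assumption: run every category when the flag is omitted
        help="Analysis categories to run",
    )
    parser.add_argument(
        "--output-format",
        choices=["json", "html", "console"],
        default="console",  # the README marks console as the default
    )
    parser.add_argument("--output-file", help="File to write the report to")
    return parser


args = build_parser().parse_args(
    ["--repo-path", "/path/to/repo", "--categories", "codebase_structure", "code_quality"]
)
print(args.categories, args.output_format)
```

Restricting `--categories` and `--output-format` with `choices` means a misspelled category fails at parse time with a usage message instead of silently producing an empty report.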
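
The new README lists Cyclomatic Complexity among its code metrics without saying how it is computed. One common approximation counts the decision points in a syntax tree and adds one. The sketch below does this with Python's standard `ast` module rather than the Codegen SDK. The function name and the set of node types counted are illustrative assumptions, and the analyzer in this PR may count differently.

```python
import ast

# Node types treated as one decision point each (an illustrative choice).
_BRANCH_NODES = (
    ast.If,
    ast.For,
    ast.While,
    ast.ExceptHandler,
    ast.IfExp,
    ast.comprehension,
)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds one extra path per operator.
            complexity += len(node.values) - 1
    return complexity


example = """
def classify(n):
    if n < 0 and n % 2:
        return "negative odd"
    for i in range(n):
        if i == 3:
            return "has three"
    return "other"
"""
print(cyclomatic_complexity(example))  # prints 5: base 1 + two ifs + one for + one `and`
```

This scores a whole source string. A per-function breakdown would apply the same walk to each `ast.FunctionDef` node separately.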