Commit b09df25

Codebase Analytics Dashboard Tutorial (#511)
1 parent c9aadf8 commit b09df25

1 file changed

+157
-0
lines changed

@@ -0,0 +1,157 @@
---
title: "Building a Codebase Analytics Dashboard with Codegen"
sidebarTitle: "Analytics Dashboard"
icon: "calculator"
iconType: "solid"
---

This tutorial explains how the Codebase Analytics Dashboard efficiently calculates codebase metrics using the `codegen` library. The metrics include indices of codebase maintainability and complexity.

View the full code and setup instructions in our [codebase-analytics repository](https://github.com/codegen-sh/codebase-analytics).

## Line Metrics

Line metrics are used to determine the size and maintainability of the codebase.

### Lines of Code
Lines of Code (LOC) refers to the total number of lines in the source code, including blank lines and comments. It is calculated with a simple count of all lines in each source file.

### Logical Lines of Code (LLOC)
LLOC is the number of lines of code that contain actual functional statements. It excludes comments, blank lines, and other lines that do not contribute to the functionality of the codebase.

### Source Lines of Code (SLOC)
SLOC refers to the number of lines containing actual code, excluding blank lines. Lines made up of programming language keywords and comment lines are included in the count.

### Comment Density
Comment density is calculated by dividing the number of lines that contain comments by the total lines of code in the codebase, expressed as a percentage. The formula is:

```python
"comment_density": (total_comments / total_loc * 100)
```

It measures the proportion of comments in the codebase and indicates how much of the code is documented. Accordingly, it can hint at how maintainable and easy to understand the codebase is.
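
The four line metrics above can be sketched on a plain source string. This is a minimal illustration, not the dashboard's implementation: the `line_metrics` helper is hypothetical, it treats only full-line `#` comments as comment lines, and it approximates LLOC as non-blank, non-comment lines.

```python
def line_metrics(source: str) -> dict:
    """Rough line metrics for Python-style source text (illustrative only)."""
    lines = source.splitlines()
    loc = len(lines)  # every line, including blank lines and comments
    non_blank = [ln.strip() for ln in lines if ln.strip()]
    sloc = len(non_blank)  # excludes blank lines only
    # Simplification: only lines that start with "#" count as comment lines.
    comments = sum(1 for ln in non_blank if ln.startswith("#"))
    lloc = sloc - comments  # stand-in for "lines with functional statements"
    density = comments / loc * 100 if loc else 0.0
    return {
        "loc": loc,
        "lloc": lloc,
        "sloc": sloc,
        "comments": comments,
        "comment_density": density,
    }


sample = """# add two numbers
def add(a, b):

    return a + b  # inline comments are not counted in this sketch
"""

metrics = line_metrics(sample)
print(metrics)
```

For the sample, the sketch reports 4 lines in total, 3 non-blank lines, 1 comment line, and a comment density of 25%.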

## Complexity Metrics

### Cyclomatic Complexity
Cyclomatic Complexity measures the number of linearly independent paths through the code, making it a valuable indicator of how difficult the code will be to test and maintain.

**Calculation Method**:
- Base complexity of 1
- +1 for each:
  - if statement
  - elif statement
  - for loop
  - while loop
- +1 for each boolean operator (and, or) in conditions
- +1 for each except block in try/except statements

The `calculate_cyclomatic_complexity()` function traverses the Codegen codebase object and applies the above rules to the statement objects within each function to calculate the overall cyclomatic complexity of the codebase.

```python
# Excerpt: only the if/elif and loop rules are shown here; the statement
# classes (IfBlockStatement, ForLoopStatement, WhileStatement) come from codegen.
def calculate_cyclomatic_complexity(function):
    def analyze_statement(statement):
        complexity = 0

        if isinstance(statement, IfBlockStatement):
            # +1 for the if itself, plus +1 for every elif branch
            complexity += 1
            if hasattr(statement, "elif_statements"):
                complexity += len(statement.elif_statements)

        elif isinstance(statement, (ForLoopStatement, WhileStatement)):
            # +1 for each loop
            complexity += 1

        return complexity
```
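
To see the counting rules applied end to end without a codegen codebase object, here is a minimal sketch that uses Python's standard `ast` module instead of codegen's statement types. The `cyclomatic_complexity` helper is hypothetical and is not the dashboard's implementation; it also counts boolean operators anywhere in the code, not only inside conditions.

```python
import ast


def cyclomatic_complexity(source: str) -> int:
    """Apply the counting rules above to Python source using the ast module."""
    complexity = 1  # base complexity
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.For, ast.While)):
            # An elif is parsed as a nested ast.If, so it is counted here too.
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" is one BoolOp with three values: two operators.
            complexity += len(node.values) - 1
        elif isinstance(node, ast.ExceptHandler):
            complexity += 1
    return complexity


sample = '''
def classify(x, y):
    if x > 0 and y > 0:
        return "both"
    elif x > 0:
        return "x"
    for i in range(3):
        try:
            pass
        except ValueError:
            pass
    return "none"
'''

print(cyclomatic_complexity(sample))
```

The sample scores 6: the base of 1, plus one each for the `if`, the `and`, the `elif`, the `for` loop, and the `except` block.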

### Halstead Volume
Halstead Volume is a software metric that measures the complexity of a codebase based on its operators and operands. It is calculated by multiplying the total number of operators and operands by the base-2 logarithm of the number of unique operators and operands.

**Halstead Volume**: `V = (N1 + N2) * log2(n1 + n2)`

Here `N1` and `N2` are the total counts of operators and operands, and `n1` and `n2` are the counts of unique operators and operands.

The calculation uses codegen's expression types, including BinaryExpression, UnaryExpression and ComparisonExpression, which makes it very efficient. Operators and operands are extracted from the codebase object, and the volume is computed in the `calculate_halstead_volume()` function.

```python
import math


def calculate_halstead_volume(operators, operands):
    # Unique operators and operands
    n1 = len(set(operators))
    n2 = len(set(operands))

    # Total occurrences of operators and operands
    N1 = len(operators)
    N2 = len(operands)

    N = N1 + N2  # program length
    n = n1 + n2  # vocabulary size

    if n > 0:
        volume = N * math.log2(n)
        return volume, N1, N2, n1, n2
    return 0, N1, N2, n1, n2
```
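
As a worked example of the formula, the sketch below computes the volume for the single statement `a = b + c * b`. The operator and operand lists are written out by hand here for illustration; in the dashboard they are extracted from codegen's expression objects. The `halstead_volume` helper is a hypothetical, condensed version of the calculation.

```python
import math


def halstead_volume(operators, operands):
    """V = (N1 + N2) * log2(n1 + n2), or 0 when there is nothing to count."""
    length = len(operators) + len(operands)  # N1 + N2
    vocabulary = len(set(operators)) + len(set(operands))  # n1 + n2
    return length * math.log2(vocabulary) if vocabulary > 0 else 0


# Hand-extracted tokens for: a = b + c * b
operators = ["=", "+", "*"]  # N1 = 3, n1 = 3
operands = ["a", "b", "c", "b"]  # N2 = 4, n2 = 3 ("b" appears twice)

volume = halstead_volume(operators, operands)
print(round(volume, 2))
```

With 7 tokens in total and a vocabulary of 6, the volume is `7 * log2(6)`, roughly 18.09.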

### Depth of Inheritance (DOI)
Depth of Inheritance measures the length of the inheritance chain for each class. It is calculated by counting the entries in the superclasses list of each class in the codebase. The calculation uses codegen's class information in the `calculate_doi()` function.

```python
def calculate_doi(cls):
    # Number of superclasses codegen reports for this class
    return len(cls.superclasses)
```
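
The idea can be tried on ordinary Python classes. The sketch below is a stand-in, not codegen's API: it uses Python's built-in `__mro__` in place of codegen's `superclasses` list, and it assumes that list contains every ancestor in the chain. With multiple inheritance this counts all ancestors rather than the longest single chain.

```python
def depth_of_inheritance(cls: type) -> int:
    """Count the ancestors of a class, ignoring the class itself and `object`."""
    # __mro__ lists the class itself first and `object` last.
    return max(0, len(cls.__mro__) - 2)


class Base:
    pass


class Derived(Base):
    pass


class Leaf(Derived):
    pass


print(depth_of_inheritance(Base), depth_of_inheritance(Derived), depth_of_inheritance(Leaf))
```

Here `Base` has depth 0, `Derived` has depth 1, and `Leaf` has depth 2.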

## Maintainability Index
Maintainability Index is a software metric that measures how maintainable a codebase is, where maintainability means how easy the code is to support and change. The index is calculated from SLOC (Source Lines of Code), Cyclomatic Complexity, and Halstead Volume.

**Maintainability Index**: `M = 171 - 5.2 * ln(HV) - 0.23 * CC - 16.2 * ln(SLOC)`

The result is then normalized to a scale of 0-100, where 100 is the maximum maintainability.

The implementation is handled through the `calculate_maintainability_index()` function. The codegen codebase object is used to efficiently extract the Cyclomatic Complexity and Halstead Volume for each function and class in the codebase, which are then used to calculate the maintainability index.

```python
import math


def calculate_maintainability_index(
    halstead_volume: float, cyclomatic_complexity: float, loc: int
) -> int:
    """Calculate the normalized maintainability index for a given function."""
    # Code with no lines is treated as fully maintainable
    if loc <= 0:
        return 100

    try:
        # max(1, ...) keeps the logarithm arguments at 1 or above
        raw_mi = (
            171
            - 5.2 * math.log(max(1, halstead_volume))
            - 0.23 * cyclomatic_complexity
            - 16.2 * math.log(max(1, loc))
        )
        # Rescale from 0-171 to 0-100 and clamp to that range
        normalized_mi = max(0, min(100, raw_mi * 100 / 171))
        return int(normalized_mi)
    except (ValueError, TypeError):
        return 0
```
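
A worked example makes the normalization step concrete. The sketch below evaluates the formula for made-up inputs (a Halstead Volume of 1000, a Cyclomatic Complexity of 10, and 100 source lines) and returns both the raw and the normalized value. The `maintainability_index_steps` helper is hypothetical and exists only to expose the intermediate result.

```python
import math


def maintainability_index_steps(halstead_volume: float, cc: float, sloc: int):
    """Return (raw, normalized) so both steps of the calculation are visible."""
    raw = (
        171
        - 5.2 * math.log(max(1, halstead_volume))
        - 0.23 * cc
        - 16.2 * math.log(max(1, sloc))
    )
    # Rescale from the 0-171 range to 0-100, clamp, and truncate to an int.
    normalized = int(max(0, min(100, raw * 100 / 171)))
    return raw, normalized


raw, normalized = maintainability_index_steps(1000, 10, 100)
print(round(raw, 2), normalized)
```

For these inputs the raw value is about 58.18, which normalizes to 34 on the 0-100 scale.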

## General Codebase Statistics
The number of files is determined by traversing codegen's FileNode objects in the parsed codebase. The number of functions is calculated by counting FunctionDef nodes across all parsed files. The number of classes is obtained by summing ClassDef nodes throughout the codebase.

```python
num_files = len(codebase.files(extensions="*"))
num_functions = len(codebase.functions)
num_classes = len(codebase.classes)
```

Commit activity is calculated from the git history of the repository: the commits are counted for each month over the last 12 months.
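
The monthly grouping can be sketched with the standard library. This is an illustration under stated assumptions, not the dashboard's code: the `commits_per_month` helper is hypothetical, the commit timestamps are made up (in practice they would come from the repository's git history), and the 12-month window is approximated as 365 days.

```python
from collections import Counter
from datetime import datetime, timedelta


def commits_per_month(commit_dates, now):
    """Count commits per "YYYY-MM" bucket within roughly the last 12 months."""
    cutoff = now - timedelta(days=365)  # approximation of a 12-month window
    counts = Counter(
        d.strftime("%Y-%m") for d in commit_dates if cutoff <= d <= now
    )
    return dict(sorted(counts.items()))


# Made-up commit timestamps for illustration
dates = [
    datetime(2025, 2, 10),
    datetime(2025, 2, 20),
    datetime(2024, 12, 5),
    datetime(2023, 1, 1),  # older than the window, so it is ignored
]

activity = commits_per_month(dates, now=datetime(2025, 3, 1))
print(activity)
```

The sample yields one commit for `2024-12` and two for `2025-02`.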

## Using the Analysis Tool (Modal Server)

The tool is implemented as a FastAPI application wrapped in a Modal deployment. To analyze a repository:

1. Send a POST request to `/analyze_repo` with the repository URL
2. The tool will:
   - Clone the repository
   - Parse the codebase using codegen
   - Calculate all metrics
   - Return a comprehensive JSON response with all metrics

This is the only endpoint in the FastAPI server, as it takes care of the entire analysis process.

To run the FastAPI server locally, install all dependencies and run the server with `modal serve modal_main.py`.
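
A request to the endpoint could be built as follows. This is a sketch only: the base URL, the `repo_url` field name, and the repository value are assumptions rather than details confirmed by this tutorial, so check the repository's setup instructions for the actual request schema. The `build_analyze_request` helper is hypothetical and only constructs the request without sending it.

```python
import json
import urllib.request


def build_analyze_request(base_url: str, repo_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /analyze_repo endpoint."""
    # Assumption: the server accepts a JSON body with a "repo_url" field.
    payload = json.dumps({"repo_url": repo_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/analyze_repo",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder base URL; use the address printed when the server starts.
req = build_analyze_request("http://localhost:8000", "codegen-sh/codebase-analytics")
print(req.get_method(), req.full_url)

# To send it once the server is running:
# with urllib.request.urlopen(req) as resp:
#     metrics = json.load(resp)
```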

## Frontend Dashboard

The frontend dashboard is implemented as a Next.js application. It is built with Shadcn/UI, and its components live in the `@/components` directory.
