
Neo4j Extension and Tutorial #447


Merged: 2 commits, Feb 12, 2025
Binary file added docs/images/neo4j-call-graph.png
Binary file added docs/images/neo4j-class-hierarchy.png
Binary file added docs/images/neo4j-class-methods.png
Binary file added docs/images/neo4j-function-calls.png
3 changes: 2 additions & 1 deletion docs/mint.json
@@ -101,7 +101,8 @@
"tutorials/fixing-import-loops-in-pytorch",
"tutorials/python2-to-python3",
"tutorials/flask-to-fastapi",
-"tutorials/build-mcp"
+"tutorials/build-mcp",
+"tutorials/neo4j-graph"
]
},
{
93 changes: 93 additions & 0 deletions docs/tutorials/neo4j-graph.mdx
@@ -0,0 +1,93 @@
---
title: "Neo4j Graph"
sidebarTitle: "Neo4j Graph"
icon: "database"
iconType: "solid"
---

<Frame caption="Function call graph for a codebase">
<img src="/images/neo4j-call-graph.png" />
</Frame>

# Neo4j Graph

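Codegen can export a graph of a codebase's classes, methods, and functions, together with the inheritance (`INHERITS_FROM`), definition (`DEFINES`), and call (`CALLS`) relationships between them, to Neo4j for visualization and analysis.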
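
As a rough sketch of what gets exported, the schema can be written as a lookup table of which node labels each relationship type connects. This assumes the labels `Class`, `Method`, and `Func` used in the queries below; the real strings come from `NodeLabel` and `RelationLabel` in `codegen.extensions.graph.utils`, which this PR does not show, and `is_valid_edge` is a hypothetical helper. Note that `create_codebase_graph` also emits a `CALLS` edge to a `Class` node when code instantiates a class.

```python
# Hypothetical summary of the exported schema, inferred from create_graph.py and
# the Cypher queries in this tutorial (not an API provided by Codegen).
SCHEMA: dict[str, tuple[set[str], set[str]]] = {
    "DEFINES": ({"Class"}, {"Method"}),
    "INHERITS_FROM": ({"Class"}, {"Class"}),
    # Instantiating a class is recorded as a CALLS edge to the Class node.
    "CALLS": ({"Method", "Func"}, {"Method", "Func", "Class"}),
}


def is_valid_edge(rel_type: str, source_label: str, target_label: str) -> bool:
    """Return True if a relationship of rel_type may connect the two labels."""
    if rel_type not in SCHEMA:
        return False
    sources, targets = SCHEMA[rel_type]
    return source_label in sources and target_label in targets
```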

## Installation
To use this extension, you need a running Neo4j instance that Codegen can reach over Bolt. The steps below run one locally with Docker.

### Neo4j
First, install Neo4j using the official [installation guide](https://neo4j.com/docs/desktop-manual/current/installation/download-installation/).

### Docker
To run Neo4j in Docker with the APOC plugin, follow the [APOC Docker installation instructions](https://neo4j.com/docs/apoc/current/installation/#docker).

## Launch Neo4j Locally

```bash
docker run \
  -p 7474:7474 -p 7687:7687 \
  -v $PWD/data:/data -v $PWD/plugins:/plugins \
  --name neo4j-apoc \
  -e NEO4J_AUTH=neo4j/password \
  -e NEO4J_apoc_export_file_enabled=true \
  -e NEO4J_apoc_import_file_enabled=true \
  -e NEO4J_apoc_import_file_use__neo4j__config=true \
  -e NEO4J_PLUGINS=\[\"apoc\"\] \
  neo4j:latest
```
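
The container can take several seconds to start. Before running the export, a script can poll the published Bolt port. The following is a stdlib-only sketch; `wait_for_port` is a hypothetical helper, and it only checks that something accepts TCP connections on the port, not that Neo4j has finished initializing.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful TCP connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: the server is not reachable yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 7687)` returns `True` once the container's Bolt port is reachable.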
## Usage
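
The export clears the target database first (`MATCH (n) DETACH DELETE n`), so point it at a dedicated Neo4j instance rather than one holding other data.
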
```python
from codegen import Codebase
from codegen.extensions.graph.main import visualize_codebase

# parse codebase
codebase = Codebase("path/to/codebase")

# export to Neo4j
visualize_codebase(codebase, "bolt://localhost:7687", "neo4j", "password")
```
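
The example above hard-codes the credentials set in the Docker command. A small helper that reads them from the environment keeps the password out of source files. This is a sketch: the variable names `NEO4J_URI`, `NEO4J_USERNAME`, and `NEO4J_PASSWORD` are assumptions, not settings that Codegen reads itself, and `neo4j_settings` is a hypothetical helper.

```python
import os
from collections.abc import Mapping


def neo4j_settings(env: Mapping[str, str] = os.environ) -> tuple[str, str, str]:
    """Return (uri, username, password), falling back to this tutorial's defaults."""
    # The environment variable names are chosen for this sketch only.
    uri = env.get("NEO4J_URI", "bolt://localhost:7687")
    username = env.get("NEO4J_USERNAME", "neo4j")
    password = env.get("NEO4J_PASSWORD", "password")
    return uri, username, password
```

It could then be used as `visualize_codebase(codebase, *neo4j_settings())`.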

## Visualization

Once exported, you can open the Neo4j browser at `http://localhost:7474`, sign in with the username `neo4j` and the password `password`, and use the following Cypher queries to visualize the codebase:

### Class Hierarchy

```cypher
MATCH (s:Class)-[r:INHERITS_FROM*]->(e:Class) RETURN s, e LIMIT 10
```
<Frame caption="Class hierarchy for a codebase">
<img src="/images/neo4j-class-hierarchy.png" />
</Frame>
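
The `*` in `-[r:INHERITS_FROM*]->` follows one or more inheritance edges, so each class is paired with every transitive ancestor, not only its direct parent. A breadth-first walk over a hypothetical child-to-parents dict computes the same ancestor set; `ancestors` is an illustrative helper, not part of Codegen.

```python
from collections import deque


def ancestors(inherits_from: dict[str, list[str]], cls: str) -> set[str]:
    """Return every class reachable from cls via one or more INHERITS_FROM edges."""
    seen: set[str] = set()
    queue = deque(inherits_from.get(cls, []))
    while queue:
        parent = queue.popleft()
        if parent in seen:
            continue  # already visited through another inheritance route
        seen.add(parent)
        queue.extend(inherits_from.get(parent, []))
    return seen
```

One difference: Cypher returns one row per matching path, so an ancestor reachable through two routes (a diamond) appears twice unless the query uses `RETURN DISTINCT s, e`.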

### Methods Defined by Each Class

```cypher
MATCH (s:Class)-[r:DEFINES]->(e:Method) RETURN s, e LIMIT 10
```
<Frame caption="Methods defined by each class">
<img src="/images/neo4j-class-methods.png" />
</Frame>

### Function Calls

```cypher
MATCH (s:Func)-[r:CALLS]->(e:Func) RETURN s, e LIMIT 10
```

<Frame caption="Function call graph for a codebase">
<img src="/images/neo4j-function-calls.png" />
</Frame>

### Call Graph

```cypher
MATCH path = (:(Method|Func))-[:CALLS*5..10]->(:(Method|Func))
RETURN path
LIMIT 20
```

<Frame caption="Call graph for a codebase">
<img src="/images/neo4j-call-graph.png" />
</Frame>
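
`-[:CALLS*5..10]->` matches every path of 5 to 10 consecutive `CALLS` relationships, and within a single path Neo4j never traverses the same relationship twice, so recursive call cycles still terminate. The pure-Python sketch below enumerates the same kind of paths over a hypothetical caller-to-callees dict to make those semantics concrete. It treats each caller/callee pair as a single edge, whereas the export can create one `CALLS` relationship per call site, and it ignores the `LIMIT 20` cutoff.

```python
def call_paths(calls: dict[str, list[str]], min_hops: int, max_hops: int) -> list[list[str]]:
    """Enumerate call chains of min_hops..max_hops edges (min_hops >= 1), never reusing an edge in one path."""
    results: list[list[str]] = []

    def walk(path: list[str], used: set[tuple[str, str]]) -> None:
        hops = len(path) - 1
        if hops >= min_hops:
            results.append(list(path))
        if hops == max_hops:
            return
        for callee in calls.get(path[-1], []):
            edge = (path[-1], callee)
            if edge in used:
                continue  # like Cypher, do not repeat a relationship within one path
            used.add(edge)
            path.append(callee)
            walk(path, used)
            path.pop()
            used.remove(edge)

    for start in calls:
        walk([start], set())
    return results
```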
1 change: 1 addition & 0 deletions pyproject.toml
@@ -69,6 +69,7 @@ dependencies = [
"langchain_openai",
"numpy>=2.2.2",
"mcp[cli]",
"neo4j",
]

license = { text = "Apache-2.0" }
Empty file.
137 changes: 137 additions & 0 deletions src/codegen/extensions/graph/create_graph.py
@@ -0,0 +1,137 @@
from typing import Optional

from codegen.extensions.graph.utils import Node, NodeLabel, Relation, RelationLabel, SimpleGraph
from codegen.sdk.code_generation.doc_utils.utils import safe_get_class
from codegen.sdk.core.class_definition import Class
from codegen.sdk.core.external_module import ExternalModule
from codegen.sdk.core.function import Function
from codegen.sdk.python.class_definition import PyClass


def create_codebase_graph(codebase):
    """Create a SimpleGraph representing the codebase structure."""
    # Initialize graph
    graph = SimpleGraph()

    # Track existing nodes by name to prevent duplicates
    node_registry = {}  # name -> node_id mapping

    def get_or_create_node(name: str, label: NodeLabel, parent_name: Optional[str] = None, properties: dict | None = None):
        """Get existing node or create new one if it doesn't exist."""
        full_name = f"{parent_name}.{name}" if parent_name and parent_name != "Class" else name
        if full_name in node_registry:
            return graph.nodes[node_registry[full_name]]

        node = Node(name=name, full_name=full_name, label=label.value, properties=properties or {})
        node_registry[full_name] = node.id
        graph.add_node(node)
        return node

    def create_class_node(class_def):
        """Create a node for a class definition."""
        return get_or_create_node(
            name=class_def.name,
            label=NodeLabel.CLASS,
            properties={
                "filepath": class_def.filepath if hasattr(class_def, "filepath") else "",
                "source": class_def.source if hasattr(class_def, "source") else "",
                "type": "class",
            },
        )

    def create_function_node(func):
        """Create a node for a function/method."""
        class_name = None
        if func.is_method:
            class_name = func.parent_class.name

        return get_or_create_node(
            name=func.name,
            label=NodeLabel.METHOD if class_name else NodeLabel.FUNCTION,
            parent_name=class_name,
            properties={
                "filepath": func.filepath if hasattr(func, "filepath") else "",
                "is_async": func.is_async if hasattr(func, "is_async") else False,
                "source": func.source if hasattr(func, "source") else "",
                "type": "method" if class_name else "function",
            },
        )

    def create_function_call_node(func_call):
        """Create a node for a function call."""
        func_def = func_call.function_definition
        if not func_def:
            return None
        if isinstance(func_def, ExternalModule):
            parent_class = safe_get_class(codebase, func_def.name)
            if parent_class and parent_class.get_method(func_call.name):
                return create_function_node(parent_class.get_method(func_call.name))
            else:
                return None

        call_node = None
        if isinstance(func_def, Function):
            call_node = create_function_node(func_def)

        elif isinstance(func_def, Class):
            call_node = create_class_node(func_def)

        return call_node

    # Process all classes
    for class_def in codebase.classes:
        class_node = create_class_node(class_def)

        # Process methods
        methods = class_def.methods
        for method in methods:
            method_node = create_function_node(method)

            # Add DEFINES relation
            defines_relation = Relation(
                label=RelationLabel.DEFINES.value, source_id=class_node.id, target_id=method_node.id, properties={"relationship_description": "The parent class defines the method."}
            )
            graph.add_relation(defines_relation)

            for call in method.function_calls:
                call_node = create_function_call_node(call)
                if call_node and call_node != method_node:
                    call_relation = Relation(
                        label=RelationLabel.CALLS.value, source_id=method_node.id, target_id=call_node.id, properties={"relationship_description": f"The method calls the {call_node.label}."}
                    )
                    graph.add_relation(call_relation)

        # Add inheritance relations
        if class_def.parent_classes:
            for parent in class_def.parent_classes:
                if not isinstance(parent, PyClass):
                    try:
                        parent = codebase.get_class(parent.name, optional=True)
                        if not parent:
                            continue
                    except Exception as e:
                        print(f"parent not found: {e}")
                        continue
                if not hasattr(parent, "name"):
                    continue
                parent_node = create_class_node(parent)

                inherits_relation = Relation(
                    label=RelationLabel.INHERITS_FROM.value,
                    source_id=class_node.id,
                    target_id=parent_node.id,
                    properties={"relationship_description": "The child class inherits from the parent class."},
                )
                graph.add_relation(inherits_relation)

    for func in codebase.functions:
        func_node = create_function_node(func)
        for call in func.function_calls:
            call_node = create_function_call_node(call)
            if call_node and call_node != func_node:
                call_relation = Relation(
                    label=RelationLabel.CALLS.value, source_id=func_node.id, target_id=call_node.id, properties={"relationship_description": f"The function calls the {call_node.label}."}
                )
                graph.add_relation(call_relation)

    return graph
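
`get_or_create_node` keys its registry by a qualified name: `Parent.name` for methods, and the bare name for classes and top-level functions. One consequence is that two top-level functions (or two classes) with the same name in different files collapse into a single node, keeping the `filepath` and `source` of whichever was seen first. The stand-alone sketch below reproduces only that keying rule, with integer ids standing in for `Node.id`; `make_registry` is a hypothetical helper.

```python
from typing import Callable, Optional


def make_registry() -> Callable[..., int]:
    """Return a get-or-create function that hands out one integer id per qualified name."""
    registry: dict[str, int] = {}

    def get_or_create(name: str, parent_name: Optional[str] = None) -> int:
        # Same keying rule as get_or_create_node: "Parent.name" for methods, bare name otherwise.
        full_name = f"{parent_name}.{name}" if parent_name and parent_name != "Class" else name
        if full_name not in registry:
            registry[full_name] = len(registry)  # next unused id
        return registry[full_name]

    return get_or_create
```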
43 changes: 43 additions & 0 deletions src/codegen/extensions/graph/main.py
@@ -0,0 +1,43 @@
from codegen import Codebase
from codegen.extensions.graph.create_graph import create_codebase_graph
from codegen.extensions.graph.neo4j_exporter import Neo4jExporter
from codegen.shared.enums.programming_language import ProgrammingLanguage


def visualize_codebase(codebase, neo4j_uri: str, username: str, password: str):
    """Create and visualize a codebase graph in Neo4j.

    Args:
        codebase: The codebase object to analyze
        neo4j_uri: URI for Neo4j database
        username: Neo4j username
        password: Neo4j password
    """
    # Build the graph of classes, methods, functions, and their relations
    graph = create_codebase_graph(codebase)

    # Export to Neo4j (the exporter clears the target database first)
    exporter = Neo4jExporter(neo4j_uri, username, password)
    try:
        exporter.export_graph(graph)
        print("Successfully exported graph to Neo4j")

        # Print some useful Cypher queries for visualization
        print("\nUseful Cypher queries for visualization:")
        print("\n1. View all nodes and relationships:")
        print("MATCH (n)-[r]->(m) RETURN n, r, m")

        print("\n2. View class hierarchy:")
        print("MATCH (c:Class)-[r:INHERITS_FROM]->(parent:Class) RETURN c, r, parent")

        print("\n3. View methods defined by each class:")
        print("MATCH (c:Class)-[r:DEFINES]->(m:Method) RETURN c, r, m")

    finally:
        exporter.close()


if __name__ == "__main__":
    # Initialize codebase
    codebase = Codebase("../../", programming_language=ProgrammingLanguage.PYTHON)
    visualize_codebase(codebase, "bolt://localhost:7687", "neo4j", "password")

GitHub Actions / mypy check failure on line 42 of src/codegen/extensions/graph/main.py: error: Need type annotation for "codebase" [var-annotated]
49 changes: 49 additions & 0 deletions src/codegen/extensions/graph/neo4j_exporter.py
@@ -0,0 +1,49 @@
from neo4j import GraphDatabase

from codegen.extensions.graph.utils import SimpleGraph


class Neo4jExporter:
    """Class to handle exporting the codebase graph to Neo4j."""

    def __init__(self, uri: str, username: str, password: str):
        """Initialize Neo4j connection."""
        self.driver = GraphDatabase.driver(uri, auth=(username, password))

    def close(self):
        """Close the Neo4j connection."""
        self.driver.close()

    def clear_database(self):
        """Clear all nodes and relationships in the database."""
        with self.driver.session() as session:
            session.run("MATCH (n) DETACH DELETE n")

    def export_graph(self, graph: SimpleGraph):
        """Export the SimpleGraph to Neo4j."""
        self.clear_database()

        with self.driver.session() as session:
            # Create nodes
            for node in graph.nodes.values():
                properties = {"name": node.name, "full_name": node.full_name, **{k: str(v) if isinstance(v, (dict, list)) else v for k, v in node.properties.items()}}

                query = f"CREATE (n:{node.label} {{{', '.join(f'{k}: ${k}' for k in properties.keys())}}})"
                session.run(query, properties)

            # Create relationships
            for relation in graph.relations:
                source_node = graph.nodes[relation.source_id]
                target_node = graph.nodes[relation.target_id]

                properties = {**{k: str(v) if isinstance(v, (dict, list)) else v for k, v in relation.properties.items()}}

                query = (
                    f"MATCH (source:{source_node.label} {{full_name: $source_name}}), "
                    f"(target:{target_node.label} {{full_name: $target_name}}) "
                    f"CREATE (source)-[r:{relation.label} "
                    f"{{{', '.join(f'{k}: ${k}' for k in properties.keys())}}}]->"
                    f"(target)"
                )

                session.run(query, {"source_name": source_node.full_name, "target_name": target_node.full_name, **properties})
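
`export_graph` issues one `session.run` per node and one per relationship, and because Cypher does not accept labels or relationship types as query parameters, it interpolates `node.label` and `relation.label` into the query text while passing property values as parameters. If exports of large codebases turn out slow, one common alternative (not what this PR does) is to batch rows per label with `UNWIND`. The sketch below only builds the query strings and parameter maps; the input shape (a dict with a `label` key plus properties) and the helper `batched_node_queries` are assumptions for illustration.

```python
import re
from collections import defaultdict

# Labels are interpolated into query text, so restrict them to plain identifiers.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def batched_node_queries(nodes: list[dict]) -> list[tuple[str, dict]]:
    """Group node property maps by label and build one UNWIND query per label."""
    rows_by_label: dict[str, list[dict]] = defaultdict(list)
    for node in nodes:
        props = {k: v for k, v in node.items() if k != "label"}
        rows_by_label[node["label"]].append(props)

    queries = []
    for label, rows in rows_by_label.items():
        if not _IDENTIFIER.fullmatch(label):
            raise ValueError(f"unsafe label: {label!r}")
        # SET n = row copies every key of the row map onto the new node.
        queries.append((f"UNWIND $rows AS row CREATE (n:{label}) SET n = row", {"rows": rows}))
    return queries
```

Each `(query, params)` pair could then be passed to `session.run(query, params)`. Since every relationship is created after a `MATCH` on `full_name`, an index such as `CREATE INDEX class_full_name IF NOT EXISTS FOR (n:Class) ON (n.full_name)` would likely speed up that lookup as well.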