---
title: "Deep Code Research with AI"
sidebarTitle: "Code Research Agent"
icon: "magnifying-glass"
iconType: "solid"
---

This guide demonstrates how to build a code research tool that analyzes and explains codebases using Codegen and LangChain. The tool combines semantic code search, dependency analysis, and natural language understanding to help developers get oriented in an unfamiliar codebase quickly.

<Info>View the full code on [GitHub](https://github.com/codegen-sh/codegen-sdk/tree/develop/codegen-examples/examples/deep_code_research)</Info>

<Tip>This example works with any public GitHub repository. Just provide the repo name in the format `owner/repo`.</Tip>

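Because the tool expects repository names in `owner/repo` form, it can help to validate input before attempting a clone. The helper below is a hypothetical sketch (`parse_repo_name` is not part of Codegen), and its character rules are a simplified assumption rather than GitHub's exact naming policy:

```python
import re

# Simplified pattern: owner and repo made of letters, digits, '-', '_' or '.'.
# This is an assumption for illustration, not GitHub's full naming rules.
_REPO_PATTERN = re.compile(r"^([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)$")


def parse_repo_name(repo_name: str) -> tuple[str, str]:
    """Split 'owner/repo' into its two parts, raising ValueError if malformed."""
    match = _REPO_PATTERN.match(repo_name.strip())
    if match is None:
        raise ValueError(f"Expected 'owner/repo', got {repo_name!r}")
    return match.group(1), match.group(2)


owner, repo = parse_repo_name("fastapi/fastapi")
print(owner, repo)
```

Rejecting malformed names early gives the user a clear message instead of a failed clone.
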
## Overview

The tool has three main components:

1. A CLI for interacting with the research agent
2. A set of code analysis tools powered by Codegen
3. An LLM-powered agent that combines the tools to answer questions

Let's walk through building each component.

## Step 1: Setting Up the Research Tools

First, let's import the necessary components and set up our research tools:

```python
from typing import Optional

from codegen import Codebase
from codegen.extensions.langchain.agent import create_agent_with_tools
from codegen.extensions.langchain.tools import (
    ListDirectoryTool,
    RevealSymbolTool,
    SearchTool,
    SemanticSearchTool,
    ViewFileTool,
)
from langchain_core.messages import SystemMessage
```

We'll create a function that initializes the codebase and shows a progress spinner while it works. It assumes a module-level `console = Console()` from `rich.console`:

```python
def initialize_codebase(repo_name: str) -> Optional[Codebase]:
    """Initialize a codebase with a spinner showing progress."""
    with console.status("") as status:
        try:
            status.update(f"[bold blue]Cloning {repo_name}...[/bold blue]")
            codebase = Codebase.from_repo(repo_name)
            status.update("[bold green]✓ Repository cloned successfully![/bold green]")
            return codebase
        except Exception as e:
            # Report the failure and let the caller decide how to proceed
            console.print(f"[bold red]Error initializing codebase:[/bold red] {e}")
            return None
```

Then we'll set up our research tools:

```python
# Create research tools (`codebase` is the object returned by initialize_codebase)
tools = [
    ViewFileTool(codebase),  # View file contents
    ListDirectoryTool(codebase),  # Explore directory structure
    SearchTool(codebase),  # Text-based search
    SemanticSearchTool(codebase),  # Natural language search
    RevealSymbolTool(codebase),  # Analyze symbol relationships
]
```

Each tool provides specific capabilities:
- `ViewFileTool`: Read and understand file contents
- `ListDirectoryTool`: Explore the codebase structure
- `SearchTool`: Find specific code patterns
- `SemanticSearchTool`: Search using natural language
- `RevealSymbolTool`: Analyze dependencies and usages

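To make the idea of text-based search concrete, here is a toy, standard-library stand-in for what such a search does conceptually. It is not Codegen's `SearchTool` implementation; `search_text` and the sample files are hypothetical:

```python
import re


def search_text(files: dict[str, str], pattern: str) -> list[tuple[str, int, str]]:
    """Return (path, line_number, line) for every line matching the regex."""
    regex = re.compile(pattern)
    hits = []
    for path, content in sorted(files.items()):
        for line_number, line in enumerate(content.splitlines(), start=1):
            if regex.search(line):
                hits.append((path, line_number, line.strip()))
    return hits


# Hypothetical in-memory "codebase" used only for this illustration
sample_files = {
    "app/main.py": "from app.routes import router\n\ndef create_app():\n    return router\n",
    "app/routes.py": "router = object()\n",
}

for path, line_number, line in search_text(sample_files, r"\brouter\b"):
    print(f"{path}:{line_number}: {line}")
```

A pattern search like this only finds literal matches, which is why the agent is also given a natural language search tool for questions that do not map to an exact string.
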
## Step 2: Creating the Research Agent

Next, we'll create an agent that can use these tools intelligently. We'll give it a detailed prompt about its role:

```python
RESEARCH_AGENT_PROMPT = """You are a code research expert. Your goal is to help users understand codebases by:
1. Finding relevant code through semantic and text search
2. Analyzing symbol relationships and dependencies
3. Exploring directory structures
4. Reading and explaining code

Always explain your findings in detail and provide context about how different parts of the code relate to each other.
When analyzing code, consider:
- The purpose and functionality of each component
- How different parts interact
- Key patterns and design decisions
- Potential areas for improvement

Break down complex concepts into understandable pieces and use examples when helpful."""

# Initialize the agent
agent = create_agent_with_tools(
    codebase=codebase,
    tools=tools,
    chat_history=[SystemMessage(content=RESEARCH_AGENT_PROMPT)],
    verbose=True,
)
```

## Step 3: Building the CLI Interface

Finally, we'll create a user-friendly CLI using rich-click:

```python
from typing import Optional

import rich_click as click
from rich.console import Console
from rich.markdown import Markdown
from rich.prompt import Prompt

console = Console()


@click.group()
def cli():
    """🔍 Codegen Code Research CLI"""


@cli.command()
@click.argument("repo_name", required=False)
@click.option("--query", "-q", default=None, help="Initial research query.")
def research(repo_name: Optional[str] = None, query: Optional[str] = None):
    """Start a code research session."""
    # Prompt for the repository if it was not passed on the command line
    if not repo_name:
        repo_name = Prompt.ask("[bold cyan]Repository (owner/repo)[/bold cyan]")

    # Initialize codebase; initialize_codebase returns None on failure
    codebase = initialize_codebase(repo_name)
    if codebase is None:
        return

    # Create the agent. create_research_agent is assumed to wrap the
    # tool and agent setup from Steps 1 and 2.
    agent = create_research_agent(codebase)

    # Main research loop
    while True:
        if not query:
            query = Prompt.ask("[bold cyan]Research query[/bold cyan]")

        # Leave the loop on an explicit exit command
        if query.strip().lower() in {"exit", "quit"}:
            break

        result = agent.invoke({"input": query})
        console.print(Markdown(result["output"]))

        query = None  # Clear so the next iteration prompts again


if __name__ == "__main__":
    cli()
```

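The loop's control flow can be exercised without a terminal or an LLM by injecting the prompt and the agent as plain callables. The sketch below is a hypothetical refactoring (`run_research_loop`, `ask`, and `answer` are illustrative names, not part of the example repository), and the `exit`/`quit` commands are an assumption:

```python
from typing import Callable, Optional


def run_research_loop(
    ask: Callable[[], str],
    answer: Callable[[str], str],
    initial_query: Optional[str] = None,
) -> list[str]:
    """Run queries until the user types 'exit' or 'quit'; return all answers."""
    answers = []
    query = initial_query
    while True:
        if not query:
            query = ask()  # Prompt only when no query is pending
        if query.strip().lower() in {"exit", "quit"}:
            break
        answers.append(answer(query))
        query = None  # Clear so the next iteration prompts again
    return answers


def fake_agent(query: str) -> str:
    """Stand-in for the real agent call."""
    return f"(answer to: {query})"


# Simulate a session: one initial query, one follow-up, then exit
scripted_input = iter(["How does routing work?", "exit"])
print(run_research_loop(lambda: next(scripted_input), fake_agent, "Explain the main components"))
```

Separating the loop from `Prompt.ask` and `agent.invoke` this way keeps the CLI command thin and makes the session logic easy to unit test.
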
## Using the Research Tool

You can use the tool in several ways:

1. Interactive mode (will prompt for repo):
```bash
python run.py research
```

2. Specify a repository:
```bash
python run.py research "fastapi/fastapi"
```

3. Start with an initial query:
```bash
python run.py research "fastapi/fastapi" -q "Explain the main components"
```

Example research queries:
- "Explain the main components and their relationships"
- "Find all usages of the FastAPI class"
- "Show me the dependency graph for the routing module"
- "What design patterns are used in this codebase?"

<Tip>
  The agent maintains conversation history, so you can ask follow-up questions
  and build on previous findings.
</Tip>

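One way to picture how conversation history enables follow-up questions: each turn is appended to a message list, and the full list is passed along with the next query. The sketch below uses plain tuples and a fake model. It illustrates the general idea under those assumptions and does not show how Codegen's agent stores history internally:

```python
from typing import Callable

Message = tuple[str, str]  # (role, content)


class ChatSession:
    """Keeps a running transcript so each reply can see earlier turns."""

    def __init__(self, system_prompt: str, model: Callable[[list[Message]], str]):
        self.history: list[Message] = [("system", system_prompt)]
        self.model = model

    def ask(self, query: str) -> str:
        self.history.append(("user", query))
        reply = self.model(self.history)  # The model receives the whole transcript
        self.history.append(("assistant", reply))
        return reply


def fake_model(history: list[Message]) -> str:
    """Stand-in for an LLM: reports how many user turns it has seen."""
    user_turns = sum(1 for role, _ in history if role == "user")
    return f"I have seen {user_turns} question(s) so far."


session = ChatSession("You are a code research expert.", fake_model)
print(session.ask("Explain the main components"))
print(session.ask("How do they interact?"))
```
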
## Advanced Usage

### Custom Research Tools

You can extend the agent with custom tools for specific analysis needs:

```python
from langchain.tools import BaseTool


class CustomAnalysisTool(BaseTool):
    """Custom tool for specialized code analysis."""

    # Recent LangChain releases (built on Pydantic v2) require type
    # annotations when overriding these fields
    name: str = "custom_analysis"
    description: str = "Performs specialized code analysis"

    def _run(self, query: str) -> str:
        # Placeholder: replace with your own analysis logic
        results = f"No analysis implemented yet for: {query}"
        return results


# Add to tools list
tools.append(CustomAnalysisTool())
```

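As one idea for what the custom analysis logic could do, the function below uses Python's standard `ast` module to summarize the classes and functions defined in a source string. `summarize_definitions` is a hypothetical helper, and wiring it into `_run` (for example, by reading a file's source from the codebase) is left as an assumption:

```python
import ast


def summarize_definitions(source: str) -> str:
    """List the classes and functions defined anywhere in Python source."""
    tree = ast.parse(source)
    classes = []
    functions = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            classes.append(node.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions.append(node.name)
    # Sort so the summary does not depend on traversal order
    return f"classes: {sorted(classes)}; functions: {sorted(functions)}"


example_source = '''
class Router:
    def add_route(self, path):
        pass

async def serve():
    pass
'''

print(summarize_definitions(example_source))
```

Returning a plain string fits the `_run` signature shown above, since the agent consumes tool output as text.
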
### Customizing the Agent

You can modify the agent's behavior by adjusting its prompt:

```python
CUSTOM_PROMPT = """You are a specialized code reviewer focused on:
1. Security best practices
2. Performance optimization
3. Code maintainability
...
"""

agent = create_agent_with_tools(
    codebase=codebase,
    tools=tools,
    chat_history=[SystemMessage(content=CUSTOM_PROMPT)],
)
```
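
If you maintain several specialized prompts, generating them from a list of focus areas keeps the numbering and wording consistent. `build_prompt` below is a hypothetical helper, not part of Codegen:

```python
def build_prompt(role: str, focus_areas: list[str]) -> str:
    """Build a system prompt with a numbered list of focus areas."""
    if not focus_areas:
        raise ValueError("At least one focus area is required")
    numbered = "\n".join(f"{i}. {area}" for i, area in enumerate(focus_areas, start=1))
    return f"You are a {role} focused on:\n{numbered}\n"


REVIEWER_PROMPT = build_prompt(
    "specialized code reviewer",
    ["Security best practices", "Performance optimization", "Code maintainability"],
)
print(REVIEWER_PROMPT)
```

The resulting string can be wrapped in a `SystemMessage` and passed as `chat_history`, in the same way as `CUSTOM_PROMPT` above.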