[wip] Modal RAG example #388


Merged: 3 commits, Feb 10, 2025
82 changes: 82 additions & 0 deletions codegen-examples/examples/langchain_agent/README.md
@@ -0,0 +1,82 @@
# Codegen LangChain Agent Example

<p align="center">
<a href="https://docs.codegen.com/tutorials/build-code-agent">
<img src="https://i.imgur.com/6RF9W0z.jpeg" />
</a>
</p>

<h2 align="center">
Build an intelligent code agent with LangChain and Codegen
</h2>

<div align="center">

[![Documentation](https://img.shields.io/badge/Docs-docs.codegen.com-purple?style=flat-square)](https://docs.codegen.com/tutorials/build-code-agent)
[![License](https://img.shields.io/badge/Code%20License-Apache%202.0-gray?&color=gray)](https://github.com/codegen-sh/codegen-sdk/tree/develop?tab=Apache-2.0-1-ov-file)

</div>

This example demonstrates how to build an intelligent code agent using Codegen's LangChain integration. The agent can analyze and manipulate codebases using natural language commands.

## Quick Start

```python
from codegen import Codebase
from codegen.extensions.langchain import create_codebase_agent

# Initialize codebase
codebase = Codebase.from_repo("fastapi/fastapi")

# Create the agent
agent = create_codebase_agent(codebase=codebase, model_name="gpt-4", verbose=True)

# Ask the agent to analyze code
result = agent.invoke({"input": "What are the dependencies of the FastAPI class?"}, config={"configurable": {"session_id": "demo"}})
print(result["output"])
```

## Installation

```bash
# Install dependencies
pip install modal-client codegen langchain langchain-openai

# Run the example
python run.py
```

## Available Tools

The agent comes with several built-in tools for code operations:

- `ViewFileTool`: View file contents and metadata
- `ListDirectoryTool`: List directory contents
- `SearchTool`: Search code using regex
- `EditFileTool`: Edit file contents
- `CreateFileTool`: Create new files
- `DeleteFileTool`: Delete files
- `RenameFileTool`: Rename files and update imports
- `MoveSymbolTool`: Move functions/classes between files
- `RevealSymbolTool`: Analyze symbol dependencies
- `SemanticEditTool`: Make semantic code edits
- `CommitTool`: Commit changes to disk
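
Under the hood, an agent resolves a tool name chosen by the model to the matching tool object and invokes it. The following is an illustrative stdlib-only sketch of that name-to-tool dispatch (the `Tool` class and toy tools here are hypothetical stand-ins, not Codegen's or LangChain's actual classes):

```python
# Illustrative sketch of tool-name dispatch (not Codegen's internals).
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]

def make_registry(tools: list[Tool]) -> dict[str, Tool]:
    # The agent looks tools up by name when the model requests a call.
    return {t.name: t for t in tools}

def dispatch(registry: dict[str, Tool], name: str, arg: str) -> str:
    if name not in registry:
        return f"error: unknown tool {name!r}"
    return registry[name].run(arg)

# Toy tools standing in for ViewFileTool / SearchTool:
tools = [
    Tool("view_file", "View file contents", lambda path: f"<contents of {path}>"),
    Tool("search", "Search code", lambda pattern: f"<matches for {pattern}>"),
]
registry = make_registry(tools)
print(dispatch(registry, "view_file", "app/main.py"))  # → <contents of app/main.py>
```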

## Example Operations

The agent can perform various code analysis and manipulation tasks:

```python
# Analyze dependencies
agent.invoke({"input": "What are the dependencies of the reveal_symbol function?"}, config={"configurable": {"session_id": "demo"}})

# Find usage patterns
agent.invoke({"input": "Show me examples of dependency injection in the codebase"}, config={"configurable": {"session_id": "demo"}})

# Move code
agent.invoke({"input": "Move the validate_email function to validation_utils.py"}, config={"configurable": {"session_id": "demo"}})
```
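
The `session_id` in the config selects which conversation history the agent continues. As a hedged sketch of how such a session store typically behaves (the real implementation uses LangChain's `ChatMessageHistory` via `RunnableWithMessageHistory`; this toy `SessionStore` is hypothetical):

```python
# Toy per-session history store; the same session_id must resolve to the
# same history object so conversation turns accumulate across invocations.
from collections import defaultdict

class SessionStore:
    def __init__(self) -> None:
        self._histories: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def get(self, session_id: str) -> list[tuple[str, str]]:
        return self._histories[session_id]

store = SessionStore()
store.get("demo").append(("human", "What are the dependencies of FastAPI?"))
store.get("demo").append(("ai", "FastAPI builds on Starlette and Pydantic."))
print(len(store.get("demo")))  # → 2 turns recorded for session "demo"
```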

## Learn More

- [Full Tutorial](https://docs.codegen.com/tutorials/build-code-agent)
107 changes: 107 additions & 0 deletions codegen-examples/examples/langchain_agent/run.py
@@ -0,0 +1,107 @@
"""Demo implementation of an agent with Codegen tools."""

from codegen import Codebase
from codegen.extensions.langchain.tools import (
    CommitTool,
    CreateFileTool,
    DeleteFileTool,
    EditFileTool,
    ListDirectoryTool,
    MoveSymbolTool,
    RenameFileTool,
    RevealSymbolTool,
    SearchTool,
    SemanticEditTool,
    ViewFileTool,
)
from codegen.sdk.enums import ProgrammingLanguage
from langchain import hub
from langchain.agents import AgentExecutor
from langchain.agents.openai_functions_agent.base import OpenAIFunctionsAgent
from langchain_core.chat_history import ChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI


def create_codebase_agent(
    codebase: Codebase,
    model_name: str = "gpt-4o",
    temperature: float = 0,
    verbose: bool = True,
) -> RunnableWithMessageHistory:
    """Create an agent with all codebase tools.

    Args:
        codebase: The codebase to operate on
        model_name: Name of the model to use (default: gpt-4o)
        temperature: Model temperature (default: 0)
        verbose: Whether to print the agent's thought process (default: True)

    Returns:
        Initialized agent with message history
    """
    # Initialize language model
    llm = ChatOpenAI(
        model_name=model_name,
        temperature=temperature,
    )

    # Get all codebase tools
    tools = [
        ViewFileTool(codebase),
        ListDirectoryTool(codebase),
        SearchTool(codebase),
        EditFileTool(codebase),
        CreateFileTool(codebase),
        DeleteFileTool(codebase),
        RenameFileTool(codebase),
        MoveSymbolTool(codebase),
        RevealSymbolTool(codebase),
        SemanticEditTool(codebase),
        CommitTool(codebase),
    ]

    # Get the prompt to use
    prompt = hub.pull("hwchase17/openai-functions-agent")

    # Create the agent
    agent = OpenAIFunctionsAgent(
        llm=llm,
        tools=tools,
        prompt=prompt,
    )

    # Create the agent executor
    agent_executor = AgentExecutor(
        agent=agent,
        tools=tools,
        verbose=verbose,
    )

    # Create message history handler
    message_history = ChatMessageHistory()

    # Wrap with message history
    return RunnableWithMessageHistory(
        agent_executor,
        lambda session_id: message_history,
        input_messages_key="input",
        history_messages_key="chat_history",
    )


if __name__ == "__main__":
    # Initialize codebase
    print("Initializing codebase...")
    codebase = Codebase.from_repo("fastapi/fastapi", programming_language=ProgrammingLanguage.PYTHON)

    # Create agent with history
    print("Creating agent...")
    agent = create_codebase_agent(codebase)

    print("\nAsking agent to analyze symbol relationships...")
    result = agent.invoke(
        {"input": "What are the dependencies of the reveal_symbol function?"},
        config={"configurable": {"session_id": "demo"}},
    )
    print("Answer:", result["output"])
@@ -1,9 +1,8 @@
"""Modal API endpoint for repository analysis."""

import modal
from pydantic import BaseModel

import modal # deptry: ignore
from codegen import Codebase
from pydantic import BaseModel

# Create image with dependencies
image = modal.Image.debian_slim(python_version="3.13").apt_install("git").pip_install("fastapi[standard]", "codegen>=0.5.30")
120 changes: 120 additions & 0 deletions codegen-examples/examples/modal_repo_rag/README.md
@@ -0,0 +1,120 @@
# Codegen RAG Q&A API

<p align="center">
<a href="https://docs.codegen.com">
<img src="https://i.imgur.com/6RF9W0z.jpeg" />
</a>
</p>

<h2 align="center">
Answer questions about any GitHub repository using RAG
</h2>

<div align="center">

[![Documentation](https://img.shields.io/badge/Docs-docs.codegen.com-purple?style=flat-square)](https://docs.codegen.com)
[![License](https://img.shields.io/badge/Code%20License-Apache%202.0-gray?&color=gray)](https://github.com/codegen-sh/codegen-sdk/tree/develop?tab=Apache-2.0-1-ov-file)

</div>

This example demonstrates how to build a RAG-powered code Q&A API using Codegen's VectorIndex and Modal. The API can answer questions about any GitHub repository by:

1. Creating embeddings for all files in the repository
1. Finding the most relevant files for a given question
1. Using GPT-4 to generate an answer based on the context
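
The retrieval step above can be illustrated with a toy example: given an embedding vector per file, rank files by cosine similarity to the query embedding. The 3-dimensional vectors below are made-up stand-ins for OpenAI embeddings, which have far more dimensions:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy stand-ins for embeddings of repository files.
file_vectors = {
    "routing.py": [0.9, 0.1, 0.0],
    "dependencies.py": [0.1, 0.9, 0.1],
    "docs.md": [0.0, 0.2, 0.9],
}
query_vector = [0.2, 0.9, 0.0]  # embedding of the user's question

ranked = sorted(file_vectors, key=lambda f: cosine(file_vectors[f], query_vector), reverse=True)
print(ranked[0])  # → dependencies.py (most relevant file)
```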

## Quick Start

1. Install dependencies:

```bash
pip install modal-client codegen openai
```

2. Create a Modal volume for storing indices:

```bash
modal volume create codegen-indices
```

3. Start the API server:

```bash
modal serve api.py
```

4. Test with curl (`localhost:8000` below is a placeholder; use the endpoint URL printed by `modal serve`):

```bash
curl -X POST "http://localhost:8000/answer_code_question" \
-H "Content-Type: application/json" \
-d '{
"repo_name": "fastapi/fastapi",
"query": "How does FastAPI handle dependency injection?"
}'
```
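
The same request can be built from Python with the standard library. The URL here is the same placeholder as in the curl example; substitute the endpoint printed by `modal serve`:

```python
import json
import urllib.request

payload = {
    "repo_name": "fastapi/fastapi",
    "query": "How does FastAPI handle dependency injection?",
}
req = urllib.request.Request(
    "http://localhost:8000/answer_code_question",  # placeholder URL
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# With a live server, send the request and read the answer:
# with urllib.request.urlopen(req) as response:
#     print(json.load(response)["answer"])
```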

## API Reference

### POST /answer_code_question

Request body:

```json
{
"repo_name": "owner/repo",
"query": "Your question about the code"
}
```

Response format:

```json
{
"status": "success",
"error": "",
"answer": "Detailed answer based on the code...",
"context": [
{
"filepath": "path/to/file.py",
"snippet": "Relevant code snippet..."
}
]
}
```
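
A caller can defensively check this response shape before using it. A small stdlib sketch, using a hypothetical response body for illustration:

```python
import json

def parse_answer(body: str) -> tuple[str, list[str]]:
    """Return (answer, filepaths of context snippets); raise on error status."""
    data = json.loads(body)
    if data.get("status") != "success":
        raise RuntimeError(data.get("error", "unknown error"))
    files = [c["filepath"] for c in data.get("context", [])]
    return data["answer"], files

# Hypothetical example response:
raw = json.dumps({
    "status": "success",
    "error": "",
    "answer": "FastAPI resolves dependencies declared with Depends()...",
    "context": [{"filepath": "fastapi/params.py", "snippet": "class Depends: ..."}],
})
answer, files = parse_answer(raw)
print(files)  # → ['fastapi/params.py']
```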

## How It Works

1. The API uses Codegen to clone and analyze the repository
1. It creates/loads a VectorIndex of all files using OpenAI's embeddings
1. For each question:
- Finds the most semantically similar files
- Extracts relevant code snippets
- Uses GPT-4 to generate an answer based on the context
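
The snippet-extraction step can be approximated as: once a file is selected, keep a small window of lines around the match. A toy sketch under that assumption (the actual index selects snippets semantically, not by substring match):

```python
def extract_snippet(text: str, term: str, context: int = 1) -> str:
    """Return the first matching line plus `context` lines on each side."""
    lines = text.splitlines()
    hits = [i for i, line in enumerate(lines) if term in line]
    if not hits:
        return ""
    start = max(hits[0] - context, 0)
    end = min(hits[0] + context + 1, len(lines))
    return "\n".join(lines[start:end])

source = "import fastapi\n\ndef get_db():\n    yield Session()\n\napp = fastapi.FastAPI()"
print(extract_snippet(source, "get_db"))
```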

## Development

The API is built using:

- Modal for serverless deployment
- Codegen for repository analysis
- OpenAI for embeddings and Q&A
- FastAPI for the web endpoint

To deploy changes:

```bash
modal deploy api.py
```

## Environment Variables

Required environment variables:

- `OPENAI_API_KEY`: Your OpenAI API key

## Learn More

- [Codegen Documentation](https://docs.codegen.com)
- [Modal Documentation](https://modal.com/docs)
- [VectorIndex Tutorial](https://docs.codegen.com/building-with-codegen/semantic-code-search)