-
Notifications
You must be signed in to change notification settings - Fork 224
source code: Add Multimodal RAG with Elasticsearch Gotham City tutorial #390
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
JessicaGarson
merged 25 commits into
elastic:main
from
salgado:feature/multimodal-rag-gotham
Feb 28, 2025
Merged
Changes from 7 commits
Commits
Show all changes
25 commits
Select commit
Hold shift + click to select a range
a843f13
Add Multimodal RAG with Elasticsearch Gotham City tutorial
salgado e47c5e7
Add Multimodal RAG with Elasticsearch Gotham City tutorial
salgado 1557fb2
docs: add OpenAI API key setup instructions
salgado 39674b2
docs: exclude licence
salgado d2b1b19
fix: fixed comments
salgado 47d6240
docs: added env template
salgado 3748cef
issues fixed 1st review
salgado fc0f06a
foo
codefromthecrypt 4182f10
foo
codefromthecrypt 76475fa
polish-and-docker
codefromthecrypt 3244b2a
env-example
codefromthecrypt 1217ef6
env-example
codefromthecrypt 55904f1
fix glitch
codefromthecrypt e24ca5b
remove spurios log
codefromthecrypt fc2b80d
Add Jupyter notebook implementation of Multimodal RAG
salgado e381c57
Add Jupyter notebook implementation of Multimodal RAG
salgado 312baa4
Add Jupyter notebook implementation of Multimodal RAG
salgado c90ba3d
Update documentation with simpler README and Docker setup guide
salgado 36cd475
adding changes from review
JessicaGarson 6866d48
Update 01-mmrag-blog-quick-start.ipynb
JessicaGarson d34ab33
remove wrong folder
salgado 112c8fa
Remove Docker configuration files
salgado d7f2472
remove coker references
salgado a3abfb2
fixing first line notebook to test branch
salgado be1a03b
fixing first line notebook to main repo
salgado File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
13 changes: 13 additions & 0 deletions
13
supporting-blog-content/building-multimodal-rag-with-elasticsearch-gotham/.env.template
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
# Elasticsearch Configuration | ||
ELASTIC_API_KEY=your_api_key_here | ||
ELASTICSEARCH_ENDPOINT=your_elastic_endpoint | ||
codefromthecrypt marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
# OpenAI Configuration | ||
OPENAI_API_KEY=your_openai_api_key_here | ||
codefromthecrypt marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
# Model Configuration | ||
MODEL_PATH=~/.cache/torch/checkpoints/imagebind_huge.pth | ||
|
||
# Optional Configuration | ||
#LOG_LEVEL=INFO | ||
#DEBUG=False |
97 changes: 97 additions & 0 deletions
97
...orting-blog-content/building-multimodal-rag-with-elasticsearch-gotham/README.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,97 @@ | ||
# Building a Multimodal RAG Pipeline with Elasticsearch: The Story of Gotham City | ||
|
||
This repository contains the code for implementing a Multimodal Retrieval-Augmented Generation (RAG) system using Elasticsearch. The system processes and analyzes different types of evidence (images, audio, text, and depth maps) to solve a crime in Gotham City. | ||
|
||
## Overview | ||
|
||
The pipeline demonstrates how to: | ||
- Generate unified embeddings for multiple modalities using ImageBind | ||
- Store and search vectors efficiently in Elasticsearch | ||
- Analyze evidence using GPT-4 to generate forensic reports | ||
|
||
## Prerequisites | ||
|
||
- Python 3.10+ | ||
codefromthecrypt marked this conversation as resolved.
Show resolved
Hide resolved
|
||
- Elasticsearch cluster (cloud or local) | ||
- OpenAI API key - Setup an OpenAI account and create a [secret key](https://platform.openai.com/docs/quickstart) | ||
- 8GB+ RAM | ||
- GPU (optional but recommended) | ||
|
||
## Quick Start | ||
|
||
1. **Setup Environment** | ||
```bash | ||
|
||
# Make sure you have pytorch installed and Python 3.10+ | ||
pip install torch torchvision torchaudio | ||
salgado marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
# Create and activate virtual environment | ||
python -m venv env_mmrag | ||
source env_mmrag/bin/activate # Unix/MacOS | ||
# or | ||
.\env_mmrag\Scripts\activate # Windows | ||
|
||
# Install dependencies | ||
pip install -r requirements.txt | ||
salgado marked this conversation as resolved.
Show resolved
Hide resolved
salgado marked this conversation as resolved.
Show resolved
Hide resolved
|
||
``` | ||
|
||
2. **Configure Credentials** | ||
Create a `.env` file: | ||
salgado marked this conversation as resolved.
Show resolved
Hide resolved
|
||
```env | ||
ELASTICSEARCH_ENDPOINT="your-elasticsearch-endpoint" | ||
ELASTIC_API_KEY="your-elastic-api-key" | ||
OPENAI_API_KEY="your-openai-api-key" | ||
``` | ||
|
||
3. **Run the Demo** | ||
```bash | ||
# Verify file structure | ||
python stages/01-stage/files_check.py | ||
|
||
salgado marked this conversation as resolved.
Show resolved
Hide resolved
|
||
# Generate embeddings | ||
python stages/02-stage/test_embedding_generation.py | ||
salgado marked this conversation as resolved.
Show resolved
Hide resolved
salgado marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
# Index content | ||
python stages/03-stage/index_all_modalities.py | ||
|
||
# Search and analyze | ||
python stages/04-stage/rag_crime_analyze.py | ||
``` | ||
|
||
## Project Structure | ||
|
||
``` | ||
├── README.md | ||
├── requirements.txt | ||
├── src/ | ||
│ ├── embedding_generator.py # ImageBind wrapper | ||
│ ├── elastic_manager.py # Elasticsearch operations | ||
│ └── llm_analyzer.py # GPT-4 integration | ||
├── stages/ | ||
│ ├── 01-stage/ # File organization | ||
│ ├── 02-stage/ # Embedding generation | ||
│ ├── 03-stage/ # Elasticsearch indexing/search | ||
│ └── 04-stage/ # Evidence analysis | ||
└── data/ # Sample data | ||
salgado marked this conversation as resolved.
Show resolved
Hide resolved
|
||
├── images/ | ||
├── audios/ | ||
├── texts/ | ||
└── depths/ | ||
``` | ||
|
||
## Sample Data | ||
|
||
The repository includes sample evidence files: | ||
- Images: Crime scene photos and security camera footage | ||
- Audio: Suspicious sound recordings | ||
- Text: Mysterious notes and riddles | ||
- Depth Maps: 3D scene captures | ||
|
||
## How It Works | ||
|
||
1. **Evidence Collection**: Files are organized by modality in the `data/` directory | ||
2. **Embedding Generation**: ImageBind converts each piece of evidence into a 1024-dimensional vector | ||
3. **Vector Storage**: Elasticsearch stores embeddings with metadata for efficient retrieval | ||
4. **Similarity Search**: New evidence is compared against the database using k-NN search | ||
5. **Analysis**: GPT-4 analyzes the connections between evidence to identify suspects | ||
|
Binary file added
BIN
+440 KB
...log-content/building-multimodal-rag-with-elasticsearch-gotham/data/audios/joker_laugh.wav
Binary file not shown.
Binary file added
BIN
+57.5 KB
...building-multimodal-rag-with-elasticsearch-gotham/data/depths/depth_suspect.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+126 KB
...uilding-multimodal-rag-with-elasticsearch-gotham/data/depths/jdancing-depth.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+201 KB
.../building-multimodal-rag-with-elasticsearch-gotham/data/images/crime_scene1.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+134 KB
.../building-multimodal-rag-with-elasticsearch-gotham/data/images/crime_scene2.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+2.41 MB
...tent/building-multimodal-rag-with-elasticsearch-gotham/data/images/jdancing.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+95.4 KB
...t/building-multimodal-rag-with-elasticsearch-gotham/data/images/joker_alley.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+1.98 MB
...uilding-multimodal-rag-with-elasticsearch-gotham/data/images/joker_laughing.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+2.3 MB
...building-multimodal-rag-with-elasticsearch-gotham/data/images/playing-cards.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+69.9 KB
...building-multimodal-rag-with-elasticsearch-gotham/data/images/suspect_depth.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
8 changes: 8 additions & 0 deletions
8
...rting-blog-content/building-multimodal-rag-with-elasticsearch-gotham/data/texts/note2.txt
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
Why so serious? | ||
|
||
The show has just begun and you're already running | ||
While clowns are dancing and the city's stunning | ||
In the abandoned theater, a surprise awaits | ||
Come play with me before it's too late! | ||
|
||
HAHAHAHAHA! |
15 changes: 15 additions & 0 deletions
15
...og-content/building-multimodal-rag-with-elasticsearch-gotham/data/texts/police_report.txt
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
PRELIMINARY REPORT - GCPD | ||
Date: 01/28/2025 | ||
Time: 22:30 | ||
|
||
Incident: Break-in and Vandalism | ||
Location: Gotham Central Bank | ||
Evidence Found: | ||
- Playing cards scattered | ||
- Smile graffiti on walls | ||
- Suspicious audio recording | ||
- Witnesses report maniacal laughter | ||
|
||
Status: Under Investigation | ||
Priority Level: MAXIMUM | ||
Primary Suspect: Unknown (possible Joker involvement) |
16 changes: 16 additions & 0 deletions
16
...ting-blog-content/building-multimodal-rag-with-elasticsearch-gotham/data/texts/riddle.txt
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
HAHAHA! | ||
|
||
Dear Detective, | ||
|
||
In a city of endless night, a new game unfolds | ||
Where chaos reigns and fear takes hold | ||
I left a gift at Gotham Central Bank | ||
Time's ticking, your mind goes blank | ||
|
||
The clues are there, scattered with care | ||
Each laugh echoes everywhere | ||
Midnight strikes, you won't catch me | ||
In Gotham's heart, chaos runs free! | ||
|
||
With a smile, | ||
? |
5 changes: 5 additions & 0 deletions
5
...ing-blog-content/building-multimodal-rag-with-elasticsearch-gotham/data/texts/threats.txt
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
Incident Log: | ||
1. Gotham Central Bank - 22:15 - Alarm triggered | ||
2. Monarch Theater - 22:45 - Suspicious laughter reported | ||
3. Abandoned Amusement Park - 23:00 - Strange lights | ||
4. Ace Chemical Plant - 23:30 - Suspicious movement |
13 changes: 13 additions & 0 deletions
13
supporting-blog-content/building-multimodal-rag-with-elasticsearch-gotham/requirements.txt
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
elasticsearch>=8.11.0 | ||
torch>=2.1.0 | ||
torchvision>=0.16.0 | ||
torchaudio>=2.1.0 | ||
imagebind @ git+https://github.com/hkchengrex/ImageBind.git | ||
openai>=1.0.0 | ||
python-dotenv>=1.0.0 | ||
numpy>=1.24.0 | ||
pillow>=10.0.0 | ||
opencv-python>=4.8.0 | ||
librosa>=0.10.0 | ||
matplotlib>=3.7.0 | ||
xformers>=0.0.23 # Adicione esta linha |
87 changes: 87 additions & 0 deletions
87
...ing-blog-content/building-multimodal-rag-with-elasticsearch-gotham/src/elastic_manager.py
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,87 @@ | ||
from elasticsearch import Elasticsearch, helpers | ||
import base64 | ||
import os | ||
from dotenv import load_dotenv | ||
import numpy as np | ||
|
||
class ElasticsearchManager: | ||
"""Manages multimodal operations in Elasticsearch""" | ||
|
||
def __init__(self): | ||
load_dotenv() # Load variables from .env | ||
self.es = self._connect_elastic() | ||
self.index_name = "multimodal_content" | ||
self._setup_index() | ||
|
||
def _connect_elastic(self): | ||
codefromthecrypt marked this conversation as resolved.
Show resolved
Hide resolved
|
||
"""Connects to Elasticsearch""" | ||
return Elasticsearch( | ||
os.getenv("ELASTICSEARCH_ENDPOINT"), # Elasticsearch endpoint | ||
api_key=os.getenv("ELASTIC_API_KEY") | ||
) | ||
|
||
def _setup_index(self): | ||
"""Sets up the index if it doesn't exist""" | ||
if not self.es.indices.exists(index=self.index_name): | ||
mapping = { | ||
"mappings": { | ||
"properties": { | ||
"embedding": { | ||
"type": "dense_vector", | ||
"dims": 1024, | ||
"index": True, | ||
"similarity": "cosine" | ||
}, | ||
"modality": {"type": "keyword"}, | ||
"content": {"type": "binary"}, | ||
"description": {"type": "text"}, | ||
"metadata": {"type": "object"}, | ||
"content_path": {"type": "text"} | ||
} | ||
} | ||
} | ||
self.es.indices.create(index=self.index_name, body=mapping) | ||
|
||
def index_content(self, embedding, modality, content=None, description="", metadata=None, content_path=None): | ||
"""Indexes multimodal content""" | ||
doc = { | ||
"embedding": embedding.tolist(), | ||
"modality": modality, | ||
"description": description, | ||
"metadata": metadata or {}, | ||
"content_path": content_path | ||
} | ||
|
||
if content: | ||
doc["content"] = base64.b64encode(content).decode() if isinstance(content, bytes) else content | ||
|
||
return self.es.index(index=self.index_name, document=doc) | ||
|
||
def search_similar(self, query_embedding, modality=None, k=5): | ||
"""Searches for similar contents""" | ||
query = { | ||
"knn": { | ||
"field": "embedding", | ||
"query_vector": query_embedding.tolist(), | ||
"k": k, | ||
"num_candidates": 100, | ||
"filter": [{"term": {"modality": modality}}] if modality else [] | ||
} | ||
} | ||
|
||
try: | ||
response = self.es.search( | ||
index=self.index_name, | ||
query=query, | ||
size=k | ||
) | ||
|
||
# Return both source data and score for each hit | ||
return [{ | ||
**hit["_source"], | ||
"score": hit["_score"] | ||
} for hit in response["hits"]["hits"]] | ||
|
||
except Exception as e: | ||
print(f"Error: processing search_evidence: {str(e)}") | ||
return "Error generating search evidence" |
120 changes: 120 additions & 0 deletions
120
...blog-content/building-multimodal-rag-with-elasticsearch-gotham/src/embedding_generator.py
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,120 @@ | ||
import os | ||
import cv2 | ||
from io import BytesIO | ||
import logging | ||
from torch.hub import download_url_to_file | ||
|
||
import torch | ||
import numpy as np | ||
from PIL import Image | ||
from imagebind import data | ||
from imagebind.models import imagebind_model | ||
|
||
from torchvision import transforms | ||
|
||
|
||
logging.basicConfig(level=logging.INFO) | ||
logger = logging.getLogger(__name__) | ||
|
||
class EmbeddingGenerator: | ||
"""Generates multimodal embeddings using ImageBind""" | ||
|
||
def __init__(self, device="cpu"): | ||
self.device = device | ||
self.model = self._load_model() | ||
|
||
def _load_model(self): | ||
"""Initialize and test the ImageBind model.""" | ||
checkpoint_path = "~/.cache/torch/checkpoints/imagebind_huge.pth" | ||
os.makedirs(os.path.expanduser("~/.cache/torch/checkpoints"), exist_ok=True) | ||
|
||
if not os.path.exists(os.path.expanduser(checkpoint_path)): | ||
print("Downloading ImageBind weights...") | ||
download_url_to_file( | ||
"https://dl.fbaipublicfiles.com/imagebind/imagebind_huge.pth", | ||
os.path.expanduser(checkpoint_path) | ||
) | ||
|
||
try: | ||
checkpoint_path = os.path.expanduser("~/.cache/torch/checkpoints/imagebind_huge.pth") | ||
|
||
# Check if file exists | ||
if not os.path.exists(checkpoint_path): | ||
raise FileNotFoundError(f"Checkpoint not found: {checkpoint_path}") | ||
|
||
model = imagebind_model.imagebind_huge(pretrained=False) | ||
model.load_state_dict(torch.load(checkpoint_path)) | ||
model.eval().to(self.device) | ||
|
||
# Quick test with empty text input | ||
logger.info("Testing model with sample input...") | ||
test_input = data.load_and_transform_text([""], self.device) | ||
with torch.no_grad(): | ||
_ = model({"text": test_input}) | ||
|
||
logger.info("🤖 ImageBind model initialized successfully") | ||
return model | ||
except Exception as e: | ||
logger.error(f"🚨 Model initialization failed: {str(e)}") | ||
raise | ||
|
||
def generate_embedding(self, input_data, modality): | ||
"""Generates embedding for different modalities""" | ||
processors = { | ||
"vision": lambda x: data.load_and_transform_vision_data(x, self.device), | ||
"audio": lambda x: data.load_and_transform_audio_data(x, self.device), | ||
"text": lambda x: data.load_and_transform_text(x, self.device), | ||
"depth": self.process_depth | ||
} | ||
|
||
try: | ||
# Input type verification | ||
if not isinstance(input_data, list): | ||
raise ValueError(f"Input data must be a list. Received: {type(input_data)}") | ||
|
||
# Convert input data to a tensor format that the model can process | ||
# For images: [batch_size, channels, height, width] | ||
# For audio: [batch_size, channels, time] | ||
# For text: [batch_size, sequence_length] | ||
inputs = {modality: processors[modality](input_data)} | ||
with torch.no_grad(): | ||
embedding = self.model(inputs)[modality] | ||
return embedding.squeeze(0).cpu().numpy() | ||
except Exception as e: | ||
logger.error(f"Error generating {modality} embedding: {str(e)}", exc_info=True) | ||
raise | ||
|
||
|
||
def process_vision(self, image_path): | ||
"""Processes image""" | ||
return data.load_and_transform_vision_data([image_path], self.device) | ||
|
||
def process_audio(self, audio_path): | ||
"""Processes audio""" | ||
return data.load_and_transform_audio_data([audio_path], self.device) | ||
|
||
def process_text(self, text): | ||
"""Processes text""" | ||
return data.load_and_transform_text([text], self.device) | ||
|
||
def process_depth(self, depth_paths, device="cpu"): | ||
"""Custom processing for depth maps""" | ||
try: | ||
# Check file existence | ||
for path in depth_paths: | ||
if not os.path.exists(path): | ||
raise FileNotFoundError(f"Depth map file not found: {path}") | ||
|
||
# Load and transform | ||
depth_images = [Image.open(path).convert("L") for path in depth_paths] | ||
|
||
transform = transforms.Compose([ | ||
transforms.Resize((224, 224)), | ||
transforms.ToTensor(), | ||
]) | ||
|
||
return torch.stack([transform(img) for img in depth_images]).to(device) | ||
|
||
except Exception as e: | ||
logger.error(f"🚨 - Error processing depth map: {str(e)}") | ||
raise |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.