source code: Add Multimodal RAG with Elasticsearch Gotham City tutorial #390

Merged (25 commits) on Feb 28, 2025

Commits
a843f13
Add Multimodal RAG with Elasticsearch Gotham City tutorial
salgado Feb 8, 2025
e47c5e7
Add Multimodal RAG with Elasticsearch Gotham City tutorial
salgado Feb 8, 2025
1557fb2
docs: add OpenAI API key setup instructions
salgado Feb 10, 2025
39674b2
docs: exclude licence
salgado Feb 10, 2025
d2b1b19
fix: fixed comments
salgado Feb 10, 2025
47d6240
docs: added env template
salgado Feb 10, 2025
3748cef
issues fixed 1st review
salgado Feb 14, 2025
fc0f06a
foo
codefromthecrypt Feb 25, 2025
4182f10
foo
codefromthecrypt Feb 25, 2025
76475fa
polish-and-docker
codefromthecrypt Feb 26, 2025
3244b2a
env-example
codefromthecrypt Feb 26, 2025
1217ef6
env-example
codefromthecrypt Feb 26, 2025
55904f1
fix glitch
codefromthecrypt Feb 26, 2025
e24ca5b
remove spurios log
codefromthecrypt Feb 26, 2025
fc2b80d
Add Jupyter notebook implementation of Multimodal RAG
salgado Feb 27, 2025
e381c57
Add Jupyter notebook implementation of Multimodal RAG
salgado Feb 27, 2025
312baa4
Add Jupyter notebook implementation of Multimodal RAG
salgado Feb 27, 2025
c90ba3d
Update documentation with simpler README and Docker setup guide
salgado Feb 27, 2025
36cd475
adding changes from review
JessicaGarson Feb 27, 2025
6866d48
Update 01-mmrag-blog-quick-start.ipynb
JessicaGarson Feb 27, 2025
d34ab33
remove wrong folder
salgado Feb 27, 2025
112c8fa
Remove Docker configuration files
salgado Feb 27, 2025
d7f2472
remove coker references
salgado Feb 27, 2025
a3abfb2
fixing first line notebook to test branch
salgado Feb 28, 2025
be1a03b
fixing first line notebook to main repo
salgado Feb 28, 2025
@@ -0,0 +1,8 @@
# Ignore everything
**

# Allow specific files and directories
!requirements.txt
!data/
!src/
!stages/
@@ -0,0 +1,36 @@
# Use non-slim image due to OS dependencies of python packages. This gives us
# git, build-essential, libglib2 (opencv) and gomp (torchaudio).
FROM python:3.12

COPY /requirements.txt .

# Our python requirements have some OS dependencies beyond the base layer:
#
# * imagebind pulls in cartopy which has OS dependencies on geos and proj
# * opencv has a runtime OS dependency on libgl1-mesa-glx
#
# The dev dependencies are installed temporarily to compile the wheels.
# We leave only the runtime dependencies to keep the image smaller.
RUN apt-get update && \
    # install build and runtime dependencies
    apt-get install -y --no-install-recommends \
        libgeos-dev \
        libproj-dev \
        libgeos-c1v5 \
        libproj25 \
        libgl1-mesa-glx && \
    # Install everything except xformers first
    grep -v "\bxformers\b" requirements.txt > /tmp/r.txt && pip install -r /tmp/r.txt && \
    # Now, install xformers, as it should be able to see torch now
    grep "\bxformers\b" requirements.txt > /tmp/r.txt && pip install -r /tmp/r.txt && \
    # remove build dependencies
    apt-get purge -y libgeos-dev libproj-dev && \
    apt-get autoremove -y && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /app
RUN mkdir -p ./data ./src ./stages
COPY ./data ./data
COPY ./src ./src
COPY ./stages ./stages

@@ -0,0 +1,80 @@
# Building a Multimodal RAG Pipeline with Elasticsearch: The Story of Gotham City

This repository contains the code for implementing a Multimodal Retrieval-Augmented Generation (RAG) system using Elasticsearch. The system processes and analyzes different types of evidence (images, audio, text, and depth maps) to solve a crime in Gotham City.

## Overview

The pipeline demonstrates how to:
- Generate unified embeddings for multiple modalities using ImageBind
- Store and search vectors efficiently in Elasticsearch
- Analyze evidence using GPT-4 to generate forensic reports

## Prerequisites

- A Docker runtime with 8GB+ of free RAM
- A GPU is optional, but recommended
- Elasticsearch cluster (cloud or local)
- OpenAI API key - Set up an OpenAI account and create a [secret key](https://platform.openai.com/docs/quickstart)

## Quick Start

This example runs four stages as Docker Compose services:

```mermaid
graph TD
verify-file-structure --> generate-embeddings
generate-embeddings --> index-content
index-content --> search-and-analyze
```

First, copy [env.example](env.example) to `.env` and fill in values noted inside.
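
As a reference for filling it in, the connection variables read by `src/elastic_manager.py` are `ELASTICSEARCH_URL` plus either `ELASTICSEARCH_USER`/`ELASTICSEARCH_PASSWORD` or `ELASTICSEARCH_API_KEY`; `OPENAI_API_KEY` is assumed here as the standard variable the OpenAI client reads. A minimal `.env` might look like this (all values are placeholders):

```
# Elasticsearch connection: URL plus either basic auth or an API key
ELASTICSEARCH_URL=https://localhost:9200
ELASTICSEARCH_USER=elastic
ELASTICSEARCH_PASSWORD=changeme
# ELASTICSEARCH_API_KEY=...

# OpenAI key for the GPT-4 analysis stage (assumed variable name)
OPENAI_API_KEY=sk-...
```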

Now, run the following command to execute the full pipeline:
```bash
docker compose run --build --rm search-and-analyze
```

The first run takes a while, as it builds the image and downloads the ImageBind weights.

If you want to re-run just one stage, add `--no-deps` like this:
```bash
docker compose run --no-deps --build --rm search-and-analyze
```

## Project Structure

```
├── README.md
├── requirements.txt
├── src/
│   ├── embedding_generator.py   # ImageBind wrapper
│   ├── elastic_manager.py       # Elasticsearch operations
│   └── llm_analyzer.py          # GPT-4 integration
├── stages/
│   ├── 01-stage/                # File organization
│   ├── 02-stage/                # Embedding generation
│   ├── 03-stage/                # Elasticsearch indexing/search
│   └── 04-stage/                # Evidence analysis
└── data/                        # Sample data
    ├── images/
    ├── audios/
    ├── texts/
    └── depths/
```

## Sample Data

The repository includes sample evidence files:
- Images: Crime scene photos and security camera footage
- Audio: Suspicious sound recordings
- Text: Mysterious notes and riddles
- Depth Maps: 3D scene captures

## How It Works

1. **Evidence Collection**: Files are organized by modality in the `data/` directory
2. **Embedding Generation**: ImageBind converts each piece of evidence into a 1024-dimensional vector (see the sketch after this list)
3. **Vector Storage**: Elasticsearch stores embeddings with metadata for efficient retrieval
4. **Similarity Search**: New evidence is compared against the database using k-NN search
5. **Analysis**: GPT-4 analyzes the connections between evidence to identify suspects
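
As a rough illustration of step 2, the snippet below shows what producing a 1024-dimensional embedding looks like with the upstream ImageBind API. The project's own `src/embedding_generator.py` wrapper is not shown in this part of the diff, so treat the exact calls and the file path as an approximation rather than the repository's code:

```python
import torch
from imagebind import data
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the pretrained imagebind_huge checkpoint (a >4GB download on first run)
model = imagebind_model.imagebind_huge(pretrained=True).eval().to(device)

# Embed a single piece of visual evidence (illustrative path)
inputs = {
    ModalityType.VISION: data.load_and_transform_vision_data(
        ["data/images/crime_scene1.jpg"], device
    )
}
with torch.no_grad():
    embeddings = model(inputs)

vector = embeddings[ModalityType.VISION][0]  # a 1024-dimensional tensor
```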

(Binary sample evidence files, i.e. the images, audio recordings, and depth maps under data/, are not rendered here.)
@@ -0,0 +1,8 @@
Why so serious?

The show has just begun and you're already running
While clowns are dancing and the city's stunning
In the abandoned theater, a surprise awaits
Come play with me before it's too late!

HAHAHAHAHA!
@@ -0,0 +1,15 @@
PRELIMINARY REPORT - GCPD
Date: 01/28/2025
Time: 22:30

Incident: Break-in and Vandalism
Location: Gotham Central Bank
Evidence Found:
- Playing cards scattered
- Smile graffiti on walls
- Suspicious audio recording
- Witnesses report maniacal laughter

Status: Under Investigation
Priority Level: MAXIMUM
Primary Suspect: Unknown (possible Joker involvement)
@@ -0,0 +1,16 @@
HAHAHA!

Dear Detective,

In a city of endless night, a new game unfolds
Where chaos reigns and fear takes hold
I left a gift at Gotham Central Bank
Time's ticking, your mind goes blank

The clues are there, scattered with care
Each laugh echoes everywhere
Midnight strikes, you won't catch me
In Gotham's heart, chaos runs free!

With a smile,
?
@@ -0,0 +1,5 @@
Incident Log:
1. Gotham Central Bank - 22:15 - Alarm triggered
2. Monarch Theater - 22:45 - Suspicious laughter reported
3. Abandoned Amusement Park - 23:00 - Strange lights
4. Ace Chemical Plant - 23:30 - Suspicious movement
@@ -0,0 +1,61 @@
name: gotham-city-crime-analysis

services:
  verify-file-structure:
    build:
      context: .
    container_name: verify-file-structure
    restart: 'no'  # one-shot stage; no need to re-verify the file structure
    env_file:
      - .env
    command: python stages/01-stage/files_check.py
    extra_hosts:  # send localhost traffic to the docker host, e.g. your laptop
      - "localhost:host-gateway"

  generate-embeddings:
    depends_on:
      verify-file-structure:
        condition: service_completed_successfully
    build:
      context: .
    container_name: generate-embeddings
    restart: 'no'  # one-shot stage; no need to re-generate embeddings
    env_file:
      - .env
    command: python stages/02-stage/test_embedding_generation.py
    extra_hosts:  # send localhost traffic to the docker host, e.g. your laptop
      - "localhost:host-gateway"
    volumes:
      - torch-checkpoints:/root/cache/torch/checkpoints/

  index-content:
    depends_on:
      generate-embeddings:
        condition: service_completed_successfully
    build:
      context: .
    container_name: index-content
    restart: 'no'  # one-shot stage; no need to re-index content
    env_file:
      - .env
    command: python stages/03-stage/index_all_modalities.py
    extra_hosts:  # send localhost traffic to the docker host, e.g. your laptop
      - "localhost:host-gateway"

  search-and-analyze:
    depends_on:
      index-content:
        condition: service_completed_successfully
    build:
      context: .
    container_name: search-and-analyze
    restart: 'no'  # one-shot stage; re-run manually when needed
    env_file:
      - .env
    command: python stages/04-stage/rag_crime_analyze.py
    extra_hosts:  # send localhost traffic to the docker host, e.g. your laptop
      - "localhost:host-gateway"

volumes:
  # Avoid re-downloading a >4GB model checkpoint
  torch-checkpoints:
@@ -0,0 +1,15 @@
elasticsearch~=8.17.1
torch~=2.6.0
torchvision~=0.21.0
torchaudio~=2.6.0
imagebind @ git+https://github.com/hkchengrex/ImageBind.git
openai~=1.64.0
python-dotenv~=1.0.1
numpy~=2.1.3
pillow~=11.1.0
opencv-python~=4.11.0
librosa~=0.10.2
matplotlib~=3.10.0
wheel~=0.45.1
setuptools
xformers~=0.0.29
@@ -0,0 +1,110 @@
from elasticsearch import Elasticsearch, helpers
import base64
import os
from dotenv import load_dotenv
import numpy as np


class ElasticsearchManager:
    """Manages multimodal operations in Elasticsearch"""

    def __init__(self):
        load_dotenv()  # Load variables from .env
        self.es = self._connect_elastic()
        self.index_name = "multimodal_content"
        self._setup_index()

    def _connect_elastic(self):
        """Connects to Elasticsearch"""
        ELASTICSEARCH_URL = os.getenv("ELASTICSEARCH_URL")
        ELASTICSEARCH_USER = os.getenv("ELASTICSEARCH_USER")
        ELASTICSEARCH_PASSWORD = os.getenv("ELASTICSEARCH_PASSWORD")
        ELASTICSEARCH_API_KEY = os.getenv("ELASTICSEARCH_API_KEY")

        if ELASTICSEARCH_USER:
            return Elasticsearch(
                hosts=[ELASTICSEARCH_URL],
                basic_auth=(ELASTICSEARCH_USER, ELASTICSEARCH_PASSWORD),
            )
        elif ELASTICSEARCH_API_KEY:
            return Elasticsearch(
                hosts=[ELASTICSEARCH_URL], api_key=ELASTICSEARCH_API_KEY
            )
        else:
            raise ValueError(
                "Please provide either ELASTICSEARCH_USER or ELASTICSEARCH_API_KEY"
            )

    def _setup_index(self):
        """Sets up the index if it doesn't exist"""
        if not self.es.indices.exists(index=self.index_name):
            mapping = {
                "mappings": {
                    "properties": {
                        "embedding": {
                            "type": "dense_vector",
                            "dims": 1024,
                            "index": True,
                            "similarity": "cosine",
                        },
                        "modality": {"type": "keyword"},
                        "content": {"type": "binary"},
                        "description": {"type": "text"},
                        "metadata": {"type": "object"},
                        "content_path": {"type": "text"},
                    }
                }
            }
            self.es.indices.create(index=self.index_name, body=mapping)

    def index_content(
        self,
        embedding,
        modality,
        content=None,
        description="",
        metadata=None,
        content_path=None,
    ):
        """Indexes multimodal content"""
        doc = {
            "embedding": embedding.tolist(),
            "modality": modality,
            "description": description,
            "metadata": metadata or {},
            "content_path": content_path,
        }

        if content:
            doc["content"] = (
                base64.b64encode(content).decode()
                if isinstance(content, bytes)
                else content
            )

        return self.es.index(index=self.index_name, document=doc)

    def search_similar(self, query_embedding, modality=None, k=5):
        """Searches for similar contents"""
        query = {
            "knn": {
                "field": "embedding",
                "query_vector": query_embedding.tolist(),
                "k": k,
                "num_candidates": 100,
                "filter": [{"term": {"modality": modality}}] if modality else [],
            }
        }

        try:
            response = self.es.search(index=self.index_name, query=query, size=k)

            # Return both source data and score for each hit
            return [
                {**hit["_source"], "score": hit["_score"]}
                for hit in response["hits"]["hits"]
            ]

        except Exception as e:
            print(f"Error: processing search_evidence: {str(e)}")
            return "Error generating search evidence"
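
For orientation, here is a small usage sketch of the class above, run outside the staged pipeline. The import path follows the README's project structure, the embedding is a random stand-in for a real ImageBind vector, and the description and file path are illustrative only:

```python
import numpy as np

from src.elastic_manager import ElasticsearchManager  # path per the README's project structure

manager = ElasticsearchManager()  # loads .env and creates the index if missing

# Random 1024-dim vector standing in for a real ImageBind embedding
fake_embedding = np.random.rand(1024).astype(np.float32)

manager.index_content(
    embedding=fake_embedding,
    modality="text",
    description="Mysterious note found at Gotham Central Bank",
    content_path="data/texts/riddle.txt",  # illustrative path
)

# k-NN search restricted to the "text" modality
for hit in manager.search_similar(fake_embedding, modality="text", k=3):
    print(hit["score"], hit.get("description"))
```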