Supporting blog content/llama index re ranker and elasticsearch re ranker #449
base: main
Conversation
Found 1 changed notebook. Review the changes at https://app.gitnotebooks.com/elastic/elasticsearch-labs/pull/449
I've added a few pieces of feedback. I'm keen to understand the in-memory vector store choice specifically.
"source": [
"# LlamaIndex re-ranker and Elasticsearch re-ranker: Comparison review\n",
"\n",
"This notebook demonstrates how to use AutoGen with Elasticsearch. This notebook is based on the article [LlamaIndex re-ranker and Elasticsearch re-ranker: Comparison review](https://www.elastic.co/search-labs/blog/llamaIndex-reranker-and-elasticsearch-reranker-comparison-review)."
Can you make sure the description matches the piece? This is using the old title which is misleading as discussed before.
Oh! I'm so sorry for that mistake, it was fixed in the most recent commit.
},
"outputs": [],
"source": [
"%pip install llama-index-core llama-index-llms-openai rank-llm llama-index-postprocessor-rankgpt-rerank llama-index-embeddings-openai"
Is this list of installs complete? If I remember correctly, getpass is available in the Python standard library, but I'm not sure that nest_asyncio is. Can you confirm?
Good catch: getpass is part of the Python standard library, but nest_asyncio is a third-party package on PyPI, so it does need to be added to the install list.
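For context, the problem nest_asyncio works around can be shown with the standard library alone. In a Jupyter kernel an event loop is already running, so a plain `asyncio.run()` inside it fails; the sketch below (stdlib only, no nest_asyncio needed to run it) reproduces that error by nesting the call inside a running loop:

```python
import asyncio


async def fetch():
    return 42


async def main():
    # Inside an already-running event loop, asyncio.run() refuses to
    # start a second loop and raises RuntimeError -- the situation
    # nest_asyncio patches around in notebooks.
    try:
        asyncio.run(fetch())
        return "ok"
    except RuntimeError as err:
        return type(err).__name__


print(asyncio.run(main()))  # RuntimeError
```

Applying `nest_asyncio.apply()` first makes the nested `asyncio.run()` call succeed, which is why the notebook imports it.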
},
"outputs": [],
"source": [
"%pip install llama-index-core llama-index-llms-openai rank-llm llama-index-postprocessor-rankgpt-rerank llama-index-embeddings-openai"
Does it make sense to use a requirements.txt instead?
I think that using a requirements.txt file adds unnecessary complexity to the notebook. The standard approach is to use the pip install command in individual notebooks.
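For reference, if a requirements file were preferred instead, the packages from the notebook's `%pip install` line would simply be listed one per line (versions omitted here; pinning them is optional):

```text
llama-index-core
llama-index-llms-openai
rank-llm
llama-index-postprocessor-rankgpt-rerank
llama-index-embeddings-openai
```

Either approach installs the same set; the in-notebook `%pip install` just keeps the notebook self-contained.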
"\n",
" document_objects.append(Document(text=text_content))\n",
"\n",
"index = VectorStoreIndex.from_documents(document_objects)"
Is there a particular reason that you're using an in-memory vector store at this point and not Elasticsearch? I found this surprising, especially as we are still making reference to comparing the journeys in the piece. It doesn't feel like a fair comparison to me. Should this example be changed to use Elasticsearch as the vector store?
If there's a reason why not and we agree, can you add a comment in this section that you are using the in-memory store at this point and not Elasticsearch? Until I got to that section of the article it wasn't clear to me.
The idea was to compare how easy it is to set up each of the two approaches for reranking, and to explain that the quickstart effort is similar, but with LlamaIndex you end up with a local-only implementation, while Elasticsearch unblocks a more robust solution. The focus was on reranking, not on the data store.
Another way to see it is to compare against an existing Elasticsearch implementation and include the data store in the comparison. In that case, what you say makes sense and we should use the Elasticsearch vector store.
After giving it more thought, I prefer the alternative your comments suggest: building the same implementation for both, rather than only comparing reranking, will be more valuable and straightforward to follow.
We will make the changes and let you know. Thanks for your feedback!
Hi again, @carlyrichmond. I made some changes to the notebook's source code. Now, both tests use the same Elasticsearch index to store and read the dataset.
Let me know if this makes sense to you.
"source": [
"## Cleaning environment\n",
"\n",
"Delete the resources created by this notebook so they don't keep consuming cluster resources."
Do we need to do any cleanup of resources for the RankGPT reranker stage?
No, it's not needed. RankGPTRerank is just an object instance, and there's no special resource allocation that requires cleanup.