Commit 9efed19

tweaks

Signed-off-by: Chris Abraham <[email protected]>
1 parent ffd7eed commit 9efed19

1 file changed: +3 −4 lines changed

_posts/2024-12-19-improve-rag-performance.md

Lines changed: 3 additions & 4 deletions
@@ -182,7 +182,7 @@ Self CPU time total: 2.537s
 ```


-**Table 4:** Profiler output for HuggingFace sentence-transformer embedding model inference on AWS Graviton3-based m7g.xlarge instance with torch.compile, weights pre-packing, and inference_mode
+**Table 4:** Profiler output for HuggingFace sentence-transformer embedding model inference on AWS Graviton3-based m7g.xlarge instance with torch.compile, weights pre-packing, and inference_mode

 The following table shows the incremental performance improvements achieved for the standalone embedding model inference.


@@ -240,8 +240,7 @@ The following script is an updated example for the embedding model inference wit

 ### End-to-End RAG scenario on CPU

-** \
-**After optimizing the embedding model inference, we started with a PyTorch eager mode based RAG setup, mainly to validate the functionality on the CPU backend. We built the RAG solution with[ HuggingFaceEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain_community.embeddings.huggingface.HuggingFaceEmbeddings.html) from `langchain_community.embeddings`, as shown in the following code snippet.
+After optimizing the embedding model inference, we started with a PyTorch eager mode based RAG setup, mainly to validate the functionality on the CPU backend. We built the RAG solution with[ HuggingFaceEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain_community.embeddings.huggingface.HuggingFaceEmbeddings.html) from `langchain_community.embeddings`, as shown in the following code snippet.


 ```
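The hunk above describes wiring the embedding model into an end-to-end RAG pipeline. The retrieval step that pipeline performs can be sketched hypothetically with plain PyTorch cosine similarity (the post itself builds this on LangChain's HuggingFaceEmbeddings; the precomputed embeddings below stand in for real model output):

```python
import torch
import torch.nn.functional as F

# Hypothetical retrieval sketch: normalized document and query embeddings
# stand in for HuggingFaceEmbeddings output from the post's pipeline.
doc_embs = F.normalize(torch.tensor([[1.0, 0.0],
                                     [0.0, 1.0],
                                     [0.7, 0.7]]), dim=1)
query_emb = F.normalize(torch.tensor([[0.9, 0.1]]), dim=1)

# Cosine similarity of the query against every document, then top-k retrieval.
scores = query_emb @ doc_embs.T
topk = torch.topk(scores, k=2, dim=1).indices

print(topk.tolist())  # [[0, 2]]
```

The retrieved chunks would then be concatenated into the prompt for the generation model, which is the part of the pipeline the eager-mode validation above exercises.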
@@ -419,7 +418,7 @@ We would like to express our gratitude to Eli Uriegas for the support in making

 ## Authors

-**Sunita Nadampalli **is a Principal Engineer and AI/ML expert at AWS. She leads AWS Graviton software performance optimizations for AI/ML and HPC workloads. She is passionate about open source software development and delivering high-performance and sustainable software solutions for SoCs based on the Arm ISA.
+**Sunita Nadampalli** is a Principal Engineer and AI/ML expert at AWS. She leads AWS Graviton software performance optimizations for AI/ML and HPC workloads. She is passionate about open source software development and delivering high-performance and sustainable software solutions for SoCs based on the Arm ISA.

 **Ankith Gunapal** is an AI Partner Engineer at Meta (PyTorch). He leads customer support, evangelizing & release engineering of TorchServe. He is passionate about solving production problems in model inference and model serving. He also enjoys distilling technically complex material in a user friendly format
