Commit 9a0b378

test green
Signed-off-by: Chris Abraham <[email protected]>
1 parent f0de27a commit 9a0b378

1 file changed: +16 -0 lines changed

_posts/2024-12-19-improve-rag-performance.md

Lines changed: 16 additions & 0 deletions
@@ -4,6 +4,22 @@ title: "Improve RAG performance with torch.compile on AWS Graviton Processors"
 author: Sunita Nadampalli(AWS), Ankith Gunapal(Meta), Hamid Shojanazeri(Meta)
 ---
 
+```html
+<pre><code class="language-python">
+<span style="color: green;">print("This line is green")</span>
+print("This line is normal")
+<span style="color: green;">x = 10</span>
+</code></pre>
+```
+
+<div class="code-block">
+<pre>
+<span style="color: green;">let x = 10;</span>
+console.log(x);
+</pre>
+</div>
+
+
 Large Language Models (LLMs) are trained on vast volumes of data and use billions of parameters to support tasks like answering questions, translating languages, and completing sentences. There are a few challenges when working with LLMs such as domain knowledge gaps, factuality issues, and hallucination, which affect their reliability especially for the fields that require high levels of accuracy, such as healthcare, law, or engineering. Retrieval Augmented Generation (RAG) provides a solution to mitigate some of these issues by augmenting LLMs with a specific domain or an organization's internal knowledge base, without the need to retrain the model.
 
 The RAG knowledge source is generally business specific databases which are typically deployed on general-purpose CPU infrastructure. So, deploying RAG on general-purpose CPU infrastructure alongside related business services is both efficient and cost-effective. With this motivation, we evaluated RAG deployment on [AWS Graviton](https://aws.amazon.com/ec2/graviton/) based Amazon EC2 instances which have been delivering up to [40% price-performance advantage](https://aws.amazon.com/ec2/graviton/getting-started/) compared to comparable instances for the majority of the workloads including databases, in-memory caches, big data analytics, media codecs, gaming servers, and machine learning inference.
