
Add Initial Compile for Llama 3.2 11B: Decoder TransformerSelfAttentionLayer, TransformerCrossAttentionLayer #1287


Merged
1 commit merged into main on Oct 10, 2024

Conversation

Jack-Khuu (Contributor)

Based on https://github.com/pytorch/torchtune/blob/57ab583c84c4a9dcacac23aeabc81f2a679670fe/torchtune/training/_compile.py#L42-L52

Compile as much of the Llama 3.2 11B Vision model from torchtune as we can.
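
The torchtune helper linked above compiles the decoder layer by layer rather than wrapping the whole model in a single torch.compile call. A rough sketch of that per-layer approach (the function name and backend argument here are illustrative, not the exact code in this PR):

    import torch
    from torchtune.modules import (
        TransformerCrossAttentionLayer,
        TransformerSelfAttentionLayer,
    )

    def compile_decoder_layers(model: torch.nn.Module, backend: str = "inductor") -> None:
        # Compile each self-/cross-attention layer individually so a graph break
        # in one layer does not force the whole decoder back to eager mode.
        for module in reversed(list(model.modules())):
            if isinstance(module, (TransformerSelfAttentionLayer, TransformerCrossAttentionLayer)):
                module.compile(backend=backend)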


Testing:

 python torchchat.py generate llama3.2-11B --prompt "What's in this image?" --image-prompt assets/dog.jpg --compile

Run it a few times to amortize compile/caching overhead; the numbers below are from the 4th iteration.

What's in this image?The image is of a dog wearing sunglasses and riding a skateboard in the street. The dog is white with some brown spots on its face and ears, and it has its tongue out. It's wearing red sunglasses and a blue collar, and its front paws are on a red penny board with yellow wheels. The background is a road with green trees on either side.
2024-10-09:18:43:33,307 INFO     [generate.py:1158]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generated 74 tokens
Time for inference 3: 1.9990 sec total
Time to first token: 0.1759 sec with parallel prefill.
Total throughput: 37.5180 tokens/sec, 0.0267 s/token
First token throughput: 5.6850 tokens/sec, 0.1759 s/token
Next token throughput: 40.5893 tokens/sec, 0.0246 s/token
2024-10-09:18:43:33,307 INFO     [generate.py:1169] 
Bandwidth achieved: 798.83 GB/s

pytorch-bot bot commented Oct 10, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1287

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 1b83644 with merge base 438ebb1:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@Jack-Khuu Jack-Khuu requested a review from iseeyuan October 10, 2024 01:49
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 10, 2024
Comment on lines +943 to +946
    else:
        self.decode_one_token = torch.compile(
            self.decode_one_token, fullgraph=True, **kwargs
        )
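
For context, fullgraph=True asks TorchDynamo to capture the whole decode step as one graph and to raise on graph breaks rather than silently falling back to eager. A minimal standalone sketch of compiling a one-token decode step; the function and argument names below are assumed for illustration, not torchchat's actual implementation:

    import torch

    # Illustrative one-token decode step (names assumed): run one forward pass
    # at position `input_pos` and greedily pick the next token.
    def decode_one_token(model, x, input_pos):
        logits = model(x, input_pos=input_pos)
        return torch.argmax(logits[:, -1], dim=-1)

    # fullgraph=True errors out on graph breaks instead of falling back to eager,
    # so the per-token decode runs as a single compiled graph.
    decode_one_token = torch.compile(decode_one_token, fullgraph=True, mode="reduce-overhead")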


Out of curiosity, how long does it take to compile in this case?

Jack-Khuu (Contributor, Author)

About 2 minutes for Llama 3.1.
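
That one-time cost is why the benchmark above is taken from a later iteration: the first compiled call pays the compilation cost, and subsequent calls reuse the compiled artifact. A tiny self-contained illustration of the effect (a toy function, not the model):

    import time
    import torch

    # Toy compiled function to show one-time compile cost vs. steady-state cost.
    fn = torch.compile(lambda x: torch.nn.functional.gelu(x @ x))
    x = torch.randn(1024, 1024)

    t0 = time.perf_counter()
    fn(x)                      # first call triggers compilation
    t1 = time.perf_counter()
    fn(x)                      # second call reuses the compiled artifact
    t2 = time.perf_counter()
    print(f"first call (incl. compile): {t1 - t0:.2f}s, second call: {t2 - t1:.4f}s")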

@iseeyuan (Contributor) left a comment


LGTM. Thanks!

@Jack-Khuu Jack-Khuu merged commit 1371a41 into main Oct 10, 2024
52 checks passed
@Jack-Khuu Jack-Khuu deleted the compile_mm branch January 24, 2025 23:48