Bump torchao pin, adjust llama export to support pre-quantization via quantize_ (phi4-mini load/export) #10142

metascroy · 2025-04-14T18:30:57Z

This makes changes to ET so that phi-4 checkpoints saved with:

linear_config = Int8DynamicActivationIntxWeightConfig(
    weight_dtype=torch.int4,
    weight_granularity=PerGroup(32),
    weight_mapping_type=MappingType.SYMMETRIC,
    weight_zero_point_domain=ZeroPointDomain.NONE,
)

can load into ET and export and lower to XNNPACK.

cc @mcr229 for XNNPACK changes. The required change is that if ZeroPointDomain is NONE, the zero_point is serialized as None rather than as a tensor of zeros.
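As a rough illustration of why dropping the zero point is safe for symmetric quantization, here is a minimal pure-Python sketch. It is not the ExecuTorch or XNNPACK serializer: `quantize_group_symmetric`, `dequantize_group`, `serialize_zero_point`, and the string `"NONE"` standing in for `ZeroPointDomain.NONE` are all hypothetical names chosen for this example.

```python
from typing import List, Optional, Tuple

# Signed 4-bit integer range, matching weight_dtype=torch.int4.
INT4_MIN, INT4_MAX = -8, 7


def quantize_group_symmetric(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric quantization of one weight group: a scale only, no zero point."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
    q = [min(INT4_MAX, max(INT4_MIN, round(v / scale))) for v in values]
    return q, scale


def dequantize_group(
    q: List[int], scale: float, zero_point: Optional[int]
) -> List[float]:
    """Treat a missing zero point exactly like a zero point of 0."""
    zp = 0 if zero_point is None else zero_point
    return [(x - zp) * scale for x in q]


def serialize_zero_point(
    zero_point_domain: str, num_groups: int
) -> Optional[List[int]]:
    """Hypothetical serializer: emit None instead of zeros for the NONE domain."""
    if zero_point_domain == "NONE":
        return None
    return [0] * num_groups


weights = [0.5, -1.0, 0.25, 0.75]
q, scale = quantize_group_symmetric(weights)
# Dequantizing with no zero point gives the same values as with a zero point of 0.
same = dequantize_group(q, scale, None) == dequantize_group(q, scale, 0)
```

Under these assumptions, a tensor of zeros carries no information that `None` does not, so the serialized form can omit it.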

cc @jackzhxng for phi4-mini changes

Output of phi4 model in ExecuTorch with above quantization:

A California roll is a type of sushi roll that originated in California. It's a variation of the traditional sushi roll, adapted to American preferences and ingredients.

In a classic California roll, you would generally find a combination of ingredients wrapped in a seaweed wrap (nori). Some common ingredients included in a California roll are:

1. Cucumber (or a similar vegetable like a jicama or an English cucumber for a less watery texture)
2. Avocado (for a creamier texture and a slightly milder flavor)
3. Smoked or grilled salmon (or another type

pytorch-bot · 2025-04-14T18:31:00Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10142

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 74a6aee with merge base cd72ec0:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jackzhxng

Looks good on the ET side, a few comments.

jackzhxng · 2025-04-14T20:00:45Z

examples/models/phi_4_mini/convert_weights.py

-        output_dir=".",
-        model_type="PHI4",
-    )
+    if os.path.isdir(input_dir_or_checkpoint):


Can we add a comment somewhere explicitly detailing that:

FullModelHFCheckpointer is for a directory (which would be straight from HF)

phi_4_hf_to_meta is used for a single checkpoint, and the use case is pre-quantized checkpoints
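The distinction requested above could be documented roughly as in the sketch below. This is not the code from convert_weights.py: `load_checkpoint` and the two stand-in loaders are hypothetical, and only the `os.path.isdir` dispatch on `input_dir_or_checkpoint` comes from the diff.

```python
import os
import tempfile
from typing import Callable, Dict


def load_checkpoint(
    input_dir_or_checkpoint: str,
    load_from_hf_dir: Callable[[str], Dict[str, str]],
    load_single_file: Callable[[str], Dict[str, str]],
) -> Dict[str, str]:
    """Dispatch on the kind of path that was passed in.

    - Directory: assumed to be an unmodified Hugging Face download.
    - Single file: assumed to be one checkpoint file, for example a
      pre-quantized checkpoint.
    """
    if os.path.isdir(input_dir_or_checkpoint):
        return load_from_hf_dir(input_dir_or_checkpoint)
    return load_single_file(input_dir_or_checkpoint)


# Stand-in loaders (hypothetical) that only report which path was taken.
def fake_hf_dir_loader(path: str) -> Dict[str, str]:
    return {"source": "hf_directory"}


def fake_single_file_loader(path: str) -> Dict[str, str]:
    return {"source": "single_checkpoint"}


with tempfile.TemporaryDirectory() as tmp_dir:
    checkpoint_file = os.path.join(tmp_dir, "checkpoint.bin")
    with open(checkpoint_file, "wb") as f:
        f.write(b"")
    from_dir = load_checkpoint(tmp_dir, fake_hf_dir_loader, fake_single_file_loader)
    from_file = load_checkpoint(
        checkpoint_file, fake_hf_dir_loader, fake_single_file_loader
    )
```

Putting the explanation in the docstring next to the branch keeps the two use cases visible at the point where the path type is checked.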

jackzhxng · 2025-04-14T20:01:55Z

examples/models/llama/model.py

@@ -257,6 +258,9 @@ def __init__(self, **kwargs):
                strict=False,
                assign=True,
            )  # self.model_ = Transformer(gptconf)
+            for param in self.model_.parameters():
+                if isinstance(param, TorchAOBaseTensor):
+                    param.requires_grad = False


I think we might as well set requires_grad = False across the board, not just for TorchAOBaseTensor.

I'm not sure if on-device training stuff might not want that

mcr229

XNNPACK parts look ok

metascroy · 2025-04-16T03:24:48Z

Fixing compile issues here: pytorch/ao#2063

facebook-github-bot · 2025-04-16T22:21:27Z

@metascroy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2025-04-17T00:51:40Z

@metascroy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2025-04-17T02:35:39Z

@metascroy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2025-04-17T16:28:34Z

@metascroy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

… quantize_ (phi4-mini load/export) Differential Revision: D73147002 Pull Request resolved: pytorch#10142

metascroy requested review from jackzhxng, iseeyuan, larryliu0820, swolchok, lucylq, digantdesai and mcr229 as code owners April 14, 2025 18:30

facebook-github-bot added the CLA Signed label Apr 14, 2025

jackzhxng reviewed Apr 14, 2025

mcr229 approved these changes Apr 14, 2025

metascroy requested a review from GregoryComer as a code owner April 16, 2025 02:53

metascroy added the topic: not user facing label Apr 16, 2025

metascroy changed the title ~~Updates make phi4-mini load/export with torchao subclass~~ Bump torchao pin, adjust llama export to support pre-quantization via quantize_ (phi4-mini load/export) Apr 16, 2025

metascroy mentioned this pull request Apr 16, 2025

Fix compile issues pytorch/ao#2063

Merged

metascroy added the ciflow/trunk label Apr 16, 2025

metascroy force-pushed the phi4-export branch from e66791b to a6e5521 Compare April 16, 2025 18:08

metascroy requested review from JacobSzwejbka and tarun292 as code owners April 16, 2025 22:20

metascroy force-pushed the phi4-export branch from f5e0eb8 to b687fd9 Compare April 17, 2025 02:34

metascroy and others added 6 commits April 17, 2025 09:27

init

d40a182

up

9be299a

up

bebe4ff

up

d39b527

up

d504145

up

0f11985

metascroy added 3 commits April 17, 2025 09:27

up

457a085

up

1218e5a

up

74a6aee

metascroy force-pushed the phi4-export branch from 6bb3801 to 74a6aee Compare April 17, 2025 16:28

facebook-github-bot merged commit ef99fff into main Apr 17, 2025
92 checks passed

facebook-github-bot deleted the phi4-export branch April 17, 2025 23:01

keyprocedure pushed a commit to keyprocedure/executorch that referenced this pull request Apr 21, 2025

Bump torchao pin, adjust llama export to support pre-quantization via…

aaef69e

… quantize_ (phi4-mini load/export) Differential Revision: D73147002 Pull Request resolved: pytorch#10142
