Switch the order of the to_dtype function and source transform

cccclai · cccclai · commit 1f9a1c01eb46 · 2024-05-28T22:00:49.000-07:00
Pull Request resolved: #3757

Quantization runs during the source transform, and some quantization infra doesn't support bf16 yet. Move to_dtype one stage earlier so that fp32 can be selected as the dtype before the quantization transform runs.

ghstack-source-id: 228051128
Differential Revision: [D57883363](https://our.internmc.facebook.com/intern/diff/D57883363/)
diff --git a/examples/models/llama2/export_llama_lib.py b/examples/models/llama2/export_llama_lib.py
@@ -374,8 +374,8 @@ def _prepare_for_llama_export(modelname: str, args) -> LlamaEdgeManager:
         )
         .set_output_dir(output_dir_path)
         .set_metadata(args.metadata)
-        .source_transform(transforms)
         .to_dtype(dtype_override)
+        .source_transform(transforms)
     )
 
 

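The reordering matters because each chained call acts on the state left by the previous one. The following is a minimal sketch of that effect, not the real implementation: `ToyManager`, `toy_quantize`, and the string dtypes are hypothetical stand-ins for `LlamaEdgeManager`, the quantization transform, and the actual dtypes, and the bf16 starting dtype is an assumption for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyManager:
    """Hypothetical stand-in for the builder chain; not the real LlamaEdgeManager."""

    dtype: str = "bf16"  # assumed dtype of the loaded checkpoint
    applied: List[str] = field(default_factory=list)

    def to_dtype(self, dtype_override):
        # Only change the dtype when an override is given.
        if dtype_override is not None:
            self.dtype = dtype_override
        self.applied.append(f"to_dtype({self.dtype})")
        return self

    def source_transform(self, transforms):
        # Apply each transform to the manager in its current state.
        for transform in transforms:
            transform(self)
            self.applied.append(transform.__name__)
        return self


def toy_quantize(manager):
    # Assumed limitation, mirroring the commit message: no bf16 support.
    if manager.dtype == "bf16":
        raise TypeError("toy quantization does not support bf16")


# New order (this commit): cast to fp32 first, then run the transforms.
new_order = ToyManager().to_dtype("fp32").source_transform([toy_quantize])
print(new_order.applied)

# Old order: the transform sees bf16 before the cast and fails.
try:
    ToyManager().source_transform([toy_quantize]).to_dtype("fp32")
    old_order_failed = False
except TypeError:
    old_order_failed = True
print(old_order_failed)
```

In this toy model the old order raises before `to_dtype` is ever reached, which is the failure mode the commit message describes for quantization infra that lacks bf16 support.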