Revert default to 8650 for llama #5100
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5100
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 17fde41 with merge base e4a2322.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@@ -105,6 +105,7 @@ def get_coreml_partitioner(


def get_qnn_partitioner(
    soc_model,
Suggested change:
-    soc_model,
+    soc_model = SM8650,
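For illustration, a minimal sketch of what the suggested default might look like (the surrounding parameters are assumptions taken from the call site further down, and the real signature lives in extension/llm/export/partitioner_lib.py and may differ):

# Sketch only: in the real code the default may be a QcomChipset enum rather than a string.
def get_qnn_partitioner(
    soc_model="SM8650",          # suggested default so existing callers keep working
    use_kv_cache: bool = False,  # remaining parameters assumed from the llama call site
    pt2e_quantize=None,
    num_sharding: int = 0,
):
    ...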
Add a log here.
Added a log entry here.
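For reference, the kind of log being discussed could look roughly like this (the message wording is illustrative, not taken from the PR):

import logging

logging.getLogger(__name__).info(
    "soc_model not provided; defaulting to SM8650. "
    "Pass the chipset explicitly if you are targeting a different device."
)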
Providing a default could be error-prone. Users must know which target device the model will run on, so to avoid issues from a silent mismatch, it's better to ask users to explicitly provide the chipset info when requesting a qnn_partitioner.
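One way to enforce an explicit chipset, as this comment suggests, is to validate it up front. A sketch under the assumption that soc_model has no default (names and the error message are illustrative):

def get_qnn_partitioner(soc_model, use_kv_cache=False, pt2e_quantize=None, num_sharding=0):
    # Refuse to guess the target device: a silently assumed SM8650 can produce a
    # model that fails to load on other chipsets (the mismatch reported in #4973).
    if soc_model is None:
        raise ValueError(
            "soc_model is required; pass the QcomChipset matching your target device."
        )
    ...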
I'm not sure how different it is compared to using SM8650 directly in partitioner_lib.py. This partitioner_lib.py is only used in export_llama_lib.py, so either we hardcode it inside the code or one level above. Neither of them is configurable by users.
Is partitioner_lib user-facing? My understanding is that when users work on their model, they should get the partitioner that matches the target device in their own script. export_llama_lib.py is dedicated to llama as an example, and you just told me that it won't work for SM8450. So what's the issue with hard-coding SM8650 in export_llama_lib.py?
from executorch.extension.llm.custom_ops import model_sharding

partitioners.append(
    get_qnn_partitioner(
        args.use_kv_cache, args.pt2e_quantize, args.num_sharding
        QcomChipset.SM8650,  # Llama 2 works only on SM8650
It still defaults to SM8650, right?
I think we need to add an arg to the export_llama script.
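A hypothetical sketch of such an argument for the export_llama argument parser (the flag name, default, and list of choices are assumptions, not the actual CLI):

import argparse

# In the real script this would be added to the existing export_llama parser.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--qnn_soc_model",
    type=str,
    default="SM8650",
    choices=["SM8450", "SM8475", "SM8550", "SM8650"],
    help="Qualcomm SoC the lowered model will run on; must match the target device.",
)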
get_qnn_partitioner is only used here per https://github.com/search?q=repo%3Apytorch%2Fexecutorch%20get_qnn_partitioner&type=code. I'm curious how other non-genai models are partitioned.
As I said, this partitioner_lib is only used by llm and not used anywhere else. This is controlled by the compile spec when constructing the qnn partitioner. Should we just revert the change and use the default value?
For other non-genai models, they're using this function instead:
executorch/examples/qualcomm/utils.py, line 185 in d23548b (`soc_model,`)
Are all LLMs always going to use SM8650, so you want it hard-coded inside the partitioner_lib?
@cccclai As titled, this PR is to make it possible to get a qnn partitioner matching the target device, so that users can use it in their own lowering script for other LLMs. Sure, today it's only used by the llama example in our codebase, and the llama example wants it set to SM8650, but I fail to understand why you prefer reverting the changes and hard-coding it back inside partitioner_lib after people have been hitting the chipset mismatch issue in #4973. I will leave the decision up to you, as you have more expertise and are the PoC of the QNN delegate.
I'll be fine with that. And notice that extension/llm/export/partitioner_lib.py is under the llm folder, meaning it's only for llm. I don't think we will have bandwidth to test on devices other than S24 before PTC.
Also, the real qnn partitioner is defined here:
executorch/backends/qualcomm/partition/qnn_partitioner.py, lines 100 to 114 in b8a2cbd
class QnnPartitioner(Partitioner):
    def __init__(
        self,
        compiler_specs: List[CompileSpec],
        skip_node_id_set: set = None,
        skip_node_op_set: set = None,
    ):
        self.compiler_specs_snapshot = copy.deepcopy(compiler_specs)
        self.delegation_spec = DelegationSpec(
            QnnBackend.__name__, self.compiler_specs_snapshot
        )
        self.partition_tags: Dict[str, DelegationSpec] = {}
        self.skip_node_id_set = set() if skip_node_id_set is None else skip_node_id_set
        self.skip_node_op_set = set() if skip_node_op_set is None else skip_node_op_set
The file you're changing is just a util function wrapped around qnn partitioner.
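To illustrate how the soc_model actually reaches the backend, here is a rough usage sketch of building compile specs and constructing the partitioner. The helper names and import paths are approximations and have moved between executorch versions, so treat this as a sketch rather than the exact API:

# Import paths approximate; QcomChipset in particular has lived in different modules.
from executorch.backends.qualcomm.partition.qnn_partitioner import QnnPartitioner
from executorch.backends.qualcomm.serialization.qnn_compile_spec_schema import QcomChipset
from executorch.backends.qualcomm.utils.utils import (
    generate_htp_compiler_spec,
    generate_qnn_executorch_compiler_spec,
)

# The chipset is baked into the compile specs, which QnnPartitioner then snapshots.
compile_specs = generate_qnn_executorch_compiler_spec(
    soc_model=QcomChipset.SM8650,  # must match the device the model will run on
    backend_options=generate_htp_compiler_spec(use_fp16=True),
)
partitioner = QnnPartitioner(compile_specs)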
ba901f1 to 7edf990
@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
7edf990 to 17fde41
@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Thanks!