
Revert default to 8650 for llama #5100


Merged
merged 1 commit into main on Sep 5, 2024

Conversation

Contributor

@guangy10 guangy10 commented Sep 5, 2024

No description provided.


pytorch-bot bot commented Sep 5, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5100

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 17fde41 with merge base e4a2322:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 5, 2024
@guangy10 guangy10 changed the base branch from main to match_qnn_chipset_w_device_pool September 5, 2024 01:09
@@ -105,6 +105,7 @@ def get_coreml_partitioner(


def get_qnn_partitioner(
    soc_model,
Contributor

Suggested change:
-    soc_model,
+    soc_model = SM8650,

Contributor

Add a log here.

Contributor Author

Added a log entry here.

Contributor Author

Providing a default could be error-prone. Users must know which target device the model will run on, so to avoid issues caused by a silent mismatch it's better to ask users to explicitly provide the chipset info when requesting a qnn_partitioner.
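To illustrate the pattern being argued for, here is a minimal sketch of an explicit-chipset call. The keyword form, argument values, and import path below are assumptions for illustration, not the exact executorch signature:

# Hypothetical call site: the caller names the target SoC explicitly,
# so exporting for the wrong device fails loudly instead of silently.
from executorch.backends.qualcomm.utils.utils import QcomChipset  # assumed import path

partitioner = get_qnn_partitioner(
    soc_model=QcomChipset.SM8650,  # explicit target chipset, no hidden default
    use_kv_cache=True,
    pt2e_quantize=None,
    num_sharding=0,
)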

Contributor

I'm not sure how different it is compared to using SM8650 directly in partitioner_lib.py. partitioner_lib.py is only used in export_llama_lib.py, and the chipset is either hardcoded inside the code or one level above; neither is configurable by users.

Contributor Author

Is partitioner_lib user-facing? My understanding is that when users work on their own model, they should get the partitioner that matches the target device in their script. export_llama_lib.py is dedicated to llama as an example, and you just told me that it won't work for SM8450. So what's the issue with hard-coding SM8650 in export_llama_lib.py?

from executorch.extension.llm.custom_ops import model_sharding

partitioners.append(
    get_qnn_partitioner(
        args.use_kv_cache, args.pt2e_quantize, args.num_sharding
        QcomChipset.SM8650,  # Llama 2 works only on SM8650
Contributor

It still defaults to SM8650, right?

Contributor

I think we need to add an arg to the export_llama script.
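For illustration only, a rough sketch of what such a flag could look like; the flag name, choices, and default below are hypothetical, not what the script actually exposes:

# Hypothetical sketch of a chipset flag for the export_llama script.
# The flag name, choices, and default are assumptions, not the real CLI.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--soc_model",
    type=str,
    default="SM8650",
    choices=["SM8450", "SM8550", "SM8650"],
    help="Target Qualcomm chipset for the QNN partitioner.",
)
args = parser.parse_args()
print(f"Lowering llama for chipset: {args.soc_model}")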

Contributor Author

get_qnn_partitioner is only used here per https://github.com/search?q=repo%3Apytorch%2Fexecutorch%20get_qnn_partitioner&type=code.

I'm curious how other non-genai models are partitioned.

Contributor

@cccclai cccclai Sep 5, 2024

As I said, this partitioner_lib is only used by llm and isn't used anywhere else. The chipset choice is controlled by the compile spec when constructing the qnn partitioner. Should we just revert the change and use the default value?

For other non-genai models, they're using this function instead
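For context on where the chipset actually lives, a rough sketch of building compile specs and handing them to the partitioner; the helper names and import paths are assumptions based on the Qualcomm backend utilities, not verified against this revision:

# Sketch only: the chipset is encoded in the compile specs passed to QnnPartitioner,
# not in the partitioner class itself. Helper names and paths below are assumptions.
from executorch.backends.qualcomm.partition.qnn_partitioner import QnnPartitioner
from executorch.backends.qualcomm.utils.utils import (
    generate_htp_compiler_spec,
    generate_qnn_executorch_compiler_spec,
)
from executorch.backends.qualcomm.serialization.qnn_compile_spec_schema import QcomChipset

backend_options = generate_htp_compiler_spec(use_fp16=False)
compile_specs = generate_qnn_executorch_compiler_spec(
    soc_model=QcomChipset.SM8650,  # the chipset choice is made here
    backend_options=backend_options,
)
partitioner = QnnPartitioner(compile_specs)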

Contributor Author

Are all llms always going to use SM8650, so you want it hard-coded inside the partitioner_lib?

Contributor Author

@guangy10 guangy10 Sep 5, 2024

@cccclai As titled, this PR is to make it possible to get a qnn partitioner matching the target device so that users can use it in their own lowering script for other LLMs. Sure, today it's only used by the llama example in our codebase, and the llama example wants it set to SM8650, but I fail to understand why you prefer reverting the changes and hard-coding it back inside partitioner_lib after people experienced the chipset mismatch issue in #4973. I will leave the decision up to you as you have more expertise and are the PoC for the QNN delegate.

Contributor

I'll be fine with that. And notice that extension/llm/export/partitioner_lib.py is under the llm folder, meaning it's only for llm. I don't think we will have the bandwidth to test on devices other than the S24 before PTC.

Contributor

Also, the real qnn partitioner is defined here:

class QnnPartitioner(Partitioner):
    def __init__(
        self,
        compiler_specs: List[CompileSpec],
        skip_node_id_set: set = None,
        skip_node_op_set: set = None,
    ):
        # Snapshot the compile specs (which carry the chipset choice) and use them
        # to build the delegation spec for the QNN backend.
        self.compiler_specs_snapshot = copy.deepcopy(compiler_specs)
        self.delegation_spec = DelegationSpec(
            QnnBackend.__name__, self.compiler_specs_snapshot
        )
        self.partition_tags: Dict[str, DelegationSpec] = {}
        self.skip_node_id_set = set() if skip_node_id_set is None else skip_node_id_set
        self.skip_node_op_set = set() if skip_node_op_set is None else skip_node_op_set

The file you're changing is just a util function wrapped around qnn partitioner.
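Put differently, the util in partitioner_lib amounts to roughly this shape; the function name and body below are illustrative only, not the actual partitioner_lib code:

# Illustrative shape of the util wrapper: build compile specs for the requested SoC
# and hand them to the real QnnPartitioner. Reuses the assumed helpers from the
# compile-spec sketch above; the function name is hypothetical.
def get_qnn_partitioner_sketch(soc_model):
    backend_options = generate_htp_compiler_spec(use_fp16=False)
    compile_specs = generate_qnn_executorch_compiler_spec(
        soc_model=soc_model,
        backend_options=backend_options,
    )
    return QnnPartitioner(compile_specs)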

Base automatically changed from match_qnn_chipset_w_device_pool to main September 5, 2024 01:52
@guangy10 guangy10 force-pushed the get_qnn_partitioner_with_soc_model branch from ba901f1 to 7edf990 Compare September 5, 2024 02:00
@guangy10 guangy10 requested a review from cccclai September 5, 2024 02:00
@facebook-github-bot
Contributor

@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@guangy10 guangy10 force-pushed the get_qnn_partitioner_with_soc_model branch from 7edf990 to 17fde41 Compare September 5, 2024 20:06
@guangy10 guangy10 changed the title Get qnn partitioner with soc model Revert default to 8650 for llama Sep 5, 2024
@facebook-github-bot
Contributor

@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Contributor

@cccclai cccclai left a comment

Thanks!

@facebook-github-bot facebook-github-bot merged commit 8fb1def into main Sep 5, 2024
37 checks passed
@facebook-github-bot facebook-github-bot deleted the get_qnn_partitioner_with_soc_model branch September 5, 2024 22:36