Commit dc164f5

[LLAVA] Enable 2nd XNNPACK Partition pass for the text model
This is to pick up ops like mul, add, sigmoid, etc., which contribute to the e2e latency.
1 parent 7b3549b commit dc164f5

1 file changed: +6 -1 lines changed

examples/models/llava/export_llava.py

Lines changed: 6 additions & 1 deletion
@@ -208,10 +208,15 @@ def export_all(llava_model: LlavaModel):
         partitioner={
             "image_encoder": [XnnpackPartitioner()],
             "text_model": [
+                # First partition the DQLinear nodes, then partition the rest of the nodes,
+                # to avoid multiple DQLinear nodes in the same partition,
+                # to avoid holding multiple unpacked and packed weight buffers in memory,
+                # to reduce peak memory footprint.
                 XnnpackPartitioner(
                     config_precisions=ConfigPrecisionType.DYNAMIC_QUANT,
                     per_op_mode=True,
-                )
+                ),
+                XnnpackPartitioner(),
             ],
         },
         compile_config=EdgeCompileConfig(_check_ir_validity=False),
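
The effect of chaining two partitioner passes can be illustrated with a toy model in plain Python. This is only a sketch of the idea described in the code comment, not ExecuTorch's implementation: the `Node` class, the `partition_pass` helper, and the op strings are all hypothetical, and the real `XnnpackPartitioner` works on an exported graph with more involved grouping rules. The first pass claims each dynamically quantized linear node as its own partition (mirroring `per_op_mode=True`); the second pass groups the remaining supported ops (mul, add, sigmoid) into contiguous partitions.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical stand-in for a graph node; not an ExecuTorch type."""
    name: str
    op: str
    partition: Optional[int] = None  # None means "not delegated yet"


def partition_pass(nodes: List[Node], supported: Set[str],
                   per_op_mode: bool, next_id: int) -> int:
    """Assign partition ids to unclaimed nodes whose op is in `supported`.

    With per_op_mode=True every matching node gets its own partition.
    Otherwise, consecutive matching nodes share one partition.
    Returns the next unused partition id.
    """
    current = -1
    prev_claimed = False
    for node in nodes:
        if node.partition is None and node.op in supported:
            if per_op_mode or not prev_claimed:
                current = next_id
                next_id += 1
            node.partition = current
            prev_claimed = True
        else:
            # Already-claimed or unsupported nodes end the current run.
            prev_claimed = False
    return next_id


graph = [
    Node("q_proj", "dq_linear"),
    Node("k_proj", "dq_linear"),
    Node("scale", "mul"),
    Node("gate", "sigmoid"),
    Node("out_proj", "dq_linear"),
    Node("residual", "add"),
]

# Pass 1: one partition per DQLinear node, so no partition holds
# more than one set of packed/unpacked weights.
next_id = partition_pass(graph, {"dq_linear"}, per_op_mode=True, next_id=0)
# Pass 2: pick up the remaining supported ops that pass 1 left behind.
next_id = partition_pass(graph, {"mul", "add", "sigmoid"},
                         per_op_mode=False, next_id=next_id)

for node in graph:
    print(node.name, node.op, node.partition)
```

In this toy graph, every `dq_linear` node ends up alone in its partition, while `scale` and `gate` share one partition because they are adjacent and were left unclaimed by the first pass.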
