Add pattern + replacement for Embedding with padding_idx
Summary:
This diff adds a pattern/replacement pair for embedding with padding_idx, so that the embedding op in the NLU model is now quantized.
Previously, the embedding op in the NLU model was not being quantized. The embedding call in NLU passes an extra arg, padding_idx, which the pattern used to match embedding ops for replacement in model graphs did not expect, so the match failed and the op was left unchanged.
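The matching failure described above can be illustrated with a toy sketch. This is not the code from this diff: `Node`, `Pattern`, `rewrite`, and the op name `quantized_embedding` are hypothetical stand-ins, and matching on argument count is a simplification of how a real graph pattern matcher works. It only shows why a pattern written for two args misses a call with three, and how adding a second pattern fixes that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical graph node: an op name plus its argument names."""
    op: str
    args: tuple


@dataclass(frozen=True)
class Pattern:
    """Hypothetical pattern: matches an op with an exact number of args."""
    op: str
    arity: int
    replacement: str


def rewrite(graph, patterns):
    """Replace every node matched by some pattern; leave the rest untouched."""
    out = []
    for node in graph:
        for p in patterns:
            if node.op == p.op and len(node.args) == p.arity:
                out.append(Node(p.replacement, node.args))
                break
        else:
            # No pattern matched, so the node stays in its original form.
            out.append(node)
    return out


# Toy graph: the embedding call carries the extra padding_idx argument.
graph = [
    Node("embedding", ("weight", "indices", "padding_idx")),
    Node("linear", ("x", "w", "b")),
]

basic = Pattern("embedding", 2, "quantized_embedding")
with_padding = Pattern("embedding", 3, "quantized_embedding")

before = rewrite(graph, [basic])                # embedding is not matched
after = rewrite(graph, [basic, with_padding])   # embedding is replaced
```

With only the two-arg pattern registered, the embedding node passes through unchanged; registering the second pattern lets it be replaced while unrelated ops are left alone.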
This change also reduces the size of the NLU model from 11.4 MB to 4.4 MB since embedding weight tensors are stored in quantized form instead of fp32.
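As a rough sanity check on the size numbers above, the arithmetic can be sketched as follows. The 8-bit width is an assumption for illustration only (the summary does not state the quantized bit width), and the estimate ignores scales, zero points, and any other per-tensor overhead.

```python
# Sizes reported in the summary.
size_before_mb = 11.4
size_after_mb = 4.4

saved_mb = size_before_mb - size_after_mb
reduction_pct = 100 * saved_mb / size_before_mb  # roughly 61%

# ASSUMPTION: embedding weights go from 32-bit floats to 8-bit values.
ASSUMED_QUANT_BITS = 8
fraction_saved_per_weight = 1 - ASSUMED_QUANT_BITS / 32

# Under that assumption, the fp32 embedding weights that would account
# for the whole saving (an estimate, not a measured value).
implied_fp32_embedding_mb = saved_mb / fraction_saved_per_weight

print(f"saved {saved_mb:.1f} MB ({reduction_pct:.0f}% smaller)")
print(f"implied fp32 embedding weights: ~{implied_fp32_embedding_mb:.1f} MB")
```

Under the 8-bit assumption the implied fp32 embedding size is smaller than the original model size, so the reported reduction is at least plausible as coming from embedding weights alone.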
Reviewed By: digantdesai, mcr229
Differential Revision: D48191947
fbshipit-source-id: 47283aa8c4990325238c362d130d7e2d141fcf0f