fuse fp8 quant in kv copying and add flashinfer decode mla operator in the attention module #737

blueswhen · 2025-02-18T08:03:42Z

No description provided.

blueswhen force-pushed the mla_fp8 branch 7 times, most recently from edae8c1 to 5fcc589 Compare February 26, 2025 06:56

feat: 1. fuse fp8 quant in kv copying in the deepseek2

9af2b0c

2. add flashinfer prefill and decode mla operators in the deepseek2

blueswhen force-pushed the mla_fp8 branch 2 times, most recently from 29237df to 6f5545a Compare February 26, 2025 09:35

fix

832eb1f

blueswhen force-pushed the mla_fp8 branch from 6f5545a to 832eb1f Compare February 26, 2025 09:47

hiworldwzj added 2 commits February 26, 2025 17:53

fix

b323da5

fix

461eec9

hiworldwzj merged commit d7c0a4b into ModelTC:main Feb 26, 2025
1 check failed
