fix: fix CohereForAI/c4ai-command-r-plus #1707
Conversation
@@ -43,7 +43,7 @@ def __init__(
         ]
         self.free_block_mask = torch.ones(num_blocks, dtype=torch.int32, device="cpu")
         self.slots = torch.arange(
-            0, num_blocks * self.block_size, dtype=torch.int32
+            0, num_blocks * self.block_size, dtype=torch.int64
Quick question: is there a case where num_blocks is really, really big? Or maybe there are very large block_indices sometimes? Just trying to understand the type change.
It's because the vllm kernel now asks for this dtype. I don't know why they changed it.
Slots is a very small tensor anyway.
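For context, a minimal sketch of what the changed allocation looks like in isolation (the `num_blocks`/`block_size` values and the simplified surroundings are assumptions for illustration; only the dtype change itself comes from the diff above):

```python
import torch

# Hypothetical, simplified version of the block/slot bookkeeping touched by the diff.
# Assume num_blocks and block_size come from the KV-cache configuration.
num_blocks = 1024
block_size = 16

free_block_mask = torch.ones(num_blocks, dtype=torch.int32, device="cpu")

# After this change the slot indices are allocated as int64, since the updated
# vllm paged-attention kernel expects that dtype for the slot mapping.
slots = torch.arange(0, num_blocks * block_size, dtype=torch.int64)
```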
Resolved review comment (outdated) on server/text_generation_server/models/custom_modeling/flash_mistral_modeling.py.
Force-pushed from e0e96d2 to 26da6bf.
Hello @OlivierDehaene, @drbh, this pull request slightly changes decoding and so breaks my integration pipeline (I'm testing multiple inputs on my models and asserting their outputs don't change with do_sample=False). Do you know why this change was needed and if it's going to stay this way?
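For reference, a minimal sketch of the kind of deterministic-output check described above (the model id, prompts, and expected strings are placeholders, not from this PR):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical regression check: greedy decoding (do_sample=False) is deterministic,
# so any change in the generated text signals a change in the decode path.
model_id = "my-org/my-model"                              # placeholder model
prompts = ["Hello, world", "Translate to French: cat"]    # placeholder inputs
expected = ["...", "..."]                                 # previously recorded greedy outputs

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

for prompt, ref in zip(prompts, expected):
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, do_sample=False, max_new_tokens=32)
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    assert text == ref, f"decoding changed for prompt: {prompt!r}"
```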
@Narsil @drbh this will update flash attention v2 and vllm.
You will need to re-install them.