Description
System Info
Microsoft pushed a Phi-3 model update, and it seems to have broken TGI support.
On an AWS g6.12xlarge, I run:
docker run --gpus all --shm-size 1g -p 8080:80 -v /data:/data \
ghcr.io/huggingface/text-generation-inference:2.1.0 --model-id microsoft/Phi-3-mini-128k-instruct
and I get this error:
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 106, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 297, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 231, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 601, in get_model
return FlashLlama(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_llama.py", line 78, in __init__
config = AutoConfig.from_pretrained(
File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 958, in from_pretrained
return config_class.from_dict(config_dict, **unused_kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py", line 768, in from_dict
config = cls(**config_dict)
File "/opt/conda/lib/python3.10/site-packages/transformers/models/phi3/configuration_phi3.py", line 159, in __init__
self._rope_scaling_validation()
File "/opt/conda/lib/python3.10/site-packages/transformers/models/phi3/configuration_phi3.py", line 186, in _rope_scaling_validation
raise ValueError(f"`rope_scaling`'s type field must be one of ['su', 'yarn'], got {rope_scaling_type}")
ValueError: `rope_scaling`'s type field must be one of ['su', 'yarn'], got longrope
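The failure is in transformers' Phi-3 config validation, which only accepts `su` and `yarn` as `rope_scaling` types. A minimal sketch of the failing check (simplified from `configuration_phi3.py`, not the exact upstream code) makes the mismatch clear:

```python
# Simplified sketch of the rope_scaling type check from
# transformers/models/phi3/configuration_phi3.py (not the exact upstream code).
def validate_rope_scaling_type(rope_scaling: dict) -> None:
    rope_scaling_type = rope_scaling.get("type")
    if rope_scaling_type not in ("su", "yarn"):
        raise ValueError(
            f"`rope_scaling`'s type field must be one of ['su', 'yarn'], "
            f"got {rope_scaling_type}"
        )

# The updated config.json now ships "type": "longrope", which this
# allow-list rejects, producing the ValueError in the traceback above.
```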
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
- spin up a g6.12xlarge
- run
docker run --gpus all --shm-size 1g -p 8080:80 -v /data:/data ghcr.io/huggingface/text-generation-inference:2.1.0 --model-id microsoft/Phi-3-mini-128k-instruct
Expected behavior
TGI should load up normally.
This `longrope` change seems to be a straight keyword replacement for `su`. For instance, I edited `configuration_phi3.py` and `config.json` to replace `longrope` with `su`, and the model loaded and some basic inference tests worked.
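The `config.json` half of that workaround can be scripted. This is a sketch only, not a proper fix: it assumes you know where the model's `config.json` landed under your `/data` volume mount, and it only rewrites the `rope_scaling` type field.

```python
import json

def patch_rope_type(config_path: str) -> bool:
    """Workaround sketch: rewrite rope_scaling type 'longrope' -> 'su'
    in a downloaded Phi-3 config.json. Returns True if the file changed.
    The path is wherever TGI cached the model under your /data mount."""
    with open(config_path) as f:
        cfg = json.load(f)
    rope = cfg.get("rope_scaling") or {}
    if rope.get("type") != "longrope":
        return False  # nothing to do
    rope["type"] = "su"
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)
    return True
```

Restart the container after patching so TGI re-reads the config; note the edit is lost if the model cache is re-downloaded.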