
Phi-3's new rope scaling type longrope not supported #2172

Closed
@amihalik

Description


System Info

Microsoft pushed a Phi-3 model update, and it appears to have broken TGI support.

On an AWS g6.12xlarge, I run:

docker run --gpus all --shm-size 1g -p 8080:80 -v /data:/data \
    ghcr.io/huggingface/text-generation-inference:2.1.0 --model-id microsoft/Phi-3-mini-128k-instruct

and I get this error:

Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 106, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 297, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 231, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 601, in get_model
    return FlashLlama(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_llama.py", line 78, in __init__
    config = AutoConfig.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 958, in from_pretrained
    return config_class.from_dict(config_dict, **unused_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py", line 768, in from_dict
    config = cls(**config_dict)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/phi3/configuration_phi3.py", line 159, in __init__
    self._rope_scaling_validation()
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/phi3/configuration_phi3.py", line 186, in _rope_scaling_validation
    raise ValueError(f"`rope_scaling`'s type field must be one of ['su', 'yarn'], got {rope_scaling_type}")
ValueError: `rope_scaling`'s type field must be one of ['su', 'yarn'], got longrope

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

  1. Spin up a g6.12xlarge.
  2. Run: docker run --gpus all --shm-size 1g -p 8080:80 -v /data:/data ghcr.io/huggingface/text-generation-inference:2.1.0 --model-id microsoft/Phi-3-mini-128k-instruct

Expected behavior

TGI should load up normally.

The longrope change appears to be a straight keyword rename of su: the validation in the pinned transformers release only accepts ['su', 'yarn'], while the updated model config now declares "type": "longrope". As a workaround, I edited configuration_phi3.py and config.json to replace longrope with su, and the model loaded and passed some basic inference tests.
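For anyone who wants to apply the config.json side of that workaround without hand-editing, here is a minimal sketch. It only rewrites the rope_scaling type field in a locally cached config file; the function name and path are illustrative, not part of TGI or transformers, and this is a stopgap until the validation list accepts longrope upstream.

```python
import json

def patch_rope_scaling(config_path):
    """Workaround sketch: rewrite rope_scaling type 'longrope' -> 'su'
    in a Phi-3 config.json so older transformers validation accepts it.
    Returns the (possibly updated) config dict."""
    with open(config_path) as f:
        config = json.load(f)
    scaling = config.get("rope_scaling") or {}
    if scaling.get("type") == "longrope":
        scaling["type"] = "su"  # older releases validate against ['su', 'yarn']
        with open(config_path, "w") as f:
            json.dump(config, f, indent=2)
    return config
```

Note this does not touch the check inside configuration_phi3.py, so it only helps when the installed transformers still understands "su"; once the library itself is updated to accept "longrope", the patch becomes a no-op for fixed configs.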
