Server: Do not populate probs array when temperature is 0 #7202

Closed
wants to merge 1 commit into from

reuank
Contributor

@reuank reuank commented May 10, 2024

Fixes #7197

@mofosyne added the labels bugfix (fixes an issue or bug), Review Complexity : Low (trivial changes to code that most beginner devs, or those who want a break, can tackle, e.g. a UI fix), and server on May 11, 2024
@JohannesGaessler
Collaborator

I think #7203 is a better way to fix it because it always returns the same number of token probabilities.
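To make the trade-off in this comment concrete, here is a minimal sketch of the two behaviours being compared: leaving the probabilities array empty when the temperature is 0 (as this PR's title describes), versus always returning a fixed number of entries. Everything in it (`TokenProb`, `softmax`, `top_probs`, the `skip_when_greedy` flag) is a hypothetical name chosen for illustration; it is not the llama.cpp server code and does not reproduce the contents of either PR.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical type for illustration only; not the llama.cpp server API.
struct TokenProb {
    int   id;
    float prob;
};

// Numerically stable softmax over raw logits.
std::vector<float> softmax(const std::vector<float> & logits) {
    std::vector<float> probs(logits.size());
    if (logits.empty()) {
        return probs;
    }
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - max_logit);
        sum += probs[i];
    }
    for (float & p : probs) {
        p /= sum;
    }
    return probs;
}

// skip_when_greedy == true : return an empty array when temperature <= 0.
// skip_when_greedy == false: always return min(n_probs, vocab size) entries,
//                            sorted by descending probability.
std::vector<TokenProb> top_probs(const std::vector<float> & logits,
                                 float temperature,
                                 std::size_t n_probs,
                                 bool skip_when_greedy) {
    if (skip_when_greedy && temperature <= 0.0f) {
        return {};
    }
    const std::vector<float> probs = softmax(logits);
    std::vector<TokenProb> out;
    out.reserve(probs.size());
    for (std::size_t i = 0; i < probs.size(); ++i) {
        out.push_back({static_cast<int>(i), probs[i]});
    }
    const std::size_t n = std::min(n_probs, out.size());
    std::partial_sort(out.begin(), out.begin() + n, out.end(),
                      [](const TokenProb & a, const TokenProb & b) {
                          return a.prob > b.prob;
                      });
    out.resize(n);
    return out;
}
```

With `skip_when_greedy` set, a client has to handle an empty array for greedy requests. With it cleared, the response shape is the same at every temperature, which is the property the comment above argues for.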

Contributor

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 548 iterations 🚀

Expand details for performance related PR only
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8593.48ms p(95)=20917.85ms fails=, finish reason: stop=477 truncated=71
  • Prompt processing (pp): avg=94.63tk/s p(95)=362.19tk/s
  • Token generation (tg): avg=32.39tk/s p(95)=46.79tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=server-greedy-probs-fix commit=395df0cc7c1a2b31370ca6c546c819602a62a0de

prompt_tokens_seconds

---
config:
    xyChart:
        titleFontSize: 12
        width: 900
        height: 600
    themeVariables:
        xyChart:
            titleColor: "#000000"
---
xychart-beta
    title "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3 duration=10m 548 iterations"
    y-axis "llamacpp:prompt_tokens_seconds"
    x-axis "llamacpp:prompt_tokens_seconds" 1715419482 --> 1715420114
    line [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 379.39, 379.39, 379.39, 379.39, 379.39, 831.1, 831.1, 831.1, 831.1, 831.1, 700.97, 700.97, 700.97, 700.97, 700.97, 752.21, 752.21, 752.21, 752.21, 752.21, 797.72, 797.72, 797.72, 797.72, 797.72, 794.37, 794.37, 794.37, 794.37, 794.37, 794.57, 794.57, 794.57, 794.57, 794.57, 817.36, 817.36, 817.36, 817.36, 817.36, 823.21, 823.21, 823.21, 823.21, 823.21, 832.78, 832.78, 832.78, 832.78, 832.78, 834.71, 834.71, 834.71, 834.71, 834.71, 853.15, 853.15, 853.15, 853.15, 853.15, 885.89, 885.89, 885.89, 885.89, 885.89, 879.2, 879.2, 879.2, 879.2, 879.2, 758.12, 758.12, 758.12, 758.12, 758.12, 759.92, 759.92, 759.92, 759.92, 759.92, 758.47, 758.47, 758.47, 758.47, 758.47, 785.23, 785.23, 785.23, 785.23, 785.23, 786.24, 786.24, 786.24, 786.24, 786.24, 785.52, 785.52, 785.52, 785.52, 785.52, 791.84, 791.84, 791.84, 791.84, 791.84, 796.34, 796.34, 796.34, 796.34, 796.34, 808.28, 808.28, 808.28, 808.28, 808.28, 797.43, 797.43, 797.43, 797.43, 797.43, 799.83, 799.83, 799.83, 799.83, 799.83, 801.73, 801.73, 801.73, 801.73, 801.73, 811.91, 811.91, 811.91, 811.91, 811.91, 810.58, 810.58, 810.58, 810.58, 810.58, 808.78, 808.78, 808.78, 808.78, 808.78, 808.98, 808.98, 808.98, 808.98, 808.98, 813.33, 813.33, 813.33, 813.33, 813.33, 811.99, 811.99, 811.99, 811.99, 811.99, 817.11, 817.11, 817.11, 817.11, 817.11, 820.31, 820.31, 820.31, 820.31, 820.31, 831.09, 831.09, 831.09, 831.09, 831.09, 831.39, 831.39, 831.39, 831.39, 831.39, 830.44, 830.44, 830.44, 830.44, 830.44, 830.13, 830.13, 830.13, 830.13, 830.13, 832.4, 832.4, 832.4, 832.4, 832.4, 834.48, 834.48, 834.48, 834.48, 834.48, 834.99, 834.99, 834.99, 834.99, 834.99, 833.13, 833.13, 833.13, 833.13, 833.13, 837.5, 837.5, 837.5, 837.5, 837.5, 838.45, 838.45, 838.45, 838.45, 838.45, 837.01, 837.01, 837.01, 837.01, 837.01, 834.58, 834.58, 834.58, 834.58, 834.58, 836.89, 836.89, 836.89, 836.89, 836.89, 840.06, 840.06, 840.06, 840.06, 840.06, 839.71, 839.71, 839.71, 839.71, 839.71, 
844.45, 844.45, 844.45, 844.45, 844.45, 845.4, 845.4, 845.4, 845.4, 845.4, 848.83, 848.83, 848.83, 848.83, 848.83, 847.1, 847.1, 847.1, 847.1, 847.1, 846.15, 846.15, 846.15, 846.15, 846.15, 852.55, 852.55, 852.55, 852.55, 852.55, 853.8, 853.8, 853.8, 853.8, 853.8, 853.51, 853.51, 853.51, 853.51, 853.51, 854.82, 854.82, 854.82, 854.82, 854.82, 856.79, 856.79, 856.79, 856.79, 856.79, 857.08, 857.08, 857.08, 857.08, 857.08, 859.47, 859.47, 859.47, 859.47, 859.47, 859.47, 859.47]
                    
predicted_tokens_seconds
---
config:
    xyChart:
        titleFontSize: 12
        width: 900
        height: 600
    themeVariables:
        xyChart:
            titleColor: "#000000"
---
xychart-beta
    title "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3 duration=10m 548 iterations"
    y-axis "llamacpp:predicted_tokens_seconds"
    x-axis "llamacpp:predicted_tokens_seconds" 1715419482 --> 1715420114
    line [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 45.72, 45.72, 45.72, 45.72, 45.72, 45.28, 45.28, 45.28, 45.28, 45.28, 33.7, 33.7, 33.7, 33.7, 33.7, 32.26, 32.26, 32.26, 32.26, 32.26, 32.5, 32.5, 32.5, 32.5, 32.5, 32.8, 32.8, 32.8, 32.8, 32.8, 33.72, 33.72, 33.72, 33.72, 33.72, 34.38, 34.38, 34.38, 34.38, 34.38, 34.59, 34.59, 34.59, 34.59, 34.59, 34.64, 34.64, 34.64, 34.64, 34.64, 34.43, 34.43, 34.43, 34.43, 34.43, 34.21, 34.21, 34.21, 34.21, 34.21, 33.95, 33.95, 33.95, 33.95, 33.95, 32.88, 32.88, 32.88, 32.88, 32.88, 32.47, 32.47, 32.47, 32.47, 32.47, 32.41, 32.41, 32.41, 32.41, 32.41, 32.76, 32.76, 32.76, 32.76, 32.76, 32.56, 32.56, 32.56, 32.56, 32.56, 32.08, 32.08, 32.08, 32.08, 32.08, 31.8, 31.8, 31.8, 31.8, 31.8, 31.62, 31.62, 31.62, 31.62, 31.62, 31.55, 31.55, 31.55, 31.55, 31.55, 31.71, 31.71, 31.71, 31.71, 31.71, 31.48, 31.48, 31.48, 31.48, 31.48, 31.54, 31.54, 31.54, 31.54, 31.54, 31.73, 31.73, 31.73, 31.73, 31.73, 31.78, 31.78, 31.78, 31.78, 31.78, 31.5, 31.5, 31.5, 31.5, 31.5, 31.23, 31.23, 31.23, 31.23, 31.23, 31.46, 31.46, 31.46, 31.46, 31.46, 31.6, 31.6, 31.6, 31.6, 31.6, 31.67, 31.67, 31.67, 31.67, 31.67, 31.8, 31.8, 31.8, 31.8, 31.8, 31.82, 31.82, 31.82, 31.82, 31.82, 31.76, 31.76, 31.76, 31.76, 31.76, 31.72, 31.72, 31.72, 31.72, 31.72, 31.52, 31.52, 31.52, 31.52, 31.52, 31.57, 31.57, 31.57, 31.57, 31.57, 31.66, 31.66, 31.66, 31.66, 31.66, 31.78, 31.78, 31.78, 31.78, 31.78, 31.86, 31.86, 31.86, 31.86, 31.86, 31.82, 31.82, 31.82, 31.82, 31.82, 31.82, 31.82, 31.82, 31.82, 31.82, 31.26, 31.26, 31.26, 31.26, 31.26, 30.86, 30.86, 30.86, 30.86, 30.86, 29.67, 29.67, 29.67, 29.67, 29.67, 29.58, 29.58, 29.58, 29.58, 29.58, 29.59, 29.59, 29.59, 29.59, 29.59, 29.76, 29.76, 29.76, 29.76, 29.76, 29.83, 29.83, 29.83, 29.83, 29.83, 29.91, 29.91, 29.91, 29.91, 29.91, 29.94, 29.94, 29.94, 29.94, 29.94, 29.93, 29.93, 29.93, 29.93, 29.93, 29.74, 29.74, 29.74, 29.74, 29.74, 29.6, 29.6, 29.6, 29.6, 29.6, 29.69, 29.69, 29.69, 29.69, 29.69, 29.81, 29.81, 
29.81, 29.81, 29.81, 29.92, 29.92, 29.92, 29.92, 29.92, 30.04, 30.04, 30.04, 30.04, 30.04, 30.16, 30.16, 30.16, 30.16, 30.16, 30.16, 30.16, 30.16, 30.16, 30.16, 30.21, 30.21]
                    


kv_cache_usage_ratio

---
config:
    xyChart:
        titleFontSize: 12
        width: 900
        height: 600
    themeVariables:
        xyChart:
            titleColor: "#000000"
---
xychart-beta
    title "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3 duration=10m 548 iterations"
    y-axis "llamacpp:kv_cache_usage_ratio"
    x-axis "llamacpp:kv_cache_usage_ratio" 1715419482 --> 1715420114
    line [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.09, 0.09, 0.09, 0.09, 0.09, 0.43, 0.43, 0.43, 0.43, 0.43, 0.17, 0.17, 0.17, 0.17, 0.17, 0.19, 0.19, 0.19, 0.19, 0.19, 0.21, 0.21, 0.21, 0.21, 0.21, 0.14, 0.14, 0.14, 0.14, 0.14, 0.14, 0.14, 0.14, 0.14, 0.14, 0.16, 0.16, 0.16, 0.16, 0.16, 0.18, 0.18, 0.18, 0.18, 0.18, 0.21, 0.21, 0.21, 0.21, 0.21, 0.11, 0.11, 0.11, 0.11, 0.11, 0.18, 0.18, 0.18, 0.18, 0.18, 0.28, 0.28, 0.28, 0.28, 0.28, 0.24, 0.24, 0.24, 0.24, 0.24, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.11, 0.11, 0.11, 0.11, 0.11, 0.25, 0.25, 0.25, 0.25, 0.25, 0.32, 0.32, 0.32, 0.32, 0.32, 0.13, 0.13, 0.13, 0.13, 0.13, 0.21, 0.21, 0.21, 0.21, 0.21, 0.15, 0.15, 0.15, 0.15, 0.15, 0.23, 0.23, 0.23, 0.23, 0.23, 0.22, 0.22, 0.22, 0.22, 0.22, 0.1, 0.1, 0.1, 0.1, 0.1, 0.19, 0.19, 0.19, 0.19, 0.19, 0.28, 0.28, 0.28, 0.28, 0.28, 0.25, 0.25, 0.25, 0.25, 0.25, 0.11, 0.11, 0.11, 0.11, 0.11, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.15, 0.15, 0.15, 0.15, 0.15, 0.16, 0.16, 0.16, 0.16, 0.16, 0.22, 0.22, 0.22, 0.22, 0.22, 0.32, 0.32, 0.32, 0.32, 0.32, 0.18, 0.18, 0.18, 0.18, 0.18, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.14, 0.14, 0.14, 0.14, 0.14, 0.15, 0.15, 0.15, 0.15, 0.15, 0.4, 0.4, 0.4, 0.4, 0.4, 0.46, 0.46, 0.46, 0.46, 0.46, 0.59, 0.59, 0.59, 0.59, 0.59, 0.54, 0.54, 0.54, 0.54, 0.54, 0.31, 0.31, 0.31, 0.31, 0.31, 0.13, 0.13, 0.13, 0.13, 0.13, 0.14, 0.14, 0.14, 0.14, 0.14, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.29, 0.29, 0.29, 0.29, 0.29, 0.29, 0.29, 0.29, 0.29, 0.29, 0.1, 0.1, 0.1, 0.1, 0.1, 0.17, 0.17, 0.17, 0.17, 0.17, 0.14, 0.14, 0.14, 0.14, 0.14, 0.1, 0.1, 0.1, 0.1, 0.1, 0.12, 0.12, 0.12, 0.12, 0.12, 0.15, 0.15, 0.15, 0.15, 0.15, 0.16, 0.16, 0.16, 0.16, 0.16, 0.19, 0.19, 0.19, 0.19, 0.19, 0.28, 0.28]
                    
requests_processing
---
config:
    xyChart:
        titleFontSize: 12
        width: 900
        height: 600
    themeVariables:
        xyChart:
            titleColor: "#000000"
---
xychart-beta
    title "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3 duration=10m 548 iterations"
    y-axis "llamacpp:requests_processing"
    x-axis "llamacpp:requests_processing" 1715419482 --> 1715420114
    line [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 3.0, 3.0, 3.0, 3.0, 5.0, 5.0, 5.0, 5.0, 5.0, 7.0, 7.0, 7.0, 7.0, 7.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0, 4.0, 6.0, 6.0, 6.0, 6.0, 6.0, 4.0, 4.0, 4.0, 4.0, 4.0, 3.0, 3.0, 3.0, 3.0, 3.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 5.0, 5.0, 5.0, 5.0, 5.0, 7.0, 7.0, 7.0, 7.0, 7.0, 1.0, 1.0, 1.0, 1.0, 1.0, 6.0, 6.0, 6.0, 6.0, 6.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 5.0, 5.0, 5.0, 5.0, 5.0, 7.0, 7.0, 7.0, 7.0, 7.0, 6.0, 6.0, 6.0, 6.0, 6.0, 4.0, 4.0, 4.0, 4.0, 4.0, 6.0, 6.0, 6.0, 6.0, 6.0, 2.0, 2.0, 2.0, 2.0, 2.0, 7.0, 7.0, 7.0, 7.0, 7.0, 4.0, 4.0, 4.0, 4.0, 4.0, 7.0, 7.0, 7.0, 7.0, 7.0, 5.0, 5.0, 5.0, 5.0, 5.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 6.0, 6.0, 6.0, 6.0, 6.0, 7.0, 7.0, 7.0, 7.0, 7.0, 6.0, 6.0, 6.0, 6.0, 6.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0, 5.0, 5.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 7.0, 7.0, 7.0, 7.0, 7.0, 3.0, 3.0, 3.0, 3.0, 3.0, 5.0, 5.0, 5.0, 5.0, 5.0, 6.0, 6.0, 6.0, 6.0, 6.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 3.0, 3.0, 3.0, 3.0, 3.0, 2.0, 2.0, 2.0, 2.0, 2.0, 6.0, 6.0, 6.0, 6.0, 6.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 2.0, 2.0, 2.0, 2.0, 2.0, 6.0, 6.0, 6.0, 6.0, 6.0, 7.0, 7.0, 7.0, 7.0, 7.0, 8.0, 8.0, 8.0, 8.0, 8.0, 3.0, 3.0, 3.0, 3.0, 3.0, 2.0, 2.0, 2.0, 2.0, 2.0, 5.0, 5.0, 5.0, 5.0, 5.0, 3.0, 3.0, 3.0, 3.0, 3.0, 5.0, 5.0, 5.0, 5.0, 5.0, 4.0, 4.0, 4.0, 4.0, 4.0, 6.0, 6.0, 6.0, 6.0, 6.0, 2.0, 2.0]
                    

Development

Successfully merging this pull request may close these issues.

Server: completion_probabilities (tok_str and prob) seem to be broken
3 participants