Add Q4_3 support to cuBLAS #1086

Merged
merged 1 commit into from
Apr 20, 2023

Conversation

slaren
Member

@slaren slaren commented Apr 20, 2023

Also changed the Makefile to link against the CUDA dynamic libraries. Linking is much faster that way, and there is no reason to link statically for local use.

@slaren slaren merged commit 2005469 into ggml-org:master Apr 20, 2023
@slaren slaren deleted the cuda-q4_3 branch April 20, 2023 18:59
Member Author

slaren commented Apr 20, 2023

7B q4_3 perplexity with cuBLAS: 6.0617

main: seed = 1682015944
llama.cpp: loading model from models/7B/ggml-model-q4_3.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 6 (mostly Q4_3)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4936267.11 KB
llama_model_load_internal: mem required = 6612.57 MB (+ 1026.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 256.00 MB

system_info: n_threads = 12 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
5.65 seconds per pass - ETA 1.03 hours
[1]4.3508,[2]4.7736,[3]5.6662,[4]6.2864,[5]6.4227,[6]6.3703,[7]6.5471,[8]6.6450,[9]6.9846,[10]7.2508,[11]7.4526,[12]7.4782,[13]7.3964,[14]7.4641,[15]7.7127,[16]7.3279,[17]7.2089,[18]7.1596,[19]6.8043,[20]6.7911,[21]6.6988,[22]6.5289,[23]6.5003,[24]6.4051,[25]6.4152,[26]6.2542,[27]6.0810,[28]5.9814,[29]5.8920,[30]5.7339,[31]5.7033,[32]5.7208,[33]5.6628,[34]5.6958,[35]5.7176,[36]5.7589,[37]5.7625,[38]5.7697,[39]5.8023,[40]5.8532,[41]5.8655,[42]5.9053,[43]5.8678,[44]5.9249,[45]5.9253,[46]5.8997,[47]5.9190,[48]5.8934,[49]5.8921,[50]5.8504,[51]5.8450,[52]5.8340,[53]5.8786,[54]5.8589,[55]5.8354,[56]5.8628,[57]5.8831,[58]5.9029,[59]5.9213,[60]5.9619,[61]5.9535,[62]6.0125,[63]6.0403,[64]6.0535,[65]6.0964,[66]6.1040,[67]6.1232,[68]6.1375,[69]6.1625,[70]6.1932,[71]6.2158,[72]6.2465,[73]6.3047,[74]6.3088,[75]6.3243,[76]6.3367,[77]6.3494,[78]6.3360,[79]6.3622,[80]6.3551,[81]6.3681,[82]6.3726,[83]6.3218,[84]6.3046,[85]6.2919,[86]6.2698,[87]6.2102,[88]6.1858,[89]6.1659,[90]6.1492,[91]6.1729,[92]6.1674,[93]6.1685,[94]6.1660,[95]6.1934,[96]6.1921,[97]6.1884,[98]6.1818,[99]6.1686,[100]6.1689,[101]6.1924,[102]6.1878,[103]6.2070,[104]6.2139,[105]6.2127,[106]6.2300,[107]6.2305,[108]6.2437,[109]6.2370,[110]6.2308,[111]6.2521,[112]6.2721,[113]6.2745,[114]6.2706,[115]6.2764,[116]6.2675,[117]6.2732,[118]6.3004,[119]6.3223,[120]6.3569,[121]6.3717,[122]6.3952,[123]6.4325,[124]6.4500,[125]6.4400,[126]6.4794,[127]6.5153,[128]6.5454,[129]6.5295,[130]6.5373,[131]6.5317,[132]6.5242,[133]6.5118,[134]6.5214,[135]6.5170,[136]6.5053,[137]6.4969,[138]6.4791,[139]6.4690,[140]6.4653,[141]6.4379,[142]6.4332,[143]6.4042,[144]6.3833,[145]6.3751,[146]6.3640,[147]6.3670,[148]6.3670,[149]6.3619,[150]6.3581,[151]6.3602,[152]6.3507,[153]6.3348,[154]6.3266,[155]6.3332,[156]6.3287,[157]6.3453,[158]6.3493,[159]6.3545,[160]6.3568,[161]6.3683,[162]6.3404,[163]6.3282,[164]6.3047,[165]6.2742,[166]6.2474,[167]6.2103,[168]6.1802,[169]6.1658,[170]6.1544,[171]6.1274,[172]6.1095,[173]6.0937,[174]6.0644,[175]6.0429,[176]6.0
309,[177]6.0117,[178]5.9891,[179]5.9722,[180]5.9629,[181]5.9418,[182]5.9239,[183]5.9101,[184]5.9087,[185]5.9011,[186]5.9014,[187]5.9079,[188]5.9040,[189]5.9216,[190]5.9226,[191]5.9437,[192]5.9593,[193]5.9758,[194]5.9870,[195]6.0082,[196]6.0240,[197]6.0442,[198]6.0589,[199]6.0623,[200]6.0667,[201]6.0619,[202]6.0804,[203]6.0877,[204]6.0862,[205]6.0971,[206]6.1038,[207]6.0998,[208]6.1083,[209]6.1123,[210]6.1173,[211]6.1272,[212]6.1341,[213]6.1443,[214]6.1468,[215]6.1489,[216]6.1636,[217]6.1807,[218]6.1940,[219]6.1937,[220]6.1898,[221]6.1845,[222]6.1826,[223]6.1738,[224]6.1672,[225]6.1636,[226]6.1836,[227]6.1922,[228]6.1972,[229]6.2034,[230]6.2003,[231]6.2163,[232]6.2049,[233]6.1881,[234]6.1737,[235]6.1548,[236]6.1483,[237]6.1385,[238]6.1405,[239]6.1261,[240]6.1158,[241]6.1175,[242]6.1211,[243]6.1193,[244]6.1085,[245]6.1052,[246]6.0943,[247]6.0828,[248]6.0762,[249]6.0737,[250]6.0782,[251]6.0713,[252]6.0677,[253]6.0581,[254]6.0524,[255]6.0405,[256]6.0225,[257]6.0106,[258]6.0025,[259]6.0002,[260]5.9921,[261]5.9880,[262]5.9823,[263]5.9770,[264]5.9585,[265]5.9581,[266]5.9564,[267]5.9498,[268]5.9584,[269]5.9570,[270]5.9575,[271]5.9655,[272]5.9693,[273]5.9692,[274]5.9715,[275]5.9798,[276]5.9858,[277]6.0012,[278]6.0115,[279]6.0208,[280]6.0234,[281]6.0338,[282]6.0396,[283]6.0546,[284]6.0629,[285]6.0712,[286]6.0842,[287]6.0843,[288]6.0899,[289]6.0817,[290]6.0667,[291]6.0517,[292]6.0369,[293]6.0240,[294]6.0261,[295]6.0251,[296]6.0296,[297]6.0280,[298]6.0312,[299]6.0286,[300]6.0177,[301]6.0176,[302]6.0096,[303]6.0007,[304]5.9919,[305]5.9884,[306]5.9764,[307]5.9785,[308]5.9813,[309]5.9656,[310]5.9602,[311]5.9537,[312]5.9559,[313]5.9501,[314]5.9487,[315]5.9329,[316]5.9278,[317]5.9121,[318]5.8926,[319]5.9047,[320]5.9169,[321]5.9211,[322]5.9170,[323]5.9105,[324]5.9076,[325]5.9178,[326]5.9179,[327]5.9201,[328]5.9238,[329]5.9298,[330]5.9332,[331]5.9455,[332]5.9429,[333]5.9499,[334]5.9447,[335]5.9389,[336]5.9426,[337]5.9404,[338]5.9398,[339]5.9350,[340]5.9308,[341]5.9389,[342]5.9418,[343
]5.9461,[344]5.9465,[345]5.9471,[346]5.9448,[347]5.9487,[348]5.9521,[349]5.9545,[350]5.9511,[351]5.9518,[352]5.9523,[353]5.9463,[354]5.9476,[355]5.9528,[356]5.9561,[357]5.9526,[358]5.9620,[359]5.9645,[360]5.9613,[361]5.9611,[362]5.9679,[363]5.9788,[364]5.9850,[365]5.9900,[366]5.9913,[367]5.9996,[368]5.9970,[369]5.9978,[370]5.9996,[371]5.9943,[372]5.9991,[373]6.0038,[374]6.0023,[375]6.0024,[376]6.0089,[377]6.0040,[378]6.0068,[379]6.0129,[380]6.0055,[381]6.0024,[382]5.9973,[383]5.9965,[384]5.9961,[385]5.9949,[386]5.9946,[387]5.9943,[388]5.9910,[389]5.9860,[390]5.9792,[391]5.9716,[392]5.9678,[393]5.9660,[394]5.9689,[395]5.9676,[396]5.9600,[397]5.9668,[398]5.9712,[399]5.9791,[400]5.9790,[401]5.9803,[402]5.9812,[403]5.9831,[404]5.9893,[405]5.9802,[406]5.9771,[407]5.9765,[408]5.9782,[409]5.9896,[410]6.0009,[411]6.0122,[412]6.0279,[413]6.0388,[414]6.0465,[415]6.0520,[416]6.0600,[417]6.0719,[418]6.0754,[419]6.0824,[420]6.0911,[421]6.1027,[422]6.1063,[423]6.1133,[424]6.1237,[425]6.1329,[426]6.1392,[427]6.1437,[428]6.1517,[429]6.1569,[430]6.1651,[431]6.1789,[432]6.1825,[433]6.1816,[434]6.1774,[435]6.1783,[436]6.1807,[437]6.1904,[438]6.1979,[439]6.1947,[440]6.1936,[441]6.1887,[442]6.1875,[443]6.1887,[444]6.1895,[445]6.1874,[446]6.1898,[447]6.1928,[448]6.1966,[449]6.1941,[450]6.1948,[451]6.1908,[452]6.1781,[453]6.1700,[454]6.1645,[455]6.1652,[456]6.1699,[457]6.1716,[458]6.1697,[459]6.1704,[460]6.1789,[461]6.1763,[462]6.1750,[463]6.1785,[464]6.1774,[465]6.1748,[466]6.1672,[467]6.1679,[468]6.1676,[469]6.1697,[470]6.1701,[471]6.1654,[472]6.1699,[473]6.1645,[474]6.1658,[475]6.1597,[476]6.1613,[477]6.1543,[478]6.1535,[479]6.1596,[480]6.1640,[481]6.1656,[482]6.1613,[483]6.1571,[484]6.1590,[485]6.1572,[486]6.1515,[487]6.1514,[488]6.1491,[489]6.1444,[490]6.1421,[491]6.1394,[492]6.1339,[493]6.1310,[494]6.1291,[495]6.1287,[496]6.1251,[497]6.1196,[498]6.1181,[499]6.1137,[500]6.1044,[501]6.0980,[502]6.0981,[503]6.0974,[504]6.0887,[505]6.0904,[506]6.0914,[507]6.0861,[508]6.0822,[509]6.0816,
[510]6.0849,[511]6.0896,[512]6.0930,[513]6.0952,[514]6.1015,[515]6.0960,[516]6.0952,[517]6.0961,[518]6.0956,[519]6.0985,[520]6.1008,[521]6.1021,[522]6.1049,[523]6.1056,[524]6.1113,[525]6.1145,[526]6.1155,[527]6.1172,[528]6.1121,[529]6.1130,[530]6.1078,[531]6.1064,[532]6.1111,[533]6.1135,[534]6.1117,[535]6.1137,[536]6.1085,[537]6.1062,[538]6.1114,[539]6.1124,[540]6.1159,[541]6.1162,[542]6.1173,[543]6.1188,[544]6.1197,[545]6.1178,[546]6.1188,[547]6.1148,[548]6.1097,[549]6.1099,[550]6.1069,[551]6.1035,[552]6.1013,[553]6.0975,[554]6.0953,[555]6.0921,[556]6.0914,[557]6.0939,[558]6.0902,[559]6.0900,[560]6.0898,[561]6.0902,[562]6.0880,[563]6.0878,[564]6.0921,[565]6.0942,[566]6.0942,[567]6.0920,[568]6.0928,[569]6.0913,[570]6.0941,[571]6.0942,[572]6.0950,[573]6.0947,[574]6.0912,[575]6.0907,[576]6.0906,[577]6.0890,[578]6.0871,[579]6.0876,[580]6.0811,[581]6.0773,[582]6.0764,[583]6.0772,[584]6.0774,[585]6.0699,[586]6.0630,[587]6.0637,[588]6.0683,[589]6.0738,[590]6.0765,[591]6.0788,[592]6.0776,[593]6.0746,[594]6.0755,[595]6.0731,[596]6.0765,[597]6.0743,[598]6.0713,[599]6.0735,[600]6.0727,[601]6.0713,[602]6.0727,[603]6.0754,[604]6.0762,[605]6.0798,[606]6.0821,[607]6.0805,[608]6.0773,[609]6.0782,[610]6.0817,[611]6.0800,[612]6.0825,[613]6.0788,[614]6.0740,[615]6.0667,[616]6.0693,[617]6.0633,[618]6.0586,[619]6.0530,[620]6.0394,[621]6.0326,[622]6.0310,[623]6.0324,[624]6.0328,[625]6.0327,[626]6.0315,[627]6.0339,[628]6.0341,[629]6.0340,[630]6.0372,[631]6.0428,[632]6.0487,[633]6.0472,[634]6.0506,[635]6.0513,[636]6.0478,[637]6.0443,[638]6.0469,[639]6.0438,[640]6.0447,[641]6.0449,[642]6.0515,[643]6.0537,[644]6.0548,[645]6.0529,[646]6.0570,[647]6.0529,[648]6.0539,[649]6.0543,[650]6.0581,[651]6.0635,[652]6.0646,[653]6.0685,[654]6.0622,[655]6.0617,
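
For readers unfamiliar with this output: the bracketed values look like a running perplexity estimate printed after each chunk, with the last entry ([655]) matching the 6.0617 quoted at the top of this comment. A minimal sketch of how such a running estimate can be computed follows. It assumes the tool reports exp(cumulative negative log-likelihood / cumulative token count), which is the usual definition but is not confirmed by this thread; `running_perplexity` and the toy inputs are hypothetical and are not taken from llama.cpp.

```python
import math


def running_perplexity(chunk_nlls, tokens_per_chunk):
    """Yield exp(cumulative NLL / cumulative tokens) after each chunk.

    chunk_nlls: summed negative log-likelihood (natural log) for each chunk.
    tokens_per_chunk: number of scored tokens in each chunk (assumed constant).
    """
    total_nll = 0.0
    total_tokens = 0
    for nll in chunk_nlls:
        total_nll += nll
        total_tokens += tokens_per_chunk
        yield math.exp(total_nll / total_tokens)


# Toy data (hypothetical): two chunks of 4 tokens each, with a mean
# per-token NLL of 2.0 in the first chunk and 1.0 in the second.
toy_chunk_nlls = [2.0 * 4, 1.0 * 4]
estimates = list(running_perplexity(toy_chunk_nlls, 4))

# Print in the same "[index]value" style as the log above.
print(",".join(f"[{i}]{p:.4f}" for i, p in enumerate(estimates, start=1)))
```

Because each printed value averages over all chunks seen so far, early entries swing widely and later entries settle, which is consistent with the shape of the series above.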

llama_print_timings: load time = 9033.50 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 3328253.06 ms / 335360 tokens ( 9.92 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 3366921.57 ms
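
The figures in this log are mutually consistent, which can be checked with a few lines of arithmetic. The sketch below uses only numbers printed above (655 chunks, batch_size=512, 5.65 seconds per pass, 3328253.06 ms prompt eval time); the helper name `summarize_run` is hypothetical.

```python
def summarize_run(chunks, batch_size, seconds_per_pass, prompt_eval_ms):
    """Derive token count, ETA in hours, and per-token latency from log values."""
    tokens = chunks * batch_size                    # total tokens evaluated
    eta_hours = chunks * seconds_per_pass / 3600.0  # estimated wall time
    ms_per_token = prompt_eval_ms / tokens          # average prompt eval cost
    return tokens, eta_hours, ms_per_token


tokens, eta_hours, ms_per_token = summarize_run(655, 512, 5.65, 3328253.06)
print(tokens, round(eta_hours, 2), round(ms_per_token, 2))
# prints: 335360 1.03 9.92
```

These match the "335360 tokens", "ETA 1.03 hours", and "9.92 ms per token" values reported in the log.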
