Benchmark comparison daru vs pandas & numpy

Shekhar Prasad Rajak edited this page Jan 23, 2019 · 29 revisions

DataFrame size = (10 ** n, 2): 2 columns and 10 ** n rows, where n is 2, 3, 4, 5, 6, 7, 8.

Comparison

daru Benchmark

Real times (in seconds) for vector sizes [10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6, 10 ** 7]

Method on DataFrame Vector (Vector access and apply method): mean

| Number of rows | Real time (seconds) |
| --- | --- |
| 10 ** 2 | 0.00005347399928723462 |
| 10 ** 3 | 0.00000930299938772805 |
| 10 ** 4 | 0.00002248199962195940 |
| 10 ** 5 | 0.00002107299951603636 |
| 10 ** 6 | 0.00002180799856432714 |
| 10 ** 7 | 0.00002622999818413518 |

Method on DataFrame Vector (Vector access and apply method): mode

| Number of rows | Real time (seconds) |
| --- | --- |
| 10 ** 2 | 0.00017478399968240410 |
| 10 ** 3 | 0.00008010799865587614 |
| 10 ** 4 | 0.00011613899914664216 |
| 10 ** 5 | 0.00010050600030808710 |
| 10 ** 6 | 0.00016179999875021167 |
| 10 ** 7 | 0.00012378300016280264 |

Method on DataFrame Vector (Vector access and apply method): median

| Number of rows | Real time (seconds) |
| --- | --- |
| 10 ** 2 | 0.00005261199839878827 |
| 10 ** 3 | 0.00002682400008779950 |
| 10 ** 4 | 0.00003368200123077258 |
| 10 ** 5 | 0.00005748499825131148 |
| 10 ** 6 | 0.00003802499850280583 |
| 10 ** 7 | 0.00008999200144899078 |

Method on DataFrame Vector (Vector access and apply method): sum

| Number of rows | Real time (seconds) |
| --- | --- |
| 10 ** 2 | 0.00000746200021239929 |
| 10 ** 3 | 0.00000597000325797126 |
| 10 ** 4 | 0.00000720399839337915 |
| 10 ** 5 | 0.00000745499710319564 |
| 10 ** 6 | 0.00005896600123378448 |
| 10 ** 7 | 0.00000900500162970275 |

Method on DataFrame Vector (Vector access and apply method): product

| Number of rows | Real time (seconds) |
| --- | --- |
| 10 ** 2 | 0.00002019299790845253 |
| 10 ** 3 | 0.00000498999725095928 |
| 10 ** 4 | 0.00000655099938740022 |
| 10 ** 5 | 0.00000666200139676221 |
| 10 ** 6 | 0.00000736299989512190 |
| 10 ** 7 | 0.00000797199754742905 |

Method on DataFrame Vector (Vector access and apply method): median_absolute_deviation

| Number of rows | Real time (seconds) |
| --- | --- |
| 10 ** 2 | 0.00005985799725749530 |
| 10 ** 3 | 0.00005071499981568195 |
| 10 ** 4 | 0.00007257600009324960 |
| 10 ** 5 | 0.00005898799645365216 |
| 10 ** 6 | 0.00011886399806826375 |
| 10 ** 7 | 0.00007509900024160743 |

Method on DataFrame Vector (Vector access and apply method): sum_of_squared_deviation

| Number of rows | Real time (seconds) |
| --- | --- |
| 10 ** 2 | 0.00002594100078567863 |
| 10 ** 3 | 0.00000969399843597785 |
| 10 ** 4 | 0.00001122899993788451 |
| 10 ** 5 | 0.00001170699761132710 |
| 10 ** 6 | 0.00006296100036706775 |
| 10 ** 7 | 0.00001322000025538728 |

Method on DataFrame Vector (Vector access and apply method): average_deviation_population

| Number of rows | Real time (seconds) |
| --- | --- |
| 10 ** 2 | 0.00004264399831299670 |
| 10 ** 3 | 0.00003976099833380431 |
| 10 ** 4 | 0.00002845899871317670 |
| 10 ** 5 | 0.00005149999924469739 |
| 10 ** 6 | 0.00003221800216124393 |
| 10 ** 7 | 0.00008974700176622719 |

DataFrame creation: create df real time

| Number of rows | Real time (seconds) |
| --- | --- |
| 10 ** 2 | 0.00029346900191740133 |
| 10 ** 3 | 0.00270435700076632202 |
| 10 ** 4 | 0.03130596700066234916 |
| 10 ** 5 | 0.24355140899933758192 |
| 10 ** 6 | 2.85959107999951811507 |
| 10 ** 7 | 52.26341757000045618042 |

Raw output from a separate run (one value per size, 10 ** 2 to 10 ** 7):

Means => ["0.000018035998", "0.000009484000", "0.000018049001", "0.000020885000", "0.000021848999", "0.000023175000"]
mode => ["0.000150011001", "0.000117476000", "0.000117858001", "0.000110462001", "0.000139565002", "0.000111646001"]
median => ["0.000051891000", "0.000053140000", "0.000036403999", "0.000055286000", "0.000061164001", "0.000044535000"]
sum => ["0.000007409000", "0.000008056000", "0.000007533999", "0.000007462000", "0.000007916000", "0.000008029001"]
product => ["0.000024531000", "0.000007930001", "0.000006863002", "0.000006942999", "0.000007171000", "0.000007293002"]
median_absolute_deviation => ["0.000100432999", "0.000113962000", "0.000076615001", "0.000062638999", "0.000064992000", "0.000065991000"]
sum_of_squared_deviation => ["0.000011425000", "0.000016450000", "0.000012232000", "0.000011624001", "0.000012728999", "0.000012634000"]
average_deviation_population => ["0.000030231000", "0.000066736000", "0.000031079000", "0.000053027999", "0.000055622000", "0.000032394999"]
create df real time => ["0.000306631999", "0.002730177999", "0.033998219000", "0.250292323000", "2.832325777999", "47.578107261001"]
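
One way to read the "create df real time" table above is to look at how much the time grows for each tenfold increase in rows; a factor near 10 would suggest roughly linear scaling. The sketch below is only an illustration: the timings are copied from that table, while the helper name `growth_factors` is a hypothetical name introduced here.

```python
# Row counts and "create df real time" values (seconds) copied from the table above.
rows = [10 ** n for n in range(2, 8)]
create_seconds = [
    0.00029346900191740133,
    0.00270435700076632202,
    0.03130596700066234916,
    0.24355140899933758192,
    2.85959107999951811507,
    52.26341757000045618042,
]


def growth_factors(times):
    """Ratio of each timing to the previous one (one ratio per tenfold step)."""
    return [later / earlier for earlier, later in zip(times, times[1:])]


factors = growth_factors(create_seconds)
for (small, large), factor in zip(zip(rows, rows[1:]), factors):
    print(f"{small} -> {large} rows: x{factor:.2f}")
```

On these numbers the factors come out between roughly 8 and 12 for the first four steps and about 18 for the last step (10 ** 6 to 10 ** 7 rows).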

Pandas and NumPy - mean of one column

| Number of rows | pandas_mean avg time (seconds) | numpy_mean avg time (seconds) |
| --- | --- | --- |
| 10 ** 2 | 0.00002522777009980928 | 0.00002991263389994856 |
| 10 ** 3 | 0.00002691261290019611 | 0.00003136286949993519 |
| 10 ** 4 | 0.00004254218300047796 | 0.00004586969699812471 |
| 10 ** 5 | 0.00015704863499922794 | 0.00015137835600035032 |
| 10 ** 6 | 0.00155946647999371645 | 0.00133805154000583566 |
| 10 ** 7 | 0.01195333830000890919 | 0.01233988390013109927 |
| 10 ** 8 | 0.12151296670017472379 | 0.12252252519974718425 |

Benchmarking function: pandas_mean

Testing with a dataframe of size:  100
Result (seconds):  0.00002590420289998292
Testing with a dataframe of size:  1000
Result (seconds):  0.00002713621290004085
Testing with a dataframe of size:  10000
Result (seconds):  0.00004180985899984080
Testing with a dataframe of size:  100000
Result (seconds):  0.00015845163699941623
Testing with a dataframe of size:  1000000
Result (seconds):  0.00154265971000313576
Testing with a dataframe of size:  10000000
Result (seconds):  0.01233993340001688852
Testing with a dataframe of size:  100000000
Result (seconds):  0.12452101310000215917
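
Output in the format above (a function name, then one averaged result per size) could come from a small timing harness. The following is a minimal sketch and not the script used for this page: it relies only on the standard-library `timeit` module, and `make_column`, `list_mean`, and `benchmark` are hypothetical names. A plain Python list and `statistics.fmean` stand in for the pandas or NumPy column and its `mean` call, so the absolute numbers it prints are not comparable to the ones listed here.

```python
import timeit
from statistics import fmean


def make_column(size):
    """Build a stand-in column of `size` floats (hypothetical test data)."""
    return [float(i % 97) for i in range(size)]


def list_mean(column):
    """Stand-in for a function such as pandas_mean or numpy_mean."""
    return fmean(column)


def benchmark(func, sizes, repeats=3, number=10):
    """Time func(column) for each size; return {size: average seconds per call}."""
    print(f"Benchmarking function: {func.__name__}")
    results = {}
    for size in sizes:
        column = make_column(size)  # data is built outside the timed region
        timer = timeit.Timer(lambda: func(column))
        # repeat() returns one total per round; each total covers `number` calls.
        totals = timer.repeat(repeat=repeats, number=number)
        results[size] = sum(totals) / (repeats * number)
        print(f"Testing with a dataframe of size:  {size}")
        print(f"Result (seconds):  {results[size]:.20f}")
    return results


if __name__ == "__main__":
    benchmark(list_mean, [10 ** n for n in range(2, 5)])
```

Building the data before starting the timer keeps creation cost out of the per-method numbers, which matches how this page reports DataFrame creation separately.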

Benchmarking function: numpy_mean

Testing with a dataframe of size:  100
Result (seconds):  0.00003039180350006063
Testing with a dataframe of size:  1000
Result (seconds):  0.00003180017919994498
Testing with a dataframe of size:  10000
Result (seconds):  0.00004642649300058111
Testing with a dataframe of size:  100000
Result (seconds):  0.00015341703800004326
Testing with a dataframe of size:  1000000
Result (seconds):  0.00132965861999764464
Testing with a dataframe of size:  10000000
Result (seconds):  0.01232307760001276689
Testing with a dataframe of size:  100000000

Pandas and NumPy - median of one column

Benchmarking function: pandas_median

Testing with a dataframe of size:  100
Result (seconds):  0.00002582970006000323
Testing with a dataframe of size:  1000
Result (seconds):  0.00002824090917999456
Testing with a dataframe of size:  10000
Result (seconds):  0.00010940286248000120
Testing with a dataframe of size:  100000
Result (seconds):  0.00088729414723999977
Testing with a dataframe of size:  1000000
Result (seconds):  0.01075348846100041521
Testing with a dataframe of size:  10000000
Result (seconds):  0.12858627516000523117
Testing with a dataframe of size:  100000000
Result (seconds):  1.24747154800024873111

Benchmarking function: numpy_median

Testing with a dataframe of size:  100
Result (seconds):  0.00003089331586999833
Testing with a dataframe of size:  1000
Result (seconds):  0.00003515163099999882
Testing with a dataframe of size:  10000
Result (seconds):  0.00012442707055000028
Testing with a dataframe of size:  100000
Result (seconds):  0.00099820312949000251
Testing with a dataframe of size:  1000000
Result (seconds):  0.01108264675299960866
Testing with a dataframe of size:  10000000
Result (seconds):  0.13133820641999591206
Testing with a dataframe of size:  100000000
Result (seconds):  1.21390682299988839077

Pandas and NumPy - product of the elements of one column

Benchmarking function: pandas_prod

Testing with a dataframe of size:  100
Result (seconds):  0.00004491761369999949
Testing with a dataframe of size:  1000
Result (seconds):  0.00004912122980003915
Testing with a dataframe of size:  10000
Result (seconds):  0.00008044129299923952
Testing with a dataframe of size:  100000
Result (seconds):  0.00036101352399964525
Testing with a dataframe of size:  1000000
Result (seconds):  0.00347382971000115498
Testing with a dataframe of size:  10000000
Result (seconds):  0.03563231970001652649
Testing with a dataframe of size:  100000000
Result (seconds):  0.37846179100051813293

Benchmarking function: numpy_prod

Testing with a dataframe of size:  100
Result (seconds):  0.00005671319829998538
Testing with a dataframe of size:  1000
Result (seconds):  0.00005507186369995907
Testing with a dataframe of size:  10000
Result (seconds):  0.00008253360099934071
Testing with a dataframe of size:  100000
Result (seconds):  0.00040807778600083112
Testing with a dataframe of size:  1000000
Result (seconds):  0.00371396845999697692
Testing with a dataframe of size:  10000000
Result (seconds):  0.03488254110006892145
Testing with a dataframe of size:  100000000
Result (seconds):  0.35792863000006036600

Pandas and NumPy - sum of the elements of one column

Benchmarking function: pandas_sum

Testing with a dataframe of size:  100
Result (seconds):  0.00005561769920004735
Testing with a dataframe of size:  1000
Result (seconds):  0.00005946561629998541
Testing with a dataframe of size:  10000
Result (seconds):  0.00009354037799948855
Testing with a dataframe of size:  100000
Result (seconds):  0.00040350849599963114
Testing with a dataframe of size:  1000000
Result (seconds):  0.00499632787000336975
Testing with a dataframe of size:  10000000
Result (seconds):  0.05969556259997262082
Testing with a dataframe of size:  100000000
Result (seconds):  0.71097542200004681945

Benchmarking function: numpy_sum

Testing with a dataframe of size:  100
Result (seconds):  0.00006173165039999731
Testing with a dataframe of size:  1000
Result (seconds):  0.00006634956489997421
Testing with a dataframe of size:  10000
Result (seconds):  0.00010499926699958451
Testing with a dataframe of size:  100000
Result (seconds):  0.00038439752700014651
Testing with a dataframe of size:  1000000
Result (seconds):  0.00517789042000004005
Testing with a dataframe of size:  10000000
Result (seconds):  0.07420776259996272883
Testing with a dataframe of size:  100000000
Result (seconds):  0.64270900600004099434

Pandas and NumPy - unique elements in one column

Benchmarking function: pandas_unique

Testing with a dataframe of size:  100
Result (seconds):  0.00003485789000005752
Testing with a dataframe of size:  1000
Result (seconds):  0.00004646338229995308
Testing with a dataframe of size:  10000
Result (seconds):  0.00013272955199954595
Testing with a dataframe of size:  100000
Result (seconds):  0.00180719400100042547
Testing with a dataframe of size:  1000000
Result (seconds):  0.02744935295999311950
Testing with a dataframe of size:  10000000
Result (seconds):  0.36727430050004838957
Testing with a dataframe of size:  100000000
Result (seconds):  7.76852580100057821255

Benchmarking function: numpy_unique

Testing with a dataframe of size:  100
Result (seconds):  0.00001490040839998983
Testing with a dataframe of size:  1000
Result (seconds):  0.00004483109719994900
Testing with a dataframe of size:  10000
Result (seconds):  0.00057470103799914794
Testing with a dataframe of size:  100000
Result (seconds):  0.00743517332999999760
Testing with a dataframe of size:  1000000
Result (seconds):  0.07924894640999809170
Testing with a dataframe of size:  10000000
Result (seconds):  0.90400296050001993642
Testing with a dataframe of size:  100000000
Result (seconds):  10.34286806900036026491

Pandas and NumPy - sort the elements of one column

Benchmarking function: numpy_sort

Testing with a dataframe of size:  100
Result (seconds):  0.00000912697550002122
Testing with a dataframe of size:  1000
Result (seconds):  0.00003445678490006685
Testing with a dataframe of size:  10000
Result (seconds):  0.00050352683799974329
Testing with a dataframe of size:  100000
Result (seconds):  0.00610998137699971227
Testing with a dataframe of size:  1000000
Result (seconds):  0.07806133034000595217
Testing with a dataframe of size:  10000000
Result (seconds):  0.89660990390002548445
Testing with a dataframe of size:  100000000
Result (seconds):  10.22523978899971552892

Benchmarking function: pandas_sort

Testing with a dataframe of size:  100
Result (seconds):  0.00022274660969997059
Testing with a dataframe of size:  1000
Result (seconds):  0.00028801353039998505
Testing with a dataframe of size:  10000
Result (seconds):  0.00093503032399985385
Testing with a dataframe of size:  100000
Result (seconds):  0.00960024423100003417
Testing with a dataframe of size:  1000000
Result (seconds):  0.14475218630000200037
Testing with a dataframe of size:  10000000
Result (seconds):  2.70540260359994144679
Testing with a dataframe of size:  100000000
Result (seconds):  43.03127763799966487568
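
To compare the two sort results above size by size, the timings can be divided pairwise. This is an illustrative sketch only: the numbers are copied from the numpy_sort and pandas_sort output above, and the helper name `slowdown` is hypothetical.

```python
# Timings in seconds, copied from the numpy_sort and pandas_sort output above.
sizes = [10 ** n for n in range(2, 9)]
numpy_sort = [
    0.00000912697550002122,
    0.00003445678490006685,
    0.00050352683799974329,
    0.00610998137699971227,
    0.07806133034000595217,
    0.89660990390002548445,
    10.22523978899971552892,
]
pandas_sort = [
    0.00022274660969997059,
    0.00028801353039998505,
    0.00093503032399985385,
    0.00960024423100003417,
    0.14475218630000200037,
    2.70540260359994144679,
    43.03127763799966487568,
]


def slowdown(baseline, other):
    """How many times slower `other` is than `baseline`, position by position."""
    return [o / b for b, o in zip(baseline, other)]


ratios = dict(zip(sizes, slowdown(numpy_sort, pandas_sort)))
for size, ratio in ratios.items():
    print(f"{size:>9} rows: pandas_sort / numpy_sort = {ratio:.2f}")
```

On these numbers pandas_sort is slower at every size: about 24x at 100 rows, under 2x from 10 ** 4 to 10 ** 6 rows, and about 4x at 10 ** 8 rows.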

Pandas - Create DataFrame

Creating dataframe of size:  100
Result (seconds):  0.00021427773819996220
Creating dataframe of size:  1000
Result (seconds):  0.00020970309380008984
Creating dataframe of size:  10000
Result (seconds):  0.00021675117500126361
Creating dataframe of size:  100000
Result (seconds):  0.00021533839099902252
Creating dataframe of size:  1000000
Result (seconds):  0.00026724909999757072
Creating dataframe of size:  10000000
Result (seconds):  0.00028099869996367488
Creating dataframe of size:  100000000
Result (seconds):  0.00445340500118618365
