A little AWS EC2 benchmarking of drives #6181
jhughes2112 started this conversation in Show and tell
Replies: 2 comments
-
One issue is that if you are saturating the CPU or IO, Redpanda doesn't handle this well: #608. You are hitting this case in the …
-
I spent most of the day trying out various things on EC2. Since this kind of data is time-consuming and hard to come by, but fun to absorb quickly, I'll share what I discovered. Running on an i4i.2xlarge with 8 CPUs, I gave Redpanda 6 of them and 20GB of memory, with 1 topic and 20 partitions. I re-ran `rpk iotune` each time I changed the data folder and checked the drive speed with `dd` as well.
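For reference, the checks looked roughly like this (the mount point, file size, and flags below are just examples, not exactly what I ran):

```
# Re-tune Redpanda's IO parameters against whichever folder currently holds the data
sudo rpk iotune --directories /mnt/redpanda

# Rough sequential-write check with dd, bypassing the page cache (writes ~4GB)
sudo dd if=/dev/zero of=/mnt/redpanda/ddtest bs=1M count=4096 oflag=direct conv=fsync
sudo rm /mnt/redpanda/ddtest
```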
Furthermore, I compared Ubuntu 22.04 LTS against Amazon's Linux AMI based on Debian. The numbers turned out identical, so pick whichever distro you like. I ran Redpanda like this:

```
sudo /usr/bin/rpk redpanda start --smp 6 --memory 20G --reserve-memory 0M --node-id 1 --kafka-addr bob://0.0.0.0:9092 --advertise-kafka-addr bob://x.x.x.x:9092
```
and created the topic like this:

```
sudo rpk topic create mytopic -p 20
```
I tested production of messages with the following:

```
sudo docker run -it --rm salaholabi/rdkafka_performance ./rdkafka_performance -P -b x.x.x.x:9092 -t mytopic -s 275 -c 10000000 -a -1 -u
```
It looks like direct-mounted NVMe is the way to go, followed closely (but expensively) by io2, while gp3 lags unacceptably far behind. The gp3 drive seemed fast at first but got slower after a while... not sure what happened there. Anyway, I hope this helps someone.
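If you want to try the direct-attached NVMe yourself, this is roughly how an instance-store drive can be formatted, mounted, and used as the data folder. The device name and paths below are assumptions (check `lsblk` for your actual device), and it's only a sketch, not exactly what I did:

```
# Find the instance-store NVMe device (the name varies, e.g. /dev/nvme1n1)
lsblk

# Format and mount it (XFS here, which is what Redpanda recommends; paths are examples)
sudo mkfs.xfs /dev/nvme1n1
sudo mkdir -p /mnt/redpanda
sudo mount -o noatime /dev/nvme1n1 /mnt/redpanda

# Point Redpanda's data folder at the new mount, then re-run iotune before restarting
sudo rpk redpanda config set redpanda.data_directory /mnt/redpanda
```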