While it is nice that it is faster, ~7 Gb/core-second over an in-process "virtual network" (which measures only the protocol implementation itself, not the rest of the network stack) is not exactly a fast network protocol implementation. That is ~500,000-700,000 full-size packets per second, or ~1.5-2 core-us/packet.
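The conversion from per-core throughput to packet rate can be checked with some quick arithmetic, assuming MTU-sized (~1500-byte) packets, which is an assumption on my part:

```python
GBIT = 1e9  # bits per gigabit

def packets_per_second(gbit_per_core_sec: float, packet_bytes: int = 1500) -> float:
    """Packets per core-second at a given per-core throughput,
    assuming full-size (MTU ~1500-byte) packets."""
    return gbit_per_core_sec * GBIT / (packet_bytes * 8)

pps = packets_per_second(7.0)   # ~583,000 packets/core-second
us_per_packet = 1e6 / pps       # ~1.7 core-microseconds/packet
```

Smaller packets would raise the per-packet overhead correspondingly, so the ~1.5-2 us/packet figure is the optimistic end.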
Under those same conditions, you can quite readily do ~100 Gb/core-second in software with feature parity, given proper protocol design and implementation (ignoring encryption; encryption will bottleneck you to 30-50 Gb/core-second on modern chips with AES acceleration instructions).
SCTP isn't just a UDP pipe. It's a message-oriented, congestion-controlled, reliable protocol with a bunch of other semantics.
We measured:
1. Association state + per-path CC/RTO, timers, RTT tracking, cwnd, etc.
2. Selective ACKs and retransmit logic.
3. Chunk framing + TSN sequencing.
4. Ordered vs. unordered delivery, and fragmentation/reassembly.
And much more ...
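To make item 1 concrete, here is a minimal sketch of per-path RTT/RTO tracking in the style of RFC 4960 section 6.3.1; the names `PathState` and `on_rtt_sample` are illustrative, not from the implementation under discussion:

```python
from dataclasses import dataclass

@dataclass
class PathState:
    """Per-path state a conforming SCTP stack must maintain (sketch)."""
    srtt: float = 0.0      # smoothed RTT (seconds)
    rttvar: float = 0.0    # RTT variance
    rto: float = 3.0       # RTO.Initial per RFC 4960
    cwnd: int = 4380       # initial cwnd: min(4*MTU, max(2*MTU, 4380))
    first_sample: bool = True

    def on_rtt_sample(self, rtt: float,
                      alpha: float = 1/8, beta: float = 1/4,
                      rto_min: float = 1.0, rto_max: float = 60.0) -> None:
        """Fold one RTT measurement into SRTT/RTTVAR and recompute RTO."""
        if self.first_sample:
            self.srtt, self.rttvar = rtt, rtt / 2
            self.first_sample = False
        else:
            self.rttvar = (1 - beta) * self.rttvar + beta * abs(self.srtt - rtt)
            self.srtt = (1 - alpha) * self.srtt + alpha * rtt
        # RTO = SRTT + 4*RTTVAR, clamped to [RTO.Min, RTO.Max]
        self.rto = min(rto_max, max(rto_min, self.srtt + 4 * self.rttvar))
```

Every active path needs one of these (plus timers and SACK bookkeeping), which is a good chunk of the per-packet cost being measured.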
Also, our vnet-based implementation isn't just a dumb buffer: we have on-the-wire packet validation, SCTP parsing, CRC32c validation, and a deterministic network-conditions emulator with real-time conditions.
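For context, the CRC32c mentioned here is the Castagnoli checksum SCTP carries in every packet (RFC 4960, Appendix B). A bitwise reference version, for clarity only, since real stacks use table-driven or hardware (SSE4.2 `crc32`) implementations:

```python
def crc32c(data: bytes) -> int:
    """Reference CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

# Standard check value: crc32c(b"123456789") == 0xE3069283
```

Even a table-driven version of this touches every payload byte, so validating it per packet is real work that a raw buffer copy skips entirely.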
Sure, you can get 100 Gb/core-second if you bypass all of that and just do huge batching.
The blog post's claim is just that, under the same SCTP semantics and the same test harness, enabling RACK is a huge win, not a statement about the absolute ceilings of in-process "virtual network" sockets :)
Yes, I meant all of that when I explicitly said feature parity at 100 Gb/core-second: reliable delivery of multiple independent byte streams (which is actually more than SCTP gives, since SCTP still suffers from head-of-line blocking because SCTP SACKs are by TSN rather than a per-stream identifier) with a dynamic stream count (again, more than SCTP gives), over an unreliable network that may reorder or lose packets.
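The head-of-line point can be illustrated with a toy per-stream reassembler (all names here are hypothetical, not from any real stack): with per-stream sequence numbers, a gap on one stream never delays delivery on another, whereas a single TSN-based cumulative ACK point stalls everything behind the earliest missing TSN.

```python
from collections import defaultdict

class PerStreamReassembler:
    """Toy receiver: buffers out-of-order data per stream, delivers in order."""

    def __init__(self):
        self.next_seq = defaultdict(int)   # stream id -> next expected seq
        self.buffered = defaultdict(dict)  # stream id -> {seq: payload}

    def receive(self, stream: int, seq: int, payload: bytes) -> list:
        """Accept one message; return any payloads now deliverable in order
        on this stream. Gaps on other streams are irrelevant."""
        self.buffered[stream][seq] = payload
        out = []
        while self.next_seq[stream] in self.buffered[stream]:
            out.append(self.buffered[stream].pop(self.next_seq[stream]))
            self.next_seq[stream] += 1
        return out

r = PerStreamReassembler()
r.receive(1, 1, b"late")         # stream 1 has a gap at seq 0: nothing delivered
r.receive(2, 0, b"independent")  # stream 2 delivers immediately despite stream 1's gap
```

With TSN-scoped SACKs, the loss creating stream 1's gap would also hold back stream 2's data until retransmission, which is exactly the cross-stream blocking described above.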