class: center, middle, contrast # P4 in the wild: Line-rate packet forwarding for the SCION future Internet architecture Kamila Součková Master thesis (in progress) SDN Switzerland, 2019-07-05 --- class: contrast, expanded, middle, narrow ###### Background .indenthalf[**What is SCION?**] ###### My project 1. **Aims** 2. **Challenges** 3. **Current status** 4. **Future plans** --- # SCION future Internet architecture -- → changing the Internet is really hard, so… Why? -- .width100[![bgp route leak article screenshot](./img/bgp-leak-article.png)] --- .width100[![bgp route leak article screenshot](./img/bgp-leak-article.png)] .pushuplots[ .small.center[last week's BGP route leak only one of many examples of the current Internet's shortcomings] SCION is designed for **.hl[route control]**, **.hl[failure isolation]**, and **.hl[explicit trust information]** for end-to-end communication ] --- class: no-margins # SCION _.hl[Scalability]_ _.hl[Control]_ _.hl[Isolation]_ _.hl[on Next-generation Networks]_ --- count: false class: no-margins # SCION _.hl[Scalability]_ * packet-carried forwarding state * hierarchical design _.hl[Control]_ * end-host selects path * ISPs decide available paths _.hl[Isolation]_ * Isolation Domains (failures stay within one) * built-in DoS protection .hl[_on Next-generation Networks]_ * new control plane & data plane (replaces IP + BGP) * endhost-controlled multipath for free :-) --- class: expanded # Isolation Domains (ISDs) .floatright.width65[![ISDs](./img/isds.png)] * e.g. ISD ≈ country * PKI per ISD * routing within ISD independent * managed by **core ASes**: PKI, inter-ISD control plane * non-core ASes: only intra-ISD (simple) --- class: expanded1 .floatright.width50[![PCBs](./img/pcbs.png)] # Path discovery * core ASes create *path construction beacons* (PCBs) & flood * hierarchical: * inter-ISD: among core ASes * intra-ISD: only downstream * AS receiving beacon adds itself & forwards it * ⇒ PCB contains known good path to core * adds its signature ⇒ PCBs can be verified --- # Path discovery * AS chooses how to extend / what to forward ⇒ enables routing policies * available *path segments* registered to *path servers* * end hosts request from path servers * segments combined into end-to-end paths: .center[ .displayinlineblock.width30[![segments example 1](./img/segments1.png)] .displayinlineblock.width30[![segments example 2](./img/segments2.png)] .displayinlineblock.width30[![segments example 2](./img/segments3.png)] ] --- # Path discovery every PCB contains *hop fields* for path construction: * ingress, egress interface * timestamp + expiration time * chained cryptographic MACs with AS-specific keys ⇒ paths cannot be "invented" .center.width90[![HF MACs chaining](./img/macs.svg)] --- class: expanded # Packet forwarding .floatright.width60[![forwarding path packet](./img/path.png)] * path in packet header * every border router processes only its own *hop field* * checks to ensure path authenticity --- class: contrast, expanded, middle, narrow ###### SCION border router in P4 1. **Aims** 2. **Challenges** 3. **Current status** 3. **Future plans** --- exclude: true class: expanded # Aims * **Ready-to-deploy SCION border router** forwarding at 40Gbps or more * **SCION as a library** Modular, portable, high-performance P4 code for parsing, verification, and forwarding * **Guidelines for high-speed P4** What have we learned about high-speed P4-based packet processing on FPGAs? * **Optimising the SCION protocol for HW** How can we adjust SCION to enable more efficient implementations in HW? --- class: contrast, expanded, middle, center # Aims --- exclude: true # #1: Prod-ready SCION BR .center.width100[![Parts of the project](./img/scion-breakdown.svg)] --- class: expanded1 # #1: Ready-to-deploy SCION BR * forwarding at **40Gbps or more** * usable with real traffic * integrated with existing SCION infra * control plane * monitoring / metrics --- # #2: SCION as a library .center[Modular, portable, high-performance P4 code for parsing, verification, and forwarding] ??? Now, why is this a good idea? 1. because modular code is easier to deal with 2. [NEXT] because you can base other things on it --- .center.width100[![Parts of the project](./img/scion-breakdown.svg)] ??? you can take my BR and pick and choose which parts you want: --- .center.width100[![Parts of the project](./img/scion-breakdown-swap-aes.svg)] ??? bring your own AES or even use a completely different MAC validation scheme --- .center.width100[![Parts of the project](./img/scion-breakdown-parser.svg)] ??? change parser: swap your L2, use different encaps, or even parse the IFIDs in HFs in some special way --- count: false .center.width100[![Parts of the project](./img/scion-breakdown.svg)] --- .center.width100[![NMS](./img/scion-nms.svg)] .center[Example: Network monitoring system] --- .center.width100[![NMS](./img/scion-smartnic.svg)] .center[Example: Special-purpose end host with a P4-capable SmartNIC] ??? You could have a special-purpose end host with a P4-enabled SmartNIC, and do some of the data processing straight in the NIC => use my parser and deparser in a completely different architecture --- exclude: true # #3: Guidelines for high-speed P4 **a) P4+NetFPGA project template suitable for large projects** * software-engineer-friendly * simple way to add tests * ready for multi-platform usage * already used by 2 other ETH students --- exclude: true # #3: Guidelines for high-speed P4 **
P4+NetFPGA guidelines + project template suitable for large projects
** .center.huge[:-(] Discarded, as the current P4 toolchain for NetFPGA is **not** suitable for production -- see Challenges for details. --- # #3: Guidelines for high-speed P4 **tips that help software engineers write P4 code that performs well on FPGA-based hardware** **meeting timing requirements** a frequent problem ⇒ check the **critical path** in your implemented design * find it in the timing report * you'll have to guess about the correspondence to the P4 code, but it can be done --- # #3: Guidelines for high-speed P4 some examples: * to meet timing, avoid long data paths * `inout` parameters get compiled into long paths ⇒ avoid them in critical spots * rewrite code like this:
.floatleft[ ```sh if [long computation 1] then [long computation 2] ``` ] .floatright[ ```sh [computation 1]; [computation 2]; [combine the results] ``` ]
⇒
* CAM tables (`exact` match) are relatively expensive ⇒ prefer to combine multiple lookups into one or completely avoid tables ??? and where to look to find out which spots are critical --- # #4: Optimising the SCION protocol for HW .center[ How can we adjust SCION to enable more efficient implementations in HW? ⇒ to answer, let's talk about the challenges... ] ??? so, those are the aims, now, how do we meet them? what are the most challenging aspects of the project? --- class: contrast, expanded, middle, center # Challenges ??? --- # Making progress quickly: Reuse, Reduce, Recycle .center.width80.pushup[![Parts of the project](./img/scion-controlplane.svg)] transparently pass unhandled packets to SW through 1:1 "fake" network interfaces (really DMA) ⇒ iteratively move functionality into HW --- layout: true # Implementing the parser SCION header: .floatleft.margin0[ ![header stack](./img/header-stack.svg) ] --- ??? my experience with NetFPGA, this would be different with a different compiler --- .floatright.margin0[ First idea: Use header stacks: ```c struct ScionHeader_t { … HopField_h[32] seg1_hfs; … } ``` ] --- .floatright.margin0[ First idea: Use header stacks: ```c struct ScionHeader_t { … HopField_h[32] seg1_hfs; … } ``` .red[NetFPGA compiler does not support header stacks] ] --- layout: false # Implementing the parser Fix: We don't need to parse the whole path, we just need to save it so we can emit it later ⇒ use a `varbit` field .floatleft.margin0.width30[ ![header stack](./img/header-stack-varbit.svg) ] -- .floatright.margin0.red[ NetFPGA compiler does not support `varbit` ] --- # Implementing the parser Fix: Don't even save the path: use `packet_mod`: ```c parser ExModDeparser(packet_mod p, in headers_t h) { state start { * p.update(h.ethernet); transition select(h.ethernet.ethertype) { ETHERTYPE_IPV4: deparse_ipv4; ETHERTYPE_IPV6: deparse_ipv6; } } … } ``` Xilinx extension, **not** standard P4 — but standard P4 can use `varbit` --- # Implementing the parser Fix: Don't even save the path: use `packet_mod`: 1. modify NetFPGA design to use the `packet_mod`-enabled `XilinxStreamSwitch` architecture .light[\* experimental :-)] 2. skip over the path with `packet.advance(size)` -- .red[NetFPGA requires even `packet.advance(size)` to be a compile-time constant] -- .center[Therefore…] --- # Implementing the parser .light[Fix: Don't even save the path: use `packet_mod`:] 1. .light[modify NetFPGA design to use the `packet_mod`-enabled `XilinxStreamSwitch` architecture] .light[\* experimental :-)] 2. .light[skip over the path with `packet.advance(size)`] 3. **Create a lot of separate sub-parsers: one for skipping 1 HF, one for skipping 2, etc.** Select the correct sub-parser at runtime, but the jump size is fixed at compile time. --- # Implementing the parser Create a lot of separate sub-parsers: up to .hl[max length]. Problem: To support reasonably long paths (~64 HFs), we need a lot of sub-parsers. ⇒ Requires a lot of FPGA area + RAM to build it. --- layout: true # Implementing the parser Create a lot of separate sub-parsers: up to .hl[max length]. Problem: To support reasonably long paths (~64 HFs), we need a lot of sub-parsers. ⇒ Requires a lot of FPGA area + RAM to build it. Fix: Two stages of max length 8: --- count: false .width100[![sqrt1](./img/sqrt1.svg)] --- count: false .width100[![sqrt2](./img/sqrt2.svg)] --- count: false .width100[![sqrt3](./img/sqrt3.svg)] --- count: false .width100[![sqrt4](./img/sqrt4.svg)] --- layout: false class: expanded1 # Portability * keeping the code portable despite the differences in P4 support * P4 should be target-independent, but… * different externs * different compiler limitations (ahem) * making it simple to compile this BR for a different platform --- # Portability Pieces of the solution: * preprocessor `#define`s decouple platform from features; code depends only on features * well-thought-out repo structure => minimise work needed to add new platform * do it & document it Planning to support at least: * NetFPGA * P4lang's software switch * later: other FPGA-based NICs --- class: expanded1 # Performance / Timing * at 200MHz, the timing is very tight * NetFPGA compiler has many bugs * workarounds cost extra resources * complicated by P4 ⇒ HDL translation * no direct control over the resulting design --- exclude: true # Making progress quickly * modular design makes it easier to test incremental changes and locate bugs * tackling the hard bits first ⇒ I can't get stuck * **the NetFPGA's P4 stack is pretty much a PoC**; documentation is generally severely lacking therefore: * try things ASAP * change plan if something doesn't work * **guess** --- # Suggestions for SCION protocol *
require few table lookups
✓ *
variable lengths are bad
✓ * SCION avoids them where possible * define maximum sizes: * max path length * max HF size * make things explicit * include end host address length in header * consider re-thinking MAC chaining for peering paths * further research into the tradeoffs is needed --- class: contrast, expanded, middle, center # Status --- class: bottom .center.width100[![Status](./img/scion-breakdown-status.svg)] --- class: expanded1 # Future plans * SCION packet header redesign * production-readiness * handle all the cases * with a different P4 compiler... * power requirements optimisation * compare with IP * make it faster! * planning 1Tbps using multiple FPGA-enabled NICs --- exclude: true # Current status The look of this slide: cases pic from SCION book that shows which cases I implement and then on click a big number saying 40Gbps (hopefully :D) --- class: contrast, center, middle # Say hi! Email: **scion@kamila.is** Twitter: **@anotherkamila** Matrix: **@kamila:unchat.cat**