P4 in the wild: Line-rate packet forwarding for the SCION future Internet architecture

# P4 in the wild: Line-rate packet forwarding for the SCION future Internet architecture

Kamila Součková  
Master thesis (in progress)  
SDN Switzerland, 2019-07-05

---

###### Background

###### My project

1. **Aims**
2. **Challenges**
3. **Current status**
4. **Future plans**

---

# SCION

future Internet architecture

→ changing the Internet is really hard, so… Why?

---

.pushuplots[
.small.center[last week's BGP route leak only one of many examples  
of the current Internet's shortcomings]

SCION is designed for **.hl[route control]**, **.hl[failure isolation]**, and **.hl[explicit trust information]** for end-to-end communication  
]

---

# SCION

_.hl[Scalability]_

_.hl[Control]_

_.hl[Isolation]_

_.hl[on Next-generation Networks]_

---

# SCION

_.hl[Scalability]_  
 * packet-carried forwarding state
 * hierarchical design

_.hl[Control]_  
 * end-host selects path
 * ISPs decide available paths

_.hl[Isolation]_  
 * Isolation Domains (failures stay within one)
 * built-in DoS protection

.hl[_on Next-generation Networks]_  
 * new control plane & data plane (replaces IP + BGP)  
 * endhost-controlled multipath for free :-)

---

# Isolation Domains (ISDs)

.floatright.width65[![ISDs](./img/isds.png)]

* e.g. ISD ≈ country
* PKI per ISD
* routing within ISD independent

* managed by **core ASes**: PKI, inter-ISD control plane
* non-core ASes: only intra-ISD (simple)

---

.floatright.width50[![PCBs](./img/pcbs.png)]

# Path discovery

* core ASes create *path construction beacons* (PCBs) & flood
* hierarchical:
 * inter-ISD:  
   among core ASes
 * intra-ISD:  
   only downstream
* AS receiving beacon adds itself & forwards it
  * ⇒ PCB contains known good path to core
  * adds its signature ⇒ PCBs can be verified

---

# Path discovery

* AS chooses how to extend / what to forward ⇒ enables routing policies
* available *path segments* registered to *path servers*
* end hosts request from path servers
* segments combined into end-to-end paths:

.center[
.displayinlineblock.width30[![segments example 1](./img/segments1.png)]
.displayinlineblock.width30[![segments example 2](./img/segments2.png)]
.displayinlineblock.width30[![segments example 2](./img/segments3.png)]
]

---

# Path discovery

every PCB contains *hop fields* for path construction:

* ingress, egress interface
* timestamp + expiration time
* chained cryptographic MACs with AS-specific keys ⇒ paths cannot be "invented"  
  .center.width90[![HF MACs chaining](./img/macs.svg)]

---

# Packet forwarding

.floatright.width60[![forwarding path packet](./img/path.png)]

* path in packet header
* every border router processes only its own *hop field*
* checks to ensure path authenticity

---

###### SCION border router in P4

1. **Aims**
2. **Challenges**
3. **Current status**
3. **Future plans**

---

# Aims

* **Ready-to-deploy SCION border router**  
  forwarding at 40Gbps or more
* **SCION as a library**  
  Modular, portable, high-performance P4 code for parsing, verification, and forwarding
* **Guidelines for high-speed P4**  
  What have we learned about high-speed P4-based packet processing on FPGAs?
* **Optimising the SCION protocol for HW**  
  How can we adjust SCION to enable more efficient implementations in HW?

---

# Aims

---

# #1: Prod-ready SCION BR

.center.width100[![Parts of the project](./img/scion-breakdown.svg)]

---

# #1: Ready-to-deploy SCION BR

* forwarding at **40Gbps or more**
* usable with real traffic
* integrated with existing SCION infra
  * control plane
  * monitoring / metrics

---

# #2: SCION as a library

???

Now, why is this a good idea?
1. because modular code is easier to deal with
2. [NEXT] because you can base other things on it

---

.center.width100[![Parts of the project](./img/scion-breakdown.svg)]

???

you can take my BR and pick and choose which parts you want:

---

.center.width100[![Parts of the project](./img/scion-breakdown-swap-aes.svg)]

???

bring your own AES or even use a completely different MAC validation scheme

---

.center.width100[![Parts of the project](./img/scion-breakdown-parser.svg)]

???

change parser: swap your L2, use different encaps, or even parse the IFIDs in HFs in some special way

---

.center.width100[![Parts of the project](./img/scion-breakdown.svg)]

---

.center.width100[![NMS](./img/scion-nms.svg)]

---

.center.width100[![NMS](./img/scion-smartnic.svg)]

???

You could have a special-purpose end host with a P4-enabled SmartNIC, and do some of the data processing straight in the NIC => use my parser and deparser in a completely different architecture

---

# #3: Guidelines for high-speed P4

**a) P4+NetFPGA project template suitable for large projects**

* software-engineer-friendly
* simple way to add tests
* ready for multi-platform usage
* already used by 2 other ETH students

---

# #3: Guidelines for high-speed P4

**<strike>P4+NetFPGA guidelines + project template suitable for large projects</strike>**

.center.huge[:-(]

Discarded, as the current P4 toolchain for NetFPGA is **not** suitable for production --
see Challenges for details.

---

# #3: Guidelines for high-speed P4

**tips that help software engineers write P4 code that performs well on FPGA-based hardware**

**meeting timing requirements** a frequent problem  
⇒ check the **critical path** in your implemented design
 * find it in the timing report
 * you'll have to guess about the correspondence to the P4 code, but it can be done

---

# #3: Guidelines for high-speed P4

some examples:

* to meet timing, avoid long data paths
  * `inout` parameters get compiled into long paths ⇒ avoid them in critical spots
  * rewrite code like this:

<div style="clear: both;"></div>
<div style="position: relative; margin: -1em auto; width: 90%;">
.floatleft[
```sh
if [long computation 1]
then [long computation 2]
```
]
.floatright[
```sh
[computation 1];
[computation 2];
[combine the results]    
```
]
    <div style="position: absolute; top: 1.4em; left: 52.5%;">⇒</div>
</div>
<div style="clear: both; margin-bottom: -1em;"></div>

* CAM tables (`exact` match) are relatively expensive ⇒ prefer to combine multiple lookups into one or completely avoid tables

???

and where to look to find out which spots are critical

---

# #4: Optimising the SCION protocol for HW

⇒ to answer, let's talk about the challenges...
]

???

so, those are the aims, now, how do we meet them? what are the most challenging aspects of the project?

---

# Challenges

???

---

# Making progress quickly: Reuse, Reduce, Recycle

.center.width80.pushup[![Parts of the project](./img/scion-controlplane.svg)]

transparently pass unhandled packets to SW through 1:1 "fake" network interfaces (really DMA)  
⇒ iteratively move functionality into HW

---

# Implementing the parser

SCION header:

.floatleft.margin0[
![header stack](./img/header-stack.svg)
]

---
???
my experience with NetFPGA, this would be different with a different compiler
---

.floatright.margin0[
First idea: Use header stacks:

```c
struct ScionHeader_t {
    …
    HopField_h[32] seg1_hfs;
    …
}
```
]

---

.floatright.margin0[
First idea: Use header stacks:

```c
struct ScionHeader_t {
    …
    HopField_h[32] seg1_hfs;
    …
}
```

---

# Implementing the parser

Fix: We don't need to parse the whole path, we just need to save it so we can emit it later ⇒ use a `varbit` field

.floatleft.margin0.width30[
![header stack](./img/header-stack-varbit.svg)
]

.floatright.margin0.red[
NetFPGA compiler does not  
support `varbit`
]

---

# Implementing the parser

Fix: Don't even save the path: use `packet_mod`:

```c
parser ExModDeparser(packet_mod p, in headers_t h) {
    state start {
*       p.update(h.ethernet);
        transition select(h.ethernet.ethertype) {
            ETHERTYPE_IPV4: deparse_ipv4;
            ETHERTYPE_IPV6: deparse_ipv6;
        }
    }
    …
}
```

Xilinx extension, **not** standard P4 — but standard P4 can use `varbit`

---

# Implementing the parser

Fix: Don't even save the path: use `packet_mod`:

1. modify NetFPGA design to use the `packet_mod`-enabled `XilinxStreamSwitch` architecture  
   .light[\* experimental :-)]
2. skip over the path with `packet.advance(size)`

---

# Implementing the parser

1. .light[modify NetFPGA design to use the `packet_mod`-enabled `XilinxStreamSwitch` architecture]  
   .light[\* experimental :-)]
2. .light[skip over the path with `packet.advance(size)`]
3. **Create a lot of separate sub-parsers: one for skipping 1 HF, one for skipping 2, etc.**
   Select the correct sub-parser at runtime, but the jump size is fixed at compile time.

---

# Implementing the parser

Create a lot of separate sub-parsers: up to .hl[max length].

Problem: To support reasonably long paths (~64 HFs), we need a lot of sub-parsers.  
⇒ Requires a lot of FPGA area + RAM to build it.

---

# Implementing the parser

Create a lot of separate sub-parsers: up to .hl[max length].

Problem: To support reasonably long paths (~64 HFs), we need a lot of sub-parsers.  
⇒ Requires a lot of FPGA area + RAM to build it.

Fix: Two stages of max length 8:

---
count: false
.width100[![sqrt1](./img/sqrt1.svg)]
---
count: false
.width100[![sqrt2](./img/sqrt2.svg)]
---
count: false
.width100[![sqrt3](./img/sqrt3.svg)]
---
count: false
.width100[![sqrt4](./img/sqrt4.svg)]
---
layout: false
class: expanded1

# Portability

* keeping the code portable despite the differences in P4 support
  * P4 should be target-independent, but…
    * different externs
    * different compiler limitations (ahem)
* making it simple to compile this BR for a different platform

---

# Portability

Pieces of the solution:
* preprocessor `#define`s decouple platform from features; code depends only on features
* well-thought-out repo structure => minimise work needed to add new platform
* do it & document it

Planning to support at least:
* NetFPGA
* P4lang's software switch
* later: other FPGA-based NICs

---

# Performance / Timing

* at 200MHz, the timing is very tight
* NetFPGA compiler has many bugs
  * workarounds cost extra resources
* complicated by P4 ⇒ HDL translation
  * no direct control over the resulting design

---

# Making progress quickly

* modular design makes it easier to test incremental changes and locate bugs
* tackling the hard bits first ⇒ I can't get stuck
* **the NetFPGA's P4 stack is pretty much a PoC**; documentation is generally severely lacking  
  therefore:
  * try things ASAP
  * change plan if something doesn't work
  * **guess**

---

# Suggestions for SCION protocol

* <strike>require few table lookups</strike> ✓
* <strike>variable lengths are bad</strike> ✓
  * SCION avoids them where possible
* define maximum sizes:
  * max path length
  * max HF size
* make things explicit
  * include end host address length in header
* consider re-thinking MAC chaining for peering paths
  * further research into the tradeoffs is needed

---

# Status

---

.center.width100[![Status](./img/scion-breakdown-status.svg)]

---

# Future plans

* SCION packet header redesign
* production-readiness
  * handle all the cases
  * with a different P4 compiler...
* power requirements optimisation
  * compare with IP
* make it faster!
  * planning 1Tbps using multiple FPGA-enabled NICs

---

# Current status

The look of this slide: cases pic from SCION book that shows which cases I implement and then on click a big number saying 40Gbps (hopefully :D)

---

# Say hi!

Email: **scion@kamila.is**  
Twitter: **@anotherkamila**  
Matrix: **@kamila:unchat.cat**