Future_Kdriver_Supervisor

E. Scott Daniels edited this page Aug 14, 2018 · 2 revisions

Kernel driver SR-IOV supervisor specification

Many VNFs are currently being "migrated" from purpose-built hardware into virtualized environments, and they often expect their network interfaces (vNICs attached to SR-IOV VFs) to offer the features found in traditional switches: 802.1Q VLAN trunking, VLAN push/pop, security, mirroring, statistics, QoS, and so on. In addition, multi-tenant virtualized environments require SR-IOV implementations in which security and network integrity cannot be compromised.

Much in the way that a guest operating system requires a hypervisor to provide access to the underlying real operating system and hardware, and to ensure that policies are enforced, there is a similar need for a "NIC hypervisor" when virtual functions (VFs) are directly available through SR-IOV. This effort is an attempt to "fill the gap" by adding the necessary functionality to the NIC's Linux kernel driver.

To support the SR-IOV hypervisor management interface, the kernel driver is required to expose a new sysfs file hierarchy beneath the existing /sys/class/net. The proposed sysfs file structure needed to support this interface is shown below.

 /sys/class/net/<interface-name>/device/sriov [1]
 |
 +-- qos
 |    +-- [TC, 0-7]               # TC
 |         +-- priority           # list of PCP values mapped to this TC
 |         +-- lsp                # link strict priority
 |         +-- max_bw             # max bandwidth for this class
 |         +-- min_bw             # min bandwidth for this class
 +-- egress_mirror                # mirror traffic from this PF to specified VF
 +-- ingress_mirror
 +-- [VF-id, 0 ... 127] [2]
      +-- vlan_mirror             # list of VLANs to mirror to this VF
      +-- trunk                   # list of VLANs to filter on (802.1Q trunk)
      +-- tpid                    # TPID of outer (s-tag) 0x8100 | 0x88A8
      +-- egress_mirror           # mirror traffic from this VF to specified VF
      +-- ingress_mirror
      +-- mac_anti_spoof          # enable/disable MAC anti spoofing
      +-- vlan_anti_spoof         # enable/disable VLAN anti spoofing
      +-- loopback                # enable/disable local traffic loopback (VEB/VEPA)
      +-- default_mac             # default MAC, if not set use random
      +-- mac_list                # list of additional MACs (00:11:22:33:44:55, aa:bb:cc:dd:ee:ff)
      +-- ucast_promisc           # unicast promiscuous
      +-- mcast_promisc           # multicast promiscuous
      +-- allow_bcast             # allow/not allow bcast
      +-- strip_stag              # strip outer tag (s-tag)
      +-- enable                  # enable/disable VF
      +-- link_state              # up/down
      +-- queue_type              # type of queues: 0 - RSS, 1 - QoS
      +-- num_queues              # num of RSS queues allocated to this VF; if queue_type is QoS, same as number of TCs set in PF
      +-- max_tx_rate             # ignored if TC in use
      +-- min_tx_rate             # ignored if TC in use
      +-- stats                   # 64 bit counters
      |    +-- rx_bytes
      |    +-- rx_packets
      |    +-- rx_dropped
      |    +-- tx_bytes
      |    +-- tx_packets
      |    +-- tx_dropped
      |    +-- tx_spoofed
      |    +-- reset_stats        # reset VF stats counters
      +-- qos
           +-- [TC, 0-7]
                +-- share         # % share of TC for this VF

[1] The kobject tree rooted at "sriov" is not available in the existing kernel sysfs; the device driver is required to implement this interface.

[2] Assumes that the maximum number of VFs supported by a PF is 128. To support a device with more than 128 SR-IOV instances, a "vfx" kobject is added alongside 0..127. With the "vfx" kobject, users must supply the VF index as the first parameter, followed by ":".
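Because each setting in the hierarchy above maps to a single file, a small wrapper can build the path and do the read or write. The following is a minimal sketch and not part of the specification: the function names sriov_path, sriov_set and sriov_get and the SRIOV_SYSFS_ROOT variable are hypothetical, and the files exist only once a driver implements this interface.

```shell
#!/bin/sh
# Hypothetical helpers for the proposed sriov sysfs tree.
# SRIOV_SYSFS_ROOT is an assumption that lets the helpers be pointed at a
# scratch directory for testing; it defaults to the real sysfs location.
SRIOV_SYSFS_ROOT="${SRIOV_SYSFS_ROOT:-/sys/class/net}"

# sriov_path PF VF ATTR: print the path of an attribute.
# Pass an empty VF ("") for PF-level attributes such as qos/0/priority.
sriov_path() {
    if [ -n "$2" ]; then
        printf '%s/%s/device/sriov/%s/%s\n' "$SRIOV_SYSFS_ROOT" "$1" "$2" "$3"
    else
        printf '%s/%s/device/sriov/%s\n' "$SRIOV_SYSFS_ROOT" "$1" "$3"
    fi
}

# sriov_set PF VF ATTR VALUE...: write a value to an attribute file.
# Refuses to create the file if the driver does not expose it.
sriov_set() {
    _p=$(sriov_path "$1" "$2" "$3") || return 1
    shift 3
    [ -e "$_p" ] || { echo "no such attribute: $_p" >&2; return 1; }
    printf '%s\n' "$*" > "$_p"
}

# sriov_get PF VF ATTR: read an attribute back.
sriov_get() {
    cat "$(sriov_path "$1" "$2" "$3")"
}
```

With these helpers, `sriov_set p1p2 1 trunk add 2,4,5,10-20` would perform the same write as the corresponding echo command in the sections that follow.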

SR-IOV Hypervisor Functions

Below are definitions of SR-IOV hypervisor functions:

vlan_mirror

The vlan_mirror sysfs kobject supports both ingress and egress traffic mirroring.

Example of how a user could mirror traffic based upon VLANs 2,4,6,18-22 to VF 3 of PF p1p1

 # echo add 2,4,6,18-22 > /sys/class/net/p1p1/device/sriov/3/vlan_mirror

Example of how a user could remove VLAN 4, 15-17 from traffic mirroring at destination VF 3.

 # echo rem 4,15-17 > /sys/class/net/p1p1/device/sriov/3/vlan_mirror

Example of how a user could remove all VLANs from mirroring at VF 3.

 # echo rem 0-4095 > /sys/class/net/p1p1/device/sriov/3/vlan_mirror
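The VLAN lists used above mix single ids and ranges. As an illustration of how such a list could be checked before it is written, here is a minimal sketch; expand_vlan_list is a hypothetical helper, and the accepted range 0-4095 is taken from the "remove all" example. It is not a description of how a driver parses the list.

```shell
# expand_vlan_list LIST: print one VLAN id per line for a list such as
# "2,4,6,18-22". Returns non-zero if an entry is malformed, reversed, or
# outside 0-4095. Ids are expected without leading zeros.
expand_vlan_list() {
    _old_ifs=$IFS
    IFS=,
    set -- $1            # split the list on commas
    IFS=$_old_ifs
    for _item in "$@"; do
        case $_item in
            *[!0-9-]*|''|-*|*-|*-*-*)
                echo "bad entry: $_item" >&2
                return 1 ;;
        esac
        _lo=${_item%-*}  # for a single id, _lo and _hi are the same value
        _hi=${_item#*-}
        if [ "$_lo" -gt "$_hi" ] || [ "$_hi" -gt 4095 ]; then
            echo "out of range: $_item" >&2
            return 1
        fi
        while [ "$_lo" -le "$_hi" ]; do
            echo "$_lo"
            _lo=$((_lo + 1))
        done
    done
}
```

For instance, `expand_vlan_list 2,4,6,18-22` prints the eight ids 2, 4, 6, 18, 19, 20, 21 and 22, one per line.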

trunk

The trunk sysfs kobject supports two operations: add and rem. The add operator adds one or more VLAN ids to the VF VLAN filtering list. The rem operator removes VLAN ids from the VF VLAN filtering list.

Example of how a user could add VLANs 2,4,5,10-20 to the filtering list of VF 1 on PF p1p2:

  #echo add 2,4,5,10-20 > /sys/class/net/p1p2/device/sriov/1/trunk

Example of how a user could remove VLANs 5, 11-13 from PF p1p2 VF 1 with sysfs:

  #echo rem 5,11-13 > /sys/class/net/p1p2/device/sriov/1/trunk

Note: for rem, a VLAN id that is not on the VLAN filtering list is ignored.

tpid

The tpid sysfs kobject is used to specify the TPID of the outer VLAN tag (s-tag). The default value should be 0x8100. It can be set to 0x8100 or 0x88A8, or to the decimal equivalents 33024 or 34984.

Example of how a user could set the TPID to 0x88a8:

  #echo 0x88a8 > /sys/class/net/p1p2/device/sriov/1/tpid

To show configured value:

 #cat /sys/class/net/p1p2/device/sriov/1/tpid
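Since the TPID may be given in either hexadecimal or decimal form, a caller may want to normalize it before comparing it with a value read back. The sketch below is only an illustration: normalize_tpid is a hypothetical helper, and printing lower-case hex as the canonical form is an assumption, not something this specification defines.

```shell
# normalize_tpid VALUE: accept 0x8100 or 0x88a8 (any letter case) or the
# decimal equivalents 33024 and 34984, and print a canonical hex form.
# Any other value is rejected with a non-zero status.
normalize_tpid() {
    case $1 in
        0[xX]8100|33024)    echo 0x8100 ;;
        0[xX]88[aA]8|34984) echo 0x88a8 ;;
        *) echo "unsupported TPID: $1" >&2; return 1 ;;
    esac
}
```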

egress_mirror

The egress_mirror sysfs kobject supports egress traffic mirroring.

Example of how a user could add egress traffic mirroring on PF p1p2 VF 1 to VF 7

 #echo add 7 > /sys/class/net/p1p2/device/sriov/1/egress_mirror

Example of how a user could remove egress traffic mirroring on PF p1p2 VF 1 to VF 7

 #echo rem 7 > /sys/class/net/p1p2/device/sriov/1/egress_mirror

ingress_mirror

The ingress_mirror sysfs kobject supports ingress traffic mirroring.

Example of how a user could mirror ingress traffic on PF p1p2 VF 1 to VF 7

  #echo add 7 > /sys/class/net/p1p2/device/sriov/1/ingress_mirror

Example of how a user could show current ingress mirroring configuration

  #cat /sys/class/net/p1p2/device/sriov/1/ingress_mirror

mac_anti_spoof

The mac_anti_spoof sysfs kobject supports enabling/disabling MAC anti-spoofing. Currently, ip link controls VLAN and MAC anti-spoofing together. When set to OFF, this feature allows a VF to transmit packets with any source MAC, which is needed by some L2 applications as well as by vNIC bonding within VMs. When set to ON, violating packets must be dropped and the tx_spoofed stats counter must be incremented.

Example of how a user could enable MAC anti-spoof for PF p1p2 VF 1

 #echo 1 > /sys/class/net/p1p2/device/sriov/1/mac_anti_spoof

Example of how a user could disable MAC anti-spoof for PF p1p2 VF 1

 #echo 0 > /sys/class/net/p1p2/device/sriov/1/mac_anti_spoof

vlan_anti_spoof

The vlan_anti_spoof sysfs kobject supports enabling/disabling VLAN anti-spoofing. Currently, ip link controls VLAN and MAC anti-spoofing together. When set to ON, this feature allows a VF to transmit only packets that carry a VLAN tag specified in the "trunk" settings, and does not allow "untagged" packets to be transmitted. Violations must increment the tx_spoofed stats counter.

Example of how a user could enable VLAN anti-spoof for PF p1p2 VF 1

 #echo 1 > /sys/class/net/p1p2/device/sriov/1/vlan_anti_spoof

Example of how a user could disable VLAN anti-spoof for PF p1p2 VF 1

 #echo 0 > /sys/class/net/p1p2/device/sriov/1/vlan_anti_spoof

To display current settings

 #cat  /sys/class/net/p1p2/device/sriov/1/vlan_anti_spoof

loopback

The loopback sysfs kobject supports enabling/disabling local loopback (VEB/VEPA).

Example of how a user could allow traffic switching between VFs on the same PF

 #echo 1 > /sys/class/net/p1p2/device/sriov/loopback

Example of how a user could hairpin traffic through the switch that the PF is connected to

 #echo 0 > /sys/class/net/p1p2/device/sriov/loopback

Example of how to show loopback configuration.

 #cat /sys/class/net/p1p2/device/sriov/loopback

default_mac

The default_mac sysfs kobject supports setting the default MAC address. If the MAC address is set by this command, the PF won't allow the VF to change it using an MBOX request.

Example of setting the default MAC address of VF 1

  #echo "00:11:22:33:44:55" > /sys/class/net/p1p2/device/sriov/1/default_mac

Example of how to show default MAC address

  #cat /sys/class/net/p1p2/device/sriov/1/default_mac

mac_list

The mac_list sysfs kobject supports adding additional MACs to the VF. The default MAC is taken from "ip link set p1p2 vf 1 mac 00:11:22:33:44:55" if configured. If it is not configured, a random address is assigned to the VF by the NIC. If the MAC is configured using the ip link command, the VF is not allowed to change it via MBOX/AdminQ requests.

Example of how to add MACs 00:11:22:33:44:55 and 00:66:55:44:33:22 to PF p1p2 VF 1

 # echo add "00:11:22:33:44:55,00:66:55:44:33:22" > /sys/class/net/p1p2/device/sriov/1/mac_list

Example of how to delete mac 00:11:22:33:44:55 from above VF device

 #echo rem 00:11:22:33:44:55 > /sys/class/net/p1p2/device/sriov/1/mac_list

Example of how to display a VF MAC address list

 #cat /sys/class/net/p1p2/device/sriov/1/mac_list
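A comma-separated MAC list is easy to mistype, so a caller might validate it before writing it to mac_list. The following minimal sketch shows one way to do that; check_mac_list is a hypothetical helper, and the accepted format (six colon-separated hex octets) is assumed from the examples above rather than defined by this specification.

```shell
# check_mac_list LIST: succeed only if LIST is a non-empty, comma-separated
# list in which every entry is six colon-separated hex octets.
check_mac_list() {
    _old_ifs=$IFS
    IFS=,
    set -- $1            # split the list on commas
    IFS=$_old_ifs
    [ "$#" -gt 0 ] || return 1
    for _mac in "$@"; do
        printf '%s\n' "$_mac" |
            grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$' || {
                echo "bad MAC: $_mac" >&2
                return 1
            }
    done
}
```

A possible use is `check_mac_list "$list" && echo add "$list" > /sys/class/net/p1p2/device/sriov/1/mac_list`, where `$list` holds the addresses to add.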

ucast_promisc

The ucast_promisc sysfs kobject supports setting/unsetting unicast promiscuous mode on the VF device

Example of how to set unicast promiscuous on PF p1p2 VF 1

 #echo 1 > /sys/class/net/p1p2/device/sriov/1/ucast_promisc

Example of how to unset unicast promiscuous on PF p1p2 VF 1

 #echo 0 > /sys/class/net/p1p2/device/sriov/1/ucast_promisc

Example of how to show current promiscuous mode configuration

 #cat /sys/class/net/p1p2/device/sriov/1/ucast_promisc

mcast_promisc

The mcast_promisc sysfs kobject supports setting/unsetting multicast promiscuous mode on the VF device

Example of how to set multicast promiscuous on PF p1p2 VF 1

 #echo 1 > /sys/class/net/p1p2/device/sriov/1/mcast_promisc

Example of how to unset multicast promiscuous on PF p1p2 VF 1

 #echo 0 > /sys/class/net/p1p2/device/sriov/1/mcast_promisc

Example of how to show current promiscuous mode configuration

 #cat /sys/class/net/p1p2/device/sriov/1/mcast_promisc

allow_bcast

The allow_bcast sysfs kobject supports enabling/disabling reception of broadcast packets by the VF device

Example of how to allow broadcast on PF p1p2 VF 1

 #echo 1 > /sys/class/net/p1p2/device/sriov/1/allow_bcast

Example of how to disable bcast on PF p1p2 VF 1

 #echo 0 > /sys/class/net/p1p2/device/sriov/1/allow_bcast

Example of how to show current broadcast configuration

 #cat /sys/class/net/p1p2/device/sriov/1/allow_bcast

strip_stag

The strip_stag sysfs kobject supports enabling/disabling outer VLAN stripping for the VF device. When the VLAN tag is stripped on receive, its information must be posted to the RX descriptor. On transmit, the VLAN id from the TX descriptor must be inserted into the packet.

Example of how to enable VLAN strip on VF 3

 # echo 1 > /sys/class/net/p1p1/device/sriov/3/strip_stag

Example of how to disable VLAN stripping on VF 3

 # echo 0 > /sys/class/net/p1p1/device/sriov/3/strip_stag

enable

The enable sysfs kobject supports enabling/disabling VF device

Example of how to enable VF 3

 # echo 1 > /sys/class/net/p1p1/device/sriov/3/enable

Example of how to disable VF 3

 # echo 0 > /sys/class/net/p1p1/device/sriov/3/enable

Show VF 3 enable state

 # cat /sys/class/net/p1p1/device/sriov/3/enable

link_state

The link_state sysfs kobject displays link status (up/down/disabled).

Example of how to show the link state of VF 3

 # cat /sys/class/net/p1p1/device/sriov/3/link_state

stats

The stats sysfs kobject supports getting VF statistics (64bit counters)

Example of how to display stats of VF 1

 #ls /sys/class/net/p1p2/device/sriov/1/stats
 rx_bytes
 rx_dropped
 rx_packets
 tx_bytes
 tx_dropped
 tx_packets
 tx_spoofed

Example of how to display anti-spoofing violations counter for VF 1

  #cat /sys/class/net/p1p2/device/sriov/1/stats/tx_spoofed
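Because each counter is a separate file, all of a VF's statistics can be read with a short loop. This is a minimal sketch under assumptions: dump_vf_stats is a hypothetical helper, and reset_stats is skipped on the assumption that it is a control file rather than a counter.

```shell
# dump_vf_stats DIR: print "name value" for every counter file in a VF's
# stats directory, skipping the reset_stats control file.
dump_vf_stats() {
    for _f in "$1"/*; do
        [ -f "$_f" ] || continue          # ignore anything that is not a file
        _name=${_f##*/}                   # strip the directory part
        [ "$_name" = reset_stats ] && continue
        printf '%s %s\n' "$_name" "$(cat "$_f")"
    done
}
```

For VF 1 of PF p1p2 it would be called as `dump_vf_stats /sys/class/net/p1p2/device/sriov/1/stats`.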

reset_stats

The reset_stats sysfs kobject resets VF stats counters

Example of how to reset stats for VF 1

 #echo 1 > /sys/class/net/p1p2/device/sriov/1/stats/reset_stats

queue_type

The queue_type sysfs kobject is used to set the type of queues: 0 - RSS, 1 - QoS. The default is RSS.

Example of how to set queue type RSS for VF 3

 # echo 0 > /sys/class/net/p1p1/device/sriov/3/queue_type

Example of how to set type QoS for VF 3

 # echo 1 > /sys/class/net/p1p1/device/sriov/3/queue_type

Show VF 3 queue type

 # cat /sys/class/net/p1p1/device/sriov/3/queue_type

num_queues

The num_queues sysfs kobject is used to set the number of queues for a VF. If queue_type is QoS, the number of queues cannot be set (it must be equal to the number of TCs set for the PF).

Example of how to set 8 queues for VF 5 if queue_type is RSS

 # echo 8 > /sys/class/net/p1p1/device/sriov/5/num_queues

Show the number of queues for VF 5

 # cat /sys/class/net/p1p1/device/sriov/5/num_queues

max_tx_rate

The max_tx_rate sysfs kobject is used to set the maximum transmit rate in Mbps for a VF (ignored if TC QoS is in use)

Example of how to set 200Mbps limit for VF 3

 # echo 200  > /sys/class/net/p1p1/device/sriov/3/max_tx_rate

Show VF 3 max_tx_rate

  # cat /sys/class/net/p1p1/device/sriov/3/max_tx_rate

min_tx_rate

The min_tx_rate sysfs kobject is used to set the minimum transmit rate in Mbps for a VF (ignored if TC QoS is in use)

Example of how to set a 20 Mbps minimum rate for VF 3

 # echo 20  > /sys/class/net/p1p1/device/sriov/3/min_tx_rate

Show VF 3 min_tx_rate

  # cat /sys/class/net/p1p1/device/sriov/3/min_tx_rate

share (vf QoS)

The share sysfs kobject is used to set the share of a specified traffic class that a VF may use (TC settings are made for the PF in the relevant QoS section)

Example of how to set 15% of traffic share for TC1 VF 7

 # echo 15  > /sys/class/net/p1p1/device/sriov/7/qos/1/share

To display current setting for TC3 VF 8

 # cat /sys/class/net/p1p1/device/sriov/8/qos/3/share

priority (PF qos settings)

The priority sysfs kobject is used to set the list of PCP values mapped to a traffic class

Example to set priority 0 and 1 to traffic class 0

 # echo 0,1  > /sys/class/net/p1p1/device/sriov/qos/0/priority

To display current setting for TC3

 # cat /sys/class/net/p1p1/device/sriov/qos/3/priority

lsp

The lsp sysfs kobject is used to set Link Strict Priority

Example to set LSP priority for traffic class 0

 # echo 0,1  > /sys/class/net/p1p1/device/sriov/qos/0/lsp

To display current setting for TC0

 # cat /sys/class/net/p1p1/device/sriov/qos/0/lsp

max_bw

The max_bw sysfs kobject is used to set the maximum bandwidth in Mbps for a TC

Example to set Max bandwidth 2Gbps for traffic class 2

 # echo 2000  > /sys/class/net/p1p1/device/sriov/qos/2/max_bw

To display current setting for TC0

 # cat /sys/class/net/p1p1/device/sriov/qos/0/max_bw

min_bw

The min_bw sysfs kobject is used to set the minimum bandwidth in Mbps for a TC

Example to set Min bandwidth 20Mbps for traffic class 2

 # echo 20  > /sys/class/net/p1p1/device/sriov/qos/2/min_bw

To display current setting for TC2

 # cat /sys/class/net/p1p1/device/sriov/qos/2/min_bw