nndocs:infiniband
                Differences
This shows you the differences between two versions of the page.
| nndocs:infiniband [2024/03/25 17:40] – [Connected vs. Datagram] Elucidate naptastic | nndocs:infiniband [2025/01/21 14:38] (current) – [Networking] correct a thing naptastic | ||
|---|---|---|---|
| Line 7: | Line 7: | ||
| For hardware support, Mellanox provides MLNX_OFED, an overlay for several distributions. Unfortunately, | For hardware support, Mellanox provides MLNX_OFED, an overlay for several distributions. Unfortunately, | ||
| - | ^ MLNX_OFED version | + | ^ Version | 
| | Inbox | | All | All | 3.3.23-2 | | | Inbox | | All | All | 3.3.23-2 | | ||
| - | | 4.9-x | ConnectX-2 | ≤ 11 | ≤ 20.04 | 5.7.2 | | + | | MLNX_OFED | 
| - | | 5.8-x | ConnectX-4 | ≥ 9 | ≥ 18.04 | 5.17.0 | | + | | MLNX_OFED | 
| ===How I'm Getting Around It=== | ===How I'm Getting Around It=== | ||
| Line 29: | Line 29: | ||
| ====The MLNX part==== | ====The MLNX part==== | ||
| - | It's worth investigating other tools provided with MLNX_OFED to see if they offer compelling advantages over inbox versions. I'm not doing that right now because I suspect | + | Old OpenSM has this annoying problem where, | 
| - | MLNX_OFED_LINUX-4.9-7.1.0.0-ubuntu20.04-x86_64/ | + | MLNX_OFED_LINUX-24.07-0.6.1.0-debian12.5-x86_64/ | 
| - | + | ||
| - | Newer versions of MLNX_OFED have newer versions of OpenSM. I haven' | + | |
| There' | There' | ||
| - | # dpkg -i ibdump_6.0.0-1.49710_amd64.deb | + | # dpkg -i ibdump_6.0.0-1.2407061_amd64.deb | 
| =====The Subnet Manager: OpenSM===== | =====The Subnet Manager: OpenSM===== | ||
| Line 80: | Line 78: | ||
| Here's a block for my ATA over Ethernet experiments. Subject to change. IP addresses are necessary for setting up VXLAN tunnels. Checking if IPv6 tunnels perform differently from IPv4 tunnels is on the to-do list. I suspect they perform better. Needs testing. | Here's a block for my ATA over Ethernet experiments. Subject to change. IP addresses are necessary for setting up VXLAN tunnels. Checking if IPv6 tunnels perform differently from IPv4 tunnels is on the to-do list. I suspect they perform better. Needs testing. | ||
| - |  | + |  | 
| mgid=ff12: | mgid=ff12: | ||
| mgid=ff12: | mgid=ff12: | ||
| Line 87: | Line 85: | ||
| ====Partitions: | ====Partitions: | ||
| There' | There' | ||
| - | echo 0xb129 | + |  | 
| - | The sysfs interface | + | Resist the temptation to rename the interface | 
| - | ip link del ib0.b129 | + | |
| - | Resist the temptation to rename the interface to something descriptive. **It's already self-descriptive**. Creative naming is for VXLAN tunnels and bridges. | + | # ip link add vx128 type vxlan id 128 local 172.20.128.13 group 225.172.20.128 | 
| + | # ip link set master aoe1 dev vx128 | ||
| + | |||
| + | The sysfs interface | ||
| + | # ip link del ib0.b128 | ||
| - | If you unset the high bit on the partition number (0x3129 | + | If you unset the high bit on the partition number (0x3128 | 
| It's worth finding out if Netplan can manage IB child interfaces. | It's worth finding out if Netplan can manage IB child interfaces. | ||
| Line 119: | Line 120: | ||
| Since I don't have any newer hardware, I don't have any information about Enhanced IPoIB. | Since I don't have any newer hardware, I don't have any information about Enhanced IPoIB. | ||
| + | |||
| =====SR-IOV===== | =====SR-IOV===== | ||
| ====Hardware Settings==== | ====Hardware Settings==== | ||
| Line 129: | Line 131: | ||
| SRIOV_EN | SRIOV_EN | ||
| - | FPP_EN (Flow Priority something) controls whether the card appears as two PCI devices, or as a single device with two ports. Under mlx4, every VF on a dual-port HCA has both ports, and NUM_OF_VFs is how many dual-port devices to create. Under mlx5, each port gets its own pool of VFs and NUM_OF_VFs is per-port. | + | FPP_EN (Function Per Port ENable) controls whether the card appears as two PCI devices, or as a single device with two ports. Under mlx4, every VF on a dual-port HCA has both ports, and NUM_OF_VFs is how many dual-port devices to create. Under mlx5, each port gets its own pool of VFs and NUM_OF_VFs is per-port. | 
| I haven' | I haven' | ||
| - | After a reboot, there should be a new file, /sys/bus/pci/devices/0000:b:d:f/sriov_numvfs. Try turning it on. If it works, there will be new PCI devices as well as VFs listed under `ip link`: | + | To make VFs exist, put a number <= NUM_OF_VFS into sriov_numvfs for that device. Before doing so, I recommend turning off VF probing. Otherwise the VFs will all make IPoIB interfaces, which probably isn't what you want. This setting is per PF. | 
| + | |||
| + | I'm still checking if there' | ||
| + | |||
| + | # echo 0 > /sys/class/infiniband/ibp13s0f0/device/sriov_drivers_autoprobe | ||
| + | # echo 0 > / | ||
| + | |||
| + | If it works, there will be new PCI devices as well as VFs listed under `ip link`: | ||
| # echo 7 > / | # echo 7 > / | ||
| - |  | ||
| # lspci | grep nfi | # lspci | grep nfi | ||
| 06:00.0 Infiniband controller: Mellanox Technologies MT27600 [Connect-IB] | 06:00.0 Infiniband controller: Mellanox Technologies MT27600 [Connect-IB] | ||
| Line 168: | Line 176: | ||
| 5: ib1: < | 5: ib1: < | ||
| link/ | link/ | ||
| - | 10: ib2: < | ||
| - | link/ | ||
| - | 11: ib3: < | ||
| - | link/ | ||
| - | 12: ib4: < | ||
| - | link/ | ||
| - | 13: ib5: < | ||
| - | link/ | ||
| - | 14: ib6: < | ||
| - | link/ | ||
| - | 15: ib7: < | ||
| - | link/ | ||
| - | 16: ib8: < | ||
| - | link/ | ||
| ====VF Configuration==== | ====VF Configuration==== | ||
| - | To set the GUID for VFs, set node_guid, port_guid, and state using ip link. Make port_guid == node_guid == unique. (I use the base port guid + VF + 1.) | + | The official documentation covers a sysfs interface | 
| - | Lazy copy-pasta for southpark. Note this only sets up VFs for port 1, but that's the only port plugged in right now anyway, so w/e. | + | GUIDs need to be set before attaching a VF to a VM. It should be possible to change state (simulating unplugging the cable) while a VM is using a VF, but I haven't tested this. | 
| - | ip link set dev ib0 vf 0 node_guid 58: | + | |
| - | ip link set dev ib0 vf 0 port_guid 58: | + | |
| - | ip link set dev ib0 vf 0 state enable | + | |
| - | ip link set dev ib0 vf 1 node_guid 58: | + | |
| - | ip link set dev ib0 vf 1 port_guid 58: | + | |
| - | ip link set dev ib0 vf 1 state enable | + | |
| - | ip link set dev ib0 vf 2 node_guid 58: | + | |
| - | ip link set dev ib0 vf 2 port_guid 58: | + | |
| - | ip link set dev ib0 vf 2 state enable | + | |
| - | ip link set dev ib0 vf 3 node_guid 58: | + | |
| - | ip link set dev ib0 vf 3 port_guid 58: | + | |
| - | ip link set dev ib0 vf 3 state enable | + | |
| - | ip link set dev ib0 vf 4 node_guid 58: | + | |
| - | ip link set dev ib0 vf 4 port_guid 58: | + | |
| - | ip link set dev ib0 vf 4 state enable | + | |
| - | ip link set dev ib0 vf 5 node_guid 58: | + | |
| - | ip link set dev ib0 vf 5 port_guid 58: | + | |
| - | ip link set dev ib0 vf 5 state enable | + | |
| - | ip link set dev ib0 vf 6 node_guid 58: | + | |
| - | ip link set dev ib0 vf 6 port_guid 58: | + | |
| - | ip link set dev ib0 vf 6 state enable | + | |
| - | + | ||
| - | Lazy copy-pasta for sadness: | + | |
| - | ip link set dev ib0 vf 0 node_guid 58: | + | |
| - | ip link set dev ib0 vf 0 port_guid 58: | + | |
| - | ip link set dev ib0 vf 0 state enable | + | |
| - | ip link set dev ib0 vf 1 node_guid 58: | + | |
| - | ip link set dev ib0 vf 1 port_guid 58: | + | |
| - | ip link set dev ib0 vf 1 state enable | + | |
| - | ip link set dev ib0 vf 2 node_guid 58: | + | |
| - | ip link set dev ib0 vf 2 port_guid 58: | + | |
| - | ip link set dev ib0 vf 2 state enable | + | |
| - | ip link set dev ib0 vf 3 node_guid 58: | + | |
| - | ip link set dev ib0 vf 3 port_guid 58: | + | |
| - | ip link set dev ib0 vf 3 state enable | + | |
| - | ip link set dev ib0 vf 4 node_guid 58: | + | |
| - | ip link set dev ib0 vf 4 port_guid 58: | + | |
| - | ip link set dev ib0 vf 4 state enable | + | |
| - | ip link set dev ib0 vf 5 node_guid 58: | + | |
| - | ip link set dev ib0 vf 5 port_guid 58: | + | |
| - | ip link set dev ib0 vf 5 state enable | + | |
| - | ip link set dev ib0 vf 6 node_guid 58: | + | |
| - | ip link set dev ib0 vf 6 port_guid 58: | + | |
| - | ip link set dev ib0 vf 6 state enable | + | |
| - | + | ||
| - | Lazy copy-pasta for shark: | + | |
| - | ip link set dev ib0 vf 0 node_guid 58: | + | |
| - | ip link set dev ib0 vf 0 port_guid 58: | + | |
| - | ip link set dev ib0 vf 0 state enable | + | |
| - | + | ||
| - | These should really go on their own page. Or better yet, figure out how to configure them on the host! | + | |
| + | Configuration is managed in / | ||
| =====Upper-Layer Protocols (ULPs)===== | =====Upper-Layer Protocols (ULPs)===== | ||
| RDMA opens all kinds of possibilities for RDMA-aware protocols to be amazing and fast. They probably all deserve their own pages. | RDMA opens all kinds of possibilities for RDMA-aware protocols to be amazing and fast. They probably all deserve their own pages. | ||
| Line 264: | Line 208: | ||
| ====Networking==== | ====Networking==== | ||
| ===VXLAN=== | ===VXLAN=== | ||
| - | VXLAN is not the only way to get an Ethernet device on InfiniBand, but as far as I can tell it's the only decent one. Neither ConnectX-3 nor Connect-IB | + | VXLAN is not the only way to get an Ethernet device on InfiniBand, but as far as I can tell it's the only decent one. None of my hardware | 
| * VXLAN id can be anything from 0-16777215 inclusive. I make it match the network number. | * VXLAN id can be anything from 0-16777215 inclusive. I make it match the network number. | ||
| Line 292: | Line 236: | ||
| I also want to throw audio frames around with "no latency added" | I also want to throw audio frames around with "no latency added" | ||
| + | |||
| + | =====GUIDs===== | ||
| + | * 5849560e59150301 - shark Connect-IB | ||
| + | * 5849560e53b70b01 - southpark Connect-IB | ||
| + | * 5849560e53660101 - duckling Connect-IB | ||
| + | * 7cfe900300a0a080 - uninstalled Connect-IB | ||
| + | * (there are several more uninstalled Connect-IB cards) | ||
| + | * f4521403002c18b0 - uninstalled ConnectX-3 2014-01-29 | ||
| + | * 0002c90300b37f10 - uninstalled ConnectX-3 with no date on the label | ||
| + | * 001175000079b560 - uninstalled qib | ||
| + | * 001175000079b856 - uninstalled qib | ||
| + | |||
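
The GUID rule from the earlier VF Configuration text (port_guid == node_guid == base port GUID + VF + 1) can be scripted rather than pasted per host. What follows is a minimal dry-run sketch, not a tested procedure: it only prints the `ip link` commands and changes nothing. The function names `vf_guid` and `print_vf_setup` are made up for illustration. It also assumes that the GUID listed for shark is the base port GUID, that the PF netdev is `ib0`, and that the shell does 64-bit arithmetic.

```shell
# Dry run: print the ip-link commands that would give each VF a unique GUID.
# Rule taken from the text: port_guid == node_guid == base port GUID + VF + 1.

# vf_guid BASE VF
#   BASE = base port GUID as 16 hex digits without colons (assumed input format)
#   VF   = VF index, starting at 0
# Prints the VF's GUID in colon-separated form.
vf_guid() {
    printf '%016x\n' "$(( 0x$1 + $2 + 1 ))" | sed 's/../&:/g; s/:$//'
}

# print_vf_setup PF_NETDEV BASE NUM_VFS
# Prints three commands per VF: node_guid, port_guid, state enable.
print_vf_setup() {
    vf=0
    while [ "$vf" -lt "$3" ]; do
        guid=$(vf_guid "$2" "$vf")
        echo "ip link set dev $1 vf $vf node_guid $guid"
        echo "ip link set dev $1 vf $vf port_guid $guid"
        echo "ip link set dev $1 vf $vf state enable"
        vf=$((vf + 1))
    done
}

# Example: 7 VFs on ib0, using the GUID listed for shark as the (assumed) base.
# The first line printed is:
#   ip link set dev ib0 vf 0 node_guid 58:49:56:0e:59:15:03:02
print_vf_setup ib0 5849560e59150301 7
```

Once the printed commands look right, they can be piped into `sh` as root. Keeping the GUID arithmetic in its own function makes it easy to check by hand before anything touches the hardware.
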
nndocs/infiniband.1711388407.txt.gz · Last modified: 2024/03/25 17:40 by naptastic
                
                