
InfiniBand Packet Format Reference


This document is a bit-level reference for InfiniBand packet headers, intended as a companion to the main analysis report. The report focuses on what was observed in the ib-packets dataset; this reference focuses on what every IB packet header looks like on the wire, with anchor links back to the report’s frame-level evidence wherever a concrete example exists.

Use this reference when:

  • Reading a hex dump and identifying field boundaries
  • Validating which extended header should appear after a given BTH opcode
  • Decoding an AETH syndrome value
  • Looking up a BTH opcode across all transport services

Source material: InfiniBand Architecture Specification Volume 1 (Release 1.5), the Wireshark InfiniBand dissector field list, and the Tencent Cloud transport-layer article.

An IB packet on the wire is a strict concatenation of headers, payload, and CRCs. Which optional headers appear, and in what order, is fully determined by LRH.LNH (whether a GRH is present) and the BTH opcode (which extended headers follow the BTH).

+-----+-------+-----+-----------------------+----------+------+------+
| LRH | GRH?  | BTH | Extended Header(s)    | Payload  | ICRC | VCRC |
+-----+-------+-----+-----------------------+----------+------+------+
   8     40     12    0..28+                  variable    4      2  (bytes)

The LRH (Local Route Header, 8 bytes) is the first header on every IB packet. It carries the addressing used for routing within the local subnet.

Bit layout (big-endian):

Byte | Bit pattern       | Field                         | Width (bits)
0    | VVVV LLLL         | VL[3:0] / LVer[3:0]           | 4 + 4
1    | SSSS RR NN        | SL[3:0] / Reserved / LNH[1:0] | 4 + 2 + 2
2..3 | DDDDDDDD DDDDDDDD | DLID                          | 16
4    | RRRRR PPP         | Reserved / PktLen[10:8]       | 5 + 3
5    | PPPPPPPP          | PktLen[7:0]                   | 8
6..7 | SSSSSSSS SSSSSSSS | SLID                          | 16

Field meanings:

Field       | Description
VL          | Virtual Lane (0–15). VL15 is reserved for management traffic
LVer        | Link version, currently always 0
SL          | Service Level (0–15), maps to QoS class
LNH         | Link Next Header; selects what follows the LRH
DLID / SLID | Destination / Source Local IDs (assigned by the SM)
PktLen      | Packet length in 4-byte words, counted from the first byte of the LRH through the last byte of the ICRC (only the VCRC is excluded)

LNH encoding:

Value | Meaning                  | What follows the LRH
0x0   | Raw (non-IBA transport)  | Raw header, then a non-IBA payload
0x1   | Raw IPv6 (non-IBA)       | IPv6 header directly
0x2   | IBA Local                | BTH (no GRH)
0x3   | IBA Global               | GRH + BTH

Concrete example: every packet in this dataset carries LNH = 0x2, which is why no GRH is decoded. See the worked example for infiniband.pcap frame 10 in the main report’s ERF Capture Anatomy section.
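The LRH layout decodes with a few shifts and masks. What follows is a minimal Python sketch, not taken from Wireshark or any IB library. `parse_lrh`, `wire_bytes`, and `LNH_NAMES` are illustrative names. The sample bytes describe a made-up UD MAD packet, not a frame from this dataset. The sketch assumes PktLen counts 4-byte words from the start of the LRH through the ICRC, so the 2-byte VCRC is added separately.

```python
import struct
from dataclasses import dataclass

# Illustrative names only; not from Wireshark or any IB library.
LNH_NAMES = {0: "Raw", 1: "Raw IPv6", 2: "IBA Local", 3: "IBA Global"}


@dataclass
class LRH:
    vl: int
    lver: int
    sl: int
    lnh: int
    dlid: int
    pkt_len: int  # 4-byte words from the LRH start through the ICRC (VCRC excluded)
    slid: int


def parse_lrh(buf: bytes) -> LRH:
    """Decode the 8-byte Local Route Header; all fields are big-endian."""
    if len(buf) < 8:
        raise ValueError("LRH needs 8 bytes")
    b0, b1, dlid, len_word, slid = struct.unpack_from(">BBHHH", buf, 0)
    return LRH(
        vl=b0 >> 4,                 # byte 0, high nibble
        lver=b0 & 0x0F,             # byte 0, low nibble
        sl=b1 >> 4,                 # byte 1, high nibble
        lnh=b1 & 0x03,              # byte 1, low 2 bits (bits 3..2 are reserved)
        dlid=dlid,
        pkt_len=len_word & 0x07FF,  # 11 bits; the top 5 bits of bytes 4..5 are reserved
        slid=slid,
    )


def wire_bytes(lrh: LRH) -> int:
    """Total bytes on the wire: PktLen words plus the 2-byte VCRC."""
    return lrh.pkt_len * 4 + 2


# Hypothetical UD MAD packet: LRH 8 + BTH 12 + DETH 8 + MAD 256 + ICRC 4 = 288 bytes.
hdr = parse_lrh(bytes([0x00, 0x02, 0x00, 0x04, 0x00, 0x48, 0x00, 0x01]))
print(LNH_NAMES[hdr.lnh], hdr.pkt_len, wire_bytes(hdr))  # IBA Local 72 290
```

Under this convention, the 288-byte packet has PktLen = 72, and the VCRC brings the on-wire size to 290 bytes.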

The GRH (Global Route Header, 40 bytes) appears only when LRH.LNH = 0x3, which signals that the packet may be routed across IB subnets. Its layout mirrors the IPv6 header.

Byte(s)                | Field     | Width (bits)
0 (high 4)             | IPVer     | 4
0 (low 4) + 1 (high 4) | TClass    | 8
1 (low 4) + 2..3       | FlowLabel | 20
4..5                   | PayLen    | 16
6                      | NxtHdr    | 8
7                      | HopLmt    | 8
8..23                  | SGID      | 128
24..39                 | DGID      | 128

Notes:

  • IPVer is always 6.
  • NxtHdr = 0x1B (27 decimal) signals an IBA next header (BTH).
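  • PayLen counts the bytes that follow the GRH, up to and including the ICRC (the VCRC is not counted).
  • SGID/DGID are 128-bit GIDs. Each is a 64-bit subnet prefix (configured by the SM) followed by a 64-bit interface ID (by default, the port GUID).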
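IPVer, TClass, and FlowLabel straddle byte boundaries, so the easiest approach is to read the first 4 bytes as one big-endian word. Here is a minimal Python sketch along those lines. `parse_grh` is a hypothetical helper. GIDs are rendered with the standard `ipaddress` module only because they share the IPv6 text format.

```python
import ipaddress
import struct

IBA_NXTHDR = 0x1B  # NxtHdr value meaning "a BTH follows"


def parse_grh(buf: bytes) -> dict:
    """Decode a 40-byte Global Route Header (big-endian, IPv6-like layout)."""
    if len(buf) < 40:
        raise ValueError("GRH needs 40 bytes")
    word0, pay_len, nxt_hdr, hop_lmt = struct.unpack_from(">IHBB", buf, 0)
    return {
        "ipver": word0 >> 28,               # bits 31..28, expected to be 6
        "tclass": (word0 >> 20) & 0xFF,     # bits 27..20, straddles bytes 0 and 1
        "flow_label": word0 & 0xFFFFF,      # bits 19..0, straddles bytes 1..3
        "pay_len": pay_len,                 # bytes after the GRH through the ICRC
        "bth_follows": nxt_hdr == IBA_NXTHDR,
        "hop_lmt": hop_lmt,
        "sgid": ipaddress.IPv6Address(bytes(buf[8:24])),   # GIDs use IPv6 text notation
        "dgid": ipaddress.IPv6Address(bytes(buf[24:40])),
    }
```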

This dataset does not contain any GRH-bearing packets, so this section is purely a reference for future cross-subnet captures.

The BTH (Base Transport Header, 12 bytes) selects the transport operation, the destination QP, and the packet sequence number. It appears on every IBA packet (i.e., whenever LNH ∈ {0x2, 0x3}).

Bit layout:

Byte(s)       | Field                 | Width (bits)
0             | OpCode                | 8
1 (bit 7)     | SE (Solicited Event)  | 1
1 (bit 6)     | M (Migration request) | 1
1 (bits 5..4) | PadCnt                | 2
1 (bits 3..0) | TVer                  | 4
2..3          | P_Key                 | 16
4 (bit 7)     | F (FECN)              | 1
4 (bit 6)     | B (BECN)              | 1
4 (bits 5..0) | Reserved              | 6
5..7          | DestQP                | 24
8 (bit 7)     | A (AckReq)            | 1
8 (bits 6..0) | Reserved              | 7
9..11         | PSN                   | 24

Field meanings:

Field       | Notes
OpCode      | High 3 bits = transport service, low 5 bits = operation. See the BTH Opcode Master Table
SE          | Solicited Event; set on the last (or only) packet of a SEND or RDMA WRITE with Immediate message to request a solicited completion event on the responder’s CQ
M           | Used during automatic path migration to signal request / accept
PadCnt      | Number of pad bytes (0–3) appended to the payload so that it ends on a 4-byte boundary
TVer        | Transport header version, currently always 0
P_Key       | Partition key; high bit set = full membership, clear = limited membership; low 15 bits = partition ID
FECN / BECN | Forward / Backward Explicit Congestion Notification
DestQP      | 24-bit destination Queue Pair number
AckReq      | When set on RC traffic, the responder must generate an ACK
PSN         | 24-bit Packet Sequence Number; wraps modulo 2²⁴

PSN behaviors worth knowing:

  • The expected PSN is tracked per QP. A packet whose PSN equals the expected value advances the window.
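  • A PSN within the duplicate range (up to 2²³ behind the expected value) is treated as a retransmission. A duplicate SEND or RDMA WRITE is acknowledged again, but its payload is not delivered a second time.
  • A PSN ahead of the expected value (within the 2²³ PSNs that follow it) means earlier packets were lost. This is a sequence error, and the responder sends a NAK with code 0 (PSN sequence error).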
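The duplicate/ahead split is modular arithmetic on 24-bit numbers. Below is a minimal Python sketch, assuming the responder tracks a single expected PSN per QP. `classify_psn` is an illustrative name, and its return strings are not spec terminology. A PSN exactly 2²³ away is ambiguous; this sketch counts it as a duplicate.

```python
PSN_MOD = 1 << 24      # PSNs are 24-bit and wrap modulo 2**24
HALF_WINDOW = 1 << 23  # size of the duplicate region behind the expected PSN


def classify_psn(psn: int, expected: int) -> str:
    """Classify a request PSN relative to the responder's expected PSN (ePSN)."""
    delta = (psn - expected) % PSN_MOD
    if delta == 0:
        return "expected"        # in order: execute it and advance ePSN
    if delta < HALF_WINDOW:
        return "sequence_error"  # ahead of ePSN, so packets were lost: NAK code 0
    return "duplicate"           # up to 2**23 behind ePSN: a retransmission


# Wrap-around: with ePSN = 0xFFFFFF, PSN 0 is one ahead and PSN 0xFFFFFE is one behind.
print(classify_psn(0x000000, 0xFFFFFF), classify_psn(0xFFFFFE, 0xFFFFFF))
# sequence_error duplicate
```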

Concrete example: infiniband.pcap frame 10 BTH:

Opcode = 4 (RC SEND Only)
SE = 0, M = 1, PadCnt = 0, TVer = 0
P_Key = 0xffff (full membership, default partition)
FECN = 0, BECN = 0
DestQP = <masked>
AckReq = 1
PSN = 13896277
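The same fields can be extracted from a raw 12-byte BTH with `struct`. This is a minimal Python sketch, and `parse_bth` is a hypothetical helper. DestQP is masked in the report, so the DestQP value used here to rebuild frame 10’s header (0x000048) is a placeholder, not captured data.

```python
import struct
from dataclasses import dataclass


@dataclass
class BTH:
    opcode: int
    se: bool
    mig_req: bool
    pad_cnt: int
    tver: int
    p_key: int
    fecn: bool
    becn: bool
    dest_qp: int
    ack_req: bool
    psn: int


def parse_bth(buf: bytes) -> BTH:
    """Decode a 12-byte Base Transport Header (big-endian)."""
    if len(buf) < 12:
        raise ValueError("BTH needs 12 bytes")
    opcode, flags, p_key, word1, word2 = struct.unpack_from(">BBHII", buf, 0)
    return BTH(
        opcode=opcode,
        se=bool(flags & 0x80),             # byte 1, bit 7
        mig_req=bool(flags & 0x40),        # byte 1, bit 6
        pad_cnt=(flags >> 4) & 0x3,        # byte 1, bits 5..4
        tver=flags & 0x0F,                 # byte 1, bits 3..0
        p_key=p_key,
        fecn=bool(word1 & 0x80000000),     # byte 4, bit 7
        becn=bool(word1 & 0x40000000),     # byte 4, bit 6
        dest_qp=word1 & 0x00FFFFFF,        # bytes 5..7
        ack_req=bool(word2 & 0x80000000),  # byte 8, bit 7
        psn=word2 & 0x00FFFFFF,            # bytes 9..11
    )


# Frame 10's values; DestQP is masked in the report, so 0x000048 is a placeholder.
raw = struct.pack(">BBHII", 0x04, 0x40, 0xFFFF, 0x000048, 0x80000000 | 13896277)
bth = parse_bth(raw)
print(bth.opcode, bth.mig_req, bth.ack_req, bth.psn, bool(bth.p_key & 0x8000))
# 4 True True 13896277 True
```

The final value printed is the P_Key high bit, which confirms full membership for 0xffff.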

Which extended header(s) follow the BTH is fully determined by the BTH opcode. The IBA spec encodes this as a per-opcode table; the per-operation summary is in Operation → Extended Header Mapping.

DETH (Datagram Extended Transport Header, 8 bytes) is required for UD and RD operations. All MAD traffic carries one as well, because MADs travel as UD datagrams over QP0/QP1.

Byte(s) | Field    | Width (bits)
0..3    | Q_Key    | 32
4       | Reserved | 8
5..7    | SrcQP    | 24

Q_Key conventions:

  • QP0 (SMP): Q_Key = 0
  • QP1 (GMP): Q_Key = 0x80010000
  • Other UD QPs: application-defined; high bit set = privileged

Concrete example: the SMP traffic in infiniband.pcap frame 1 uses DestQP = 0x000000, SrcQP = 0x000000, Q_Key = 0x00000000.
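Here is a minimal Python sketch of DETH decoding and the Q_Key high-bit check. `parse_deth` and `is_privileged_qkey` are illustrative names, and the sample headers are invented.

```python
from __future__ import annotations

import struct

QP1_QKEY = 0x80010000  # well-known Q_Key for GMPs on QP1


def parse_deth(buf: bytes) -> tuple[int, int]:
    """Return (Q_Key, SrcQP) from an 8-byte DETH."""
    q_key, word = struct.unpack_from(">II", buf, 0)
    return q_key, word & 0x00FFFFFF   # byte 4 is reserved; SrcQP is bytes 5..7


def is_privileged_qkey(q_key: int) -> bool:
    """Q_Keys with the high bit set are privileged."""
    return bool(q_key & 0x80000000)


print(parse_deth(bytes(8)))          # all-zero DETH, as in frame 1: (0, 0)
print(is_privileged_qkey(QP1_QKEY))  # True
```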

RETH (RDMA Extended Transport Header, 16 bytes) is present in RDMA READ Request, RDMA WRITE First, RDMA WRITE Only, and RDMA WRITE Only with Immediate.

Byte(s) | Field                | Width (bits)
0..7    | VA (Virtual Address) | 64
8..11   | R_Key                | 32
12..15  | DMALen               | 32

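The responder must validate the request against the memory region (MR) registered under the R_Key. The R_Key must be valid, the whole range [VA, VA + DMALen) must lie within the MR, and the MR’s access permissions must include remote READ or remote WRITE as the operation requires. A failed check is reported with a NAK carrying code 2 (remote access error).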
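The validation rule can be sketched as follows. This is a toy model, not how an HCA implements it. `MemoryRegion` and `check_reth` are hypothetical, and MRs are looked up by the full 32-bit R_Key in a plain dictionary. A `False` result corresponds to the case the responder reports as a remote access error.

```python
from __future__ import annotations

import struct
from dataclasses import dataclass


@dataclass
class MemoryRegion:
    """Hypothetical responder-side record of a registered MR."""
    base: int
    length: int
    remote_read: bool
    remote_write: bool


def parse_reth(buf: bytes) -> tuple[int, int, int]:
    """Return (VA, R_Key, DMALen) from a 16-byte RETH."""
    return struct.unpack_from(">QII", buf, 0)


def check_reth(buf: bytes, mrs: dict[int, MemoryRegion], is_write: bool) -> bool:
    """True if the request may proceed; False maps to a remote access error NAK."""
    va, r_key, dma_len = parse_reth(buf)
    mr = mrs.get(r_key)
    if mr is None:
        return False                                   # unknown or invalidated R_Key
    if va < mr.base or va + dma_len > mr.base + mr.length:
        return False                                   # [VA, VA + DMALen) leaves the MR
    return mr.remote_write if is_write else mr.remote_read


mrs = {0x1234: MemoryRegion(base=0x10000, length=0x1000, remote_read=True, remote_write=False)}
print(check_reth(struct.pack(">QII", 0x10800, 0x1234, 0x800), mrs, is_write=False))  # True
print(check_reth(struct.pack(">QII", 0x10800, 0x1234, 0x801), mrs, is_write=False))  # False
```

The second request runs one byte past the end of the MR, so it fails even though the starting VA is inside it.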

This dataset contains no RETH-bearing packets.

AETH (ACK Extended Transport Header, 4 bytes) is present in RC and RD ACK packets, in ATOMIC Acknowledge packets, and in the first/last/only response packets of an RDMA READ.

Byte(s) | Field                         | Width (bits)
0       | Syndrome                      | 8
1..3    | MSN (Message Sequence Number) | 24

Syndrome encoding (8 bits):

Bit  | Field    | Width (bits)
7    | Reserved | 1
6..5 | OpCode   | 2
4..0 | Value    | 5

OpCode values:

OpCode | Meaning  | Value field interpretation
00     | ACK      | Encoded credit count (codes 0–30 each stand for a credit value defined in the spec); 31 = no credit information supplied
01     | RNR NAK  | RNR Timer (selects retry delay from a fixed table; see IBA §9.7.5.2.8)
10     | Reserved | n/a
11     | NAK      | NAK code: 0 = PSN sequence error, 1 = invalid request, 2 = remote access error, 3 = remote operational error, 4 = invalid RD request

Concrete example: infiniband.pcap frame 11 has Syndrome = 31 decimal = 0x1F = 0 00 11111. This decodes to OpCode = 00 (ACK), Value = 11111 (no credit info) — a normal acknowledgment with no flow-control hint. See the main report’s frame-11 mapping for context.
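Here is a minimal Python decoder for the AETH, based on the syndrome tables above. `decode_aeth` is an illustrative name. The ACK credit field is reported as a raw 5-bit code rather than translated into a credit count. The MSN in the sample is arbitrary.

```python
NAK_CODES = {
    0: "PSN sequence error",
    1: "invalid request",
    2: "remote access error",
    3: "remote operational error",
    4: "invalid RD request",
}


def decode_aeth(buf: bytes) -> dict:
    """Decode a 4-byte AETH: 8-bit syndrome followed by a 24-bit MSN."""
    word = int.from_bytes(buf[:4], "big")
    syndrome, msn = word >> 24, word & 0x00FFFFFF
    kind, value = (syndrome >> 5) & 0b11, syndrome & 0x1F   # bit 7 is reserved
    if kind == 0b00:
        detail = "no credit info" if value == 31 else f"credit code {value}"
        return {"type": "ACK", "detail": detail, "msn": msn}
    if kind == 0b01:
        return {"type": "RNR NAK", "detail": f"RNR timer code {value}", "msn": msn}
    if kind == 0b11:
        return {"type": "NAK", "detail": NAK_CODES.get(value, "reserved code"), "msn": msn}
    return {"type": "reserved", "detail": f"value {value}", "msn": msn}


# Frame 11's syndrome (0x1F) with an arbitrary MSN of 5:
print(decode_aeth(bytes([0x1F, 0x00, 0x00, 0x05])))
# {'type': 'ACK', 'detail': 'no credit info', 'msn': 5}
```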

AtomicETH (Atomic Extended Transport Header, 28 bytes) is present in RC Compare & Swap and Fetch & Add request packets.

Byte(s) | Field                                        | Width (bits)
0..7    | VA                                           | 64
8..11   | R_Key                                        | 32
12..19  | Swap Data (CmpSwap) / Add Data (FetchAdd)    | 64
20..27  | Compare Data (CmpSwap) / Reserved (FetchAdd) | 64

Atomic operations are guaranteed to execute at most once. The responder keeps the results of recent atomic requests per QP. When a retried request arrives, it returns the saved original value instead of executing the read-modify-write again.
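The at-most-once rule can be illustrated with a toy responder that remembers the original value returned for each request PSN. This sketch makes simplifying assumptions. `AtomicResponder` is hypothetical, memory is a dictionary of 64-bit words, and the saved history grows without bound, whereas a real responder keeps only a limited number of recent atomic results per QP.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AtomicResponder:
    """Toy model of at-most-once FetchAdd handling on one QP (hypothetical)."""
    memory: dict[int, int]                                 # VA -> 64-bit word
    saved: dict[int, int] = field(default_factory=dict)    # request PSN -> original value

    def fetch_add(self, psn: int, va: int, add: int) -> int:
        if psn in self.saved:
            return self.saved[psn]                # retried request: replay, don't re-add
        original = self.memory.get(va, 0)
        self.memory[va] = (original + add) % (1 << 64)
        self.saved[psn] = original                # real responders bound this history
        return original                           # returned to the requester in AtomicAckETH


r = AtomicResponder(memory={0x1000: 5})
print(r.fetch_add(psn=7, va=0x1000, add=3), r.fetch_add(psn=7, va=0x1000, add=3), r.memory[0x1000])
# 5 5 8
```

The retry with the same PSN returns 5 again, and memory stays at 8 instead of becoming 11.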

This dataset contains no atomic operations.

AtomicAckETH (ATOMIC Acknowledge Extended Transport Header, 8 bytes) carries the original (pre-atomic) value back to the requester. It sits after the AETH on ATOMIC Acknowledge packets.

Byte(s) | Field                | Width (bits)
0..7    | Original Remote Data | 64

ImmDt (Immediate Data, 4 bytes) carries 32 bits of opaque data that is delivered in the receiver’s CQE. It is present on opcodes whose names end in “with Immediate” and always sits last among the extended headers. It follows the RETH on RDMA WRITE Only with Immediate and the DETH on UD SEND Only with Immediate.

Byte(s) | Field          | Width (bits)
0..3    | Immediate Data | 32

IETH (Invalidate Extended Transport Header, 4 bytes) carries an R_Key to be invalidated on the responder. It is present on SEND Last with Invalidate and SEND Only with Invalidate.

Byte(s) | Field | Width (bits)
0..3    | R_Key | 32

RDETH (Reliable Datagram Extended Transport Header, 4 bytes) is used by Reliable Datagram (RD) transport. It sits between the BTH and DETH/RETH/etc. and carries an EE (End-to-End) context number. RD is rare in practice.

Byte(s) | Field      | Width (bits)
0       | Reserved   | 8
1..3    | EE Context | 24

XRCETH — 4 bytes (Extended Reliable Connection ETH)


Used by XRC transport to identify the SRQ on the receiver.

Byte(s) | Field   | Width (bits)
0..3    | XRC SRQ | 32

Every MAD message begins with this 24-byte common header, regardless of management class. The class-specific MAD data follows it, and every MAD (SMP or GMP) is fixed at 256 bytes.

Byte(s) | Field                            | Width
0       | BaseVersion                      | 8 bits
1       | MgmtClass                        | 8 bits
2       | ClassVersion                     | 8 bits
3       | Method                           | 8 bits
4..5    | Status                           | 16 bits
6..7    | ClassSpecific                    | 16 bits
8..15   | TID (Transaction ID)             | 64 bits
16..17  | AttributeID                      | 16 bits
18..19  | Reserved                         | 16 bits
20..23  | AttributeModifier                | 32 bits
24..255 | MAD data (class-specific layout) | 232 bytes

Common MgmtClass values seen in this dataset:

Value | Class                  | Used by
0x01  | SMP (LID-routed)       | LID-routed Subnet Management
0x03  | SubnAdm (SA)           | Path records, MC member records
0x04  | Performance Management | PortCounters, PortCountersExtended, ClassPortInfo
0x32  | Vendor-specific OUI    | ibping
0x81  | SMP (Directed Route)   | Initial fabric discovery (ib_initial_sniffer.pcap)

Common Method values:

Value | Method        | Notes
0x01  | Get           | Read attribute
0x02  | Set           | Write attribute
0x03  | Send          | Unsolicited
0x05  | Trap          | Asynchronous notification
0x06  | Report        | SA report
0x07  | TrapRepress   | Suppress repeating traps
0x12  | GetTable      | SA table query
0x13  | GetTraceTable | SA trace
0x14  | GetMulti      | SA multipart
0x81  | GetResp       | Response to Get
0x86  | ReportResp    | Response to Report

Concrete example: infiniband.pcap frame 1 = MgmtClass=0x81 (Directed-route SMP), Method=0x01 (Get), AttributeID=0x0020 (SMInfo). This is a SubnGet(SMInfo) packet.
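Here is a minimal Python sketch that decodes the 24-byte common header. `parse_mad_header` and the lookup dictionaries are illustrative and cover only the classes, methods, and attribute listed here. The `is_response` flag relies on response methods having the high bit set (GetResp 0x81, ReportResp 0x86); TrapRepress (0x07) is an exception to that pattern. The sample header reuses frame 1’s class, method, and attribute, but its TID is made up.

```python
import struct

MGMT_CLASSES = {0x01: "SMP (LID-routed)", 0x03: "SubnAdm", 0x04: "PerfMgt",
                0x32: "Vendor OUI", 0x81: "SMP (Directed Route)"}
METHODS = {0x01: "Get", 0x02: "Set", 0x03: "Send", 0x05: "Trap", 0x06: "Report",
           0x07: "TrapRepress", 0x12: "GetTable", 0x81: "GetResp", 0x86: "ReportResp"}
ATTRIBUTES = {0x0020: "SMInfo"}  # only the attribute named in this section


def parse_mad_header(mad: bytes) -> dict:
    """Decode the 24-byte MAD common header (all fields big-endian)."""
    if len(mad) < 24:
        raise ValueError("MAD common header needs 24 bytes")
    (base_ver, mgmt_class, class_ver, method, status, class_specific,
     tid, attr_id, _reserved, attr_mod) = struct.unpack_from(">BBBBHHQHHI", mad, 0)
    return {
        "base_version": base_ver,
        "class": MGMT_CLASSES.get(mgmt_class, hex(mgmt_class)),
        "class_version": class_ver,
        "method": METHODS.get(method, hex(method)),
        "is_response": bool(method & 0x80),  # GetResp/ReportResp; TrapRepress is the exception
        "status": status,
        "class_specific": class_specific,
        "tid": tid,
        "attribute": ATTRIBUTES.get(attr_id, hex(attr_id)),
        "attr_mod": attr_mod,
    }


# Frame 1's class/method/attribute; the TID (0x1234) is made up.
hdr = struct.pack(">BBBBHHQHHI", 1, 0x81, 1, 0x01, 0, 0, 0x1234, 0x0020, 0, 0)
info = parse_mad_header(hdr)
print(info["class"], info["method"], info["attribute"])  # SMP (Directed Route) Get SMInfo
```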

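When MgmtClass = 0x81, the SMP carries its directed-route state in two places. Part of it sits inside the MAD common header, where the Status and ClassSpecific fields are reinterpreted; the rest is in fields that follow the header. The fields exposed by the Wireshark dissector and visible in this dataset:

Field             | Location (bytes)                      | Width (bits) | Meaning
D (Direction Bit) | 4, top bit of the SMP’s Status word   | 1            | 0 = outbound, 1 = inbound
Hop Pointer       | 6 (high byte of ClassSpecific)        | 8            | Current position in the path
Hop Count         | 7 (low byte of ClassSpecific)         | 8            | Total hops in the path
M_Key             | 24..31                                | 64           | Management protection key
DrSLID            | 32..33                                | 16           | Directed-route source LID; 0xffff = “use path”
DrDLID            | 34..35                                | 16           | Directed-route destination LID; 0xffff = “use path”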

Beyond these, the SMP MAD body also carries InitialPath[64] and ReturnPath[64] byte arrays of port numbers, but those are payload fields rather than common SMP-DR header fields.

Concrete example: infiniband.pcap frame 1 has D=0, Hop Pointer=1, Hop Count=2, M_Key=0, DrSLID=0xffff, DrDLID=0xffff — a typical second-hop discovery probe.
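Here is a minimal Python sketch that reads the directed-route fields from a full 256-byte SMP. The byte offsets follow the standard SMP layout, as mirrored by Linux’s `struct ib_smp`:

* Hop Pointer and Hop Count at bytes 6 and 7
* M_Key at 24
* DrSLID/DrDLID at 32 and 34
* InitialPath at 128
* ReturnPath at 192

`parse_dr_smp` is a hypothetical helper. The header values mirror frame 1, but the port numbers in the sample InitialPath are invented.

```python
import struct

PERMISSIVE_LID = 0xFFFF  # DrSLID/DrDLID value meaning "use the directed path"


def parse_dr_smp(mad: bytes) -> dict:
    """Decode directed-route fields of a 256-byte SMP."""
    if len(mad) < 256:
        raise ValueError("SMPs are 256 bytes")
    status_word, hop_ptr, hop_cnt = struct.unpack_from(">HBB", mad, 4)
    m_key, dr_slid, dr_dlid = struct.unpack_from(">QHH", mad, 24)
    return {
        "direction": status_word >> 15,          # D bit: 0 = outbound, 1 = inbound
        "status": status_word & 0x7FFF,
        "hop_ptr": hop_ptr,                      # byte 6
        "hop_cnt": hop_cnt,                      # byte 7
        "m_key": m_key,
        "dr_slid_permissive": dr_slid == PERMISSIVE_LID,
        "dr_dlid_permissive": dr_dlid == PERMISSIVE_LID,
        # InitialPath starts at byte 128 and ReturnPath at 192; entry 0 is unused,
        # so the hops are entries 1..hop_cnt.
        "initial_path": list(mad[129:129 + hop_cnt]),
        "return_path": list(mad[193:193 + hop_cnt]),
    }


smp = bytearray(256)
struct.pack_into(">BBBBHBB", smp, 0, 1, 0x81, 1, 0x01, 0x0000, 1, 2)  # D=0, HopPtr=1, HopCnt=2
struct.pack_into(">QHH", smp, 24, 0, PERMISSIVE_LID, PERMISSIVE_LID)   # M_Key=0, DrSLID, DrDLID
smp[129:131] = bytes([1, 3])  # invented output ports for hops 1 and 2
print(parse_dr_smp(bytes(smp))["initial_path"])  # [1, 3]
```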

IPoIB Encapsulation — 4 bytes (RFC 4391)


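When IPoIB carries an IP packet, a small 4-byte encapsulation header sits between the IB transport headers and the IP layer. On a UD QP (datagram mode), it follows the BTH and DETH. On an RC QP (connected mode, as in infiniband.pcap frame 10), it follows the BTH directly.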

Byte(s) | Field                | Width (bits)
0..1    | EtherType            | 16
2..3    | Reserved (must be 0) | 16

EtherType values seen:

Value  | Meaning
0x0800 | IPv4
0x0806 | InfiniBand ARP (RFC 4391)
0x86DD | IPv6

IPoIB ARP, despite the EtherType, is not the same as Ethernet ARP. RFC 4391 defines a 20-byte hardware address: Reserved/flags (8 bits) + QPN (24 bits) + GID (128 bits). This is why ib_ipping_sniffer.pcap shows ARP records that look familiar but carry IB-specific addressing inside.
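Here is a minimal Python sketch that splits a 20-byte IPoIB hardware address into its parts. `parse_ipoib_hwaddr` is an illustrative name. The sample address (flags octet, QPN 0x48, and GID) is invented.

```python
import ipaddress


def parse_ipoib_hwaddr(addr: bytes) -> dict:
    """Split a 20-byte IPoIB link-layer address (RFC 4391) into its three parts."""
    if len(addr) != 20:
        raise ValueError("IPoIB hardware addresses are 20 bytes")
    return {
        "flags": addr[0],                                  # reserved/flags octet
        "qpn": int.from_bytes(addr[1:4], "big"),           # 24-bit QP number
        "gid": ipaddress.IPv6Address(bytes(addr[4:20])),   # 128-bit GID
    }


# Invented address: flags 0x80, QPN 0x000048, link-local GID.
sample = bytes([0x80, 0x00, 0x00, 0x48]) + bytes.fromhex("fe800000000000000002c903000a1234")
parts = parse_ipoib_hwaddr(sample)
print(hex(parts["qpn"]), parts["gid"])  # 0x48 fe80::2:c903:a:1234
```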

Concrete example: infiniband.pcap frame 10 has EtherType=0x0800, Reserved=0x0000, followed directly by an IPv4 ICMP Echo request. See the main report’s worked example.

InfiniBand defines two CRCs at packet level:

CRC                  | Width (bits) | Computed over                                          | Purpose
ICRC (Invariant CRC) | 32           | Everything except mutable fields (variant header bits) | End-to-end integrity; unchanged across switches
VCRC (Variant CRC)   | 16           | The entire packet on the link, LRH through ICRC        | Per-link integrity; recomputed by switches

Mutable fields excluded from ICRC include:

  • LRH.VL — switches may remap virtual lanes
  • LRH.SL/reserved bits — switches may reset reserved fields
  • GRH.HopLmt — decremented by routers
  • GRH.TClass — may be remarked
  • GRH.FlowLabel — may be remarked
  • BTH.FECN / BECN — set by congestion-notification points
  • BTH reserved variant bits

In the ERF captures both CRCs are exposed as filterable fields (infiniband.invariant.crc and infiniband.variant.crc). The Wireshark dissector does not validate them; trust erf.flags.rxe instead. See the main report’s preservation matrix for details.

The 8-bit OpCode is partitioned: top 3 bits identify the transport service, bottom 5 bits the operation.

Transport-service prefix:

Bits 7..5 | Service                                    | Range
000       | RC (Reliable Connection)                   | 0x00–0x1F
001       | UC (Unreliable Connection)                 | 0x20–0x3F
010       | RD (Reliable Datagram)                     | 0x40–0x5F
011       | UD (Unreliable Datagram)                   | 0x60–0x7F
100       | CNP (Congestion Notification, RoCEv2 only) | 0x80–0x9F
101       | XRC (Extended Reliable Connection)         | 0xA0–0xBF

Operation suffix (5-bit, applies within each transport’s range; not all suffixes valid for every service):

Suffix | Operation
0x00   | SEND First
0x01   | SEND Middle
0x02   | SEND Last
0x03   | SEND Last with Immediate
0x04   | SEND Only
0x05   | SEND Only with Immediate
0x06   | RDMA WRITE First
0x07   | RDMA WRITE Middle
0x08   | RDMA WRITE Last
0x09   | RDMA WRITE Last with Immediate
0x0A   | RDMA WRITE Only
0x0B   | RDMA WRITE Only with Immediate
0x0C   | RDMA READ Request
0x0D   | RDMA READ Response First
0x0E   | RDMA READ Response Middle
0x0F   | RDMA READ Response Last
0x10   | RDMA READ Response Only
0x11   | Acknowledge
0x12   | ATOMIC Acknowledge
0x13   | Compare & Swap
0x14   | Fetch & Add
0x16   | SEND Last with Invalidate
0x17   | SEND Only with Invalidate

Practical operation support per transport service:

Service | Supported operations
RC      | All of the above
UC      | SEND, RDMA WRITE only (no READ, no Atomic, no ACK)
RD      | All of the above except SEND with Invalidate
UD      | SEND Only, SEND Only with Immediate (no RDMA, no Atomic, no ACK)
XRC     | RC operations; request packets carry an XRCETH between the BTH and the operation’s normal extended headers (ACKs and RDMA READ responses do not)

Concrete examples seen in this dataset:

Decimal | Hex  | Meaning        | Where
4       | 0x04 | RC SEND Only   | infiniband.pcap frame 10
17      | 0x11 | RC Acknowledge | infiniband.pcap frame 11
100     | 0x64 | UD SEND Only   | All MAD-bearing packets across this dataset
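Decoding an opcode is a split at bit 5. Here is a minimal Python sketch. `decode_opcode` is an illustrative name. It does not check whether a suffix is actually valid for the decoded service (see the per-service support table above), and for the CNP prefix it returns only the service name.

```python
SERVICES = {0b000: "RC", 0b001: "UC", 0b010: "RD", 0b011: "UD", 0b100: "CNP", 0b101: "XRC"}
OPERATIONS = {
    0x00: "SEND First", 0x01: "SEND Middle", 0x02: "SEND Last",
    0x03: "SEND Last with Immediate", 0x04: "SEND Only",
    0x05: "SEND Only with Immediate", 0x06: "RDMA WRITE First",
    0x07: "RDMA WRITE Middle", 0x08: "RDMA WRITE Last",
    0x09: "RDMA WRITE Last with Immediate", 0x0A: "RDMA WRITE Only",
    0x0B: "RDMA WRITE Only with Immediate", 0x0C: "RDMA READ Request",
    0x0D: "RDMA READ Response First", 0x0E: "RDMA READ Response Middle",
    0x0F: "RDMA READ Response Last", 0x10: "RDMA READ Response Only",
    0x11: "Acknowledge", 0x12: "ATOMIC Acknowledge", 0x13: "Compare & Swap",
    0x14: "Fetch & Add", 0x16: "SEND Last with Invalidate",
    0x17: "SEND Only with Invalidate",
}


def decode_opcode(opcode: int) -> str:
    """Split a BTH OpCode into service prefix (bits 7..5) and operation suffix (bits 4..0)."""
    prefix, suffix = opcode >> 5, opcode & 0x1F
    service = SERVICES.get(prefix)
    if service is None:
        return f"unlisted prefix {prefix:03b} (0x{opcode:02X})"
    if service == "CNP":
        return "CNP"  # the operation table does not apply to this prefix
    operation = OPERATIONS.get(suffix, f"unlisted suffix 0x{suffix:02X}")
    return f"{service} {operation}"


for op in (4, 17, 100):
    print(op, decode_opcode(op))
# 4 RC SEND Only
# 17 RC Acknowledge
# 100 UD SEND Only
```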

Which extended header(s) appear after the BTH is determined by the opcode. Use this table when reading a hex dump and asking “what comes next?”

Operation                             | Extended headers (in order, after BTH)
RC/UC SEND First/Middle               | (none)
RC/UC SEND Last/Only                  | (none)
RC/UC SEND Last/Only with Immediate   | ImmDt
RC SEND Last/Only with Invalidate     | IETH
RC/UC RDMA WRITE First                | RETH
RC/UC RDMA WRITE Middle/Last          | (none)
RC/UC RDMA WRITE Last with Immediate  | ImmDt
RC/UC RDMA WRITE Only                 | RETH
RC/UC RDMA WRITE Only with Immediate  | RETH + ImmDt
RC RDMA READ Request                  | RETH
RC RDMA READ Response First/Last/Only | AETH
RC RDMA READ Response Middle          | (none)
RC ACK / NAK                          | AETH
RC CmpSwap / FetchAdd                 | AtomicETH
RC ATOMIC Acknowledge                 | AETH + AtomicAckETH
UD SEND Only                          | DETH
UD SEND Only with Immediate           | DETH + ImmDt
RD any operation                      | RDETH + DETH + (op-specific)
XRC request packets                   | XRCETH + (RC-equivalent extended headers)
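The mapping can be turned into a lookup that answers “where does the payload start?”. Here is a minimal Python sketch that covers only RC, UC, and UD; RD and XRC are left out to keep it short. `extended_headers` and `payload_offset` are illustrative names, and the sketch does not reject suffixes that a service does not support.

```python
from __future__ import annotations

LRH_BYTES, GRH_BYTES, BTH_BYTES = 8, 40, 12
EXT_HEADER_SIZES = {"DETH": 8, "RETH": 16, "AETH": 4, "AtomicETH": 28,
                    "AtomicAckETH": 8, "ImmDt": 4, "IETH": 4}

# Operation suffix -> extended headers after the BTH (RC/UC rows of the table).
HEADERS_BY_SUFFIX = {
    0x03: ["ImmDt"], 0x05: ["ImmDt"],                  # SEND ... with Immediate
    0x06: ["RETH"], 0x09: ["ImmDt"],                   # WRITE First / Last with Immediate
    0x0A: ["RETH"], 0x0B: ["RETH", "ImmDt"],           # WRITE Only (with Immediate)
    0x0C: ["RETH"],                                    # READ Request
    0x0D: ["AETH"], 0x0F: ["AETH"], 0x10: ["AETH"],    # READ Response First/Last/Only
    0x11: ["AETH"], 0x12: ["AETH", "AtomicAckETH"],    # ACK, ATOMIC Acknowledge
    0x13: ["AtomicETH"], 0x14: ["AtomicETH"],          # CmpSwap, FetchAdd
    0x16: ["IETH"], 0x17: ["IETH"],                    # SEND ... with Invalidate
}


def extended_headers(opcode: int) -> list[str]:
    """Extended headers that follow the BTH, for RC (000), UC (001) and UD (011) opcodes."""
    prefix, suffix = opcode >> 5, opcode & 0x1F
    if prefix not in (0b000, 0b001, 0b011):
        raise ValueError("only RC, UC and UD opcodes are modeled in this sketch")
    headers = list(HEADERS_BY_SUFFIX.get(suffix, []))
    if prefix == 0b011:
        headers.insert(0, "DETH")  # UD: DETH first, then ImmDt if present
    return headers


def payload_offset(opcode: int, has_grh: bool) -> int:
    """Bytes from the first byte of the LRH to the first payload byte."""
    fixed = LRH_BYTES + (GRH_BYTES if has_grh else 0) + BTH_BYTES
    return fixed + sum(EXT_HEADER_SIZES[h] for h in extended_headers(opcode))


print(payload_offset(0x64, has_grh=False))  # 28: MAD starts after LRH + BTH + DETH
print(payload_offset(0x04, has_grh=False))  # 20: IPoIB header starts after LRH + BTH
```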

For the management traffic in this dataset (UD SEND Only with MgmtClass-tagged MAD), the layout is:

LRH → BTH → DETH → MAD common header → MAD payload → ICRC → VCRC

For the RC SEND Only carrying IPoIB ICMP in infiniband.pcap frame 10:

LRH → BTH → IPoIB encap (4B) → IPv4 → ICMP → ICRC → VCRC

For a hypothetical RC RDMA READ Request → multi-packet response:

Request: LRH → BTH(RDMA READ Request) → RETH → ICRC → VCRC
First: LRH → BTH(RDMA READ Response First) → AETH → payload → ICRC → VCRC
Middle: LRH → BTH(RDMA READ Response Middle) → payload → ICRC → VCRC
Last: LRH → BTH(RDMA READ Response Last) → AETH → payload → ICRC → VCRC