Based on kernel version 6.11
. Page generated on 2024-09-24 08:21 EST
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 | .. SPDX-License-Identifier: GPL-2.0 ============================================== Management Component Transport Protocol (MCTP) ============================================== net/mctp/ contains protocol support for MCTP, as defined by DMTF standard DSP0236. Physical interface drivers ("bindings" in the specification) are provided in drivers/net/mctp/. The core code provides a socket-based interface to send and receive MCTP messages, through an AF_MCTP, SOCK_DGRAM socket. Structure: interfaces & networks ================================ The kernel models the local MCTP topology through two items: interfaces and networks. An interface (or "link") is an instance of an MCTP physical transport binding (as defined by DSP0236, section 3.2.47), likely connected to a specific hardware device. This is represented as a ``struct netdevice``. A network defines a unique address space for MCTP endpoints by endpoint-ID (described by DSP0236, section 3.2.31). A network has a user-visible identifier to allow references from userspace. Route definitions are specific to one network. Interfaces are associated with one network. A network may be associated with one or more interfaces. If multiple networks are present, each may contain endpoint IDs (EIDs) that are also present on other networks. Sockets API =========== Protocol definitions -------------------- MCTP uses ``AF_MCTP`` / ``PF_MCTP`` for the address- and protocol- families. Since MCTP is message-based, only ``SOCK_DGRAM`` sockets are supported. .. code-block:: C int sd = socket(AF_MCTP, SOCK_DGRAM, 0); The only (current) value for the ``protocol`` argument is 0. As with all socket address families, source and destination addresses are specified with a ``sockaddr`` type, with a single-byte endpoint address: .. code-block:: C typedef __u8 mctp_eid_t; struct mctp_addr { mctp_eid_t s_addr; }; struct sockaddr_mctp { __kernel_sa_family_t smctp_family; unsigned int smctp_network; struct mctp_addr smctp_addr; __u8 smctp_type; __u8 smctp_tag; }; #define MCTP_NET_ANY 0x0 #define MCTP_ADDR_ANY 0xff Syscall behaviour ----------------- The following sections describe the MCTP-specific behaviours of the standard socket system calls. These behaviours have been chosen to map closely to the existing sockets APIs. ``bind()`` : set local socket address ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Sockets that receive incoming request packets will bind to a local address, using the ``bind()`` syscall. .. code-block:: C struct sockaddr_mctp addr; addr.smctp_family = AF_MCTP; addr.smctp_network = MCTP_NET_ANY; addr.smctp_addr.s_addr = MCTP_ADDR_ANY; addr.smctp_type = MCTP_TYPE_PLDM; addr.smctp_tag = MCTP_TAG_OWNER; int rc = bind(sd, (struct sockaddr *)&addr, sizeof(addr)); This establishes the local address of the socket. Incoming MCTP messages that match the network, address, and message type will be received by this socket. The reference to 'incoming' is important here; a bound socket will only receive messages with the TO bit set, to indicate an incoming request message, rather than a response. The ``smctp_tag`` value will configure the tags accepted from the remote side of this socket. Given the above, the only valid value is ``MCTP_TAG_OWNER``, which will result in remotely "owned" tags being routed to this socket. Since ``MCTP_TAG_OWNER`` is set, the 3 least-significant bits of ``smctp_tag`` are not used; callers must set them to zero. A ``smctp_network`` value of ``MCTP_NET_ANY`` will configure the socket to receive incoming packets from any locally-connected network. A specific network value will cause the socket to only receive incoming messages from that network. The ``smctp_addr`` field specifies a local address to bind to. A value of ``MCTP_ADDR_ANY`` configures the socket to receive messages addressed to any local destination EID. The ``smctp_type`` field specifies which message types to receive. Only the lower 7 bits of the type is matched on incoming messages (ie., the most-significant IC bit is not part of the match). This results in the socket receiving packets with and without a message integrity check footer. ``sendto()``, ``sendmsg()``, ``send()`` : transmit an MCTP message ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ An MCTP message is transmitted using one of the ``sendto()``, ``sendmsg()`` or ``send()`` syscalls. Using ``sendto()`` as the primary example: .. code-block:: C struct sockaddr_mctp addr; char buf[14]; ssize_t len; /* set message destination */ addr.smctp_family = AF_MCTP; addr.smctp_network = 0; addr.smctp_addr.s_addr = 8; addr.smctp_tag = MCTP_TAG_OWNER; addr.smctp_type = MCTP_TYPE_ECHO; /* arbitrary message to send, with message-type header */ buf[0] = MCTP_TYPE_ECHO; memcpy(buf + 1, "hello, world!", sizeof(buf) - 1); len = sendto(sd, buf, sizeof(buf), 0, (struct sockaddr_mctp *)&addr, sizeof(addr)); The network and address fields of ``addr`` define the remote address to send to. If ``smctp_tag`` has the ``MCTP_TAG_OWNER``, the kernel will ignore any bits set in ``MCTP_TAG_VALUE``, and generate a tag value suitable for the destination EID. If ``MCTP_TAG_OWNER`` is not set, the message will be sent with the tag value as specified. If a tag value cannot be allocated, the system call will report an errno of ``EAGAIN``. The application must provide the message type byte as the first byte of the message buffer passed to ``sendto()``. If a message integrity check is to be included in the transmitted message, it must also be provided in the message buffer, and the most-significant bit of the message type byte must be 1. The ``sendmsg()`` system call allows a more compact argument interface, and the message buffer to be specified as a scatter-gather list. At present no ancillary message types (used for the ``msg_control`` data passed to ``sendmsg()``) are defined. Transmitting a message on an unconnected socket with ``MCTP_TAG_OWNER`` specified will cause an allocation of a tag, if no valid tag is already allocated for that destination. The (destination-eid,tag) tuple acts as an implicit local socket address, to allow the socket to receive responses to this outgoing message. If any previous allocation has been performed (to for a different remote EID), that allocation is lost. Sockets will only receive responses to requests they have sent (with TO=1) and may only respond (with TO=0) to requests they have received. ``recvfrom()``, ``recvmsg()``, ``recv()`` : receive an MCTP message ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ An MCTP message can be received by an application using one of the ``recvfrom()``, ``recvmsg()``, or ``recv()`` system calls. Using ``recvfrom()`` as the primary example: .. code-block:: C struct sockaddr_mctp addr; socklen_t addrlen; char buf[14]; ssize_t len; addrlen = sizeof(addr); len = recvfrom(sd, buf, sizeof(buf), 0, (struct sockaddr_mctp *)&addr, &addrlen); /* We can expect addr to describe an MCTP address */ assert(addrlen >= sizeof(buf)); assert(addr.smctp_family == AF_MCTP); printf("received %zd bytes from remote EID %d\n", rc, addr.smctp_addr); The address argument to ``recvfrom`` and ``recvmsg`` is populated with the remote address of the incoming message, including tag value (this will be needed in order to reply to the message). The first byte of the message buffer will contain the message type byte. If an integrity check follows the message, it will be included in the received buffer. The ``recv()`` system call behaves in a similar way, but does not provide a remote address to the application. Therefore, these are only useful if the remote address is already known, or the message does not require a reply. Like the send calls, sockets will only receive responses to requests they have sent (TO=1) and may only respond (TO=0) to requests they have received. ``ioctl(SIOCMCTPALLOCTAG)`` and ``ioctl(SIOCMCTPDROPTAG)`` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ These tags give applications more control over MCTP message tags, by allocating (and dropping) tag values explicitly, rather than the kernel automatically allocating a per-message tag at ``sendmsg()`` time. In general, you will only need to use these ioctls if your MCTP protocol does not fit the usual request/response model. For example, if you need to persist tags across multiple requests, or a request may generate more than one response. In these cases, the ioctls allow you to decouple the tag allocation (and release) from individual message send and receive operations. Both ioctls are passed a pointer to a ``struct mctp_ioc_tag_ctl``: .. code-block:: C struct mctp_ioc_tag_ctl { mctp_eid_t peer_addr; __u8 tag; __u16 flags; }; ``SIOCMCTPALLOCTAG`` allocates a tag for a specific peer, which an application can use in future ``sendmsg()`` calls. The application populates the ``peer_addr`` member with the remote EID. Other fields must be zero. On return, the ``tag`` member will be populated with the allocated tag value. The allocated tag will have the following tag bits set: - ``MCTP_TAG_OWNER``: it only makes sense to allocate tags if you're the tag owner - ``MCTP_TAG_PREALLOC``: to indicate to ``sendmsg()`` that this is a preallocated tag. - ... and the actual tag value, within the least-significant three bits (``MCTP_TAG_MASK``). Note that zero is a valid tag value. The tag value should be used as-is for the ``smctp_tag`` member of ``struct sockaddr_mctp``. ``SIOCMCTPDROPTAG`` releases a tag that has been previously allocated by a ``SIOCMCTPALLOCTAG`` ioctl. The ``peer_addr`` must be the same as used for the allocation, and the ``tag`` value must match exactly the tag returned from the allocation (including the ``MCTP_TAG_OWNER`` and ``MCTP_TAG_PREALLOC`` bits). The ``flags`` field must be zero. Kernel internals ================ There are a few possible packet flows in the MCTP stack: 1. local TX to remote endpoint, message <= MTU:: sendmsg() -> mctp_local_output() : route lookup -> rt->output() (== mctp_route_output) -> dev_queue_xmit() 2. local TX to remote endpoint, message > MTU:: sendmsg() -> mctp_local_output() -> mctp_do_fragment_route() : creates packet-sized skbs. For each new skb: -> rt->output() (== mctp_route_output) -> dev_queue_xmit() 3. remote TX to local endpoint, single-packet message:: mctp_pkttype_receive() : route lookup -> rt->output() (== mctp_route_input) : sk_key lookup -> sock_queue_rcv_skb() 4. remote TX to local endpoint, multiple-packet message:: mctp_pkttype_receive() : route lookup -> rt->output() (== mctp_route_input) : sk_key lookup : stores skb in struct sk_key->reasm_head mctp_pkttype_receive() : route lookup -> rt->output() (== mctp_route_input) : sk_key lookup : finds existing reassembly in sk_key->reasm_head : appends new fragment -> sock_queue_rcv_skb() Key refcounts ------------- * keys are refed by: - a skb: during route output, stored in ``skb->cb``. - netns and sock lists. * keys can be associated with a device, in which case they hold a reference to the dev (set through ``key->dev``, counted through ``dev->key_count``). Multiple keys can reference the device. |