About Kernel Documentation Linux Kernel Contact Linux Resources Linux Blog



Based on kernel version 4.16.1. Page generated on 2018-04-09 11:53 EST.

===========
NTB Drivers
===========

NTB (Non-Transparent Bridge) is a type of PCI-Express bridge chip that connects
the separate memory systems of two or more computers to the same PCI-Express
fabric. Existing NTB hardware supports a common feature set: doorbell
registers and memory translation windows, as well as non-common features like
scratchpad and message registers. Scratchpad registers are readable and
writable registers that are accessible from either side of the device, so that
peers can exchange a small amount of information at a fixed address. Message
registers can be used for the same purpose. Additionally, they are provided
with special status bits to make sure the information isn't overwritten by
another peer. Doorbell registers provide a way for peers to send interrupt
events. Memory windows allow translated read and write access to the peer
memory.

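The role of the message register status bits can be pictured with a small userspace model. This is only a sketch: it assumes a single "full" bit that the sender sets and the receiver clears, and `struct msg_reg`, `msg_send()` and `msg_recv()` are hypothetical names, not NTB API calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one message register (not the NTB API): a data
 * word plus a status bit that stays set until the receiver reads it. */
struct msg_reg {
    uint32_t data;
    bool full; /* set by the sender, cleared by the receiver */
};

/* Sender: refuse to overwrite a message that has not been read yet. */
static bool msg_send(struct msg_reg *reg, uint32_t value)
{
    assert(reg != NULL);
    if (reg->full)
        return false; /* previous message still pending */
    reg->data = value;
    reg->full = true;
    return true;
}

/* Receiver: fetch the message and clear the status bit. */
static bool msg_recv(struct msg_reg *reg, uint32_t *value)
{
    assert(reg != NULL && value != NULL);
    if (!reg->full)
        return false; /* nothing to read */
    *value = reg->data;
    reg->full = false;
    return true;
}
```

A second `msg_send()` fails until the receiver has called `msg_recv()`, which is the guarantee scratchpads alone do not provide.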
NTB Core Driver (ntb)
=====================

The NTB core driver defines an API wrapping the common feature set, and allows
clients interested in NTB features to discover the NTB devices supported by
hardware drivers.  The term "client" is used here to mean an upper layer
component making use of the NTB API.  The term "driver," or "hardware driver,"
is used here to mean a driver for a specific vendor and model of NTB hardware.

NTB Client Drivers
==================

NTB client drivers should register with the NTB core driver.  After
registering, the client probe and remove functions will be called appropriately
as NTB hardware, or hardware drivers, are inserted and removed.  The
registration uses the Linux Device framework, so it should feel familiar to
anyone who has written a PCI driver.

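The probe/remove contract can be illustrated with a plain userspace model. In the kernel the matching is done by the Linux device framework; the `model_*` names below are hypothetical stand-ins rather than the NTB core API, and the sketch supports a single client to stay short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are kernel NTB symbols. */
struct model_dev { int id; };

struct model_client {
    int (*probe)(struct model_dev *dev);
    void (*remove)(struct model_dev *dev);
};

#define MODEL_MAX_DEVS 4

static struct model_dev *devs[MODEL_MAX_DEVS]; /* devices from hw drivers */
static struct model_client *client;            /* single client, for brevity */

/* Client registers: probe every device that already exists. */
static void model_register_client(struct model_client *c)
{
    client = c;
    for (size_t i = 0; i < MODEL_MAX_DEVS; i++)
        if (devs[i] != NULL)
            (void)client->probe(devs[i]);
}

/* A hardware driver adds a device: probe it if a client is registered. */
static void model_add_device(size_t slot, struct model_dev *dev)
{
    assert(slot < MODEL_MAX_DEVS && devs[slot] == NULL);
    devs[slot] = dev;
    if (client != NULL)
        (void)client->probe(dev);
}

/* A device goes away: let the client clean up before it disappears. */
static void model_remove_device(size_t slot)
{
    assert(slot < MODEL_MAX_DEVS && devs[slot] != NULL);
    if (client != NULL)
        client->remove(devs[slot]);
    devs[slot] = NULL;
}

/* Example client that only counts its callbacks. */
static int probed, removed;

static int count_probe(struct model_dev *dev)
{
    (void)dev;
    probed++;
    return 0;
}

static void count_remove(struct model_dev *dev)
{
    (void)dev;
    removed++;
}

static struct model_client counting_client = { count_probe, count_remove };
```

Whichever comes first, the client or the device, probe runs exactly once per device, which is the property a PCI driver author would expect.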
NTB Typical client driver implementation
----------------------------------------

The primary purpose of NTB is to share some piece of memory between at least
two systems. So NTB device features like Scratchpad/Message registers are
mainly used to perform the proper memory window initialization. Typically
there are two types of memory window interfaces supported by the NTB API:
inbound translation configured on the local NTB port and outbound translation
configured by the peer, on the peer NTB port. The first type is
depicted in the following figure.

Inbound translation:
 Memory:              Local NTB Port:      Peer NTB Port:      Peer MMIO:
  ____________
 | dma-mapped |-ntb_mw_set_trans(addr)  |
 | memory     |        _v____________   |   ______________
 | (addr)     |<======| MW xlat addr |<====| MW base addr |<== memory-mapped IO
 |------------|       |--------------|  |  |--------------|

So a typical initialization of the first type of memory window looks like this:
1) allocate a memory region, 2) put the translated address into the NTB config,
3) somehow notify the peer device that initialization has been performed,
4) the peer device maps the corresponding outbound memory window so as to have
access to the shared memory region.

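The inbound figure boils down to offset-preserving address arithmetic: an access at some offset into the window, as seen through the peer's MMIO, lands at the same offset from the translated address `addr`. A minimal sketch of that mapping follows; `struct mw_model` and `mw_translate()` are hypothetical and model only the arithmetic, not any register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one memory window: accesses that hit
 * [mw_base, mw_base + size) on the peer side are redirected to
 * [xlat_addr, xlat_addr + size) in the memory behind the other port. */
struct mw_model {
    uint64_t mw_base;   /* window base as seen through peer MMIO */
    uint64_t xlat_addr; /* translated (dma-mapped) address, "addr" above */
    uint64_t size;      /* window size */
};

/* Return the address an access at peer_addr reaches, preserving its
 * offset into the window; false if peer_addr is outside the window. */
static bool mw_translate(const struct mw_model *mw, uint64_t peer_addr,
                         uint64_t *local_addr)
{
    assert(mw != NULL && local_addr != NULL);
    if (peer_addr < mw->mw_base || peer_addr - mw->mw_base >= mw->size)
        return false;
    *local_addr = mw->xlat_addr + (peer_addr - mw->mw_base);
    return true;
}
```

With a window based at `0x90000000` translated to `0x12340000`, a peer access at `0x90000010` reaches `0x12340010`.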
The second type of interface, in which the shared windows are
initialized by the peer device, is depicted in the following figure:

Outbound translation:
 Memory:        Local NTB Port:    Peer NTB Port:      Peer MMIO:
  ____________                      ______________
 | dma-mapped |                |   | MW base addr |<== memory-mapped IO
 | memory     |                |   |--------------|
 | (addr)     |<===================| MW xlat addr |<-ntb_peer_mw_set_trans(addr)
 |------------|                |   |--------------|

A typical initialization of the second type of interface would be:
1) allocate a memory region, 2) somehow deliver the translated address to the
peer device, 3) the peer puts the translated address into the NTB config,
4) the peer device maps the outbound memory window so as to have access to the
shared memory region.

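Step 2 leaves the delivery mechanism open. One common option, assumed here, is to use scratchpads; since scratchpad values are 32 bits wide in the NTB API (`ntb_spad_read()` returns a `u32`), a 64-bit DMA address has to be split across two of them. The slot layout below is a hypothetical convention that both peers would have to agree on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scratchpad slots both peers agree on (not a kernel ABI). */
enum { SPAD_MW_IDX = 0, SPAD_ADDR_LO = 1, SPAD_ADDR_HI = 2, SPAD_COUNT = 3 };

/* Sender: pack the window index and the 64-bit translated address. */
static void encode_mw_addr(uint32_t spad[SPAD_COUNT], uint32_t mw_idx,
                           uint64_t addr)
{
    spad[SPAD_MW_IDX] = mw_idx;
    spad[SPAD_ADDR_LO] = (uint32_t)(addr & 0xffffffffu);
    spad[SPAD_ADDR_HI] = (uint32_t)(addr >> 32);
}

/* Receiver: reassemble the address and report which window it is for. */
static uint64_t decode_mw_addr(const uint32_t spad[SPAD_COUNT],
                               uint32_t *mw_idx)
{
    assert(mw_idx != NULL);
    *mw_idx = spad[SPAD_MW_IDX];
    return ((uint64_t)spad[SPAD_ADDR_HI] << 32) | spad[SPAD_ADDR_LO];
}
```

In a real client each value would be written to the peer's scratchpads (for instance with `ntb_peer_spad_write()`), and the peer still needs a signal, such as a doorbell, telling it that all slots are filled.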
As one can see, the described scenarios can be combined into one portable
algorithm.
 Local device:
  1) Allocate memory for a shared window
  2) Initialize the memory window with the translated address of the allocated
     region (it may fail if local memory window initialization is unsupported)
  3) Send the translated address and memory window index to the peer device
 Peer device:
  1) Initialize the memory window with the retrieved address of the memory
     region allocated by the other device (it may fail if peer memory window
     initialization is unsupported)
  2) Map the outbound memory window

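The portable algorithm can be modelled by treating each "initialize the memory window" step as a call that may fail. In this hypothetical sketch, boolean capability flags stand in for the error returns of `ntb_mw_set_trans()` and `ntb_peer_mw_set_trans()`; the window is usable whenever at least one side managed to program it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-window capabilities; real drivers find out through the
 * error returns of ntb_mw_set_trans() and ntb_peer_mw_set_trans(). */
struct mw_caps {
    bool local_can_set; /* inbound translation set on the local port */
    bool peer_can_set;  /* translation set by the peer on its port */
};

struct mw_setup {
    uint64_t local_xlat; /* meaningful only if local_ok */
    uint64_t peer_xlat;  /* meaningful only if peer_ok */
    bool local_ok;
    bool peer_ok;
};

/* Stand-in for a set_trans call that "may fail if unsupported". */
static bool try_set_trans(bool supported, uint64_t *reg, uint64_t addr)
{
    if (!supported)
        return false;
    *reg = addr;
    return true;
}

/* Both sides run the same steps on any hardware; the window is usable if
 * at least one of the two translation attempts succeeded. */
static bool portable_mw_setup(const struct mw_caps *caps, uint64_t dma_addr,
                              struct mw_setup *out)
{
    assert(caps != NULL && out != NULL);
    /* Local steps 1 and 2: dma_addr is the allocated region. */
    out->local_ok = try_set_trans(caps->local_can_set, &out->local_xlat,
                                  dma_addr);
    /* Local step 3 (sending dma_addr and the index) is not modelled. */
    /* Peer step 1: try to program the received address. */
    out->peer_ok = try_set_trans(caps->peer_can_set, &out->peer_xlat,
                                 dma_addr);
    return out->local_ok || out->peer_ok;
}
```

The point of the design is that neither side branches on the hardware type up front; each simply tolerates its own failure.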
In accordance with this scenario, the NTB Memory Window API can be used as
follows:
 Local device:
  1) ntb_mw_count(pidx) - retrieve the number of memory ranges which can
     be allocated for memory windows between the local device and the peer
     device of the port with the specified index.
  2) ntb_mw_get_align(pidx, midx) - retrieve parameters restricting the
     shared memory region alignment and size. Then memory can be properly
     allocated.
  3) Allocate a physically contiguous memory region in compliance with the
     restrictions retrieved in 2).
  4) ntb_mw_set_trans(pidx, midx) - try to set the translation address of
     the memory window with the specified index for the defined peer device
     (it may fail if setting the local translated address is not supported).
  5) Send the translated base address (usually together with the memory window
     number) to the peer device using, for instance, scratchpad or message
     registers.
 Peer device:
  1) ntb_peer_mw_set_trans(pidx, midx) - try to set the translated address,
     received from the other device (related to pidx), for the specified
     memory window. It may fail if the retrieved address, for instance,
     exceeds the maximum possible address or isn't properly aligned.
  2) ntb_peer_mw_get_addr(widx) - retrieve the MMIO address to map the memory
     window so as to have access to the shared memory.

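Local steps 2 and 3 and peer step 1 all come down to checking sizes and addresses against the retrieved restrictions. The sketch below assumes those restrictions are an address alignment, a size alignment and a maximum window size, plus a hypothetical `max_addr` limit standing in for the largest address the hardware can translate; `struct mw_align` and both helpers are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Restrictions of the kind retrieved in step 2 (hypothetical struct). */
struct mw_align {
    uint64_t addr_align; /* translated address must be a multiple of this */
    uint64_t size_align; /* window size must be a multiple of this */
    uint64_t size_max;   /* largest window size the hardware supports */
};

/* Local step 3: round the wanted size up to size_align; 0 means the
 * rounded size exceeds size_max, so no valid window can be allocated. */
static uint64_t mw_fit_size(const struct mw_align *a, uint64_t want)
{
    assert(a->size_align != 0 && want != 0);
    uint64_t size = (want + a->size_align - 1) / a->size_align * a->size_align;
    return size <= a->size_max ? size : 0;
}

/* Peer step 1: reject an address that is misaligned or whose window would
 * run past max_addr (a hypothetical limit on translatable addresses). */
static bool mw_addr_ok(const struct mw_align *a, uint64_t addr,
                       uint64_t size, uint64_t max_addr)
{
    assert(a->addr_align != 0);
    return addr % a->addr_align == 0 &&
           size != 0 && size <= a->size_max &&
           addr <= max_addr && size - 1 <= max_addr - addr;
}
```

The last comparison is written as `size - 1 <= max_addr - addr` rather than `addr + size - 1 <= max_addr` so that it cannot overflow near the top of the address range.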
It is also worth noting that ntb_mw_count(pidx) should return the
same value as ntb_peer_mw_count() on the peer with port index pidx.

NTB Transport Client (ntb\_transport) and NTB Netdev (ntb\_netdev)
------------------------------------------------------------------

The primary client for NTB is the Transport client, used in tandem with NTB
Netdev.  These drivers function together to create a logical link to the peer,
across the NTB, to exchange packets of network data.  The Transport client
establishes a logical link to the peer, and creates queue pairs to exchange
messages and data.  The NTB Netdev then creates an Ethernet device using a
Transport queue pair.  Network data is copied between socket buffers and the
Transport queue pair buffer.  The Transport client may be used for other things
besides Netdev; however, no other applications have yet been written.

NTB Ping Pong Test Client (ntb\_pingpong)
-----------------------------------------

The Ping Pong test client serves as a demonstration to exercise the doorbell
and scratchpad registers of NTB hardware, and as an example of a simple NTB
client.  Ping Pong enables the link when started, waits for the NTB link to
come up, and then proceeds to read and write the doorbell and scratchpad
registers of the NTB.  The peers interrupt each other using a bit mask of
doorbell bits, which is shifted by one in each round, to test the behavior of
multiple doorbell bits and interrupt vectors.  The Ping Pong driver also reads
the first local scratchpad, and writes the value plus one to the first peer
scratchpad, each round before writing the peer doorbell register.

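The round-to-round behaviour described above can be sketched as a small state machine. This is not the driver's code: it assumes the mask is shifted left, that bits outside the hardware's valid doorbell mask are dropped, and that a new series restarts from `init_db` once no valid bit remains; `struct pp_model` and its helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of Ping Pong's per-round state (not the driver). */
struct pp_model {
    uint64_t db_valid_mask; /* doorbell bits the hardware implements */
    uint64_t init_db;       /* first mask of every series (init_db param) */
    uint64_t db_bits;       /* mask to ring on the peer in the next round */
    uint32_t local_spad;    /* first local scratchpad */
    uint32_t peer_spad;     /* first peer scratchpad */
};

static void pp_model_init(struct pp_model *pp, uint64_t valid, uint64_t init)
{
    assert((init & valid) != 0); /* a series must start with a usable bit */
    pp->db_valid_mask = valid;
    pp->init_db = init & valid;
    pp->db_bits = pp->init_db;
    pp->local_spad = 0;
    pp->peer_spad = 0;
}

/* One round: write local scratchpad + 1 to the peer scratchpad, then
 * return the doorbell bits to ring and shift the mask for the next round. */
static uint64_t pp_model_round(struct pp_model *pp)
{
    uint64_t ring = pp->db_bits;

    pp->peer_spad = pp->local_spad + 1;
    pp->db_bits = (pp->db_bits << 1) & pp->db_valid_mask;
    if (pp->db_bits == 0)
        pp->db_bits = pp->init_db; /* everything shifted out: new series */
    return ring;
}
```

With four valid doorbell bits and a starting mask of `0x1`, the peer is rung with `0x1`, `0x2`, `0x4`, `0x8`, and then the series starts over at `0x1`.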
Module Parameters:

* unsafe - Some hardware has known issues with scratchpad and doorbell
	registers.  By default, Ping Pong will not attempt to exercise such
	hardware.  You may override this behavior at your own risk by setting
	unsafe=1.
* delay\_ms - Specify the delay between receiving a doorbell
	interrupt event and setting the peer doorbell register for the next
	round.
* init\_db - Specify the doorbell bits to start a new series of rounds.  A new
	series begins once all the doorbell bits have been shifted out of
	range.
* dyndbg - It is suggested to specify dyndbg=+p when loading this module, and
	then to observe debugging output on the console.

NTB Tool Test Client (ntb\_tool)
--------------------------------

The Tool test client serves primarily for debugging NTB hardware and drivers.
The Tool provides access through debugfs for reading, setting, and clearing the
NTB doorbell, and reading and writing scratchpads.

The Tool does not currently have any module parameters.

Debugfs Files:

* *debugfs*/ntb\_tool/*hw*/
	A directory in debugfs will be created for each
	NTB device probed by the tool.  This directory is shortened to *hw*
	below.
* *hw*/db
	This file is used to read, set, and clear the local doorbell.  Not
	all operations may be supported by all hardware.  To read the doorbell,
	read the file.  To set the doorbell, write `s` followed by the bits to
	set (e.g.: `echo 's 0x0101' > db`).  To clear the doorbell, write `c`
	followed by the bits to clear.
* *hw*/mask
	This file is used to read, set, and clear the local doorbell mask.
	See *db* for details.
* *hw*/peer\_db
	This file is used to read, set, and clear the peer doorbell.
	See *db* for details.
* *hw*/peer\_mask
	This file is used to read, set, and clear the peer doorbell
	mask.  See *db* for details.
* *hw*/spad
	This file is used to read and write local scratchpads.  To read
	the values of all scratchpads, read the file.  To write values, write a
	series of pairs of scratchpad number and value
	(e.g.: `echo '4 0x123 7 0xabc' > spad` sets scratchpads `4` and `7`
	to `0x123` and `0xabc`, respectively).
* *hw*/peer\_spad
	This file is used to read and write peer scratchpads.  See
	*spad* for details.

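The write formats of the *db* and *spad* files can be illustrated with two small parsers. They are hypothetical models of the documented syntax that operate on plain variables instead of hardware, not the ntb_tool source; numbers are parsed with C's base prefixes, so `0x` means hexadecimal.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical model of a write to hw/db (and to mask, peer_db and
 * peer_mask): "s <bits>" sets bits, "c <bits>" clears them.
 * Returns 0 or -EINVAL. */
static int apply_db_cmd(const char *cmd, uint64_t *db)
{
    char op;
    long long bits;

    assert(cmd != NULL && db != NULL);
    if (sscanf(cmd, " %c %lli", &op, &bits) != 2 || bits < 0)
        return -EINVAL;
    if (op == 's')
        *db |= (uint64_t)bits;  /* set the given bits */
    else if (op == 'c')
        *db &= ~(uint64_t)bits; /* clear the given bits */
    else
        return -EINVAL;
    return 0;
}

/* Hypothetical model of a write to hw/spad: "<idx> <val> <idx> <val> ...".
 * Pairs are applied left to right, so an error leaves earlier pairs written. */
static int apply_spad_cmd(const char *cmd, uint32_t *spad, size_t count)
{
    const char *p = cmd;
    char *end;

    assert(cmd != NULL && spad != NULL);
    for (;;) {
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            return 0; /* every pair consumed */

        unsigned long idx = strtoul(p, &end, 0);
        if (end == p || idx >= count)
            return -EINVAL; /* missing or out-of-range index */
        p = end;

        errno = 0;
        unsigned long val = strtoul(p, &end, 0);
        if (end == p || errno != 0 || val > UINT32_MAX)
            return -EINVAL; /* missing, malformed or too-large value */
        spad[idx] = (uint32_t)val;
        p = end;
    }
}
```

For example, applying `s 0x0101` and then `c 0x0001` to a zero doorbell leaves `0x0100`, and `4 0x123 7 0xabc` fills scratchpads 4 and 7 as in the `echo` example.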
NTB Hardware Drivers
====================

NTB hardware drivers should register devices with the NTB core driver.  After
registering, the clients' probe and remove functions will be called.

NTB Intel Hardware Driver (ntb\_hw\_intel)
------------------------------------------

The Intel hardware driver supports NTB on Xeon and Atom CPUs.

Module Parameters:

* b2b\_mw\_idx
	If the peer NTB is to be accessed via a memory window, then use
	this memory window to access the peer NTB.  A zero or positive value
	counts from the first mw idx, and a negative value counts back from the
	last mw idx.  Both sides MUST set the same value here!  The default value
	is `-1`.
* b2b\_mw\_share
	If the peer NTB is to be accessed via a memory window, and if
	the memory window is large enough, still allow the client to use the
	second half of the memory window for address translation to the peer.
* xeon\_b2b\_usd\_bar2\_addr64
	If using B2B topology on Xeon hardware, use
	this 64-bit address on the bus between the NTB devices for the window
	at BAR2, on the upstream side of the link.
* xeon\_b2b\_usd\_bar4\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_usd\_bar4\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_usd\_bar5\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar2\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar4\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar4\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar5\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
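The way `b2b_mw_idx` and `b2b_mw_share` interact can be sketched as follows. This is a hypothetical model, not the ntb_hw_intel source: `min_b2b_size` stands in for whatever size the driver considers "large enough", and withdrawing a window used whole for B2B from clients is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome of interpreting b2b_mw_idx / b2b_mw_share. */
struct b2b_choice {
    int mw_idx;          /* window used to reach the peer NTB */
    bool shared;         /* second half of that window left for clients */
    uint64_t client_off; /* start of the client-usable part when shared */
    int client_mw_count; /* windows still offered to clients */
};

/* Returns 0 on success, -1 if b2b_mw_idx does not name a window.
 * min_b2b_size is an assumed "large enough" threshold. */
static int b2b_pick(int b2b_mw_idx, bool b2b_mw_share, int mw_count,
                    uint64_t mw_size, uint64_t min_b2b_size,
                    struct b2b_choice *out)
{
    assert(out != NULL && mw_count > 0);
    /* Non-negative counts from the first window, negative from the last. */
    int idx = b2b_mw_idx < 0 ? mw_count + b2b_mw_idx : b2b_mw_idx;

    if (idx < 0 || idx >= mw_count)
        return -1;
    out->mw_idx = idx;
    out->shared = b2b_mw_share && mw_size / 2 >= min_b2b_size;
    out->client_off = out->shared ? mw_size / 2 : 0;
    /* Assumption: a window used whole for B2B is withdrawn from clients. */
    out->client_mw_count = out->shared ? mw_count : mw_count - 1;
    return 0;
}
```

With three windows, the default `b2b_mw_idx=-1` selects window 2, which is why both sides must agree on the value: each side must reserve the same window.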