Documentation / virtual / kvm / vcpu-requests.rst


Based on kernel version 4.16.1. Page generated on 2018-04-09 11:53 EST.

=================
KVM VCPU Requests
=================

Overview
========

KVM supports an internal API enabling threads to request a VCPU thread to
perform some activity.  For example, a thread may request a VCPU to flush
its TLB with a VCPU request.  The API consists of the following functions::

  /* Check if any requests are pending for VCPU @vcpu. */
  bool kvm_request_pending(struct kvm_vcpu *vcpu);

  /* Check if VCPU @vcpu has request @req pending. */
  bool kvm_test_request(int req, struct kvm_vcpu *vcpu);

  /* Clear request @req for VCPU @vcpu. */
  void kvm_clear_request(int req, struct kvm_vcpu *vcpu);

  /*
   * Check if VCPU @vcpu has request @req pending. When the request is
   * pending it will be cleared and a memory barrier, which pairs with
   * another in kvm_make_request(), will be issued.
   */
  bool kvm_check_request(int req, struct kvm_vcpu *vcpu);

  /*
   * Make request @req of VCPU @vcpu. Issues a memory barrier, which pairs
   * with another in kvm_check_request(), prior to setting the request.
   */
  void kvm_make_request(int req, struct kvm_vcpu *vcpu);

  /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
  bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);

Typically a requester wants the VCPU to perform the activity as soon
as possible after making the request.  This means most requests
(kvm_make_request() calls) are followed by a call to kvm_vcpu_kick(),
and kvm_make_all_cpus_request() has the kicking of all VCPUs built
into it.

VCPU Kicks
----------

The goal of a VCPU kick is to bring a VCPU thread out of guest mode in
order to perform some KVM maintenance.  To do so, an IPI is sent, forcing
a guest mode exit.  However, a VCPU thread may not be in guest mode at the
time of the kick.  Therefore, depending on the mode and state of the VCPU
thread, there are two other actions a kick may take.  All three actions
are listed below:

1) Send an IPI.  This forces a guest mode exit.
2) Wake a sleeping VCPU.  Sleeping VCPUs are VCPU threads outside guest
   mode that wait on waitqueues.  Waking them removes the threads from
   the waitqueues, allowing the threads to run again.  This behavior
   may be suppressed; see KVM_REQUEST_NO_WAKEUP below.
3) Do nothing.  When the VCPU is not in guest mode and the VCPU thread is
   not sleeping, there is nothing to do.
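
The three actions above can be summarized as a small decision function.
The following is only an illustrative userspace sketch: the enum, the
function name, and the two boolean inputs are hypothetical and do not
correspond to kernel identifiers.  It assumes that a sleeping VCPU thread
is never in guest mode, as described in the list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for this sketch; they are not kernel identifiers. */
enum kick_action {
    KICK_SEND_IPI,  /* VCPU is in guest mode: force a guest mode exit */
    KICK_WAKE_UP,   /* VCPU thread sleeps on a waitqueue: wake it */
    KICK_NOTHING,   /* VCPU thread is already running host code */
};

/* Pick the action a kick would take for the given VCPU state. */
enum kick_action choose_kick_action(bool in_guest_mode, bool sleeping)
{
    if (sleeping)
        return KICK_WAKE_UP;
    if (in_guest_mode)
        return KICK_SEND_IPI;
    return KICK_NOTHING;
}

/* Usage example: one call per action from the list. */
void choose_kick_action_example(void)
{
    assert(choose_kick_action(true, false) == KICK_SEND_IPI);
    assert(choose_kick_action(false, true) == KICK_WAKE_UP);
    assert(choose_kick_action(false, false) == KICK_NOTHING);
}
```

The sketch ignores the KVM_REQUEST_NO_WAKEUP suppression mentioned in
item 2.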

VCPU Mode
---------

VCPUs have a mode state, ``vcpu->mode``, that is used to track whether the
VCPU is running in guest mode or not, as well as some specific
outside guest mode states.  The architecture may use ``vcpu->mode`` to
ensure VCPU requests are seen by VCPUs (see "Ensuring Requests Are Seen"),
as well as to avoid sending unnecessary IPIs (see "IPI Reduction"), and
even to ensure IPI acknowledgements are waited upon (see "Waiting for
Acknowledgements").  The following modes are defined:

OUTSIDE_GUEST_MODE

  The VCPU thread is outside guest mode.

IN_GUEST_MODE

  The VCPU thread is in guest mode.

EXITING_GUEST_MODE

  The VCPU thread is transitioning from IN_GUEST_MODE to
  OUTSIDE_GUEST_MODE.

READING_SHADOW_PAGE_TABLES

  The VCPU thread is outside guest mode, but it wants the sender of
  certain VCPU requests, namely KVM_REQ_TLB_FLUSH, to wait until the VCPU
  thread is done reading the page tables.

VCPU Request Internals
======================

VCPU requests are simply bit indices of the ``vcpu->requests`` bitmap.
This means general bitops, like those documented in [atomic-ops]_, could
also be used, e.g. ::

  clear_bit(KVM_REQ_UNHALT & KVM_REQUEST_MASK, &vcpu->requests);

However, VCPU request users should refrain from doing so, as it would
break the abstraction.  The first 8 bits are reserved for architecture
independent requests; all additional bits are available for architecture
dependent requests.
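
To make the "requests are bit indices" idea concrete, here is a minimal
userspace model of the request API built on C11 atomics.  It is a sketch
under stated assumptions, not the kernel implementation: ``struct
demo_vcpu`` and every ``demo_*`` name are hypothetical, the bitmap is a
single ``unsigned long``, and release/acquire orderings stand in for the
kernel's memory barriers.  Unlike the description of kvm_check_request()
above, the model tests and clears the bit in one atomic operation.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for a VCPU; only the request bitmap is modeled. */
struct demo_vcpu {
    atomic_ulong requests;
};

/* Assumption: the low 8 bits of a request value hold the bit index. */
#define DEMO_REQUEST_MASK 0xffu

/* Convert a request value (number plus optional flags) into a bit mask. */
static unsigned long demo_bit(unsigned int req)
{
    unsigned int nr = req & DEMO_REQUEST_MASK;

    assert(nr < sizeof(unsigned long) * CHAR_BIT);
    return 1UL << nr;
}

/* Set the request bit; release ordering publishes earlier writes first. */
void demo_make_request(unsigned int req, struct demo_vcpu *vcpu)
{
    atomic_fetch_or_explicit(&vcpu->requests, demo_bit(req),
                             memory_order_release);
}

/* True when any request bit is set. */
bool demo_request_pending(struct demo_vcpu *vcpu)
{
    return atomic_load_explicit(&vcpu->requests,
                                memory_order_relaxed) != 0;
}

/* True when the given request bit is set; the bit is left untouched. */
bool demo_test_request(unsigned int req, struct demo_vcpu *vcpu)
{
    unsigned long cur = atomic_load_explicit(&vcpu->requests,
                                             memory_order_relaxed);

    return (cur & demo_bit(req)) != 0;
}

/* Clear the request bit and report whether it had been set. */
bool demo_check_request(unsigned int req, struct demo_vcpu *vcpu)
{
    unsigned long bit = demo_bit(req);
    unsigned long old = atomic_fetch_and_explicit(&vcpu->requests, ~bit,
                                                  memory_order_acquire);

    return (old & bit) != 0;
}
```

Callers pass whole request values; the mask strips any flag bits before
the value is used as a bit index, which is the same reason the
``clear_bit()`` example above applies KVM_REQUEST_MASK.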

Architecture Independent Requests
---------------------------------

KVM_REQ_TLB_FLUSH

  KVM's common MMU notifier may need to flush all of a guest's TLB
  entries, calling kvm_flush_remote_tlbs() to do so.  Architectures that
  choose to use the common kvm_flush_remote_tlbs() implementation will
  need to handle this VCPU request.

KVM_REQ_MMU_RELOAD

  When shadow page tables are used and memory slots are removed, it's
  necessary to inform each VCPU to completely refresh the tables.  This
  request is used for that.

KVM_REQ_PENDING_TIMER

  This request may be made from a timer handler run on the host on behalf
  of a VCPU.  It informs the VCPU thread to inject a timer interrupt.

KVM_REQ_UNHALT

  This request may be made from the KVM common function kvm_vcpu_block(),
  which is used to emulate an instruction that causes a CPU to halt until
  one of an architecture-specific set of events and/or interrupts is
  received (determined by checking kvm_arch_vcpu_runnable()).  When that
  event or interrupt arrives, kvm_vcpu_block() makes the request.  This is
  in contrast to when kvm_vcpu_block() returns due to any other reason,
  such as a pending signal, which does not indicate the VCPU's halt
  emulation should stop, and therefore does not make the request.

KVM_REQUEST_MASK
----------------

VCPU requests should be masked by KVM_REQUEST_MASK before using them with
bitops.  This is because only the lower 8 bits are used to represent the
request's number.  The upper bits are used as flags.  Currently only two
flags are defined.

VCPU Request Flags
------------------

KVM_REQUEST_NO_WAKEUP

  This flag is applied to requests that only need immediate attention
  from VCPUs running in guest mode.  That is, sleeping VCPUs do not need
  to be awakened for these requests.  Sleeping VCPUs will handle the
  requests when they are awakened later for some other reason.

KVM_REQUEST_WAIT

  When requests with this flag are made with kvm_make_all_cpus_request(),
  then the caller will wait for each VCPU to acknowledge its IPI before
  proceeding.  This flag only applies to VCPUs that would receive IPIs.
  If, for example, the VCPU is sleeping, so no IPI is necessary, then
  the requesting thread does not wait.  This means that this flag may be
  safely combined with KVM_REQUEST_NO_WAKEUP.  See "Waiting for
  Acknowledgements" for more information about requests with
  KVM_REQUEST_WAIT.
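
The split between a request's number and its flags can be illustrated
with a few macros.  This is a sketch only: all ``DEMO_*`` names are
hypothetical, and the flag bit positions (8 and 9) are an assumption made
for the example; the text above only states that the lower 8 bits hold
the number and the upper bits hold flags.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_REQUEST_MASK      0xffu      /* low 8 bits: request number */
#define DEMO_REQUEST_NO_WAKEUP (1u << 8)  /* assumed flag position */
#define DEMO_REQUEST_WAIT      (1u << 9)  /* assumed flag position */

/* A hypothetical request: number 0 with both flags set. */
#define DEMO_REQ_FLUSH (0u | DEMO_REQUEST_WAIT | DEMO_REQUEST_NO_WAKEUP)
/* A hypothetical request without flags. */
#define DEMO_REQ_TIMER 2u

/* Bit index to use with the requests bitmap. */
unsigned int demo_request_nr(unsigned int req)
{
    return req & DEMO_REQUEST_MASK;
}

/* True when a kick for this request should wake a sleeping VCPU. */
bool demo_request_wakes_sleepers(unsigned int req)
{
    return (req & DEMO_REQUEST_NO_WAKEUP) == 0;
}

/* True when the requester waits for IPI acknowledgements. */
bool demo_request_waits(unsigned int req)
{
    return (req & DEMO_REQUEST_WAIT) != 0;
}

/* Usage example: decode the two hypothetical requests. */
void demo_request_flags_example(void)
{
    assert(demo_request_nr(DEMO_REQ_FLUSH) == 0u);
    assert(!demo_request_wakes_sleepers(DEMO_REQ_FLUSH));
    assert(demo_request_waits(DEMO_REQ_FLUSH));
    assert(demo_request_nr(DEMO_REQ_TIMER) == 2u);
    assert(demo_request_wakes_sleepers(DEMO_REQ_TIMER));
}
```

Because the flags live above the mask, combining both flags in one
request does not change which bit of the bitmap the request occupies.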

VCPU Requests with Associated State
===================================

Requesters that want the receiving VCPU to handle new state need to ensure
the newly written state is observable to the receiving VCPU thread's CPU
by the time it observes the request.  This means a write memory barrier
must be inserted after writing the new state and before setting the VCPU
request bit.  Additionally, on the receiving VCPU thread's side, a
corresponding read barrier must be inserted after reading the request bit
and before proceeding to read the new state associated with it.  See
scenario 3, Message and Flag, of [lwn-mb]_ and the kernel documentation
[memory-barriers]_.

The pair of functions, kvm_check_request() and kvm_make_request(), provides
the memory barriers, allowing this requirement to be handled internally by
the API.
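
The ordering requirement can be sketched in userspace with C11 fences.
This is an illustration under stated assumptions, not kernel code: the
``demo_*`` names are hypothetical, a single boolean stands in for the
request bit, and release/acquire fences stand in for the write and read
barriers described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical VCPU model: one payload field and one request bit. */
struct demo_vcpu {
    int new_state;        /* state associated with the request */
    atomic_bool request;  /* stands in for the VCPU request bit */
};

/* Requester: write the state, then the barrier, then set the bit. */
void demo_post(struct demo_vcpu *vcpu, int state)
{
    assert(vcpu);
    vcpu->new_state = state;
    atomic_thread_fence(memory_order_release);  /* write barrier */
    atomic_store_explicit(&vcpu->request, true, memory_order_relaxed);
}

/*
 * Receiver: read (and clear) the bit, then the barrier, then read the
 * state.  Returns false when no request was pending.
 */
bool demo_handle(struct demo_vcpu *vcpu, int *state_out)
{
    assert(vcpu && state_out);
    if (!atomic_exchange_explicit(&vcpu->request, false,
                                  memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);  /* read barrier */
    *state_out = vcpu->new_state;
    return true;
}
```

If either fence were dropped, a receiver running on another CPU could in
principle observe the request while still reading stale state, which is
exactly the case the paired barriers exclude.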

Ensuring Requests Are Seen
==========================

When making requests to VCPUs, we want to avoid the receiving VCPU
executing in guest mode for an arbitrarily long time without handling the
request.  We can be sure this won't happen as long as we ensure the VCPU
thread checks kvm_request_pending() before entering guest mode and that a
kick will send an IPI to force an exit from guest mode when necessary.
Extra care must be taken to cover the period after the VCPU thread's last
kvm_request_pending() check and before it has entered guest mode, as kick
IPIs will only trigger guest mode exits for VCPU threads that are in guest
mode or at least have already disabled interrupts in order to prepare to
enter guest mode.  This means that an optimized implementation (see "IPI
Reduction") must be certain when it's safe to not send the IPI.  One
solution, which all architectures except s390 apply, is to:

- set ``vcpu->mode`` to IN_GUEST_MODE between disabling the interrupts and
  the last kvm_request_pending() check;
- enable interrupts atomically when entering the guest.

This solution also requires memory barriers to be placed carefully in both
the requesting thread and the receiving VCPU.  With the memory barriers we
can exclude the possibility of a VCPU thread observing
!kvm_request_pending() on its last check and then not receiving an IPI for
the next request made of it, even if the request is made immediately after
the check.  This is done by way of the Dekker memory barrier pattern
(scenario 10 of [lwn-mb]_).  As the Dekker pattern requires two variables,
this solution pairs ``vcpu->mode`` with ``vcpu->requests``.  Substituting
them into the pattern gives::

  CPU1                                    CPU2
  =================                       =================
  local_irq_disable();
  WRITE_ONCE(vcpu->mode, IN_GUEST_MODE);  kvm_make_request(REQ, vcpu);
  smp_mb();                               smp_mb();
  if (kvm_request_pending(vcpu)) {        if (READ_ONCE(vcpu->mode) ==
                                              IN_GUEST_MODE) {
      ...abort guest entry...                 ...send IPI...
  }                                       }

As stated above, the IPI is only useful for VCPU threads in guest mode or
that have already disabled interrupts.  This is why this specific case of
the Dekker pattern has been extended to disable interrupts before setting
``vcpu->mode`` to IN_GUEST_MODE.  WRITE_ONCE() and READ_ONCE() are used to
pedantically implement the memory barrier pattern, guaranteeing the
compiler doesn't interfere with ``vcpu->mode``'s carefully planned
accesses.
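
The two columns of the pattern can be modeled in userspace as two
functions.  The following is a sketch under stated assumptions rather
than kernel code: the ``demo_*`` names are hypothetical, sequentially
consistent fences stand in for ``smp_mb()``, relaxed atomic accesses
stand in for WRITE_ONCE() and READ_ONCE(), and interrupt disabling is
only noted in a comment because it has no userspace equivalent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum demo_mode { DEMO_OUTSIDE_GUEST_MODE, DEMO_IN_GUEST_MODE };

/* Hypothetical VCPU model holding the two Dekker variables. */
struct demo_vcpu {
    atomic_int mode;
    atomic_ulong requests;
};

/* CPU1 column: returns true when guest entry must be aborted. */
bool demo_vcpu_prepare_entry(struct demo_vcpu *vcpu)
{
    /* The kernel would call local_irq_disable() before this point. */
    atomic_store_explicit(&vcpu->mode, DEMO_IN_GUEST_MODE,
                          memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* smp_mb() */
    if (atomic_load_explicit(&vcpu->requests,
                             memory_order_relaxed) != 0) {
        /* A request is pending: leave guest-entry state again. */
        atomic_store_explicit(&vcpu->mode, DEMO_OUTSIDE_GUEST_MODE,
                              memory_order_relaxed);
        return true;
    }
    return false;
}

/* CPU2 column: returns true when an IPI must be sent. */
bool demo_request_and_check(struct demo_vcpu *vcpu, unsigned int nr)
{
    assert(nr < 8);  /* keep the shift well inside the bitmap */
    atomic_fetch_or_explicit(&vcpu->requests, 1UL << nr,
                             memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* smp_mb() */
    return atomic_load_explicit(&vcpu->mode, memory_order_relaxed) ==
           DEMO_IN_GUEST_MODE;
}
```

With both fences in place, at least one side must observe the other's
write: either the VCPU sees the pending request and aborts entry, or the
requester sees IN_GUEST_MODE and sends the IPI.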

IPI Reduction
-------------

Because only one IPI is needed to get a VCPU to check for any/all
requests, IPIs may be coalesced.  This is easily done by having the first
IPI-sending kick also change the VCPU mode to something !IN_GUEST_MODE.
The transitional state, EXITING_GUEST_MODE, is used for this purpose.
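
One way to picture the coalescing is an atomic compare-and-exchange on
the mode: only the kick that performs the IN_GUEST_MODE to
EXITING_GUEST_MODE transition sends an IPI.  The sketch below is a
userspace model with hypothetical ``demo_*`` names; it assumes the
transition is done with a single atomic compare-and-exchange.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum demo_mode {
    DEMO_OUTSIDE_GUEST_MODE,
    DEMO_IN_GUEST_MODE,
    DEMO_EXITING_GUEST_MODE,
};

/*
 * Try to move the mode from IN_GUEST_MODE to EXITING_GUEST_MODE and
 * return the mode that was observed before the attempt.
 */
int demo_exiting_guest_mode(atomic_int *mode)
{
    int expected = DEMO_IN_GUEST_MODE;

    assert(mode);
    /* On failure, expected is overwritten with the current mode. */
    atomic_compare_exchange_strong(mode, &expected,
                                   DEMO_EXITING_GUEST_MODE);
    return expected;
}

/* Only the kick that performed the transition needs to send an IPI. */
bool demo_kick_needs_ipi(atomic_int *mode)
{
    return demo_exiting_guest_mode(mode) == DEMO_IN_GUEST_MODE;
}
```

A second kick that arrives while the mode is already EXITING_GUEST_MODE
sees a mode other than IN_GUEST_MODE and therefore skips the IPI.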

Waiting for Acknowledgements
----------------------------

Some requests, those with the KVM_REQUEST_WAIT flag set, require IPIs to
be sent, and the acknowledgements to be waited upon, even when the target
VCPU threads are in modes other than IN_GUEST_MODE.  For example, one case
is when a target VCPU thread is in READING_SHADOW_PAGE_TABLES mode, which
is set after disabling interrupts.  To support these cases, the
KVM_REQUEST_WAIT flag changes the condition for sending an IPI from
checking that the VCPU is IN_GUEST_MODE to checking that it is not
OUTSIDE_GUEST_MODE.
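
The changed condition can be written as a small predicate.  This is a
sketch with hypothetical ``DEMO_*`` and ``demo_*`` names; the numeric
value chosen for the wait flag is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_mode {
    DEMO_OUTSIDE_GUEST_MODE,
    DEMO_IN_GUEST_MODE,
    DEMO_EXITING_GUEST_MODE,
    DEMO_READING_SHADOW_PAGE_TABLES,
};

#define DEMO_REQUEST_WAIT (1u << 9)  /* assumed flag position */

/* Decide whether a request needs an IPI, given the observed mode. */
bool demo_request_needs_ipi(enum demo_mode mode, unsigned int req)
{
    if (req & DEMO_REQUEST_WAIT)
        return mode != DEMO_OUTSIDE_GUEST_MODE;
    return mode == DEMO_IN_GUEST_MODE;
}

/* Usage example: a VCPU that is reading shadow page tables. */
void demo_request_needs_ipi_example(void)
{
    /* With the wait flag, the IPI is sent and acknowledged. */
    assert(demo_request_needs_ipi(DEMO_READING_SHADOW_PAGE_TABLES,
                                  3u | DEMO_REQUEST_WAIT));
    /* Without it, no IPI is needed outside IN_GUEST_MODE. */
    assert(!demo_request_needs_ipi(DEMO_READING_SHADOW_PAGE_TABLES, 3u));
}
```

A VCPU that is in OUTSIDE_GUEST_MODE never receives an IPI under either
branch, which matches the statement that the flag only applies to VCPUs
that would receive IPIs.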

Request-less VCPU Kicks
-----------------------

Because the determination of whether or not to send an IPI depends on the
two-variable Dekker memory barrier pattern, request-less VCPU kicks are
almost never correct.  Without the assurance that a non-IPI generating
kick will still result in an action by the receiving VCPU, as the final
kvm_request_pending() check does for request-accompanying kicks, the kick
may not do anything useful at all.  If, for instance, a request-less kick
was made to a VCPU that was just about to set its mode to IN_GUEST_MODE,
meaning no IPI is sent, then the VCPU thread may continue its entry
without actually having done whatever it was the kick was meant to
initiate.

One exception is x86's posted interrupt mechanism.  In this case, however,
even the request-less VCPU kick is coupled with the same
local_irq_disable() + smp_mb() pattern described above; the ON bit
(Outstanding Notification) in the posted interrupt descriptor takes the
role of ``vcpu->requests``.  When sending a posted interrupt, PIR.ON is
set before reading ``vcpu->mode``; dually, in the VCPU thread,
vmx_sync_pir_to_irr() reads PIR after setting ``vcpu->mode`` to
IN_GUEST_MODE.

Additional Considerations
=========================

Sleeping VCPUs
--------------

VCPU threads may need to consider requests before and/or after calling
functions that may put them to sleep, e.g. kvm_vcpu_block().  Whether they
do or not, and, if they do, which requests need consideration, is
architecture dependent.  kvm_vcpu_block() calls kvm_arch_vcpu_runnable()
to check if it should awaken.  One reason to do so is to provide
architectures a function where requests may be checked if necessary.

Clearing Requests
-----------------

Generally it only makes sense for the receiving VCPU thread to clear a
request.  However, in some circumstances the requesting thread and the
receiving VCPU thread are executed serially, for example when they are
the same thread, or when they are using some form of concurrency control
to temporarily execute synchronously.  In those cases it's possible to
know that the request may be cleared immediately, rather than waiting for
the receiving VCPU thread to handle the request in VCPU RUN.  The only
current examples of this are kvm_vcpu_block() calls made by VCPUs to block
themselves.  A possible side-effect of that call is to make the
KVM_REQ_UNHALT request, which may then be cleared immediately when the
VCPU returns from the call.

References
==========

.. [atomic-ops] Documentation/core-api/atomic_ops.rst
.. [memory-barriers] Documentation/memory-barriers.txt
.. [lwn-mb] https://lwn.net/Articles/573436/