Based on kernel version 4.16.1. Page generated on 2018-04-09 11:53 EST.

Nested VMX
==========

Overview
---------

On Intel processors, KVM uses Intel's VMX (Virtual-Machine eXtensions)
to easily and efficiently run guest operating systems. Normally, these guests
*cannot* themselves be hypervisors running their own guests, because in VMX,
guests cannot use VMX instructions.

The "Nested VMX" feature adds this missing capability: running guest
hypervisors (which use VMX) with their own nested guests. It does so by
allowing a guest to use VMX instructions, and correctly and efficiently
emulating them using the single level of VMX available in the hardware.
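
To make the emulation idea more concrete: while L2 runs, every VM exit is
delivered to L0, which must decide whether the exit belongs to L1 (because L1
asked to intercept that event in its vmcs12) or whether L0 handles it itself.
The following is a simplified sketch of that decision, not KVM's code. The
struct l1_controls and the function reflect_to_l1 are hypothetical, only a
handful of exit reasons are covered, and details such as NMIs and page-fault
error-code filtering are ignored. The exit-reason numbers and the HLT-exiting
bit are taken from the Intel SDM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Basic exit reasons, numbered as in the Intel SDM (Vol. 3, Appendix C). */
enum exit_reason {
	EXIT_REASON_EXCEPTION_NMI = 0,
	EXIT_REASON_EXTERNAL_INTERRUPT = 1,
	EXIT_REASON_CPUID = 10,
	EXIT_REASON_HLT = 12,
	EXIT_REASON_VMCLEAR = 19,
	EXIT_REASON_VMLAUNCH = 20,
	EXIT_REASON_VMPTRLD = 21,
	EXIT_REASON_VMREAD = 23,
	EXIT_REASON_VMRESUME = 24,
	EXIT_REASON_VMWRITE = 25,
};

/* "HLT exiting": bit 7 of the primary processor-based VM-execution controls. */
#define CPU_BASED_HLT_EXITING 0x00000080u

/* Hypothetical subset of the controls that L1 wrote into vmcs12. */
struct l1_controls {
	uint32_t cpu_based_vm_exec_control;
	uint32_t exception_bitmap;
};

/*
 * Hypothetical helper called by L0 for an exit that happened while L2 was
 * running. Returns true if the exit must be reflected to L1, false if L0
 * handles it itself. 'vector' is used only for exception exits.
 */
bool reflect_to_l1(enum exit_reason reason, unsigned vector,
		   const struct l1_controls *c)
{
	switch (reason) {
	case EXIT_REASON_CPUID:
		/* CPUID exits unconditionally, so L1 always expects it. */
		return true;
	case EXIT_REASON_VMCLEAR:
	case EXIT_REASON_VMLAUNCH:
	case EXIT_REASON_VMPTRLD:
	case EXIT_REASON_VMREAD:
	case EXIT_REASON_VMRESUME:
	case EXIT_REASON_VMWRITE:
		/* VMX instructions executed by L2 are for L1 to handle. */
		return true;
	case EXIT_REASON_HLT:
		/* Reflect only if L1 enabled HLT exiting in vmcs12. */
		return (c->cpu_based_vm_exec_control & CPU_BASED_HLT_EXITING) != 0;
	case EXIT_REASON_EXCEPTION_NMI:
		/* Simplification: consult only L1's exception bitmap. */
		assert(vector < 32);
		return ((c->exception_bitmap >> vector) & 1u) != 0;
	case EXIT_REASON_EXTERNAL_INTERRUPT:
		/* Simplification: physical interrupts are handled by L0. */
		return false;
	}
	return false;
}
```

In this sketch, exits that are not reflected are handled by L0 alone, which
avoids an expensive transition through L1.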

We describe the theory behind the nested VMX feature, its implementation, and
its performance characteristics in much greater detail in the OSDI 2010 paper
"The Turtles Project: Design and Implementation of Nested Virtualization",
available at:

	http://www.usenix.org/events/osdi10/tech/full_papers/Ben-Yehuda.pdf


Terminology
-----------

Single-level virtualization has two levels: the host (KVM) and the guests.
In nested virtualization, we have three levels: the host (KVM), which we call
L0; the guest hypervisor, which we call L1; and its nested guest, which we
call L2.


Known limitations
-----------------

The current code supports running Linux guests under KVM guests.
Only 64-bit guest hypervisors are supported.

Additional patches for running Windows under guest KVM and Linux under
guest VMware Server, as well as support for nested EPT, are currently running
in the lab and will be sent as follow-on patchsets.


Running nested VMX
------------------

The nested VMX feature is disabled by default. It can be enabled by giving
the "nested=1" option to the kvm-intel module.
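
The option can be passed when the module is loaded, for example with
"modprobe kvm-intel nested=1". The parameter's current value can typically be
read back from /sys/module/kvm_intel/parameters/nested. The sketch below reads
that file; the function names are hypothetical, and it accepts both the "Y"/"N"
form printed for boolean module parameters and the "1"/"0" form printed for
integer ones, since which form appears is an assumption that may vary between
kernel versions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Sysfs location of kvm_intel's "nested" module parameter. */
#define NESTED_PARAM_PATH "/sys/module/kvm_intel/parameters/nested"

/* Hypothetical helper: accepts "Y"/"N" (boolean) and "1"/"0" (integer). */
bool nested_value_enabled(const char *s)
{
	assert(s != NULL);
	return s[0] == 'Y' || s[0] == 'y' || s[0] == '1';
}

/*
 * Hypothetical helper: returns 1 if nested VMX is enabled, 0 if it is
 * disabled, and -1 if the parameter could not be read (for example,
 * because kvm_intel is not loaded).
 */
int nested_vmx_enabled(void)
{
	char buf[8] = "";
	FILE *f = fopen(NESTED_PARAM_PATH, "r");

	if (f == NULL)
		return -1;
	if (fgets(buf, sizeof(buf), f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return nested_value_enabled(buf) ? 1 : 0;
}
```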

No modifications are required to user space (qemu). However, qemu's default
emulated CPU type (qemu64) does not list the "VMX" CPU feature, so it must be
explicitly enabled by giving qemu one of the following options:

     -cpu host              (emulated CPU has all features of the real CPU)

     -cpu qemu64,+vmx       (add just the vmx feature to a named CPU type)
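
Once L1 is running with one of these options, one way to confirm from inside
the guest that VMX is actually visible is to check CPUID leaf 1, where bit 5
of ECX reports VMX support (the "vmx" flag in /proc/cpuinfo reflects the same
bit). This is a minimal sketch, assuming GCC or Clang, whose <cpuid.h> provides
__get_cpuid; the function names are hypothetical, and on non-x86 builds the
query simply reports -1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID.01H:ECX bit 5 reports VMX support (Intel SDM, Vol. 2A). */
#define CPUID_1_ECX_VMX (1u << 5)

/* Hypothetical helper: tests the VMX bit in a CPUID.01H:ECX value. */
bool ecx_has_vmx(uint32_t ecx)
{
	return (ecx & CPUID_1_ECX_VMX) != 0;
}

/* Hypothetical helper: 1 = VMX visible, 0 = not visible, -1 = cannot query. */
int guest_sees_vmx(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid returns 0 if the requested leaf is not supported. */
	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return -1;
	return ecx_has_vmx(ecx) ? 1 : 0;
#else
	return -1;
#endif
}
```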


ABIs
----

Nested VMX aims to present a standard and (eventually) fully-functional VMX
implementation for a guest hypervisor to use. As such, the official
specification of the ABI that it provides is Intel's VMX specification,
namely volume 3B of their "Intel 64 and IA-32 Architectures Software
Developer's Manual". Not all of VMX's features are currently fully supported,
but the goal is to eventually support them all, starting with the VMX features
which are used in practice by popular hypervisors (KVM and others).

As a VMX implementation, nested VMX presents a VMCS structure to L1.
As mandated by the spec, other than the two fields revision_id and abort,
this structure is *opaque* to its user, who is not supposed to know or care
about its internal structure. Rather, the structure is accessed through the
VMREAD and VMWRITE instructions.
Still, for debugging purposes, KVM developers might be interested in knowing
the internals of this structure; this is struct vmcs12 from arch/x86/kvm/vmx.c.
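
Because the structure is opaque, L1 names a field only by its VMCS field
encoding, and L0 is free to translate that encoding into any location inside
vmcs12. The following sketch shows the idea on a hypothetical, heavily trimmed
mini_vmcs12 with hypothetical emulated_vmread/emulated_vmwrite helpers. The
encodings themselves are real ones from the Intel SDM, where bits 14:13 of an
encoding give the field width (0 = 16-bit, 1 = 64-bit, 2 = 32-bit, 3 = natural
width). It assumes a 64-bit L1, so natural-width fields are 8 bytes, and it
ignores details a real implementation must handle, such as read-only
exit-information fields and the high-half access type of 64-bit fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A few real VMCS field encodings (Intel SDM, Vol. 3, Appendix B). */
#define GUEST_CS_SELECTOR 0x0802u /* 16-bit guest-state field */
#define TSC_OFFSET        0x2010u /* 64-bit control field */
#define EXCEPTION_BITMAP  0x4004u /* 32-bit control field */
#define GUEST_RIP         0x681Eu /* natural-width guest-state field */

/* Hypothetical, heavily trimmed stand-in for struct vmcs12. */
struct mini_vmcs12 {
	uint32_t revision_id;
	uint32_t abort;
	uint64_t tsc_offset;
	uint64_t guest_rip; /* natural width: 8 bytes for a 64-bit L1 */
	uint32_t exception_bitmap;
	uint16_t guest_cs_selector;
};

/* Maps a field encoding to the field's byte offset inside mini_vmcs12. */
static const struct {
	uint32_t encoding;
	size_t offset;
} field_table[] = {
	{ GUEST_CS_SELECTOR, offsetof(struct mini_vmcs12, guest_cs_selector) },
	{ TSC_OFFSET, offsetof(struct mini_vmcs12, tsc_offset) },
	{ EXCEPTION_BITMAP, offsetof(struct mini_vmcs12, exception_bitmap) },
	{ GUEST_RIP, offsetof(struct mini_vmcs12, guest_rip) },
};

/* Bits 14:13 of the encoding: 0 = 16-bit, 1 = 64-bit, 2 = 32-bit, 3 = natural. */
static unsigned field_width(uint32_t encoding)
{
	return (encoding >> 13) & 3u;
}

/* Returns the byte offset of the field, or -1 if it is not supported. */
static long field_offset(uint32_t encoding)
{
	for (size_t i = 0; i < sizeof(field_table) / sizeof(field_table[0]); i++)
		if (field_table[i].encoding == encoding)
			return (long)field_table[i].offset;
	return -1;
}

/* Emulated VMREAD: narrower fields are zero-extended into *out. */
int emulated_vmread(const struct mini_vmcs12 *v, uint32_t encoding, uint64_t *out)
{
	long off = field_offset(encoding);
	const char *p;

	if (off < 0)
		return -1;
	p = (const char *)v + off;
	switch (field_width(encoding)) {
	case 0: { uint16_t x; memcpy(&x, p, sizeof(x)); *out = x; break; }
	case 2: { uint32_t x; memcpy(&x, p, sizeof(x)); *out = x; break; }
	default: memcpy(out, p, sizeof(*out)); break; /* 64-bit or natural width */
	}
	return 0;
}

/* Emulated VMWRITE: the value is truncated to the field's width. */
int emulated_vmwrite(struct mini_vmcs12 *v, uint32_t encoding, uint64_t val)
{
	long off = field_offset(encoding);
	char *p;

	if (off < 0)
		return -1;
	p = (char *)v + off;
	switch (field_width(encoding)) {
	case 0: { uint16_t x = (uint16_t)val; memcpy(p, &x, sizeof(x)); break; }
	case 2: { uint32_t x = (uint32_t)val; memcpy(p, &x, sizeof(x)); break; }
	default: memcpy(p, &val, sizeof(val)); break; /* 64-bit or natural width */
	}
	return 0;
}
```

An unsupported encoding returns -1 here; a real VMREAD or VMWRITE would
instead report a VMX failure with error number 12 (unsupported VMCS component)
to L1.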

The name "vmcs12" refers to the VMCS that L1 builds for L2. In the code we
also have "vmcs01", the VMCS that L0 built for L1, and "vmcs02", the VMCS
which L0 builds to actually run L2; how this is done is explained in the
aforementioned paper.
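
As a rough illustration of what building vmcs02 involves, some of its fields
must combine what L0 wants for L1 (vmcs01) with what L1 asked for on behalf of
L2 (vmcs12). The sketch below uses a hypothetical ctl_subset struct and a
hypothetical merge_vmcs02 function and shows three simple cases: an exception
is intercepted if either hypervisor wants it, a CR0 bit is host-owned if
either hypervisor owns it, and the TSC offsets add up. It assumes no TSC
scaling and is not KVM's actual code, which handles many more fields and
corner cases.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of VMCS fields; names are illustrative only. */
struct ctl_subset {
	uint32_t exception_bitmap;
	uint64_t cr0_guest_host_mask;
	uint64_t tsc_offset;
};

/*
 * Hypothetical merge: derive vmcs02 values from what L0 uses for L1
 * (vmcs01) and what L1 requested for L2 (vmcs12).
 */
struct ctl_subset merge_vmcs02(const struct ctl_subset *vmcs01,
			       const struct ctl_subset *vmcs12)
{
	struct ctl_subset vmcs02;

	/* Trap an exception if either L0 or L1 wants to intercept it. */
	vmcs02.exception_bitmap =
		vmcs01->exception_bitmap | vmcs12->exception_bitmap;
	/* A CR0 bit is host-owned if either hypervisor wants to own it. */
	vmcs02.cr0_guest_host_mask =
		vmcs01->cr0_guest_host_mask | vmcs12->cr0_guest_host_mask;
	/* L2's TSC = host TSC + L1's offset + L2's offset (modulo 2^64). */
	vmcs02.tsc_offset = vmcs01->tsc_offset + vmcs12->tsc_offset;
	return vmcs02;
}
```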

For convenience, we repeat the content of struct vmcs12 here. If the internals
of this structure change, this can break live migration across KVM versions.
VMCS12_REVISION (from vmx.c) should be changed if struct vmcs12 or its inner
struct shadow_vmcs is ever changed.

	typedef u64 natural_width;
	struct __packed vmcs12 {
		/* According to the Intel spec, a VMCS region must start with
		 * these two user-visible fields */
		u32 revision_id;
		u32 abort;

		u32 launch_state; /* set to 0 by VMCLEAR, to 1 by VMLAUNCH */
		u32 padding[7]; /* room for future expansion */

		u64 io_bitmap_a;
		u64 io_bitmap_b;
		u64 msr_bitmap;
		u64 vm_exit_msr_store_addr;
		u64 vm_exit_msr_load_addr;
		u64 vm_entry_msr_load_addr;
		u64 tsc_offset;
		u64 virtual_apic_page_addr;
		u64 apic_access_addr;
		u64 ept_pointer;
		u64 guest_physical_address;
		u64 vmcs_link_pointer;
		u64 guest_ia32_debugctl;
		u64 guest_ia32_pat;
		u64 guest_ia32_efer;
		u64 guest_pdptr0;
		u64 guest_pdptr1;
		u64 guest_pdptr2;
		u64 guest_pdptr3;
		u64 host_ia32_pat;
		u64 host_ia32_efer;
		u64 padding64[8]; /* room for future expansion */
		natural_width cr0_guest_host_mask;
		natural_width cr4_guest_host_mask;
		natural_width cr0_read_shadow;
		natural_width cr4_read_shadow;
		natural_width cr3_target_value0;
		natural_width cr3_target_value1;
		natural_width cr3_target_value2;
		natural_width cr3_target_value3;
		natural_width exit_qualification;
		natural_width guest_linear_address;
		natural_width guest_cr0;
		natural_width guest_cr3;
		natural_width guest_cr4;
		natural_width guest_es_base;
		natural_width guest_cs_base;
		natural_width guest_ss_base;
		natural_width guest_ds_base;
		natural_width guest_fs_base;
		natural_width guest_gs_base;
		natural_width guest_ldtr_base;
		natural_width guest_tr_base;
		natural_width guest_gdtr_base;
		natural_width guest_idtr_base;
		natural_width guest_dr7;
		natural_width guest_rsp;
		natural_width guest_rip;
		natural_width guest_rflags;
		natural_width guest_pending_dbg_exceptions;
		natural_width guest_sysenter_esp;
		natural_width guest_sysenter_eip;
		natural_width host_cr0;
		natural_width host_cr3;
		natural_width host_cr4;
		natural_width host_fs_base;
		natural_width host_gs_base;
		natural_width host_tr_base;
		natural_width host_gdtr_base;
		natural_width host_idtr_base;
		natural_width host_ia32_sysenter_esp;
		natural_width host_ia32_sysenter_eip;
		natural_width host_rsp;
		natural_width host_rip;
		natural_width paddingl[8]; /* room for future expansion */
		u32 pin_based_vm_exec_control;
		u32 cpu_based_vm_exec_control;
		u32 exception_bitmap;
		u32 page_fault_error_code_mask;
		u32 page_fault_error_code_match;
		u32 cr3_target_count;
		u32 vm_exit_controls;
		u32 vm_exit_msr_store_count;
		u32 vm_exit_msr_load_count;
		u32 vm_entry_controls;
		u32 vm_entry_msr_load_count;
		u32 vm_entry_intr_info_field;
		u32 vm_entry_exception_error_code;
		u32 vm_entry_instruction_len;
		u32 tpr_threshold;
		u32 secondary_vm_exec_control;
		u32 vm_instruction_error;
		u32 vm_exit_reason;
		u32 vm_exit_intr_info;
		u32 vm_exit_intr_error_code;
		u32 idt_vectoring_info_field;
		u32 idt_vectoring_error_code;
		u32 vm_exit_instruction_len;
		u32 vmx_instruction_info;
		u32 guest_es_limit;
		u32 guest_cs_limit;
		u32 guest_ss_limit;
		u32 guest_ds_limit;
		u32 guest_fs_limit;
		u32 guest_gs_limit;
		u32 guest_ldtr_limit;
		u32 guest_tr_limit;
		u32 guest_gdtr_limit;
		u32 guest_idtr_limit;
		u32 guest_es_ar_bytes;
		u32 guest_cs_ar_bytes;
		u32 guest_ss_ar_bytes;
		u32 guest_ds_ar_bytes;
		u32 guest_fs_ar_bytes;
		u32 guest_gs_ar_bytes;
		u32 guest_ldtr_ar_bytes;
		u32 guest_tr_ar_bytes;
		u32 guest_interruptibility_info;
		u32 guest_activity_state;
		u32 guest_sysenter_cs;
		u32 host_ia32_sysenter_cs;
		u32 padding32[8]; /* room for future expansion */
		u16 virtual_processor_id;
		u16 guest_es_selector;
		u16 guest_cs_selector;
		u16 guest_ss_selector;
		u16 guest_ds_selector;
		u16 guest_fs_selector;
		u16 guest_gs_selector;
		u16 guest_ldtr_selector;
		u16 guest_tr_selector;
		u16 host_es_selector;
		u16 host_cs_selector;
		u16 host_ss_selector;
		u16 host_ds_selector;
		u16 host_fs_selector;
		u16 host_gs_selector;
		u16 host_tr_selector;
	};
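
Since the layout of struct vmcs12 is tied to live-migration compatibility
through VMCS12_REVISION, layout assumptions can be checked both at build time
and at run time. The sketch below models only the spec-mandated header; the
struct vmcs12_header, the function vmcs12_revision_ok and the value of
SKETCH_VMCS12_REVISION are hypothetical (the real constant lives in vmx.c).
The static assertions pin revision_id and abort to offsets 0 and 4, as the
spec requires, and vmcs12_revision_ok illustrates the kind of check an
implementation can perform when L1 loads a VMCS pointer with VMPTRLD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical revision value for illustration only. */
#define SKETCH_VMCS12_REVISION 0x1234u

/* Hypothetical model of the first fields of struct vmcs12. */
struct vmcs12_header {
	uint32_t revision_id;
	uint32_t abort;
	uint32_t launch_state;
};

/* The spec fixes these two fields at the start of every VMCS region. */
_Static_assert(offsetof(struct vmcs12_header, revision_id) == 0,
	       "revision_id must be at offset 0");
_Static_assert(offsetof(struct vmcs12_header, abort) == 4,
	       "abort must be at offset 4");

/* Refuse a region whose revision does not match the expected layout. */
bool vmcs12_revision_ok(const void *region)
{
	const struct vmcs12_header *h = region;

	return h->revision_id == SKETCH_VMCS12_REVISION;
}
```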


Authors
-------

These patches were written by:
     Abel Gordon, abelg <at> il.ibm.com
     Nadav Har'El, nyh <at> il.ibm.com
     Orit Wasserman, oritw <at> il.ibm.com
     Ben-Ami Yassor, benami <at> il.ibm.com
     Muli Ben-Yehuda, muli <at> il.ibm.com

With contributions by:
     Anthony Liguori, aliguori <at> us.ibm.com
     Mike Day, mdday <at> us.ibm.com
     Michael Factor, factor <at> il.ibm.com
     Zvi Dubitzky, dubi <at> il.ibm.com

And valuable reviews by:
     Avi Kivity, avi <at> redhat.com
     Gleb Natapov, gleb <at> redhat.com
     Marcelo Tosatti, mtosatti <at> redhat.com
     Kevin Tian, kevin.tian <at> intel.com
     and others.