libvirt support for Xen’s new libxenlight toolstack

At SUSECon last November I had the pleasure of meeting Russell Pavlicek, who shares Xen community management responsibilities with Lars Kurth.  He, along with Dario Faggioli of the Xen community, poked me about writing a blog post describing the state of Xen support in libvirt.  I found this to be an excellent idea and a good reason to get off my butt and write!  Far too much time passes between my musings.

Xen has had a long history in libvirt.  In fact, it was the first hypervisor supported by libvirt.  I’ve witnessed an incredible evolution of libvirt over the years.  Not only does it now support managing many hypervisors such as Xen, KVM/QEMU, LXC, VirtualBox, Hyper-V, ESX, etc., but it also supports managing a wide range of host subsystems used in a virtualized environment, such as storage pools and volumes, networks, network interfaces, etc.  It has really become the Swiss Army knife of virtualization management on Linux, and Xen has been along for the entire ride.

libvirt supports multiple hypervisors via a hypervisor driver interface, which is defined in $libvirt_root/src/driver.h – see struct _virDriver.  libvirt’s virDomain* APIs map to functions in the hypervisor driver interface, which are implemented by the various hypervisor drivers.  The drivers are located under $libvirt_root/src/<hypervisor-name>.  Typically, each driver has a $libvirt_root/src/<hypervisor-name>/<hypervisor-name>_driver.c file which defines a static instance of virDriver and fills in the functions it implements.  As an example, see the definition of libxlDriver in $libvirt_root/src/libxl/libxl_driver.c, the first few lines of which are:

static virDriver libxlDriver = {
    .name = "xenlight",
    .connectOpen = libxlConnectOpen, /* 0.9.0 */
    .connectClose = libxlConnectClose, /* 0.9.0 */
    .connectGetType = libxlConnectGetType, /* 0.9.0 */
    /* ... */
};

The original Xen hypervisor driver is implemented using a variety of Xen tools: xend, xm, xenstore, and the hypervisor domctl and sysctl interfaces.  All of these “sub-drivers” are controlled by an “uber driver” known simply as the “xen driver”, which resides in $libvirt_root/src/xen/.  When an API in the hypervisor driver is called on a Xen system, e.g. virDomainCreateXML, it makes its way to the xen driver, which funnels the request to the most appropriate sub-driver.  In most cases, this is the xend sub-driver, although the other sub-drivers are used for some APIs.  And IIRC, there are a few APIs for which the xen driver will iterate over the sub-drivers until the function succeeds.  I like to refer to this xen driver, and its collection of sub-drivers, as the “legacy Xen driver”.  Due to its heavy reliance on xend, and xend’s deprecation in the Xen community, the legacy driver became just that – legacy.  With the introduction of libxenlight (aka libxl), libvirt needed a new driver for Xen.

In 2011 I had a bit of free time to work on a hypervisor driver for libxl, committing the initial driver in 2b84e445.  As mentioned above, this driver resides in $libvirt_root/src/libxl/.  Subsequent work by SUSE, Univention, Red Hat, Citrix, Ubuntu, and other community contributors has resulted in a quite functional libvirt driver for the libxl toolstack.

The libxl driver only supports Xen >= 4.2.  The legacy Xen driver should be used on earlier versions of Xen, or on installations where the xend toolstack is used.  In fact, if xend is running, the libxl driver won’t even load.  So if you want to use the libxl driver but have xend running, xend must be shut down and libvirtd then restarted to load the libxl driver.  Conversely, if xend is not running, the legacy Xen driver will not load.

Currently, there are a few differences between the libxl driver and the legacy Xen driver.  First, the libxl driver is clueless about domains created by other libxl applications such as xl.  ‘virsh list’ will not show domains created with ‘xl create …’.  This is not the case with the legacy Xen driver, which is just a broker to xend.  Any domains managed by xend are also manageable with the legacy Xen driver.  Users of the legacy Xen driver in libvirt are probably well aware that ‘virsh list’ will show domains defined with ‘xm new …’ or created with ‘xm create …’, and might be a bit surprised to find this is not the case with the libxl driver.  But this could be addressed by implementing functionality similar to the ‘qemu-attach’ capability supported by the QEMU driver, which allows “importing” a QEMU instance created directly with e.g. ‘qemu -m 1024 -smp …’.  Contributions are warmly welcomed if this functionality is important to you :-).
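
Until such an import capability exists, it can at least be useful to see which domains xl knows about that libvirt does not.  The following is only a sketch under a few assumptions: that ‘xl list’ prints one header line followed by one row per domain with the name in the first column, and that your virsh supports ‘list --name’.  The function names are hypothetical.

```shell
# Extract domain names from `xl list`-style output on stdin.
# Assumed layout: one header line, then one row per domain, name first.
xl_domain_names() {
    awk 'NR > 1 { print $1 }'
}

# Print each name from the first newline-separated list that is
# absent from the second one.
names_only_in_first() {
    printf '%s\n' "$1" | while IFS= read -r name; do
        [ -n "$name" ] || continue
        printf '%s\n' "$2" | grep -qxF -- "$name" || printf '%s\n' "$name"
    done
}

# Hypothetical wrapper (needs root and a running Xen host): list domains
# that xl sees but libvirt does not. Domain-0 is skipped because the
# libxl driver treats it as part of the host.
domains_unknown_to_libvirt() {
    names_only_in_first \
        "$(xl list | xl_domain_names | grep -vxF 'Domain-0')" \
        "$(virsh list --name)"
}
```

Nothing runs until domains_unknown_to_libvirt is called; the two helpers are pure text filters and can be tried on saved output first.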

A second difference between the libxl and legacy Xen drivers is related to the first one.  xend is the stateful service in the legacy stack, maintaining state of defined and running domains.  As a result, the legacy libvirt Xen driver is stateless, generally forwarding requests to xend and allowing xend to maintain state.  In the new stack, however, libxl is stateless.  Therefore, the libvirt libxl driver itself must now maintain the state of all domains.  An interesting side effect of this is losing all your domains when upgrading from libvirt+xend to libvirt+libxl.  For a smooth upgrade, the libvirt domXML configuration of all running domains should be exported and the domains shut down, so the configuration can be imported into the libvirt libxl driver after the upgrade.  For example, as a rough shell sketch (assuming a virsh that supports ‘list --name’):

# Before the upgrade (libvirt+xend): save each domain's XML, then shut it down
for dom in $(virsh list --name); do
    virsh dumpxml "$dom" > "$dom.xml"
    virsh shutdown "$dom"    # asynchronous; wait until 'virsh list' is empty
done
# ... perform the xend -> libxl upgrade and restart libvirtd ...
# After the upgrade (libvirt+libxl): re-define each saved domain
for xml in *.xml; do
    virsh define "$xml"
done

It may also be possible to import xend managed domains after upgrading to libxl.  On most installations, the configuration of xend managed domains is stored in /var/lib/xend/domains/<dom-uuid>/config.sxp.  Since the legacy Xen driver already supports parsing SXP, this code could be used to read any existing xend managed domains and import them into libvirt.  I will need to investigate the feasibility of this approach, and report any findings in a future blog post.

The last (known) difference between the drivers is the handling of domain0.  The legacy Xen driver handles domain0 as any other domain.  The libxl driver currently treats domain0 as part of the host, thus e.g. it is not shown in ‘virsh list’.  This behavior is similar to the QEMU driver, but is not necessarily correct.  After all, domain0 is just another domain in Xen, which can have devices attached and detached, memory ballooned, etc., and should probably be handled as such by the libvirt libxl driver.  Contributions welcomed!

Otherwise, the libxl driver should behave the same as the legacy Xen driver, making xend to libxl upgrades quite painless, outside of the statefulness issue discussed above.  Any other differences between the legacy Xen driver and the libxl driver are bugs – or missing features.  After all, the goal of libvirt is to insulate users from underlying churn in hypervisor-specific tools.

At the time of this writing, the important missing features in the libxl driver relative to the legacy Xen driver are PCI passthrough and migration.  Chunyan Liu has provided patches for both of these features, the first of which is close to being committed upstream IMO.

The libxl driver is also in need of improved parallelization.  Currently, long-running operations such as create, save, restore, core dump, etc. lock the driver, blocking other operations, even those that simply get state.  I have some initial patches that introduce job support in the libxl driver, similar to the QEMU driver.  These patches allow classifying driver operations into jobs that modify state, and thus block any other operations on the domain, and jobs that can run concurrently.  Bamvor Jian Zhang is working on a patch series to make use of libxl’s asynchronous variants of these long-running operations.  Together, these patch sets will greatly improve parallelism in the libxl driver, which is certainly important in, for example, cloud environments where many virtual machine instances can be started in parallel.

Beyond these sorely needed features and improvements, there is quite a bit of work required to reach feature parity with the QEMU driver, where it makes sense.  The hypervisor driver interface currently supports 193 functions, 186 of which are implemented in the QEMU driver.  By contrast, only 86 functions are implemented in the libxl driver.  To be fair, quite a few of the unimplemented functions don’t apply to Xen and will never be implemented.  Nonetheless, for any enthusiastic volunteers, there is quite a bit of work to be done in the libvirt libxl driver.
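
For anyone wanting to check such counts against their own checkout, a rough approach is to count the entries in a driver’s virDriver table.  The sketch below assumes that function entries follow the ‘.callback = function, /* version */’ layout seen in the libxlDriver excerpt; the pattern and the function name are my own illustration, and the result is only approximate since it counts every matching line in the file.

```shell
# Count driver-table entries of the form
#     .someCallback = someFunction, /* 0.9.0 */
# in C source read from stdin. Entries without a version comment
# (such as .name = "xenlight") are not counted.
count_driver_entries() {
    grep -Ec '^[[:space:]]*\.[A-Za-z0-9_]+[[:space:]]*=[[:space:]]*[A-Za-z0-9_]+,[[:space:]]*/\*[[:space:]]*[0-9.]+[[:space:]]*\*/'
}

# Example usage:
#   count_driver_entries < "$libvirt_root/src/libxl/libxl_driver.c"
#   count_driver_entries < "$libvirt_root/src/qemu/qemu_driver.c"
```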

Although I thoroughly enjoy working on libvirt and have a healthy respect for the upstream community, my available time to work on upstream libvirt is limited.  Currently, I’m the primary maintainer of the Xen drivers, so my limited availability is a bottleneck.  Other libvirt maintainers review and commit Xen stuff, but their primary focus is on the rapid development of other hypervisor drivers and host subsystems.  I’m always looking for help, not only with implementing new features, but also with reviewing and testing patches from other contributors.  If you are part of the greater Xen ecosystem, consider lending a hand with improving Xen support in libvirt!


libvirt sanlock integration in openSUSE Factory

A few weeks back I found some time to package sanlock for openSUSE Factory, which subsequently allowed enabling the libvirt sanlock driver.  And how might this be useful?  When running qemu/kvm virtual machines on a pool of hosts that are not cluster-aware, it may be possible to start a virtual machine on more than one host, potentially corrupting the guest filesystem.  To prevent such an unpleasant scenario, libvirt+sanlock can be used to protect the virtual machine’s disk images, ensuring we never have two qemu/kvm processes writing to an image concurrently.  libvirt+sanlock provides protection against starting the same virtual machine on different hosts, or adding the same disk to different virtual machines.

In this blog post I’ll describe how to install and configure sanlock and the libvirt sanlock plugin.  I’ll briefly cover lockspace and resource creation, and show some examples of specifying disk leases in libvirt, but users should become familiar with the wdmd (watchdog multiplexing daemon) and sanlock man pages, as well as the lease element specification in libvirt domainXML.  I’ve used SLES11 SP2 hosts and guests for this example, but have also tested a similar configuration on openSUSE 12.1.

The sanlock and sanlock-enabled libvirt packages can be retrieved from a Factory repository or a repository from the OBS Virtualization project.  (As a side note, for those that didn’t know, Virtualization is the development project for virtualization-related packages in Factory.  Packages are built, tested, and staged in this project before submitting to Factory.)

After configuring the appropriate repository for the target host, update libvirt and install sanlock and libvirt-lock-sanlock.
# zypper up libvirt libvirt-client libvirt-python
# zypper in sanlock libsanlock1 libvirt-lock-sanlock

Enable the watchdog and sanlock daemons.
# insserv wdmd
# insserv sanlock

Specify the sanlock lock manager in /etc/libvirt/qemu.conf.
lock_manager = "sanlock"

The suggested libvirt sanlock configuration uses NFS for shared lock space storage.  Mount a share at the default mount point.
# mount -t nfs nfs-server:/export/path /var/lib/libvirt/sanlock

These installation steps need to be performed on each host participating in the sanlock-protected environment.

libvirt provides two modes for configuring sanlock.  The default mode requires a user or management application to manually define the sanlock lockspace and resource leases, and then describe those leases with a lease element in the virtual machine XML configuration.  libvirt also supports an auto disk lease mode, where libvirt will automatically create a lockspace and lease for each fully qualified disk path in the virtual machine XML configuration.  The latter mode removes the administrator burden of configuring lockspaces and leases, but only works if the administrator can ensure stable and unique disk paths across all participating hosts.  I’ll describe both modes here, starting with the manual configuration.

Manual Configuration:
First we need to reserve and initialize host_id leases.  Each host that wants to participate in the sanlock-enabled environment must first acquire a lease on its host_id number within the lockspace.  The lockspace requirement for 2000 leases (2000 possible host_ids) is 1MB (8MB for 4k sectors).  On one host, create a 1M lockspace file in the default lease directory (/var/lib/libvirt/sanlock/).
# truncate -s 1M /var/lib/libvirt/sanlock/TEST_LS

And then initialize the lockspace for storing host_id leases.
# sanlock direct init -s TEST_LS:0:/var/lib/libvirt/sanlock/TEST_LS:0
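
The 1MB and 8MB figures can be sanity-checked with a little arithmetic.  This sketch assumes each host_id lease occupies one disk sector, which is my reading of the sanlock documentation rather than something stated here; the function name is made up for illustration.

```shell
# Bytes needed for a lockspace holding max_hosts host_id leases,
# ASSUMING one sector per host_id lease (see the sanlock man page).
lockspace_bytes() {
    max_hosts=$1
    sector_size=$2
    echo $(( max_hosts * sector_size ))
}

lockspace_bytes 2000 512    # 1024000, fits in a 1M file (1048576 bytes)
lockspace_bytes 2000 4096   # 8192000, fits in an 8M file (8388608 bytes)
```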

On each participating host, start the watchdog and sanlock daemons and restart libvirtd.
# rcwdmd start; rcsanlock start; rclibvirtd restart

On each participating host, we’ll need to tell the sanlock daemon to acquire its host_id in the lockspace, which will subsequently allow resources to be acquired in the lockspace.
# sanlock client add_lockspace -s TEST_LS:1:/var/lib/libvirt/sanlock/TEST_LS:0
# sanlock client add_lockspace -s TEST_LS:2:/var/lib/libvirt/sanlock/TEST_LS:0
# sanlock client add_lockspace -s TEST_LS:<hostidN>:/var/lib/libvirt/sanlock/TEST_LS:0
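
Since each host runs the same command with only the host_id changed, it is easy to mistype the argument.  A small helper along these lines can build the name:host_id:path:offset string and reject obvious mistakes.  This is a sketch only: the helper name is hypothetical, the offset is fixed at 0 as in the commands above, and the 1-2000 range simply mirrors the 2000 host_ids mentioned earlier.

```shell
# Build a "name:host_id:path:0" lockspace argument for sanlock.
# Rejects relative paths and host_ids outside 1..2000.
lockspace_arg() {
    ls_name=$1; host_id=$2; ls_path=$3
    case $ls_path in
        /*) ;;
        *) echo "lockspace path must be absolute: $ls_path" >&2; return 1 ;;
    esac
    case $host_id in
        ''|*[!0-9]*) echo "host_id must be a number: $host_id" >&2; return 1 ;;
    esac
    if [ "$host_id" -lt 1 ] || [ "$host_id" -gt 2000 ]; then
        echo "host_id out of range (1-2000): $host_id" >&2; return 1
    fi
    printf '%s:%s:%s:0\n' "$ls_name" "$host_id" "$ls_path"
}

# Example (run as root on the host that owns host_id 1):
#   sanlock client add_lockspace -s "$(lockspace_arg TEST_LS 1 /var/lib/libvirt/sanlock/TEST_LS)"
```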

To see the state of host_id leases read during the last renewal
# sanlock client host_status -s TEST_LS
1 timestamp 50766
2 timestamp 327323

Now that we have the hosts configured, time to move on to configuring a virtual machine resource lease and defining it in the virtual machine XML configuration.  First we need to reserve and initialize a resource lease for the virtual machine disk image.
# truncate -s 1M /var/lib/libvirt/sanlock/sles11sp2-disk-resource-lock
# sanlock direct init -r TEST_LS:sles11sp2-disk-resource-lock:/var/lib/libvirt/sanlock/sles11sp2-disk-resource-lock:0

Then add the lease information to the virtual machine XML configuration:
# virsh edit sles11sp2

<lease>
  <lockspace>TEST_LS</lockspace>
  <key>sles11sp2-disk-resource-lock</key>
  <target path='/var/lib/libvirt/sanlock/sles11sp2-disk-resource-lock'/>
</lease>

Finally, start the virtual machine!
# virsh start sles11sp2
Domain sles11sp2 started

Trying to start the same virtual machine on a different host will fail, since the resource lock is already leased to another host:
other-host:~ # virsh start sles11sp2
error: Failed to start domain sles11sp2
error: internal error Failed to acquire lock: error -243

Automatic disk lease configuration:
As can be seen even with the trivial example above, manual disk lease configuration puts quite a burden on the user, particularly in an ad hoc environment with only a few hosts and no central management service to coordinate all of the lockspace and resource configuration.  To ease this burden, Daniel Berrange added support in libvirt for automatically creating sanlock disk leases.  Once the environment is configured for automatic disk leases, libvirt will handle the details of creating lockspace and resource leases.

On each participating host, edit /etc/libvirt/qemu-sanlock.conf, setting auto_disk_leases to 1 and assigning a unique host_id.
auto_disk_leases = 1
host_id = 1
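
When several hosts need the same change, the edit can be scripted.  The helper below is a minimal sketch, not something shipped with libvirt: its name is hypothetical, it assumes GNU sed (for ‘-i -E’), and it is only meant for simple values without characters special to sed such as ‘|’ or ‘&’.

```shell
# Set "key = value" in a libvirt-style config file.
# If a line for the key already exists (commented out or not) it is
# rewritten; otherwise the setting is appended to the file.
set_conf_key() {
    conf_file=$1; key=$2; value=$3
    if grep -Eq "^[[:space:]]*#?[[:space:]]*${key}[[:space:]]*=" "$conf_file"; then
        sed -i -E "s|^[[:space:]]*#?[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$conf_file"
    else
        printf '%s = %s\n' "$key" "$value" >> "$conf_file"
    fi
}

# Example (as root; pick a different host_id on each host):
#   set_conf_key /etc/libvirt/qemu-sanlock.conf auto_disk_leases 1
#   set_conf_key /etc/libvirt/qemu-sanlock.conf host_id 1
```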

Then restart libvirtd
# rclibvirtd restart

Now libvirtd+sanlock is configured to automatically acquire a resource lease for each virtual machine disk.  No lease configuration is required in the virtual machine XML configuration.  We can simply start the virtual machine and libvirt will handle all the details for us.

host1 # virsh start sles11sp2
Domain sles11sp2 started

libvirt creates a host lease lockspace named __LIBVIRT__DISKS__.  Disk resource leases are named using the MD5 checksum of the fully qualified disk path.  After starting the above virtual machine, the lease directory contained
host1 # ls -l /var/lib/libvirt/sanlock/
total 2064
-rw-------  1 root root 1048576 Mar 13 01:35 3ab0d33a35403d03e3ad10b485c7b593
-rw-------  1 root root 1048576 Mar 13 01:35 __LIBVIRT__DISKS__
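
To work out which lease file belongs to which disk, the checksum can be computed by hand.  This is a sketch assuming the name is the plain MD5 hex digest of the path string with no trailing newline; I have not verified that it matches libvirt byte for byte, and the function name and example path are hypothetical.

```shell
# Print the MD5 hex digest of a disk path.
# printf '%s' is used so that no trailing newline is hashed.
disk_lease_name() {
    printf '%s' "$1" | md5sum | cut -d' ' -f1
}

# Example (hypothetical image path):
#   disk_lease_name /var/lib/libvirt/images/sles11sp2/disk0.raw
```

The result can then be compared against the file names in /var/lib/libvirt/sanlock/.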

Finally, try to start the virtual machine on another participating host
host2 # virsh start sles11sp2
error: Failed to start domain sles11sp2
error: internal error Failed to acquire lock: error -243

Feel free to try the sanlock and sanlock-enabled libvirt packages from openSUSE Factory or our OBS Virtualization project. One thing to keep in mind is that the sanlock daemon protects resources for some process, in this case qemu/kvm.  If the sanlock daemon is terminated, it can no longer protect those resources and kills the processes for which it holds leases.  In other words, restarting the sanlock daemon will terminate your virtual machines!  If the sanlock daemon is SIGKILL’ed, then the watchdog daemon intervenes by resetting the entire host.  With this in mind, it would be wise to consider an appropriate disk cache mode such as ‘none’ or ‘writethrough’ to improve the integrity of your disk images in the event of a mass virtual machine kill off.
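
A quick way to audit the cache mode of existing guests is to scan their domain XML for disk driver elements.  The filter below is a rough sketch that relies on ‘virsh dumpxml’ printing one element per line; the function name is hypothetical, and disks with no driver element at all are not reported.

```shell
# Read domain XML on stdin and print each <driver> line found inside a
# <disk> element whose cache attribute is missing or is something other
# than 'none' or 'writethrough'. Line-based, so it assumes one element
# per line.
risky_cache_drivers() {
    awk '
        /<disk[ >]/ { in_disk = 1 }
        /<\/disk>/  { in_disk = 0 }
        in_disk && /<driver/ && !/cache=.(none|writethrough)./ { print }
    '
}

# Example:
#   virsh dumpxml sles11sp2 | risky_cache_drivers
```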

Removal of 32-bit Xen from openSUSE

As announced in July 2011, the openSUSE Xen maintainers intended to discontinue support for the 32-bit Xen host in openSUSE 12.1. Now that 12.1 has been released, we are hearing complaints from users virtualizing on older P4-based systems. I understand their frustration, but given that the upstream Xen community has ignored the 32-bit host, and no other distros are supporting it, we can no longer justify the effort required to support it. Supported 32-bit Xen packages are going the way of the dodo, and dropping them in openSUSE may very well mean extinction.

That said, users still have a few options. First, we have quite stable 32-bit and 64-bit Xen packages in openSUSE 11.4. The Xen version is 4.0.3, which has all the latest upstream fixes and improvements for the 4.0 branch. In fact, the package sources are shared with SLES11 SP1 and benefit from the broader user base and QA of the enterprise product. openSUSE 11.4 contains kernel version 2.6.37, which has excellent support for older P4-based hardware.

Another option is using the openSUSE Build Service to maintain your own 32-bit Xen packages. In fact, the community itself can maintain 32-bit Xen in the Virtualization project if there is enough interest. We will be happy to accept any patches that do not break 64-bit environments :-). One benefit of this option is that the openSUSE Factory Xen packages are developed in the Virtualization project. A community maintained, 32-bit Xen host in this project would be submitted to Factory, and hence included in the next openSUSE release, as part of the overall Xen package submission done by the openSUSE maintainers.

Updated libvirt for openSUSE 12.1 RC1

Last week I updated the libvirt package for openSUSE 12.1 RC1 / Factory to version 0.9.6. The package was also submitted for SLE11 SP2 Beta8. Changes since the last update include a backport of the AHCI controller patch for the qemu driver. With this patch it is possible to use SATA drives with qemu instances. The following controller device XML is used to specify an AHCI controller:

<controller type='sata' index='0'>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</controller>

The libvirt qemu driver supports many AHCI controllers, each with one bus and 6 units. To attach a SATA disk to a unit on an AHCI controller, use the following disk device XML

<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/var/lib/libvirt/images/test/disk0.raw'/>
  <target dev='sda' bus='sata'/>
  <address type='drive' controller='0' bus='0' unit='0'/>
</disk>

Also new to this libvirt update is opt-in AppArmor confinement of qemu instances. /etc/libvirt/qemu.conf has been patched to explicitly set the security driver to ‘none’. If AppArmor is enabled on the host, libvirtd is generously confined since it needs access to many utilities and libraries, but users must opt in to also have qemu instances launched by libvirtd confined. Simply edit /etc/libvirt/qemu.conf and change security_driver to ‘apparmor’. Of course, SELinux is also available if users prefer it over AppArmor.