libvirt support for Xen’s new libxenlight toolstack

I had the pleasure of meeting Russell Pavlicek, who shares Xen community management responsibilities with Lars Kurth, at SUSECon last November and he, along with Dario Faggioli of the Xen community, poked me about writing a blog post describing the state of Xen support in libvirt.  I found this to be an excellent idea and good reason to get off my butt and write!  Far too much time passes between my musings.

Xen has had a long history in libvirt.  In fact, it was the first hypervisor supported by libvirt.  I’ve witnessed an incredible evolution of libvirt over the years and now not only does it support managing many hypervisors such as Xen, KVM/QEMU, LXC, VirtualBox, Hyper-V, ESX, etc., but it also supports managing a wide range of host subsystems used in a virtualized environment such as storage pools and volumes, networks, network interfaces, etc.  It has really become the Swiss Army knife of virtualization management on Linux, and Xen has been along for the entire ride.

libvirt supports multiple hypervisors via a hypervisor driver interface, which is defined in $libvirt_root/src/driver.h (see struct _virDriver).  libvirt’s virDomain* APIs map to functions in the hypervisor driver interface, which are implemented by the various hypervisor drivers.  The drivers are located under $libvirt_root/src/<hypervisor-name>.  Typically, each driver has a $libvirt_root/src/<hypervisor-name>/<hypervisor-name>_driver.c file which defines a static instance of virDriver and fills in the functions it implements.  As an example, see the definition of libxlDriver in $libvirt_root/src/libxl/libxl_driver.c, the first few lines of which are

static virDriver libxlDriver = {
    .no = VIR_DRV_LIBXL,
    .name = "xenlight",
    .connectOpen = libxlConnectOpen, /* 0.9.0 */
    .connectClose = libxlConnectClose, /* 0.9.0 */
    .connectGetType = libxlConnectGetType, /* 0.9.0 */
    ...
};

The original Xen hypervisor driver is implemented using a variety of Xen tools: xend, xm, xenstore, and the hypervisor domctl and sysctl interfaces.  All of these “sub-drivers” are controlled by an “uber driver” known simply as the “xen driver”, which resides in $libvirt_root/src/xen/.  When an API in the hypervisor driver is called on a Xen system, e.g. virDomainCreateXML, it makes its way to the xen driver, which funnels the request to the most appropriate sub-driver.  In most cases, this is the xend sub-driver, although the other sub-drivers are used for some APIs.  And IIRC, there are a few APIs for which the xen driver will iterate over the sub-drivers until the function succeeds.  I like to refer to this xen driver, and its collection of sub-drivers, as the “legacy Xen driver”.  Due to its heavy reliance on xend, and xend’s deprecation in the Xen community, the legacy driver became just that: legacy.  With the introduction of libxenlight (aka libxl), libvirt needed a new driver for Xen.

In 2011 I had a bit of free time to work on a hypervisor driver for libxl, committing the initial driver in 2b84e445.  As mentioned above, this driver resides in $libvirt_root/src/libxl/.  Subsequent work by SUSE, Univention, Red Hat, Citrix, Ubuntu, and other community contributors has resulted in a quite functional libvirt driver for the libxl toolstack.

The libxl driver only supports Xen >= 4.2.  The legacy Xen driver should be used on earlier versions of Xen, or on installations where the xend toolstack is used.  In fact, if xend is running, the libxl driver won’t even load.  So if you want to use the libxl driver but have xend running, xend must be shut down, followed by a restart of libvirtd to load the libxl driver.  Conversely, if xend is not running, the legacy Xen driver will not load.
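For illustration, switching a host from xend to the libxl driver might look roughly like the following sketch.  The service names (xend, libvirtd) and the init tooling are assumptions that vary by distribution:

# Sketch only: stop xend so the libxl driver can load, then restart libvirtd.
service xend stop
service libvirtd restart
# 'virsh version' should now report the running Xen hypervisor via the libxl driver.
virsh version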

Currently, there are a few differences between the libxl driver and the legacy Xen driver.  First, the libxl driver is clueless about domains created by other libxl applications such as xl.  ‘virsh list’ will not show domains created with ‘xl create …’.  This is not the case with the legacy Xen driver, which is just a broker to xend.  Any domains managed by xend are also manageable with the legacy Xen driver.  Users of the legacy Xen driver in libvirt are probably well aware that ‘virsh list’ will show domains defined with ‘xm new …’ or created with ‘xm create …’, and might be a bit surprised to find this is not the case with the libxl driver.  But this could be addressed by implementing functionality similar to the ‘qemu-attach’ capability supported by the QEMU driver, which allows “importing” a QEMU instance created directly with e.g. ‘qemu -m 1024 -smp …’.  Contributions are warmly welcomed if this functionality is important to you :-).
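To illustrate, assuming an existing xl guest configuration at /etc/xen/guest.cfg (a placeholder path):

xl create /etc/xen/guest.cfg   # the domain starts fine under xl
xl list                        # and xl sees it
virsh list --all               # but the libxl driver does not show it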

A second difference between the libxl and legacy Xen drivers is related to the first one.  xend is the stateful service in the legacy stack, maintaining state of defined and running domains.  As a result, the legacy libvirt Xen driver is stateless, generally forwarding requests to xend and allowing xend to maintain state.  In the new stack, however, libxl is stateless.  Therefore, the libvirt libxl driver itself must now maintain the state of all domains.  An interesting side effect of this is losing all your domains when upgrading from libvirt+xend to libvirt+libxl.  For a smooth upgrade, all running domains should be shut down and their libvirt domXML configuration exported for post-upgrade import into the libvirt libxl driver.  For example, roughly, in shell form:

for dom in $(virsh list --all --name); do
    virsh shutdown "$dom"
    virsh dumpxml "$dom" > "$dom.xml"
done
# perform the xend -> libxl upgrade, then restart libvirtd
for xml in *.xml; do
    virsh define "$xml"
done

It may also be possible to import xend-managed domains after upgrading to libxl.  On most installations, the configuration of xend-managed domains is stored in /var/lib/xend/domains/<dom-uuid>/config.sxp.  Since the legacy Xen driver already supports parsing SXP, this code could be used to read any existing xend-managed domains and import them into libvirt.  I will need to investigate the feasibility of this approach, and report any findings in a future blog post.
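As a purely hypothetical sketch of that idea, assuming the legacy Xen driver is still loaded (i.e. xend is still running) and that its ‘xen-sxpr’ native config format accepts these files, the stored configurations could be converted to domXML before the upgrade:

# Hypothetical: convert xend's stored SXP configs to libvirt domXML
# using the legacy driver's xen-sxpr native format.
for sxp in /var/lib/xend/domains/*/config.sxp; do
    uuid=$(basename $(dirname "$sxp"))
    virsh domxml-from-native xen-sxpr "$sxp" > "$uuid.xml"
done

The resulting files could then be imported with ‘virsh define’ once the libxl driver is in use.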

The last (known) difference between the drivers is the handling of domain0.  The legacy Xen driver handles domain0 like any other domain.  The libxl driver currently treats domain0 as part of the host, thus e.g. it is not shown in ‘virsh list’.  This behavior is similar to the QEMU driver, but is not necessarily correct.  After all, domain0 is just another domain in Xen, which can have devices attached and detached, memory ballooned, etc., and should probably be handled as such by the libvirt libxl driver.  Contributions welcomed!

Otherwise, the libxl driver should behave the same as the legacy Xen driver, making xend to libxl upgrades quite painless, outside of the statefulness issue discussed above.  Any other differences between the legacy Xen driver and the libxl driver are bugs, or missing features.  After all, the goal of libvirt is to insulate users from underlying churn in hypervisor-specific tools.

At the time of this writing, the important missing features in the libxl driver relative to the legacy Xen driver are PCI passthrough and migration.  Chunyan Liu has provided patches for both of these features, the first of which is, IMO, close to being committed upstream:

https://www.redhat.com/archives/libvir-list/2014-January/msg00400.html

https://www.redhat.com/archives/libvir-list/2013-September/msg00667.html

The libxl driver is also in need of improved parallelization.  Currently, long-running operations such as create, save, restore, core dump, etc. lock the driver, blocking other operations, even those that simply retrieve state.  I have some initial patches that introduce job support in the libxl driver, similar to the QEMU driver.  These patches allow classifying driver operations into jobs that modify state, and thus block any other operations on the domain, and jobs that can run concurrently.  Bamvor Jian Zhang is working on a patch series to make use of libxl’s asynchronous variants of these long-running operations.  Together, these patch sets will greatly improve parallelism in the libxl driver, which is certainly important in, for example, cloud environments where many virtual machine instances can be started in parallel.

Beyond these sorely needed features and improvements, there is quite a bit of work required to reach feature parity with the QEMU driver, where it makes sense.  The hypervisor driver interface currently supports 193 functions, 186 of which are implemented in the QEMU driver.  By contrast, only 86 functions are implemented in the libxl driver.  To be fair, quite a few of the unimplemented functions don’t apply to Xen and will never be implemented.  Nonetheless, for any enthusiastic volunteers, there is quite a bit of work to be done in the libvirt libxl driver.

Although I thoroughly enjoy working on libvirt and have a healthy respect for the upstream community, my available time to work on upstream libvirt is limited.  Currently, I’m the primary maintainer of the Xen drivers, so my limited availability is a bottleneck.  Other libvirt maintainers review and commit Xen patches, but their primary focus is on the rapid development of other hypervisor drivers and host subsystems.  I’m always looking for help, not only with the implementation of new features, but also with reviewing and testing patches from other contributors.  If you are part of the greater Xen ecosystem, consider lending a hand with improving Xen support in libvirt!

Xen live migration in OpenStack Grizzly

I recently experimented with live migration in an OpenStack Grizzly cluster of Xen compute nodes and thought it useful to write a short blog post for the benefit of others interested in OpenStack Xen deployments.

My OpenStack Grizzly cluster consisted of three nodes: One controller node hosting most of the OpenStack services (rabbitmq, mysql, cinder, keystone, nova-api, nova-scheduler, etc.) and two Xen compute nodes running nova-compute, nova-network, and nova-novncproxy.  All nodes were running fully patched SLES11 SP2.  devstack was used to deploy the OpenStack services.

For the most part, I used the OpenStack documentation for configuring live migration to setup the environment.  The main configuration tasks include

  1. Configure shared storage on the compute nodes involved in live migration.  I took the simple approach and used NFS for shared storage between the compute nodes, mounting an NFS share at /opt/stack/data/nova/instances on each compute node.
  2. Ensure that the UID and GID of your nova (or stack) and libvirt users are identical across all of your servers.  This ensures that the permissions on the NFS mount will work correctly.
  3. Ensure the shared directory has the ‘execute/search’ bit set, e.g. chmod o+x /opt/stack/data/nova/instances
  4. Ensure the firewall is properly configured to allow migrations.  For Xen, port 8002 needs to be open; this is the port xend listens on for migration requests.  (A rough consolidated sketch of these steps follows this list.)
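For illustration, here is a rough consolidated sketch of these steps on one compute node.  The NFS server and export path are placeholders, and the firewall command is an assumption that depends on your distribution’s firewall tooling:

# Placeholder NFS server and export; adjust to your environment.
mount -t nfs nfs-server:/export/nova/instances /opt/stack/data/nova/instances

# Compare the uid/gid output on each node (likewise for the libvirt user).
id stack

# Give other users search access to the shared directory.
chmod o+x /opt/stack/data/nova/instances

# Open the xend relocation port (iptables shown as an example; your firewall may differ).
iptables -I INPUT -p tcp --dport 8002 -j ACCEPT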

In addition to the steps described in the OpenStack documentation, the following configuration needs to be performed on the Xen compute nodes

  1. Enable migration (aka relocation) in /etc/xen/xend-config.sxp:  (xend-relocation-server yes)
  2. Define a list of hosts allowed to connect to the migration port in /etc/xen/xend-config.sxp.  To allow all hosts, leave the list empty:  (xend-relocation-hosts-allow '')
  3. Set the ‘live_migration_flag’ option in /etc/nova/nova.conf.  In the legacy xm/xend toolstack, xend implements all of the migration logic.  Unlike the libvirt QEMU driver, the libvirt legacy Xen driver can only pass the migration request on to the Xen toolstack, so the only migration flags needed are VIR_MIGRATE_LIVE and VIR_MIGRATE_UNDEFINE_SOURCE:  live_migration_flag=VIR_MIGRATE_LIVE,VIR_MIGRATE_UNDEFINE_SOURCE
  4. Set the ‘live_migration_uri’ option in /etc/nova/nova.conf.  The default for this option is ‘qemu+tcp://%s/system’.  For Xen, this needs to be ‘xenmigr://%s/system’:  live_migration_uri = xenmigr://%s/system  (the relevant excerpts from both files are collected below)
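For convenience, here are the settings described above collected in one place:

# /etc/xen/xend-config.sxp
(xend-relocation-server yes)
(xend-relocation-hosts-allow '')

# /etc/nova/nova.conf
live_migration_flag=VIR_MIGRATE_LIVE,VIR_MIGRATE_UNDEFINE_SOURCE
live_migration_uri=xenmigr://%s/system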

After these configuration steps, restart xend and nova-compute on the Xen compute nodes to reload the new configuration.  Your OpenStack Xen cluster should now be able to perform live migration as per the OpenStack Using Migration documentation.

On my small cluster, xen71 is the controller node and xen76 and xen77 are the Xen compute nodes.  I booted an instance of a SLES11 SP2 Xen HVM image that was provisioned on xen76.
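The instance was booted with something along these lines (the image and flavor names are taken from the ‘nova show’ output below; exact flags for your environment may differ):

nova boot --image SLES11SP2-xen-hvm --flavor m1.tiny sles11sp2-xen-hvm-test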

stack@xen71:~> nova list
+--------------------------------------+------------------------+--------+--------------------+
|ID                                    | Name                   | Status | Networks           |
+--------------------------------------+------------------------+--------+--------------------+
| 6b45baa2-3dc2-420c-a7ab-aad25fc1aa2a | sles11sp2-xen-hvm-test | ACTIVE | private=10.4.128.2 |
+--------------------------------------+------------------------+--------+--------------------+
stack@xen71:~> nova show 6b45baa2-3dc2-420c-a7ab-aad25fc1aa2a
+-------------------------------------+----------------------------------------------------------+
| Property                            | Value                                                    |
+-------------------------------------+----------------------------------------------------------+
| status                              | ACTIVE                                                   |
| updated                             | 2013-04-05T17:27:16Z                                     |
| OS-EXT-STS:task_state               | None                                                     |
| OS-EXT-SRV-ATTR:host                | xen76                                                    |
| key_name                            | None                                                     |
| image                               | SLES11SP2-xen-hvm (5b39e6b3-bc3f-4fb0-81a0-b115cb8ada80) |
| private network                     | 10.4.128.2                                               |
| hostId                              | cca619a77da34c0c26001fb2438d7cce6a5da6408ae8ec111401f627 |
| OS-EXT-STS:vm_state                 | active                                                   |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000000a                                        |
| OS-EXT-SRV-ATTR:hypervisor_hostname | xen76.virt.lab.novell.com                                |
| flavor                              | m1.tiny (1)                                              |
| id                                  | 6b45baa2-3dc2-420c-a7ab-aad25fc1aa2a                     |
| security_groups                     | [{u'name': u'default'}]                                  |
| user_id                             | 86b77dee688e4eff957865205d27464a                         |
| name                                | sles11sp2-xen-hvm-test                                   |
| created                             | 2013-04-05T17:10:04Z                                     |
| tenant_id                           | 0833047bb70d4b38874328aad83b7140                         |
| OS-DCF:diskConfig                   | MANUAL                                                   |
| metadata                            | {}                                                       |
| accessIPv4                          |                                                          |
| accessIPv6                          |                                                          |
| progress                            | 0                                                        |
| OS-EXT-STS:power_state              | 1                                                        |
| OS-EXT-AZ:availability_zone         | nova                                                     |
| config_drive                        |                                                          |
+-------------------------------------+----------------------------------------------------------+

Now let’s migrate the instance to the xen77 compute node

stack@xen71:~> nova live-migration 6b45baa2-3dc2-420c-a7ab-aad25fc1aa2a xen77

While the migration is in progress, we can see that the status and task state are migrating

stack@xen71:~> nova show 6b45baa2-3dc2-420c-a7ab-aad25fc1aa2a
+-------------------------------------+----------------------------------------------------------+
| Property                            | Value                                                    |
+-------------------------------------+----------------------------------------------------------+
| status                              | MIGRATING                                                |
| updated                             | 2013-04-05T20:16:27Z                                     |
| OS-EXT-STS:task_state               | migrating                                                |
| OS-EXT-SRV-ATTR:host                | xen76                                                    |
| key_name                            | None                                                     |
| image                               | SLES11SP2-xen-hvm (5b39e6b3-bc3f-4fb0-81a0-b115cb8ada80) |
| private network                     | 10.4.128.2                                               |
| hostId                              | cca619a77da34c0c26001fb2438d7cce6a5da6408ae8ec111401f627 |
| OS-EXT-STS:vm_state                 | active                                                   |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000000a                                        |
| OS-EXT-SRV-ATTR:hypervisor_hostname | xen76.virt.lab.novell.com                                |
| flavor                              | m1.tiny (1)                                              |
| id                                  | 6b45baa2-3dc2-420c-a7ab-aad25fc1aa2a                     |
| security_groups                     | [{u'name': u'default'}]                                  |
| user_id                             | 86b77dee688e4eff957865205d27464a                         |
| name                                | sles11sp2-xen-hvm-test                                   |
| created                             | 2013-04-05T17:10:04Z                                     |
| tenant_id                           | 0833047bb70d4b38874328aad83b7140                         |
| OS-DCF:diskConfig                   | MANUAL                                                   |
| metadata                            | {}                                                       |
| accessIPv4                          |                                                          |
| accessIPv6                          |                                                          |
| OS-EXT-STS:power_state              | 1                                                        |
| OS-EXT-AZ:availability_zone         | nova                                                     |
| config_drive                        |                                                          |
+-------------------------------------+----------------------------------------------------------+

Once the migration completes, we can see that the instance is now running on xen77

stack@xen71:~> nova show 6b45baa2-3dc2-420c-a7ab-aad25fc1aa2a
+-------------------------------------+----------------------------------------------------------+
| Property                            | Value                                                    |
+-------------------------------------+----------------------------------------------------------+
| status                              | ACTIVE                                                   |
| updated                             | 2013-04-05T20:11:37Z                                     |
| OS-EXT-STS:task_state               | None                                                     |
| OS-EXT-SRV-ATTR:host                | xen77                                                    |
| key_name                            | None                                                     |
| image                               | SLES11SP2-xen-hvm (5b39e6b3-bc3f-4fb0-81a0-b115cb8ada80) |
| private network                     | 10.4.128.2                                               |
| hostId                              | cabdc8468130edd0f85440f1b2922419b359b3da36a40de98713dbda |
| OS-EXT-STS:vm_state                 | active                                                   |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000000a                                        |
| OS-EXT-SRV-ATTR:hypervisor_hostname | xen76.virt.lab.novell.com                                |
| flavor                              | m1.tiny (1)                                              |
| id                                  | 6b45baa2-3dc2-420c-a7ab-aad25fc1aa2a                     |
| security_groups                     | [{u'name': u'default'}]                                  |
| user_id                             | 86b77dee688e4eff957865205d27464a                         |
| name                                | sles11sp2-xen-hvm-test                                   |
| created                             | 2013-04-05T17:10:04Z                                     |
| tenant_id                           | 0833047bb70d4b38874328aad83b7140                         |
| OS-DCF:diskConfig                   | MANUAL                                                   |
| metadata                            | {}                                                       |
| accessIPv4                          |                                                          |
| accessIPv6                          |                                                          |
| progress                            | 0                                                        |
| OS-EXT-STS:power_state              | 1                                                        |
| OS-EXT-AZ:availability_zone         | nova                                                     |
| config_drive                        |                                                          |
+-------------------------------------+----------------------------------------------------------+

You might notice a small bug here: OS-EXT-SRV-ATTR:hypervisor_hostname is not updated to the xen77 host now running the instance.  It is a minor issue that I will add to my list of bugs needing investigation.