Tuesday, February 21, 2012

S11 and S10 inside LDOM 2.1 on T4

I've finally managed to get some time to play with live migration on a pair of SPARC T4-2 servers. This post does not really add any new information; it is a walk-through with some initial reflections. I am going to continue to write LDOM instead of Oracle VM Server for SPARC domains or something like that. Even Oracle people still say LDOM, and everyone else knows what it is.

An interesting note is that I've used Solaris 10 for the I/O and control domain on the T4 servers, while the LDOM itself is installed with Solaris 11 11/11. The disks for the LDOM are on LUNs over FC, and MPxIO is used for multipathing from the I/O domain:

t42-01# dskinfo list-long
disk size lun use p spd type lb
c0t5000CBA015B85D98d0 279G - rpool - - disk y
c0t5000CBA015B93B90d0 279G - - - - disk y
c0t50002870000254901593534030832420d0 33G 0x0 - 4 4Gb fc y
c0t50002870000254901593534030832420d0 33G 0x1 - 4 4Gb fc y

Examples of migrating and reconfiguring the LDOM while running:

t42-01# ldm list
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- UART 16 16G 0.1% 12d 6h 37m
ldms11-01 active -n---- 5000 16 8G 0.0% 24m

t42-02# ldm list
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- UART 16 16G 0.1% 12d 1h 26m

ldms11-01:~$ uptime
5:11pm up 19 min(s), 1 user, load average: 0.00, 0.00, 0.01
henrikj@ldms11-01:~$ prtconf -v |grep Mem
Memory size: 8192 Megabytes
henrikj@ldms11-01:~$ psrinfo | wc -l
16
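
The transcript skips the migration step itself. As a minimal sketch, assuming the standard `ldm migrate-domain` subcommand and its `-n` dry-run flag, the move from t42-01 to t42-02 could look roughly like this. The `migrate_ldom` wrapper and the `LDM` override variable are hypothetical names used only for illustration, not part of the LDOM tooling.

```shell
# Sketch: live-migrate a domain to another host, with a dry run first.
# migrate_ldom and LDM are hypothetical helper names, not ldm features.
migrate_ldom() {
    domain=$1
    target=$2
    ldm_cmd=${LDM:-ldm}   # override LDM to try this without a real ldm binary

    # -n only performs the migration checks, without moving the domain
    "$ldm_cmd" migrate-domain -n "$domain" "$target" || return 1

    # the actual live migration
    "$ldm_cmd" migrate-domain "$domain" "$target"
}
```

Run on the source control domain this would be `migrate_ldom ldms11-01 t42-02`. ldm normally prompts for the target host's password at each step.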

t42-02# ldm set-vcpu 96 ldms11-01
t42-02# ldm set-memory 200G ldms11-01

t42-02# ldm list
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- UART 16 16G 0.1% 12d 6h 50m
ldms11-01 active -n---- 5000 96 200G 0.1% 24m

ldms11-01:~$ prtconf -v |grep Mem
Memory size: 204800 Megabytes
ldms11-01:~$ psrinfo | wc -l
96
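
To check the new resource assignment from the control domain without logging in to the guest, the VCPU and MEMORY columns can be pulled out of the `ldm list` output shown above. This is only a sketch: `ldom_resources` is a hypothetical helper name, and it assumes the eight-column layout from the listings in this post.

```shell
# Sketch: print the VCPU and MEMORY columns for one domain.
# Reads `ldm list` output on stdin; ldom_resources is a hypothetical name.
# On Solaris 10, use nawk or /usr/xpg4/bin/awk, since the old
# /usr/bin/awk does not support -v.
ldom_resources() {
    awk -v dom="$1" '$1 == dom { print $5, $6 }'
}
```

Used as `ldm list | ldom_resources ldms11-01`, which for the listing above would give the VCPU count and memory size of the migrated domain.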

When performing a live migration between the two hosts, running processes and open network connections stay intact, as expected. The only visible effect is a small delay in the network traffic. In my initial tests the delay was about 10 ms.

The live migration seems to work very well, and the T4 seems to perform several times faster than the T2/T3 for general workloads. The only thing missing is that LDOM 2.1 is unable to dynamically reconfigure memory and CPU resources for a domain after migration; a reboot is then required. Hopefully this will be fixed in the 3.0 release, which people at Oracle OpenWorld said would focus on removing current limitations (including migration between different types of sun4v processors).