Wednesday, December 30, 2009

Looking back a decade, part 1

It's now a new decade and a good time to take a look at the important things that have happened over the last 10 years, with a focus on Sun and Solaris. I'll try to reflect on the good things and mention some of the bad things that have forced Solaris and Sun through a series of transitions. I'll split this into three posts. I'll begin around the year 2000, or rather in the 1998-2002 range.

The new millennium began with massive hysteria about the Y2K bug: everybody was upgrading servers, storing food and preparing for the final hour. Sun had spent the nineties transforming its business from workstations towards servers and fighting for open standards and network computing. Many Solaris sites were at the time running Solaris 2.6 and applying all the "Y2K" patches to their systems.

Solaris 7 was available, but since it was the first 64-bit release of Solaris, third-party development was lagging. Veritas Volume Manager (VxVM), which at the time was close to a mandatory part of every enterprise installation, was delayed for a very long time. Sun was without a good volume manager for Solaris: UFS was yet to be enhanced for better performance, and DiskSuite had not been kept up to date, since Sun had even bundled VxVM. All this was probably reason enough to take a fresh look at a new storage solution; I think an early internal name for that project was "pacific", but I might be wrong. Probably as an interim solution, the Solstice volume manager was integrated into the next release, Solaris 8, as the Solaris Volume Manager.

Apart from the lag in third-party drivers, the transition to 64-bit was made early in Solaris, and without any of the hassle we are seeing on other platforms today, ten years later. There was a single distribution which could run either a 32-bit or a 64-bit kernel, and under the latter, binaries of different bit widths could execute side by side.
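You can still see this dual nature on any current Solaris system with isainfo(1); on a 64-bit x86 machine the output looks roughly like this (exact wording may differ between releases):
$ isainfo -kv
64-bit amd64 kernel modules
$ isainfo -v
64-bit amd64 applications
32-bit i386 applications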

Solaris 2.6 to Solaris 7 were dependable and competitive for the time, but compared to what we are used to today they were a bit slower and more of a standard System V implementation than today's Solaris. They were also more "bare", with little additional software included beside the core OS. Several administrative tools that we take for granted today were not available; some of them, such as prstat, pgrep, pkill and mdb, were introduced with Solaris 8. The Solaris development process was also much more closed: no source, no blogs about new features, no public ARC cases and no external development builds to test.
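If you have forgotten how much these small tools simplify daily work, here are a few typical one-liners (the process names are just examples):
$ prstat -s size          (processes sorted by memory size)
$ pgrep -x sshd           (find PIDs by exact process name)
$ pfexec pkill -HUP -x syslogd   (signal processes by name)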

On the hardware side, the UltraSPARC II was powering the Ultra Enterprise series. While it was the first high-volume server series from Sun, it was dependable and sold very well. The sizes ranged from the one-socket Ultra Enterprise 1 to the 64-socket Ultra Enterprise 10000 (StarFire). With such large systems available even in the late nineties, it's no surprise that Solaris has had little problem scaling on today's multi-core processors. I remember wondering if I could get my hands on an E3000 when it was time for its retirement; over ten years later I now have one as a beer table beside my armchair, not quite what I intended back then.

Sun had several very good years in the late nineties: they were the number one Unix vendor, and for a short time I think they were even one of the top storage resellers, much due to all the internal drives they shipped. But when the decline began in 2000, Sun was hit hard. The investments in research and development did however continue, which bore fruit some years later and is still making a difference, but more on that in the next post.

Looking back, I sometimes wonder what would have happened if Sun had made the decision to make Solaris freely available and open already at this time. It would probably have grown the community and general acceptance much faster, since Solaris had more presence outside server environments then: it was more widely used at universities and still had some of the workstation business left. Then again, none of the other big vendors have released their source; it's not available for AIX, IRIX, HP-UX or Windows. The initial lack of commitment to the x86 platform probably also hurt Sun when x86 became the de facto standard for small servers, but as long as it did not pose any threat to SPARC, too little effort was put into the alternative hardware platform.

Sunday, December 27, 2009

Build 130 available

There is some latency here during the holidays, but on Christmas Eve the latest development version of OpenSolaris 2010.03 was released, based on build 130. This new release has moved to new major versions of Xorg (1.7.3) and Thunderbird (3.0), and includes later versions of Python and Ruby among other changes. There have also been some visible feature enhancements to the installation system; read more in the full announcement:

"The ability to install on to an iSCSI target[1]

A new "bootable" Automated Install (AI) CD and USB ISO that
does not require an AI server to be setup[2]

Distribution Constructor enhancements to create OVF 1.0 virtual
machine (VM) images[3]

Extended and logical partition support for both the Live CD and
AI installers[4]"


Since we are getting close to the release and focus now is on fixing bugs (and delivering the text based installer), please give this build a spin, and if you find any new bugs please report them as soon as possible. Look here for a quick introduction to searching the bug database and reporting new bugs. Be sure to read the announcement before installing/upgrading; it lists several known bugs.

Note: there seem to be some problems on graphical workstations with this version, more specifically with Nvidia OpenGL; see the announcement and bug 6912450. It's probably a good idea to disable compiz prior to upgrading to b130, since failure of OpenGL will make the whole desktop unusable if compiz is enabled.
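I don't know of an official switch for this, but one way that should work on a standard GNOME desktop is to fall back to the metacity window manager before upgrading:
$ metacity --replace &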

I've also updated the software version table.

ISO/USB images for SPARC/X86 are as usual available at www.genunix.org.

Thursday, December 17, 2009

New and improved CPU support

OpenSolaris 2010.03 will add support for two new generations of processors, one SPARC and one AMD x86. Fault management will also be enhanced for the upcoming Intel Nehalem EX.

Rainbow Falls, or perhaps "UltraSPARC T3"
Support for the next generation of CMT processors from Sun, planned for 2010, was added in build 131. This new chip will have 16 cores with 8 (or even 16?) threads each. No public PSARC case or bug id is available, but here is the putback:
PSARC 2009/177 Solaris support for Rainbow Falls platforms
Here is a presentation of the CPU from the hotchips conference:
Sun's Next Generation CMT Processor

AMD Magny-Cours processor
An 8-12 core CPU that should appear in 2010; the following changes were integrated into build 128:
6843035 Need support for Magny-Cours processors
6860401 FMA CPU Topology & Memory Topology needs to support Magny Cours (Multi chip Module)

Intel Nehalem EX
Intel's next Xeon CPU, with 8 cores and 2 threads per core. It is already supported in OpenSolaris 2009.06, but FMA support was added in b127:
FMA for Intel Nehalem EX
There is also a whitepaper on Nehalem support in OpenSolaris, worth a read:
The Solaris OS and Intel Nehalem EX

Sun/Oracle merger news

There has been some news regarding the Sun/Oracle merger this week, and it finally looks like the deal will be approved. Oracle announced its plans and commitments for MySQL this Monday, and yesterday the New York Post wrote that the merger got the green light.

"Larry Ellison's Oracle won an all-clear from Europe's top antitrust official to proceed with its $7.4 billion purchase of Sun Microsystems after reaching a verbal agreement to protect a piece of software at the heart of a months-long dispute."

Read the whole article here. Let's hope this is true and that we'll see the end of this long story soon.

ZFS SNIA presentation

This is a couple of months old now, but it covers many of the new ZFS features, much the same as were discussed at the kernel conference. Many of the features are now integrated in OpenSolaris, such as zpool recovery, deduplication, raidz3 and quotas. Some of the features mentioned that will be released at a later date include BP rewrite and encryption.

ZFS the next word...

Thursday, December 10, 2009

OpenSolaris software table

I've compiled a version list of some software packages which are part of the installation or available in the default repository for all OpenSolaris releases. There are quite possibly different versions available in other repositories, but these are the default versions shipped by Sun and integrated with the rest of the release. If multiple versions are available, only the latest is listed. The latest development release, which should go final in March next year under the name 2010.03, is also included; it is of course subject to change, so the table will probably be updated at least when the final release is available.

Update: The table has been updated to reflect the current state as of build 132.

Release          2008.05      2008.11      2009.06      2010.03 [1]
ON build         86           101          b111         b133
Xorg             7.2.5/1.3.0  7.2.5/1.3.0  7.4/1.5.3    7.5/1.7.4
Gnome            2.20.2       2.24.0       2.24.2       2.28.0
Zpool            10           13           14           22
ZFS              3            3            3            4
Firefox          2.0.0.14     3.0.4        3.1B3        3.5.7
Thunderbird      2.0.0.12     2.0.0.17     2.0.0.21     3.0.1
Postgres         8.2.6        8.3.4        8.3.7        8.4.2
MySQL            5.0.45       5.0.67       5.1.30       5.1.37
Python           2.4.4        2.5          2.6.1        2.6.4
Perl             5.8.4        5.8.4        5.8.4        5.10.0
PHP              5.2.4        5.2.4        5.2.9        5.2.12
Ruby             1.8.6p110    1.8.6p287    1.8.7p72     1.8.7p174
GCC              3.4.3        3.4.3        4.3.2        4.3.2
xVM (XEN)        3.1          3.1          3.1          3.4
Java             1.6.0_04     1.6.0_10     1.6.0_13     1.6.0_18
Nvidia drivers   169.12       177.8        180.44       190.53
Mesa             6.5.2        7.0.4        7.2.5        7.4.4
Apache           2.2.8        2.2.9        2.2.11       2.2.14
OpenSSL          0.9.8a [2]   0.9.8a [2]   0.9.8a [2]   0.9.8l


1. Latest build in the development branch; 2010.03 is scheduled for release in March, target build is 134.
2. OpenSSL 0.9.8a has various security patches applied, which differ between the releases.

Tuesday, December 8, 2009

Last open build before 2010.03 closed (b130)

There will not be any more major features added to the OpenSolaris builds before the next release; build 130 was closed today, and the gate will now begin to freeze in preparation for the 2010.03 release. The next four builds will focus mainly on bug fixes and device drivers.

Here is the schedule for the remaining planned builds before release:
onnv_131  01/04/2010  01/11/2010   Bug fixes, escalations & device drivers only
** Winter Break (4 week build)
onnv_132  01/19/2010  01/25/2010*  Bug fixes, escalations & device drivers only
onnv_133  02/01/2010  02/08/2010   Stopper fixes only
onnv_134  02/16/2010  02/22/2010*  Stopper fixes only

heads-up: guidelines for builds 127-130
ON Build and WOS Delivery Schedule

Sunday, December 6, 2009

No more NIS+

It has been planned for years, and now NIS+ has been removed from the development branch of OpenSolaris; soon all that nistbladm(1) skill will be obsolete. Since the removal was integrated in time for b129, the first Solaris without NIS+ will be OpenSolaris 2010.03. If you still use NIS+ you can continue to use it as long as you stick with Solaris 10, but it might be time to start planning the migration.
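If you are unsure whether a machine is still a NIS+ client, checking the name service switch and the client service should give a quick answer (I'm assuming the service FMRI below; verify with svcs -a):
$ grep nisplus /etc/nsswitch.conf
$ svcs -l network/rpc/nisplus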

NIS+ End-of-Feature (EOF) Announcement FAQ
PSARC/2009/503 Removal of NIS+

Thursday, December 3, 2009

First OpenSolaris build with ZFS dedup available

A new build (128a) in the development branch of OpenSolaris was released today. This build has support for the long awaited ZFS deduplication; now you can download it and try it for yourself. The build also includes other interesting features, such as zfs send dedup and zpool recovery.
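If your system already follows the development repository, getting the new build is the usual image update into a new boot environment, then a reboot:
$ pfexec pkg refresh
$ pfexec pkg image-update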

While I'm at it, there have been a few other software releases this week:

NetBeans 6.8 Release Candidate 1 has been released and is available here.

VirtualBox 3.1 made it into a final release and is available for download.

Wednesday, December 2, 2009

OpenSolaris Text based installer update

There are finally some characters on the screen for the text based installer project: the project pages have been updated. There is now access information for the project gate, build instructions and a link to a code review (which does not seem to work right now).

The text installer is an essential part of bringing OpenSolaris into the datacenter, especially for SPARC servers, which seldom have any framebuffers. Today the only installation option for OpenSolaris on SPARC is the Automated Installer (AI), which requires a DHCP service, an installation server and either access to external package repositories or an internal mirror.

This project will make it much easier to deploy single instances of OpenSolaris on your SPARC systems. X86 will also benefit from this, especially when installing inside xVM domains, which today requires setting up VNC to perform a graphical installation.

A prototype for X86 was created this summer, read more here. An option for interactive installation on SPARC has been promised for the next OpenSolaris release.

Tuesday, December 1, 2009

Installing the new Oracle 11g on OpenSolaris

I've already mentioned that Oracle 11g Release 2 is now available for Solaris 10 on the X64 platform. I tried to install this release on the latest OpenSolaris development build (b127); it initially failed, but I found a quick workaround. This is of course not suitable for anything other than pure testing and fun, since the release is currently only certified for Solaris 10 (Update 6 and later).

The helpful error message from the installer was "Error in invoking target 'all_no_orcl' of makefile", but the log file shows what had gone wrong: "INFO: ld: fatal: library -lcrypto: not found"

The problem is that the OpenSSL libraries have moved out of /usr/sfw/lib in OpenSolaris; they are now located under /lib. The Oracle installer supplied the linker with the old location "-L/usr/sfw/lib/amd64", thus it failed.

A quick and very dirty fix to this is to provide a link to the library at the old location:
# ln -s /lib/amd64/libcrypto.so /usr/sfw/lib/amd64
Please drop a comment if there is a way of redirecting Oracle instead, but this was good enough for my little test.

After this I got the RDBMS up and running, including the web based Oracle Enterprise Manager if that's your thing.

As with Solaris 10, Oracle works fine inside zones, but in OpenSolaris zones contain only a bare minimum of packages by default, therefore the following must be installed in the zone before Oracle:
# pkg install SUNWsprot SUNWmfrun SUNWuiu8 SUNWunzip SUNWbtool

Wednesday, November 25, 2009

Crossbow for the win

Here's a very good paper on Crossbow, the network virtualization and resource control framework that has been available in OpenSolaris since 2009.06: Crossbow Virtual Wire: Network in a Box.

This paper even won the best paper award at Usenix LISA 2009. The blog of one of the authors is available here, where he writes about the award and the BOF at Usenix LISA 09.

Well worth a read if you don't know what Crossbow is and can do, and if you do, it's still worth a read.

A little news summary

All this is available in other blogs at blogs.sun.com, but here's a summary of some Solaris related news.

Oracle 11g Release 2 has been released for Solaris 10 X64; a download is available here. It's nice to see even more Solaris commitment from Oracle.

VirtualBox 3.1 Beta 3 is also available, more details here. VirtualBox 3.1 comes with new features such as live migration between hosts, enhanced USB support in OpenSolaris, better snapshot functionality, faster 2D acceleration and support for EFI.

US Senators Go to Bat for Oracle, Sun Merger: 59 senators also think it's about time to let the Oracle-Sun deal proceed.

While I'm at it, a beta of NetBeans 6.8 is available; sadly they do not seem to put much effort into the Python parts. Support for interpreted languages in NetBeans is mostly about Ruby and PHP. More focus on Python would have been nice, since Python seems to be the interpreted language of choice in OpenSolaris: the Image Packaging System (IPS) is built with it. That said, there is a Python module available for NetBeans, but it doesn't get the same development attention.

Monday, November 23, 2009

ZFS crypto pushed to next year

With only a few weeks left of open builds, it might not come as a surprise that crypto for ZFS is not making it into 2010.03. I noticed that the ZFS crypto page has been updated with a new target date: "Integration Target: Q1CY10".

This is probably wise, with lots of fixes and new features for ZFS integrated since the last OpenSolaris release. This means that two out of the four upcoming ZFS features that I wrote about in March made it in time for OSOL 2010.03. Hopefully both crypto and BP rewrite will be ready in time for the next (Open)Solaris release, whenever and however the new masters of Sun* decide to release it.

* Let's hope that the European Commission has finally come to its senses and freed Sun from this limbo by then. I guess they will at least delay this as long as they possibly can (mid January). Keeping the current pace, the next release would probably come at least 6 months after 2010.03, so about a year from now.

Sunday, November 22, 2009

Faster resilver for zpools

Prior to this putback, ZFS did not do any prefetching of data when resilvering or scrubbing a pool. This made such operations more time consuming than they would need to be. Since resilvering a large pool can take days, anything that speeds up such operations can make quite a difference in time spent without sufficient replication of data. Fortunately, faster resilvering for zpools is on its way into OpenSolaris with the putback of "6678033 resilver code should prefetch". The gain will of course depend on your pool, but I'll try to find time for testing so that I can get back with some numbers in a later post.

Since scrub and resilvering share the same code, this should improve scrubbing performance as well. Scrub prefetch was mentioned in the KCA 2009 keynote.
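An easy way to quantify the gain once the build arrives is to time a scrub before and after upgrading (hypothetical pool name); zpool status shows the progress and, once finished, the total time the scrub took:
# zpool scrub tank
# zpool status tank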

Friday, November 20, 2009

xVM sync with xen 3.4 integrated

Good news for those of us who use xVM in OpenSolaris: the sync with Xen 3.4 has been integrated into ONNV. This means that it should be available in build 129, which should be released mid December.

Changes from the original 3.4 announcement from Xen:
" - Device passthrough improvements, with particular emphasis on support for
client devices (further support is available as part of the XCI project at
http://xenbits.xensource.com/xenclient/)
- RAS features: cpu and memory offlining
- Power management - improved frequency/voltage controls and deep-sleep
support. Scheduler and timers optimised for peak power savings.
- Support for the Viridian (Hyper-V) enlightenment interface
- Many other x86 and ia64 enhancements and fixes"

It does not look like there is support for device passthrough of PCI devices in Solaris yet though, so that part of the announcement above is probably irrelevant to xVM at this point.

More info on the putback is available here: http://hg.genunix.org/onnv-gate.hg/rev/fe619717975a

Thursday, November 19, 2009

Deduplication with zones

One of the major strengths of zones in Solaris is that they are very lightweight: since they share the same kernel, they have low CPU, I/O and memory overhead. Solaris 10 offers the ability to create "sparse" zones, where the local zones share most of their binaries and libraries with the global zone. This does not only save space, it also saves memory, since all zones share the same instances of common binaries and libraries. The downside of sparse zones is that they have a very strong relationship with the global zone, and no modifications unique to any zone can be made to the shared filesystems.

In OpenSolaris and later updates of Solaris 10, the ability to clone a zone is available. A zone is installed on a ZFS filesystem, of which a clone is created for every new zone. Only minor modifications are made to the cloned filesystem to give the zone its unique identity. This works much like deduplication, until you patch or upgrade the system, which will make all the clones contain their own copies of the new data even when it's common to other zone instances.
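For reference, cloning a zone is just a matter of copying the configuration and pointing zoneadm at the source zone; a minimal sketch with hypothetical zone names (edit the zonepath in the exported file before importing it):
# zonecfg -z zone1 export -f /tmp/zone2.cfg
# zonecfg -z zone2 -f /tmp/zone2.cfg
# zoneadm -z zone2 clone zone1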

Sparse zones are not supported by the new packaging system in OpenSolaris, and they might never be. But zones in OpenSolaris install only a very basic set of packages, which makes a clean install of a zone very small to begin with; they can then be placed on a compressed filesystem, and in OpenSolaris 2010.03 this filesystem can also be deduplicated.
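Setting this up is just a couple of properties on the filesystem holding the zone roots; a minimal sketch, assuming a dedicated pool named zdup01 as in the test below:
# zfs create zdup01/zones
# zfs set compression=on zdup01/zones
# zfs set dedup=on zdup01/zones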

I've done a small test to see how much space is used by every zone instance with both compression and deduplication. These are freshly installed zones which have been booted once, so that everything has been initialized in the zones:

A single zone on an LZJB compressed and deduped ZFS filesystem:
NAME     SIZE  ALLOC   FREE   CAP  DEDUP  HEALTH  ALTROOT
zdup01  9.94G   241M  9.70G    2%  1.01x  ONLINE  -
Two zones:
NAME     SIZE  ALLOC   FREE   CAP  DEDUP  HEALTH  ALTROOT
zdup01  9.94G   253M  9.69G    2%  1.99x  ONLINE  -
Three zones:
NAME     SIZE  ALLOC   FREE   CAP  DEDUP  HEALTH  ALTROOT
zdup01  9.94G   263M  9.68G    2%  2.96x  ONLINE  -
So every zone uses a little more than 10MB of disk space, and with deduplication you also get the same benefits in memory footprint as with a sparse zone, since there is only one deduplicated instance of the Solaris libraries and binaries for all the zones. They are however not shared with the global zone, since it boots from a separate pool without compression and deduplication. Unlike zones on cloned ZFS filesystems, the deduplication will continue to work after upgrading the zones, and also for software added to the zones after install time, for example from the pkg repositories.

It looks like I can continue to run 20 zones on my thirteen year old Ultra 2 workhorse even if I upgrade it to OpenSolaris one day.

Monday, November 9, 2009

ZFS send dedup integrated

Moments ago, one week after zpool dedup was integrated, similar functionality was added for zfs send streams. It looks like OSOL 2010.03 is going to get quite a lot of new ZFS features.

zfs send with the new -D option will dedup the streams created, thereby possibly reducing the bandwidth or disk space used by the stream. It is not dependent on pool-level dedup.
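Usage should be as simple as adding the flag to an ordinary send; a hypothetical example sending a deduped stream to another host:
# zfs send -D tank/data@snap1 | ssh backuphost zfs recv backup/data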

From PSARC/2009/559:

"OVERVIEW:

"Dedup" is an overall term for technologies that eliminate duplicate
copies of data in storage or memory. This specific application of
dedup is for ZFS send streams, i.e., the output of the 'zfs send' command.
For some kinds of data, much of the content of a send stream consists
of blocks for which identical copies have already been sent earlier
in the stream. This technology replaces later copies of a block with
a reference to the earlier copy. This can significantly reduce the
size of a send stream, which reduces the time it takes to transfer
such a stream over a communication channel."

Here is the changeset: http://hg.genunix.org/onnv-gate.hg/rev/216d8396182e

If all goes well this will, together with pool level dedup (and lots of other changes), be part of build 128, which should arrive early December.

Wednesday, November 4, 2009

Quick spin with ZFS dedup

I've had a quick look at deduplication in ZFS; it works as expected and seems quite fast in my simple tests.

Enabling dedup couldn't be easier:
# zfs set dedup=on zdedup01

Simplest case: the same file under two different names gives a dedup factor of 2:
# cp Solaris/sol-nv-b121-x86-dvd.iso /zdedup01
# cp Solaris/sol-nv-b121-x86-dvd.iso /zdedup01/duplicate.iso
# zfs list zdedup01
NAME       USED  AVAIL  REFER  MOUNTPOINT
zdedup01  6.91G  55.6G  6.90G  /zdedup01
# zpool list zdedup01
NAME       SIZE   USED  AVAIL  CAP  DEDUP  HEALTH  ALTROOT
zdedup01  63.5G  3.47G  60.0G   5%  2.00x  ONLINE  -
# ls -lh /zdedup01
total 6.9G
-rw-r--r-- 1 root root 3.5G 2009-11-04 22:52 duplicate.iso
-rw-r--r-- 1 root root 3.5G 2009-11-04 22:51 sol-nv-b121-x86-dvd.iso

ZFS dedup is block based, that is, multiple blocks with the same checksum will point to a single block. So if the exact same data appears more than once but with different block alignment, it won't get deduped.

Unarchiving a tar archive: here the block alignment, and therefore the checksums of the blocks, will differ, so there is no dedup:
# cp sunstudio.tar /zdedup01
# cd /zdedup01
# tar xf sunstudio.tar
# zpool list zdedup01
NAME       SIZE   USED  AVAIL  CAP  DEDUP  HEALTH  ALTROOT
zdedup01  63.5G  1.76G  61.7G   2%  1.00x  ONLINE  -

Empty files give quite a nice dedup ratio:
# mkfile 5G testfile
# zpool list zdedup01
NAME       SIZE   USED  AVAIL  CAP      DEDUP  HEALTH  ALTROOT
zdedup01  63.5G  1.73M  63.5G   0%  40960.00x  ONLINE  -

In practice it should give a ratio that is on par with the actual duplication when dealing with ordinary files such as binaries, executables, application installations, zones etc. The ratio is harder to estimate for virtual server disk images (or iSCSI LUNs). A very quick test with two VirtualBox Solaris 10 U8 (core installation) images showed 35 percent saved disk space:
NAME       SIZE  USED  AVAIL  CAP  DEDUP  HEALTH  ALTROOT
zdedup01  63.5G  984M  62.5G   1%  1.35x  ONLINE  -

Deduplication of course also works with compression enabled (the checksums used for dedup are of the compressed data):
# zfs get compressratio zdedup01
NAME      PROPERTY       VALUE  SOURCE
zdedup01  compressratio  1.43x  -
# zpool list zdedup01
NAME       SIZE  USED  AVAIL  CAP  DEDUP  HEALTH  ALTROOT
zdedup01  63.5G  709M  62.8G   1%  1.25x  ONLINE  -

Monday, November 2, 2009

ZFS Deduplication!

It looks like ZFS deduplication has finally arrived! Jeff has made the following putback, which will be part of build 128:

PSARC 2009/571 ZFS Deduplication Properties
6677093 zfs should have dedup capability

Have a closer look here: http://hg.genunix.org/onnv-gate.hg/rev/e2081f502306

I'll post more details when I've had some time to look at the change.

Update: No need for me to blog about it; Jeff has his own blog with a brand new entry on deduplication: http://blogs.sun.com/bonwick/en_US/entry/zfs_dedup

Friday, October 30, 2009

zpool recovery support integrated!

Zpool recovery was just integrated into the ONNV gate and will be part of OpenSolaris development build 128 in about a month. Read more about it in this previous entry: Zpool recovery support (PSARC/2009/479).

In short: if you have hardware that does not honor cache flush requests or write ordering (some cheap USB drives, for example), then combined with, say, a loss of power, a pool can end up damaged. This option provides a chance of recovering such a pool in an automated way, by reverting to an older but sane transaction group.

From the new zpool.1m:
"Recovery mode for a non-importable pool. Attempt to
return the pool to an importable state by discarding the
last few transactions. Not all damaged pools can be
recovered by using this option. If successful, the data
from the discarded transactions is irreversibly lost.
This option is ignored if the pool is importable or
already imported."

"# zpool clear -F data
Pool data returned to its state as of Tue Sep 08 13:23:35 2009.
Discarded approximately 29 seconds of transactions."

Here is the changeset: http://hg.genunix.org/onnv-gate.hg/rev/8aac17999e4d

Thursday, October 29, 2009

The curious case of the strange ARC

I've recently encountered some strange ZFS behavior on my OpenSolaris laptop. It was installed about a year ago with OpenSolaris 2008.11, and I have since upgraded to about every developer release available. Every time it's upgraded, lots of updated packages are downloaded. The Image Packaging System caches downloaded data in a directory that can become quite large over time; in my case the "/var/pkg/download" directory had half a million files in it, consuming 6.6 gigabytes of disk space.

Traversing through all these files using du(1B) took about 90 minutes, a terribly long time even for half a million files. Performing the same operation once more took about the same time, which raised two questions. First, why is it so terribly slow to begin with? Second, why doesn't the ARC cache the metadata so that the second run is much faster?
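For the record, the measurement itself is nothing more sophisticated than timing the traversal:
# ptime du -sh /var/pkg/download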

Looking at the activity on the machine, the CPU was almost idle but the disk was close to 100 percent busy for the whole run. Arcstat reported that the space used by the ARC was only a third of the target.
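If you don't have the arcstat script at hand, the same numbers can be pulled straight from the arcstats kstat, where size is the current usage and c is the target:
$ kstat -p zfs:0:arcstats:size
$ kstat -p zfs:0:arcstats:c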

ZFS can get problems with fragmentation if a pool is allowed to get close to full. This pool had been upgraded 12 times over the last year, and every upgrade creates a clone of the filesystem and performs a lot of updates to the /var/pkg/download directory structure. Fragmentation could explain the extremely slow initial run, but the ARC should still cache the data for a fast second pass.

Replicating the directory to another machine running the very same OSOL development release (b124) and doing the same test performs much better:

Initial runtime:     source ~90m   replica ~3m
Second runtime:      source ~90m   replica ~15s
Reads/s initial:     source ~200   replica 5-6K
Reads/s second run:  source ~200   replica 46K

If we presume that we are correct about the fragmentation problem, all data has now been rewritten at the same time to a pool with plenty of free space. This can explain why the initial pass is now much faster. But why does the ARC speed up the second run on this machine? Both machines are of the same class (~2GHz x86, 2.5" SATA boot disk) but the second machine has more memory. That shouldn't matter though, since we have plenty of room left in the ARC even on the slow machine. Some digging shows that there is a limit on the amount of metadata cached in the ARC:

# echo "::arc" | mdb -k |grep arc_meta
arc_meta_used = 854 MB
arc_meta_limit = 752 MB
arc_meta_max = 855 MB

This is something to look into. What's happening is that the ARC is of no use at all in this situation. It is based on least recently used (LRU) and most frequently used lists, and everything under this directory is read the same number of times each pass, and in the same order, filling the ARC and pushing out entries before they can be used. The arc_meta_limit is set to 1/4 of the ARC, which is too small on this 4GB system, so let's try raising the limit to 1GB and run the tests again:

# echo "arc_meta_limit/Z 0x4000000" | mdb -kw

Traversing the directory now takes about 10 minutes after the initial run: better, but still terribly slow for cached data, and the machine is still using the disks extensively. This is caused by access time updates on all files and directories, and remember that the filesystem is terribly fragmented, which of course is a problem for this operation as well. We can turn off atime updates for a filesystem in ZFS:

# zfs set atime=off rpool/ROOT/opensolaris-12

Now a second pass over all the data takes well under a minute. The ARC problem is solved, and the fragmentation problem can be fixed by making some room in the pool and copying the directory so that all data is rewritten. Note that this data could also simply be removed without any problems, more on this here.

There is no defragmentation tool or function in ZFS (yet? it would depend on BP rewrite, like everything else), and most of the time it's not needed. ZFS is a copy-on-write (COW) filesystem that is fragmented by design and should deal with it in a good way. It works perfectly fine most of the time; I've never had any issues with fragmentation on ZFS before. In the future I will make sure to keep some more free space in my pools to minimize the risk of something similar happening again.

I plan to write more in detail about the fragmentation part of this issue in a later post, stay tuned.

Thursday, October 22, 2009

Solaris 10 containers integrated

Solaris 10 has a technology known as brandz, which can be seen as a translation layer between the Solaris kernel and a local zone. This layer can be used to provide an execution environment inside the zone that mimics another release of Solaris, or even another OS. It has been used to create the LX brand, which runs Linux inside a container, and it has also been used to create Solaris 8 and 9 containers.

This has probably helped accelerate Solaris 10 adoption and made it easier for customers to take advantage of new Solaris features and hardware even when their application environment could not be upgraded. A good example is that you can take a Solaris 8 application, put it in a branded zone and use DTrace, which was not available before Solaris 10, to debug the application.

Now that Solaris 10 containers have integrated, the plan is to provide a similar service for Solaris 10, so that installations and zones can be migrated into Solaris 10 containers on OpenSolaris. This is probably even more important now, since all versions of Solaris after 10 will feature a whole new packaging system. Upgrading from Solaris 10 to any later Solaris release is not possible, and probably never will be in the way we have known upgrades in earlier Solaris releases.

It will be possible to migrate both existing Solaris 10 zones (v2v) and whole installations (p2v) into containers on OpenSolaris.
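Details may of course change before this ships, but judging from the case materials, a p2v migration should look roughly like archiving the Solaris 10 system and attaching the archive to a solaris10-branded zone. The template name and flags below are my assumptions:
# flarcreate -n s10system /net/archives/s10system.flar   (on the Solaris 10 host)
# zonecfg -z s10zone 'create -t SYSsolaris10; set zonepath=/zones/s10zone'
# zoneadm -z s10zone install -a /net/archives/s10system.flar -p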

You can take a look at the change here, where there is also a link to the PSARC case, which contains much more information.

Wednesday, October 21, 2009

OSOL 2010.03, the story so far

It has been some time since the OpenSolaris 2009.06 release, and the developers have been busy. I've compiled a list of changes in ONNV that caught my attention, along with a few new packages added to the repository.

A few of the major features so far, in my opinion:
  • Triple-parity raidz, offering even better data protection than raidz2, enabling even wider stripes and/or more fault tolerance for zpools (see the sketch further down).
  • xVM is synced with XEN 3.3, up from version 3.1, offering better compatibility and performance.
  • Crossbow enhancements, bridging, anti-spoofing, Solaris packet capture
  • ZFS User and group quotas

If you find anything you like, you can try it out in the latest development release, which is available here.
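For instance, creating a triple-parity pool is just a new keyword to zpool; a minimal sketch with hypothetical device names:
# zpool create tank raidz3 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0
# zpool status tank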

ZFS/Storage
triple-parity RAID-Z (6854612)
ZFS user/group quotas & space accounting (PSARC/2009/204)
iSCSI initiator tunables (PSARC/2009/369)
zpool autoexpand property (PSARC/2008/353)
COMSTAR Infiniband SRP Target - PSARC/2009/111
ZFS logbias property (PSARC/2009/423)
Removing a slog doesn't work (6574286)
FCoE (Fibre Channel over Ethernet) Target (PSARC/2008/310)
FCoE (Fibre Channel over Ethernet) Initiator (PSARC/2008/311)
Multiple disk sector size support (PSARC/2008/769)
zfs snapshot holds (PSARC/2009/297)
SATA Framework Port Multiplier Support (PSARC/2009/394)
zfs checksum ereport payload additions (PSARC/2009/497)
Solaris needs reparse point support (PSARC 2009/387)
ZFS support for Access Based Enumeration (PSARC/2009/246)
If 'zfs destroy' fails, it can leave a zvol device link missing (6438937)
zpool destruction/export should better handle stale zvol links (6573142)
zpool import with 8500 snapshots took 11hours (6761786)
zfs caching performance problem (6859997)
stat() performance on files on zfs should be improved (6775100)

Network
Solaris gldv3/wifi needs to support 802.11n (6814606)
RBridges: Routing Bridges (PSARC/2007/596)
Solaris Bridging (PSARC/2008/055)
Bridging Updates (PSARC/2009/344)
Clearview IP Tunneling (PSARC/2009/373)
Datalink Administration from Non-Global Zones (PSARC/2009/410)
Solaris Packet Capture (PSARC/2009/232)
Anti-spoofing Link Protection (PSARC/2009/436)
flowadm(1m) remote_port flow attribute (PSARC/2009/488)

Other
Boomer: Next Generation Solaris Audio (PSARC/2008/318)
ls enhancements (PSARC/2009/228)
Upgrade NTP to Version 4 (PSARC/2009/244)
Solaris on Extended partition (PSARC/2006/379)
Disk IO PM Enhancement (PSARC/2009/310)
System Management Agent (SMA1.0) migration to Net-SNMP 5.4.1 (LSARC/2008/355)
Upgrade OpenSSL to 0.9.8k (6806386)
Need to synch with newer versions of Xen and associated tools (6849090)
LatencyTOP for OpenSolaris (PSARC/2009/339)
Wireless USB support (PSARC/2007/425)

New drivers:
Atmel AT76C50x USB IEEE 802.11b Wireless Device Driver (PSARC/2009/143)
Ralink RT2700/2800 IEEE802.11 a/b/g/n wireless network device (PSARC/2009/167)
RealTek RTL8187L USB 802.11b/g Wireless Driver - (PSARC/2008/754)
add a VIA Rhine Ethernet driver to Solaris (PSARC/2008/619)
audio1575 driver available
audiocmi driver (PSARC/2009/263)
bfe fast ethernet driver (PSARC/2009/242)
Driver for LSI MPT2.0 compliant SAS controller (PSARC/2008/443)
Atheros AR5416/5418/9280/9281/9285 wireless Driver (PSARC/2009/322)
audiovia97 (PSARC/2009/321)
Myricom 10 Gigabit Ethernet Driver (PSARC/2009/185)
Atheros/Attansic Ethernet Gigbit Ethernet Driver (PSARC/2009/405)
audiols driver (PSARC/2009/385)
audiosbp16x audio driver (PSARC/2009/384)
Marvell Yukon Gigabit Ethernet Driver (PSARC/2009/190)
audiosolo driver (PSARC/2009/487)

Some highlights among the over 200 packages added to the repository:
SUNWwireshark Wireshark - Network protocol analyzer
SUNWparted GNU Parted - Partition Editor
SUNWiperf tool for measuring maximum TCP and UDP bandwidth
SUNWiftop iftop - Display bandwidth usage on an interface
SUNWsnort snort - Network Intrusion Detector
SUNWdosbox DosBox - DOS Emulator
SUNWejabberd ejabberd - Jabber/XMPP instant messaging server
SUNWiozone iozone - a filesystem benchmark tool
SUNWrtorrent rtorrent - a BitTorrent client for ncurses
SUNWsynergy Synergy Mouse/Keyboard sharing
SUNWareca Areca backup utilities

Many Python modules, other languages and OpenJDK 7.0 have also been added.

Tuesday, October 13, 2009

OpenSolaris 2010.03

It looks like OpenSolaris 2010.02 has had a slight adjustment in schedule and will now be known as OpenSolaris 2010.03. The target build for this release is now 135 instead of the previous target of build 132. You can read what's available so far here.

Monday, October 12, 2009

Oracle OpenWorld keynote

Ben Rockwood is attending Oracle OpenWorld and has written a good summary of Scott's and Larry's keynotes on his blog.

There are indeed some signs that something good might come out of the acquisition after all.

Wednesday, October 7, 2009

Solaris 10 10/09 (Update 8) available

Solaris 10 10/09 is now available for download, but it is yet to be announced and have all the download links updated.

Anyway, here is a working link. New documentation is also available on docs.sun.com, including the What's New in the Solaris 10 10/09 Release.

The changes are pretty much in line with my earlier predictions.

Update: Joerg Moellenkamp has a nice summary of what's new.

Tuesday, September 29, 2009

ZFS keynote available!

The keynote that Jeff and Bill held at the kernel conference in Australia is finally available online.

They talk about new cool features of ZFS such as BP rewrite, pool recovery, deduplication and encryption.

It's a joy to listen to people who really know what they are talking about. I had the opportunity to meet Jeff this spring, and I saw his face light up the moment I began to ask technical questions and he could drop the standard marketing talk and discuss ZFS internals instead ;)

Monday, September 14, 2009

Zpool recovery support (PSARC/2009/479)

Last winter there was a flood of mail on zfs-discuss, ignited by this. The user had lost his pool, which was residing on a USB device. The user had abused ZFS and pulled the disks without exporting the pool, but the pool should still have been usable, since ZFS claims to have an always consistent on-disk format. To make a long story short, the basis for ZFS's consistent on-disk format is that the write flush is honored when doing the atomic operation of updating the überblock. If the hardware lies about the success of this operation and the disk(s) are disconnected or a power failure occurs, the pool can be left in an inconsistent state. This typically only happens with cheaper consumer grade hardware like USB disks, but nevertheless this kind of hardware is used, and remember, ZFS loves cheap disks ;)

In the end Jeff Bonwick himself came in and stated that this was something he was working on, and it seems like this work is finally making its way towards OpenSolaris, with PSARC 2009/479 being discussed at this week's ARC meeting (BugID 6667683). This change will make it possible to revert back to the last sane TXG if the pool is in an inconsistent state, using the new "-F" flag to zpool. I've never encountered this problem myself, but it's always good to have options to recover if something goes wrong. This is probably not an issue for people using ZFS in the enterprise, but if you are out of luck and are using a USB stick/drive for storing your zpool, this might come in handy.
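If I read the case materials right, recovery will then be a matter of asking for a rewind at import time; a hypothetical example with a pool named data:
# zpool import -F data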

Thursday, September 10, 2009

Flirting with big red

After the good news of Oracle's commitment to SPARC® and Solaris, it didn't take long for some substance of the Oracle commitment to show up ;)

changeset: 10489:180acaca223b
user: agiri
date: Thu Sep 10 11:08:49 2009 -0700
summary: 6879323 RDS fixes for Oracle RAC 11gR1

Thursday, August 20, 2009

Long time no build

It's been quite some time since a build of SXCE was released outside of Sun. But SXCE build 121 is now available for download; the links are not updated yet, but you can get it here.

It looks like there will be a preview build of OpenSolaris 2010.02 available later this week, which will also be based on snv_121.

Among other things, this build has LatencyTOP included, and xVM is based on Xen 3.3.

Monday, August 17, 2009

Solaris 10 Update 8 (10/09) is coming

Another release of our workhorse Solaris 10 is near completion. I have dug up some of the expected enhancements from what's available outside of SWAN. There seems to be quite a bit of new stuff going into this release:
  • Turbo-charging SVr4 packaging (speed enhancements for install/patch/upgrade)
  • Zone parallel patching (already available by patch 119254-66)
  • Interrupt resource management
  • ZFS L2ARC
  • ZFS user/group quota
  • Wicked fast memstat
  • Extended VTOC (for booting from > 1TB disks)
  • NTP version 4
  • FMA support for intel Core i7
  • MPT driver enhancements
This will also be the first release that will be transferable into an S10 branded container in Solaris.next.

All this is of course my own educated guess based on what I could find outside of Sun, but I would be surprised if it is much different from what will actually be released.

Monday, August 3, 2009

OpenSolaris text based installer

After six weeks, four interns have created a prototype of a text based installer for OpenSolaris. It's currently only for the X86 architecture, but it will probably be ported to SPARC® soon; it is planned to be included in the 2010.02 release.

The video presentation.

And I managed to find the ISO they created:
http://dlc.sun.com/osol/install/downloads/text_install/kingston/OpenSolaris-TextInstaller-2009.06.iso

Wednesday, June 3, 2009

Virtualization in 2009.06 (Part 2, VirtualBox)

To continue the "getting started" tutorials on virtualization in OSOL 2009.06, we will take a look at VirtualBox. VirtualBox might be a good solution for remote desktop installations: fully virtualized machines with a graphical console over the network. As with the previous posts, this one is focused on how to deploy it on your server. If you run VirtualBox on your desktop, it is probably easier to use the supplied GUI, which is on par with the VMware Workstation GUI.

VirtualBox only supports full virtualization, so the performance will not be as good as a paravirtualized xVM domain or a Solaris zone, but it works well for graphical access to virtual machines over the network with lighter workloads.

Install VirtualBox

VirtualBox is not hosted in the standard OSOL repositories; it is in the extra repository (along with Flash, the JavaFX SDK and others). There is some minor hassle to get access to this repository: you will need to register an account with Sun, then download and install certificates. You can register here and download the certificate. There are install instructions on the site, but here is the procedure:
$ pfexec mkdir -m 0755 -p /var/pkg/ssl
$ pfexec cp -i ~/Desktop/OpenSolaris_extras.key.pem /var/pkg/ssl
$ pfexec cp -i ~/Desktop/OpenSolaris_extras.certificate.pem /var/pkg/ssl
$ pfexec pkg set-authority \
-k /var/pkg/ssl/OpenSolaris_extras.key.pem \
-c /var/pkg/ssl/OpenSolaris_extras.certificate.pem \
-O https://pkg.sun.com/opensolaris/extra/ extra
Now install VirtualBox:
$ pfexec pkg install virtualbox virtualbox/kernel
Create virtual machines

Create and install a virtual machine:
$ cd /opt/VirtualBox
$ VBoxManage createvm --name "WinXP" --register
$ VBoxManage modifyvm "WinXP" --memory 512 \
--acpi on --boot1 dvd --nic1 nat
$ VBoxManage createhd --filename "WinXP.vdi" \
--size 5000 --remember
$ VBoxManage modifyvm "WinXP" --hda "WinXP.vdi"
$ VBoxManage controlvm "WinXP" dvdattach \
/path_to_dvd/winxp.iso
$ VBoxHeadless --startvm "WinXP"

Connect to the server with an RDP client and perform the installation. There are free RDP Clients available for most Operating systems:
Solaris/OpenSolaris/*BSD: rdesktop (pkg install SUNWrdesktop)
MacOS X: Remote Desktop Connection
After the installation is done, the guest can be controlled with VBoxManage, which can control all aspects of the guest: snapshots, power, USB, sound. Here are a few basic commands:
Start: VBoxHeadless -s WinXP
Poweroff: VBoxManage controlvm WinXP poweroff
Reset: VBoxManage controlvm WinXP reset
If you want to install the VirtualBox Guest Additions, just attach the DVD:

$ VBoxManage controlvm "WinXP" dvdattach \
/opt/VirtualBox/additions/VBoxGuestAdditions.iso


Tuesday, June 2, 2009

Virtualization in 2009.06 (Part 1, xVM)

Install xVM

It seems that I have started a little tutorial trail for working with OSOL 2009.06. A few of my friends are about to install this release, so I thought I might as well make it a few blog entries; there are probably other people out there who want some help getting a quick start with OSOL 2009.06.

This is the first entry, about virtualization. First we get xVM running; later entries will describe basic setup of Solaris zones and VirtualBox.

Install the xVM packages:
pfexec pkg install xvm-gui SUNWvdisk
Edit /rpool/boot/grub/menu.lst, copy your current entry and modify it to something similar to this:
title OpenSolaris 2009.06 xVM
findroot (pool_rpool,1,a)
bootfs rpool/ROOT/opensolaris
kernel$ /boot/$ISADIR/xen.gz
module$ /platform/i86xpv/kernel/$ISADIR/unix /platform/i86xpv/kernel/$ISADIR/unix -B $ZFS-BOOTFS,console=text
module$ /platform/i86pc/$ISADIR/boot_archive
Reboot into this grub entry, and if everything works, set it as your default boot entry:
$ bootadm list-menu
the location for the active GRUB menu is: /rpool/boot/grub/menu.lst
default 0
timeout 30
0 OpenSolaris 2009.06
1 OpenSolaris 2009.06 xVM

$ bootadm set-menu default=1
Enable the xVM services

If you want to be able to connect to VNC over the network to perform the installation, make xen listen on external addresses for VNC connections:
$ pfexec svccfg -s xvm/xend setprop config/vnc-listen = astring: \"0.0.0.0\"
Enable the xVM services and set a password for VNC connections:
$ pfexec svccfg -s xvm/xend setprop config/vncpasswd = astring: \"yourpass\"
$ pfexec svcadm refresh xvm/xend
$ pfexec svcadm enable -r xvm/virtd
$ pfexec svcadm enable -r xvm/domains
(Ignore messages about multiple instances for dependencies)

Installing domains

Now we should be able to create our domU instances. First we create a zvol to be used as the disk for the domU:
$ pfexec zfs create -o compression=on -V 10G zpool01/myzvol
Install a paravirtualized domain e.g. OSOL 2009.06:
$ pfexec virt-install --nographics -p -r 1024 -n osol0906 -f /dev/zvol/zpool01/myzvol -l /zpool01/dump/osol-0906-x86.iso
Connect to the console and answer the language questions:

$ pfexec xm console osol0906

Back on dom0, get the address, port and password for the VNC console of the OSOL installation. First get the domain id:

$ pfexec virsh domid osol0906

Get the address, port and password using the domain id (substitute the id returned above for <domid>):
$ pfexec /usr/lib/xen/bin/xenstore-read /local/domain/<domid>/ipaddr/0
$ pfexec /usr/lib/xen/bin/xenstore-read /local/domain/<domid>/guest/vnc/port
$ pfexec /usr/lib/xen/bin/xenstore-read /local/domain/<domid>/guest/vnc/passwd

Connect with a VNC client to address and port, authenticate using the password.

Install an OS without support for paravirtualization, e.g. Windows:
$ pfexec virt-install -v --vnc -n windows -r 1024 -f /dev/zvol/dsk/zpool01/myzvol -c /zpool01/dump/windows.iso --os-type=windows
Connect to the xVM VNC console using the password provided earlier with svccfg/vncpasswd.

When the installation is done, domains can be listed with xm list and started with xm start.

Monday, June 1, 2009

OpenSolaris 2009.06 quick guide

OpenSolaris 2009.06 was released today! I have written a very quick guide for customizing and adding some basic services to an OSOL 2009.06 server from the shell.

Package operations

Install the storage-nas cluster (CIFS, iSCSI, NDMP etc.)
$ pfexec pkg install storage-nas
Add compilers (Sun Studio; it can be replaced with e.g. gcc-dev-4)
$ pfexec pkg install sunstudio
Add the contrib repository for contributed packages
$ pfexec pkg set-publisher -O http://pkg.opensolaris.org/contrib contrib
Other packages that can be of interest:
SUNWmysql51, ruby-dev, SUNWPython26, SUNWapch22m-dtrace, amp-dev, gcc-dev-4

List available and installed packages with a search string:
$ pkg list -a SUNWgzip
NAME (PUBLISHER)     VERSION      STATE      UFIX
SUNWgzip             1.3.5-0.111  installed  ----
$ pkg list -a '*Python26*'
NAME (PUBLISHER)     VERSION       STATE  UFIX
SUNWPython26         2.6.1-0.111   known  ----
SUNWPython26-extra   0.5.11-0.111  known  ----
Sharing

Create a ZFS filesystem with compression enabled
$ pfexec zfs create -o compression=on rpool/export/share
Share with NFS

Enable NFS service:
$ pfexec svcadm enable -r nfs/server
Enable sharing over NFS for the share filesystem:
$ pfexec zfs set sharenfs=on rpool/export/share
Share with CIFS
$ pfexec svcadm enable smb/server
$ pfexec zfs set sharesmb=on rpool/export/share
$ pfexec zfs set sharesmb=name=mysharename rpool/export/share
To enable users to access the CIFS share, add the following line to /etc/pam.conf and reset the users' passwords with passwd(1):
other password required pam_smb_passwd.so.1 nowarn
Enable auto ZFS-snapshots

Disable snapshots globally for the whole pool:
$ pfexec zfs set com.sun:auto-snapshot=false rpool
Enable snapshots for the share:
$ pfexec zfs set com.sun:auto-snapshot=true rpool/export/share
Enable daily snapshots (can be frequent, hourly, daily, weekly or monthly):
$ pfexec svcadm enable auto-snapshot:daily
List snapshots:
$ zfs list -t snapshot

If you are unfamiliar with Solaris, read the manual pages for the following commands:
prstat, fsstat, pkg, powertop, zfs, zpool, sharemgr, ipfilter, dladm, fmdump

NOTE: There are graphical options for snapshot setup and the package manager that can be used from a graphical console, VNC or forwarded X. Launch them with "time-slider-setup" or "packagemanager".

Thursday, April 30, 2009

Solaris 10 Update 7

Solaris 10 5/09 has been released, which is probably the last release from an independent Sun.

Noteworthy changes are:

SSH with PKCS#11 support, which means hardware accelerated SSH on UltraSPARC T2 and the Sun Crypto Accelerator 6000.

Improved power management for Intel CPUs (T-states, power aware dispatcher and deep C-states)

Enhanced observability for Intel CPUs (performance counters, Nehalem turbo mode)

Major iSCSI improvements and bug fixes.

IPSec improvements (SMF service, stronger algorithms)

Support for backing out patches with update on attach for zones.

Additional network drivers (NetXen 10 GigE, Intel ICH10 and Hartwell)

And lots of bug fixes...

The complete document: Solaris 10 5/09 What's New

Monday, April 20, 2009

The end of the world, as we know it?

I guess you have all heard by now that Oracle is buying Sun. I can only hope that Oracle will continue to invest in technologies such as OpenSolaris, VirtualBox, ZFS/OpenStorage and SPARC® Rock.

I'm not quite sure exactly what Oracle is after, but at least they can stop developing their own filesystem now that they have ZFS, and Java is probably something Oracle would like to have more control over. It must also be convenient to get that little free database; they already have a quite complex licensing model, ready to be applied to another engine ;)

I guess it's much better for Solaris than if IBM had been the buyer: Oracle has had Solaris as their preferred platform for a long time and doesn't have any hardware or Unix flavor of their own. We have to hope that all the great minds of Sun stay with the ship and get to continue their work.

At least some good news has come out of this:

"In our opinion, Sun Solaris is by far the best Unix operating system in the business. With the acquisition of Sun, we will be able to uniquely integrate Sun Solaris and the Oracle database." -Larry Ellison

Perhaps this could be the beginning of a beautiful friendship?

Friday, April 3, 2009

Zones and Parallel Patching

Jeff Victor has an interesting entry on his blog regarding the performance of zone patching with the soon to be released parallel patching and SSDs, compared to serial patching and spinning rust. It's nice to finally see a near-term solution to this, since it has been a problem for quite a while.

In short, he was able to speed up the patching of 16 zones five times with SSDs and parallel patching, and three times with HDDs and parallel patching.

Tuesday, March 10, 2009

Bug Hunting

The last few weeks I have been a good Solaris citizen and reported several bugs against Solaris Nevada (I have provided links where the CR is available online):

6803634: virt-install panics in module mac with Jumbo frames enabled
This bug made the system panic when installing a domU that used a network with jumbo frames enabled; it has now been fixed.

6805659: Phantom volume in ZFS pool
This bug causes phantom device links to be left behind after a zvol has been destroyed.

6815540: Live upgrade is too picky with datasets in non root pools
This one is related to Live Upgrade. Live Upgrade keeps track of all filesystems used on a host, and if any one of them has been removed, lumount and ludelete stop working. So if you want to delete your old boot environment a few weeks after an upgrade and any filesystem has been removed, it will fail until this is fixed.

6815701: snv_109 hangs in boot with SATA enabled on GeForce 8200
My storage node at home hangs when booting snv_109 with SATA enabled in the BIOS.

As you probably know, Solaris Nevada is the development branch of Solaris. It's not production software, so when you use it you are part of the test process.

Friday, March 6, 2009

Upcoming ZFS features

ZFS is constantly evolving and some upcoming things have caught my attention.

The first one is mentioned in a blog entry by Matthew Ahrens which deals with the implementation of the new scrub code. The new code fixes the issue that has forced scrubs to restart every time a snapshot is taken. But more interestingly, he mentions that this lays the foundation for what is arguably the most requested ZFS feature of all: vdev removal, or pool shrinking! This will probably also pave the way for block rewrite, so that existing data can be rewritten with current dataset settings such as compression or block size.

The second one is CR 6667683: "need a way to select an uberblock from a previous txg". This will add the ability to fall back to an earlier überblock in case of an error so serious that the pool has become unusable. This problem was highlighted in a long discussion on zfs-discuss. Jeff began to work on this problem, and in early February he stated the timeframe for a fix as "weeks, not months". In the best of worlds this should never be an issue, since ZFS is designed to have an always consistent on-disk format, and I think it is very rare, but it can still happen with badly behaved hardware that ignores cache-flush commands while stating otherwise.

The third one has been a long time coming: the dataset encryption project has been pushed back even further due to other changes in ZFS. The current target is now snv_120 (July), which means it will not make it into OSOL 2009.06.

I also noticed that user/group quotas seem to be coming in the near future: CR 6501037 "want user/group quotas on ZFS" has been updated to be fixed in snv_113.