Wednesday, July 11, 2012

Sidestep bugs by running multiple versions of BusyBox on ESXi 5

I recently discovered that ESXi 5 uses BusyBox 1.9.1, which includes a bug in nohup. Christoph Gysin identified the issue and released a patch 4 years ago:

/bin # nohup

BusyBox v1.19.3 (2012-01-13 00:47:40 UTC) multi-call binary.

Usage: nohup PROG ARGS

Run PROG immune to hangups, with output to a non-tty

Alas, I didn't want to replace BusyBox outright on my VMware server, as I had no idea what else would break.

Unfortunately I really needed nohup to work properly, as my scripts would be running for too long to trust that my session wouldn't close. Furthermore, walking over to the machine and plugging in a monitor to access the console sounded like a hassle. I could probably have just written it into a script and kicked it off with cron, but in the end, I decided to fix nohup.

The first thing I tried was to scp my version of nohup from coreutils over to the ESXi box. Alas, the whole idea of BusyBox is to simplify the myriad of prerequisites that a modern Linux system requires:

/bin # nohup2

nohup2: /lib64/ version `GLIBC_2.14' not found (required by nohup2)

But the same idea with busybox works great.

/bin # busybox-1.19.3 nohup

BusyBox v1.19.3 (2012-01-13 00:47:40 UTC) multi-call binary.

Usage: nohup PROG ARGS

Run PROG immune to hangups, with output to a non-tty

You can see the broken and working versions here:

/vmfs/volumes/4f3e8959-c92ca33c-dc7c-00215e26146e # /bin/busybox nohup esxtop -b -d 3 -n 28800 >output1.csv &

/vmfs/volumes/4f3e8959-c92ca33c-dc7c-00215e26146e # nohup: appending to nohup.out

/vmfs/volumes/4f3e8959-c92ca33c-dc7c-00215e26146e # /bin/busybox-1.19.3 nohup esxtop -b -d 3 -n 28800 >output2.csv & 

/vmfs/volumes/4f3e8959-c92ca33c-dc7c-00215e26146e # ls -l

-rw-r--r--    1 root     root                  0 Jul 11 15:53 output1.csv

-rw-r--r--    1 root     root             159368 Jul 11 15:54 output2.csv
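A quick way to tell the two behaviours apart without waiting on esxtop is to run a trivial command under nohup and see whether the redirected file actually receives anything. This is only a sketch: the file name probe.out and the variable names are made up for illustration, and it assumes mktemp is available on the box.

```shell
#!/bin/sh
# Probe whether the nohup found on PATH honours a stdout redirect.
# Work in a scratch directory so a stray nohup.out lands somewhere harmless.
probe_dir=$(mktemp -d)
cd "$probe_dir" || exit 1

# A working nohup leaves the redirect alone, so probe.out gets the text.
# A broken one diverts the output elsewhere and probe.out stays empty.
nohup echo probe > probe.out 2>/dev/null

if [ -s probe.out ]; then
    nohup_status="ok"
else
    nohup_status="broken"
fi
echo "nohup redirect: $nohup_status"
```

To test a specific binary, swap the plain nohup for the full invocation, for example /bin/busybox-1.19.3 nohup.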

Now I just changed the symlink for nohup:

/bin # rm nohup

/bin # ln -s /bin/busybox-1.19.3 nohup

/bin # ls -l nohup

lrwxrwxrwx    1 root     root                 19 Jul 11 15:55 nohup -> /bin/busybox-1.19.3

And we're in business with a working nohup, and everything else stock.
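The reason a single symlink is enough is that a multi-call binary such as BusyBox decides which applet to run from the name it was invoked under. The sketch below imitates that with a throwaway shell script instead of a real BusyBox build, so it can be run anywhere without touching /bin. The script name multicall-new and its messages are invented for the demonstration.

```shell
#!/bin/sh
# Stand-in for a multi-call binary: it picks its behaviour from the name
# it was invoked as, the way BusyBox does with argv[0].
demo_dir=$(mktemp -d)

cat > "$demo_dir/multicall-new" <<'EOF'
#!/bin/sh
case "$(basename "$0")" in
    nohup) echo "new nohup applet" ;;
    *)     echo "invoked as $(basename "$0")" ;;
esac
EOF
chmod +x "$demo_dir/multicall-new"

# Point only the nohup name at the newer binary; every other name is untouched.
ln -s "$demo_dir/multicall-new" "$demo_dir/nohup"

applet_output=$("$demo_dir/nohup")
direct_output=$("$demo_dir/multicall-new")
echo "$applet_output"
echo "$direct_output"
```

Only the one name is redirected, which is why every other applet keeps resolving to the stock binary.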

Saturday, March 10, 2012

Multiboot ESXi 5, Windows 2008 R2, RHEL 6, XenServer 6

Congratulations, you decided to be the ultimate hypervisor egalitarian and install every mainstream hypervisor + KVM onto a single server. You set up Windows 2000 and Redhat to dual boot 11 years ago; how much can things have changed?

I could have accomplished this project by constantly swapping hard drives whenever I needed to change hypervisors, but that seemed more like accepting the problem than a solution.

My initial plan was to divide my six hard drives into 4 logical volumes at the RAID controller level. Unfortunately the RAID controller in the servers I was using (I'm going to refrain from plugging any one vendor, at least as long as I'm operating on other people's hardware) doesn't provide this functionality. For a brief moment I flirted with the idea of cutting up the disk using partitions, until a closer look at the ESXi and Xen installers revealed they don't provide any granularity in the setup of the partition tables beyond selecting an installation volume.

Instead of telling this story as it happened, I'm going to first go over the hiccups I encountered on the way. That way, in case you're like me and already started this project before doing your research, you can plan a bit before reading the whole article.

VMware Boot Bank Corruption
I'm not sure how I accomplished this on the first server, because I couldn't replicate it on the second. I assumed it was because Microsoft was automounting the VMware Boot Banks (they are formatted FAT32). Anywho, for quite a long time I was stuck with the error "Not a VMware Boot Bank". VMware provides very little information on this error (and none of it applicable to my situation). Unfortunately VMware also doesn't seem to provide a way to fix a corrupted Boot Bank, so I was forced to reinstall. Afterwards I ran this command in diskpart to prevent Windows from auto-mounting these volumes:

automount disable

Windows installing onto the wrong drives
While Microsoft is nice enough to ask you which drive you'd like to install Windows on, that doesn't mean it'll listen. It will install your C: drive to the selected partition, but it will put the 100MB System Reserved partition, which it boots from, wherever it sees fit. On a system with lots of unformatted drives like mine, where I was not installing Windows to the first volume, it chose to put the System Reserved partition and the Windows MBR on the first volume, ensuring that it would be booted automatically. I ended up fixing this by leaving in only the disks for Windows during its installation, then inserting the other disks after it was done.

XenServer is scared of VMFS volumes
Citrix, it's ok to be scared, but you need to face your fears. Every time I booted the Xen installer it would die when it got to the point of scanning the drives, with the error "Could not parse Sgdisk". From what I found via a quick Google search, this is due to an inability of the Xen 6 installer to handle VMFS volumes. Citrix, this is no way to convert followers away from VMware. I resolved this by making sure the disk I was installing Xen to was formatted, and by pulling the other disks out of the server during the installation of XenServer.

XenServer is scared of GPT volumes
I'm not sure if this was just me, or why it was occurring, but I had to follow these instructions to get past a GPT error during the installation of Xen 6:

1. Boot from the XenServer 6.0.0 install CDROM.
2. At the Xen 6.0.0 install prompt type menu.c32 and press enter.
3. When you get the new menu screen press TAB
4. Add disable-gpt just before the last --- then press enter.

BCDedit won't modify the configuration if it's not the primary disk
I promoted this to its own blog post.

The actual project
For these four hypervisors, there are three different boot loaders used, and two major versions of one of those boot loaders:

Hypervisor                  Boot loader
Windows 2008 R2             Bootmgr
RedHat Enterprise Linux     Grub
VMware ESXi 5               Syslinux 3.86
Citrix XenServer 6          Syslinux 4.04

My original goal was to use Grub with RHEL 6 to load everything else. This turned out to be a much simpler task than the hours I burned on it made it appear. I thought the errors I was receiving in ESXi due to a corrupted boot bank were from some problem with Grub chainloading Syslinux. It turns out I had really just corrupted my boot bank.

Boot everything from Grub
This turned out to be rather simple due to support for something called "chainloading".  My final working grub configuration is:

# grub.conf generated by anaconda
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/mapper/vg_ccskvmvmh01-lv_root
#          initrd /initrd-[generic-]version.img
title Red Hat Enterprise Linux (2.6.32-220.el6.x86_64)
 root (hd0,0)
 kernel /vmlinuz-2.6.32-220.el6.x86_64 ro root=/dev/mapper/vg_ccskvmvmh01-lv_root rd_NO_LUKS  KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_NO_MD quiet SYSFONT=latarcyrheb-sun16 rhgb crashkernel=auto rd_NO_DM rd_LVM_LV=vg_ccskvmvmh01/lv_root rd_LVM_LV=vg_ccskvmvmh01/lv_swap
 initrd /initramfs-2.6.32-220.el6.x86_64.img
title Windows 2008 R2
 rootnoverify (hd1)
 chainloader +1

title ESXi 5
 rootnoverify (hd2)
 chainloader +1

title XenServer 6
 rootnoverify (hd3)
 chainloader +1

Grub simply hands off booting to the MBR on the respective drives, and the boot loaders for the other hypervisors take it from there.
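Since the three chainload stanzas differ only in title and disk number, they can be generated rather than typed. The following is a rough sketch that only prints text in the same shape as the entries above; the function name emit_chainload_entry is invented here, and the disk numbers are copied from my layout and will differ on other hardware.

```shell
#!/bin/sh
# Print one Grub (legacy) chainload stanza for a title and a BIOS disk number.
emit_chainload_entry() {
    entry_title=$1
    disk_number=$2
    printf 'title %s\n rootnoverify (hd%s)\n chainloader +1\n\n' \
        "$entry_title" "$disk_number"
}

# Disk numbers match the configuration above; adjust them to your own layout.
chainload_stanzas=$(
    emit_chainload_entry "Windows 2008 R2" 1
    emit_chainload_entry "ESXi 5" 2
    emit_chainload_entry "XenServer 6" 3
)
printf '%s\n' "$chainload_stanzas"
```

The output is meant to be reviewed and pasted into grub.conf by hand, not appended blindly.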

Of course, since this didn't initially work for me, I went through all the additional effort to also make this work from ESXi's and Xen's Syslinux, and I made an effort (but did not finish) to make Bootmgr boot everything else.

Boot everything from ESXi 5 (Syslinux 3.86)
This works pretty much the same way, with two major caveats. First, VMware was not nice enough to ship menu.c32 or chain.c32 with the installation of ESXi 5. Therefore you need to download Syslinux 3.86 and extract those two files.

Second, ESXi doesn't mount its first partition, so you can't do any of this from within ESXi's console. I'd recommend a Linux boot disk, or if you chose to dual boot with RHEL 6, you can mount it from there. You'll know you have the right partition if you see the files:


Copy chain.c32 and menu.c32 into this partition. Now back up syslinux.cfg and replace it with the following:

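ui menu.c32

label esxi
menu label ESXi 5
COM32 safeboot.c32

# The APPEND lines below are an assumption: chain.c32 needs to be told which
# disk to boot, and the disk numbers are taken from the Grub configuration above.
label win
menu label Windows 2008 R2
COM32 chain.c32
APPEND hd1

label xen
menu label XenServer 6
COM32 chain.c32
APPEND hd3

label linux
menu label RedHat Enterprise 6
COM32 chain.c32
APPEND hd0

Pretty simple.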

Boot everything from Xen 6 (Syslinux 4.04)
Citrix was nice enough to include both menu.c32 and chain.c32 in their boot partition. I didn't check whether they mount this partition within their OS, and simply edited it out of band anyways. Since I was already done with most everything at this point, I didn't switch it to the menu config; instead you have to enter the label at the boot prompt.

The file you want to edit this time is extlinux.conf.  Again, I suggest making a backup first:

# location mbr
serial 0 115200
default xe
prompt 1
timeout 50

label xe
  # XenServer
  kernel mboot.c32
  append /boot/xen.gz dom0_mem=752M lowmem_emergency_pool=1M crashkernel=64M@32M console= vga=mode-0x0311 --- /boot/vmlinuz-2.6-xen root=LABEL=root-ckxrtmdk ro xencons=hvc console=hvc0 console=tty0 quiet vga=785 splash --- /boot/initrd-2.6-xen.img

label xe-serial
  # XenServer (Serial)
  kernel mboot.c32
  append /boot/xen.gz com1=115200,8n1 console=com1,vga dom0_mem=752M lowmem_emergency_pool=1M crashkernel=64M@32M --- /boot/vmlinuz-2.6-xen root=LABEL=root-ckxrtmdk ro console=tty0 xencons=hvc console=hvc0 --- /boot/initrd-2.6-xen.img

label safe
  # XenServer in Safe Mode
  kernel mboot.c32
  append /boot/xen.gz nosmp noreboot noirqbalance acpi=off noapic dom0_mem=752M com1=115200,8n1 console=com1,vga --- /boot/vmlinuz-2.6-xen nousb root=LABEL=root-ckxrtmdk ro console=tty0 xencons=hvc console=hvc0 --- /boot/initrd-2.6-xen.img

label fallback
  # XenServer (Xen 4.1.1 / Linux
  kernel mboot.c32
  append /boot/xen-4.1.1.gz dom0_mem=752M lowmem_emergency_pool=1M crashkernel=64M@32M --- /boot/vmlinuz- root=LABEL=root-ckxrtmdk ro xencons=hvc console=hvc0 console=tty0 --- /boot/initrd-

label fallback-serial
  # XenServer (Serial, Xen 4.1.1 / Linux
  kernel mboot.c32
  append /boot/xen-4.1.1.gz com1=115200,8n1 console=com1,vga dom0_mem=752M lowmem_emergency_pool=1M crashkernel=64M@32M --- /boot/vmlinuz- root=LABEL=root-ckxrtmdk ro console=tty0 xencons=hvc console=hvc0 --- /boot/initrd-

label redhat
com32 chain.c32
append hd0

label win
com32 chain.c32
append hd1

label esxi
com32 chain.c32
append hd2

Boot everything from Bootmgr
First note, I never actually got this working. Booting Grub from Bootmgr is easy. I'm sure it's possible somehow to boot Syslinux from Bootmgr, but after accomplishing the above tasks this seemed like a trivial exercise. Still, I figured I'd list what I did.

First I pulled the MBR off each of the disks:

dd if=/dev/sda of=/tmp/linux.bin bs=512 count=1
dd if=/dev/sdc of=/tmp/syslinux.bin bs=512 count=1
dd if=/dev/sdd of=/tmp/xen.bin bs=512 count=1
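Before copying the dumps anywhere, it is worth confirming that each one is exactly 512 bytes and ends with the 0x55 0xAA boot signature. The sketch below shows that check against a scratch image file rather than a real /dev/sdX, so it is safe to run as-is; fake-disk.img and mbr.bin are placeholder names.

```shell
#!/bin/sh
# Build a fake 1 MiB disk image so the sketch never touches a real disk.
work_dir=$(mktemp -d)
disk_image="$work_dir/fake-disk.img"
dd if=/dev/zero of="$disk_image" bs=512 count=2048 2>/dev/null

# Write the 0x55 0xAA boot signature at offset 510, where a real MBR has it.
printf '\125\252' | dd of="$disk_image" bs=1 seek=510 conv=notrunc 2>/dev/null

# Same extraction as above, pointed at the image instead of /dev/sdX.
dd if="$disk_image" of="$work_dir/mbr.bin" bs=512 count=1 2>/dev/null

# A usable dump is one sector long and ends in 55 aa.
mbr_size=$(wc -c < "$work_dir/mbr.bin" | tr -d ' ')
mbr_signature=$(tail -c 2 "$work_dir/mbr.bin" | od -An -tx1 | tr -d ' \n')
echo "size=$mbr_size signature=$mbr_signature"
```

For the real files, the last two lines would be pointed at /tmp/linux.bin, /tmp/syslinux.bin and /tmp/xen.bin in turn. A valid signature only shows the sector was copied intact; it says nothing about whether Bootmgr can boot it.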

Then I copied the files I created to the root of the C: drive for Windows. Then, using BCDedit, I ran:

bcdedit /create /d "Linux" /application BOOTSECTOR
bcdedit /set {ID} device partition=c:
bcdedit /set {ID} path \linux.bin
bcdedit /displayorder {ID} /addlast

bcdedit /create /d "ESXi 5" /application BOOTSECTOR
bcdedit /set {ID} device partition=c:
bcdedit /set {ID} path \syslinux.bin
bcdedit /displayorder {ID} /addlast

bcdedit /create /d "Xen 6" /application BOOTSECTOR
bcdedit /set {ID} device partition=c:
bcdedit /set {ID} path \xen.bin
bcdedit /displayorder {ID} /addlast

bcdedit /timeout 30

{ID} should be replaced in each entry by the ID which BCDedit gives you for that entry after you run the /create command. When I was finished the configuration looked like this:

Windows Boot Manager
identifier              {bootmgr}
device                  partition=E:
description             Windows Boot Manager
locale                  en-US
inherit                 {globalsettings}
default                 {default}
resumeobject            {69db61ae-5dc5-11e1-8bc1-bb2a8301fdf5}
displayorder            {default}
toolsdisplayorder       {memdiag}
timeout                 10

Windows Boot Loader
identifier              {default}
device                  partition=C:
path                    \Windows\system32\winload.exe
description             Windows Server 2008 R2
locale                  en-US
inherit                 {bootloadersettings}
recoverysequence        {69db61b0-5dc5-11e1-8bc1-bb2a8301fdf5}
recoveryenabled         Yes
osdevice                partition=C:
systemroot              \Windows
resumeobject            {69db61ae-5dc5-11e1-8bc1-bb2a8301fdf5}
nx                      OptOut

Real-mode Boot Sector
identifier              {53d01ece-5e3e-11e1-94cf-9e5144662cd0}
device                  partition=C:
path                    \syslinux.bin
description             ESXi

Real-mode Boot Sector
identifier              {657ad556-5e3e-11e1-94cf-9e5144662cd0}
device                  partition=C:
path                    \linux.bin
description             RedHat

Real-mode Boot Sector
identifier              {c4a9ea4a-6a30-11e1-8e1a-f59982b714d6}
device                  partition=C:
path                    \xen.bin
description             Xen

This seemed to boot Grub just fine, but everything else gave an error about no operating system found. I have not spent more time troubleshooting it.

End Result
Everything is available from Grub. I know it's cheesy to take a picture of a monitor, but until someone gets me some test boxes with out of band management (hint hint, Cisco and/or Fujitsu), this is how it's going to be.

Credit: Information for this solution was gathered from numerous websites in addition to independent research.

Use BCDedit when Windows is not the primary disk

Congratulations, you've decided to use some other boot loader as your primary and simply chainload bootmgr for Windows. Unfortunately you still want to modify the configuration of bootmgr, but BCDedit doesn't work unless you admit it's god of your boot process.

BCDedit by default opens the configuration for the very first disk in the system. If that doesn't happen to be Windows, or the Windows installation you want, then it just won't open it. To get around this you need to mount the System Reserved partition which Windows creates for the BCD configuration to live on, and then specify in your commands that you want to modify that store.

First, under disk management, find the 100MB system reserved partition and give it a drive letter.

The BCD configuration will be at (drive letter):\boot\bcd.  To specify this in BCDedit use the following syntax:

bcdedit /store (drive letter):\boot\bcd (your command)

Credit: Information for this was gathered from numerous websites in addition to independent research. This project came up as a component of my plan to run ESXi 5, Windows 2008 R2, RHEL 6 and XenServer 6 all on one box. Booting ESXi and Xen (both of which use Syslinux) did NOT work, despite being referenced in the image above. Further details can be found in the blog post on that topic.

Sunday, February 26, 2012

RD Web Access + RD Gateway + Multiple ISPs

Congratulations, you've successfully implemented an RD Web Access + RD Gateway solution. Perhaps you're trying to emulate Remote Web Workplace or Citrix XenApp. The accolades of your peers continued until that day when the primary internet line went down for 8 hours. Suddenly you've become the guy who implemented a business critical solution with a single point of failure in Baby Bell.

The problem: while RD Web Access (TS Web Access in the non-R2 version of 2008) will give you a warning about a non-matching certificate which you can ignore, RD Gateway (TS Gateway in non-R2) simply fails. Changing DNS records to point to the secondary ISP could take hours, and manually programming the DNS entry in every client's computer would likely take just as long (not to mention prevent you from failing back).

Microsoft doesn't seem to provide any obvious answer for this situation. However, the need for a certificate to work from multiple URLs is not new. Star certs and UCC certs provide exactly this functionality. Furthermore there are millions of IIS and Apache servers out there which have successfully implemented multi-tenanted solutions that need to determine which URL the end user requested in order to provide the correct webpage.

The first step is to acquire a UCC or Star cert for the domain. Since it's still one site in IIS, it can't handle providing a different certificate based on each domain. If you do choose a UCC cert, I suggest not just putting one alternative name on there, but filling up whatever the allotment is for your price point with generic names. Trust me, you'll need a certificate again at some point, and wouldn't it be handy if you didn't need to go get another purchase approved?

When you install the certificate, make sure to do it for both role services. You'll need to specify the certificate in IIS for RD Web Access and under the RD Gateway Role settings. See the information here for setting this certificate for RD Gateway:

Now RD Web Access and RD Gateway should work fine on the primary internet line. They will also appear to work from the backup internet line IF the primary internet line is up, but they won't work without it. The reason is that the DNS entry for the RD Gateway server is hard coded into IIS. Even though you connected to RD Web Access on the backup ISP, the RD Gateway session will be initiated on the primary ISP because that entry is specified as the DefaultTSGateway.

You can see more detailed instructions on how to set this setting here:

However, this only provides for a single hard-coded entry. The trick to supporting multiple ISPs simultaneously is to modify the underlying ASP.NET code. Open IIS, browse to Default Web Site -> RDWeb -> Pages -> "en-us" and select "Explore" from the action pane on the right. Finally, edit the file "Desktops.aspx".

Find this line:
DefaultTSGateway = ConfigurationManager.AppSettings["DefaultTSGateway"].ToString();

And change it to:
DefaultTSGateway = Request.ServerVariables["SERVER_NAME"].ToString();

This tells IIS to use the name of the server as requested from the client.  As long as RD Web Access and RD Gateway are running on the same server, this should be correct.

These instructions vary slightly for 2008 non-R2, but should be close enough to still follow.  The skinny is that everything is named TS instead of Remote Desktop and the IIS folder is TS instead of RDWeb.

In addition to the websites listed above in this article, information was gathered from numerous other unmentioned websites. This question, and the solution, were originally posted by me in the Microsoft Partner Forums. You can find the original thread here:

Thursday, February 23, 2012

Restore from formatted offline files database

Congratulations, you fixed offline files syncing at the expense of the last six months of work for a company executive. They've given you an hour to restore their files or to pack up your stuff. Your frantic internet searches give you some false hopes, but you start to realize you're screwed.

Not quite yet!  Highly Unsupported has one last option for you that might just save you from having to get your interview suit dry cleaned.

Windows stores offline files in the folder C:\Windows\CSC. The folder is locked down to prevent access from any interactive user. However, if your files are still in there, you can follow numerous instructions online to simply take control of this folder and browse to the files you need. Alternatively, you can use the psexec method below to hack into it without needing to forcibly take control.

But if you've gone through the trouble of formatting the offline files database, your files won't be there anyways. Your one last hope is a feature Microsoft added with Windows ME: System Restore. It's never done me any good to actually fix a broken operating system, but it can save you now.

First off, from the afflicted computer, run "vssadmin list shadows" to see if you even have any restore points.

If you have some restore points from before you blew away the offline files cache, and after they created the files, you are in business.

You'll need volrest.exe and psexec for these next steps.  volrest.exe comes from the Windows Server 2003 Resource Kit Tools.  Don't worry though, it'll install and work just fine on Windows 7 (just ignore the incompatibility error).  You can download the Resource Kit here:

Grab psexec (it is part of the Sysinternals tools). If you don't know what PSEXEC is, you should spend some time finding out after you've saved your job.

The reason we need psexec is that while you could take ownership of the CSC folder which is presently in the operating system, you have no way to take control of the one inside the restore point. Microsoft, trying to provide security through obscurity, doesn't let you restore folders you don't have access to, but as administrator that's merely a hurdle. Run this command to create a cmd window running as nt authority\system:

psexec.exe -i -s -d cmd

In XP you could have used the "at" command with the /interactive flag to have accomplished this same thing, but again, Microsoft made it slightly more difficult for "security".

Now the fun part. Use volrest to restore the CSC directory. Volrest only works with UNC paths, but that's not an issue: the administrative share provides you with the UNC path you need.

volrest \\localhost\c$\windows\CSC /s /e /sct /r:C:\temp\directory

This will restore a copy of every file under the path C:\Windows\CSC for every restore point which has those files in it.  If you have a lot of restore points, you could end up with a lot of files.  The /sct flag date stamps all the files, so you can quickly sort out which is the newest.

Now, copy the files back to the proper locations and make sure the offline files sync is working properly.

Information for this solution was collected from numerous websites to generate a complete solution. Special thanks also need to be extended to Jim Banach, who not only created this problem in the first place, but was the primary force in discovering this solution.

Wednesday, February 22, 2012

Ode to the Blog

Congratulations, via a matter of hapless clicks, you've discovered your browser inexplicably displaying this web page.  Let me be the first to assure you that your back button is fully operational, and escape is just a few panicked clicks away at any time.

Being as this is my inaugural post, I suppose I should provide some context concerning my decision to begin a blog. Though, I think possibly it's better to start with why I've waited so long to begin. The rampant narcissistic teenage angst which pervaded my own and many of my friends' LiveJournals left me with a poor perspective on blogs as a communication mechanism.

A decade later my natural aptitude for problem solving led me into a career as a fixer of problems, specifically IT issues. The complexity and ever changing nature appeals to the side of me which craves challenges, while the constant deluge of catastrophe provides plenty of opportunities to play hero. In this process, I've spent countless hours on the internet, exploiting Google to replace the need for me to actually be knowledgeable on a subject. Unfortunately, over and over again I found someone who posted an issue almost identical to mine, only to follow it up with "Thanks, I fixed it", if they followed up at all.

And I realized, I'm one of the worst offenders. On many an occasion I've found something after countless hours of putting together bits and pieces of poor documentation, or went off on original research. However, as soon as I finished, I made no attempt to record the answer for posterity.

That doesn't mean I'm restricting this blog purely to paying my dues to the collective knowledge.  I expect I'll throw in some additional material with questionable correlation to the central theme based on my whims at that moment.  Hopefully it'll all be interesting, but I can assure you that it will all be highly unsupported.