Thursday, December 8, 2016

Hiera hierarchies and the custom facts everyone needs

It's been a while since I've updated this blog, but my tech journey has never ended. For the last two years I've been working as a Senior Technical Solutions Engineer for Puppet, covering Silicon Valley and advocating for DevOps and IT automation. I also haven't been absent from blogging in that time; I've just been fortunate enough to have my posts on the official Puppet Blog. I have decided, however, to begin cross-posting here to have a single collection of my digital work.

Originally published at https://puppet.com/blog/hiera-hierarchies-and-custom-facts-everyone-needs on 8 July 2016

---

NOTE: This article is targeted at versions of Puppet which utilize Hiera 3 or higher and Facter 3 or higher — generally Puppet 4, or Puppet Enterprise 2015.2 and later versions. Some of this information will also apply to older versions.

For the last year and a half, I've been representing Puppet as the technical solutions engineer covering all the accounts headquartered in Silicon Valley. This has been a fantastic opportunity to evangelize configuration management to clients both new and old. One of the areas I've noticed every new Puppet user runs into quite quickly is how to utilize Hiera effectively to manage separation of data from code. On a fresh install of Puppet Enterprise 2015.2 or greater the hierarchy is pretty simple:
:hierarchy:
  - nodes/%{::trusted.certname}
  - common
Basically a scalpel or a shotgun. Not exactly taking advantage of the power of the tool, but for a good reason: any additional useful layers require custom facts.

On the plus side, Hiera, like nearly all of Puppet, is very customizable and can be tweaked to each individual organization's needs. For new adopters, all of that power can be confusing. I've noticed I nearly always recommend the same ideas, so it seemed only fair to share those as a blog.

Hiera is powerful, but it has some limitations. Notably, there is a functional limit to the number of layers which can be added until performance begins to take a hit, and in Puppet Enterprise versions before 2016.2, only one hierarchy could be used at a time for the entire compile master.

Because of this, it makes sense to focus the hierarchy on generic concepts instead of specifically referencing unique items of the business unit or workflow. The hierarchy which I've recommended the most is along the lines of this:
:hierarchy:
  - nodes/%{::trusted.certname}
  - team/%{::team}
  - application/%{::application}
  - datacenter/%{::datacenter}
  - common
The actual names of each layer can change, but these represent what I find to be generally the most important pieces of metadata about a system, that is:
  • The name of the node
  • Who owns the node
  • What the node does
  • Where the node is
  • Metadata common to all nodes
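
To make that concrete, here's a minimal sketch of how data resolves through these layers. The file names and the NTP key are hypothetical; the principle is that the most specific layer that defines a key wins:
common.yaml
---
profile::ntp::server: ntp.example.com

datacenter/portland.yaml
---
profile::ntp::server: ntp-pdx.example.com
A node whose datacenter fact resolves to portland picks up ntp-pdx.example.com; every other node falls back to the common value.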

The first objection I usually hear is something along the lines of, "But we need a layer for X." To that, I challenge clients to consider whether they really need the additional layer, or if it fits into one of the existing layers. The vast majority of the differentiation they want tends to fit into the application layer. Having a ton of different applications with some data overlap is preferable to having too many layers with too little differentiation.

The conversation moves quickly from there to how to create the facts for %{::team}, %{::application} and %{::datacenter}. While there isn't a universal answer for how to create these facts, often one of a few possible methods will solve this problem for the vast majority of organizations.

Ultimately, the goal is to find a way to programmatically determine the answer to, "What is the X for this node?" To do this, we look to the pieces of information that are already a part of or attached to the node. I'll outline these approaches below.

Parse existing fact

Sysadmins have been attaching metadata to servers for a very long time in the form of hostnames. Many organizations still place information such as data center, application and team in the system's name. Facter by default already creates a fact for hostname, so we can parse that existing fact to generate new facts.

These examples are custom Ruby facts; they can be added into any module in the <module>/lib/facter directory as .rb files, and will be copied to all of the nodes via pluginsync and executed. The first example here simply takes the first four characters and turns them into a new fact.
Facter.add(:datacenter) do
  setcode do
    Facter.value(:hostname)[0..3]
  end
end
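For a node named pdx1web42 (a hypothetical naming scheme), this fact evaluates to pdx1.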
If we wanted to get everything from the fifth character to the eighth, we could modify the third line as follows:
Facter.value(:hostname)[4..7]
Or the fifth character to the end of the line:
Facter.value(:hostname)[4..-1]
Below is a more complicated example that takes from the sixth character to where there is a -, or to the end of the hostname, whichever comes first:
Facter.value(:hostname)[5..-1][/(.*?)(\-|\z)/,1]
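Putting the pieces together, a complete fact using that pattern might look like the sketch below. The naming scheme here is hypothetical: hostnames like pdx1-web-03, where the application name starts at the sixth character and runs to the next dash.
Facter.add(:application) do
  setcode do
    # Everything from the sixth character onward, trimmed to the
    # first '-' (or the end of the hostname if there is none).
    Facter.value(:hostname)[5..-1][/(.*?)(\-|\z)/, 1]
  end
end
On pdx1-web-03 this evaluates to web; on pdx1-db it evaluates to db.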

Match value to table

This is ugly, but sometimes it's the best option, particularly when the only way to determine the data center is via IP address. The shortcoming of this approach is that it requires the fact to be updated whenever there is a new potential value. Utilizing a known fact such as the IP network and case statements, we can match up with a value such as data center. Where avoidable, this shouldn't be used just to replace shorthand metadata with full names, as it adds unnecessary complication (such as matching pdx in the hostname and replacing with portland):
Facter.add(:datacenter) do
  setcode do
    # Look up the network of the primary interface and map it
    # to a data center name.
    network = Facter.value(:network)
    case network
    when '10.0.2.0'
      'portland'
    when '10.0.3.0'
      'sydney'
    when '192.168.0.0'
      'home'
    else
      'unknown'
    end
  end
end
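With the example values above, a node on the 10.0.2.0 network reports datacenter => portland, and anything unmatched falls through to unknown.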

Read metadata

AWS allows VMs to be tagged with metadata that can be read elsewhere. Often, these types of tags are created already for the purpose of charging back the cost of the node to the group responsible for them. This metadata can be turned into custom facts and used with Puppet as well.

The following code creates Facter facts for the EC2 region and the EC2 tags (credit to Chris Barker and Adrien Thebo for this code snippet):
Facter.add(:ec2_region) do
  confine do
    Facter.value(:ec2_metadata)
  end
  setcode do
    region = Facter.value(:ec2_metadata)['placement']['availability-zone'][0..-2]
    region
  end
end

Facter.add(:ec2_tags) do
  confine do
    begin
      require 'aws-sdk-core'
      true
    rescue LoadError
      false
    end
  end

  confine do
    Facter.value(:ec2_metadata)['iam']['info']
  end

  setcode do
    instance_id = Facter.value(:ec2_metadata)['instance-id']
    region = Facter.value(:ec2_metadata)['placement']['availability-zone'][0..-2]
    ec2 = Aws::EC2::Client.new(region: region)
    instance = ec2.describe_instances(instance_ids: [instance_id])
    tags = instance.reservations[0].instances[0].tags
    taghash = {}
    tags.each do |tag|
      taghash[tag['key'].downcase] = tag['value'].downcase
    end
    taghash
  end
end
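With a fact like ec2_tags in place, tag values can drive the hierarchy as well. Assuming instances carry a team tag, a layer such as team/%{::ec2_tags.team} should resolve it, using the same dot notation for hash facts as %{::trusted.certname} above.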

Write a custom fact script

Any script that returns a value can be turned into a custom fact. With a Ruby fact, it’s simply a matter of wrapping the script in a setcode and exec statement:
Facter.add('hardware_platform') do
  setcode do
    Facter::Core::Execution.exec('/bin/uname --hardware-platform')
  end
end
However, Facter also supports running any script in its native format, such as Bash or PowerShell. Simply ensure the script returns key-value pairs, and place it in the facts.d folder of any module. Pluginsync will copy the fact, the same as for facts in the lib/facter directory of a module. A simple example:
#!/usr/bin/bash
echo testfact=fluffy
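Note that Facter only runs scripts that are marked executable, and on Windows it identifies external fact scripts by extension (.ps1, .bat, .cmd), so name them accordingly.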

Drop facts file

When there is no programmatic way to determine the appropriate value, Facter supports the creation of this type of metadata via fact files in /etc/puppetlabs/facter/facts.d. (Note: This folder isn’t pre-created on a default Puppet Enterprise install.) Many organizations actually prefer this method over the deterministic methods above, because it avoids potential collisions. These files can be in yaml, json or txt format. There can even be executable scripts in this directory, as long as they return key-value pairs. Generally I consider txt to be easiest, as it's simply:
key=value
There can be any number of files in this directory, each containing any number of key value pairs. Files in the same directory can have different formats, as well. Choosing how to break up facts between multiple files, or whether to consolidate them, tends to relate more to how they are created.
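
For comparison, the same fact in yaml format is just a YAML hash; the file name is arbitrary as long as the extension matches the format:
---
datacenter: portland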

Generally, facts files work best when they are created by the provisioning system at initial provisioning. Most workflows with VMware vRealize Automation or Cisco UCS Director use this method to pass information from the provisioning system to Puppet. It's relatively easy in any of these systems to create a file with the appropriate values on the provisioned system, then install Puppet and let it handle the rest. An example of a file that might get created is below:
facts.txt
datacenter=portland
application=doc
team=TSE

Conclusion

Hiera is one of the most powerful parts of Puppet for enabling reusability of code, but custom facts are a critical component for taking advantage of that capability. Starting with a sane but simple hierarchy and building a few simple custom facts can greatly accelerate the adoption of Puppet across your organization. It's quite likely that you'll need to mix and match, and tweak the examples here to fulfill your organization's needs, but my hope is that this blog will provide sufficient guidance to get you started with Hiera and custom facts.

Chris Matteson is a senior technical solutions engineer at Puppet.


Wednesday, July 11, 2012

Sidestep bugs by running multiple versions of BusyBox on ESXi 5

I recently discovered that ESXi 5 uses BusyBox 1.9.1, which includes a bug in nohup. Christoph Gysin identified the issue and released a patch four years ago:

http://lists.busybox.net/pipermail/busybox/2008-January/064029.html

/bin # nohup
BusyBox v1.19.3 (2012-01-13 00:47:40 UTC) multi-call binary.

Usage: nohup PROG ARGS

Run PROG immune to hangups, with output to a non-tty

Alas, I don't want to outright replace the busybox on my VMware server, as I have no idea what else will break.

Unfortunately, I really needed nohup to work properly, as my scripts would be running for too long to trust that my session wouldn't close. Furthermore, walking over to the machine and plugging in a monitor to access the console sounded like a hassle. I could probably have just written it into a script and kicked it off with cron, but in the end, I decided to fix nohup.

The first thing I tried was to scp over my version of nohup from coreutils to the ESXi box. Alas, the whole idea of busybox is to simplify the myriad of prerequisites that a modern Linux system requires:

/bin # nohup2
nohup2: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by nohup2)

But the same idea with busybox works great.

/bin # busybox-1.19.3 nohup
BusyBox v1.19.3 (2012-01-13 00:47:40 UTC) multi-call binary.

Usage: nohup PROG ARGS

Run PROG immune to hangups, with output to a non-tty

You can see the broken and working versions here:

/vmfs/volumes/4f3e8959-c92ca33c-dc7c-00215e26146e # /bin/busybox nohup esxtop -b -d 3 -n 28800 >output1.csv &
/vmfs/volumes/4f3e8959-c92ca33c-dc7c-00215e26146e # nohup: appending to nohup.out

/vmfs/volumes/4f3e8959-c92ca33c-dc7c-00215e26146e # /bin/busybox-1.19.3 nohup esxtop -b -d 3 -n 28800 >output2.csv &
/vmfs/volumes/4f3e8959-c92ca33c-dc7c-00215e26146e # ls -l
-rw-r--r--    1 root     root                  0 Jul 11 15:53 output1.csv
-rw-r--r--    1 root     root             159368 Jul 11 15:54 output2.csv

Now I just changed the symlink for nohup:

/bin # rm nohup
/bin # ln -s /bin/busybox-1.19.3 nohup
/bin # ls -l nohup
lrwxrwxrwx    1 root     root                 19 Jul 11 15:55 nohup -> /bin/busybox-1.19.3


And we're in business with a working nohup, and everything else stock.

Saturday, March 10, 2012

Multiboot ESXi 5, Windows 2008 R2, RHEL 6, XenServer 6

Congratulations, you decided to be the ultimate hypervisor egalitarian and install every mainstream hypervisor + KVM onto a single server. You set up Windows 2000 and Red Hat to dual boot 11 years ago; how much can things have changed?

I could have accomplished this project by constantly swapping hard drives whenever I needed to change hypervisors, but that seemed more like accepting the problem than solving it.

My initial plan was to divide my six hard drives into four logical volumes at the RAID controller level. Unfortunately, the RAID controller for the servers I was using (I'm going to refrain from plugging any one vendor, at least as long as I'm operating on other people's hardware) doesn't provide this functionality. For a brief moment I flirted with the idea of cutting up the disk using partitions, until a closer look at ESXi and Xen's installation revealed they don't provide any granularity in setting up the partition tables beyond selecting an installation volume.

Instead of telling this story as it happened, I'm going to first go over the hiccups I encountered along the way. That way, in case you're like me and already started this project before doing your research, you can plan a bit before reading the whole article.

VMware Boot Bank Corruption
Not sure how I accomplished this on the first server, because I couldn't replicate it on the second. I assumed it was because Microsoft was automounting the VMware boot banks (they are formatted FAT32). Anywho, for quite a long time I was stuck with the error "Not a VMware Boot Bank". VMware provides very little information on this error (and none applicable to my situation). Unfortunately, VMware also doesn't seem to provide a way to fix a corrupted boot bank, so I was forced to reinstall. Afterwards I ran these commands to prevent Windows from auto-mounting these volumes:

diskpart
automount disable
exit

Windows installing onto the wrong drives
While Microsoft is nice enough to ask you which drive you'd like to install Windows on, that doesn't mean it'll listen. While it will install your C: drive to the selected partition, it will install the 100MB System Reserved partition that it boots from wherever it sees fit. On a system with lots of unformatted drives like mine, where I was installing Windows to a volume other than the first, it chose to put the System Reserved partition and the MBR for Windows on the first volume set, ensuring that it would be booted automatically. I ended up fixing this by leaving in only the disks for Windows during its installation process, then inserting the other disks after it was done.

XenServer is scared of VMFS volumes
Citrix, it's OK to be scared, but you need to face your fears. Every time I booted the Xen installer, it would die when it got to the point of scanning the drives with the error "Could not parse Sgdisk". From what I found via a quick Google search, this is due to an inability of the Xen 6 installer to handle VMFS volumes. Citrix, this is no way to convert followers away from VMware. I resolved this by making sure the disk I was installing Xen to was formatted, and by pulling the other disks out of the server during the installation of XenServer.

XenServer is scared of GPT volumes
Not sure if this was just me, or why it was occurring, but I had to follow these instructions to get past a GPT error during the installation of Xen 6:

1. Boot from the XenServer 6.0.0 install CD-ROM.
2. At the Xen 6.0.0 install prompt, type menu.c32 and press enter.
3. When you get the new menu screen, press TAB.
4. Add disable-gpt just before the last ---, then press enter.

BCDedit won't modify the configuration if Windows is not on the primary disk
I promoted this to its own blog post.


The actual project
For these four hypervisors, there are three different boot loaders used, and two major versions of one of those boot loaders:

Windows 2008 R2: Bootmgr
RedHat Enterprise Linux: Grub
VMware ESXi 5: Syslinux 3.86
Citrix XenServer 6: Syslinux 4.04


My original goal was to use Grub with RHEL 6 to load everything else. This turned out to be a much simpler task than the hours of time I burned on it made it appear. I thought the errors I was receiving in ESXi due to a corrupted boot bank were from some problem with Grub chainloading Syslinux; it turns out I had really just corrupted my boot bank.

Boot everything from Grub
This turned out to be rather simple due to support for something called "chainloading".  My final working grub configuration is:

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/mapper/vg_ccskvmvmh01-lv_root
#          initrd /initrd-[generic-]version.img
#boot=/dev/sdb
default=2
#timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
#hiddenmenu
title Red Hat Enterprise Linux (2.6.32-220.el6.x86_64)
 root (hd0,0)
 kernel /vmlinuz-2.6.32-220.el6.x86_64 ro root=/dev/mapper/vg_ccskvmvmh01-lv_root rd_NO_LUKS  KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_NO_MD quiet SYSFONT=latarcyrheb-sun16 rhgb crashkernel=auto rd_NO_DM rd_LVM_LV=vg_ccskvmvmh01/lv_root rd_LVM_LV=vg_ccskvmvmh01/lv_swap
 initrd /initramfs-2.6.32-220.el6.x86_64.img
title Windows 2008 R2
 rootnoverify (hd1)
 chainloader +1

title ESXi 5
 rootnoverify (hd2)
 chainloader +1

title XenServer 6
 rootnoverify (hd3)
 chainloader +1

Grub simply hands off booting to the MBR on the respective drive, and the boot loaders for the other hypervisors take it from there.

Of course, since this didn't initially work for me, I went through the additional effort to also make ESXi's and Xen's Syslinux boot everything, and I made an effort (but did not finish) to make Bootmgr boot everything else.

Boot everything from ESXi 5 (Syslinux 3.86)
This works pretty much the same way, with two major caveats. First, VMware was not nice enough to ship menu.c32 or chain.c32 with the installation of ESXi 5, so you need to download Syslinux 3.86 and extract those two files.

Second, ESXi doesn't mount its first partition, so you can't do any of this from within ESXi's console. I'd recommend a Linux boot disk, or if you chose to dual boot with RHEL 6, you can mount it from there. You'll know you have the right partition if you see these files:

syslinux.cfg
safeboot.c32
mboot.c32

Copy chain.c32 and menu.c32 into this partition. Now back up syslinux.cfg and replace it with the following:

ui menu.c32

label esxi
menu label ESXi 5
COM32 safeboot.c32

label win
menu label Windows 2008 R2
COM32 chain.c32
APPEND hd1

label xen
menu label XenServer 6
COM32 chain.c32
APPEND hd3

label linux
menu label RedHat Enterprise 6
COM32 chain.c32
APPEND hd0
 
Pretty simple.

Boot everything from Xen 6 (Syslinux 4.04)
Citrix was nice enough to include both menu.c32 and chain.c32 in their boot partition. I didn't check to see whether they mount this within the OS, and simply edited it out of band anyway. Since I was already done with most everything at this point, I didn't change it to use the menu config, but instead left it requiring you to enter the label at the boot prompt.

The file you want to edit this time is extlinux.conf.  Again, I suggest making a backup first:

# location mbr
serial 0 115200
default xe
prompt 1
timeout 50

label xe
  # XenServer
  kernel mboot.c32
  append /boot/xen.gz dom0_mem=752M lowmem_emergency_pool=1M crashkernel=64M@32M console= vga=mode-0x0311 --- /boot/vmlinuz-2.6-xen root=LABEL=root-ckxrtmdk ro xencons=hvc console=hvc0 console=tty0 quiet vga=785 splash --- /boot/initrd-2.6-xen.img

label xe-serial
  # XenServer (Serial)
  kernel mboot.c32
  append /boot/xen.gz com1=115200,8n1 console=com1,vga dom0_mem=752M lowmem_emergency_pool=1M crashkernel=64M@32M --- /boot/vmlinuz-2.6-xen root=LABEL=root-ckxrtmdk ro console=tty0 xencons=hvc console=hvc0 --- /boot/initrd-2.6-xen.img

label safe
  # XenServer in Safe Mode
  kernel mboot.c32
  append /boot/xen.gz nosmp noreboot noirqbalance acpi=off noapic dom0_mem=752M com1=115200,8n1 console=com1,vga --- /boot/vmlinuz-2.6-xen nousb root=LABEL=root-ckxrtmdk ro console=tty0 xencons=hvc console=hvc0 --- /boot/initrd-2.6-xen.img

label fallback
  # XenServer (Xen 4.1.1 / Linux 2.6.32.12-0.7.1.xs6.0.0.529.170661xen)
  kernel mboot.c32
  append /boot/xen-4.1.1.gz dom0_mem=752M lowmem_emergency_pool=1M crashkernel=64M@32M --- /boot/vmlinuz-2.6.32.12-0.7.1.xs6.0.0.529.170661xen root=LABEL=root-ckxrtmdk ro xencons=hvc console=hvc0 console=tty0 --- /boot/initrd-2.6.32.12-0.7.1.xs6.0.0.529.170661xen.img

label fallback-serial
  # XenServer (Serial, Xen 4.1.1 / Linux 2.6.32.12-0.7.1.xs6.0.0.529.170661xen)
  kernel mboot.c32
  append /boot/xen-4.1.1.gz com1=115200,8n1 console=com1,vga dom0_mem=752M lowmem_emergency_pool=1M crashkernel=64M@32M --- /boot/vmlinuz-2.6.32.12-0.7.1.xs6.0.0.529.170661xen root=LABEL=root-ckxrtmdk ro console=tty0 xencons=hvc console=hvc0 --- /boot/initrd-2.6.32.12-0.7.1.xs6.0.0.529.170661xen.img

label redhat
com32 chain.c32
append hd0

label win
com32 chain.c32
append hd1

label esxi
com32 chain.c32
append hd2

Boot everything from Bootmgr
First note: I never actually got this working. Booting Grub from Bootmgr is easy. I'm sure it's possible somehow to boot Syslinux from Bootmgr, but after accomplishing the above tasks, this seemed like a trivial exercise. Still, I figured I'd list what I did.

First I pulled the MBR off each of the disks:

dd if=/dev/sda of=/tmp/linux.bin bs=512 count=1
dd if=/dev/sdc of=/tmp/syslinux.bin bs=512 count=1
dd if=/dev/sdd of=/tmp/xen.bin bs=512 count=1

Then I copied the files I created to the root of the C: drive for Windows. Then within BCDedit I ran:

bcdedit /create /d "Linux" /application BOOTSECTOR
bcdedit /set {ID} device partition=c:
bcdedit /set {ID} path \linux.bin
bcdedit /displayorder {ID} /addlast

bcdedit /create /d "ESXi 5" /application BOOTSECTOR
bcdedit /set {ID} device partition=c:
bcdedit /set {ID} path \syslinux.bin
bcdedit /displayorder {ID} /addlast

bcdedit /create /d "Xen 6" /application BOOTSECTOR
bcdedit /set {ID} device partition=c:
bcdedit /set {ID} path \xen.bin
bcdedit /displayorder {ID} /addlast

bcdedit /timeout 30
{ID} should be replaced in each entry by the ID which BCDedit gives you for that entry after you run the /create command. When I was finished, the configuration looked like this:

Windows Boot Manager
--------------------
identifier              {bootmgr}
device                  partition=E:
description             Windows Boot Manager
locale                  en-US
inherit                 {globalsettings}
default                 {default}
resumeobject            {69db61ae-5dc5-11e1-8bc1-bb2a8301fdf5}
displayorder            {default}
                        {53d01ece-5e3e-11e1-94cf-9e5144662cd0}
                        {657ad556-5e3e-11e1-94cf-9e5144662cd0}
                        {c4a9ea4a-6a30-11e1-8e1a-f59982b714d6}
toolsdisplayorder       {memdiag}
timeout                 10

Windows Boot Loader
-------------------
identifier              {default}
device                  partition=C:
path                    \Windows\system32\winload.exe
description             Windows Server 2008 R2
locale                  en-US
inherit                 {bootloadersettings}
recoverysequence        {69db61b0-5dc5-11e1-8bc1-bb2a8301fdf5}
recoveryenabled         Yes
osdevice                partition=C:
systemroot              \Windows
resumeobject            {69db61ae-5dc5-11e1-8bc1-bb2a8301fdf5}
nx                      OptOut

Real-mode Boot Sector
---------------------
identifier              {53d01ece-5e3e-11e1-94cf-9e5144662cd0}
device                  partition=C:
path                    \syslinux.bin
description             ESXi

Real-mode Boot Sector
---------------------
identifier              {657ad556-5e3e-11e1-94cf-9e5144662cd0}
device                  partition=C:
path                    \linux.bin
description             RedHat

Real-mode Boot Sector
---------------------
identifier              {c4a9ea4a-6a30-11e1-8e1a-f59982b714d6}
device                  partition=C:
path                    \xen.bin
description             Xen

This seemed to boot Grub just fine, but everything else gave an error about no operating system found. I have not spent more time troubleshooting it.

End Result
Everything is available from Grub. I know it's cheesy to take a picture of a monitor, but until someone gets me some test boxes with out-of-band management (hint hint, Cisco and/or Fujitsu), this is how it's going to be.


Credit: Information for this solution was gathered from numerous websites in addition to independent research.  In addition to the websites listed above, some of the others which provided instrumental information are:

http://www.legacycode.net/2010/01/31/esxi-3-5-server-2003-dualboot/
http://www.msfn.org/board/topic/135148-booting-bootmgr-from-syslinuxubcd/

Use BCDedit when Windows is not the primary disk

Congratulations, you've decided to use some other boot loader as your primary and simply chainload Bootmgr for Windows. Unfortunately, you still want to modify the configuration of Bootmgr, but BCDedit won't work unless you admit it's the god of your boot process.

BCDedit by default opens the configuration from the very first disk in the system. If that doesn't happen to be Windows, or the Windows installation you want, then it just won't open it. To get around this, you need to mount the System Reserved partition which Windows creates for the BCD configuration to live on, and then specify that you want to modify that store in your commands.

First, under disk management, find the 100MB system reserved partition and give it a drive letter.


The BCD configuration will be at (drive letter):\boot\bcd. To specify this in BCDedit, use the following syntax:

bcdedit /store (drive letter):\boot\bcd (your command)
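
For example, assuming you gave the partition the letter F:, enumerating the entries in that store would be:

bcdedit /store F:\boot\bcd /enum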


Credit: Information for this was gathered from numerous websites in addition to independent research. This project came up as a component of my plan to run ESXi 5, Windows 2008 R2, RHEL 6 and XenServer 6 all on one box. Booting ESXi and Xen (both of which use Syslinux) did NOT work, despite being referenced in the image above. Further details can be found in the blog post on that topic.

Sunday, February 26, 2012

RD Web Access + RD Gateway + Multiple ISPs

Congratulations, you've successfully implemented an RD Web Access + RD Gateway solution. Perhaps you're trying to emulate Remote Web Workplace or Citrix XenApp. The accolades of your peers continued until that day when the primary internet line went down for eight hours. Suddenly you've become the guy who implemented a business-critical solution with a single point of failure in Baby Bell.

The problem: while RD Web Access (TS Web Access in the non-R2 version of 2008) will give you a warning about a non-matching certificate which you can ignore, RD Gateway (TS Gateway in non-R2) simply fails. Changing DNS records to point to the secondary ISP could take hours, and manually programming the DNS entry into every client's computer would likely take just as long (not to mention prevent you from failing back).

Microsoft doesn't seem to provide any obvious answer for this situation. However, the need for a certificate to work from multiple URLs is not new. Star (wildcard) certs and UCC certs provide exactly this functionality. Furthermore, there are millions of IIS and Apache servers out there which have successfully implemented multi-tenant solutions, which require the ability to determine which URL the end user requested in order to provide the correct webpage.

The first step is to go acquire a UCC or Star cert for the domain. Since it's still one site in IIS, it can't provide a different certificate based on each domain. If you do choose a UCC cert, I suggest not just putting one alternative name on there, but filling up whatever the allotment is for your price point with generic names. Trust me, you'll need a certificate again at some point, and wouldn't it be handy if you didn't need to go get another purchase approved?


When you install the certificate, make sure to do it through both role services. You'll need to specify the certificate in IIS for RD Web Access and under the RD Gateway role settings. See the information here for setting this certificate for RD Gateway:

http://technet.microsoft.com/en-us/library/cc732886.aspx

Now RD Web Access and RD Gateway should work fine on the primary internet line, and they will appear to work from the backup internet line IF the primary internet line is up, but they won't work without the primary internet line. The reason is that the DNS entry for the RD Gateway server is hard-coded into IIS. Even though you connected to RD Web Access on the backup ISP, the RD Gateway session will be initiated on the primary ISP, because that entry is specified as the DefaultTSGateway.

You can see more detailed instructions on how to set this setting here:

http://technet.microsoft.com/en-us/library/cc731465.aspx

However, this only provides for a single hard-coded entry. The trick that gives us support for multiple ISPs simultaneously is to modify the underlying ASP code. Open IIS, browse to Default Web Site -> RDWeb -> Pages -> "en-us" and select "Explore" from the action pane on the right. Finally, edit the file "Desktops.aspx".

Find this line:
DefaultTSGateway = ConfigurationManager.AppSettings["DefaultTSGateway"].ToString();

And change it to:
DefaultTSGateway = Request.ServerVariables["SERVER_NAME"].ToString();


This tells IIS to use the name of the server as requested from the client.  As long as RD Web Access and RD Gateway are running on the same server, this should be correct.

These instructions vary slightly for 2008 non-R2, but should be close enough to still follow.  The skinny is that everything is named TS instead of Remote Desktop and the IIS folder is TS instead of RDWeb.

Credit:
In addition to the websites listed above in this article, information was gathered from numerous other unmentioned websites. This question, and the solution, were originally posted by me in the Microsoft Partner Forums. You can find the original thread here:
http://social.microsoft.com/Forums/en-US/partnerwinserver/thread/bb2f49f0-21eb-4407-a33f-19b1de0ff285

Thursday, February 23, 2012

Restore from formatted offline files database

Congratulations, you fixed offline files syncing at the expense of the last six months of work for a company executive. They've given you an hour to restore their files or to pack up your stuff. Your frantic internet searches give you some false hopes, but you start to realize you're screwed.

Not quite yet!  Highly Unsupported has one last option for you that might just save you from having to get your interview suit dry cleaned.

Windows stores offline files in the folder C:\Windows\CSC. The folder is locked down to prevent access from any interactive user. However, if your files are still in there, you can follow numerous instructions online to simply take control of this folder and browse to the files you need. Or alternatively, you can use the psexec method below to hack into it without needing to forcibly take control.

But if you've gone through the trouble of formatting the offline files database, your files won't be there anyway. Your one last hope is a feature Microsoft added with Windows ME: System Restore. It's never done me any good in actually fixing a broken operating system, but it can save you now.

First off, from the afflicted computer, run "vssadmin list shadows" to see if you even have any restore points.

If you have some restore points from before you blew away the offline files cache, and after they created the files, you are in business.

You'll need volrest.exe and psexec for these next steps.  volrest.exe comes from the Windows Server 2003 Resource Kit Tools.  Don't worry though, it'll install and work just fine on Windows 7 (just ignore the incompatibility error).  You can download the Resource Kit here:
http://www.microsoft.com/download/en/details.aspx?id=17657

Grab psexec from http://live.sysinternals.com. If you don't know what psexec is, you should spend some time finding out after you've saved your job.

The reason we need psexec is that while you could take ownership of the CSC folder which is presently in the operating system, you have no way to take control of the one inside the restore point. Microsoft, trying to provide security through obscurity, doesn't let you restore folders you don't have access to, but as administrator that's merely a hurdle. Run this command to create a cmd window running as nt authority\system:

psexec.exe -i -s -d cmd


In XP you could have used the "at" command with the /interactive flag to accomplish this same thing, but again, Microsoft made it slightly more difficult for "security".

Now the fun part. Use volrest to restore the CSC directory. Volrest only works with UNC paths, but that's not an issue; the administrative share provides you with the UNC path you need.

volrest \\localhost\c$\windows\CSC /s /e /sct /r:C:\temp\directory

This will restore a copy of every file under the path C:\Windows\CSC for every restore point which has those files in it.  If you have a lot of restore points, you could end up with a lot of files.  The /sct flag date stamps all the files, so you can quickly sort out which is the newest.

Now, copy the files back to the proper locations and make sure the offline files sync is working properly.


Credit:
Information for this solution was collected from numerous websites to generate a complete solution. Special thanks must also be extended to Jim Banach, who not only created this problem in the first place, but was also the primary force in discovering the solution.

Wednesday, February 22, 2012

Ode to the Blog

Congratulations, via a matter of hapless clicks, you've discovered your browser inexplicably displaying this web page.  Let me be the first to assure you that your back button is fully operational, and escape is just a few panicked clicks away at any time.

Being as this is my inaugural post, I suppose I should provide some context concerning my decision to begin a blog. Though I think possibly it's better to start with why I've waited so long to begin. The rampant narcissistic teenage angst which pervaded my and many of my friends' LiveJournals left me with a poor perspective on blogs as a communication mechanism.

A decade later, my natural aptitude for problem solving led me into a career as a fixer of problems, specifically IT issues. The complexity and ever-changing nature appeals to the side of me which craves challenges, while the constant deluge of catastrophe provides plenty of opportunities to play hero. In this process, I've spent countless hours on the internet, exploiting Google to replace the need for me to actually be knowledgeable on a subject. Unfortunately, over and over again I found someone who posted an issue almost identical to mine, only to follow it up with "Thanks, I fixed it", if they followed up at all.

And I realized I'm one of the worst offenders. On many an occasion I've found a solution after countless hours of putting together bits and pieces of poor documentation, or went off on original research. However, as soon as I finished, I made no attempt to record the answer for posterity.

That doesn't mean I'm restricting this blog purely to paying my dues to the collective knowledge.  I expect I'll throw in some additional material with questionable correlation to the central theme based on my whims at that moment.  Hopefully it'll all be interesting, but I can assure you that it will all be highly unsupported.