Linux And VMware ESX (via Working At Dell)

15 Feb

I have done a lot of different kinds of computer support over my career, even more so since coming to Austin in December 2005. Most of it has been on Wintel PCs: desktops, laptops and servers. But I have also supported a fair number of Macs; after all, that is what I use almost exclusively at home and what this blog is served from.

In January 2010 I got a job at Dell answering phones on their “Alternate O/S” server queue: basically, any server that doesn’t run some form of Windows. That turns out to be primarily Linux (Red Hat and SUSE) and VMware’s ESX. When I started the job I barely knew what ESX was and only vaguely knew my way around Linux.

A year has gone by and I have gathered a bunch of notes that I found helpful in doing that job. Most of them relate to basic Linux and ESX functions; some are specific to Dell servers. I have divided this page into three sections: Dell-specific, Linux (RHEL, specifically) and ESX. This stuff works for me, but if you have suggestions, quibbles, flames or corrections, please feel free to email me and I will be happy to make any necessary changes.

I finished this in January 2011, not long before I left Dell. A number of things have changed since then, mostly because of newer versions of OMSA and ESX. Most (if not all) of this stuff still applies, and with a little sleuthing you can find the updated versions.

Disclaimer:

Don’t blame me if you try something here and lose years of irreplaceable data, screw up the computer, cause the ground to open under your feet or send the Earth careening into the sun. You are responsible for your own actions. This stuff works for me, but make copies of any files before you change them. Think about what you are doing and how, so that if it fails you can back out of the changes you have made.

 


 

I have not included the shell prompt in my instructions below. Assume that each line is to be entered one at a time. Terminal commands and paths look like this, respectively:

Command -a -b -C

/volumes/path/to/a/file

If you are confused about what some of the characters are or where the spaces go, just copy and paste the line into the terminal program you are using. That’s what I do.

 


Dell Diagnostic Tools

When you call Dell for technical support on a server that runs Linux or ESX (or Xen, or Novell, or Solaris, or…), I can just about guarantee that you will eventually run into two tools. The first is the OMSA Live CD. It is a standard Linux live CD (running CentOS) with a few Dell-specific additions that make it particularly useful if you administer Dell servers. Those additions are:

  1. Open Manage Server Administrator (OMSA).
  2. DSET, the Dell Systems E-Support Tool.
  3. Dell Diagnostics.
  4. MPMemory [PDF]
  5. (an environment to run) Dell firmware updates.

These officially supported server tools only run on officially supported operating systems: Windows, RHEL, SUSE and VMware ESX, with some support for XenServer as well. Because of that, some senior Linux techs came up with this live CD. It’s not officially supported by Dell (yet), but it works very well and is used a lot. If you work on Dell hardware with unsupported operating systems, it is worth having in your arsenal.

Previous versions are kept around as well, so if you have an older system or need to run in 32-bit mode, have a look here.

Download the livecd.iso file from the link above, burn it to CD and boot the server from it. It will take a long time to boot; nothing will seem to happen for an extended time, but if you leave it alone it will boot to a Linux desktop. It is set to grab an IP address with DHCP, and if you have a USB thumb drive inserted it will detect and mount it. You now have a pretty standard CentOS live CD with the additions listed above, plus the ability to run the Linux .bin firmware and BIOS updates that Dell makes for all of its servers.

Dell SDC Javascript Bookmarklet

You may (or may not) be surprised to learn that Dell technicians use support.dell.com (SDC) as the authoritative source for documentation and updates for Dell computers. There are internal documents (marketing and positioning information, plus knowledgebase articles about issues on the various platforms Dell sells) that customers never see. But we spend a lot of time on SDC looking up firmware versions, manuals and the like.

SDC is actually fairly good, but some clever person came up with a JavaScript bookmarklet, which I kept in my browser toolbar, that pops up a dialog box for a service tag and jumps you straight to the downloads page for that Dell system. Since Dell tracks issues by service tag, this was very handy.

Service Tag Jump

Click that link and it will pop up a request for a service tag. Type or paste the tag in and press Enter. It will take you to the downloads page for that system.

Dell Repository Manager

If you manage a bunch of Dell servers, then you need to know about the Dell Repository Manager. This little gem creates local repositories of Dell firmware and driver updates for all Dell servers and desktop systems. Basically, you specify the model and operating system, and it goes out to the Dell FTP site and downloads ALL of it in one fell swoop. It has lots of options for distributing the updates: ISO, network, thumb drive, etc. While it handles updates for both Windows and Linux, it only runs on Windows. It is worth your while to get to know this tool.

Dell DSET

The workhorse of the server support technician is the DSET utility. It stands for Dell Systems E-Support Tool. And what it does is gather a whole raft of diagnostic information, logs and the server’s internal inventory down to the serial number of the DIMMs in the system.

If you have a problem with something in the memory or disk subsystem, or your server is spitting out strange errors (pretty much no matter what the problem is), a Dell support tech will eventually ask you to download and run this tool. If you do this before you call in with your problem, it will speed up the troubleshooting process.

There is no special tool you need to open and read a DSET report. It spits out a password-protected zip file; the super sekkrit password is “dell”. Once it is unzipped you have a couple of folders and an .HTA file. Open the .HTA file and you get a web-based GUI of your system logs, hardware and OS.

There are a few things the DSET report doesn’t display, so explore the files. One you will want to know about is the PERC Controller log. Search for “Controller_0.log”. And that will give you an intimate look into what the RAID controller is doing. There is enough info to make your hair hurt. But eventually it will make sense and will really tell you what the deal is with that battery error and what is really happening with drive 0.
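If you handle a lot of DSET reports, a couple of shell helpers save some clicking. This is a minimal sketch, not a Dell tool: the function names are made up, it assumes Info-ZIP `unzip` is installed and that the report is a single zip protected with the `dell` password described above, and it assumes (my guess, not something Dell documents here) that a system with more than one PERC produces additional `Controller_N.log` files.

```shell
#!/bin/sh
# Hypothetical helpers for working with a DSET report (names are illustrative).

# Unpack a DSET zip into its own directory using the "dell" password.
# $1 = report zip, $2 = destination (defaults to the zip name minus .zip)
extract_dset() {
    zip=$1
    dest=${2:-${zip%.zip}}
    mkdir -p "$dest" && unzip -q -P dell "$zip" -d "$dest"
}

# List the PERC controller logs anywhere under an unpacked report.
# Assumption: additional controllers follow the Controller_N.log pattern.
find_controller_logs() {
    find "$1" -type f -name 'Controller_*.log' | sort
}
```

Usage, with a hypothetical file name: `extract_dset dset_report.zip && find_controller_logs dset_report`, then open the log it prints in your favorite pager.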

Dell System BIOS

Did you ever notice that some Dell BIOSes do not offer an option to reset to defaults? Well, this will do it. And it works not only on servers but on all Dell computers.

  1. Enter BIOS (F2 on the BIOS splash screen).
  2. Turn on caps lock, scroll lock and num lock.
  3. While holding the ALT key press e then f then b.

The b should reboot the server and bring the BIOS back to factory defaults.

Dell, Setting the Service Tag

The Dell Service Tag is the serial number of the system. All Dell systems have 7-character alphanumeric service tags that look something like this: 7WR56T1 (I made that number up). The tag is also written to the BIOS CMOS NVRAM and is used by a variety of Dell utilities to identify the computer.

When you call in to open a case on a Dell server they have to have your service tag. In fact, without a valid service tag you ain’t gettin’ no tech support. So it is vitally important that they get the Service Tag, and get it right.

It’s amazing how few people understand that no matter how well they enunciate, a lot of letters sound the same over the phone: T & D, F & S, M & N, etc. Add to that a variety of foreign and regional accents, the noisy server rooms you are calling from and the noisy call center we work in, and you really need to use a phonetic alphabet, s l o w l y.

That service tag above would sound something like this: seven, whiskey, romeo, five, six, tango, one. Don’t worry about remembering the official words; make up your own, as long as they are common and obvious. And those of you who use “mnemonic” for M will get hung up on.
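If you would rather not improvise, here is a minimal sketch that spells a tag out using the standard NATO words, in the same comma-separated style as the example above. The function name is made up, and any clear word list works just as well over the phone.

```shell
#!/bin/sh
# Spell a service tag out with the NATO phonetic alphabet (hypothetical helper).
spell_tag() {
    rest=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    out=
    while [ -n "$rest" ]; do
        c=${rest%"${rest#?}"}   # first character
        rest=${rest#?}          # drop it from the remainder
        case $c in
            A) w=alpha ;;   B) w=bravo ;;    C) w=charlie ;; D) w=delta ;;
            E) w=echo ;;    F) w=foxtrot ;;  G) w=golf ;;    H) w=hotel ;;
            I) w=india ;;   J) w=juliet ;;   K) w=kilo ;;    L) w=lima ;;
            M) w=mike ;;    N) w=november ;; O) w=oscar ;;   P) w=papa ;;
            Q) w=quebec ;;  R) w=romeo ;;    S) w=sierra ;;  T) w=tango ;;
            U) w=uniform ;; V) w=victor ;;   W) w=whiskey ;; X) w=x-ray ;;
            Y) w=yankee ;;  Z) w=zulu ;;
            0) w=zero ;;    1) w=one ;;      2) w=two ;;     3) w=three ;;
            4) w=four ;;    5) w=five ;;     6) w=six ;;     7) w=seven ;;
            8) w=eight ;;   9) w=nine ;;
            *) w=$c ;;      # anything unexpected is passed through as-is
        esac
        out=${out:+$out, }$w
    done
    printf '%s\n' "$out"
}
```

For example, `spell_tag 7wr56t1` prints the same words as the example above; lowercase input is upper-cased first.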

If a motherboard is replaced, the tech Dell sends out to do the replacement (the “OST”, or Onsite Service Technician) has a utility to assign the new motherboard the correct Service Tag for your computer. On rare occasions they forget to do so; if that happens, you can do it yourself with the asset utility.
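After a motherboard swap, a quick sanity check from Linux or the ESX console is to see whether what the BIOS reports matches the 7-character format described above. A minimal sketch, assuming dmidecode is installed and that, as on Dell systems, the `system-serial-number` string holds the service tag; the helper name is hypothetical, and a blank or odd-looking value only suggests the tag was never set.

```shell
#!/bin/sh
# Succeed if $1 looks like a 7-character alphanumeric service tag.
is_service_tag() {
    [ "${#1}" -eq 7 ] && printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9]{7}$'
}

# Usage (as root):
#   tag=$(dmidecode -s system-serial-number)
#   is_service_tag "$tag" || echo "Service tag looks wrong or unset: '$tag'"
```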

Dell, Bootable USB Key

One of the daily issues we faced was how to apply BIOS and firmware updates on unsupported operating systems. ESXi’s lack of a command prompt makes this a problem there too. But if you had Novell, Xen, Solaris or any one of a hundred Linux distros on your box, how do you upgrade the BIOS? Dell doesn’t offer new updates in bootable floppy format any more, but it does offer DOS versions. All you need is something to make bootable media from.

Normally we would use the Dell 32-Bit Diagnostics utility, which can create bootable media, including a USB thumb drive (2GB or smaller). All you have to do is delete the autoexec.bat file from the key, copy the updaters to the key, and you are good to go. But you have to run each update manually; if you want to automate it a bit more, there is this.

You will need a Windows machine to do this on. Start by downloading this file.

Run it to extract the files to C:\BIOS_update\8104-020115\. Close the window it opens after extraction is done and grab your BIOS/firmware update.

If you are doing a BIOS update, rename your update file to something that fits the DOS 8.3 filename convention, such as bios.exe, then move it into the C:\BIOS_update\8104-020115\BIOS\ directory.

In the C:\BIOS_update\8104-020115\BIOS\ directory, edit autoexec.bat to reflect the new name, e.g. @bios.exe instead of the @ C020115.exe that was in there by default. Or, if you like, delete autoexec.bat and run the update manually.
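If you are not sure whether a new name qualifies, the 8.3 rule is simple enough to check mechanically: at most 8 characters, optionally followed by a dot and at most 3 more. A sketch with a made-up function name; it is deliberately conservative and only accepts letters, digits, `_` and `-`, although DOS itself tolerates a few more punctuation characters.

```shell
#!/bin/sh
# Succeed if $1 fits the DOS 8.3 convention: 1-8 character base name,
# optionally followed by a dot and a 1-3 character extension.
# Conservative character set: letters, digits, "_" and "-".
fits_8_3() {
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9_-]{1,8}(\.[A-Za-z0-9_-]{1,3})?$'
}
```

For example, `fits_8_3 bios.exe` succeeds, while a long downloaded name such as `PER710_BIOS_LX_2.1.15.exe` (a made-up example) fails and needs renaming.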

If you are doing a RAID controller update, copy the update into C:\BIOS_update\8104-020115\BIOS and then run it to extract the files. In the C:\BIOS_update\8104-020115\BIOS\ directory edit autoexec.bat to run @nocheck.bat instead of the @ C020115.exe which was in there by default.

Once you have the updates copied to the correct directories run bdp.exe in the C:\BIOS_update\8104-020115\ directory and create your CD or USB flash key media. If you are using a USB thumb drive it should be 2GB or smaller.

Dell BIOS Not Installing?

If you are on a Linux or ESX system and are getting HAPI errors when trying to update firmware, you can do one of two things: install OMSA, or install the Dell Client Configuration Toolkit (CCTK).

Just download it, untar it and run the installer. It should place all the necessary dependencies on the system so that your firmware upgrades will work without resorting to secondary boot media.

Install the RPMs in CCTK in the following order:

  • rpm -ivh srvadmin-ipmi-6.0.1-800.DUP.i386.rpm
  • rpm -ivh srvadmin-omilcore-6.0.1-800.i386.rpm
  • rpm -ivh srvadmin-hapi-6.0.1-800.i386.rpm
  • rpm -ivh cctk-linux-1.0.0-1.i386.rpm
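To avoid typos, the same sequence can be scripted. This is a sketch only: the function name is hypothetical, it hard-codes the file names from the list above (other CCTK versions ship different names), it assumes the RPMs are in the current directory, and it stops as soon as one rpm fails, since the later packages depend on the earlier ones.

```shell
#!/bin/sh
# Install the CCTK RPMs in dependency order, stopping at the first failure.
install_cctk() {
    for pkg in \
        srvadmin-ipmi-6.0.1-800.DUP.i386.rpm \
        srvadmin-omilcore-6.0.1-800.i386.rpm \
        srvadmin-hapi-6.0.1-800.i386.rpm \
        cctk-linux-1.0.0-1.i386.rpm
    do
        echo "Installing $pkg"
        if ! rpm -ivh "$pkg"; then
            echo "rpm failed on $pkg; stopping" >&2
            return 1
        fi
    done
}
```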

 


 

Linux and Linux on Dell

Any reasonably competent Linux SysAdmin already knows this stuff, so don’t expect any big revelations; it’s just simple, handy beginner stuff. I wrote it down so I could paste it into an email for the numerous unqualified Linux SysAdmins I got calls from.

Speaking of which, I have recommended the One Page Linux Manual [PDF] many times. It’s handy. Also handy is the FossWire.com Linux Command Cheat Sheet.

How Do You Mount A CD in Linux (and ESX)?

Insert your optical media and open up a command line. Let’s determine which device it is:

ls -l /dev | grep cdrom

Let’s assume that you got /dev/scd0 as your optical drive. If you get several and you are not sure which is right, you may have to try each one until it works. In the meantime we need a place to mount the optical media to, so make a directory:

mkdir /mnt/cdrom

If you get an error, that directory may already exist; you can use it or make one with a different name. Your choice, it doesn’t matter. Now mount the device on the mount point (directory) you just made:

mount /dev/scd0 /mnt/cdrom

If you got the correct device, you will get a warning that the device is write-protected and is being mounted read-only. That is expected; ignore it. The optical media is now accessible here:

/mnt/cdrom

When you are finished don’t forget to unmount the media.

umount /mnt/cdrom
Note: the command is umount (NOT uNmount).
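Put together, the steps above might look like this. A minimal sketch: the function names are hypothetical, it assumes your system creates a /dev/cdrom symlink (most udev setups do, but not all), and mounting with `-o ro` simply asks for read-only up front instead of letting mount fall back to it.

```shell
#!/bin/sh
# Read `ls -l /dev` output on stdin and print the device(s) that the
# cdrom* symlinks point to, e.g. "cdrom -> scd0" becomes /dev/scd0.
cdrom_devices() {
    awk '/ cdrom[0-9]* -> / {
        t = $NF
        if (t !~ /^\//) t = "/dev/" t   # relative link target
        print t
    }'
}

# Mount a device read-only, creating the mount point if needed. Needs root.
# $1 = device, $2 = mount point (default /mnt/cdrom)
mount_cd() {
    dev=$1
    mnt=${2:-/mnt/cdrom}
    mkdir -p "$mnt" && mount -o ro "$dev" "$mnt"
}
```

Usage: `ls -l /dev | cdrom_devices`, then `mount_cd /dev/scd0`, and `umount /mnt/cdrom` when you are done.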

Disk / Drive info

How about some really simple stuff, like how big and how full are my drives?

df -h
And you get something like this:

 

Filesystem Size Used Avail Use% Mounted on
/dev/sdb5 4.9G 2.2G 2.5G 47% /
/dev/sda1 1.1G 75M 952M 8% /boot
/dev/sdb2 2.0G 69M 1.8G 4% /var/log
/vmfs/devices 74G 0 74G 0% /vmfs/devices
/vmfs/volumes/4bd… 73G 52G 20G 71% /vmfs/volumes/Storage1

 

Pay close attention to the Use% column. No fancy pie charts here, just numbers that tell you when you are low on drive space and need to free some up ASAP.
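If you check a lot of hosts, you can let awk read the Use% column for you. A sketch, assuming POSIX-format output from `df -P` (or `df -hP`), which keeps each filesystem on one line; without -P, long device names can wrap onto a second line and throw the columns off. Mount points containing spaces would also confuse it. The function name and the 90% default are arbitrary choices.

```shell
#!/bin/sh
# Print "mountpoint use%" for every filesystem at or above a threshold.
# Reads `df -P` output on stdin. $1 = threshold percent (default 90)
full_filesystems() {
    awk -v t="${1:-90}" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= t + 0) print $6, $5
    }'
}
```

Usage: `df -P | full_filesystems 80` lists anything that is 80% full or more.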

FSCK, Filesystem Check

So you think you have volume corruption, a damaged filesystem, etc. What do you do? In Windows it’s CHKDSK; in Linux it’s fsck. Remember one thing: never run fsck on a mounted filesystem, because that can result in Bad Things. So either unmount the filesystem first, boot into single-user mode, or boot from a live CD.

First, do this to find out where the volume is:

fdisk -l
Note: use lvs or lvdisplay for logical volumes

Then, once you know where your filesystem is (let’s say /dev/sda1), do this:

fsck -fy /dev/sda1

If the filesystem is really messed up this could take a long time; let it do its thing. When it’s done, run it again, and again, until it can’t find anything more to fix.
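The run-it-again advice can be scripted. This is a minimal sketch, not a replacement for watching fsck’s output: the function name and the five-pass limit are my own choices, the mounted check only looks for the exact device name in /proc/mounts (so it can miss LVM or /dev/mapper aliases), and the exit-code meanings come from the fsck(8) man page.

```shell
#!/bin/sh
# Run fsck repeatedly on an UNMOUNTED filesystem until a pass finds nothing.
# fsck(8) exit codes: 0 = no errors, 1 = errors corrected, 2 = corrected but
# reboot needed, 4 = errors left uncorrected, 8 and up = operational/usage error.
# $1 = device (e.g. /dev/sda1), $2 = maximum passes (default 5)
fsck_until_clean() {
    dev=$1
    max=${2:-5}
    # Refuse to touch a filesystem that is listed as mounted.
    if grep -qs "^$dev " /proc/mounts; then
        echo "$dev is mounted; unmount it or boot from a live CD first" >&2
        return 1
    fi
    pass=1
    while [ "$pass" -le "$max" ]; do
        if fsck -fy "$dev"; then rc=0; else rc=$?; fi
        echo "pass $pass: fsck exit code $rc"
        [ "$rc" -eq 0 ] && return 0         # clean: nothing left to fix
        [ "$rc" -ge 8 ] && return "$rc"     # fsck itself failed; stop
        pass=$((pass + 1))
    done
    echo "still not clean after $max passes" >&2
    return 1
}
```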

Command Line Versioning

On Dell hardware you can get the BIOS version and the service tag from the command line.

BIOS version:

dmidecode | grep -i version

Service Tag:

dmidecode | grep -i serial

Want to know what version of RHEL you are running?

cat /etc/redhat-release
You’ll get something similar to this:
“Red Hat Enterprise Linux Server release 5.4 (Tikanga)”

But what about the Kernel version?

uname -r
Gives you this:
“2.6.18-164.el5”
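If you want all of this in one place, for example to paste into a case, here is a small sketch. The `-s bios-version` and `-s system-serial-number` keywords are standard dmidecode options and a tidier alternative to grepping; the function names are hypothetical, and the sed pattern assumes the release string looks like the example above.

```shell
#!/bin/sh
# Print just the release number from redhat-release-style text on stdin,
# e.g. "Red Hat Enterprise Linux Server release 5.4 (Tikanga)" -> 5.4
rhel_release() {
    sed -n 's/.*release \([0-9][0-9.]*\).*/\1/p'
}

# One-shot summary for a support case (run as root on a Dell box).
# On Dell systems the system serial number is the service tag.
system_summary() {
    echo "BIOS version: $(dmidecode -s bios-version)"
    echo "Service tag:  $(dmidecode -s system-serial-number)"
    echo "RHEL release: $(rhel_release < /etc/redhat-release)"
    echo "Kernel:       $(uname -r)"
}
```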

The SOS report

Red Hat Enterprise Linux has a nice little diagnostic utility, sosreport, that collects information about your machine so someone smarter than me can look through it and see what is happening. It is very detailed; I have seen reports range in size from a few MB to a few hundred MB.

Just type sosreport at a command prompt. The output is a bzip2-compressed archive in /tmp.

An sosreport can help a Red Hat engineer or one of Dell’s Sr. Analysts determine what is going on with your server. If you are having problems on your RHEL server, having one of these ready to go is not a bad idea.

SUSE Linux has a similar tool, supportconfig, but it is usually not installed by default. Installation and usage instructions are here.

Yum

No doubt you are familiar with updating software on Red Hat through yum.
But you just noticed that you can’t get yum updates to work; it just fails with some gibberish or no error at all. Try this:

yum clean all

That will fix a number of issues and will often get yum working again.

Run Level 1

Remember when I said you could run fsck in single-user mode? Single-user mode is also called runlevel 1, and in RHEL this is how you get there:

  1. At the GRUB menu, press e.
  2. Select the kernel line.
  3. Press e (for edit).
  4. At the end of the line, add one space and type single.
  5. Press Enter to accept the change, then press b to boot.

OK, things are working better and you need to raise the runlevel to something more useful?
To get to runlevel 3, type init 3 (or 5, or whatever…) at a shell prompt.

 


 

VMware ESX and ESXi

 

OMSA on ESXi 4.x

Dell’s OpenManage Server Administrator (OMSA) is a web-based GUI that lets you view and manage various aspects of your Dell server. Since there is no specific application that lets you manage your PERC RAID arrays in Linux and ESX, you will need to install OMSA if you want to manage your RAID without booting the server into the RAID BIOS.

Installing OMSA on Linux and ESX is pretty straightforward. Just go to the download page on support.dell.com for your particular server model, select the operating system and, under the heading “Systems Management,” look for “Open Manage Server Administrator Managed Node.” If there is one that is “Distribution Specific,” use that one.

For Linux and ESX you will get a tarball of 100 MB or so. Copy that to your host computer (I like to put it in /root/omsa/) and then untar it:

tar -xzf [om_6.4.0… tar.gz]
It will extract the installation folder structure, which you should hang onto in case you need to uninstall, reinstall, etc.

There will be a setup.sh installer right there, but don’t use it. I am not sure exactly why, but I have been told by wiser men than I: That Is Not The Preferred Way. Instead, look for the setup script here:

/root/omsa/linux/supportscripts/
The preferred installer is there, along with uninstall and maintenance scripts.

Once you have done this on ESX you will need to open a port in the firewall so you can talk to the OMSA web server. Do this:

/usr/sbin/esxcfg-firewall -o 1311,tcp,in,OpenManageRequest
This will permanently open port 1311 so you can get to OMSA like this:

https://[hostname]:1311

While installing OMSA on Linux and ESX is fairly straightforward, doing so on ESXi, with no native command line, is a bit more complicated. In fact, we got a lot of calls asking how to do just that. Most of these callers could have benefited from a little Googling, but I condensed the documents written by fellow techs floating around the department into one and (hopefully) clarified the process.

Tools required and steps:

  1. Download and install the vSphere CLI from the link above.
  2. Download the OpenManage Server Administrator Managed Node.
  3. Place the ESXi host into maintenance mode.
  4. Copy the OpenManage bundle to the vSphere CLI location: copy the downloaded OpenManage zip file to the folder the vSphere CLI was installed to on your Windows PC.
  5. Open the vSphere CLI to install the package: click Start → Programs → VMware → VMware vSphere CLI → Command Prompt.
  6. Enter this command:
      • bin\vihostupdate.pl --server [IP or hostname] -i -b [full path to OMSA zip file]
      • It is very important that you use the full path to the file. You can rename the file if you like.
      • The system will run for a bit as it copies and installs OMSA.
  7. Enable CIM using the vSphere Client:
      • Log in to the server with the vSphere Client and go to the Configuration tab, then the Software box.
      • Click Advanced Settings → UserVars and enable UserVars.CIMOEMProvidersEnabled: in the second box down, change the value from 0 to 1.
  8. Restart the Management Agents on ESXi from the ESXi direct console (DCUI).
  9. Now install OpenManage Server Administrator on your Windows workstation so you can use the OMSA web interface to connect to the ESXi host.
  10. Download OMSA for Windows from the link above and install it.
  11. Once the install has completed, go to Start → Programs → Dell OpenManage Applications → Server Administrator. There should also be a new icon on your desktop.
  12. It will open your browser. Enter the IP or hostname of your ESXi server and the root username and password. Also check “Ignore Certificate Warnings”.

You are now logged in to OMSA on your ESXi host.

Configuring SNMP:

  • Open a command prompt on the system in which the vSphere CLI is installed.
  • Navigate to the directory in which the vSphere CLI is installed.
  • The default location on Linux is /usr/bin and on Windows is C:\Program Files\VMware\VMware vSphere CLI\bin.
  • Configure the SNMP setting using the following command:

vicfg-snmp.pl --server [server] --username [username] --password [password] -c [community] -t [hostname]@162/[community]

  • [server] is the hostname or IP address of the ESXi system
  • [username] is a user on the ESXi system, root is best
  • [password] is the password of the above user
  • [community] is the SNMP community name
  • [hostname] is the hostname or IP address of the management station.

NOTE: If you do not specify a user name and password, you are prompted to specify one.

To enable SNMP do this:

vicfg-snmp.pl --server [server] --username [username] --password [password] -E

To view the SNMP configuration do this:

vicfg-snmp.pl --server [server] --username [username] --password [password] -s

To test the SNMP configuration do this:

vicfg-snmp.pl --server [server] --username [username] --password [password] -T
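The SNMP steps can be chained in one function. A sketch under a few assumptions: the function name and the `VICFG_SNMP` override variable are my own inventions (handy for a dry run), it uses only the vicfg-snmp.pl options shown above, and it deliberately leaves out --password so you are prompted instead of leaving the password in your shell history (expect one prompt per call).

```shell
#!/bin/sh
# Configure, enable, test and display SNMP on an ESXi host with vicfg-snmp.pl.
# VICFG_SNMP lets you point at a different command, e.g. for a dry run.
VICFG_SNMP=${VICFG_SNMP:-vicfg-snmp.pl}

# $1 = ESXi host, $2 = user, $3 = community, $4 = management station (trap target)
setup_esxi_snmp() {
    server=$1 user=$2 community=$3 trap_host=$4
    # No --password: vicfg-snmp.pl prompts for it on each call.
    "$VICFG_SNMP" --server "$server" --username "$user" \
        -c "$community" -t "$trap_host@162/$community" || return 1
    "$VICFG_SNMP" --server "$server" --username "$user" -E || return 1   # enable
    "$VICFG_SNMP" --server "$server" --username "$user" -T || return 1   # test trap
    "$VICFG_SNMP" --server "$server" --username "$user" -s               # show config
}
```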

Enable Remote Root Access

If you have ESX rather than ESXi, you have access to a command line: a pretty standard RHEL-compatible environment with some VMware-specific additions. Most binaries that run under RHEL will run in ESX, something Dell counts on for firmware and BIOS updates. As in Linux, you can log in as root from a remote SSH session, but you will have to enable it first.

Check with your IT Security folks whether turning on remote root access is allowed. It is a security vulnerability, and as such it is turned off by default. In my experience, well over half of the people I dealt with had it enabled.

If you can’t enable remote root access, then you just have to elevate your privileges to root once logged in. Do that with this command:

su -
That is the superuser command, a space and a dash. This elevates you to root with root’s full login environment. You will be prompted for the root password.

Now that we have that out of the way and you are now the root user, edit the sshd_config file, like this:

vi /etc/ssh/sshd_config

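Scroll down until you see PermitRootLogin and change the “no” to “yes” (in vi, press i to start editing). If the line starts with a #, remove the # as well, since commented lines are ignored. When you have done that, save and quit by pressing Esc and typing:

:wq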

All that remains to be done is restart the SSH service so that it re-reads the sshd_config file and allows remote root access from anywhere on your network.

service sshd restart
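If you would rather not hand-edit the file in vi, the same change can be scripted. A minimal sketch, assuming a standard OpenSSH sshd_config: the function name is hypothetical, it keeps a .bak copy, it rewrites every PermitRootLogin line (commented or not) to PermitRootLogin yes, and it appends the setting if none is present. You still need to restart sshd afterwards, as above.

```shell
#!/bin/sh
# Set "PermitRootLogin yes" in an sshd_config, keeping a backup copy.
# $1 = config file (default /etc/ssh/sshd_config)
permit_root_login() {
    cfg=${1:-/etc/ssh/sshd_config}
    cp -p "$cfg" "$cfg.bak" || return 1
    pattern='^#*[[:space:]]*PermitRootLogin[[:space:]]'
    if grep -q "$pattern" "$cfg.bak"; then
        # Rewrite from the backup so the original file keeps its owner and mode.
        sed "s/$pattern.*/PermitRootLogin yes/" "$cfg.bak" > "$cfg"
    else
        echo 'PermitRootLogin yes' >> "$cfg"
    fi
}
```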

If you don’t already have an SSH client, PuTTY is nice.

And for transferring files back and forth, WinSCP is much better than the datastore browser in the vSphere Client.

And if you have the “no command line” ESXi, there is hope; check this out.

Intel Quad Port NIC

Generally speaking, ESX has drivers built into the OS for all the hardware that comes with the server. When you need drivers, VMware is your source. One of the more common add-on cards is the Intel quad-port gigabit NIC, based on the 82575 or 82576 controller. Strangely enough, the drivers are sometimes not installed on a system with a factory-installed OS and this NIC. Fortunately, VMware has you covered.

ESX 4 Intel 82575/82576 NIC Drivers from VMware.

Management Services

Sometimes when you have lost your connection to your ESX server but the VMs are running fine, all you need to do is restart the management services. Access the console and, at a command line (directly or via SSH), issue these three commands, one after the other:

service mgmt-vmware restart

service vmware-vpxa restart

service vmware-vmkauthd restart

In many cases that will get your vSphere or VI Client to find the errant server.

If you have ESXi, just access the console for the DCUI. There is an option there to restart management services.
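On classic ESX the three restarts can be wrapped in a tiny script that reports which service, if any, failed to come back. A sketch only: the function name is made up, it runs the same service commands listed above, and it is meant for the ESX service console; on ESXi use the DCUI option instead.

```shell
#!/bin/sh
# Restart the ESX management services one after another, reporting
# failures instead of stopping at the first problem.
restart_esx_mgmt() {
    failed=0
    for svc in mgmt-vmware vmware-vpxa vmware-vmkauthd; do
        if service "$svc" restart; then
            echo "restarted $svc"
        else
            echo "FAILED to restart $svc" >&2
            failed=1
        fi
    done
    return "$failed"
}
```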

Messages Log

Want to see what your ESX host is up to? Press Alt+F12 on the console. That shows the kernel messages; scroll up and down with the arrow keys, with the latest messages at the bottom.

PERC and Hardware Logs

Want your RAID and hardware logs? You can get this information using Dell’s DSET utility, but there is a utility built into ESX, so there is no need to download anything. You can generate a vm-support report from your vSphere Client by selecting File → Export → Export System Logs, or by typing vm-support at a command prompt. When that is done, open up the zipped bundle and look for these files:

For your PERC RAID card:

/tmp/lsi.xxxxx.log
where xxxxx is a random string of digits

The SEL or Hardware Log:

/var/log/ipmi/0/sel
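Digging those two files out of an unpacked bundle is a one-liner with find. A sketch: the function name is hypothetical, and the usage line assumes the bundle is a gzipped tar archive with a made-up name; adjust the extraction step if yours differs.

```shell
#!/bin/sh
# List the PERC (lsi.*.log) and SEL files inside an unpacked vm-support bundle.
# $1 = directory the bundle was extracted into
find_hw_logs() {
    find "$1" -type f \( -name 'lsi.*.log' -o -path '*/var/log/ipmi/0/sel' \) | sort
}

# Usage (bundle name is an example):
#   mkdir bundle && tar -xzf esx-support-bundle.tgz -C bundle && find_hw_logs bundle
```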

The vm-support logs are of great value to the people at Dell and VMware who have the technical knowledge to extract all the info in them; they are an invaluable resource. But for most of us they are total information overload. vm-support exports all of the normal log files too, some of which are readable by mere mortals:

/tmp – Redirected text of console commands
/tmp/vmware – Version of ESX
/var/log/vmkernel – Robust logging of most activities
/var/log/vmkwarning – Warning and error lines from the vmkernel log
/etc/vmware/esx.conf – Configuration file for the vmkernel
/tmp/df – Filesystem disk space usage (as always, it is VERY important that / is not full)

And like in Linux you can run commands to get some Dell-specific information, like:

Hardware information from the system board. Includes BIOS, Service Tag, BMC, etc.

/tmp/dmidecode

If OMSA is installed, vm-support will also run the omreport commands to gather alert logs, command logs, ESM logs, fan information, memory information, POST logs and temperature information:

/tmp/omreport

Poor ESX Performance?

Poor ESX performance, services that won’t start, or a system that hangs are all symptoms of / being full. Check df (as under Disk / Drive info above) to make sure there is free space, and clean up files if there is not. For example, you could use this to locate files over 200 MB:

find / -mount -size +204800k -exec ls -hl {} \;
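A variation on that command, as a sketch: it uses -xdev (the POSIX spelling of -mount), restricts the search to regular files, and sorts the results with du so the biggest offenders come first. The function name and parameters are my own; the default threshold is the same 204800 KB (200 MB) as above.

```shell
#!/bin/sh
# List regular files above a size threshold on one filesystem, biggest first.
# $1 = starting directory (default /), $2 = threshold in KB (default 204800)
big_files() {
    find "${1:-/}" -xdev -type f -size +"${2:-204800}"k -exec du -k {} + | sort -rn
}
```

Usage: `big_files` searches / for files over 200 MB; `big_files /var/log 51200` looks for log files over 50 MB.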

Strange Behavior

So you can’t add a host to Virtual Center, or advanced features like HA, DRS and VCB are not working correctly?

Many aspects of ESX rely heavily on name resolution. Make sure DNS is properly configured and working: ping each host by IP and by name from all the other hosts to verify this. A good “belt and suspenders” solution is to add entries for all the ESX hosts on your network to the /etc/hosts file on all of your ESX hosts.
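If you go the /etc/hosts route, listing both the fully qualified name and the short name on each line covers both ways people tend to type them. A sketch: the functions, IP address and host names are hypothetical, and check_names assumes getent is available, as it is on glibc-based Linux systems.

```shell
#!/bin/sh
# Print an /etc/hosts line: IP, fully qualified name, short name.
# $1 = IP address, $2 = fully qualified host name
hosts_entry() {
    printf '%s\t%s %s\n' "$1" "$2" "${2%%.*}"
}

# Report whether each name resolves on this host (DNS or /etc/hosts).
check_names() {
    for h in "$@"; do
        if getent hosts "$h" >/dev/null; then
            echo "OK   $h"
        else
            echo "FAIL $h"
        fi
    done
}
```

Usage: `hosts_entry 192.168.10.11 esx01.example.com >> /etc/hosts` on each host, then `check_names esx01 esx01.example.com`.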

ESXi Licensing Information

Dell servers that ship with ESXi pre-installed and licensed occasionally don’t have the license information on that bright yellow or orange piece of paper. Even if you did get that piece of paper, I would advise doing this anyway, because once you reinstall, getting your licensing information out of VMware is an exercise in frustration.

At the ESXi console, press F2 to enter the DCUI.
Select VIEW SUPPORT INFORMATION.

The license is there, but it is not really the serial number; it’s the Partner Activation Code (PAC). How do you turn this into a serial number? Go here.

Plug in the Partner Activation Code. That will redeem a License Activation Code (LAC). Then you can log in to the regular VMware license portal and activate the LAC for your enterprise license.

If that does not work you will need an OEM Part Number. And that requires a Partner Activation Code Replacement Form available from Dell.

A Few Handy ESX PDFs

Storage/SAN Compatibility Guide

I/O Compatibility Guide

ESX Configuration Guide

Resource Management Guide

Fibre Channel SAN Configuration Guide

Configuration Maximums for VMware Infrastructure 3

Dell OMSA on ESX


One More Thing…

One thing I came away with from my year-plus on the phones in Dell Server Support (High Complexity Alternate OS division, or AltOS): you’d be hard pressed to find a better group of (mostly) guys who will do their damnedest to solve your problems. I have heard lots of complaints out here on teh Intarwebs about Dell tech support. Damn few are about the AltOS guys. Should I ever be in a position to purchase server hardware, you can be damn sure I’ll buy a Dell.

February 2011