Rancid email notification issues

Just spent a few days getting RANCID setup on one of my monitoring servers to backup my device configs on a daily basis. Whilst setting it up I followed a number of guides to get my config files setup and checked. The one thing I couldnt get to work however was the email when RACID detected a config change on one of the network devices.

Scouring the Internet I couldnt find what I had missed. Postfix was setup correctly and I could use the aliases I setup in /etc/alises if i “telnet localhost 25” and mail was delivered. In the end looking at the update logs I could see a line saying it couldnt find sendmail.

A quick look at racnid_control and I updated the lines that referenced sendmail to include a full path to /usr/sbin/sendmail and low and behold my inbox was full of config changes this morning.

I’m sure that if I was able to get the money to buy Opsview Enterprise I would make full use of the RANCID module within this but for the moment this works well enough for me.

My next goal is to get SNMP Trap processing setup so that if the appropriate trap is received from a monitored device it will pull the latest config down and we will always have the latest config.

Opsview users – beware javascript-common package

I have just come across an interesting issue with my production Opsview server where the web interface was loading successfully on http://<server>:3000 however http://<server> through the apache proxy was not working following an upgrade from Ubuntu 8.04 to 10.04

The upgrade process went smoothly and everything looked OK having reinstalled Opsview (the upgrade process will uninstall the package but the data is kept in the databases) except for the fact that I couldnt expand any of the menus across the top of Opsview and a small box had appeared under the search bar.

Opsview 3.7.0 with Javascript Error

A bit more research in my browser showed the following errors:

Opsview errors

Following on from this I popped a quick email to the opsview-users distribution group. to find out if they were aware of any issues with the upgrade process. I reviewed the Apache proxy config and replaced this with the stock one from the new Opsview install. This didnt help.

Next I ran through disbaling and re-enabling the three proxy modules and reloading apache a few times and still no joy.

Feedback from the mailing list suggested that the proxy config was correct but to try accessing the javascript file http://<server>/javascript/prototype.js (this returned a 404 error) and to also look at the apache error logs at the same time.

The logs from apache gave me the following:

I would expect the path for the javascript to be /usr/local/nagios/share/javascript/… and not just /usr/share/javascript. I double checked my apache config and ran through all the configuration files that were included. excluded the /etc/apache2/conf.d directory and reloaded Apache. The result… Opsview loaded and displayed correctly:

Going back through the different files in the directory I came across javascript-common.conf which has the following code in it:

I removed the symlink, re-enabled the conf.d directory in the apache config and all looked good.

Having a quickl look round I couldnt find any reason for the package being installed on my machine so I removed it and restarted apache followed by an apt-get check to see if there were any broken dependencies and there were none.

Upshot of all of this… Unless you want all your Javascript to be in one location dont install the javascript-common package.

T-Mobile UK 2G Data outage

I’ve been having some issues with my clients’ Blackberry handsets this morning and just received the following update from T-Mobile

I am afraid we have currently got an issue with 2g data services which is affecting all BlackBerry. Its currently affecting central and east London.

I dont have any further information at this time but it looks to only be affecting the BB Curves we have and not the Bolds. Looking at a couple of devices they appear to have GPRS back again but I am still waiting on the OK from T-Mobile to confirm everything is working

New Qualification – JNCIA-FWV

Today I sat and passed, after a long time of putting it off, my JNCIA (Juniper Networks Certified Internet Associate) Firewall/VPN Exam.

This now means that I have a qualification in the firewall technology that we are using at work. Hopefully I can play with some of the more funky stuff they use and work towards my JNCIS now 🙂

Check E-Trust Antivirus Definitions

Following on from my Symantec AV check I have written a first version of a similar check for E-Trust virus definitions. The format and structure to the check is the same as this check but it should return the relevant information for Computer Assoicates E-Trust Antivirus product.

For details on installation and configuration please check out the previous post. For the source code please check out the details below. If you wish to download this from Monitoring Exchange please use this link.

Failing hard disk

I had a small shock this evening when I noticed that one of the iSCSI mounts to my lab servers was not working as expected. I could see a folder structure but no data in the folder. I have had issues in the past because the iSCSI mount is a dynamic disk (Yes I know now that I should have left it a basic disk but I have not got enough space to move the relevant data off, covert to basic and move it back again) and when I reboot the server that it mounts to I have to reactivate the disk manually and recreate the appropriate shares. This issue was different.

I logged onto the admin interface for my NAS (Thecus 5200BR Pro) and checked the disk status to find the following screen

I hadnt been notified that my Nas was not 100% healthy so this was quite a shock. Clicking further on the Warning I had the following screen confront me

I think this is why I have some errors on my server. I shut down all the VMs and powered the NAS back on to do a file system check which it seemed to pass but it only checks the file system and not the iSCSI mounts that exist.

I rebooted the NAS again into normal operating mode and powered my ESX servers back on, logged back into my server with the iSCSI mount and reactivated the local disk and shared the folders again. Data was there 🙂

I dont trust that this wont happen again so I have purchsed a further two 1TB Western Digital hard drives from my preferred supplies (www.overclockers.co.uk) and am having them shipped to work so I can get them installed as soon as possible in my lab. I think I am also going to take this as a chance to move the iSCSI mount away from the existing setup and onto a new iSCSI array that is mounted on a dedicated iSCSI LAN and hopefully improve performance a little bit.

Publishing scripts to Monitoring Exchange

As I start to write/modify more checks and scripts for monitoring applications in Nagios/Opsview I have decided to share these as much as possible with the community so they can enjoy, and if necessary, improve the scripts I have written. I have decided to use the MonitoringExchange.org website to host my scripts (as well as detailing them on this blog) as I have found a number of good scripts here that do what I wanted them to.

All the scripts should appear as projects under my profile (wibble) with a link back to the same script on the blog here.  I will also endeavour to post the link to Monitoring Exchange in the bottom of the blog post.

Nagios/Opsview: Check Symantec AV Definitions

This morning whilst deploying a modified version of the Symantec Anti-Virus check from MonitoringExchange.org I noticed that on my 64-bit hosts that the check was not returning the correct data and instead of the expected output I was receiving the following error code:

Initially I thought this could be a change due to the new installs being Symantec Endpoint Protection compared to the previous times I had implemented this using Symantec Anti-Virus 10.x but the SEP installs on the 32-bit systems were working fine however the 64-bit versions were not.

A quick look in the registry showed me that the value that is read by the script is not there on the 64-bit version and has been moved to another location (HKEY_LOCAL_MACHINESOFTWAREWow6432NodeSymantecSharedDefsDefWatch). I sat down with the script and quickly wrote in some extra code that would allow me to change the search path depending on the Operating System Architecture. I also added in some more error checking so if the key didnt exist then rather than exiting with an OK status it returns an UNKNOWN status and a relevant error message.

As I use NSClient++ to enable me to monitor my Windows servers I simply save the script to the NSClient++scripts folder and add the following line into my NSCI.ini under [NRPE Handlers]

Then from within Nagios or Opsview define the command for check_nrpe

check_nrpe -H $HOSTADDRESS$ -c check_av -a 2 3

The full script is listed below and is also available on Monitoring Exchange (link):

Website Migrated and Theme issues

Whilst trying to make the home page and blog page look the same I managed to break the wp-admin section of my blog. Thanks to the guys at Loho.co.uk they have migrated my whole site to a new platform where I can administer the site more efficiently and I have been able to remove the faulty theme. Hopefully I can fix what I broke but for now its going to be the standard WP theme.

Making Windows Mobile work with Self-Signed certificates

If you try to synchronise a Windows Mobile PDA with Exchange Direct Push using SSL and the certificate is not issued by a Certification Authority (CA) that is in the PDA’s trusted certificate list then the device will not activate. Most commonly I have come across this with SBS servers that use the default self-signed certificate.

The solution should always be to purchase and install a certificate that is issued by a trusted CA to overcome the issue and the PDA will start to work automatically in these cases. If however you don’t want to purchase the certificate then you can bypass the security checks that Windows Mobile imposes on Active Sync. To do this requires you to install the certificate on the PDA and modify the registry to accept the installed certificate as a trusted one.

As each time I have done this I haven’t had the relevant PDA in front of me I have found a useful tool, that saves you trying to talk the end user through making the changes themselves, called My Mobiler (http://mymobiler.com/) which lets you interact with the PDA from your desktop.

  1. Install the certificate on the PDA
    1. Browse to your Outlook Web Access URL in Internet Explorer and save the certificate locally to your desktop by clicking on the padlock icon
    2. Connect the PDA via USB to the PC and allow Active Sync to connect.
    3. Click Explore Device in Active Sync and copy the certificate to the folder that is open
    4. Open File Explorer on the PDA and click on the certificate (it should be in My Documents)
    5. You will likely receive errors that the certificate is not trusted. Click More and then Install
    6. You should receive confirmation the certificate has been installed successfully.
  2. Install PHM RegEdit on your PDA
    1. There are a number of places to download the .cab file on the Internet (link) save this to your desktop
    2. With the PDA connected Explore the device again and copy the .cab file to the device
    3. Open File Explorer and click on the .cab to install it (again it should be in My Documents)
    4. When prompted that the installer cannot be verified click Install
  3. Apply the registry fix
    1. Click Start and select Programs. Scroll down and click on PHM Registry Editor
    2. Expand the following path: HKEY_CURRENT_USERSOFTWAREMicrosoftActiveSyncPartners
    3. You will see a list of GUID keys. Search through these for the one that contains the name “Microsoft Exchange” this is the key you need to modify
    4. Click Edit and select new DWORD
    5. Name the DWORD “Secure” and leave the value as 0
    6. Exit the Registry Editor

If everything has worked correctly your PDA should now synchronise with Exchange