Time Is Of The Essence

We’ve been working on our current software project for about a year and it has been one hell of a ride. With less than a week to go before we flick the switch, we’re going through our final stages of preparation – lots of t’s to cross and i’s to dot but it sure is exciting.

Are Daily Backups Really Sufficient?

On Monday afternoon we had a critical failure of an Oracle database at work. Within a few minutes of the fault taking place, I started seeing block corruption errors whilst reviewing some information in the production environment. At that stage, I thought we might have dropped a disk in the SAN, so I referred it on to our database administrator to rectify.

As is quite common, our environment consists of multiple Oracle 10g RAC nodes connected into a shared data source. The shared data source in this instance is a SAN, where we have a whole bunch of disks configured in groups for redundancy and performance. As soon as the database administrator became involved, it became apparent that we didn’t drop a single disk but had in fact lost access to an entire group of disks within the SAN.

Due to the manner in which the SAN and Oracle are configured, RAID was not going to help us. If we had dropped a single disk or a subset of disks from any group within the SAN, everything would have been fine; unfortunately, we dropped an entire disk group. The end result was that we were forced to roll back our database to the previous night’s backup.

The following days have been spent recovering the lost day’s data through various checks and balances, but it takes a lot of time and energy from everyone involved to make this happen. We’ve been fortunate enough to trade for several years without ever needing to roll back our production database due to a significant event, which I suppose we should be thankful for.

After three years without performing a production disaster recovery, had we become complacent about data restoration and recovery because we hadn’t really needed it before? I believe that, since we haven’t needed to perform a disaster recovery for some three years, our data recovery guidelines have become out of date. Whilst a daily backup may have been more than sufficient for this particular database two or three years ago, the business has undergone significant growth since then. The daily changeset for this database is now large enough that, whilst having a daily backup is critical, recovering all of the data in a reasonable time frame requires a significant amount of work.
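
To put a rough number on that exposure, here is a minimal sketch of the arithmetic, assuming changes arrive at a steady rate between backups. The function name `changes_at_risk` and the rate used in the usage line are hypothetical and purely illustrative; they are not figures from our environment.

```python
from datetime import datetime


def changes_at_risk(last_backup: datetime, failure: datetime,
                    changes_per_hour: float) -> float:
    """Estimate how many changes must be recovered by hand after a rollback.

    Assumes a constant change rate (an illustrative simplification): every
    change made between the last good backup and the failure is lost when
    the database is rolled back to that backup.
    """
    if failure < last_backup:
        raise ValueError("failure cannot precede the last backup")
    hours_exposed = (failure - last_backup).total_seconds() / 3600
    return hours_exposed * changes_per_hour


if __name__ == "__main__":
    # Hypothetical example: backup at midnight, failure at 3 pm,
    # 400 changes per hour.
    lost = changes_at_risk(datetime(2007, 1, 1, 0, 0),
                           datetime(2007, 1, 1, 15, 0),
                           changes_per_hour=400)
    print(f"Changes to recover manually: {lost:.0f}")
```

The point of the sketch is that the exposure grows with both the backup interval and the change rate, so a backup schedule that was fine for a small business can quietly become inadequate as the business grows.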

As a direct result of this disaster, we’re going to be reviewing our data recovery policies shortly. The outcome of that discussion will most likely be that we require higher levels of redundancy in our environment to reduce the impact of a failure. Whilst it would be ideal to have an entire copy of our production hardware, it probably isn’t going to be a cost-effective solution. I’m open to suggestions about what sort of data recovery we implement; however, I think that having some sort of independent warm spare may win out.

What have we learned from this whole event?

  • daily backup of data is mandatory
  • daily backup of data may not be sufficient
  • verify that your backup sets are valid; invalid backup data isn’t worth the media it is stored on
  • be vigilant about keeping data recovery strategies in step with business growth and expectations
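
As one illustration of the "verify your backup sets" point, the sketch below checks backup files against checksums recorded when the backup was taken. It is a minimal example under stated assumptions: the manifest format (file name mapped to SHA-256 digest) and the names `sha256_of` and `verify_backup_set` are hypothetical, and this is not how our Oracle backups are actually validated.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup_set(manifest: dict[str, str], backup_dir: Path) -> list[str]:
    """Return the names of files that are missing or fail their checksum.

    `manifest` maps each file name to the digest recorded at backup time
    (a hypothetical format chosen for this example).
    """
    failures = []
    for name, expected in manifest.items():
        candidate = backup_dir / name
        if not candidate.is_file() or sha256_of(candidate) != expected:
            failures.append(name)
    return failures
```

A checksum match only shows that the files have not changed since they were written. It does not prove the backup can be restored, so a periodic test restore onto separate hardware remains the real check.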

Maybe periodic disasters are actually healthy for a business? Whilst every business strives to avoid any sort of downtime, I expect that, precisely because certain systems are so highly available, disaster recovery isn’t put through its paces often or rigorously enough. That may result in longer downtimes or complete loss of data when an actual disaster recovery is required.

Learning How To Scale An ASP.NET Application

For the last nine months, the development team at Stella Hospitality Group have been working on integrating a new piece of software into the enterprise. Throughout that process, we’ve come up against various stumbling blocks and subsequently learned how to climb over them.

One of the interesting parts of this project involved learning how to scale an ASP.NET web application. Unlike most other pieces of development we’ve previously worked on, we didn’t have access to hardware and services that were capable of delivering smoking performance (read: Oracle 10g clustered using RAC). As a by-product of the constraints placed on us, scaling the new web application proved a little harder than it first looked.

Over the course of the next few weeks, I’ll be posting about various steps which we’ve taken to scale our ASP.NET application. Some of the points are grounded in the physical world, others are operational and, of course, some are technical. Items which come to mind immediately include:

  • load balancers
  • clustering physical servers
  • clustering web servers
  • web gardens
  • user interface process control
  • session handling
  • web services & XML
  • spike testing

Changing Of The Guards

The company I work for, Stella Resorts Group, have been going through some massive change, most of which is a good thing™.

Over the last five years, BreakFree has experienced explosive growth. BreakFree was initially a small business with a handful of employees, which five years later has transformed into a publicly listed company with over 500 employees. At the start, the business had access to only a very small number of properties to sell through, while it currently has access to 115 properties spread from North Queensland down to Tasmania and into New Zealand. In the last 18 months, BreakFree was purchased by the Gold Coast-based funds management company MFS Limited (ASX Stock: MFS) to complement their extensive leisure assets. After the acquisition, Stella Resorts Group was created to act as an umbrella over all of the MFS leisure assets, including BreakFree, Peppers, Mantra, Bale, Falls Creek, Mount Hotham and others.

During the same time frame, BreakFree has seen two CEOs take the hot seat and a third take the reins since Stella Resorts Group was formed. We have a new CFO and Financial Controller, along with a new COO. A CIO was introduced into the company and a new General Manager for sales appointed. Coming down the chain a little bit, an IT infrastructure manager and a software development manager have also been slotted in. The infrastructure side of the fence has increased from one up to six with more on the way, while the development team has increased from two to nine. Along the way, the rest of the company has also grown proportionately, with approximately 120 staff in the head office at the Gold Coast.

A little closer to home and in the last two months, a contractor who was essentially permanent didn’t have his contract renewed. One of our longest-standing developers, who has seen the company grow from its infancy, has also moved on after four years. Our two other developers have left for a change after a year with the company, along with our development manager.

Without any doubt, all the changes collectively have created a lot of flux. I personally see it as a good thing™, as it allows the company to put the right people in the right seats to take the company forward. Once some of the dust settles and bodies are replaced, there is going to be a lot of round table discussion on who, what, where, when and how the next business directives are going to be executed. There is a huge amount of change on the horizon for the business, and I can’t wait to see it unfold.

Exciting times ahead.