Tuesday, September 22, 2009

Guardian: a fault-tolerant operating system

This chapter is a very nostalgic trip down memory lane, and it is much more hardware-oriented than the papers we have come across so far. It is interesting to note how Tandem handled the limitations of address space. It is not clear to me why later Tandem systems considered duplication of memory as an option. Also, now that we are on the subject of building reliable systems, it seems to me that all kinds of inefficient approaches to component dependability, such as freezing a CPU when a failure occurs, had been under consideration. In a further paper, Lee et al. analyse the causes of software failure in Guardian in detail. The authors found that 77% of software failures were caused by software problems themselves. It looks to me as though the single-failure tolerance of the Guardian system is not actually beneficial. Another study found that memory management is the main source of software problems in Guardian. Guardian seems to have performed better, in terms of number of faults, than the existing machines of that time, mainly IBM and VAX. Given the detail in which this chapter analyses the strengths and weaknesses of Guardian's design, I think it should still serve as a case study (in spite of the era in which it was developed) for the design of similar systems now.
