Quality of Service of Crash Recovery Failure Detectors Project Proposal

Project background

Distributed systems are now widely used, and their performance can be measured against several factors, of which dependability is among the most important. As their use has grown worldwide, a range of fault-tolerance mechanisms has been developed to ensure dependability. Many research papers have been published on the dependability of distributed systems, and it remains an active research area. Various fault-tolerance techniques have been introduced to achieve the desired dependability and to improve the quality of service-oriented applications running on distributed systems. Despite these efforts, service-oriented applications still face open problems, such as detecting failures accurately and identifying which components of the application have failed.

Among the many techniques for handling failures, crash recovery is particularly important, and several crash recovery models have been proposed. Detecting failures is a routine task in distributed systems; the key goal beyond that is to provide the desired quality of service (QoS) for failure detection techniques, and for crash recovery in particular. In general, a simple crash recovery mechanism should identify the effect of the crash on the target, determine the extent of that effect, and observe whether the target can be recovered or can heal itself. An effective crash recovery mechanism should focus on these aspects in order to select the appropriate recovery measures.

In general, whenever a crash occurs in a distributed system, the crash recovery program should stop new members from joining the system; once the recovery process is complete, new members should be allowed to join again. This can be considered the key step in implementing effective crash recovery and in achieving good quality of service. Effective crash recovery should also rely on self-repairing and self-healing techniques, and the degree of self-healing can be used to measure the accuracy and reliability of the crash recovery implementation.
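The join-blocking rule described above can be sketched as follows. This is a minimal illustration in Python, not the proposed .NET implementation; the class name `MembershipGate`, its methods, and the choice to queue join requests during recovery (rather than reject them outright) are all assumptions made for the example.

```python
class MembershipGate:
    """Hypothetical gate that blocks new members while recovery is in progress."""

    def __init__(self):
        self.members = set()      # nodes currently in the system
        self.pending = []         # join requests received during recovery
        self.recovering = False   # True between a crash and the end of recovery

    def report_crash(self, node):
        # A crash removes the node and closes the gate to new members.
        self.members.discard(node)
        self.recovering = True

    def request_join(self, node):
        # Assumption: requests made during recovery are queued, not dropped.
        if self.recovering:
            self.pending.append(node)
            return False
        self.members.add(node)
        return True

    def recovery_complete(self):
        # Reopen the gate and admit every node that was waiting.
        self.recovering = False
        admitted, self.pending = self.pending, []
        self.members.update(admitted)
        return admitted
```

Queuing keeps the example short; a real system might instead reject requests and let the joining node retry.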

In global-scale distributed systems, different types of crashes can occur, and detecting and recovering from them while meeting quality of service metrics is a difficult task. In this project, the quality of service requirements for a crash recovery program are evaluated by estimating the mean time to failure (MTTF) and the mean time to recovery (MTTR). The aim and objectives of the project are given below.

Aim: To evaluate the quality of service requirements for crash recovery models in distributed systems

Objectives

To critically review distributed systems and their working methodologies
To review and investigate various crash detection and recovery techniques and evaluate their performance and limitations
To design quality of service metrics, such as mean time to failure (MTTF) and mean time to recovery (MTTR), for crash recovery techniques in distributed systems
To implement the proposed design using the .NET framework
To evaluate the results against the MTTF and MTTR parameters and assess the quality of service requirements for crash recovery in distributed systems
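The two metrics named in the objectives are commonly defined as below. The notation is introduced here for illustration and does not come from the proposal: $n$ is the number of observed crash and recovery cycles, $U_i$ is the length of the $i$-th period during which the system was running before a crash, and $D_i$ is the length of the $i$-th period between a crash and the completion of its recovery. The steady-state availability $A$ is included only as one common way of combining the two values.

```latex
\[
\mathrm{MTTF} = \frac{1}{n}\sum_{i=1}^{n} U_i,
\qquad
\mathrm{MTTR} = \frac{1}{n}\sum_{i=1}^{n} D_i,
\qquad
A = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}
\]
```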

How the objectives will be achieved

The literature review is prepared from academic articles, journals, publications, the ACM Digital Library and a few websites.
Distributed systems, crash detection, failure detection and crash recovery techniques are analyzed using the sources mentioned above.
Different quality of service requirements for crash recovery techniques in distributed systems are analyzed and reviewed.
The application is designed using the .NET Framework, covering both the front end and the back end.
All the required screens are designed, and the required database is created using SQL Server.
The proposed system is developed in .NET.
The results are evaluated, and the performance of the crash recovery technique is studied against the quality of service requirements.
A sample distributed system running multiple applications is considered. The system is deliberately crashed at the programming level, the corresponding crash recovery module is invoked, and the required quality of service parameters, such as MTTF and MTTR, are measured in the implementation.
The results are evaluated against the aim and objectives of this project, concluding the work done and identifying future work.
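The measurement step described above, in which the sample system is crashed, recovered and then assessed, could be sketched as follows. The sketch is written in Python for brevity rather than in the .NET stack the proposal targets, and the names `QosMetrics` and `compute_metrics`, the event labels `"crash"` and `"recover"`, and the use of seconds as the time unit are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class QosMetrics:
    mttf: float  # mean running time before a crash, in seconds
    mttr: float  # mean time from a crash to completed recovery, in seconds

    @property
    def availability(self) -> float:
        # Fraction of time the system is expected to be running.
        return self.mttf / (self.mttf + self.mttr)


def compute_metrics(start: float, events: List[Tuple[float, str]]) -> QosMetrics:
    """Compute MTTF and MTTR from a time-ordered log of crash/recover events.

    `start` is the time the system first came up; `events` must alternate
    between "crash" and "recover", beginning with a crash.
    """
    uptimes, downtimes = [], []
    last_up = start
    last_crash = None
    for timestamp, kind in events:
        if kind == "crash" and last_crash is None:
            uptimes.append(timestamp - last_up)
            last_crash = timestamp
        elif kind == "recover" and last_crash is not None:
            downtimes.append(timestamp - last_crash)
            last_up = timestamp
            last_crash = None
        else:
            raise ValueError(f"unexpected event {kind!r} at {timestamp}")
    if not uptimes or not downtimes:
        raise ValueError("need at least one complete crash/recover cycle")
    return QosMetrics(
        mttf=sum(uptimes) / len(uptimes),
        mttr=sum(downtimes) / len(downtimes),
    )


# Illustrative log: two crashes, each followed by a recovery.
log = [(100.0, "crash"), (110.0, "recover"), (310.0, "crash"), (340.0, "recover")]
metrics = compute_metrics(0.0, log)
print(metrics.mttf, metrics.mttr)  # 150.0 20.0
```

In this made-up log the system runs for 100 and 200 seconds before each crash and takes 10 and 30 seconds to recover, which gives an MTTF of 150 seconds and an MTTR of 20 seconds.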

Reasons for selecting this project

Distributed systems are now widely used, and they support a broad range of applications. Crash recovery and its quality of service requirements are essential for estimating the performance and dependability of distributed systems. I am interested in developing an effective crash recovery technique that meets the required quality of service metrics, and I have therefore chosen this topic.

Resource requirements

Hardware Requirements:

Processor : Pentium IV 2.6 GHz

RAM : 512 MB

Software Requirements:

Operating System : Windows XP/2003 Server

Coding Languages : C#.Net, VB.Net

Data Base : SQL Server 2005

Front End : Microsoft Visual Studio .Net 2008

Project Plan

Task	Description	Start Date	End Date	Duration
Introduction	Basic introduction to the project, together with the problem definition and the aims and objectives.			1 week
Literature review	Topics covered: review of distributed systems and different crash recovery techniques; evaluation of various crash detection and recovery systems and their limitations; review of the quality of service requirements for implementing an effective crash recovery technique.			4 weeks
Design	Design aspects covered: front-end design of the application; database design; module design.			3 weeks
Coding and Implementation	Aspects covered: coding all the modules; establishing the database connections; implementing the modules; explaining the important functions in the code.			4 weeks
Evaluation of results	The results obtained from running the individual scenarios, and from comparing them, are explained with reference to the aims and objectives of the project.			2 weeks
Conclusion and Future work	Conclusions drawn from the work done and from the evaluation of results, together with ways to improve the project in future.			1 week

Deliverables

The project deliverables are as follows.

The initial report contains the following deliverables

Introduction to the project and problem definition
Literature review on distributed systems, crash detection and crash recovery techniques
Review of various quality of service metrics for measuring the performance of crash recovery systems

The interim report contains the following deliverables

Detailed explanation of the front-end design
Detailed explanation of the database design and the tables used
Explanation of the QoS metrics chosen for the crash recovery technique, such as MTTF and MTTR

The final project report contains the following

Information from both the initial and interim report
Implementation procedure
Results and explanation
Conclusion and future work to be done
References used

The final product contains the following

Front end of the application with a rich user interface
Business logic with a sample distributed system
Crash detection module
Crash recovery module
QoS metrics module
Database and the corresponding tables

Paper Submitted & Written by Sathish Nagarajan