Failure Recovery in Distributed Systems

Failure recovery mechanisms are designed around the requirements of a system and the behavior of the faults it must tolerate. Distributed systems exhibit several common failure cases, and the literature suggests a possible solution for each. The following are a few of the cases identified, with their corresponding solutions.

If the server is down and the client cannot locate it, the failure can be reported to the client through simple exception handling. This is not always a feasible solution, because the exception handling mechanisms of most programming languages involve complex logic.
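As an illustration of the exception-handling approach described above, the following is a minimal Python sketch, not a definitive implementation. The names `ServerNotFoundError` and `connect_to_server` are hypothetical and do not come from the original paper; the only library call used is the standard `socket.create_connection`. The connector is passed in as a parameter so that the sketch can be exercised without a real network.

```python
import socket


class ServerNotFoundError(Exception):
    """Hypothetical error raised when the client cannot locate or reach the server."""


def connect_to_server(address, connector=socket.create_connection, timeout=2.0):
    """Try to open a connection and translate low-level errors for the caller.

    `connector` is injectable (an assumption of this sketch) so that the
    function can be tried out with a fake connector instead of a real server.
    """
    try:
        return connector(address, timeout)
    except OSError as exc:
        # Refused connections, failed name lookups and socket timeouts
        # are all subclasses of OSError.
        raise ServerNotFoundError(f"cannot reach server at {address}") from exc
```

A caller would wrap `connect_to_server(("some-host", 9000))` in its own `try`/`except ServerNotFoundError` block and decide there whether to report the failure or try another server.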

If the client's request is lost before it reaches the server, a simple timeout can be used: the client waits for the server to respond and, if the time limit is exceeded, sends the request again. This is not an optimal implementation. Waiting on timeouts can degrade performance, and a retransmitted request may be executed more than once, which is safe only if the operation is idempotent. In many cases the server may also crash after it has received the request, in which case the client keeps waiting for a reply from the server.
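The timeout-and-retransmit approach described above can be sketched as follows. This is only a sketch under stated assumptions: the request ID used to detect duplicates is a common technique added here for illustration and is not part of the original paper, and the names `Account`, `deposit` and `call_with_retry` are hypothetical. The `send` callable is assumed to raise `TimeoutError` when no reply arrives in time.

```python
import uuid


class Account:
    """Hypothetical server-side state; a deposit is not idempotent by itself."""

    def __init__(self):
        self.balance = 0
        self._seen = {}  # request ID -> reply already produced for it

    def deposit(self, request_id, amount):
        # A retransmitted request is answered from the cache, not re-executed.
        if request_id in self._seen:
            return self._seen[request_id]
        self.balance += amount
        self._seen[request_id] = self.balance
        return self.balance


def call_with_retry(send, max_attempts=3):
    """Call send(request_id) until it returns, retrying on TimeoutError.

    The same request ID is reused on every attempt so that the server
    can recognise a duplicate of a request it has already executed.
    """
    request_id = str(uuid.uuid4())
    for attempt in range(1, max_attempts + 1):
        try:
            return send(request_id)
        except TimeoutError:
            if attempt == max_attempts:
                raise  # give up and report the failure to the caller
```

Without the request ID, a retry after a timeout could apply the deposit twice; with it, the retry only fetches the earlier reply. The cost is that the server has to remember replies for some period of time.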

The best available solution for this case is to rebuild both the server and the client so that the client's requests can succeed. The server's reply may also be lost before it reaches the intended client. The server is not aware of this situation and keeps pinging the client for the required acknowledgment. The usual solution for this failure is for the client to set a maximum time limit; if the limit is exceeded, the client assumes that the server is lost or busy.
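The client-side maximum time limit described above might look like the following Python sketch. It is one possible illustration, not the paper's implementation: the names `ServerUnresponsiveError` and `call_with_deadline` are hypothetical, and running the request in a worker thread is an assumption made to keep the example self-contained. It relies only on the standard `concurrent.futures` module.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeoutError


class ServerUnresponsiveError(Exception):
    """Hypothetical error: no reply arrived within the client's time limit."""


def call_with_deadline(request_fn, max_wait_seconds):
    """Run request_fn in a worker thread and wait at most max_wait_seconds."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(request_fn)
    try:
        return future.result(timeout=max_wait_seconds)
    except FutureTimeoutError:
        # The client cannot tell a crashed server from a busy one
        # or from a reply that was lost on the way back.
        raise ServerUnresponsiveError("server is assumed lost or busy") from None
    finally:
        # Do not block on the worker; it is abandoned if it is still running.
        executor.shutdown(wait=False)
```

Note that the abandoned worker keeps running until `request_fn` returns, so in practice the underlying network call would usually carry its own timeout as well.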

Thus, different failures and corresponding possible solutions have been identified for distributed systems. As discussed, these solutions are not optimal. They work well for a small range of failures but tend to fail when a major failure occurs at the server of a typical distributed system.

This paper is written and submitted by sai
