The purpose of this course is to understand the system models, algorithms and protocols that allow computers to communicate and coordinate their actions to solve a problem. This course helps the learner to understand the distributed computation model and various concepts like global state, termination detection, mutual exclusion, deadlock detection, shared memory, failure recovery, consensus, file system. It helps the learners to develop solutions to problems in distributed computing environment.
Prerequisite: Basic knowledge in data structures and operating systems.
Course Outcomes: After the completion of the course the student will be able to
CO1 – Summarize various aspects of distributed computation model and logical time.
CO2 – Illustrate election algorithm, global snapshot algorithm and termination detection algorithm.
CO3 – Compare token based, non-token based and quorum based mutual exclusion algorithms.
CO4 – Recognize the significance of deadlock detection and shared memory in distributed systems.
CO5 – Explain the concepts of failure recovery and consensus.
CO6 – Illustrate distributed file system architectures.
Syllabus
Module – 1 (Distributed systems basics and Computation model)
Distributed System – Definition, Relation to computer system components, Motivation, Primitives for distributed communication, Design issues, Challenges and applications. A model of distributed computations – Distributed program, Model of distributed executions, Models of communication networks, Global state of a distributed system, Cuts of a distributed computation, Past and future cones of an event, Models of process communications.
Module – 2 (Election algorithm, Global state and Termination detection)
Logical time – A framework for a system of logical clocks, Scalar time, Vector time. Leader election algorithm – Bully algorithm, Ring algorithm. Global state and snapshot recording algorithms – System model and definitions, Snapshot algorithm for FIFO channels – Chandy Lamport algorithm. Termination detection – System model of a distributed computation, Termination detection using distributed snapshots, Termination detection by weight throwing, Spanning-tree-based algorithm.
Module – 3 (Mutual exclusion and Deadlock detection)
Distributed mutual exclusion algorithms – System model, Requirements of mutual exclusion algorithm. Lamport’s algorithm, Ricart–Agrawala algorithm, Quorum-based mutual exclusion algorithms – Maekawa’s algorithm. Token-based algorithm – Suzuki–Kasami’s broadcast algorithm. Deadlock detection in distributed systems – System model, Deadlock handling strategies, Issues in deadlock detection, Models of deadlocks.
Module – 4 (Distributed shared memory and Failure recovery)
Distributed shared memory – Abstraction and advantages. Shared memory mutual exclusion –Lamport’s bakery algorithm. Check pointing and rollback recovery – System model, consistent and inconsistent states, different types of messages, Issues in failure recovery, checkpoint based recovery, log based roll back recovery.
Module – 5 (Consensus and Distributed file system)
Consensus and agreement algorithms – Assumptions, The Byzantine agreement and other problems, Agreement in (message-passing) synchronous systems with failures – Consensus algorithm for crash failures. Distributed file system – File service architecture, Case studies: Sun Network File System, Andrew File System, Google File System.
Text Books
1. Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing: Principles, Algorithms, and Systems, Cambridge University Press, 2011.
Reference Books
1. George Coulouris, Jean Dollimore, Tim Kindberg and Gordon Blair. Distributed System Concepts and Design, Addison Wesley, Fifth edition.
2. Kai Hwang, Geoffrey C Fox, Jack J Dongarra, Distributed and Cloud Computing – From Parallel Processing to the Internet of Things, Morgan Kaufmann Publishers, 2012.
3. Sukumar Ghosh, Distributed Systems: An Algorithmic Approach, CRC Press, Second edition, 2015.
4. Maarten Van Steen, Andrew S. Tanenbaum, Distributed Systems, Prentice Hall of India,Third edition, 2017.
5. Randy Chow and Theodore Johnson, Distributed Operating Systems and Algorithm Analysis, Pearson Education India, First edition, 2009.
6. Valmir C. Barbosa, An Introduction to Distributed Algorithms, MIT Press, 2003.
Requirements
- Basic knowledge in data structures and operating systems.
Target audiences
- Final year students of B. Tech Computer Science and Engineering