Distributed Computing

Overview
Curriculum
Instructor
FAQs

The purpose of this course is to understand the system models, algorithms and protocols that allow computers to communicate and coordinate their actions to solve a problem. This course helps the learner to understand the distributed computation model and various concepts like global state, termination detection, mutual exclusion, deadlock detection, shared memory, failure recovery, consensus, file system. It helps the learners to develop solutions to problems in distributed computing environment.
Prerequisite: Basic knowledge in data structures and operating systems.

Course Outcomes: After the completion of the course the student will be able to

CO1 – Summarize various aspects of distributed computation model and logical time.
CO2 – Illustrate election algorithm, global snapshot algorithm and termination detection algorithm.
CO3 – Compare token based, non-token based and quorum based mutual exclusion algorithms.
CO4 – Recognize the significance of deadlock detection and shared memory in distributed systems.
CO5 – Explain the concepts of failure recovery and consensus.
CO6 – Illustrate distributed file system architectures.

Syllabus

Module – 1 (Distributed systems basics and Computation model)

Distributed System – Definition, Relation to computer system components, Motivation, Primitives for distributed communication, Design issues, Challenges and applications. A model of distributed computations – Distributed program, Model of distributed executions, Models of communication networks, Global state of a distributed system, Cuts of a distributed computation, Past and future cones of an event, Models of process communications.

Module – 2 (Election algorithm, Global state and Termination detection)

Logical time – A framework for a system of logical clocks, Scalar time, Vector time. Leader election algorithm – Bully algorithm, Ring algorithm. Global state and snapshot recording algorithms – System model and definitions, Snapshot algorithm for FIFO channels – Chandy Lamport algorithm. Termination detection – System model of a distributed computation, Termination detection using distributed snapshots, Termination detection by weight throwing, Spanning-tree-based algorithm.

Module – 3 (Mutual exclusion and Deadlock detection)

Distributed mutual exclusion algorithms – System model, Requirements of mutual exclusion algorithm. Lamport’s algorithm, Ricart–Agrawala algorithm, Quorum-based mutual exclusion algorithms – Maekawa’s algorithm. Token-based algorithm – Suzuki–Kasami’s broadcast algorithm. Deadlock detection in distributed systems – System model, Deadlock handling strategies, Issues in deadlock detection, Models of deadlocks.

Module – 4 (Distributed shared memory and Failure recovery)

Distributed shared memory – Abstraction and advantages. Shared memory mutual exclusion –Lamport’s bakery algorithm. Check pointing and rollback recovery – System model, consistent and inconsistent states, different types of messages, Issues in failure recovery, checkpoint based recovery, log based roll back recovery.

Module – 5 (Consensus and Distributed file system)

Consensus and agreement algorithms – Assumptions, The Byzantine agreement and other problems, Agreement in (message-passing) synchronous systems with failures – Consensus algorithm for crash failures. Distributed file system – File service architecture, Case studies: Sun Network File System, Andrew File System, Google File System.

Text Books

1. Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing: Principles, Algorithms, and Systems, Cambridge University Press, 2011.

Reference Books

1. George Coulouris, Jean Dollimore, Tim Kindberg and Gordon Blair. Distributed System Concepts and Design, Addison Wesley, Fifth edition.
2. Kai Hwang, Geoffrey C Fox, Jack J Dongarra, Distributed and Cloud Computing – From Parallel Processing to the Internet of Things, Morgan Kaufmann Publishers, 2012.
3. Sukumar Ghosh, Distributed Systems: An Algorithmic Approach, CRC Press, Second edition, 2015.
4. Maarten Van Steen, Andrew S. Tanenbaum, Distributed Systems, Prentice Hall of India,Third edition, 2017.
5. Randy Chow and Theodore Johnson, Distributed Operating Systems and Algorithm Analysis, Pearson Education India, First edition, 2009.
6. Valmir C. Barbosa, An Introduction to Distributed Algorithms, MIT Press, 2003.

Curriculum

5 Sections
24 Lessons
12 Weeks

Expand all sectionsCollapse all sections

Module 01 - Distributed systems basics and Computation model
This module provides an in-depth understanding of distributed systems, covering their definition, relationship to computer system components, and motivation for their use. It explores primitives for distributed communication, key design issues, challenges, and real-world applications. The course also delves into models of distributed computations, including distributed programs, execution models, communication networks, and the global state of distributed systems. Concepts such as computational cuts, past and future event cones, and process communication models are analyzed to provide a comprehensive framework for understanding distributed computing.
5
Module 02 - Election algorithm, Global state and Termination detection
This module covers key concepts in distributed systems related to logical time, leader election, global state recording, and termination detection. It introduces logical clocks, including scalar time and vector time, as frameworks for ordering events in distributed environments. The Bully and Ring algorithms for leader election are explored, enabling coordination in distributed networks. The module also delves into global state and snapshot recording, with a focus on the Chandy-Lamport algorithm for FIFO channels. Finally, it examines termination detection, discussing models of distributed computation, snapshot-based detection, weight-throwing techniques, and spanning-tree-based approaches for ensuring correct process completion in distributed systems.
5
Module 03 - Mutual exclusion and Deadlock detection
This module explores algorithms and strategies for ensuring mutual exclusion and detecting deadlocks in distributed systems. It covers fundamental system models and requirements for mutual exclusion, including classical algorithms such as Lamport’s and Ricart–Agrawala’s timestamp-based approaches, as well as quorum-based methods like Maekawa’s algorithm. Token-based solutions, such as Suzuki–Kasami’s broadcast algorithm, are also discussed. Additionally, the module delves into deadlock detection, examining system models, handling strategies, challenges, and different deadlock models in distributed environments.
6
Module 04 - Distributed shared memory and Failure recovery
This module covers the concept of Distributed Shared Memory (DSM) as an abstraction for interprocess communication, along with its advantages. It explores mutual exclusion in shared memory systems, focusing on Lamport’s Bakery Algorithm for ensuring process synchronization. The module also delves into checkpointing and rollback recovery, explaining system models, consistent vs. inconsistent states, message types, and key challenges in failure recovery. Various recovery techniques, including checkpoint-based and log-based rollback recovery, are discussed, addressing critical issues in distributed fault tolerance.
3
Module 05 - Consensus and Distributed file system
This module explores fundamental concepts in achieving consensus in distributed systems. It covers key assumptions, the Byzantine agreement problem, and other challenges in reaching agreement among distributed nodes. It delves into consensus algorithms for message-passing synchronous systems, particularly in the presence of failures, including crash failure scenarios. Additionally, the module examines the architecture and design of distributed file systems, focusing on file service models and key case studies such as Sun Network File System (NFS), Andrew File System (AFS), and Google File System (GFS). It provides insights into how these systems ensure reliability, consistency, and efficiency in distributed environments.
5

Prof. Nisy John Panicker

Nisy John Panicker is an educator with a passion for computer science and programming. As an assistant professor in the Department of Computer Science at Albertian Institute of Science and Technology, Nisy is dedicated to providing high-quality education in programming languages. Her teaching philosophy emphasizes hands-on learning, problem-solving skills, and fostering a collaborative learning environment. She believes in empowering students to explore the intricacies of programming in C while guiding them to develop practical solutions to real-world challenges.

What is the language of instruction?

The language of instruction for this course is English and Malayalam. At this time, the course is not available in other regional languages.

Will the classes be taught live? Will there be any interaction?

No. Pre-recorded lessons will be made available on the portal. Learners can learn from the content released at their own pace.

Do I need any special software to access the course?

No special software is required. A web browser and internet connection are sufficient.

How is the course structured?

The course consists of 5 modules, including video lessons and quizzes.

Will I receive a certificate upon completion?

Yes, a certificate will be awarded after successfully completing the course.

Requirements

Basic knowledge in data structures and operating systems.

Target audiences

Final year students of B. Tech Computer Science and Engineering

Curriculum

Prof. Nisy John Panicker

Requirements

Target audiences

Modal title