1 00:00:00,867 --> 00:00:02,934 Can Swarm handle failure? 2 00:00:03,967 --> 00:00:07,501 The one-word answer is: yes, it can. 3 00:00:07,867 --> 00:00:10,934 But the more interesting part is, how? 4 00:00:11,767 --> 00:00:14,801 Let's take the previous example of the service running 5 00:00:14,801 --> 00:00:18,234 three replicas of Nginx, each hosted on one worker. 6 00:00:18,601 --> 00:00:22,767 Our master and our workers are healthy and running. 7 00:00:23,034 --> 00:00:25,701 What if one of the workers goes down? 8 00:00:25,701 --> 00:00:29,634 Let's say in this case, worker 3 went down. If that 9 00:00:29,634 --> 00:00:32,933 happens, task 3 will be scheduled on one 10 00:00:32,933 --> 00:00:36,034 of the other workers. Once worker 3 is back to 11 00:00:36,034 --> 00:00:39,534 its running state, task 3 might get moved back to it, 12 00:00:39,534 --> 00:00:42,834 or, if it's not causing any overload on worker 2, 13 00:00:42,834 --> 00:00:45,667 it may just stay there, and worker 3 14 00:00:45,667 --> 00:00:47,834 might be ready to host other tasks when they 15 00:00:47,834 --> 00:00:51,067 arrive in the future. In a nutshell, if one of the 16 00:00:51,067 --> 00:00:54,501 nodes goes down, the other nodes can handle its load. 17 00:00:54,501 --> 00:00:57,067 If the master goes down, though, the workers 18 00:00:57,067 --> 00:00:59,867 perform a mutual election, where one of the 19 00:00:59,867 --> 00:01:02,667 workers gets promoted and the cluster starts 20 00:01:02,667 --> 00:01:06,733 working again. The next question would be: how many 21 00:01:06,733 --> 00:01:09,801 nodes can go down without affecting Swarm? 22 00:01:09,967 --> 00:01:13,634 Well, to make sure that the Swarm cluster functions 23 00:01:13,634 --> 00:01:17,967 properly, more than half of the nodes should be working. 
24 00:01:17,967 --> 00:01:20,867 The minimum number of required working nodes 25 00:01:20,867 --> 00:01:23,534 for a happy Swarm cluster is equal 26 00:01:23,534 --> 00:01:27,967 to the total number of nodes divided by 2, plus 1, 27 00:01:27,967 --> 00:01:30,334 which, again, means more than half. 28 00:01:30,334 --> 00:01:31,701 [No Audio]
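
The "divided by 2 plus 1" rule from the last few cues can be sketched as a small calculation. This is only an illustration, assuming "divided by 2" means integer division (rounding down); the function names `min_working_nodes` and `tolerated_failures` are hypothetical and not part of any Docker tool. In Docker's own documentation this majority rule is described for manager nodes, and the sketch simply follows the arithmetic as spoken.

```python
def min_working_nodes(total_nodes: int) -> int:
    """Smallest node count that is strictly more than half of the total.

    Hypothetical helper: assumes "divided by 2" means integer division.
    """
    if total_nodes < 1:
        raise ValueError("total_nodes must be at least 1")
    # Integer division rounds down, so adding 1 always gives a strict majority.
    return total_nodes // 2 + 1


def tolerated_failures(total_nodes: int) -> int:
    """How many nodes can go down while a majority is still working."""
    return total_nodes - min_working_nodes(total_nodes)


if __name__ == "__main__":
    for n in (1, 3, 4, 5, 7):
        print(f"total={n} minimum working={min_working_nodes(n)} "
              f"can lose={tolerated_failures(n)}")
```

Under this reading, 3 nodes and 4 nodes both tolerate only one failure, which may be why odd node counts are usually suggested.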