1 00:00:06,980 --> 00:00:08,510 - In this video, I want to talk 2 00:00:08,510 --> 00:00:11,940 about managing failure using the fail module. 3 00:00:11,940 --> 00:00:13,570 So what is going on? 4 00:00:13,570 --> 00:00:16,270 Well, Ansible by nature looks at the exit status 5 00:00:16,270 --> 00:00:19,410 of a task to determine whether it has failed or not. 6 00:00:19,410 --> 00:00:21,180 And this may lead to failures occurring 7 00:00:21,180 --> 00:00:23,644 when nothing really is going wrong. 8 00:00:23,644 --> 00:00:26,570 When any task fails, Ansible aborts the rest 9 00:00:26,570 --> 00:00:30,490 of the play on that host and continues with the next host. 10 00:00:30,490 --> 00:00:32,640 There are a couple of solutions that you can use 11 00:00:32,640 --> 00:00:36,920 to change the default behavior of dealing with failures. 12 00:00:36,920 --> 00:00:39,720 In Ansible, you can use ignore_errors 13 00:00:39,720 --> 00:00:44,150 in a task or in a play header to ignore failures. 14 00:00:44,150 --> 00:00:47,090 And you can use force_handlers in a play header 15 00:00:47,090 --> 00:00:49,220 to force a handler that has been triggered 16 00:00:49,220 --> 00:00:51,910 to run even if another task has failed. 17 00:00:51,910 --> 00:00:54,275 We've already seen the last one. 18 00:00:54,275 --> 00:00:57,163 You can also define your own failed status. 19 00:00:58,070 --> 00:01:01,250 As Ansible only looks at the exit status of a task, 20 00:01:01,250 --> 00:01:03,580 it may think that the task was successful, 21 00:01:03,580 --> 00:01:05,532 but this is not the case. 22 00:01:05,532 --> 00:01:09,604 To be more specific, you can use failed_when. 23 00:01:09,604 --> 00:01:11,840 That allows you to specify what to look for 24 00:01:11,840 --> 00:01:15,510 in command output to recognize failure. 25 00:01:15,510 --> 00:01:17,900 And there's also the fail module. 
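The two play-level and task-level options mentioned so far, ignore_errors and force_handlers, can be sketched in a single play. This is a minimal illustration and not a file from the video: the play name, the file path /tmp/demo.conf, the handler name and the use of /bin/false as a deliberately failing command are all assumptions chosen for the example.

```yaml
---
# Hypothetical play: names, paths and commands are illustrative only.
- name: sketch of ignore_errors and force_handlers
  hosts: all
  force_handlers: true        # notified handlers still run if a later task fails
  tasks:
    - name: change a file and notify a handler
      ansible.builtin.copy:
        content: "managed by ansible\n"
        dest: /tmp/demo.conf  # assumed path
      notify: report change

    - name: a failing task that is ignored
      ansible.builtin.command: /bin/false
      ignore_errors: true     # failure is shown, but the play continues on this host

    - name: a failing task that is not ignored
      ansible.builtin.command: /bin/false   # the play stops here for this host

  handlers:
    - name: report change
      ansible.builtin.debug:
        msg: handler ran even though a later task failed
```

With force_handlers left out, the handler would not be expected to run on a host where the last task fails, because the play is aborted on that host before handlers are flushed.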
26 00:01:17,900 --> 00:01:20,550 The failed_when keyword can be used in a task 27 00:01:20,550 --> 00:01:23,190 to identify when a task has failed. 28 00:01:23,190 --> 00:01:25,440 The fail module can be used to print a message 29 00:01:25,440 --> 00:01:29,070 that informs why a task has failed. 30 00:01:29,070 --> 00:01:31,780 To use failed_when or fail, the result 31 00:01:31,780 --> 00:01:33,340 of the command must be registered, 32 00:01:33,340 --> 00:01:37,170 and the registered variable output must be analyzed. 33 00:01:37,170 --> 00:01:39,990 When using the fail module, the failing task 34 00:01:39,990 --> 00:01:42,940 must have ignore_errors set to yes, 35 00:01:42,940 --> 00:01:45,200 because the essence of the fail module 36 00:01:45,200 --> 00:01:46,600 is that you want to do something 37 00:01:46,600 --> 00:01:48,160 if a failure occurs. 38 00:01:48,160 --> 00:01:50,720 And if you want to do something after the failure, 39 00:01:50,720 --> 00:01:52,950 well, it's mandatory to use ignore_errors, 40 00:01:52,950 --> 00:01:56,230 because otherwise play execution will be aborted 41 00:01:56,230 --> 00:01:58,040 on that specific host. 42 00:01:58,040 --> 00:02:00,323 Let's go check out a couple of examples. 43 00:02:02,230 --> 00:02:06,980 So to start with, let's have a look at failure.yml. 44 00:02:06,980 --> 00:02:08,620 So what is this? 45 00:02:08,620 --> 00:02:12,573 This is a pretty simple example of failed_when. 46 00:02:13,860 --> 00:02:15,770 And in this example, we will run, 47 00:02:15,770 --> 00:02:18,610 well, it's not really a script, we run a command. 48 00:02:18,610 --> 00:02:20,393 The command is echo hello world. 49 00:02:21,350 --> 00:02:24,890 We use ignore_errors because we want the play to continue 50 00:02:24,890 --> 00:02:27,073 to the next task, no matter what. 
51 00:02:28,090 --> 00:02:30,130 Then we register command_result, 52 00:02:30,130 --> 00:02:32,090 and we set failed_when, 53 00:02:32,090 --> 00:02:34,720 world in command_result.stdout. 54 00:02:34,720 --> 00:02:37,170 You have already worked with register. 55 00:02:37,170 --> 00:02:40,290 So hopefully you have seen that anytime you use register, 56 00:02:40,290 --> 00:02:44,040 register creates a variable name.stdout 57 00:02:44,040 --> 00:02:46,700 with the result of the actual command. 58 00:02:46,700 --> 00:02:51,470 Now, it needs no imagination to imagine 59 00:02:51,470 --> 00:02:55,610 that hello world is probably going to have the text, 60 00:02:55,610 --> 00:02:59,090 world in command_result.stdout. 61 00:02:59,090 --> 00:03:01,410 But here we are telling Ansible that it needs 62 00:03:01,410 --> 00:03:04,360 to fail if it finds this specific text. 63 00:03:04,360 --> 00:03:06,190 Of course that doesn't make much sense, 64 00:03:06,190 --> 00:03:09,890 but it allows us to understand how it is working. 65 00:03:09,890 --> 00:03:13,520 Then, the next task is debug. 66 00:03:13,520 --> 00:03:17,050 Message hello, just to see if we get there. 67 00:03:17,050 --> 00:03:18,020 So let me run it. 68 00:03:18,020 --> 00:03:21,663 Ansible-playbook on failure.yml. 69 00:03:25,030 --> 00:03:28,940 And there you can see that it is failing 70 00:03:28,940 --> 00:03:30,910 because we told it to. 71 00:03:30,910 --> 00:03:32,390 But we have ignored the failure 72 00:03:32,390 --> 00:03:37,350 and we are getting to the next part anyway. 73 00:03:37,350 --> 00:03:39,070 Let's have a look at another example 74 00:03:39,070 --> 00:03:39,983 which is failure2. 75 00:03:41,360 --> 00:03:43,960 So what do we have in failure2? 76 00:03:43,960 --> 00:03:45,480 We are running on all hosts 77 00:03:45,480 --> 00:03:48,900 and we are doing something impossible. 78 00:03:48,900 --> 00:03:52,810 lvol is checking logical volumes. 
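For reference, the failure.yml playbook walked through before the move to failure2 might look roughly like this. It is a reconstruction from the spoken description, not the file shown on screen: the play and task names are assumptions, while the command, the registered variable command_result, the failed_when condition and the debug message come from the narration.

```yaml
---
# Reconstruction from the narration; play and task names are assumed.
- name: sketch of failed_when
  hosts: all
  tasks:
    - name: run a command
      ansible.builtin.command: echo hello world
      register: command_result          # stdout of the command ends up in command_result.stdout
      ignore_errors: true               # keep going to the next task, no matter what
      failed_when: "'world' in command_result.stdout"   # mark the task failed if this text appears

    - name: see if we get here
      ansible.builtin.debug:
        msg: hello
```

The condition is quoted as a whole because it starts with a quoted string, which YAML would otherwise try to parse on its own.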
79 00:03:52,810 --> 00:03:56,510 It's an Ansible module that checks for logical volumes. 80 00:03:56,510 --> 00:03:59,180 So it checks to see if we have a logical volume 81 00:03:59,180 --> 00:04:01,240 with the name, lvnothing, and a volume group 82 00:04:01,240 --> 00:04:04,770 with the name vgnothing, and a size of one gigabyte. 83 00:04:04,770 --> 00:04:09,770 And we register the output of the command in command_result. 84 00:04:10,040 --> 00:04:12,590 Again, ignore_errors, yes. 85 00:04:12,590 --> 00:04:15,340 That is because we want it to continue 86 00:04:15,340 --> 00:04:18,303 regardless of the actual result of the command. 87 00:04:19,280 --> 00:04:21,490 Next, just for debugging purposes. 88 00:04:21,490 --> 00:04:23,840 And I would advise you, I told you before 89 00:04:23,840 --> 00:04:26,460 I always would advise you to use debugging 90 00:04:27,440 --> 00:04:30,440 kind of variables to show the result 91 00:04:30,440 --> 00:04:31,910 so that you know what's going on. 92 00:04:31,910 --> 00:04:34,410 So I want to print the contents 93 00:04:34,410 --> 00:04:37,340 of this variable command_result.error, 94 00:04:37,340 --> 00:04:40,680 and then I'm printing a nice error message after the failure. 95 00:04:40,680 --> 00:04:43,120 That is where the fail module is coming in. 96 00:04:43,120 --> 00:04:45,250 And the fail module is printing volume group 97 00:04:45,250 --> 00:04:49,890 doesn't exist, when not found in command_result.error. 98 00:04:49,890 --> 00:04:53,420 The thing is that lvol is trying to work on some volumes. 99 00:04:53,420 --> 00:04:55,820 We are most likely getting an error message 100 00:04:55,820 --> 00:04:59,690 and that is when fail should print a nice error message. 101 00:04:59,690 --> 00:05:01,880 So fail is convenient if you want to get out 102 00:05:01,880 --> 00:05:06,880 of task execution while showing a nice error message. 103 00:05:10,740 --> 00:05:12,240 So here is the result. 
104 00:05:12,240 --> 00:05:16,483 We can see the result of the command, vgnothing, not found. 105 00:05:17,730 --> 00:05:20,930 This is also in command_result.error. 106 00:05:20,930 --> 00:05:23,410 And then we have the nice failure message. 107 00:05:23,410 --> 00:05:25,747 It's still red because it's a failure message, 108 00:05:25,747 --> 00:05:28,590 but the purpose here is that we print something 109 00:05:28,590 --> 00:05:32,353 nice and readable for our users and that was successful.
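The failure2 example described in this part could be sketched as follows. This is a reconstruction under several assumptions, not the playbook from the video: the play and task names are invented, the module is written with the fully qualified name community.general.lvol (which assumes that collection is installed), and the key that holds the error text is written as err, whereas the narration calls it command_result.error. Checking the debug output of the registered variable is the reliable way to see which key is actually present.

```yaml
---
# Reconstruction from the narration; names and the error key are assumptions.
- name: sketch of the fail module
  hosts: all
  tasks:
    - name: ask for a logical volume in a volume group that does not exist
      community.general.lvol:           # assumes the community.general collection is installed
        lv: lvnothing
        vg: vgnothing
        size: 1g
      register: command_result
      ignore_errors: true               # continue so the next tasks can inspect the result

    - name: show everything that was registered
      ansible.builtin.debug:
        var: command_result

    - name: print a readable message and stop on this host
      ansible.builtin.fail:
        msg: volume group doesn't exist
      # 'err' is an assumed key name; default('') avoids an undefined-variable error
      when: "'not found' in (command_result.err | default(''))"
```

The fail task itself always counts as a failure when it runs, which is why its output is still red; the gain is only that the message is one you chose.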