So at this point we're ready to take the files representing our two scripts, along with the text file containing Romeo and Juliet, and load them into our cluster so we can run the Hadoop MapReduce task.

You'll notice that I've switched back to my web browser window. On the portal.azure.com site, my Notifications tab now says that the deployment has succeeded. If that hasn't happened for you yet, you'll have to wait before continuing with this part of the presentation.

I have a couple of options here. I can go to the resource, which I'll do in a moment, or I can pin it to the dashboard. Pinning makes it conveniently available under the Dashboard option in the left-side navigation of the portal.azure.com site. So let me click Go to resource, and let me expand this as well. Once I do, I get the management page for the new cluster I configured. For the purpose of this demonstration, the only thing we really need to be aware of is the SSH + Cluster login selection.
The reason it's important is that when you go to Hostname and select the new cluster, it gives you the command to use with the SSH tool. That command logs you into the cluster remotely from a command line on your system.

Now, if you're a Windows user, you may not have SSH on your system initially. So let me pull over another browser tab. This is a blog post from the Windows 10 team; they've created an OpenSSH tool that you can install on your Windows system. You can google this blog post or go to the website you see here. The URL is kind of long, so let me scroll over so you can see the end of it as well. Let's try that one more time. There we go, now you can see the end of the URL. In any case, if you read through this post, you'll find the URL for installing the OpenSSH tool. After that, you'll be able to use the commands I'm about to show you in this video.

Okay, so for the moment I'm going to copy part of what I see here. Specifically, I'll copy the SSH user name, the @ sign and my cluster's host name, so let's copy that.
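If you aren't sure whether your system already has the OpenSSH client tools, you can check whether the ssh and scp executables are on your PATH. This is a minimal sketch using Python's standard shutil.which; the function name missing_ssh_tools is hypothetical and not part of the book's examples.

```python
import shutil


def missing_ssh_tools(tools=("ssh", "scp")):
    """Return the names of any tools that cannot be found on the PATH."""
    # shutil.which returns None when an executable isn't found on the PATH.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_ssh_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("ssh and scp are available")
```

If either tool is reported missing on Windows, installing the OpenSSH client described in the blog post should make both available.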
And of course, you'll have to replace "deitelpython" with whatever name you chose for your cluster.

Now, to copy the files over to the server, I'm going to use the terminal window on my Mac. On another platform, you'll use a terminal or shell window. On Microsoft Windows, you'd possibly use the Anaconda Prompt or a Command Prompt for this task.

You'll notice that I've switched into the ch16 examples folder, specifically its HadoopMapReduce subfolder, which has several files in it. The first three are the files I want to copy to the server, and I'll use a command called scp (secure copy) for that purpose. The first arguments to this command are the files you want to copy to the server. So I'll specify length_mapper.py, length_reducer.py and the RomeoAndJuliet.txt file. The next piece of the command is the SSH user login ID and host name: sshuser@deitelpython-ssh.azurehdinsight.net.
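To make the pieces of that scp command explicit, here's a small sketch that assembles it as an argument list and prints it. It assumes two things:

- the host name pattern shown in the portal, which is the cluster name followed by -ssh.azurehdinsight.net;
- the default sshuser login.

The helper names ssh_host and scp_command are hypothetical.

```python
import shlex


def ssh_host(cluster_name, user="sshuser"):
    """Build user@host for the cluster's SSH endpoint (assumed naming pattern)."""
    return f"{user}@{cluster_name}-ssh.azurehdinsight.net"


def scp_command(cluster_name, files, user="sshuser"):
    """Build an scp argument list that copies files to the remote home directory."""
    # The trailing colon marks the target as remote; an empty path after it
    # means the remote user's home directory.
    return ["scp", *files, ssh_host(cluster_name, user) + ":"]


if __name__ == "__main__":
    cmd = scp_command(
        "deitelpython",
        ["length_mapper.py", "length_reducer.py", "RomeoAndJuliet.txt"],
    )
    print(shlex.join(cmd))  # paste into a terminal, or pass cmd to subprocess.run
```

Building the command as a list keeps each file name a separate argument. If you pass the list to subprocess.run instead of printing it, scp will still prompt interactively for the password.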
And I'm going to add a colon onto the end of that. The colon tells scp that the destination is on the remote server. Because no path follows it, the files go into sshuser's home directory there.

When I press Enter, scp connects to the server and prompts me for my password. The very first time you do this, you'll get an authenticity warning. It says the server's identity can't be verified, because your system hasn't yet seen the host key of the deitelpython cluster we created. It's simply warning you that you may be connecting to a server you shouldn't trust. In this case we set up the server ourselves, so I'll type yes and press Enter. From now on, that warning won't appear when we connect back to this server. However, at the end of this example I'm going to delete the cluster altogether, so the next time I recreate it I'll get a host-key warning once again.

So let me enter my password. At this point you can see that it has copied each of the three files onto the server.

The next thing we want to do is log into that server using the ssh command. Again, I'll use the same sshuser login and host that I used with the scp command above. When I press Enter, it prompts me for my password once again.
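The login step can be sketched the same way, again assuming the portal's host name pattern and the sshuser login. The sketch also covers what happens after you delete and recreate a cluster with the same name. The new cluster presents a different host key, and OpenSSH's `ssh-keygen -R hostname` removes the old key from your known_hosts file so the first-connection prompt can appear again. The helper names are hypothetical, and the block only prints the commands.

```python
import shlex


def ssh_host(cluster_name, user="sshuser"):
    """Return user@host for the cluster's SSH endpoint (assumed naming pattern)."""
    return f"{user}@{cluster_name}-ssh.azurehdinsight.net"


def ssh_login_command(cluster_name, user="sshuser"):
    """Build the command that opens an interactive shell on the cluster."""
    return ["ssh", ssh_host(cluster_name, user)]


def forget_host_key_command(cluster_name):
    """Build an ssh-keygen command that removes a stale host key.

    Useful after recreating a cluster with the same name, because the
    recreated cluster presents a different host key.
    """
    return ["ssh-keygen", "-R", f"{cluster_name}-ssh.azurehdinsight.net"]


if __name__ == "__main__":
    print(shlex.join(ssh_login_command("deitelpython")))
    print(shlex.join(forget_host_key_command("deitelpython")))
```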
Hopefully I typed that correctly. Let's do it again. There we go.

Okay, now I'm logged into the server. If I run ls, we can see the files we uploaded a few moments ago. For Hadoop to feed the contents of RomeoAndJuliet.txt into our Hadoop MapReduce task, we have to copy that file into the Hadoop file system. For that purpose we'll use the hadoop fs command, which is the Hadoop file system command. We'll copy the file RomeoAndJuliet.txt from our current local directory into a folder called /example/data. That folder already exists in this cluster, and the copy there will keep the name RomeoAndJuliet.txt. We're using that folder because it's one of the folders Microsoft provides for its own Hadoop tutorial. You could, of course, use other Hadoop file system commands to create your own folders if you needed to.
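The hadoop fs step described above can be sketched as follows. The transcript says only that the file is copied from the local directory into /example/data. This sketch assumes the `-copyFromLocal` option for that copy, and shows `-mkdir -p` for creating a folder of your own. The helper names and the folder /user/sshuser/mydata are hypothetical. The block prints the commands rather than running them.

```python
import posixpath
import shlex


def copy_from_local_command(local_file, hdfs_dir="/example/data"):
    """Build a `hadoop fs -copyFromLocal` command that keeps the file's name."""
    destination = posixpath.join(hdfs_dir, posixpath.basename(local_file))
    return ["hadoop", "fs", "-copyFromLocal", local_file, destination]


def mkdir_command(hdfs_dir):
    """Build a `hadoop fs -mkdir -p` command; -p also creates missing parents."""
    return ["hadoop", "fs", "-mkdir", "-p", hdfs_dir]


if __name__ == "__main__":
    print(shlex.join(copy_from_local_command("RomeoAndJuliet.txt")))
    # Hypothetical folder of your own, created before copying files into it.
    print(shlex.join(mkdir_command("/user/sshuser/mydata")))
```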
So I'll go ahead and execute that, and it copies the file into the Hadoop file system. At this point we're ready to execute our Hadoop task.