- Welcome to Python Fundamentals LiveLessons, part four of five. My name is Paul Deitel, and I'll be your instructor for these LiveLessons videos. This part is based on chapters 12 through 14 of our college textbook, "Intro to Python for Computer Science and Data Science: Learning to Program With AI, Big Data, and the Cloud," and chapters 11 through 13 of our professional book, "Python For Programmers."

The prerequisites for this part are Python Fundamentals LiveLessons parts one, two, and three, or equivalent Python programming experience.

Many people find that as they work their way through the videos, it's helpful to have a copy of the book at hand. However, this is not required. There is additional information in the books that I don't cover as part of these videos. If you're interested in getting a copy of one of the books, you can find them in various print and electronic formats at the sites that are listed for you.

If you're a college instructor, you'll probably want to take a look at "Intro to Python for Computer Science and Data Science," which is our college textbook version of the book.
This one is in full color. It's 880 pages, 240 pages more than "Python For Programmers," and in particular, it contains a ton of exercises: 557 self-check exercises, as well as 471 additional end-of-chapter exercises and projects. Now, many professionals find that they like to work with the textbook version specifically because of the exercises.

If you're interested in learning more about "Intro to Python for Computer Science and Data Science," please take a look at the links that I've provided below. The first one is for an architectural diagram in which we show you the four-part modular structure of this book, and that really applies both to the college textbook and our professional book, "Python For Programmers," as well. The "For Programmers" book has one fewer chapter, but the overall architecture of the two books is the same, and also in the "For Programmers" book we have less of the lower-end pedagogical material that's really geared to people who are novices in programming.
The second link is for the full table of contents, where you can see everything that we're going to cover throughout the book, and the third one is the book's preface, where you can learn more about our approach to teaching Python, data science, artificial intelligence, and various big data topics.

I'd also like to recommend you take a look at these two additional links. The first one is for the full book cover, and in particular, if you look at the back cover copy, you'll see a nice summary of everything that we're going to do throughout these LiveLessons videos. And I'd also recommend that you take a moment to read through the technical and academic reviewer testimonials to really get a sense of all the things that they liked about how we presented Python and the various data science, AI, and big data topics.

In this part, we're going to be focusing on lessons 11 through 13, on natural language processing, data mining Twitter, and IBM Watson and cognitive computing. In Lesson 11, we're going to take a look at natural language processing, or NLP for short. Now, every day, we have various natural language communications with other people.
We send and receive text messages, we send and receive emails, we write Facebook posts that other people read, and we send tweets and read other people's tweets. These are all forms of natural language communication, and what natural language processing tries to do is enable computers to understand that language. And as you might expect in Python, there are libraries that can do this for you. So we're going to take advantage of several such open-source libraries. We'll look at TextBlob, NLTK, Textatistic, and spaCy throughout this lesson. We'll focus on English-language processing, but spaCy in particular supports several different spoken languages, and there are many other natural language processing libraries out there for other languages that you can take advantage of as well.

Some of the things that you're going to see how to do in this lesson are basic tasks like tokenizing text into sentences and individual words, and recognizing the parts of speech within text.
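To give a feel for what tokenizing means, here is a minimal standard-library sketch of the idea. Libraries like TextBlob and spaCy do this far more robustly (handling abbreviations, contractions, punctuation edge cases, and so on); the regex rules below are purely illustrative.

```python
import re

def tokenize(text):
    """Naive tokenizer: split text into sentences, then each sentence
    into word tokens. A rough sketch of what TextBlob/spaCy provide."""
    # Split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    # Pull out alphabetic word tokens from each sentence.
    return [re.findall(r"[A-Za-z']+", s) for s in sentences]

tokens = tokenize("Python is fun. NLP libraries tokenize text for you!")
```

Real tokenizers also assign part-of-speech tags to each word token, which is the next step toward "understanding" a sentence.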
So in order for a computer to understand text, it has to be able to recognize various aspects of the text and then assemble those pieces into an understanding of that text, and parts of speech are a key aspect of that.

One of the most common things you'll do with natural language processing is sentiment analysis, where you look at text to figure out: is it positive, is it negative, is it neutral? This is commonly used, for example, when people are analyzing tweets from Twitter; they'll look at the text to see if it's saying something positive or negative, and in fact, that's one of the case studies that we'll do in Lesson 12. Here, we'll show you some basic sentiment analysis by just providing a couple of sentences to what's known as a TextBlob, and then using its built-in sentiment analyzer for that purpose.

We'll also talk about various language detection and translation capabilities. We'll do this via the TextBlob library, which interacts with Google Translate for those purposes. We'll show you some cool visualizations in this lesson as well.
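To make the sentiment idea concrete, here is a toy lexicon-based scorer. This is not TextBlob's algorithm (in the lesson we simply use a TextBlob's built-in sentiment analyzer); real analyzers use large lexicons plus rules for negation, intensifiers, and so on. The tiny word lists here are made up for illustration.

```python
# Toy sentiment scorer: count positive vs. negative words from a tiny,
# hand-made lexicon. Purely a sketch of the concept, not a real analyzer.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}

def sentiment(text):
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A real analyzer returns a numeric polarity rather than a label, so borderline text can be treated as neutral by thresholding.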
So one of the ways to analyze text would be to do calculations like word frequencies, so you can see what words come up most frequently in a body of text, or corpus, as we refer to it. But those are just numbers, and sometimes it's much more interesting to show visualizations. So we'll go ahead and calculate word frequencies and we'll do a couple of different visualizations. One will be a simple bar chart, which is okay, and better than just numbers. But we'll also show you something really cool, which is the ability to create a word cloud, in which larger words represent words that appear more frequently in a body of text. And as you're about to see later in this lesson, you can do that with just a couple of lines of code.

In Lesson 12, we're going to take a look at one of the most fun and interesting lessons, data mining Twitter. Twitter is a massive big data source that just about anybody can tap into, and one of the most common things you'll do with it is actually sentiment analysis, from back in Lesson 11.
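The word-frequency calculation itself takes only a couple of lines with the standard library's `collections.Counter`; a word cloud then just maps those counts to font sizes (the lesson uses a third-party word-cloud library for that rendering step).

```python
from collections import Counter
import re

text = "to be or not to be that is the question"

# Tokenize into lowercase word tokens, then tally occurrences.
frequencies = Counter(re.findall(r"[a-z']+", text.lower()))

# The most frequent words would be drawn largest in a word cloud.
top_two = dict(frequencies.most_common(2))
```

The same `Counter` feeds a bar chart directly: word labels on one axis, counts on the other.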
Now, as part of this lesson, we're going to be working with a Python library called Tweepy, and one of the things that's great about Tweepy is that it makes it super easy to connect to the Twitter web services and authenticate with them. You will need a Twitter developer account for that purpose. And once you are authenticated, it makes it easy to invoke the many different Twitter web services that are available, and to ensure that you don't accidentally violate Twitter's terms of service with regard to the rate limits, meaning how many times you're allowed to invoke a particular web service over a 15-minute time interval. So I'll discuss all of those issues throughout this lesson.

Now, once you're authenticated with Twitter via the Tweepy library, you're going to be able to take advantage of a Tweepy API object to do things like search through past tweets over the last seven days. You can't go beyond the past seven days with the free APIs.
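Tweepy can respect rate limits for you, but the underlying idea is simple: track your calls in a sliding window and stop when you hit the cap. Here is a plain-Python sketch of that mechanism; the cap of 180 calls per 15 minutes is just an example figure, since actual limits vary per Twitter endpoint.

```python
import time
from collections import deque

class RateLimiter:
    """Sliding-window rate limiter sketch. Records call timestamps and
    refuses calls once the window's cap is reached. Example values only:
    real Twitter rate limits vary from endpoint to endpoint."""

    def __init__(self, max_calls=180, window_seconds=15 * 60):
        self.max_calls = max_calls
        self.window = window_seconds
        self.calls = deque()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Discard timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False

limiter = RateLimiter(max_calls=2, window_seconds=900)
```

In practice you would let the library handle this rather than rolling your own; the sketch is just to show why a 15-minute window matters.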
You're going to be able to tap into 1% of the live tweets going on at any given time, and if you think that's a small number, there's something like 800 million tweets a day at this point, so 1% still gives you access to seven or eight million of those tweets on a daily basis. So that's still big data, even using the free APIs.

We'll show you how to work with tweet objects and their metadata. As you probably know, a tweet was originally limited to 140 characters, and back in 2017, Twitter raised that limit to 280 characters. But in addition to the text, there's all sorts of metadata that Twitter maintains on a tweet-by-tweet basis. So we'll talk about some of that metadata and show you how to use it as well. We'll do some natural language processing on tweets, looking at sentiment analysis in particular: whether a tweet is positive or negative. We'll also show you how to work with the Twitter Trends API to look at trending topics worldwide and in specific locations around the world.
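Tweet objects arrive as JSON carrying the text plus dozens of metadata fields. The hand-made sample below is heavily trimmed, but the field names shown (`created_at`, `text`, `retweet_count`, `user.screen_name`) are among those in Twitter's v1.1 tweet object; pulling them out is ordinary dictionary access.

```python
import json

# Hand-made sample in the shape of a (heavily trimmed) Twitter API v1.1
# tweet object; real tweets carry dozens more metadata fields.
raw = '''{
  "created_at": "Wed Oct 10 20:19:24 +0000 2018",
  "text": "Just finished the NLP lesson!",
  "retweet_count": 3,
  "user": {"screen_name": "example_user", "followers_count": 42}
}'''

tweet = json.loads(raw)
summary = (tweet["user"]["screen_name"], tweet["text"], tweet["retweet_count"])
```

Tweepy wraps this JSON in convenient objects, but knowing the underlying fields helps when you filter or analyze tweets in bulk.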
And we're going to do our first dynamic interactive maps as well, where we'll take a bunch of tweets and actually plot the locations they came from on an interactive map, where you can click on a map marker and see the tweet and its location on the map as well.

To finish off Part Four, we're going to take a look at Lesson 13, IBM Watson and Cognitive Computing. Now, chances are most of you have probably seen an IBM Watson commercial on TV by now, showing off some of its capabilities. What we're going to do in this lesson is introduce you to Watson and show you how to sign up for an account for working with their free Lite tier services. One of the reasons we specifically chose IBM Watson for use in our books and videos, versus some of the other cloud vendors like Amazon, Google, and Microsoft, is that IBM Watson does not require a credit card for you to sign up and experiment with some of these free-tier Lite services, as they refer to them. So we'll show you how to get set up. We'll also go ahead and have you download and install the Watson Developer Cloud Python SDK.
They actually provide SDKs for many different programming languages. We'll then be able to use objects from that Python SDK to interact with many different Watson web services, and we'll take advantage of three of them in particular in this lesson. We'll use the text-to-speech service to take text and turn it into audio. We'll use the speech-to-text service to transcribe audio back into text. And we'll also take advantage of their language translation service, and we're going to mash up all three of those services into a little traveler's companion app, where somebody can speak a question in English; we will then transcribe that into English text, translate it into Spanish, and then speak it in Spanish. And we'll do the reverse as well, when a Spanish speaker responds to the question in Spanish and we translate and speak the answer in English. So it's a really cool way of demonstrating some of the power of these online cloud-based services that are available to you from many cloud vendors.
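Architecturally, the traveler's companion app is just a pipeline of three service calls. Here is a sketch of that pipeline with placeholder stubs standing in for the real Watson SDK calls; the function names and canned return values are hypothetical, and the actual SDK usage is covered in the lesson itself.

```python
# Sketch of the traveler's companion mashup: each stub stands in for one
# Watson service call. Names and canned outputs are hypothetical.

def speech_to_text(audio):
    """Stub for a speech-to-text call: audio in, transcript out."""
    return "where is the train station"   # pretend transcription

def translate(text, source="en", target="es"):
    """Stub for a translation call: text in, translated text out."""
    canned = {"where is the train station": "donde esta la estacion de tren"}
    return canned.get(text, text)

def text_to_speech(text):
    """Stub for a text-to-speech call: text in, audio out."""
    return f"<audio:{text}>"              # pretend synthesized audio

def travelers_companion(english_audio):
    # English speech -> English text -> Spanish text -> Spanish speech
    english_text = speech_to_text(english_audio)
    spanish_text = translate(english_text, source="en", target="es")
    return text_to_speech(spanish_text)

result = travelers_companion(b"...")
```

The reverse direction is the same pipeline with the language parameters swapped, which is exactly why composing independent cloud services into a mashup is so powerful.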