1 00:00:00,261 --> 00:00:01,990 - [Instructor] Next up, we're going to talk 2 00:00:01,990 --> 00:00:05,415 a little bit about big data databases, and specifically, 3 00:00:05,415 --> 00:00:08,320 two different categories of those databases: 4 00:00:08,320 --> 00:00:12,820 NoSQL databases and NewSQL databases. 5 00:00:12,820 --> 00:00:16,110 Now, of course, big data is massive, and as such, 6 00:00:16,110 --> 00:00:18,600 requires massive amounts of storage. 7 00:00:18,600 --> 00:00:21,450 Typically, that storage is going to be spread across 8 00:00:21,450 --> 00:00:24,840 massive data centers worldwide, and of course, 9 00:00:24,840 --> 00:00:27,780 those data centers are networked together. 10 00:00:27,780 --> 00:00:31,390 And NoSQL and NewSQL databases 11 00:00:31,390 --> 00:00:35,970 were specifically designed to deal with that type of data, 12 00:00:35,970 --> 00:00:38,300 where it's just so large 13 00:00:38,300 --> 00:00:40,860 that it has to be stored worldwide. 14 00:00:40,860 --> 00:00:44,960 And those are challenges and sizes 15 00:00:44,960 --> 00:00:48,740 that relational databases really were not designed for 16 00:00:48,740 --> 00:00:49,960 in the first place. 17 00:00:49,960 --> 00:00:53,730 So, these new databases had to be created to handle 18 00:00:53,730 --> 00:00:56,336 not only the massive amounts of data 19 00:00:56,336 --> 00:00:59,560 that are available to us in big data, but also 20 00:00:59,560 --> 00:01:03,770 the speed with which that data gets created. 21 00:01:03,770 --> 00:01:08,110 So, NoSQL itself originally meant what its name implies, 22 00:01:08,110 --> 00:01:11,420 that is, not using SQL to manipulate the 23 00:01:11,420 --> 00:01:12,980 data (Structured Query Language), 24 00:01:12,980 --> 00:01:16,140 like we demonstrated earlier in this lesson.
25 00:01:16,140 --> 00:01:19,900 But, nowadays, there's actually more and more SQL 26 00:01:19,900 --> 00:01:22,390 being used with big data, so 27 00:01:22,390 --> 00:01:24,547 it's now said to mean 28 00:01:24,547 --> 00:01:26,490 "Not Only SQL." 29 00:01:26,490 --> 00:01:28,950 And in fact, later on in this lesson, 30 00:01:28,950 --> 00:01:32,570 when we get to the Spark application framework, 31 00:01:32,570 --> 00:01:34,045 one of the things that we'll use there 32 00:01:34,045 --> 00:01:39,025 is a concept called Spark SQL, to query distributed data 33 00:01:39,025 --> 00:01:42,453 that resides on many nodes in a cluster. 34 00:01:44,760 --> 00:01:48,960 Now, NoSQL databases are specifically meant for 35 00:01:48,960 --> 00:01:53,320 a combination of unstructured and/or semi-structured data. 36 00:01:53,320 --> 00:01:57,570 So unstructured data are things like photos and videos 37 00:01:57,570 --> 00:02:00,620 that we're all posting on our social media accounts, 38 00:02:00,620 --> 00:02:02,850 and various forms of natural language, 39 00:02:02,850 --> 00:02:05,240 such as the text we type in emails, 40 00:02:05,240 --> 00:02:07,096 text messages with one another, 41 00:02:07,096 --> 00:02:10,450 social media posts on Twitter and Facebook, 42 00:02:10,450 --> 00:02:12,070 and other sites as well. 43 00:02:12,070 --> 00:02:15,310 And semi-structured data is often 44 00:02:15,310 --> 00:02:17,680 JSON or XML documents 45 00:02:17,680 --> 00:02:22,090 that wrap unstructured data with additional information, 46 00:02:22,090 --> 00:02:24,020 usually called metadata.
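To make the idea of semi-structured data concrete, here is a minimal Python sketch of a JSON document that wraps a piece of unstructured text with metadata. The payload and its field names (`text`, `created_at`, `user`, `hashtags`) are hypothetical and chosen for illustration; they are not the actual schema returned by any particular API.

```python
import json

# Hypothetical tweet-like payload. The field names are assumptions made
# for illustration, not the real schema of any social media API.
raw = '''
{
  "text": "Learning about NoSQL databases today!",
  "created_at": "2019-05-01T12:00:00Z",
  "user": {"screen_name": "example_user", "followers_count": 42},
  "hashtags": ["bigdata", "nosql"]
}
'''

tweet = json.loads(raw)

# The unstructured part: free-form natural-language text.
text = tweet["text"]

# The metadata: every other field wrapped around that text.
metadata = {key: value for key, value in tweet.items() if key != "text"}

print(text)
print(sorted(metadata))
```

The text itself has no fixed structure, but the surrounding keys give a program something predictable to work with, which is what makes the document semi-structured rather than unstructured.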
47 00:02:24,020 --> 00:02:26,300 So, for instance, back when we were looking 48 00:02:26,300 --> 00:02:30,600 at tweets in Twitter, we saw that with respect to a tweet 49 00:02:30,600 --> 00:02:33,040 that was either 140 characters maximum, 50 00:02:33,040 --> 00:02:35,280 or 280 characters maximum, 51 00:02:35,280 --> 00:02:38,800 what we actually got back from the Twitter APIs 52 00:02:38,800 --> 00:02:41,730 was a much larger piece of data, 53 00:02:41,730 --> 00:02:43,610 in the form of a JSON object 54 00:02:43,610 --> 00:02:47,650 that had tons and tons of additional metadata in it as well. 55 00:02:47,650 --> 00:02:51,050 So, often, semi-structured data is stuff like 56 00:02:51,050 --> 00:02:54,333 the tweets that we were getting back from Twitter. 57 00:02:55,550 --> 00:02:58,010 Now, as we work our way through the next couple of videos, 58 00:02:58,010 --> 00:03:01,020 we're going to talk about four major types 59 00:03:01,020 --> 00:03:06,020 of NoSQL databases: key-value stores, document data stores, 60 00:03:06,090 --> 00:03:09,100 columnar databases, which are somewhat similar 61 00:03:09,100 --> 00:03:12,880 to relational SQL databases, 62 00:03:12,880 --> 00:03:16,430 and graph databases as well. 63 00:03:16,430 --> 00:03:17,880 And one of the things that you'll find 64 00:03:17,880 --> 00:03:20,910 as you start diving into these different database types 65 00:03:20,910 --> 00:03:23,578 is that many of the database vendors out there 66 00:03:23,578 --> 00:03:26,220 in the big data space provide 67 00:03:26,220 --> 00:03:29,610 free tiers or free trials that enable you to 68 00:03:29,610 --> 00:03:32,050 work with their tools, get used to using them, 69 00:03:32,050 --> 00:03:33,683 and determine if they're the right ones 70 00:03:33,683 --> 00:03:35,820 for your applications.
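As a rough preview of the four NoSQL categories named above, the following Python sketch models the same tiny dataset in each style using plain in-memory structures. It is only an illustration of how the data is shaped; the names and records are made up, and no real database product or client library is involved.

```python
# Key-value store: an opaque value looked up by a unique key.
key_value = {"user:1": "Ana", "user:2": "Ben"}

# Document store: one self-describing, JSON-like document per record.
document = {"_id": 1, "name": "Ana", "follows": [2]}

# Columnar layout: values grouped by column rather than by row.
columnar = {"id": [1, 2], "name": ["Ana", "Ben"]}

# Graph: nodes plus edges that describe relationships between them.
graph = {"nodes": {1: "Ana", 2: "Ben"}, "edges": [(1, "follows", 2)]}

# Answer "whom does Ana follow?" by walking the graph's edges.
followed = [
    graph["nodes"][dst]
    for src, relation, dst in graph["edges"]
    if src == 1 and relation == "follows"
]
print(followed)
```

Each shape favors a different kind of question: direct lookup by key, retrieving a whole record, scanning one column across many records, or following relationships.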
71 00:03:35,820 --> 00:03:39,660 And, in fact, when we get to our NoSQL example, 72 00:03:39,660 --> 00:03:41,960 we'll use the MongoDB database, 73 00:03:41,960 --> 00:03:46,500 and we'll specifically take advantage of their free version 74 00:03:46,500 --> 00:03:48,923 of what they call an Atlas Cluster.