So we've described the various modules that make up ZFS, and how they fit into the picture of ZFS versus UFS. Now, what I want to do is look at how these modules all interact with each other.

In this picture here, we see the functional organization, and I'm going to show you the logical organization in a future lesson. In this picture we have a dotted line across the top, which divides user from kernel. So everything below that horizontal dotted line is inside the kernel, and the stuff above it is in user space. And we've also got the vertical dotted line, which divides the data-path-related stuff on the left from the more management-oriented stuff on the right. Now, you might find it a little odd that we would have a management interface to ZFS. But again, you need to remember that ZFS is doing much more than just being a filesystem. You have to be able to do things like tell it to take snapshots, manage its RAID array, create filesystems and set properties on them, and all kinds of other things, which is far more functionality than we get out of just the filesystem itself. Sometimes we have the ability to do that with UFS through configuration.
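Those management operations are all driven through the zfs and zpool commands. A minimal sketch, assuming a hypothetical pool called "tank" (the pool and dataset names here are made up for illustration, and the commands need a live pool to run against):

```shell
# Hypothetical pool/dataset names for illustration.
zfs create tank/home              # create a new filesystem within the pool
zfs set compression=on tank/home  # set a property on it
zfs snapshot tank/home@monday     # take a snapshot
zpool status tank                 # inspect the state of the pool's devices
```

All of these go through the one integrated management interface, rather than through separate per-subsystem tools as with UFS.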
And then there's a different set of things that do configuration for geom, and so on. And so in the case of ZFS, because everything else is integrated, the management layer just gets integrated as well.

All right, so let's start on the left with the data path. Here we have just the traditional applications that are doing read and write system calls, and they, of course, think that they are talking to a traditional POSIX filesystem. So they come through the VFS interface, just as they would for any other filesystem, and into the ZFS POSIX layer, which is the thing that's going to be doing all of the interpretation of the metadata, pathname lookup, and all those sorts of things. You'll notice that when we deal with a directory, we're going to go over and use the ZAP objects, because there's a ZAP object for each directory, storing all of the names and mappings to inodes and the other things that get put in a directory.

Also, as I/O is done, things are going to have to be put into the intent log. Everything that gets done after a checkpoint has to be put in that intent log, both data and metadata, so that after a crash we're going to be able to replay that log to put in all the changes that happened since the last checkpoint.
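The intent-log behavior just described can be observed and tuned per dataset through the sync property. A sketch, with a hypothetical dataset name:

```shell
zfs get sync tank/home           # default "standard": ZIL flushed on fsync
zfs set sync=always tank/home    # force every write through the intent log
zfs set sync=standard tank/home  # restore the default behavior
```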
And of course the ZIL itself is going to need to go down through the I/O system in order to get the data that's in that log committed onto stable store, and that's going to have to happen with some regularity. For example, any time an fsync comes in, that's going to require that the ZIL, up to that point, gets written to stable storage. So there's a fairly heavy traffic path from the ZIL down to the I/O layer, below the POSIX layer and the directory layer. And once we've figured out what file we're working with, we then have to go into the data management layer, which is the thing that's going to deal with getting us the actual disk blocks that we need to store the data into, and then of course it's potentially going to have to look things up in the ARC, the cache: if we're going to be overwriting existing blocks, we need the old contents, and if we're referencing directory blocks, there's a good chance that they're going to be sitting in the cache, and so on. The ARC, of course, is backed by a second-level cache, the L2ARC, to which the older stuff has been migrated, so that may have to come wandering in.
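On FreeBSD you can watch the ARC at work through the kstat sysctls that the ZFS port exports (a sketch; the counters shown assume the FreeBSD sysctl naming):

```shell
sysctl kstat.zfs.misc.arcstats.size    # current ARC size in bytes
sysctl kstat.zfs.misc.arcstats.hits    # cache hits so far
sysctl kstat.zfs.misc.arcstats.misses  # cache misses so far
```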
And meanwhile, all of this stuff that's doing I/O is working through the I/O module there, which is in turn cooperating with the devices, and with RAID-Z if we're using that, and that's going to be sitting on top of geom. And again, it may be a virtual device from geom, but generally it's just the very bottom of geom, where the only thing that we're essentially dealing with is the actual raw hardware.

Okay, moving over to the right, to the other side of the data path there, we also have these things called zvols. A zvol is made to look like a raw disk partition; that is, you could think of it as really just one giant file. And it is one way of getting a slice off of a physical drive, but in fact what is generally done is that it is treated almost like a separate filesystem that has just one file in it. It looks to the outside world like it is a disk, so you can see that it can be exported to geom, and geom can do all of its regular magic on that and pass a virtual volume back up.
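A zvol is created like any other dataset, just with a fixed size. A sketch (the pool and volume names are made up; the device path assumes FreeBSD's /dev/zvol naming):

```shell
zfs create -V 10G tank/vol0  # carve out a 10 GB zvol
ls -l /dev/zvol/tank/vol0    # it appears as an ordinary disk device
```

From here, geom and any consumer of raw disks can use it like real hardware.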
So you can, for example, build a UFS filesystem on top of a zvol, or run a database on a zvol. The benefit of running on the zvol is that, first of all, you've got the intent log, so that you can keep it more up to date without having to actually do the write. But the other thing is that you can take snapshots of a zvol. So you can run a UFS filesystem on it, and it provides a really cheap way of taking snapshots of that UFS filesystem, which is much cheaper than actually having that filesystem running on a regular piece of raw disk hardware.

Okay, finally, we'll go over to the management side. There's a thing called /dev/zfs, which is the handle that you use to get access to the various commands that deal with ZFS. Among the sorts of things that you need to do, you can do configuration on devices: you can increase the size of your RAID pool, or your RAID-Z pool, or you can do something called a scrub. One of the problems with hard disks is that they have this bad habit that you write something there and it gets written perfectly well, and then, for no apparent reason, it just goes bad.
And of course, if you're not accessing it, you won't know that. If you actually try and read it, you'll get some kind of a read error and you'll discover that there's something wrong with it; but absent actually going out and reading it, you just don't know that that data has gone bad. And if enough blocks of data go bad, at some point you can lose a disk, and then when you go to reconstruct, you find that some of the blocks that you need to do the reconstruction have, horror of horrors, gone bad, and so you can't complete your reconstruction. So one of the features that ZFS provides is what's called a scrub command, and that says: go out onto my RAID-Z and read all the blocks that have been allocated by any filesystem. So it doesn't actually have to read every block on every disk; it just reads all the blocks that are actually in use and makes sure that they can be read. And if it finds one that doesn't read, or doesn't read easily (say it has to be reread several times before it will read cleanly), then it will reconstruct that block and rewrite it, or move it to some other place, so that in the future you'll be able to get access to it. The other thing that the management layer is responsible for is send and receive, which is essentially the dump and restore.
The difference is that it's coordinated within ZFS; it's not a separate program, as it is in UFS. And then, finally, we have the dataset and snapshot layer, the DSL. This is the thing that's dealing with reservations and quotas, and it will be updating various things in the data management unit to make sure that those things can, in fact, be enforced.

Just to recap: we have the filesystem and the logical disk access; we've got the management of the pool; and we've got the geom import and export. So we're importing to get the raw disks in, and we're exporting zvols out to the outside world.
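From the command line, the scrub and send/receive operations described above look roughly like this (the pool, dataset, snapshot, and host names are hypothetical, and a live pool is assumed):

```shell
zpool scrub tank   # read and verify every allocated block in the pool
zpool status tank  # shows scrub progress and any repaired blocks

# Replicate a snapshot to another machine, dump/restore style:
zfs send tank/home@monday | ssh backuphost zfs receive backup/home
```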