Diffstat (limited to 'en_US.ISO8859-1/captions/2006/mckusick-kernelinternals/mckusick-kernelinternals-1.sbv')
-rw-r--r-- | en_US.ISO8859-1/captions/2006/mckusick-kernelinternals/mckusick-kernelinternals-1.sbv | 4300 |
1 files changed, 4300 insertions, 0 deletions
diff --git a/en_US.ISO8859-1/captions/2006/mckusick-kernelinternals/mckusick-kernelinternals-1.sbv b/en_US.ISO8859-1/captions/2006/mckusick-kernelinternals/mckusick-kernelinternals-1.sbv new file mode 100644 index 0000000000..16dcca9e84 --- /dev/null +++ b/en_US.ISO8859-1/captions/2006/mckusick-kernelinternals/mckusick-kernelinternals-1.sbv @@ -0,0 +1,4300 @@ +0:00:09.469,0:00:11.309 +Hello my name is Marshall Kirk McKusick + +0:00:11.309,0:00:15.389 +and I've been around as long as dinosaurs +and mainframes have ruled the world + +0:00:15.389,0:00:18.429 +which is to say the sixties and seventies + +0:00:18.429,0:00:22.460 +however by 1970s a new breed of mammals had begun to show up +on the scene + +0:00:22.460,0:00:24.240 +known as mini computers + +0:00:24.240,0:00:28.230 +although they were just toys in the 1970s they would soon grow + +0:00:28.230,0:00:31.689 +and take over most of the computing market + +0:00:31.689,0:00:33.150 +In 1970 + +0:00:33.150,0:00:37.910 +at AT&T Bell laboratories two researchers Ken +Thompson and Dennis Ritchie began developing the + +0:00:37.910,0:00:39.900 +UNIX operating system + +0:00:39.900,0:00:42.040 +Ken Thompson who had been an alumnus at Berkeley + +0:00:42.040,0:00:46.100 +came back on a sabbatical in 1975 bringing UNIX +with him + +0:00:46.100,0:00:47.539 +In the year that he was there + +0:00:47.539,0:00:51.330 +he managed to get a number of graduate students interested +in UNIX + +0:00:51.330,0:00:53.940 +and by the time he left in 1976 + + +0:00:53.940,0:00:56.829 +Bill Joy has taken over in running the UNIX system + +0:00:56.829,0:01:00.470 +and in fact continuing to develop software for it. 
+ +0:01:00.470,0:01:04.339 +Bill began packaging up the software that had +been developed under Berkeley UNIX + +0:01:04.339,0:01:05.779 +and distributing it + +0:01:05.779,0:01:08.040 +as the Berkeley Software Distributions + +0:01:08.040,0:01:12.310 +whose name was quickly shortened to simply BSD + +0:01:12.310,0:01:16.330 +BSD continued to be distributed with +yearly distributions for almost fifteen + +0:01:16.330,0:01:17.490 +years + +0:01:17.490,0:01:21.920 +initially under Bill Joy and later under others including +yours truly. + +0:01:21.920,0:01:24.860 +By the late 1980s interest had begun to grow + +0:01:24.860,0:01:27.400 +in freely redistributable software + +0:01:27.400,0:01:30.170 +so a number of us at Berkeley began separating +out + +0:01:30.170,0:01:32.649 +the AT&T proprietary bits of BSD + +0:01:32.649,0:01:35.710 +from those parts that were freely redistributable. + +0:01:35.710,0:01:40.590 +By the time of the final distribution of BSD +in 1992 + +0:01:40.590,0:01:43.620 +the entire distribution was freely redistributable. + +0:01:43.620,0:01:45.909 +I've given a capsule history here + +0:01:45.909,0:01:48.009 +but if you're interested in the entire story + +0:01:48.009,0:01:50.789 +I have this three-and-a-half hour epic + +0:01:50.789,0:01:54.590 +which is available from my website www.mckusick.com + +0:01:54.590,0:01:58.200 +that gives the entire history of Berkeley. + +0:01:58.200,0:02:00.239 +Following the final distribution from Berkeley + +0:02:00.239,0:02:01.450 +two groups sprang up + +0:02:01.450,0:02:03.600 +to continue supporting BSD + +0:02:03.600,0:02:08.080 +the first of these was NetBSD whose primary +goal was to support + +0:02:08.080,0:02:10.459 +as many different architectures as possible + +0:02:10.459,0:02:14.769 +everything from your microwave oven all the way +up to your Cray X-MP + +0:02:14.769,0:02:19.409 +In fact today NetBSD supports nearly +sixty architectures. 
+ +0:02:19.409,0:02:22.419 +The other group that sprang up was FreeBSD. + +0:02:22.419,0:02:28.239 +Their goal was to bring up BSD and support +as wide a set of devices as possible on the + +0:02:28.239,0:02:29.719 +PC architecture. + +0:02:29.719,0:02:36.549 +They also had a goal of trying to make the +system as easy to install as possible to + +0:02:36.549,0:02:39.309 +attract a wide group of developers + +0:02:39.309,0:02:42.319 +I chose to work primarily with the FreeBSD +group + +0:02:42.319,0:02:43.740 +both doing software + +0:02:43.740,0:02:46.140 +and also together with George Neville-Neil + +0:02:46.140,0:02:51.069 +writing this book "The Design and Implementation +of the FreeBSD Operating System". + +0:02:51.069,0:02:52.060 +Together with this book + +0:02:52.060,0:02:53.959 +I developed a course + +0:02:53.959,0:02:56.500 +which runs for twelve chapters + +0:02:56.500,0:02:58.179 +and thirty hours. + +0:02:58.179,0:02:59.749 +The purpose of this video + +0:02:59.749,0:03:01.089 +is to give you a taste + +0:03:01.089,0:03:02.819 +of that course. + +0:03:02.819,0:03:07.249 +What follows are excerpts from the first lecture +of the course + +0:03:07.249,0:03:11.139 +which of course you can also get from my website +www.mckusick.com. + +0:03:11.139,0:03:13.069 + + +0:03:13.069,0:03:17.739 +Enjoy. 
+ +0:03:17.739,0:03:22.239 +This class is nominally about FreeBSD +because well + +0:03:22.239,0:03:26.379 +that's what I know best and that's what +the textbook is organized around + +0:03:26.379,0:03:29.979 +but the fact of the matter is that it's really + +0:03:29.979,0:03:32.339 +a class about UNIX and that + +0:03:32.339,0:03:36.539 +really covers sort of the broad range of things +in the open source arena, so it's FreeBSD + +0:03:36.539,0:03:37.689 +and Linux + +0:03:37.689,0:03:38.899 +which of course + +0:03:38.899,0:03:41.159 +you use a lot + +0:03:41.159,0:03:41.550 +and + +0:03:41.550,0:03:44.349 +it also covers the commercial systems + +0:03:44.349,0:03:46.950 +%uh Solaris, HP-UX, + +0:03:46.950,0:03:49.279 +AIX and so on. + +0:03:49.279,0:03:52.419 +I am going to tend more towards the open +side + +0:03:52.419,0:03:56.389 +open source side of things. So it's really +going to be more FreeBSD and Linux than it's + +0:03:56.389,0:03:57.579 +going to be + +0:03:57.579,0:04:00.849 +Solaris and HP-UX and so on. + +0:04:00.849,0:04:06.959 +For the most part at the level of this course +we're dealing with the interfaces to the system + +0:04:06.959,0:04:07.329 +and + +0:04:07.329,0:04:11.599 +the fact of the matter is those interfaces are highly +standardized at this point + +0:04:11.599,0:04:12.060 +and + +0:04:12.060,0:04:15.280 +whether it's FreeBSD or Linux or Solaris +or whatever + +0:04:15.280,0:04:19.460 +the socket system call has to do the same +thing, it has to have the same arguments + +0:04:19.460,0:04:20.150 +in that, + +0:04:20.150,0:04:23.909 +it has to have the same effect + +0:04:23.909,0:04:27.319 +and so until you get down to the really nitty +details + +0:04:27.319,0:04:29.600 +of how they actually go about implementing +that + +0:04:29.600,0:04:31.960 +the differences are relatively minor. 
+ +0:04:31.960,0:04:35.830 +So I would say that sixty to seventy percent +of the material that I'm covering + +0:04:35.830,0:04:40.779 +is just as true for FreeBSD as it would +be for Linux + +0:04:40.779,0:04:42.580 +or for Solaris + +0:04:42.580,0:04:44.659 +%uh AIX is a little bit + +0:04:44.659,0:04:45.629 +sort of off in the weeds + +0:04:45.629,0:04:48.709 +%uh as is HP-UX + +0:04:48.709,0:04:51.099 +but luckily we don't have to worry too much about +that. + +0:04:51.099,0:04:54.569 +Okay so + +0:04:54.569,0:04:59.279 +the other thing is that I'm going to assume that +all of you have used the system. I get + +0:04:59.279,0:05:00.910 +really sort of worried when people + +0:05:00.910,0:05:04.249 +you know raise their hands and say "Hey, what's a shell?" + +0:05:04.249,0:05:07.990 +or I don't +put a lot of code up but I put up one piece of code and someone said "Why + +0:05:07.990,0:05:11.819 +are there two pipe symbols in the middle of +that if statement?". + +0:05:11.819,0:05:15.740 +No we're not programming the shell we're programming +in C. + +0:05:15.740,0:05:19.970 +So hopefully you can tell the difference between +shell scripts and C code. + +0:05:19.970,0:05:21.990 +so okay but I am gonna assume + +0:05:21.990,0:05:24.610 +you haven't really looked inside the system. + +0:05:24.610,0:05:28.289 +So I'm gonna start everything at a very +high level. 
+ +0:05:28.289,0:05:32.969 +The problem is I have already discovered you come +from a lot of different sort of + +0:05:32.969,0:05:33.819 +backgrounds + +0:05:33.819,0:05:35.180 +and + +0:05:35.180,0:05:36.280 +levels of knowledge + +0:05:36.280,0:05:37.900 +and so + +0:05:37.900,0:05:42.620 +the way that I find works best to sort of +be useful to everybody is that three pass + +0:05:42.620,0:05:43.860 +algorithm + +0:05:43.860,0:05:49.060 +so what I will do is start the first pass a very +broad brush high level + +0:05:49.060,0:05:50.569 +description of what's going on + +0:05:50.569,0:05:54.719 +and then I will go back and I'll go through the +same material again but at a lower level of + +0:05:54.719,0:05:55.300 +detail + +0:05:55.300,0:05:59.939 +then I finally go back and go through a very nittily +low-level of detail + +0:05:59.939,0:06:04.649 +and the fact of this is if you are learning new stuff +as I'm doing the high-level thing + +0:06:04.649,0:06:08.649 +you are gonna be utterly washed by the time I get to +low level niggly details + +0:06:08.649,0:06:10.699 +but since I'm going to do it topic by topic + +0:06:10.699,0:06:14.190 +when I get to the end of one of those nearly +low level niggly details + +0:06:14.190,0:06:17.900 +I'll give you a clue as I will say ""Brain +reset, I'm starting a new topic"" so even if + +0:06:17.900,0:06:19.330 +you're completely lost + +0:06:19.330,0:06:23.530 +you can now start listening again plus I'm gonna get +the broad brush up again. + +0:06:23.530,0:06:27.059 +okay and for those of you that know a lot of +this stuff already + +0:06:27.059,0:06:31.770 +you'll probably find the broad brush rather boring + +0:06:31.770,0:06:35.759 +but by the time we get down to nearly low level +details I think you'll actually + +0:06:35.759,0:06:37.860 +pick up some things that you will find + +0:06:37.860,0:06:39.710 +useful and interesting. 
+ +0:06:39.710,0:06:43.759 +So in this way hopefully everybody will +get some + +0:06:43.759,0:06:47.699 +useful percentage of material out of the course. + +0:06:47.699,0:06:49.599 +I am gonna start out by just + +0:06:49.599,0:06:53.089 +walking through and giving you the + +0:06:53.089,0:06:56.919 +outline of what we're going to try and do here +here + +0:06:56.919,0:07:01.169 +As I said we're going to go roughly + +0:07:01.169,0:07:03.270 +just about two-and-an-half hours of lecture + +0:07:03.270,0:07:04.729 +about two hours forty minutes + +0:07:04.729,0:07:06.499 +per week + +0:07:06.499,0:07:07.619 +and + +0:07:07.619,0:07:11.770 +so we will start off this week with an introduction. + +0:07:11.770,0:07:13.860 +This is as I said we're going to start from the +top + +0:07:13.860,0:07:15.749 +and then just start working our way down + +0:07:15.749,0:07:19.350 +so the general thing I'm going to do is +to talk about the interface + +0:07:19.350,0:07:21.439 +%uh which is something that you + +0:07:21.439,0:07:25.319 +are presumably fairly familiar with since +you've worked with that system + +0:07:25.319,0:07:27.249 +and then + +0:07:27.249,0:07:29.739 +you have to sort of layout terminology + +0:07:29.739,0:07:32.080 +although we use normal English words + +0:07:32.080,0:07:34.419 +they have + +0:07:34.419,0:07:38.580 +sometimes rather bizarre meanings compared to their +common usage + +0:07:38.580,0:07:39.220 +and + +0:07:39.220,0:07:42.330 +so I will just sort of lay out the terminology +lay out the + +0:07:42.330,0:07:45.750 +the way we talk about how the system is structured + +0:07:45.750,0:07:50.780 +and this week we will also talk about the +basic services ""What is it that the kernel is + +0:07:50.780,0:07:52.929 +providing for us?"" + +0:07:52.929,0:07:54.060 +and then of course + +0:07:54.060,0:07:58.499 +we'll proceed to dive down in and and see how +that is done + +0:07:58.499,0:07:59.970 +so here in + +0:07:59.970,0:08:01.400 +Week number 2 + 
+0:08:01.400,0:08:05.450 +we're gonna look at the system from the +perspective of + +0:08:05.450,0:08:07.039 +something that + +0:08:07.039,0:08:08.720 +manages processes. + +0:08:08.720,0:08:12.170 +One way of looking at the kernel is it's really +just a + +0:08:12.170,0:08:16.440 +the resource manager and the resource that +its managing are things going to do with processes + +0:08:16.440,0:08:19.460 +So we'll look at a process, what the structure of +it is + +0:08:19.460,0:08:20.649 +and + +0:08:20.649,0:08:23.559 +talk about the different ways that they can +be structured. + +0:08:23.559,0:08:28.379 +Process can for example be an address space +and can have one thread running in it can have + +0:08:28.379,0:08:29.749 +multiple threads running in it. + +0:08:29.749,0:08:34.620 +so we'll talk about the different ways +that we think a process is. + +0:08:34.620,0:08:38.480 +We will look at the management of those processes + + +0:08:38.480,0:08:39.239 +we've got + +0:08:39.239,0:08:42.020 +to lay out the bits and pieces that +need to be managed + +0:08:42.020,0:08:44.660 +and then talk about + +0:08:44.660,0:08:47.190 +how we do that. + +0:08:47.190,0:08:51.740 +we'll talk about jails.. this is something +that you currently find only in FreeBSD + +0:08:51.740,0:08:55.060 +hasn't made it into + +0:08:55.060,0:08:56.320 +Linux yet although + +0:08:56.320,0:09:01.630 +the concept is being actively worked +on so my guess is that you'll see that + +0:09:01.630,0:09:03.500 +fairly soon. + +0:09:03.500,0:09:06.360 +we'll also then talk about scheduling + +0:09:06.360,0:09:10.579 +which is in essence how we decide what gets +to run, when it gets to run, how long it gets + +0:09:10.579,0:09:13.500 +to run, etc. + +0:09:13.500,0:09:14.330 +okay + +0:09:14.330,0:09:19.020 +The week after that we will go into virtual +memory. 
+ +0:09:19.020,0:09:23.800 +Signals aren't really part of virtual memory +but they didn't fit into next week's + +0:09:23.800,0:09:26.400 +material so I just dropped that in at the +beginning + +0:09:26.400,0:09:29.850 +but the bulk of Week 3 is going to +be + +0:09:29.850,0:09:32.019 +the management of virtual memory. So we've got + +0:09:32.019,0:09:35.119 +a bunch of physical memory, a bunch of +processes that are + +0:09:35.119,0:09:37.940 +trying to use their address spaces + +0:09:37.940,0:09:39.590 +and we will talk about + +0:09:39.590,0:09:41.410 +essentially how you will make that all work + +0:09:41.410,0:09:43.510 +It's called virtual memory because it's + +0:09:43.510,0:09:47.420 +sort of a cheat. We promise you the world and +then we deliver you + +0:09:47.420,0:09:51.480 +as small a number of pages as we think we +can get away with. + +0:09:51.480,0:09:56.420 +Okay. So the first three weeks then essentially +get us through + +0:09:56.420,0:09:58.340 +looking at the world as if it was + +0:09:58.340,0:10:00.560 +all about processes. + +0:10:00.560,0:10:03.880 +Then in Week 4 we change gears. We say +okay well you know + +0:10:03.880,0:10:07.570 +the kernel isn't just all about processes. You can sort of +look at it orthogonally and you can + +0:10:07.570,0:10:10.000 +say it's really just a giant I/O switch + +0:10:10.000,0:10:12.910 +it's just like a traffic cop that's just managing +these + +0:10:12.910,0:10:14.860 +I/O streams + +0:10:14.860,0:10:15.450 +and + +0:10:15.450,0:10:18.610 +so let's look at it from that perspective. 
+ +0:10:18.610,0:10:19.310 +And + +0:10:19.310,0:10:24.740 +we'll start with special files, again this +sort of the interface when you talk about UNIX + +0:10:24.740,0:10:25.880 +systems, when you talk about + +0:10:25.880,0:10:27.950 +what's normally /dev + +0:10:27.950,0:10:34.170 +interface that gets you access +to the various I/O streams that are available + +0:10:34.170,0:10:37.220 +and we'll look at how that's organized and +the structure of it + +0:10:37.220,0:10:41.840 +which used to be fairly simple but in the +last decade has gotten + +0:10:41.840,0:10:43.670 +incredibly complicated. + +0:10:43.670,0:10:48.540 +We will also talk about pseudo terminals in +job control + +0:10:48.540,0:10:53.330 +this is about as interesting as watching the +grass grow but unfortunately it's + +0:10:53.330,0:10:55.490 +a major component of the system + +0:10:55.490,0:10:59.520 +and especially people that deal with system +administration have to know far more about + +0:10:59.520,0:11:06.520 +this than they probably ever thought they +wanted to. + +0:11:06.900,0:11:11.430 +Okay we will then continue in Week 5 with +the kernel I/O structure, + +0:11:11.430,0:11:16.090 +We will start with multiplexing of I/O. The +kernel of course has done this + +0:11:16.090,0:11:17.360 +always + +0:11:17.360,0:11:22.110 +but we're really talking more about how do +we export I/O multiplexing to + +0:11:22.110,0:11:25.970 +user applications. + +0:11:25.970,0:11:29.250 +We will then move into auto configuration strategy + +0:11:29.250,0:11:31.370 +Auto configuration + +0:11:31.370,0:11:32.770 +is what happens + +0:11:32.770,0:11:36.619 +typically or historically I guess you +could say as the system boots. 
+ +0:11:36.619,0:11:39.500 +so all that stuff that comes out about + +0:11:39.500,0:11:40.810 +what + +0:11:40.810,0:11:43.550 +hardwares are on the machine and how it's all +interconnected + +0:11:43.550,0:11:47.350 +all of that is tied up in auto configuration + +0:11:47.350,0:11:50.040 +and that used to happen just once it boots + +0:11:50.040,0:11:52.000 +but in modern systems today + +0:11:52.000,0:11:55.839 +it's an ongoing process. It happens at boot +but it also happens + +0:11:55.839,0:12:00.550 +anytime you plug a new I/O device, a +PCMCIA card, + +0:12:00.550,0:12:03.680 +or you remove a disk or you put in a new disk. + +0:12:03.680,0:12:07.010 +or any sort of activity that changes the I/O + +0:12:07.010,0:12:08.360 +structure of the machine + +0:12:08.360,0:12:10.870 +auto configuration has to get fired back up + +0:12:10.870,0:12:13.050 +and figure out what's disappeared + +0:12:13.050,0:12:18.330 +and cleanup and figure out what new has arrived +to configure it in. + +0:12:18.330,0:12:19.320 +and then we'll talk + +0:12:19.320,0:12:23.870 +a little bit about the configuration of the +device driver + +0:12:23.870,0:12:27.390 +this actually gets into an area that + +0:12:27.390,0:12:28.660 +is + +0:12:28.660,0:12:33.440 +one well let me just give it as a bit +of advice to the class especially those of + +0:12:33.440,0:12:36.780 +you who work in system administration. 
+ +0:12:36.780,0:12:42.010 +You really want to be careful that +you don't learn too much about device drivers + +0:12:42.010,0:12:44.670 +because there are really these three things that + +0:12:44.670,0:12:48.580 +it's not good to learn about and if you do +learn about it it's really good to keep it + +0:12:48.580,0:12:49.740 +to yourself + +0:12:49.740,0:12:51.949 +because if you become an expert or + +0:12:51.949,0:12:54.960 +viewed as an expert in any of these areas + +0:12:54.960,0:12:59.370 +you will become the designated stuckee for +that at your site and you'll never get to do + +0:12:59.370,0:13:01.760 +anything + +0:13:01.760,0:13:02.610 +but that + +0:13:02.610,0:13:07.360 +so the three things that I highly +recommend not learning very much about are + +0:13:07.360,0:13:09.060 +device drivers, + +0:13:09.060,0:13:12.320 +Sendmail configuration files + +0:13:12.320,0:13:13.970 +or anything having to do + +0:13:13.970,0:13:19.350 +with LDAP or anything in +that general domain + +0:13:19.350,0:13:22.660 +because as I say + +0:13:22.660,0:13:24.900 +that will become your life's work + +0:13:24.900,0:13:25.920 +and + +0:13:25.920,0:13:32.920 +there's other things that you might find more interesting. +"Do you have a question?" + +0:13:33.870,0:13:36.659 +so one of my students empathizes with my point + +0:13:36.659,0:13:39.640 +I believe you said you worked on that mail +system + +0:13:39.640,0:13:43.120 +so you might know something about +Sendmail configuration files but you don't + +0:13:43.120,0:13:47.850 +have to answer that + +0:13:47.850,0:13:52.100 +okay so we're going to talk about what a device +driver does and really just sort of the entry + +0:13:52.100,0:13:53.170 +points to it + +0:13:53.170,0:13:57.180 +but we're not going to talk about how you +write such a thing, how you debug such a thing + +0:13:57.180,0:14:01.490 +or much of anything about it. 
I actually used +to teach an entire class believe it or not + +0:14:01.490,0:14:02.720 +about device drivers + +0:14:02.720,0:14:05.849 +but then I realized the error of my ways and I have +since + +0:14:05.849,0:14:12.580 +gone through and made a point of forgetting +every slide in that talk. + +0:14:12.580,0:14:16.860 +okay so then we will move on to the Filesystem + +0:14:16.860,0:14:21.540 +and as always we'll start at the high level +talk about the interface what is it that is + +0:14:21.540,0:14:23.020 +exported out of the system + +0:14:23.020,0:14:27.840 +and then we will start diving down in to see +how do we go about implementing that + +0:14:27.840,0:14:29.010 +so + +0:14:29.010,0:14:31.010 +we'll start with the + +0:14:31.010,0:14:32.560 +so called + +0:14:32.560,0:14:33.680 +Block I/O system + +0:14:33.680,0:14:36.140 +it's historically been called the buffer +cache + +0:14:36.140,0:14:38.590 +and you still hear it called that periodically + +0:14:38.590,0:14:42.720 +and the fact of the matter is that there isn't really +a buffer cache anymore, there is just one big + +0:14:42.720,0:14:44.620 +cache. It's the VM cache + +0:14:44.620,0:14:47.810 +and the Filesystem has a view into it +and + +0:14:47.810,0:14:50.829 +the processes have a view into it but at +the end of the day + +0:14:50.829,0:14:54.660 +you really don't want the same information +on two different + +0:14:54.660,0:14:56.030 +pages of memory + +0:14:56.030,0:14:59.390 +because that just leads to trouble. 
+ +0:14:59.390,0:15:03.390 +But Filesystems think they have buffers and so +there's this maneuver where we make + +0:15:03.390,0:15:06.149 +these things that look like what historically +were buffers + +0:15:06.149,0:15:08.830 +that really just map into VM system + +0:15:08.830,0:15:11.720 +but they're still managed in the way that +they have been + +0:15:11.720,0:15:15.020 +managed historically + +0:15:15.020,0:15:20.670 +okay We will then get down into Filesystem implementation +the local file system if you will + +0:15:20.670,0:15:23.400 +and into also + +0:15:23.400,0:15:25.730 +soft updates and snapshots. + +0:15:25.730,0:15:26.440 + this + +0:15:26.440,0:15:31.100 +for the time being is something that you see +only in FreeBSD + +0:15:31.100,0:15:35.310 +the alternative to soft updates is journalling +which is %uh more commonly used + +0:15:35.310,0:15:39.630 +for example what is used by ext3 + +0:15:39.630,0:15:41.179 +and so I'll go through soft updates and + +0:15:41.179,0:15:45.260 +a lot of the issues in soft updates are the +same issues that you have to deal with journalling + +0:15:45.260,0:15:48.370 +what is it that we're protecting and how do we +go about doing that + +0:15:48.370,0:15:51.150 +and the difference is in the detail. + +0:15:51.150,0:15:54.630 +There is actually a paper in the back to your +notes if this is something that interests + +0:15:54.630,0:15:55.240 +you + +0:15:55.240,0:15:59.930 +it's a comparison of journalling versus +soft updates that was done + +0:15:59.930,0:16:02.120 +about five or eight years ago. 
+ +0:16:02.120,0:16:08.460 +and not to spoil the punch line but the answer is +they both work about the same + +0:16:08.460,0:16:12.500 +Okay snapshots again is something that +if + +0:16:12.500,0:16:15.920 +you've worked with things like the Network +Appliance box you're probably quite + +0:16:15.920,0:16:19.640 +aware of what snapshots are and how they do +or don't work for you + +0:16:19.640,0:16:21.959 +this is the same functionality + +0:16:21.959,0:16:27.380 +in the Filesystem implemented in a +somewhat different way + +0:16:27.380,0:16:28.449 +okay so this + +0:16:28.449,0:16:31.940 +Week 6 is really going to be the local +file system + +0:16:31.940,0:16:34.750 +the disk connected to the machine +that we are dealing with. + +0:16:34.750,0:16:39.140 +Week 7 then we get into multiple +Filesystem support so how do we abstract out that + +0:16:39.140,0:16:41.190 +Filesystem layer + +0:16:41.190,0:16:46.430 +and support multiple Filesystems at the +same time so for example in FreeBSD + +0:16:46.430,0:16:50.199 +you can of course run with the traditional +Fast Filesystem + +0:16:50.199,0:16:54.540 +but if you happen to like the Linux Filesystem +better or you have to share a disk + +0:16:54.540,0:16:55.690 +with a Linux machine + +0:16:55.690,0:16:58.310 +you can run ext2 or ext3 + +0:16:58.310,0:17:01.020 +and it will perfectly happily do that + +0:17:01.020,0:17:01.620 +so + +0:17:01.620,0:17:05.589 +we will have to look then at how do we provide +an interface so that we can plug in all these different + +0:17:05.589,0:17:09.260 +Filesystems that we want to support + +0:17:09.260,0:17:12.250 +another area in which there's been a great + +0:17:12.250,0:17:15.309 +deal of growth at least in code complexity + + +0:17:15.309,0:17:17.840 +is so-called Volume Management + +0:17:17.840,0:17:19.370 +so in the + +0:17:19.370,0:17:24.480 +good old days a Filesystem lived on a disk or +piece of disk and that was that + +0:17:24.480,0:17:26.130 +but in this day and age 
+ +0:17:26.130,0:17:31.150 +that won't do any more so we aggregate disks +together by striping them or RAID + +0:17:31.150,0:17:31.980 +arraying them + +0:17:31.980,0:17:33.380 +or various other things + +0:17:33.380,0:17:39.210 +and we need a whole layer in the system just to +manage those disks + +0:17:39.210,0:17:44.280 +we'll then get to the as an example of an alternative +Filesystem we're going to talk about the + +0:17:44.280,0:17:46.530 +Network Filesystem or NFS + +0:17:46.530,0:17:48.500 +but that's not because this is + +0:17:48.500,0:17:51.090 +the world's best remote file system + +0:17:51.090,0:17:55.240 +or the cleanest design or any of the +properties you might hope that + +0:17:55.240,0:17:57.049 +such a class as this one would have + +0:17:57.049,0:17:58.600 +but it's ubiquitous + +0:17:58.600,0:18:00.210 +very widely used + +0:18:00.210,0:18:01.350 +and + +0:18:01.350,0:18:06.850 +so we're going to talk about that one + +0:18:06.850,0:18:07.740 +okay we'll + +0:18:07.740,0:18:10.970 +then once again switch gears in Week 8 + +0:18:10.970,0:18:17.120 +and turn our attention to of Networking and +Interprocess communication + +0:18:17.120,0:18:18.200 +and + +0:18:18.200,0:18:23.210 +again we'll start from the very top so we'll +go through, we'll go with concepts, the terminology + +0:18:23.210,0:18:24.450 +that gets used + +0:18:24.450,0:18:30.230 +and what's the difference between domain +based addressing and an address domain you know + +0:18:30.230,0:18:30.910 +we'll go through + +0:18:30.910,0:18:34.910 + what the basic IPC services are, + +0:18:34.910,0:18:39.080 +essentially what are all the system calls that +have anything to do with networking + +0:18:39.080,0:18:40.590 +and + +0:18:40.590,0:18:43.720 +just sort of describe what each of them are +and I'm going to go through + +0:18:43.720,0:18:45.830 +a somewhat contrived example + +0:18:45.830,0:18:49.840 +that makes use of every one of those interfaces + +0:18:49.840,0:18:52.860 +and just 
to sort of show how they all connect +together + +0:18:52.860,0:18:54.169 +and for those of you that work + +0:18:54.169,0:18:57.400 +in networking or had done any kind of network +programming + +0:18:57.400,0:19:00.480 +if you're looking for a week to miss and the +Week 8 is the one to miss that's 'cause that's + +0:19:00.480,0:19:02.780 +the sort of most basic + +0:19:02.780,0:19:04.210 +lecture that I'm going to give + +0:19:04.210,0:19:07.910 +If you are not sure whether or not you need to +go through that, there is + +0:19:07.910,0:19:09.540 +one of the papers in the back + +0:19:09.540,0:19:12.620 +it is an introduction to Interprocess communication + +0:19:12.620,0:19:18.279 +read that paper if you say yeah yeah yeah +yeah yeah you are done with Week 8. + +0:19:18.279,0:19:20.590 +on the other hand if you don't come to Week +8 + +0:19:20.590,0:19:22.790 +and then in Week 9 I say + +0:19:22.790,0:19:26.860 +I call on you and say alright what is it + +0:19:26.860,0:19:30.560 +that listen system call does and you +can't tell me + +0:19:30.560,0:19:32.610 +you're gonna get a demerit + +0:19:32.610,0:19:34.340 +okay + +0:19:34.340,0:19:37.770 +then in Week 9 we will get into the actual + +0:19:37.770,0:19:41.419 +networking implementation itself, we go +through system layers as we did + +0:19:41.419,0:19:43.310 +in all the other areas + +0:19:43.310,0:19:44.130 +and + +0:19:44.130,0:19:48.330 +we will spend a significant portion of that +class talking about routing + +0:19:48.330,0:19:50.230 +routing + +0:19:50.230,0:19:53.610 +for those of you that haven't had the pleasure +of dealing with it + +0:19:53.610,0:19:55.540 +is a black art + +0:19:55.540,0:19:58.050 +or at least a dark science + +0:19:58.050,0:19:59.170 +and + +0:19:59.170,0:19:59.930 +so + +0:19:59.930,0:20:02.490 +we'll talk about it + +0:20:02.490,0:20:06.270 +from the perspective first of all of what +do we do locally within the machine + +0:20:06.270,0:20:10.090 +and then what are some of the 
bigger strategies +that we can use for doing routing + +0:20:10.090,0:20:11.910 +enterprise + +0:20:11.910,0:20:14.840 +wide routing or + +0:20:14.840,0:20:20.190 +area wide routing something like throughout the +state of California or throughout the US whatever + +0:20:20.190,0:20:25.379 +this again like device drivers is really +just sort of a nickel + +0:20:25.379,0:20:26.480 +tour through the + +0:20:27.800,0:20:31.820 +what the choices are what that the basic +strategies are that are used + +0:20:31.820,0:20:33.989 +If you're thinking you're going to walk out +of here + +0:20:33.989,0:20:36.110 +knowing how to set up a routing well sorry + +0:20:36.110,0:20:38.430 +we are not going to get that far + +0:20:38.430,0:20:41.559 +but you should at least have a pretty good idea +of what the issues are + +0:20:41.559,0:20:44.430 +and what the general solutions are + +0:20:44.430,0:20:48.950 +okay then finally in Week 10 well not finally +but next few weeks and + +0:20:48.950,0:20:52.380 +we will go through the Internet Protocols + +0:20:52.380,0:20:54.320 +primarily TCP/IP + +0:20:54.320,0:20:56.560 +and this is + +0:20:56.560,0:20:58.809 +what are the algorithms that are used + +0:20:58.809,0:21:01.030 +and I'm putting a particular emphasis + +0:21:01.030,0:21:03.050 +for this particular class + +0:21:03.050,0:21:05.080 +on + +0:21:05.080,0:21:07.730 +changes that have been made in the protocols + +0:21:07.730,0:21:14.310 +to deal with a lot of the sort of attacks that +we've been seeing the SYN attacks and + +0:21:14.310,0:21:16.880 +that sort of thing + +0:21:16.880,0:21:19.440 +rather than just a straight + +0:21:19.440,0:21:22.440 +iteration of what the actual protocols +are + +0:21:22.440,0:21:24.940 +I'll talk primarily about IPv4 + +0:21:24.940,0:21:31.940 +but I will also try and talk a bit about +IPv6 as well + +0:21:33.510,0:21:35.850 +all right so the first ten weeks are + +0:21:35.850,0:21:38.100 +sort of the kernel course + +0:21:38.100,0:21:40.800 +now 
we tack on two weeks at the end + +0:21:40.800,0:21:42.010 +to talk about + +0:21:42.010,0:21:43.990 +sort of the bigger picture of + +0:21:43.990,0:21:48.240 +system tuning, crash dump analysis that level of +thing + +0:21:48.240,0:21:52.940 +The idea is to really consolidate what +we figured out or talked about in the first + +0:21:52.940,0:21:54.710 +ten weeks and + +0:21:54.710,0:21:58.760 +how that applies to tools that we have available +to us to + +0:21:58.760,0:22:00.760 +look at what the system is doing, + +0:22:00.760,0:22:02.649 +analyze what the system is doing + +0:22:02.649,0:22:03.650 +and hopefully + +0:22:03.650,0:22:04.720 +improve + +0:22:04.720,0:22:07.130 +the performance of what the system is doing + +0:22:07.130,0:22:07.750 +and + +0:22:07.750,0:22:12.169 +for the most part the kind of tuning that I'm +talking about is not + +0:22:12.169,0:22:14.740 +going in and hack hack hacking your kernel + +0:22:14.740,0:22:16.510 +because the fact of the matter is + +0:22:16.510,0:22:18.600 +most of the time you can't do that anyway + +0:22:18.600,0:22:22.340 +so it's more looking at it from the perspective +of saying + +0:22:22.340,0:22:26.390 +is this system running badly because it doesn't +have enough memory on it? + +0:22:26.390,0:22:29.470 +or is it running badly because there isn't enough +I/O capacity? + +0:22:29.470,0:22:33.549 +or is it running badly because it's got +enough I/O capacity but + +0:22:33.549,0:22:35.940 +certain drives are being overloaded + +0:22:35.940,0:22:37.309 +or is it + +0:22:37.309,0:22:42.220 +being overrun because we're simply trying +to do too much on this machine? etc. 
+ +0:22:42.220,0:22:45.440 +so that's the sort of level of thing that we're +looking at + +0:22:45.440,0:22:47.080 +but tied into + +0:22:47.080,0:22:52.130 +a lot of concepts that we talked about before so we can talk +about active virtual memory + +0:22:52.130,0:22:53.710 +and what that means + +0:22:53.710,0:22:55.120 +and + +0:22:55.120,0:22:58.750 +essentially measure what it is and hopefully +then you will understand in the context of what + +0:22:58.750,0:23:00.690 +we talked about in the VM section + +0:23:00.690,0:23:03.990 +what that really means + +0:23:03.990,0:23:07.460 +the crash dump analysis is one of these +topics that + +0:23:07.460,0:23:08.730 +you are gonna love or hate + +0:23:08.730,0:23:12.530 +if you actually have to deal with crash +dumps + +0:23:12.530,0:23:13.679 +people find it invaluable + +0:23:13.679,0:23:15.580 +and if you don't have to deal with crash dumps + +0:23:15.580,0:23:18.790 +it's an incredible mass of boring detail + +0:23:18.790,0:23:23.240 +the only good part of it is that the +whole session is only about an hour long + +0:23:23.240,0:23:25.529 +If it interests you, listen closely + +0:23:25.529,0:23:28.950 +and if it bores you, well, it's only an hour long + +0:23:28.950,0:23:32.880 +okay lastly we'll talk a little bit about +security issues + +0:23:32.880,0:23:36.250 +again this is really more to the tools that +are available + +0:23:36.250,0:23:40.750 +to deal with security stuff as opposed to a +complete tutorial on + +0:23:40.750,0:23:45.120 +how to implement security so for those of you +that deal with security + +0:23:45.120,0:23:48.400 +this is just gonna be sort of security one oh +one + +0:23:48.400,0:23:50.029 +for those of you + +0:23:50.029,0:23:51.500 +that + +0:23:51.500,0:23:54.399 +have to deal with it but haven't really +thought about it + +0:23:54.399,0:23:58.549 +it'll probably scare you to death and +you'll wonder how to keep the machines from + +0:23:58.549,0:24:02.840 +being
hijacked every day + +0:24:02.840,0:24:08.030 +Okay so that's in essence what we're going +to try and do here + +0:24:08.030,0:24:15.030 +anybody have any comments, questions, thoughts? +No? All right well. + +0:24:16.130,0:24:17.840 +Let's get started + +0:24:17.840,0:24:22.180 +we will begin on page fifteen with an +overview of the kernel. + +0:24:22.180,0:24:26.040 +Hopefully nobody's lost yet. + +0:24:26.040,0:24:29.310 +What's a kernel? All right. + +0:24:29.310,0:24:31.370 +so starting at the very top + +0:24:31.370,0:24:33.070 +the big broad brush + +0:24:33.070,0:24:35.140 +what we have is + +0:24:35.140,0:24:38.330 +a UNIX virtual machine and + +0:24:38.330,0:24:41.660 +virtual machines are actually something +that has been around + +0:24:41.660,0:24:44.539 +as a concept since the sixties + +0:24:44.539,0:24:48.919 +the difference is really just sort of the level +of the interface that people have dealt with + +0:24:48.919,0:24:51.360 +when they talk about virtual machines + +0:24:51.360,0:24:53.610 +in the 1960s + +0:24:53.610,0:24:56.770 +computers were these enormous things you would +have + +0:24:56.770,0:24:58.870 +your computer room would be something that'd be + +0:24:58.870,0:25:01.909 +three times the size of this conference +room if you had + +0:25:01.909,0:25:03.230 +a computer + +0:25:03.230,0:25:05.530 +the computer itself was + +0:25:05.530,0:25:07.840 +as tall as a refrigerator freezer + +0:25:07.840,0:25:08.950 +imagine + +0:25:08.950,0:25:13.909 +five or eight or ten of these units +side by side that itself made up the computer + +0:25:13.909,0:25:16.080 +there would be one big one + +0:25:16.080,0:25:20.030 +for the core processor and one which +would be the floating point unit and several + +0:25:20.030,0:25:24.080 +of them that would be the memory the core memory +literally the core memory + +0:25:24.080,0:25:29.110 +and then there'd be other rows of these +disk drives which were about the size of a washing + +0:25:29.110,0:25:29.660
+machine + +0:25:29.660,0:25:34.169 +and then behind that since you couldn't store +everything on disks + +0:25:34.169,0:25:36.300 +then you had rows of tape drives + +0:25:36.300,0:25:37.880 +and then you had this little + +0:25:37.880,0:25:39.610 +set of sort of + +0:25:39.610,0:25:43.330 +munchkins that would run around and tend +to the machine and they'd mount tapes and take + +0:25:43.330,0:25:46.710 +off tapes and mount disk packs and remove disk packs +because + +0:25:46.710,0:25:49.760 +the drives themselves were very expensive and +so + +0:25:49.760,0:25:53.110 +you wouldn't, as we do today, have + + +0:25:53.110,0:25:56.090 +one spindle that was dedicated just to one set +of platters + +0:25:56.090,0:25:57.130 +you could take out a + +0:25:57.130,0:25:59.460 +set of platters and put in another + +0:25:59.460,0:26:02.540 +hundred megabyte set of platters and these are +platters that are + +0:26:02.540,0:26:05.280 +this big around and it's like six or eight +of them and + +0:26:05.280,0:26:09.140 +giant head assemblies that come rumbling in and +out + +0:26:09.140,0:26:12.440 +anyway one of these giant giant machines + +0:26:12.440,0:26:17.380 +that cost many millions of dollars would run +at about ten + +0:26:17.380,0:26:21.120 +million instructions per second, 10 MIPS + +0:26:21.120,0:26:21.630 +and 10 MIPS + +0:26:21.630,0:26:28.330 +was more computing power than anybody +could possibly imagine using in a single application + +0:26:28.330,0:26:28.880 +just + +0:26:28.880,0:26:31.050 +by contrast you know this + +0:26:31.050,0:26:34.070 +four-year-old laptop here is probably on +the order of + +0:26:34.070,0:26:36.440 +one or two hundred MIPS + +0:26:36.440,0:26:37.140 +but anyway + +0:26:37.140,0:26:40.760 +people couldn't really view what we would +do with a lot of computing power + +0:26:40.760,0:26:44.640 +and the other thing was that you didn't have +a notion of sort of an operating system that had + +0:26:44.640,0:26:45.890
+applications running on it + +0:26:45.890,0:26:46.760 +because + +0:26:46.760,0:26:50.160 +everybody wanted to write straight to +the raw hardware + +0:26:50.160,0:26:51.750 +and so + +0:26:51.750,0:26:55.900 +what IBM who was a big manufacturer +of machines in those days + +0:26:55.900,0:26:59.060 +did was they came up with this thing called +VM + +0:26:59.060,0:27:00.770 +and this was a little + +0:27:00.770,0:27:02.549 +you'd hardly call it an operating system really + +0:27:02.549,0:27:05.130 +but what it did is it cloned + +0:27:05.130,0:27:09.270 +independent copies of the machine that worked just +like the original machines so you could boot + +0:27:09.270,0:27:11.769 +something that you thought was an operating +system + +0:27:11.769,0:27:13.380 +on top of VM + +0:27:13.380,0:27:16.750 +so you take one of these ten MIPS machines and +it would clone + +0:27:16.750,0:27:20.050 +six identical one MIPS copies + +0:27:20.050,0:27:22.030 +and then you could boot + +0:27:22.030,0:27:24.700 +whatever you wanted on each one of those machines +so + +0:27:24.700,0:27:29.510 +if you were doing database stuff you would boot your +database because the database ran on the raw hardware + +0:27:29.510,0:27:32.920 +or if you were doing payroll you would boot up the payroll +program + +0:27:32.920,0:27:37.950 +or if you actually tried to service +users you could boot a time sharing batch thing + +0:27:37.950,0:27:40.790 +that would read card images and print +stuff out + +0:27:40.790,0:27:44.460 +or they even had TSO the Time Sharing +Option where you could interactively sit + +0:27:44.460,0:27:45.559 +and type and send + +0:27:45.559,0:27:47.560 +stuff in and get answers back + +0:27:47.560,0:27:48.570 +and + +0:27:48.570,0:27:51.429 +also you could boot TSO so whatever set +of + +0:27:51.429,0:27:52.219 + + +0:27:52.219,0:27:55.339 +things you needed you could boot them and they ran +independently as if they were running on their + +0:27:55.339,0:27:56.470 +own machine +
+0:27:56.470,0:28:03.150 +but all VM did was give you an exact +raw copy of the hardware + +0:28:03.150,0:28:04.529 +so when UNIX came along + +0:28:04.529,0:28:07.350 +they sort of liked the notion of + +0:28:07.350,0:28:11.509 +providing the concept of independent +things that you could operate in + +0:28:11.509,0:28:13.610 +but they wanted it at a higher level + +0:28:13.610,0:28:15.610 +so they were looking really to do it + +0:28:15.610,0:28:17.480 +instead of at the raw hardware level + +0:28:17.480,0:28:19.679 +to do it at a process level + +0:28:19.679,0:28:23.799 +and the idea then was that the interface you +would program to would be what we think of as + +0:28:23.799,0:28:26.090 +a system call interface today + +0:28:26.090,0:28:27.849 +and the idea then was that + +0:28:27.849,0:28:30.740 +you would be given a process or set of processes + +0:28:30.740,0:28:34.990 +and those were independent. Your process +couldn't affect + +0:28:34.990,0:28:38.830 +the address space of another process.
You couldn't reach +over and mess around with their addresses, + +0:28:38.830,0:28:41.030 +you couldn't mess around with their I/O +channels + +0:28:41.030,0:28:43.179 +you could slow them down by + +0:28:43.179,0:28:44.299 +being a pig but + +0:28:44.299,0:28:47.980 +that was about the only way that you could affect +other processes + +0:28:47.980,0:28:48.480 +and + +0:28:48.480,0:28:49.830 +so + +0:28:49.830,0:28:52.669 +the interface that they had there + +0:28:52.669,0:28:58.660 +was one that had these characteristics: +it had a paged virtual address space + +0:28:58.660,0:29:02.980 +so you didn't have to know as in the old days how much physical +memory was on the machine and make your application + +0:29:02.980,0:29:04.740 +fit into that amount of memory + +0:29:04.740,0:29:07.950 +you just had what looked like a large + +0:29:07.950,0:29:11.710 +uniform address space even if the underlying +hardware had segments or some other + +0:29:11.710,0:29:13.580 +hardware brain damage + +0:29:13.580,0:29:17.390 +it looked to you like you just had a big uniform +address space and + +0:29:17.390,0:29:21.070 +the size of your address space was independent +of the amount of memory that was on your machine + +0:29:21.070,0:29:23.900 +your address space could be bigger than the amount of +physical memory + +0:29:23.900,0:29:26.499 +'cause we sort of move pages around underneath + +0:29:26.499,0:29:29.320 +whatever part of the address space was actually +active + +0:29:29.320,0:29:34.260 +and there's obviously limits to this if +you are trying to run a one gigabyte + +0:29:34.260,0:29:35.630 +application on top of + +0:29:35.630,0:29:37.240 +ten megabytes of memory + +0:29:37.240,0:29:40.880 +it's probably going to bring new meaning to +same day service + +0:29:40.880,0:29:45.519 +but if you're willing to wait long enough it +will eventually move the pages around and you will + +0:29:45.519,0:29:49.740 +progress through getting your application run + +0:29:49.740,0:29:53.890
+another thing was dealing with software +interrupts + +0:29:53.890,0:29:55.789 +in the old days + +0:29:55.789,0:29:58.749 +you had to understand how the hardware worked + +0:29:58.749,0:30:03.900 +in order to deal with exceptional conditions +so for example if you did a divide by zero + +0:30:03.900,0:30:08.170 +the hardware would jump through some +vector location or + +0:30:08.170,0:30:08.630 +something + +0:30:08.630,0:30:12.799 +and you had to know how that worked and make +sure that you had your program + +0:30:12.799,0:30:16.510 +usually some little bit of assembly language +set up to deal with that + +0:30:16.510,0:30:19.870 +and UNIX said let's get away +from the hardware here + +0:30:19.870,0:30:22.080 +and so they did this thing called signals + +0:30:22.080,0:30:25.700 +and so they just defined a set of signals so that +if you do divide by zero + +0:30:25.700,0:30:29.529 +you simply register a routine you +want to have called you don't have to know + +0:30:29.529,0:30:31.220 +how the hardware figured it out + +0:30:31.220,0:30:36.740 +you just know that that routine is going to get +called and you can deal with it at that point + +0:30:36.740,0:30:40.960 +well we've got a set of timers and counters to keep +track of what we're doing, this is really more + +0:30:40.960,0:30:43.490 +for accounting than anything else but + +0:30:43.490,0:30:46.970 +applications may want to have access to that. + +0:30:46.970,0:30:51.720 +we have a set of identifiers that we're +going to use for things like accounting, + +0:30:51.720,0:30:54.830 +protection and scheduling and so on + +0:30:54.830,0:30:55.820 +and one of the + +0:30:55.820,0:31:00.320 +early philosophies of UNIX was to try +and keep it simple.
+ +0:31:00.320,0:31:02.630 +operating systems had gotten very baroque + +0:31:02.630,0:31:04.490 +in particular the thing that + +0:31:04.490,0:31:07.350 +predated UNIX was a thing called +Multics + +0:31:07.350,0:31:12.820 +Multics was a joint project between +Honeywell, a big computer manufacturer of the + +0:31:12.820,0:31:15.740 +time + +0:31:15.740,0:31:17.129 +AT&T Bell Laboratories + +0:31:17.129,0:31:19.750 +the big industrial laboratory at that time + +0:31:19.750,0:31:21.380 +and MIT + +0:31:21.380,0:31:23.430 +a big university then and + +0:31:23.430,0:31:24.690 +still today + +0:31:24.690,0:31:29.259 +and those three organizations got +together to try and build this + +0:31:29.259,0:31:31.400 +time sharing operating system + +0:31:31.400,0:31:32.280 +and it + +0:31:32.280,0:31:33.770 +just got bigger and + +0:31:33.770,0:31:37.160 +more grandiose and more complex and never +finished + +0:31:37.160,0:31:38.979 +because as soon as they would sort of see + +0:31:38.979,0:31:42.709 +oh we know how to do that but we could +do this other thing too and so then they would tear it + +0:31:42.709,0:31:43.429 +apart and + +0:31:43.429,0:31:46.440 +they never really got to something that + +0:31:46.440,0:31:48.210 +could be put into production + +0:31:48.210,0:31:49.919 +and so + +0:31:49.919,0:31:50.570 +AT&T + +0:31:50.570,0:31:54.340 +Bell Laboratories decided to pull out of +that project + +0:31:54.340,0:31:55.940 +and + +0:31:55.940,0:32:00.000 +two of the people that had been working on +that project, Ken Thompson and Dennis Ritchie + +0:32:00.000,0:32:04.390 +were sort of bummed because they were now +back to typing cards and putting them through + +0:32:04.390,0:32:05.259 +card readers and + +0:32:05.259,0:32:07.960 +they had gotten used to the idea that you could +actually + +0:32:07.960,0:32:11.559 +sit at an ASR-33 teletype and interact +with your computer + +0:32:11.559,0:32:13.440 +and so + +0:32:13.440,0:32:18.230 +they found an old PDP-8
sitting off in +the corner that had been abandoned + +0:32:18.230,0:32:22.120 +and started working on this little tiny operating +system which they called UNIX + +0:32:22.120,0:32:26.549 +which eventually moved to the PDP-11 and +became what we have today + +0:32:26.549,0:32:28.050 +but because + +0:32:28.050,0:32:32.120 +they were coming first of all from Multics +where everything had been done + +0:32:32.120,0:32:34.110 +in great grandiose detail + +0:32:34.110,0:32:37.549 +and because there fundamentally were two +of them working on it and they wanted to get something + +0:32:37.549,0:32:38.370 +done + +0:32:38.370,0:32:40.130 +within a year or so + +0:32:40.130,0:32:41.529 +one of their philosophies was + +0:32:41.529,0:32:44.099 +let's find the one way of doing things + +0:32:44.099,0:32:48.180 +let's not have eight ways from Sunday let's just +get the one way + +0:32:48.180,0:32:53.860 +and that's what we will provide. So what is +the sort of core set of things that we need?
+ +0:32:53.860,0:32:58.620 +well first thing is when it comes to identifiers, +let's not have you know + +0:32:58.620,0:33:00.430 +eighty thousand different identifiers + +0:33:00.430,0:33:03.140 +so they came up with process identifiers, + +0:33:03.140,0:33:09.620 +a user identifier and at that time a single group +identifier, later expanded + +0:33:09.620,0:33:14.200 +and they used that set of identifiers for everything +so it's used for accounting, used for making + +0:33:14.200,0:33:17.410 +protection decisions, used for scheduling +decisions + +0:33:17.410,0:33:19.470 +and + +0:33:19.470,0:33:24.279 +again it was the simplicity of the thing which +was what was driving their decision + +0:33:24.279,0:33:28.840 +but there were really sort of two key ideas +that they had + +0:33:28.840,0:33:30.880 +that really made the difference + +0:33:30.880,0:33:32.539 +that's what set them apart + +0:33:32.539,0:33:34.749 +from what everybody else had done before them + +0:33:34.749,0:33:35.450 +and which + +0:33:35.450,0:33:39.740 +in retrospect is something that has been pervasive +more or less ever since + +0:33:39.740,0:33:41.869 +the first of these was the notion + +0:33:41.869,0:33:44.840 +that we have a uniform descriptor space + +0:33:44.840,0:33:46.289 +that is + +0:33:46.289,0:33:51.250 +given a descriptor it can reference +any I/O device + +0:33:51.250,0:33:53.650 +or even any kind of I/O channel + +0:33:53.650,0:33:58.270 +so you can have a descriptor for a terminal +or a descriptor for a file or a descriptor for + +0:33:58.270,0:34:02.240 +a disk or a descriptor for a pipe or a descriptor +for a socket + +0:34:02.240,0:34:03.500 +and + +0:34:03.500,0:34:04.790 +you don't need to know + +0:34:04.790,0:34:07.940 +what it references in order to be able to read +and write that thing + +0:34:07.940,0:34:11.290 +so if I hand you a descriptor +you can read from that descriptor or you can write + +0:34:11.290,0:34:13.259 +to that descriptor + +0:34:13.259,0:34:15.189 +and +
+0:34:15.189,0:34:17.359 +the correct thing will happen + +0:34:17.359,0:34:19.089 +and you'd say well + +0:34:19.089,0:34:23.629 +that's so obvious I mean how else could you +possibly think of doing it? + +0:34:23.629,0:34:25.179 +well predating UNIX + +0:34:25.179,0:34:28.059 +everything was done with + +0:34:28.059,0:34:29.379 +a little subsystem + +0:34:29.379,0:34:33.419 +that would open a file, read a file, write a +file, close a file + +0:34:33.419,0:34:37.429 +and there was another set of system calls which +would open a terminal, read a terminal, write a terminal, + +0:34:37.429,0:34:38.089 +close a terminal + +0:34:38.089,0:34:39.210 +and yet another one + +0:34:39.210,0:34:42.409 +which was create a pipe, read a pipe, +write a pipe and so on. + +0:34:42.409,0:34:47.699 +so if you are just a drop dead stupid +program like say cat + +0:34:47.699,0:34:51.579 +you would have to have code in there to say is +my input a terminal in which case I need to + +0:34:51.579,0:34:53.159 +use the read terminal + +0:34:53.159,0:34:57.419 +or is it a file in which case I need +to use read file or is it a pipe in which case + +0:34:57.419,0:34:59.189 +I need to use read pipe + +0:34:59.189,0:35:01.860 +and so the program itself had to have all +this + +0:35:01.860,0:35:02.859 +coding in it + +0:35:02.859,0:35:04.409 +whereas when they went to + +0:35:04.409,0:35:07.159 +the uniform descriptor space + +0:35:07.159,0:35:09.630 +cat doesn't know it doesn't need to know +it just says + +0:35:09.630,0:35:10.819 +read my input, + +0:35:10.819,0:35:13.979 +write the output + +0:35:13.979,0:35:17.059 +and it works and we add a new type of descriptor + +0:35:17.059,0:35:17.600 +and + +0:35:17.600,0:35:21.700 +cat just continues to work just as it always +did.
+ +0:35:21.700,0:35:24.199 +So this proved to be a very powerful construct + +0:35:24.199,0:35:27.019 +and pretty much every operating system after +UNIX + +0:35:27.019,0:35:28.659 +did that there's + +0:35:28.659,0:35:30.210 +one exception, a + +0:35:30.210,0:35:32.549 +large company in the Pacific Northwest + +0:35:32.549,0:35:35.830 +that still has a not quite uniform descriptor +space + +0:35:35.830,0:35:38.380 +but that's part of their legacy that really + +0:35:38.380,0:35:39.900 +they're working on that. + +0:35:39.900,0:35:42.009 +Longhorn will be here. + +0:35:42.009,0:35:43.939 +and anyway + +0:35:43.939,0:35:46.190 +this set of facilities then + +0:35:46.190,0:35:50.150 +makes up the UNIX virtual machine + +0:35:50.150,0:35:51.559 +and + +0:35:51.559,0:35:55.559 +in some sense we still see virtual machines +being used today in fact we're seeing sort + +0:35:55.559,0:35:56.749 +of a reversion + +0:35:56.749,0:36:01.429 +back to some of the IBM stuff in things +like VMware + +0:36:01.429,0:36:03.079 +which + +0:36:03.079,0:36:07.029 +essentially allows you to go back to booting +native operating systems again so sort of + +0:36:07.029,0:36:08.280 +interesting to watch + +0:36:08.280,0:36:09.060 +the sort of + +0:36:09.060,0:36:12.919 +pendulum going back and forth +of what's the correct layer + +0:36:12.919,0:36:14.609 +for doing + +0:36:14.609,0:36:18.890 +virtual machines + +0:36:18.890,0:36:22.499 +Okay? so far so good?
+ +0:36:22.499,0:36:24.719 +all right so I said that there were + +0:36:24.719,0:36:27.160 +two key ideas that UNIX had + +0:36:27.160,0:36:30.279 +the first of these being the uniform descriptor +space + +0:36:30.279,0:36:35.819 +the second one which was really critical was +this notion of processes as a commodity + +0:36:35.819,0:36:37.309 +item + +0:36:37.309,0:36:40.220 +so here on page seventeen I've tried to lay +out + +0:36:40.220,0:36:41.090 +the + +0:36:41.090,0:36:44.159 +components that make up a process + +0:36:44.159,0:36:45.759 +and + +0:36:45.759,0:36:50.359 +what do I really mean when I say a process as +a commodity item + +0:36:50.359,0:36:53.650 +okay leading up to + +0:36:53.650,0:36:54.689 +UNIX + +0:36:54.689,0:36:56.800 +in the systems that predated it, + +0:36:56.800,0:36:59.200 +processes were these very large + +0:36:59.200,0:37:02.169 +heavyweight expensive things + +0:37:02.169,0:37:02.779 +and + +0:37:02.779,0:37:04.539 +if you look at + +0:37:04.539,0:37:08.629 +MVS which was the operating system +that ran on IBM for doing multiple processing + +0:37:08.629,0:37:10.509 +and + +0:37:10.509,0:37:13.799 +the system administrator would decide at boot +time + +0:37:13.799,0:37:17.019 +what degree of multiprocessing they wished +to support + +0:37:17.019,0:37:18.140 +so they'd say well + +0:37:18.140,0:37:20.739 +we'll let up to six things happen at once + +0:37:20.739,0:37:22.490 +and so as part of booting up + +0:37:22.490,0:37:24.419 +they would create six + +0:37:24.419,0:37:25.349 +processes + +0:37:25.349,0:37:30.059 +and now you as a user if you wanted to do +something let's say you wanted to + +0:37:30.059,0:37:32.009 +compile and run a program + +0:37:32.009,0:37:34.960 +you would be given a process + +0:37:34.960,0:37:36.019 +and it was up to you + +0:37:36.019,0:37:39.369 +to figure out how to stage what you needed +done + +0:37:39.369,0:37:39.819 +and + +0:37:39.819,0:37:43.930 +this was often fairly complex +
+0:37:43.930,0:37:47.880 +and so you would have to write out all the +steps that you wanted + +0:37:47.880,0:37:50.300 +in this wonderful thing called JCL + +0:37:50.300,0:37:52.259 +Job Control Language. + +0:37:52.259,0:37:56.650 +Job Control Language was the sendmail configuration +file of the sixties + +0:37:56.650,0:38:00.679 +there were people whose sole job at the company +was to put this stuff together 'cause + +0:38:00.679,0:38:04.189 +all you had to do is get one extra space or +a missing comma + +0:38:04.189,0:38:05.000 +or something in there + +0:38:05.000,0:38:08.630 +and the whole thing would just blow up. It would +just sort of spit the card deck back at + +0:38:08.630,0:38:09.799 +you and say well + +0:38:09.799,0:38:13.500 +somewhere in there is a mistake that's sort of +in the general area of this card + +0:38:13.500,0:38:15.549 +and I can't deal with it. Fix it. + +0:38:15.549,0:38:16.489 +and of course + +0:38:16.489,0:38:20.550 +in those days it wasn't just a matter of hitting +carriage return, you know, you had to + +0:38:20.550,0:38:25.239 +get your deck, pull out the card, type the +new one, put it back in and re-submit it + +0:38:25.239,0:38:28.729 +And heaven forbid, you couldn't touch that +card reader you know, it had to be done by + +0:38:28.729,0:38:29.970 +an operator + +0:38:29.970,0:38:32.869 +so the card deck would be read through, it would +disappear and + +0:38:32.869,0:38:36.800 +you know if you were lucky a few minutes later +if you were not lucky a few hours later + +0:38:36.800,0:38:37.849 +you would get + +0:38:37.849,0:38:39.570 +a printout + +0:38:39.570,0:38:43.419 +which was what had happened and then you could +look at it and you know + +0:38:43.419,0:38:47.209 +I put a comma in the wrong place I guess +I get to do it all again + +0:38:47.209,0:38:49.930 +so + +0:38:49.930,0:38:54.940 +the thing you would need to do there for compiling and running a program + +0:38:54.940,0:38:59.579 +was you'd have to break it
into these steps. well +I need to run the preprocessor + +0:38:59.579,0:39:04.670 +and so clean out whatever gunk was left +over on that process from the previous user + +0:39:04.670,0:39:06.240 +put the preprocessor in there + +0:39:06.240,0:39:10.530 +and then read from this file here let's +say I gotta put it somewhere so create a + +0:39:10.530,0:39:12.510 +scratch file over on this disk and + +0:39:12.510,0:39:17.299 +it was excruciating detail like how many cylinders +and how many tracks and this and that + +0:39:17.299,0:39:19.139 +blocks blah blah blah + +0:39:19.139,0:39:23.119 +and don't forget any of those parameters 'cause +it'll spit it out if you do + +0:39:23.119,0:39:26.890 +and so then it would run the first step and +if that's successful then you'd have sitting + +0:39:26.890,0:39:28.899 +in this scratch file that you had created + +0:39:28.899,0:39:33.100 +the output of the preprocessor and then +you'd load the first pass of the compiler + +0:39:33.100,0:39:36.930 +and you'd say now read from that scratch file +and create this other scratch file over here and + +0:39:36.930,0:39:39.450 +when that's successful we need to delete that +one + +0:39:39.450,0:39:43.830 +and then load the second pass, put that back +into another scratch file and then we run the + +0:39:43.830,0:39:45.950 +assembler, and the optimizer then the + +0:39:45.950,0:39:47.750 +loader this and that + +0:39:47.750,0:39:49.410 +finally run the program + +0:39:49.410,0:39:50.900 +and if all goes well + +0:39:50.900,0:39:57.029 +you know at step sixteen out comes the answer + +0:39:57.029,0:39:58.129 +forty two.
so UNIX + +0:39:58.129,0:40:00.819 +said, look this is silly + +0:40:00.819,0:40:02.880 +a lot of this is just + +0:40:02.880,0:40:04.310 +bookkeeping + +0:40:04.310,0:40:07.249 +and computers do bookkeeping really well + +0:40:07.249,0:40:12.179 +and you'll recall yeah but it's going to take +all these cycles it's like + +0:40:12.179,0:40:16.309 +computers are supposed to be labor-saving +devices right? so + +0:40:16.309,0:40:20.150 +they came up with this notion that they would +create processes on the fly as needed + +0:40:20.150,0:40:21.159 +you + +0:40:21.159,0:40:25.549 +you've got a preprocessor and two +steps of the compiler and then + +0:40:25.549,0:40:27.109 +an optimizer and then a loader + +0:40:27.109,0:40:29.410 +we just create Boom seven processes + +0:40:29.410,0:40:31.920 +and we connect them together with pipes + +0:40:31.920,0:40:35.180 +and so we take the input and you know run +it + +0:40:35.180,0:40:38.270 +through the pipes and you know out the end +you get the + +0:40:38.270,0:40:39.629 +executable + +0:40:39.629,0:40:40.030 +and + +0:40:40.030,0:40:42.880 +we will simply create each of these processes + +0:40:42.880,0:40:44.650 +and + +0:40:44.650,0:40:46.549 +so you as a user just + +0:40:46.549,0:40:49.479 +type you know the C compiler and it just + +0:40:49.479,0:40:52.429 +forked these things, piped them together, got the result + +0:40:52.429,0:40:53.640 +and + +0:40:53.640,0:40:57.509 +then once it was done with these processes it +just threw them away so any time you'd create a + +0:40:57.509,0:41:00.479 +new process it came to you pristine clean + +0:41:00.479,0:41:04.239 +and you needed a bunch of things it did +put everything in intermediate files + +0:41:04.239,0:41:07.549 +the fact of the matter is in the early days + +0:41:07.549,0:41:08.129 +those computers + +0:41:08.129,0:41:11.910 +didn't really have enough memory to support +all that stuff at once so + +0:41:11.910,0:41:15.809 +behind you those pipes were
actually implemented +as files + +0:41:15.809,0:41:19.319 +but you at least didn't have to remember to create +them and delete them + +0:41:19.319,0:41:20.200 +and deal with them + +0:41:20.200,0:41:24.020 +as far as you were concerned it just looked like stuff +flowing through pipes and of course today it + +0:41:24.020,0:41:24.490 +just + +0:41:24.490,0:41:27.989 +does flow through pipes in memory + +0:41:27.989,0:41:29.439 +okay so + +0:41:29.439,0:41:33.689 +this notion then that we're just gonna +create processes on the fly as needed and + +0:41:33.689,0:41:35.559 +connect them together as needed + +0:41:35.559,0:41:38.039 +it was a novel concept + +0:41:38.039,0:41:43.599 +and it wasn't that they had somehow mysteriously figured +out how to create processes cheaply + +0:41:43.599,0:41:44.839 +'cause they hadn't + +0:41:44.839,0:41:46.180 +they were still + +0:41:46.180,0:41:49.959 +really expensive to create + +0:41:49.959,0:41:52.210 +but that extra effort + +0:41:52.210,0:41:53.029 +was + +0:41:53.029,0:41:56.089 +worth it because it was saving a lot of programming +time + +0:41:56.089,0:41:59.809 +so my favorite example is you run ls + +0:41:59.809,0:42:01.810 +so we have to create a process + +0:42:01.810,0:42:04.259 +load the ls binary into it + +0:42:04.259,0:42:06.180 +it prints a line or two on your screen + +0:42:06.180,0:42:10.609 +and we tear the entire thing down and return +all its resources back to the system + +0:42:10.609,0:42:14.979 +more than ninety percent of the cost of running +ls is creating and destroying the process + +0:42:14.979,0:42:19.239 +a tiny fraction of it is actually running +ls + +0:42:19.239,0:42:24.259 +but it goes so fast, who cares, right? + +0:42:24.259,0:42:25.749 +so the point is that + +0:42:25.749,0:42:30.039 +that concept of just creating things as +needed + +0:42:30.039,0:42:31.780 +again was very powerful + +0:42:31.780,0:42:35.709 +and is one that is just pervasive today + +0:42:35.709,0:42:38.639 +okay so what is a process
actually made up +of + +0:42:38.639,0:42:43.179 +it gets some amount of CPU time or at +least we do dearly hope that it gets some + +0:42:43.179,0:42:46.050 +amount of CPU time, the lack of getting +CPU time + +0:42:46.050,0:42:46.670 +is what makes + +0:42:46.670,0:42:47.979 +a computer so sluggish + +0:42:47.979,0:42:49.409 +of course + +0:42:49.409,0:42:51.920 +this all really boils down to scheduling + +0:42:51.920,0:42:54.249 +and we're going to talk about scheduling + +0:42:54.249,0:42:56.279 +probably more than you care to + +0:42:56.279,0:42:59.219 +in a couple of weeks' time + +0:42:59.219,0:43:01.619 +we have the asynchronous events + +0:43:01.619,0:43:04.569 +these are the external events that + +0:43:04.569,0:43:05.659 +are coming in + +0:43:05.659,0:43:07.679 +so + +0:43:07.679,0:43:10.169 +they may be either things that + +0:43:10.169,0:43:14.339 +are coming in from the outside world like +start, stop and quit + +0:43:14.339,0:43:15.279 +or + +0:43:15.279,0:43:18.170 +out-of-band data arrival notification that kind +of thing + +0:43:18.170,0:43:22.339 +or it may in fact be things that the program +is bringing down upon itself + +0:43:22.339,0:43:25.590 +such as a segmentation fault, a divide by zero + +0:43:25.590,0:43:26.910 +or some other + +0:43:26.910,0:43:31.959 +what would normally be viewed as incorrect +operation + +0:43:31.959,0:43:35.849 +and so we'll talk about that when we talk about +signals + +0:43:35.849,0:43:37.039 +every program + +0:43:37.039,0:43:38.899 +gets some amount of memory + +0:43:38.899,0:43:42.659 +it gets an initial amount when it starts +up and generally allocates more as it + +0:43:42.659,0:43:45.229 +goes along + +0:43:45.229,0:43:49.429 +this of course we will deal with very extensively +we'll spend an entire week on it + +0:43:49.429,0:43:54.249 +when we talk about how virtual memory is implemented + +0:43:54.249,0:43:54.609 +and + +0:43:54.609,0:43:57.429 +then we get I/O descriptors + +0:43:57.429,0:44:02.259 +I used to
say that every program had to have +at least one I/O descriptor since + +0:44:02.259,0:44:04.910 +if it had absolutely no input + +0:44:04.910,0:44:06.329 +absolutely no output + +0:44:06.329,0:44:09.049 +then it was sort of pointless + +0:44:09.049,0:44:12.900 +of course I had to have one of my students +come up and point out to me there is a + +0:44:12.900,0:44:13.849 +class of programs + +0:44:13.849,0:44:16.469 +which don't need I/O descriptors + +0:44:16.469,0:44:17.440 +and that is + +0:44:17.440,0:44:19.549 +these things called benchmarks + +0:44:19.549,0:44:23.249 +they just compute something all we really care +about is how long it takes them to compute + +0:44:23.249,0:44:24.959 +we don't actually care what the answer is + +0:44:24.959,0:44:26.019 +In theory we don't + +0:44:26.019,0:44:29.779 +I personally like my benchmarks to output +something so I can see they're + +0:44:29.779,0:44:31.489 +computing the right thing + +0:44:31.489,0:44:33.169 +but in theory + +0:44:33.169,0:44:35.919 +that wouldn't be necessary + +0:44:35.919,0:44:38.650 +outside of that class of programs + +0:44:38.650,0:44:42.670 +everything needs some sort of descriptors and +of course we'll talk about descriptors + +0:44:42.670,0:44:43.659 +quite extensively + +0:44:43.659,0:44:47.349 +as we go through the I/O subsystem + +0:44:47.349,0:44:50.969 +okay so the executive summary is that processes +are + +0:44:50.969,0:44:54.969 +the fundamental service that is provided by +UNIX + +0:44:54.969,0:44:58.430 +and + +0:44:58.430,0:45:02.849 +what we're going to spend essentially the +next two and a half weeks working on + +0:45:02.849,0:45:04.769 +is + +0:45:04.769,0:45:07.079 +what makes up processes + +0:45:07.079,0:45:10.180 +we'll go into much more detail about each of these +four points + +0:45:10.180,0:45:11.769 +and + +0:45:11.769,0:45:13.630 +then how do we actually go about + +0:45:13.630,0:45:14.390 +providing that + +0:45:14.390,0:45:16.639 +bit of service +
+0:45:16.639,0:45:17.900 +the next thing that I'm + +0:45:17.900,0:45:22.210 +going to do now is just go through and lay +out some of the terminology that + +0:45:22.210,0:45:23.239 +we have when + +0:45:23.239,0:45:25.130 +we're talking about processes + +0:45:25.130,0:45:29.229 +so this is sort of the big picture here we're +on page eighteen + +0:45:29.229,0:45:30.669 +and + +0:45:30.669,0:45:33.669 +you can see we have sort of three bits that +make up + +0:45:33.669,0:45:36.640 +the system + +0:45:36.640,0:45:39.029 +we have the currently running user process + +0:45:39.029,0:45:41.180 +and then what we call the top half of the kernel + +0:45:41.180,0:45:43.699 +and the bottom half of the kernel + +0:45:43.699,0:45:47.049 +now this would be a picture for a uniprocessor + +0:45:47.049,0:45:49.299 +so one CPU + +0:45:49.299,0:45:51.209 +if we had a multiprocessor + +0:45:51.209,0:45:54.009 +then we would have + +0:45:54.009,0:45:57.130 +one instance of the kernel + +0:45:57.130,0:45:59.529 +but multiple instances of the user process + +0:45:59.529,0:46:02.879 +but for any given CPU on a multiprocessor + +0:46:02.879,0:46:05.709 +it is running exactly one process + +0:46:05.709,0:46:09.309 +so you may think that we're running four or five +processes all at once + +0:46:09.309,0:46:14.319 +but the fact of the matter is that at any instant +in time there's only one process which is + +0:46:14.319,0:46:16.299 +actually running + +0:46:16.299,0:46:18.609 +and + +0:46:18.609,0:46:21.429 +that is the one that we have loaded in the system + +0:46:21.429,0:46:25.199 +now we give the illusion that we're running +lots of things because we switch between them + +0:46:25.199,0:46:26.100 +rather quickly + +0:46:26.100,0:46:29.269 +so it looks like things are happening in all +windows at once + +0:46:29.269,0:46:31.430 +but in reality + +0:46:31.430,0:46:33.619 +that's not really happening + +0:46:33.619,0:46:36.440 +okay so there is a set of properties that I want to +look at +
+0:46:36.440,0:46:40.899 +that have to do with each one of these parts here + +0:46:40.899,0:46:44.359 +but just to sort of look at it from the +big picture perspective + +0:46:44.359,0:46:45.970 +what you see here + +0:46:45.970,0:46:47.180 +is + +0:46:47.180,0:46:51.549 +there is a boundary between the user process +and the top half of the kernel + +0:46:51.549,0:46:54.949 +which is really just like a glorified subroutine +call + +0:46:54.949,0:46:59.539 +it's a lot like calling into a library routine +like calling strcat, strcpy or something + +0:46:59.539,0:47:00.319 +like that + +0:47:00.319,0:47:03.679 +when you do a system call + +0:47:03.679,0:47:05.650 +we take that same set of parameters + +0:47:05.650,0:47:08.009 +now there is sort of a + +0:47:08.009,0:47:09.780 +brick wall here if you will + +0:47:09.780,0:47:11.380 +that is protecting + +0:47:11.380,0:47:13.680 +the top half of the kernel + +0:47:13.680,0:47:15.299 +from the application + +0:47:15.299,0:47:18.899 +I'll go into some more detail about how that +actually gets implemented + +0:47:18.899,0:47:22.729 +but in essence you can think of it +as there's sort of this Wailing Wall with these little + +0:47:22.729,0:47:24.990 +chinks there and you can sort of push a request +through + +0:47:24.990,0:47:28.230 +and somebody on the other side sort of pulls that out +looks at it and decides whether they're going + +0:47:28.230,0:47:28.690 +to + +0:47:28.690,0:47:30.769 +deign to provide service to you + +0:47:30.769,0:47:34.229 +and if they do then they sort of send it back + +0:47:34.229,0:47:37.649 +unlike a library where you can just sort +of reach in and walk around if you want to + +0:47:37.649,0:47:38.290 +with + +0:47:38.290,0:47:40.950 +good programming practices you don't do that +but + +0:47:40.950,0:47:43.049 +you could + +0:47:43.049,0:47:44.579 +all right so + +0:47:44.579,0:47:49.089 +the top half of the kernel really looks +a lot like + +0:47:49.089,0:47:50.509 +a big library +
+0:47:50.509,0:47:53.509 +it just happens to be a library of +routines + +0:47:53.509,0:47:57.599 +that deal with things where processes need +to interact with each other + +0:47:57.599,0:48:01.399 +in fact many people don't understand +what the difference is between the C + +0:48:01.399,0:48:03.259 +library and the top half of the kernel + +0:48:03.259,0:48:08.020 +if it's something that you're doing that +no other process needs to know about + +0:48:08.020,0:48:09.799 +then it can be in the C library + +0:48:09.799,0:48:13.829 +so if you call strcat to concatenate two +strings together + +0:48:13.829,0:48:17.599 +nobody else needs to know you're doing that +you don't need to coordinate with anybody + +0:48:17.599,0:48:19.000 +else that you're doing that + +0:48:19.000,0:48:20.160 +it's just happening + +0:48:20.160,0:48:21.979 +so that goes in the C library. + +0:48:21.979,0:48:24.489 +on the other hand if you're reading or writing +a file + +0:48:24.489,0:48:28.029 +there may be other processes that are also +reading and writing that file + +0:48:28.029,0:48:29.910 +and therefore that + +0:48:29.910,0:48:31.579 +has to be done by the kernel + +0:48:31.579,0:48:33.120 +because it can coordinate + +0:48:33.120,0:48:37.189 +all the different processes that are trying to access +that file.
+ +0:48:37.189,0:48:40.529 +so the top half of the kernel is pretty straightforward +code + +0:48:40.529,0:48:45.539 +it looks a lot like any other library that +you would write if you look at top half kernel + +0:48:45.539,0:48:49.640 +code you know you see oh a read comes in +it's got these parameters we muck around we + +0:48:49.640,0:48:53.719 +get some data we put it in the buffer and +we return back + +0:48:53.719,0:48:57.470 +and in fact writing code for the top half of +the kernel is + +0:48:57.470,0:48:59.729 +not all that difficult to do + +0:48:59.729,0:49:00.989 +it's + +0:49:00.989,0:49:01.959 +you have + +0:49:01.959,0:49:05.939 +many of the same properties that you would have +when you're writing user-level application + +0:49:05.939,0:49:07.529 +code + +0:49:07.529,0:49:11.779 +the bottom half of the kernel is where things +start to get nasty + +0:49:11.779,0:49:14.820 +because the bottom half of the kernel is the part +of the system + +0:49:14.820,0:49:18.769 +that deals with all of the asynchronous events +in the system + +0:49:18.769,0:49:22.179 +it's things like device drivers, + +0:49:22.179,0:49:23.779 +timers + +0:49:23.779,0:49:25.010 +that level of thing + +0:49:25.010,0:49:28.029 +that are driven by hardware events + +0:49:28.029,0:49:28.659 +so + +0:49:28.659,0:49:31.459 +for example a packet arrives on the network + +0:49:31.459,0:49:33.670 +that causes an interrupt to come in + +0:49:33.670,0:49:36.729 +that will be handled by the bottom half of +the kernel + +0:49:36.729,0:49:38.829 +and historically + +0:49:38.829,0:49:43.079 +when an interrupt came in it preempted whatever +else was going on + +0:49:43.079,0:49:45.400 +and it ran until it finished and then it returned + +0:49:45.400,0:49:46.539 +and it could not + +0:49:46.539,0:49:49.439 +go to sleep to wait for resources or other +things + +0:49:49.439,0:49:51.339 +in current systems + +0:49:51.339,0:49:54.549 +you can actually go to sleep in the interrupt driver +and wait
for + +0:49:54.549,0:49:56.739 +some other activity to complete + +0:49:56.739,0:49:58.259 +it is however + +0:49:58.259,0:50:00.799 +not a good idea to do that + +0:50:00.799,0:50:01.909 +because + +0:50:01.909,0:50:06.739 +the usual case of most device drivers is they +can finish whatever they're doing in an interrupt + +0:50:06.739,0:50:08.579 +without ever blocking + +0:50:08.579,0:50:09.580 +and so + +0:50:09.580,0:50:13.649 +when an interrupt comes in we assume that you're +not going to sleep + +0:50:13.649,0:50:14.710 +and if you actually + +0:50:14.710,0:50:17.219 +then go to sleep, oh man + +0:50:17.219,0:50:20.469 +you didn't tell us you're going to do this we +have to go off to do a whole lot of other work + +0:50:20.469,0:50:23.029 +that we hadn't originally planned on doing + +0:50:23.029,0:50:25.460 +so if you go to sleep in a device driver + +0:50:25.460,0:50:28.209 +you are taking a very serious performance hit + +0:50:28.209,0:50:31.019 +so it's highly recommended that you don't +do that + +0:50:31.019,0:50:33.130 +but if you have to you can + +0:50:33.130,0:50:35.809 +and it's because of this historic behavior + +0:50:35.809,0:50:39.899 +of not being able to sleep in the bottom half +of the kernel + +0:50:39.899,0:50:42.119 +that you have certain properties that have + +0:50:42.119,0:50:44.769 +taken over in device drivers + +0:50:44.769,0:50:45.940 +and that is + +0:50:45.940,0:50:50.369 +that a device driver should be handed all +the resources it needs to get its job done + +0:50:50.369,0:50:54.490 +you don't tell a disk device driver +go read this + +0:50:54.490,0:50:56.549 +and put it somewhere + +0:50:56.549,0:50:57.580 +you have to say + +0:50:57.580,0:50:59.410 +go read this particular block + +0:50:59.410,0:51:02.650 +here is a chunk of memory that I want that +data put in + +0:51:02.650,0:51:03.959 +and + +0:51:03.959,0:51:06.169 +notify me when it's done + +0:51:06.169,0:51:06.970 +because + +0:51:06.970,0:51:10.660 +things like
allocating memory are classic +places where you end up having to go to sleep + +0:51:10.660,0:51:12.939 +to wait for stuff to happen + +0:51:12.939,0:51:14.449 +and + +0:51:14.449,0:51:16.390 +historically you couldn't do that + +0:51:16.390,0:51:18.640 +even currently you don't want to have to do that + +0:51:18.640,0:51:23.400 +so device drivers generally have all their +resources preallocated + +0:51:23.400,0:51:25.169 +and then they can just go + +0:51:25.169,0:51:27.279 +the one place where this doesn't work + +0:51:27.279,0:51:29.029 +is the network + +0:51:29.029,0:51:30.929 +and in particular + +0:51:30.929,0:51:34.630 +you don't know when somebody's going to send +packets to you + +0:51:34.630,0:51:37.040 +you say well you're looking to open connections + +0:51:37.040,0:51:39.360 +but if you're doing something like IP forwarding + +0:51:39.360,0:51:40.969 +there's no + +0:51:40.969,0:51:45.039 +top half state that's dealing with these packets +they're just coming in on one interface being + +0:51:45.039,0:51:46.719 +sent out on another interface + +0:51:46.719,0:51:50.630 +they never pass through any part of the top +half of the kernel + +0:51:50.630,0:51:53.529 +and so in the case of network device drivers + +0:51:53.529,0:51:56.149 +they need to allocate memory + +0:51:56.149,0:51:56.640 +and + +0:51:56.640,0:51:58.829 +if memory gets into short supply + +0:51:58.829,0:52:01.689 +and they try to allocate memory and it's not +available + +0:52:01.689,0:52:05.049 +they historically couldn't wait for memory to be +available + +0:52:05.049,0:52:08.380 +and even in practice today don't wait + + +0:52:08.380,0:52:09.580 +for memory to become available + +0:52:09.580,0:52:12.469 +they simply drop the packet on the floor + +0:52:12.469,0:52:18.109 +it's like well I didn't have any place to +put it sorry oops + +0:52:18.109,0:52:20.940 +now that doesn't cause incorrect behavior + +0:52:20.940,0:52:24.369 +because the higher level protocols will retransmit +
+0:52:24.369,0:52:29.140 +but it does cause great performance problems +because retransmission means that connections + +0:52:29.140,0:52:29.879 +stall + +0:52:29.879,0:52:31.110 +they have to back up + +0:52:31.110,0:52:33.010 +they have to resend data + +0:52:33.010,0:52:33.739 +and so on + +0:52:33.739,0:52:38.739 +so you really want to avoid dropping packets +if you can possibly help it + +0:52:38.739,0:52:42.029 +and consequently + +0:52:42.029,0:52:43.420 +we tend to + +0:52:43.420,0:52:46.499 +preallocate a certain amount of memory for +the network drivers + +0:52:46.499,0:52:48.299 +and + +0:52:48.299,0:52:52.169 +we try very hard to make sure that we're not +going to run out of memory but + +0:52:52.169,0:52:54.869 +if packets come fast enough and we can't deal +with them + +0:52:54.869,0:52:57.940 +as quickly as they are arriving then over a short period +of time + +0:52:57.940,0:53:03.489 +we get to the point where we simply have to start +dropping packets + +0:53:03.489,0:53:07.649 +okay this is a part of the kernel that you do not wish to +write code for + +0:53:07.649,0:53:10.919 +because it is extremely difficult to +debug + +0:53:10.919,0:53:12.759 +you get these bugs where + +0:53:12.759,0:53:18.779 +the only time it happens is on the third Tuesday +when there's a full moon + +0:53:18.779,0:53:19.300 +and + +0:53:19.300,0:53:24.199 +we have a disk interrupt followed by a +terminal character coming in + +0:53:24.199,0:53:28.289 +and a network packet arriving of size fifteen +twenty-two + +0:53:28.289,0:53:30.109 +and when all those things happen + +0:53:30.109,0:53:32.719 +the system panics + +0:53:32.719,0:53:37.380 +and of course it panics +because you're following some bad pointer + +0:53:37.380,0:53:40.969 +something that should have been there +but was freed some time in the distant past + +0:53:40.969,0:53:42.930 +we are not sure when + +0:53:42.930,0:53:44.049 +and + +0:53:44.049,0:53:47.400 +trying to debug things like that is
extremely +difficult + +0:53:47.400,0:53:48.509 +and you can + +0:53:48.509,0:53:52.120 +think well I think I found the problem but +it's not reproducible + +0:53:52.120,0:53:55.530 +you know you have to wait for the next third +Tuesday with a full moon and blah blah blah + +0:53:55.530,0:53:56.950 +to happen + +0:53:56.950,0:53:57.469 +and + +0:53:57.469,0:54:01.449 +you know so you sort of statistically +guess that you fixed it you know I was getting + +0:54:01.449,0:54:03.510 +this bug once every three days + +0:54:03.510,0:54:06.099 +and now it's gone for two weeks without happening + +0:54:06.099,0:54:07.239 +did you fix that? + +0:54:07.239,0:54:08.969 +or have you just been lucky + +0:54:08.969,0:54:10.459 +and it's + +0:54:10.459,0:54:14.349 +that coupled with the fact that you're +dealing with hardware + +0:54:14.349,0:54:18.049 +and hardware rarely works the way it's documented +to work + +0:54:18.049,0:54:21.770 +and so you know you're doing everything that +it says you're supposed to do + +0:54:21.770,0:54:26.260 +it still doesn't work because you didn't set +the fiddle bit over in that other place over + +0:54:26.260,0:54:26.660 +there + +0:54:26.660,0:54:30.479 +that's not documented anywhere but if it's +not set it doesn't work + +0:54:30.479,0:54:33.769 +occasionally + +0:54:33.769,0:54:36.110 +so this is another reason that you really want +to avoid + +0:54:36.110,0:54:40.459 +dealing with this part of the system if +you can possibly help it + +0:54:40.459,0:54:44.369 +okay but let's go through and look at some +of the properties here starting up at + +0:54:44.369,0:54:45.789 +the user process + +0:54:45.789,0:54:47.980 +we're running with + +0:54:47.980,0:54:51.449 +preemptive scheduling + +0:54:51.449,0:54:53.409 +now there's several caveats here + +0:54:53.409,0:54:55.239 +preemptive scheduling is the default + +0:54:55.239,0:54:56.970 +so called shared scheduler + +0:54:56.970,0:55:01.360 +that is what you normally use there are other
+schedulers like the real-time scheduler + +0:55:01.360,0:55:02.869 +where what I'm saying isn't true + +0:55:02.869,0:55:05.709 +we'll talk about some of the schedulers +later + +0:55:05.709,0:55:09.930 +but the usual scheduler that you're running +on under UNIX is a shared scheduler + +0:55:09.930,0:55:13.229 +and under the shared scheduler user applications + +0:55:13.229,0:55:15.159 +run with preemptive scheduling + +0:55:15.159,0:55:17.449 +and preemptive scheduling means that + +0:55:17.449,0:55:20.019 +you run at the whim of the system + +0:55:20.019,0:55:21.420 +if it wants you to run + +0:55:21.420,0:55:22.140 +you run + +0:55:22.140,0:55:25.490 +once you start running you have no guarantee +of how long you're going to run + +0:55:25.490,0:55:29.370 +it might let you run for three instructions +and then decide it doesn't like you anymore + +0:55:29.370,0:55:31.150 +it wants to run something else + +0:55:31.150,0:55:35.920 +or you might get to run for several seconds +in a row with no intervening + +0:55:35.920,0:55:37.469 +things interrupting you + +0:55:37.469,0:55:39.719 +you just don't know + +0:55:39.719,0:55:40.969 +and + +0:55:40.969,0:55:42.839 +really all you know is + +0:55:42.839,0:55:43.569 +that + +0:55:43.569,0:55:48.239 +they claim that they're using statistics +and that the statistics are fair + +0:55:48.239,0:55:55.059 +and so on average you're going to get a reasonable +amount of time but that's + +0:55:55.059,0:55:57.129 +up to the system you don't control that + +0:55:57.129,0:55:58.439 +the real point here + +0:55:58.439,0:56:01.940 +is that you don't have any way of creating +a critical section + +0:56:01.940,0:56:04.950 +you can't say okay I don't want to be interrupted + +0:56:04.950,0:56:07.429 +during this particular sequence of things + +0:56:07.429,0:56:09.809 +so you have to program + +0:56:09.809,0:56:13.469 +assuming that you may be interrupted at any +point + +0:56:13.469,0:56:14.979 +okay
+ +0:56:14.979,0:56:18.909 +the next thing is that when you're running +in a user process + +0:56:18.909,0:56:20.719 +you are running + +0:56:20.719,0:56:24.150 +with the processor in what's called unprivileged +mode + +0:56:24.150,0:56:28.109 +one of the requirements for running any kind +of a UNIX system + +0:56:28.109,0:56:31.759 +is that you have to have a processor that +supports privileged and unprivileged + +0:56:31.759,0:56:33.709 +two different modes of operation + +0:56:33.709,0:56:37.049 +in privileged mode which is what the kernel +runs in + +0:56:37.049,0:56:38.950 +the entire repertoire + +0:56:38.950,0:56:40.869 +of the hardware is available + +0:56:40.869,0:56:45.339 +by this I mean you can set all the registers +you can fiddle with the memory management + +0:56:45.339,0:56:47.460 +unit you can initiate I/O + +0:56:47.460,0:56:50.519 +you can access any memory anywhere + +0:56:50.519,0:56:51.919 +etc + +0:56:51.919,0:56:56.540 +when you're running in unprivileged +mode which is what user processes run in + +0:56:56.540,0:57:00.709 +there's a large subset of the instructions which +you cannot execute + +0:57:00.709,0:57:03.480 +you cannot initiate I/O on + +0:57:03.480,0:57:04.209 +devices + +0:57:04.209,0:57:06.770 +you cannot change the memory mapping + +0:57:06.770,0:57:10.209 +you cannot access memory that's not part of +your address space + +0:57:10.209,0:57:13.299 +you cannot execute certain instructions +like halt + +0:57:13.299,0:57:15.589 +and + +0:57:15.589,0:57:19.039 +so in general you are prevented + +0:57:19.039,0:57:21.789 +from manipulating anything that's outside of your +address space + +0:57:21.789,0:57:23.759 +this of course is desirable because + +0:57:23.759,0:57:27.059 +when you're running in this unprivileged +mode + +0:57:27.059,0:57:28.300 +you're protected + +0:57:28.300,0:57:31.910 +from other processes manipulating you +and they're protected from you manipulating + +0:57:31.910,0:57:33.079 +them +
+0:57:33.079,0:57:36.430 +for those of you that have had the misfortune +to have to use + +0:57:36.430,0:57:39.339 +early versions of Windows up to about ninety +eight + +0:57:39.339,0:57:42.470 +they always ran with the processor +running in privileged mode + +0:57:42.470,0:57:44.009 +even in applications + +0:57:44.009,0:57:46.459 +and so either maliciously or accidentally + +0:57:46.459,0:57:50.000 +you could stomp on other people's address space +or you could stomp on the kernel + +0:57:50.000,0:57:53.020 +and a lot of the blue screen of death was +people just + +0:57:53.020,0:57:56.319 +following wild pointers and trashing different +parts of the system + +0:57:56.319,0:57:58.819 +taking everything down + +0:57:58.819,0:58:00.020 +it also makes it + +0:58:00.020,0:58:02.320 +far easier to + +0:58:02.320,0:58:05.459 +implement things like viruses and worms and +other things because + +0:58:05.459,0:58:09.619 +a user application can rewrite the boot +block on the disk they can just reach down + +0:58:09.619,0:58:13.109 +and manipulate the registers that allow them +to do whatever they want + +0:58:13.109,0:58:16.730 +whereas when you're running in unprivileged +mode you can't write those kinds + +0:58:16.730,0:58:20.179 +of things + +0:58:20.179,0:58:24.119 +so modern versions of Windows anything from about +2000 on + +0:58:24.119,0:58:26.630 +now run with privileged and unprivileged mode + +0:58:26.630,0:58:28.649 +but UNIX has always required that + +0:58:28.649,0:58:30.219 +and so when you're running in a + +0:58:30.219,0:58:31.319 +user process + +0:58:31.319,0:58:33.389 +you cannot block I mean + +0:58:33.389,0:58:37.969 +you cannot execute the instructions which +cause a context switch to occur + +0:58:37.969,0:58:40.349 +you can't pick what's going to run next + +0:58:40.349,0:58:43.140 +you can't make that thing run next all you can +do + +0:58:43.140,0:58:45.189 +is go to the operating system and say + +0:58:45.189,0:58:49.269 +hey I've got
nothing to do. pick somebody else +to run + +0:58:49.269,0:58:53.449 +and the operating system is the thing that can +then execute the instructions which cause + +0:58:53.449,0:58:57.609 +a different process to be loaded + +0:58:57.609,0:58:59.049 +and run + +0:58:59.049,0:59:03.400 +all right, finally while you're in a user application you're +running on a user stack + +0:59:03.400,0:59:06.410 +that's part of the user's address space + +0:59:06.410,0:59:07.889 +so + +0:59:07.889,0:59:10.819 +part of creating a process gives you a runtime +stack + +0:59:10.819,0:59:14.369 +as part of a virtual address space and so it +can be + +0:59:14.369,0:59:18.199 +more or less up to the limits of the hardware +as big as you want it to be + +0:59:18.199,0:59:19.949 +so if you are running on a thirty-two-bit processor + +0:59:19.949,0:59:22.819 +your stack can get to 2 gigabytes + +0:59:22.819,0:59:23.319 +and + +0:59:23.319,0:59:26.839 +what this means is that anytime you +allocate local variables + +0:59:26.839,0:59:28.529 +you don't have to worry about oh + +0:59:28.529,0:59:30.609 +is that gonna overrun my stack? + +0:59:30.609,0:59:31.610 +so if you need + +0:59:31.610,0:59:35.519 +a hundred thousand double precision floating +point numbers + +0:59:35.519,0:59:37.189 +you can just as a local variable allocate + +0:59:37.189,0:59:40.269 +an array of size a hundred thousand of type +double + +0:59:40.269,0:59:44.029 +and it just decrements your stack pointer by +eight hundred thousand bytes + +0:59:44.029,0:59:45.009 +and away you go + +0:59:45.009,0:59:47.299 +it's just virtual address space + +0:59:47.299,0:59:49.020 +as you'll see when we get into the kernel + +0:59:49.020,0:59:50.210 +that ceases to be the case |