0:00:09.469,0:00:11.309
Hello my name is Marshall Kirk McKusick
0:00:11.309,0:00:15.389
and I've been around as long as dinosaurs
and mainframes have ruled the world
0:00:15.389,0:00:18.429
which is to say the sixties and seventies
0:00:18.429,0:00:22.460
however by the 1970s a new breed of mammals had begun to show up
on the scene
0:00:22.460,0:00:24.240
known as minicomputers
0:00:24.240,0:00:28.230
although they were just toys in the 1970s they would soon grow
0:00:28.230,0:00:31.689
and take over most of the computing market
0:00:31.689,0:00:33.150
In 1970
0:00:33.150,0:00:37.910
at AT&T Bell Laboratories two researchers Ken
Thompson and Dennis Ritchie began developing the
0:00:37.910,0:00:39.900
UNIX operating system
0:00:39.900,0:00:42.040
Ken Thompson who had been an alumnus at Berkeley
0:00:42.040,0:00:46.100
came back on a sabbatical in 1975 bringing UNIX
with him
0:00:46.100,0:00:47.539
In the year that he was there
0:00:47.539,0:00:51.330
he managed to get a number of graduate students interested
in UNIX
0:00:51.330,0:00:53.940
and by the time he left in 1976
0:00:53.940,0:00:56.829
Bill Joy had taken over running the UNIX system
0:00:56.829,0:01:00.470
and in fact continuing to develop software for it.
0:01:00.470,0:01:04.339
Bill began packaging up the software that had
been developed under Berkeley UNIX
0:01:04.339,0:01:05.779
and distributing it
0:01:05.779,0:01:08.040
as the Berkeley Software Distributions
0:01:08.040,0:01:12.310
whose name was quickly shortened to simply BSD
0:01:12.310,0:01:16.330
BSD continued to be distributed with
yearly distributions for almost fifteen
0:01:16.330,0:01:17.490
years
0:01:17.490,0:01:21.920
initially under Bill Joy and later under others including
yours truly.
0:01:21.920,0:01:24.860
By the late 1980s interest had begun to grow
0:01:24.860,0:01:27.400
in freely redistributable software
0:01:27.400,0:01:30.170
so a number of us at Berkeley began separating
out
0:01:30.170,0:01:32.649
the AT&T proprietary bits of BSD
0:01:32.649,0:01:35.710
from those parts that were freely redistributable.
0:01:35.710,0:01:40.590
By the time of the final distribution of BSD
in 1992
0:01:40.590,0:01:43.620
the entire distribution was freely redistributable.
0:01:43.620,0:01:45.909
I've given a capsule history here
0:01:45.909,0:01:48.009
but if you're interested in the entire story
0:01:48.009,0:01:50.789
I have this three-and-a-half-hour epic
0:01:50.789,0:01:54.590
which is available from my website www.mckusick.com
0:01:54.590,0:01:58.200
that gives the entire history of Berkeley.
0:01:58.200,0:02:00.239
Following the final distribution from Berkeley
0:02:00.239,0:02:01.450
two groups sprang up
0:02:01.450,0:02:03.600
to continue supporting BSD
0:02:03.600,0:02:08.080
the first of these was NetBSD whose primary
goal was to support
0:02:08.080,0:02:10.459
as many different architectures as possible
0:02:10.459,0:02:14.769
everything from your microwave oven all the way
up to your Cray X-MP
0:02:14.769,0:02:19.409
In fact today NetBSD supports nearly
sixty architectures.
0:02:19.409,0:02:22.419
The other group that sprang up was FreeBSD.
0:02:22.419,0:02:28.239
Their goal was to bring up BSD and support
as wide a set of devices as possible on the
0:02:28.239,0:02:29.719
PC architecture.
0:02:29.719,0:02:36.549
They also had a goal of trying to make the
system as easy to install as possible to
0:02:36.549,0:02:39.309
attract a wide group of developers
0:02:39.309,0:02:42.319
I chose to work primarily with the FreeBSD
group
0:02:42.319,0:02:43.740
both doing software
0:02:43.740,0:02:46.140
and also together with George Neville-Neil
0:02:46.140,0:02:51.069
writing this book "The Design and Implementation
of the FreeBSD Operating System".
0:02:51.069,0:02:52.060
Together with this book
0:02:52.060,0:02:53.959
I developed a course
0:02:53.959,0:02:56.500
which runs for twelve chapters
0:02:56.500,0:02:58.179
and thirty hours.
0:02:58.179,0:02:59.749
The purpose of this video
0:02:59.749,0:03:01.089
is to give you a taste
0:03:01.089,0:03:02.819
of that course.
0:03:02.819,0:03:07.249
What follows are excerpts from the first lecture
of the course
0:03:07.249,0:03:11.139
which of course you can also get from my website
www.mckusick.com.
0:03:11.139,0:03:13.069
0:03:13.069,0:03:17.739
Enjoy.
0:03:17.739,0:03:22.239
This class is nominally about FreeBSD
because well
0:03:22.239,0:03:26.379
that's what I know best and that's what
the textbook is organized around
0:03:26.379,0:03:29.979
but the fact of the matter is that it's really
0:03:29.979,0:03:32.339
a class about UNIX and that
0:03:32.339,0:03:36.539
really covers sort of the broad range of things
in the open source arena such as FreeBSD
0:03:36.539,0:03:37.689
and Linux
0:03:37.689,0:03:38.899
which of course
0:03:38.899,0:03:41.159
you use a lot
0:03:41.159,0:03:41.550
and
0:03:41.550,0:03:44.349
it also covers the commercial systems
0:03:44.349,0:03:46.950
Solaris, HP-UX,
0:03:46.950,0:03:49.279
AIX and so on.
0:03:49.279,0:03:52.419
I am going to tend more towards the open
side
0:03:52.419,0:03:56.389
open source side of things. So it's really
going to be more FreeBSD and Linux than it's
0:03:56.389,0:03:57.579
going to be
0:03:57.579,0:04:00.849
Solaris and HP-UX and so on.
0:04:00.849,0:04:06.959
For the most part at the level of this course
we're dealing with the interfaces to the system
0:04:06.959,0:04:07.329
and
0:04:07.329,0:04:11.599
the fact of the matter is that those interfaces are highly
standardized at this point
0:04:11.599,0:04:12.060
and
0:04:12.060,0:04:15.280
whether it's FreeBSD or Linux or Solaris
or whatever
0:04:15.280,0:04:19.460
the socket system call has to do the same
thing, it has to have the same arguments
0:04:19.460,0:04:20.150
in that,
0:04:20.150,0:04:23.909
it has to have the same effect
0:04:23.909,0:04:27.319
and so until you get down to the really nitty
details
0:04:27.319,0:04:29.600
of how they actually go about implementing
that
0:04:29.600,0:04:31.960
the differences are relatively minor.
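As a minimal sketch of that point: Python's standard `socket` module passes the same three arguments (address family, socket type, protocol) to the underlying socket system call, so this should behave alike on FreeBSD, Linux, or Solaris. The helper name `open_tcp_socket` is an assumption made for illustration and is not from the lecture.

```python
import socket


def open_tcp_socket():
    """Create a TCP/IPv4 socket and report its descriptor and address family."""
    # Same three arguments as the C call: socket(domain, type, protocol).
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
    try:
        # On success the kernel hands back a non-negative descriptor.
        return sock.fileno(), sock.family
    finally:
        sock.close()


if __name__ == "__main__":
    fd, family = open_tcp_socket()
    print(fd, family)
```

Only the interface is shown here; how each kernel implements the call underneath is where the systems differ.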
0:04:31.960,0:04:35.830
So I would say that sixty to seventy percent
of the material that I'm covering
0:04:35.830,0:04:40.779
is just as true for FreeBSD as it would
be for Linux
0:04:40.779,0:04:42.580
or for Solaris
0:04:42.580,0:04:44.659
AIX is a little bit
0:04:44.659,0:04:45.629
sort of off in the weeds
0:04:45.629,0:04:48.709
as is HP-UX
0:04:48.709,0:04:51.099
but luckily we don't have to worry too much about
that.
0:04:51.099,0:04:54.569
Okay so
0:04:54.569,0:04:59.279
the other thing is that I'm going to assume that
all of you have used the system. I get
0:04:59.279,0:05:00.910
really sort of worried when people
0:05:00.910,0:05:04.249
you know raise their hands and say "Hey, what's a Shell?"
0:05:04.249,0:05:07.990
or I don't
put a lot of code up but I put up one piece of code and someone said "Why
0:05:07.990,0:05:11.819
are there two pipe symbols in the middle of
that if statement?".
0:05:11.819,0:05:15.740
No we're not programming the Shell we're programming
in C.
0:05:15.740,0:05:19.970
So hopefully you can tell the difference between
Shell scripts and C code.
0:05:19.970,0:05:21.990
so okay but I am gonna assume
0:05:21.990,0:05:24.610
you haven't really looked inside the system.
0:05:24.610,0:05:28.289
So I'm gonna start everything at a very
high level.
0:05:28.289,0:05:32.969
The problem is I have already discovered you come
from a lot of different sort of
0:05:32.969,0:05:33.819
backgrounds
0:05:33.819,0:05:35.180
and
0:05:35.180,0:05:36.280
levels of knowledge
0:05:36.280,0:05:37.900
and so
0:05:37.900,0:05:42.620
the way that I find works best to sort of
be useful to everybody is a three-pass
0:05:42.620,0:05:43.860
algorithm
0:05:43.860,0:05:49.060
so what I will do is start with a first pass, a very
broad brush high level
0:05:49.060,0:05:50.569
description of what's going on
0:05:50.569,0:05:54.719
and then I will go back and I'll go through the
same material again but at a lower level of
0:05:54.719,0:05:55.300
detail
0:05:55.300,0:05:59.939
then I finally go back and go through at a very nitty
low-level of detail
0:05:59.939,0:06:04.649
and the effect of this is if you are learning new stuff
as I'm doing the high-level thing
0:06:04.649,0:06:08.649
you are gonna be utterly washed by the time I get to
low level niggly details
0:06:08.649,0:06:10.699
but since I'm going to do it topic by topic
0:06:10.699,0:06:14.190
when I get to the end of one of those nitty
low level niggly details
0:06:14.190,0:06:17.900
I'll give you a clue as I will say "Brain
reset, I'm starting a new topic" so even if
0:06:17.900,0:06:19.330
you're completely lost
0:06:19.330,0:06:23.530
you can now start listening again plus I'm gonna get
the broad brush up again.
0:06:23.530,0:06:27.059
okay and for those of you that know a lot of
this stuff already
0:06:27.059,0:06:31.770
you'll probably find the broad brush rather boring
0:06:31.770,0:06:35.759
but by the time we get down to niggly low level
details I think you'll actually
0:06:35.759,0:06:37.860
pick up some things that you will find
0:06:37.860,0:06:39.710
useful and interesting.
0:06:39.710,0:06:43.759
So in this way hopefully everybody will
get some
0:06:43.759,0:06:47.699
useful percentage of material out of the course.
0:06:47.699,0:06:49.599
I am gonna start out by just
0:06:49.599,0:06:53.089
walking through and giving you the
0:06:53.089,0:06:56.919
outline of what we're going to try and do here
0:06:56.919,0:07:01.169
As I said we're going to go roughly
0:07:01.169,0:07:03.270
just about two-and-a-half hours of lecture
0:07:03.270,0:07:04.729
about two hours forty minutes
0:07:04.729,0:07:06.499
per week
0:07:06.499,0:07:07.619
and
0:07:07.619,0:07:11.770
so we will start off this week with an introduction.
0:07:11.770,0:07:13.860
This is as I said we're going to start from the
top
0:07:13.860,0:07:15.749
and then just start working our way down
0:07:15.749,0:07:19.350
so the general thing I'm going to do is
to talk about the interface
0:07:19.350,0:07:21.439
which is something that you
0:07:21.439,0:07:25.319
are presumably fairly familiar with since
you've worked with the system
0:07:25.319,0:07:27.249
and then
0:07:27.249,0:07:29.739
I have to sort of lay out terminology
0:07:29.739,0:07:32.080
although we use normal English words
0:07:32.080,0:07:34.419
they have
0:07:34.419,0:07:38.580
sometimes rather bizarre meanings compared to their
common usage
0:07:38.580,0:07:39.220
and
0:07:39.220,0:07:42.330
so I will just sort of lay out the terminology
lay out the
0:07:42.330,0:07:45.750
the way we talk about how the system is structured
0:07:45.750,0:07:50.780
and this week we will also talk about the
basic services "What is it that the kernel is
0:07:50.780,0:07:52.929
providing for us?"
0:07:52.929,0:07:54.060
and then of course
0:07:54.060,0:07:58.499
we'll proceed to dive down in and see how
that is done
0:07:58.499,0:07:59.970
so here in
0:07:59.970,0:08:01.400
Week number 2
0:08:01.400,0:08:05.450
we're gonna look at the system from the
perspective of
0:08:05.450,0:08:07.039
something that
0:08:07.039,0:08:08.720
manages processes.
0:08:08.720,0:08:12.170
One way of looking at the kernel is it's really
just a
0:08:12.170,0:08:16.440
resource manager and the resources that
it's managing are things to do with processes
0:08:16.440,0:08:19.460
So we'll look at a process, what the structure of
it is
0:08:19.460,0:08:20.649
and
0:08:20.649,0:08:23.559
talk about the different ways that they can
be structured.
0:08:23.559,0:08:28.379
A process can for example be an address space
and can have one thread running in it or can have
0:08:28.379,0:08:29.749
multiple threads running in it.
0:08:29.749,0:08:34.620
so we'll talk about the different ways
that we think about what a process is.
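A small sketch of the "one address space, several threads" picture, using Python's standard `threading` module as a stand-in for kernel threads. The function name `run_threads` is hypothetical; the point is only that every thread sees the same `shared` list because they all live in one address space.

```python
import threading


def run_threads(count):
    """Start `count` threads that all append to one list in the shared address space."""
    shared = []                 # one object, visible to every thread in the process
    lock = threading.Lock()     # serializes the updates to the shared list

    def worker(n):
        with lock:
            shared.append(n)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(shared)


if __name__ == "__main__":
    print(run_threads(4))
```

With separate processes instead of threads, each one would get its own copy of the list, which is the distinction the lecture is drawing.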
0:08:34.620,0:08:38.480
We will look at the management of those processes
0:08:38.480,0:08:39.239
we've got
0:08:39.239,0:08:42.020
to lay out the bits and pieces that
need to be managed
0:08:42.020,0:08:44.660
and then talk about
0:08:44.660,0:08:47.190
how we do that.
0:08:47.190,0:08:51.740
we'll talk about jails. This is something
that you currently find only in FreeBSD
0:08:51.740,0:08:55.060
hasn't made it into
0:08:55.060,0:08:56.320
Linux yet although
0:08:56.320,0:09:01.630
the concept is being actively worked
on so my guess is that you'll see that
0:09:01.630,0:09:03.500
fairly soon.
0:09:03.500,0:09:06.360
we'll also then talk about scheduling
0:09:06.360,0:09:10.579
which is in essence how we decide what gets
to run, when it gets to run, how long it gets
0:09:10.579,0:09:13.500
to run, etc.
0:09:13.500,0:09:14.330
okay
0:09:14.330,0:09:19.020
The week after that we will go into virtual
memory.
0:09:19.020,0:09:23.800
Signals aren't really part of virtual memory
but they didn't fit into next week's
0:09:23.800,0:09:26.400
material so I just dropped that in at the
beginning
0:09:26.400,0:09:29.850
but the bulk of Week 3 is going to
be
0:09:29.850,0:09:32.019
the management of Virtual Memory. So we've got
0:09:32.019,0:09:35.119
a bunch of physical memory, a bunch of
processes that are
0:09:35.119,0:09:37.940
trying to use their address spaces
0:09:37.940,0:09:39.590
and we will talk about
0:09:39.590,0:09:41.410
essentially how you will make that all work
0:09:41.410,0:09:43.510
It's called virtual memory because it's
0:09:43.510,0:09:47.420
sort of a cheat. We promise you the world and
then we deliver you
0:09:47.420,0:09:51.480
as small a number of pages as we think we
can get away with.
0:09:51.480,0:09:56.420
Okay. So the first three weeks then essentially
get us through
0:09:56.420,0:09:58.340
looking at the world as if it was all
0:09:58.340,0:10:00.560
about processes.
0:10:00.560,0:10:03.880
Then in Week 4 we change gears. We say
okay well you know
0:10:03.880,0:10:07.570
the kernel isn't just all about processes. You can sort of
look at it orthogonally and you can
0:10:07.570,0:10:10.000
say it's really just a giant I/O switch
0:10:10.000,0:10:12.910
it's just like a traffic cop that's just managing
these
0:10:12.910,0:10:14.860
I/O streams
0:10:14.860,0:10:15.450
and
0:10:15.450,0:10:18.610
so let's look at it from that perspective.
0:10:18.610,0:10:19.310
And
0:10:19.310,0:10:24.740
we'll start with special files, again this is
sort of the interface when you talk about UNIX
0:10:24.740,0:10:25.880
systems, when you talk about
0:10:25.880,0:10:27.950
what's normally the /dev
0:10:27.950,0:10:34.170
interface that gets you access
to the various I/O streams that are available
0:10:34.170,0:10:37.220
and we'll look at how that's organized and
the structure of it
0:10:37.220,0:10:41.840
which used to be fairly simple but in the
last decade has gotten
0:10:41.840,0:10:43.670
incredibly complicated.
0:10:43.670,0:10:48.540
We will also talk about pseudo-terminals and
job control
0:10:48.540,0:10:53.330
this is about as interesting as watching the
grass grow but unfortunately it's
0:10:53.330,0:10:55.490
a major component of the system
0:10:55.490,0:10:59.520
and especially people that deal with system
administration have to know far more about
0:10:59.520,0:11:06.520
this than they probably ever thought they
wanted to.
0:11:06.900,0:11:11.430
Okay we will then continue in Week 5 with
the kernel I/O structure,
0:11:11.430,0:11:16.090
We will start with multiplexing of I/O. The
kernel of course has done this
0:11:16.090,0:11:17.360
always
0:11:17.360,0:11:22.110
but we're really talking more about how do
we export I/O multiplexing to
0:11:22.110,0:11:25.970
user applications.
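One way an application sees exported I/O multiplexing is the `select` call, sketched here with Python's standard `select` and `socket` modules. Choosing `select` over other interfaces is an assumption for illustration, and the helper name `poll_pair` is hypothetical.

```python
import select
import socket


def poll_pair():
    """Return (number readable before a write, whether only the receiver is readable after)."""
    left, right = socket.socketpair()
    try:
        # Nothing has been written yet, so a zero-timeout select reports nothing readable.
        before, _, _ = select.select([left, right], [], [], 0)
        right.sendall(b"x")
        # One call watches both descriptors; only the end with pending data comes back.
        after, _, _ = select.select([left, right], [], [], 1.0)
        return len(before), after == [left]
    finally:
        left.close()
        right.close()


if __name__ == "__main__":
    print(poll_pair())
```

The application hands the kernel a set of descriptors and asks which are ready, instead of blocking on any single one.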
0:11:25.970,0:11:29.250
We will then move into auto configuration strategy
0:11:29.250,0:11:31.370
Auto configuration
0:11:31.370,0:11:32.770
is what happens
0:11:32.770,0:11:36.619
typically or historically I guess you
could say as the system boots.
0:11:36.619,0:11:39.500
so all that stuff that comes out about
0:11:39.500,0:11:40.810
what
0:11:40.810,0:11:43.550
hardware is on the machine and how it's all
interconnected
0:11:43.550,0:11:47.350
all of that is tied up in auto configuration
0:11:47.350,0:11:50.040
and that used to happen just once at boot
0:11:50.040,0:11:52.000
but in modern systems today
0:11:52.000,0:11:55.839
it's an ongoing process. It happens at boot
but it also happens
0:11:55.839,0:12:00.550
anytime you plug in a new I/O device, a
PCMCIA card,
0:12:00.550,0:12:03.680
or you remove a disk or you put in a new disk.
0:12:03.680,0:12:07.010
or any sort of activity that changes the I/O
0:12:07.010,0:12:08.360
structure of the machine
0:12:08.360,0:12:10.870
auto configuration has to get fired back up
0:12:10.870,0:12:13.050
and figure out what's disappeared
0:12:13.050,0:12:18.330
and clean up, and figure out what new has arrived
and configure it in.
0:12:18.330,0:12:19.320
and then we'll talk
0:12:19.320,0:12:23.870
a little bit about the configuration of the
device driver
0:12:23.870,0:12:27.390
this actually gets into an area that
0:12:27.390,0:12:28.660
is
0:12:28.660,0:12:33.440
one well let me just give it as a bit
of advice to the class especially those of
0:12:33.440,0:12:36.780
you who work in system administration.
0:12:36.780,0:12:42.010
You really want to be careful that
you don't learn too much about device drivers
0:12:42.010,0:12:44.670
because there are really these three things that
0:12:44.670,0:12:48.580
it's not good to learn about and if you do
learn about it it's really good to keep it
0:12:48.580,0:12:49.740
to yourself
0:12:49.740,0:12:51.949
because if you become an expert or
0:12:51.949,0:12:54.960
viewed as an expert in any of these areas
0:12:54.960,0:12:59.370
you will become the designated stuckee for
that at your site and you'll never get to do
0:12:59.370,0:13:01.760
anything
0:13:01.760,0:13:02.610
but that
0:13:02.610,0:13:07.360
so the three things that I highly
recommend not learning very much about are
0:13:07.360,0:13:09.060
device drivers,
0:13:09.060,0:13:12.320
Sendmail configuration files
0:13:12.320,0:13:13.970
or anything having to do
0:13:13.970,0:13:19.350
with LDAP or anything in
that general domain
0:13:19.350,0:13:22.660
because as I say
0:13:22.660,0:13:24.900
that will become your life's work
0:13:24.900,0:13:25.920
and
0:13:25.920,0:13:32.920
there's other things that you might find more interesting.
"Do you have a question?"
0:13:33.870,0:13:36.659
so one of my students empathizes with my point
0:13:36.659,0:13:39.640
I believe you said you worked on that mail
system
0:13:39.640,0:13:43.120
so you might know something about
Sendmail configuration files but you don't
0:13:43.120,0:13:47.850
have to answer that
0:13:47.850,0:13:52.100
okay so we're going to talk about what a device
driver does and really just sort of the entry
0:13:52.100,0:13:53.170
points to it
0:13:53.170,0:13:57.180
but we're not going to talk about how you
write such a thing, how you debug such a thing
0:13:57.180,0:14:01.490
or much of anything about it. I actually used
to teach an entire class believe it or not
0:14:01.490,0:14:02.720
about device drivers
0:14:02.720,0:14:05.849
but then I realized the error of my ways and I have
since
0:14:05.849,0:14:12.580
gone through and made a point of forgetting
every slide in that talk.
0:14:12.580,0:14:16.860
okay so then we will move on to the Filesystem
0:14:16.860,0:14:21.540
and as always we'll start at the high level
talk about the interface what is it that is
0:14:21.540,0:14:23.020
exported out of the system
0:14:23.020,0:14:27.840
and then we will start diving down in to see
how do we go about implementing that
0:14:27.840,0:14:29.010
so
0:14:29.010,0:14:31.010
we'll start with the
0:14:31.010,0:14:32.560
so called
0:14:32.560,0:14:33.680
Block I/O system
0:14:33.680,0:14:36.140
it's historically been called the buffer
cache
0:14:36.140,0:14:38.590
and you still hear it called that periodically
0:14:38.590,0:14:42.720
and the fact of the matter is that there isn't really
a buffer cache anymore, there is just one big
0:14:42.720,0:14:44.620
cache. It's the VM cache
0:14:44.620,0:14:47.810
and the Filesystem has a view into it
and
0:14:47.810,0:14:50.829
the processes have a view into it but at
the end of the day
0:14:50.829,0:14:54.660
you really don't want the same information
on two different
0:14:54.660,0:14:56.030
pages of memory
0:14:56.030,0:14:59.390
because that just leads to trouble.
0:14:59.390,0:15:03.390
But Filesystems think they have buffers and so
there's this maneuver where we make
0:15:03.390,0:15:06.149
these things that look like what historically
were buffers
0:15:06.149,0:15:08.830
that really just map into the VM system
0:15:08.830,0:15:11.720
but they're still managed in the way that
they have been
0:15:11.720,0:15:15.020
managed historically
0:15:15.020,0:15:20.670
okay We will then get down into Filesystem implementation
the local file system if you will
0:15:20.670,0:15:23.400
and into also
0:15:23.400,0:15:25.730
soft updates and snapshots.
0:15:25.730,0:15:26.440
this
0:15:26.440,0:15:31.100
for the time being is something that you see
only in FreeBSD
0:15:31.100,0:15:35.310
the alternative to soft updates is journalling
which is more commonly used
0:15:35.310,0:15:39.630
for example it is what is used by ext3
0:15:39.630,0:15:41.179
and so I'll go through soft updates and
0:15:41.179,0:15:45.260
a lot of the issues in soft updates are the
same issues that you have to deal with in journalling
0:15:45.260,0:15:48.370
what is it that we're protecting and how do we
go about doing that
0:15:48.370,0:15:51.150
and the difference is in the detail.
0:15:51.150,0:15:54.630
There is actually a paper in the back of your
notes if this is something that interests
0:15:54.630,0:15:55.240
you
0:15:55.240,0:15:59.930
it's a comparison of journalling versus
soft updates that was done
0:15:59.930,0:16:02.120
about five or eight years ago.
0:16:02.120,0:16:08.460
and not to spoil the punch line but the answer is
they both work about the same
0:16:08.460,0:16:12.500
Okay snapshots again is something that
if
0:16:12.500,0:16:15.920
you've worked with things like the Network
Appliance box you're probably quite
0:16:15.920,0:16:19.640
aware of what snapshots are and how they do
or don't work for you
0:16:19.640,0:16:21.959
this is the same functionality
0:16:21.959,0:16:27.380
in the Filesystem implemented in a
somewhat different way
0:16:27.380,0:16:28.449
okay so this
0:16:28.449,0:16:31.940
Week 6 is really going to be the local
file system
0:16:31.940,0:16:34.750
the disk connected to the machine
that we are dealing with.
0:16:34.750,0:16:39.140
Week 7 then we get into multiple
Filesystem support so how do we abstract out that
0:16:39.140,0:16:41.190
Filesystem layer
0:16:41.190,0:16:46.430
and support Multiple Filesystems at the
same time so for example in FreeBSD
0:16:46.430,0:16:50.199
you can of course run with the traditional
Fast Filesystem
0:16:50.199,0:16:54.540
but if you happen to like the Linux Filesystem
better or you have to share a disk
0:16:54.540,0:16:55.690
with a Linux machine
0:16:55.690,0:16:58.310
you can run the ext2 or ext3
0:16:58.310,0:17:01.020
and it will perfectly happily do that
0:17:01.020,0:17:01.620
so
0:17:01.620,0:17:05.589
we will have to look then at how do we provide
an interface so that we can plug in all these different
0:17:05.589,0:17:09.260
Filesystems that we want to support
0:17:09.260,0:17:12.250
another area in which there's been a great
0:17:12.250,0:17:15.309
deal of growth at least in code complexity
0:17:15.309,0:17:17.840
is so-called Volume Management
0:17:17.840,0:17:19.370
so in the
0:17:19.370,0:17:24.480
good old days a Filesystem lived on a disk or
piece of disk and that was that
0:17:24.480,0:17:26.130
but in this day and age
0:17:26.130,0:17:31.150
that won't do any more so we aggregate disks
together by striping them or RAID
0:17:31.150,0:17:31.980
arraying them
0:17:31.980,0:17:33.380
or various other things
0:17:33.380,0:17:39.210
and we need a whole layer in the system just to
manage those disks
0:17:39.210,0:17:44.280
we'll then, as an example of an alternative
Filesystem we're going to talk about the
0:17:44.280,0:17:46.530
Network Filesystem or NFS
0:17:46.530,0:17:48.500
but that's not because this is
0:17:48.500,0:17:51.090
the world's best remote file system
0:17:51.090,0:17:55.240
or the cleanest design or any of the
properties you might hope that
0:17:55.240,0:17:57.049
such a class as this one would have
0:17:57.049,0:17:58.600
but it's ubiquitous
0:17:58.600,0:18:00.210
very widely used
0:18:00.210,0:18:01.350
and
0:18:01.350,0:18:06.850
so we're going to talk about that one
0:18:06.850,0:18:07.740
okay we'll
0:18:07.740,0:18:10.970
then once again switch gears in Week 8
0:18:10.970,0:18:17.120
and turn our attention to Networking and
Interprocess communication
0:18:17.120,0:18:18.200
and
0:18:18.200,0:18:23.210
again we'll start from the very top so we'll
go through the concepts, the terminology
0:18:23.210,0:18:24.450
that gets used
0:18:24.450,0:18:30.230
and what's the difference between domain
based addressing and an address domain you know
0:18:30.230,0:18:30.910
we'll go through
0:18:30.910,0:18:34.910
what the basic IPC services are,
0:18:34.910,0:18:39.080
essentially what are all the system calls that
have anything to do with networking
0:18:39.080,0:18:40.590
and
0:18:40.590,0:18:43.720
just sort of describe what each of them are
and I'm going to go through
0:18:43.720,0:18:45.830
a somewhat contrived example
0:18:45.830,0:18:49.840
that makes use of every one of those interfaces
0:18:49.840,0:18:52.860
and just to sort of show how they all connect
together
0:18:52.860,0:18:54.169
and for those of you that work
0:18:54.169,0:18:57.400
in networking or have done any kind of network
programming
0:18:57.400,0:19:00.480
if you're looking for a week to miss then
Week 8 is the one to miss because that's
0:19:00.480,0:19:02.780
the sort of most basic
0:19:02.780,0:19:04.210
lecture that I'm going to give
0:19:04.210,0:19:07.910
If you are not sure whether or not you need to
go through that, there is
0:19:07.910,0:19:09.540
one of the papers in the back
0:19:09.540,0:19:12.620
it is an introduction to Interprocess communication
0:19:12.620,0:19:18.279
read that paper and if you say "yeah yeah yeah
yeah yeah" you are done with Week 8.
0:19:18.279,0:19:20.590
on the other hand if you don't come to Week
8
0:19:20.590,0:19:22.790
and then in Week 9 I say
0:19:22.790,0:19:26.860
I call on you and say alright what is it
0:19:26.860,0:19:30.560
that the listen system call does and you
can't tell me
0:19:30.560,0:19:32.610
you're gonna get a demerit
0:19:32.610,0:19:34.340
okay
0:19:34.340,0:19:37.770
then in Week 9 we will get into the actual
0:19:37.770,0:19:41.419
networking implementation itself, we go
through system layers as we did
0:19:41.419,0:19:43.310
in all the other areas
0:19:43.310,0:19:44.130
and
0:19:44.130,0:19:48.330
we will spend a significant portion of that
class talking about routing
0:19:48.330,0:19:50.230
routing
0:19:50.230,0:19:53.610
for those of you that haven't had the pleasure
of dealing with it
0:19:53.610,0:19:55.540
is a black art
0:19:55.540,0:19:58.050
or at least a dark science
0:19:58.050,0:19:59.170
and
0:19:59.170,0:19:59.930
so
0:19:59.930,0:20:02.490
we'll talk about it
0:20:02.490,0:20:06.270
from the perspective first of all of what
do we do locally within the machine
0:20:06.270,0:20:10.090
and then what are some of the bigger strategies
that we can use for doing routing
0:20:10.090,0:20:11.910
enterprise
0:20:11.910,0:20:14.840
wide routing or
0:20:14.840,0:20:20.190
area wide routing something like throughout the
state of California or throughout the US whatever
0:20:20.190,0:20:25.379
this again like device drivers is really
just sort of a nickel
0:20:25.379,0:20:26.480
tour through the
0:20:27.800,0:20:31.820
what the choices are, what the basic
strategies are that are used
0:20:31.820,0:20:33.989
If you're thinking you're going to walk out
of here
0:20:33.989,0:20:36.110
knowing how to set up routing, well, sorry
0:20:36.110,0:20:38.430
we are not going to get that far
0:20:38.430,0:20:41.559
but you should at least have a pretty good idea
of what the issues are
0:20:41.559,0:20:44.430
and what the general solutions are
0:20:44.430,0:20:48.950
okay then finally in Week 10 well not finally
but next few weeks and
0:20:48.950,0:20:52.380
we will go through the Internet Protocols
0:20:52.380,0:20:54.320
primarily TCP/IP
0:20:54.320,0:20:56.560
and this is
0:20:56.560,0:20:58.809
what are the algorithms that are used
0:20:58.809,0:21:01.030
and I'm putting a particular emphasis
0:21:01.030,0:21:03.050
for this particular class
0:21:03.050,0:21:05.080
on
0:21:05.080,0:21:07.730
changes that have been made in the protocols
0:21:07.730,0:21:14.310
to deal with a lot of the sort of attacks that
we've been seeing the SYN attacks and
0:21:14.310,0:21:16.880
that sort of thing
0:21:16.880,0:21:19.440
rather than just a straight
0:21:19.440,0:21:22.440
iteration of what the actual protocols
are
0:21:22.440,0:21:24.940
I'll talk primarily about IPv4
0:21:24.940,0:21:31.940
but I will also try and talk a bit about
IPv6 as well
0:21:33.510,0:21:35.850
all right so the first ten weeks are
0:21:35.850,0:21:38.100
sort of the kernel course
0:21:38.100,0:21:40.800
now we tack two weeks on at the end
0:21:40.800,0:21:42.010
to talk about
0:21:42.010,0:21:43.990
sort of the bigger picture of
0:21:43.990,0:21:48.240
System Tuning, Crash dump analysis, that level of
thing
0:21:48.240,0:21:52.940
The idea is to really consolidate what
we figured out or talked about in the first
0:21:52.940,0:21:54.710
ten weeks and
0:21:54.710,0:21:58.760
how that applies to tools that we have available
to us to
0:21:58.760,0:22:00.760
look at what the system is doing,
0:22:00.760,0:22:02.649
analyze what the system is doing
0:22:02.649,0:22:03.650
and hopefully
0:22:03.650,0:22:04.720
improve
0:22:04.720,0:22:07.130
the performance of what the system is doing
0:22:07.130,0:22:07.750
and
0:22:07.750,0:22:12.169
for the most part the kind of tuning that I'm
talking about is not
0:22:12.169,0:22:14.740
going in and hack hack hacking your kernel
0:22:14.740,0:22:16.510
because the fact of the matter is
0:22:16.510,0:22:18.600
most of the time you can't do that anyway
0:22:18.600,0:22:22.340
so it's more looking at it from the perspective
of saying
0:22:22.340,0:22:26.390
is this system running badly because it doesn't
have enough memory on it?
0:22:26.390,0:22:29.470
or is it running badly because there isn't enough
I/O capacity?
0:22:29.470,0:22:33.549
or is it running badly because it's got
enough I/O capacity but
0:22:33.549,0:22:35.940
certain drives are being overloaded
0:22:35.940,0:22:37.309
or is it
0:22:37.309,0:22:42.220
being overrun because we're simply trying
to do too much on this machine? etc.
0:22:42.220,0:22:45.440
so that's the sort of level of thing that we're
looking at
0:22:45.440,0:22:47.080
but tied into
0:22:47.080,0:22:52.130
a lot of concepts that we talked about before so we can talk
about active virtual memory
0:22:52.130,0:22:53.710
and what that means
0:22:53.710,0:22:55.120
and
0:22:55.120,0:22:58.750
essentially measure what it is and hopefully
then you will understand in the context of what
0:22:58.750,0:23:00.690
we talked about in the VM section
0:23:00.690,0:23:03.990
what that really means
0:23:03.990,0:23:07.460
the Crash dump analysis is one of these
topics that
0:23:07.460,0:23:08.730
you are gonna love or hate
0:23:08.730,0:23:12.530
if you actually have to deal with crash
dumps
0:23:12.530,0:23:13.679
people find it invaluable
0:23:13.679,0:23:15.580
and if you don't have to deal with Crash dumps
0:23:15.580,0:23:18.790
it's an incredible mass of boring detail
0:23:18.790,0:23:23.240
the only good part of it is that the
whole session is only about an hour long
0:23:23.240,0:23:25.529
If it interests you, listen closely
0:23:25.529,0:23:28.950
and if it bores you, well, it's only an hour long
0:23:28.950,0:23:32.880
okay lastly we'll talk a little bit about
security issues
0:23:32.880,0:23:36.250
again this is really more to the tools that
are available
0:23:36.250,0:23:40.750
to deal with security stuff as opposed to a
complete tutorial on
0:23:40.750,0:23:45.120
how to implement security so those of you
that deal with security
0:23:45.120,0:23:48.400
this is just going to be sort of security one oh
one
0:23:48.400,0:23:50.029
for those of you
0:23:50.029,0:23:51.500
that have but
0:23:51.500,0:23:54.399
you'll have to deal with it but haven't really
thought about it
0:23:54.399,0:23:58.549
it'll probably scare you to death and
you wonder how to keep the machines from
0:23:58.549,0:24:02.840
being hijacked everyday
0:24:02.840,0:24:08.030
Okay so that's in essence what we're going
to try and do here
0:24:08.030,0:24:15.030
anybody have any comments, questions, thoughts.
No? All right well.
0:24:16.130,0:24:17.840
Let's get started
0:24:17.840,0:24:22.180
we will begin on page fifteen with an
overview of the kernel.
0:24:22.180,0:24:26.040
Hopefully nobody's lost yet.
0:24:26.040,0:24:29.310
What's a kernel? All right.
0:24:29.310,0:24:31.370
so starting at the very top
0:24:31.370,0:24:33.070
the big broad brush
0:24:33.070,0:24:35.140
what we have is
0:24:35.140,0:24:38.330
a UNIX virtual machine and
0:24:38.330,0:24:41.660
virtual machines are actually something
that has been around
0:24:41.660,0:24:44.539
as a concept since the sixties
0:24:44.539,0:24:48.919
difference is really just sort of the level
of the interface that people have dealt with
0:24:48.919,0:24:51.360
when they talk about Virtual Machines
0:24:51.360,0:24:53.610
in the 1960s
0:24:53.610,0:24:56.770
computers were these enormous things you would
have
0:24:56.770,0:24:58.870
your computer room would be something that'd be
0:24:58.870,0:25:01.909
three times the size of this conference
room if you had
0:25:01.909,0:25:03.230
a computer
0:25:03.230,0:25:05.530
the computer itself was
0:25:05.530,0:25:07.840
as tall as a refrigerator freezer
0:25:07.840,0:25:08.950
imagine
0:25:08.950,0:25:13.909
five or eight or ten of these units
side by side that itself made up the computer
0:25:13.909,0:25:16.080
that would be one big
0:25:16.080,0:25:20.030
for the core processor and one which
would be the floating point unit and several
0:25:20.030,0:25:24.080
of them that would be the memory the core memory
literally the core memory
0:25:24.080,0:25:29.110
and then there'd be other rows of these
disk drives which were about the size of the washing
0:25:29.110,0:25:29.660
machine
0:25:29.660,0:25:34.169
and then behind that since you couldn't store
everything on disks so
0:25:34.169,0:25:36.300
then you had rows of tape drives
0:25:36.300,0:25:37.880
and then you had this little
0:25:37.880,0:25:39.610
set of sort of
0:25:39.610,0:25:43.330
munchkins that would run around and and tend
to the machine and they'd mount tapes and take
0:25:43.330,0:25:46.710
off tapes and mount disk packs and remove disk packs
because
0:25:46.710,0:25:49.760
the drives themselves were very expensive and
so
0:25:49.760,0:25:53.110
you wouldn't just as today we have a
0:25:53.110,0:25:56.090
one spindle that was dedicated just to one set
of platters
0:25:56.090,0:25:57.130
you could take out a
0:25:57.130,0:25:59.460
set of platters and put in another
0:25:59.460,0:26:02.540
hundred megabytes set of platters and these are
platters that are
0:26:02.540,0:26:05.280
this big around and it's like six or eight
of them and
0:26:05.280,0:26:09.140
giant head assemblies that come rumbling in and
out
0:26:09.140,0:26:12.440
anyway one of these giant giant machines
0:26:12.440,0:26:17.380
that cost many millions of dollars would run
at about ten
0:26:17.380,0:26:21.120
million instructions per second, 10 mips
0:26:21.120,0:26:21.630
and 10 mips
0:26:21.630,0:26:28.330
was more computing power than anybody
could possibly imagine using in a single application
0:26:28.330,0:26:28.880
just
0:26:28.880,0:26:31.050
by contrast you know this
0:26:31.050,0:26:34.070
four-year-old laptop here is probably on
the order of
0:26:34.070,0:26:36.440
one or two hundred mips
0:26:36.440,0:26:37.140
but anyway
0:26:37.140,0:26:40.760
people couldn't really view what we would
do with a lot of computing power
0:26:40.760,0:26:44.640
and the other thing was that you didn't have
a notion of sort of an operating system that had
0:26:44.640,0:26:45.890
applications running on it
0:26:45.890,0:26:46.760
because
0:26:46.760,0:26:50.160
everybody wanted to write straight to
the raw hardware
0:26:50.160,0:26:51.750
and so
0:26:51.750,0:26:55.900
what IBM who was a big manufacturer
of machines in those days
0:26:55.900,0:26:59.060
did was they came up with this thing called
the VM
0:26:59.060,0:27:00.770
and this was a little
0:27:00.770,0:27:02.549
you'd call an operating system really
0:27:02.549,0:27:05.130
but what it did is it cloned
0:27:05.130,0:27:09.270
independent copies of the machine that worked just
like the original machines so you could boot
0:27:09.270,0:27:11.769
something that you thought was an operating
system
0:27:11.769,0:27:13.380
on top of VM
0:27:13.380,0:27:16.750
so you take one of these ten mip machines and
it would clone
0:27:16.750,0:27:20.050
six identical one mip copies
0:27:20.050,0:27:22.030
and then you could boot
0:27:22.030,0:27:24.700
whatever you wanted on each one of those machines
so
0:27:24.700,0:27:29.510
if you were doing database stuff you would boot your
database because the database kind of ran on the raw hardware
0:27:29.510,0:27:32.920
or if you're doing payroll you would boot up the payroll
program
0:27:32.920,0:27:37.950
or if you actually tried to service
users you could boot a time sharing batch thing
0:27:37.950,0:27:40.790
that would read card images and print
stuff out
0:27:40.790,0:27:44.460
or they even had TSO the Time Sharing
Option where you could interactively sit
0:27:44.460,0:27:45.559
and type and send
0:27:45.559,0:27:47.560
stuff in and get answers back
0:27:47.560,0:27:48.570
and
0:27:48.570,0:27:51.429
also you could boot TSO so whatever set
of
0:27:51.429,0:27:52.219
0:27:52.219,0:27:55.339
things you need you could boot them and they ran
independently as if they were running on their
0:27:55.339,0:27:56.470
own machine
0:27:56.470,0:28:03.150
but all the VM did was it gave you an exact
raw copy of the hardware
0:28:03.150,0:28:04.529
so when UNIX came along
0:28:04.529,0:28:07.350
they sort of liked the notion of
0:28:07.350,0:28:11.509
providing the concept of independent
things that you could operate in
0:28:11.509,0:28:13.610
but they wanted it at a higher level
0:28:13.610,0:28:15.610
so you're looking really to do it
0:28:15.610,0:28:17.480
instead of at the raw hardware level
0:28:17.480,0:28:19.679
to do it at a process level
0:28:19.679,0:28:23.799
and the idea then was that the interface you
would program to would be what we think of as
0:28:23.799,0:28:26.090
a System call interface today
0:28:26.090,0:28:27.849
and the idea then was that
0:28:27.849,0:28:30.740
you would be given a process or set of processes
0:28:30.740,0:28:34.990
and those were independent. your process
couldn't affect
0:28:34.990,0:28:38.830
the address space of another process. You couldn't reach
over and mess around with their addresses,
0:28:38.830,0:28:41.030
you couldn't mess around with their I/O
channels
0:28:41.030,0:28:43.179
you could slow them down by
0:28:43.179,0:28:44.299
being a pig but
0:28:44.299,0:28:47.980
that was about the only way that you could affect
other processes
0:28:47.980,0:28:48.480
and
0:28:48.480,0:28:49.830
so
0:28:49.830,0:28:52.669
what the interfaces that they had there
0:28:52.669,0:28:58.660
was one that had these characteristics
had a a paged virtual address space
0:28:58.660,0:29:02.980
so you didn't have to know as in the old days how much physical
memory is on the machine and make your application
0:29:02.980,0:29:04.740
fit into that amount of memory
0:29:04.740,0:29:07.950
you just had what looked like a large
0:29:07.950,0:29:11.710
uniform address space even if the underlying
hardware had segments or some other
0:29:11.710,0:29:13.580
hardware brain damage
0:29:13.580,0:29:17.390
it looked to you like you just had a big uniform
address space and
0:29:17.390,0:29:21.070
the size of your address space was independent
of the amount of memory that was on your machine
0:29:21.070,0:29:23.900
your address space could be bigger than the amount of
physical memory
0:29:23.900,0:29:26.499
cause we sort of move pages around underneath
0:29:26.499,0:29:29.320
whatever part of the address space was actually
active
0:29:29.320,0:29:34.260
and there's obviously limits to this if
you are trying to run a 1 gigabyte
0:29:34.260,0:29:35.630
application on top of
0:29:35.630,0:29:37.240
ten megabytes of memory
0:29:37.240,0:29:40.880
it's probably going to bring new meaning to
same day service
0:29:40.880,0:29:45.519
but if you're willing to wait long enough it
will eventually move the pages around and you will
0:29:45.519,0:29:49.740
progress through getting your application run
0:29:49.740,0:29:53.890
another thing was dealing with software
interrupts
0:29:53.890,0:29:55.789
in the old days
0:29:55.789,0:29:58.749
you had to understand how the hardware worked
0:29:58.749,0:30:03.900
in order to deal with exceptional conditions
so for example if you did a divide by zero
0:30:03.900,0:30:08.170
the hardware would jump through some
vector location or
0:30:08.170,0:30:08.630
something
0:30:08.630,0:30:12.799
and you had to know how that worked and make
sure that you had your program
0:30:12.799,0:30:16.510
usually some little bit of assembly language
set up to deal with that
0:30:16.510,0:30:19.870
and UNIX said let's let's get away
from the hardware here
0:30:19.870,0:30:22.080
and so they did this thing called signals
0:30:22.080,0:30:25.700
and so they just defined a set of signals so that
if you do divide by zero
0:30:25.700,0:30:29.529
you simply register a routine you
want to have called you don't have to know
0:30:29.529,0:30:31.220
how the hardware figured it out
0:30:31.220,0:30:36.740
you just know that that routine is going to get
called and you can deal with it at that point
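The "register a routine you want to have called" idea can be sketched roughly as follows. This is an illustration in Python, not something from the lecture, and it assumes a POSIX system. It uses SIGUSR1, sent by the process to itself, as a stand-in, because Python turns a divide by zero into an exception before any signal would be delivered. The names on_signal and received are made up for the example.

```python
import os
import signal
import time

received = []  # signal numbers seen by the handler (illustrative name)

def on_signal(signum, frame):
    # The routine we registered. We never learn how the event was
    # detected underneath; we are simply called with the signal number.
    received.append(signum)

# Register the routine for SIGUSR1 (stand-in for an event such as SIGFPE).
signal.signal(signal.SIGUSR1, on_signal)

# Deliver the signal to ourselves.
os.kill(os.getpid(), signal.SIGUSR1)

# Python runs handlers between bytecodes, so wait briefly (at most one
# second) until the handler has had a chance to run.
deadline = time.monotonic() + 1.0
while not received and time.monotonic() < deadline:
    time.sleep(0.01)

# Put the default disposition back so later code is unaffected.
signal.signal(signal.SIGUSR1, signal.SIG_DFL)

print("handler saw:", received)
```

The program only names the event and the routine; everything about how the event is noticed stays on the other side of the interface.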
0:30:36.740,0:30:40.960
well we've got a set of timers and counters to keep
track of what we're doing, this is really more
0:30:40.960,0:30:43.490
for accounting than anything else but
0:30:43.490,0:30:46.970
applications may want to have access to that.
0:30:46.970,0:30:51.720
we have a set of identifiers that we're
going to use for things like accounting,
0:30:51.720,0:30:54.830
protection and scheduling and so on
0:30:54.830,0:30:55.820
and one of the
0:30:55.820,0:31:00.320
the early philosophies of UNIX was to try
and keep it simple.
0:31:00.320,0:31:02.630
operating systems have gotten very baroque
0:31:02.630,0:31:04.490
in particular the thing that
0:31:04.490,0:31:07.350
predated UNIX was a thing called
Multics
0:31:07.350,0:31:12.820
Multics was a joint project between
Honeywell, a big computer manufacturer of the
0:31:12.820,0:31:15.740
time
0:31:15.740,0:31:17.129
AT&T Bell Laboratories
0:31:17.129,0:31:19.750
the big industrial laboratory at that time
0:31:19.750,0:31:21.380
and MIT
0:31:21.380,0:31:23.430
a big university then and
0:31:23.430,0:31:24.690
still today
0:31:24.690,0:31:29.259
and those three organizations got
together to try and build this
0:31:29.259,0:31:31.400
time sharing operating system
0:31:31.400,0:31:32.280
and it
0:31:32.280,0:31:33.770
it just got bigger and
0:31:33.770,0:31:37.160
more grandiose and more complex and never
finished
0:31:37.160,0:31:38.979
because as soon as they sort of see
0:31:38.979,0:31:42.709
oh we know how to do that but we could
do this other thing too and so then they would tear it
0:31:42.709,0:31:43.429
apart and
0:31:43.429,0:31:46.440
they never really got to something that
0:31:46.440,0:31:48.210
could be put into production
0:31:48.210,0:31:49.919
and so the
0:31:49.919,0:31:50.570
AT&T
0:31:50.570,0:31:54.340
Bell laboratories decided to pull out of
that project
0:31:54.340,0:31:55.940
and
0:31:55.940,0:32:00.000
two of the people that had been working on
that project, Ken Thompson and Dennis Ritchie
0:32:00.000,0:32:04.390
were sort of bummed because they were now
back to typing cards and putting them through
0:32:04.390,0:32:05.259
card readers and
0:32:05.259,0:32:07.960
they had gotten used to the idea that you could
actually
0:32:07.960,0:32:11.559
sit at an ASR-33 teletype and interact
with your computer
0:32:11.559,0:32:13.440
and so
0:32:13.440,0:32:18.230
they found an old PDP-7 sitting off in
the corner that had been abandoned
0:32:18.230,0:32:22.120
and started working on this little tiny operating
system which they called UNIX
0:32:22.120,0:32:26.549
which eventually moved to the PDP-11 and
became what we have today
0:32:26.549,0:32:28.050
but because it was
0:32:28.050,0:32:32.120
they were coming first of all from Multics
where everything had been done and
0:32:32.120,0:32:34.110
in great grandiose detail
0:32:34.110,0:32:37.549
and because there fundamentally were two
of them working on it and they wanted to get something
0:32:37.549,0:32:38.370
done and
0:32:38.370,0:32:40.130
within a year or so
0:32:40.130,0:32:41.529
one of their philosophies was
0:32:41.529,0:32:44.099
let's find the one way of doing things
0:32:44.099,0:32:48.180
let's not have eight ways from Sunday let's just
get the one way
0:32:48.180,0:32:53.860
and that's what we will provide. So what is
the sort of core set of things that we need.
0:32:53.860,0:32:58.620
well first thing is when it comes to identifiers,
let's not have you know
0:32:58.620,0:33:00.430
eighty thousand different identifiers
0:33:00.430,0:33:03.140
so they came up with process identifiers,
0:33:03.140,0:33:09.620
user identifier and at that time a single group
identifier and later expanded
0:33:09.620,0:33:14.200
and they used that set of identifiers for everything
so it's used for accounting, used for making
0:33:14.200,0:33:17.410
protection decisions, used for scheduling
decisions
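As a small illustration of how few identifiers there are, here is a sketch in Python (not part of the lecture, and assuming a POSIX system) that asks the kernel for the ones mentioned above. The dictionary name ids is made up for the example; the calls themselves are the standard os module wrappers around the corresponding system calls.

```python
import os

# The handful of identifiers the kernel keeps for the running process.
ids = {
    "pid": os.getpid(),        # process identifier
    "ppid": os.getppid(),      # identifier of the parent process
    "uid": os.getuid(),        # user identifier
    "gid": os.getgid(),        # primary group identifier
    "groups": os.getgroups(),  # the later-expanded list of groups
}

for name, value in ids.items():
    print(f"{name}: {value}")
```

The same small set of numbers is what accounting, protection checks, and scheduling all key off.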
0:33:17.410,0:33:19.470
and
0:33:19.470,0:33:24.279
again it was the simplicity of the thing which
was what was driving their decision
0:33:24.279,0:33:28.840
but there were really sort of two key ideas
that they had
0:33:28.840,0:33:30.880
that really made the difference that
0:33:30.880,0:33:32.539
that's what set them apart
0:33:32.539,0:33:34.749
from what everybody else had done before them
0:33:34.749,0:33:35.450
and which
0:33:35.450,0:33:39.740
in retrospect is something that has been pervasive
more or less ever since
0:33:39.740,0:33:41.869
the first of these was the notion
0:33:41.869,0:33:44.840
that we have a uniform descriptor space
0:33:44.840,0:33:46.289
that is
0:33:46.289,0:33:51.250
given a descriptor it can reference
any I/O device
0:33:51.250,0:33:53.650
so or even any kind of I/O channel
0:33:53.650,0:33:58.270
so you can have a descriptor for terminal
or descriptor for a file or descriptor for
0:33:58.270,0:34:02.240
a disk or descriptor for a pipe or descriptor
for a socket
0:34:02.240,0:34:03.500
and
0:34:03.500,0:34:04.790
you don't need to know
0:34:04.790,0:34:07.940
what it references in order to be able to read
and write that thing
0:34:07.940,0:34:11.290
so if i hand you a descriptor
you can read from that descriptor or you can write
0:34:11.290,0:34:13.259
to that descriptor
0:34:13.259,0:34:15.189
and
0:34:15.189,0:34:17.359
the correct thing will happen
0:34:17.359,0:34:19.089
and you'd say well
0:34:19.089,0:34:23.629
that's so obvious I mean how else could you
possibly think of doing it?
0:34:23.629,0:34:25.179
well predating UNIX
0:34:25.179,0:34:28.059
everything was done with
0:34:28.059,0:34:29.379
a little subsystem
0:34:29.379,0:34:33.419
that would open a file, read a file, write a
file, close a file
0:34:33.419,0:34:37.429
and there was another set of system calls which
would open a terminal, read a terminal, write a terminal,
0:34:37.429,0:34:38.089
close terminal
0:34:38.089,0:34:39.210
and yet another one
0:34:39.210,0:34:42.409
which was create a pipe, read a pipe,
write a pipe and so on.
0:34:42.409,0:34:47.699
so if you are just a drop dead stupid
program like say cat
0:34:47.699,0:34:51.579
you would have to have code in there and say was
my input a terminal in which case I need to
0:34:51.579,0:34:53.159
use the read terminal
0:34:53.159,0:34:57.419
or is it a file in which case I need
to use read file or is it a pipe in which case
0:34:57.419,0:34:59.189
I need to use read pipe
0:34:59.189,0:35:01.860
and so the program itself had to have all
this
0:35:01.860,0:35:02.859
coding in it
0:35:02.859,0:35:04.409
whereas when they went to
0:35:04.409,0:35:07.159
the uniform descriptor space
0:35:07.159,0:35:09.630
cat doesn't know it doesn't need to know
it just says
0:35:09.630,0:35:10.819
read my input,
0:35:10.819,0:35:13.979
write the output
0:35:13.979,0:35:17.059
and it works and we add a new type of descriptor
0:35:17.059,0:35:17.600
and
0:35:17.600,0:35:21.700
cat just continues to work just as it always
did.
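To make the point about cat concrete, here is a rough sketch in Python (an illustration, not code from the lecture, assuming a POSIX system). The function copy_stream is a made-up, stripped-down stand-in for cat: it only ever calls read and write on the descriptors it is handed, and the same function is then run once with a file as input and once with a pipe as input.

```python
import os
import tempfile

def copy_stream(in_fd, out_fd):
    """Copy bytes from in_fd to out_fd until end of file; return the count."""
    total = 0
    while True:
        chunk = os.read(in_fd, 4096)
        if not chunk:          # an empty read means end of input
            return total
        while chunk:           # write may accept fewer bytes than offered
            written = os.write(out_fd, chunk)
            total += written
            chunk = chunk[written:]

message = b"same read and write calls for every descriptor\n"

# Case 1: the input is a regular file, the output is a pipe.
with tempfile.TemporaryFile() as f:
    f.write(message)
    f.flush()
    os.lseek(f.fileno(), 0, os.SEEK_SET)
    pipe_r, pipe_w = os.pipe()
    copied = copy_stream(f.fileno(), pipe_w)
    os.close(pipe_w)
    from_pipe = os.read(pipe_r, 4096)
    os.close(pipe_r)

# Case 2: the input is a pipe, the output is a regular file.
pipe_r, pipe_w = os.pipe()
os.write(pipe_w, message)
os.close(pipe_w)
with tempfile.TemporaryFile() as g:
    copy_stream(pipe_r, g.fileno())
    os.close(pipe_r)
    os.lseek(g.fileno(), 0, os.SEEK_SET)
    from_file = os.read(g.fileno(), 4096)

print(from_pipe == message, from_file == message)
```

Nothing in copy_stream asks what kind of object sits behind a descriptor, which is why a new descriptor type needs no change to the program.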
0:35:21.700,0:35:24.199
So this proved to be a very powerful construct
0:35:24.199,0:35:27.019
and pretty much every operating system after
UNIX
0:35:27.019,0:35:28.659
did that there's
0:35:28.659,0:35:30.210
one exception of a
0:35:30.210,0:35:32.549
large company in the Pacific North-West
0:35:32.549,0:35:35.830
that still has not quite uniform descriptor
space
0:35:35.830,0:35:38.380
but that's part of their legacy that really
0:35:38.380,0:35:39.900
they're working on that.
0:35:39.900,0:35:42.009
Longhorn will be here.
0:35:42.009,0:35:43.939
and anyway
0:35:43.939,0:35:46.190
this set of facilities then
0:35:46.190,0:35:50.150
makes up the UNIX virtual machine
0:35:50.150,0:35:51.559
and
0:35:51.559,0:35:55.559
in some sense we still see virtual machines
being used today in fact we're seeing sort
0:35:55.559,0:35:56.749
of a reversion
0:35:56.749,0:36:01.429
back to some of the IBM stuff in things
like VMware
0:36:01.429,0:36:03.079
which is
0:36:03.079,0:36:07.029
essentially allows you to go back to booting
native operating systems again so sort of
0:36:07.029,0:36:08.280
interesting to watch
0:36:08.280,0:36:09.060
that the sort of
0:36:09.060,0:36:12.919
pendulum going back and forth
of what's the correct layer
0:36:12.919,0:36:14.609
for for doing
0:36:14.609,0:36:18.890
virtual machines
0:36:18.890,0:36:22.499
Okay? so far so good?
0:36:22.499,0:36:24.719
all right so i said that there were
0:36:24.719,0:36:27.160
two key ideas that UNIX had
0:36:27.160,0:36:30.279
the first of these being the uniform descriptor
space
0:36:30.279,0:36:35.819
the second one which was really critical was
this notion of processes as a commodity
0:36:35.819,0:36:37.309
item
0:36:37.309,0:36:40.220
so here on Page 17 I've tried to lay
it out
0:36:40.220,0:36:41.090
the
0:36:41.090,0:36:44.159
that the components that make up a process
0:36:44.159,0:36:45.759
and
0:36:45.759,0:36:50.359
what do I really mean when I say a process as
a commodity item
0:36:50.359,0:36:53.650
okay leading up to
0:36:53.650,0:36:54.689
UNIX
0:36:54.689,0:36:56.800
the systems that pre-dated it,
0:36:56.800,0:36:59.200
processes were these very large
0:36:59.200,0:37:02.169
heavyweight expensive things
0:37:02.169,0:37:02.779
and
0:37:02.779,0:37:04.539
if you look at
0:37:04.539,0:37:08.629
MVS which was the operating system
that ran on IBM for doing multiple processing
0:37:08.629,0:37:10.509
and
0:37:10.509,0:37:13.799
the system administrator would decide at boot
time
0:37:13.799,0:37:17.019
what degree of multiprocessing they wish
to support
0:37:17.019,0:37:18.140
so they'd say well
0:37:18.140,0:37:20.739
well, we'll let up to six things happen at once
0:37:20.739,0:37:22.490
and so as part of booting up
0:37:22.490,0:37:24.419
they would create six
0:37:24.419,0:37:25.349
processes
0:37:25.349,0:37:30.059
and now you as a user if you wanted to do
something let's say you wanted to
0:37:30.059,0:37:32.009
compile and run a program
0:37:32.009,0:37:34.960
you would be given a process
0:37:34.960,0:37:36.019
and it was up to you
0:37:36.019,0:37:39.369
to figure out how to stage what you needed
done
0:37:39.369,0:37:39.819
and
0:37:39.819,0:37:43.930
that this was often fairly complex
0:37:43.930,0:37:47.880
and so you would have to write out all the
steps that you wanted
0:37:47.880,0:37:50.300
in this wonderful thing called JCL
0:37:50.300,0:37:52.259
Job Control Language.
0:37:52.259,0:37:56.650
Job Control Language was the sendmail configuration
file of the sixties
0:37:56.650,0:38:00.679
there were people whose sole job at the company
was how to put this stuff together 'cause
0:38:00.679,0:38:04.189
all you had to do is get one extra space or
a missing comma
0:38:04.189,0:38:05.000
something in there
0:38:05.000,0:38:08.630
and the whole thing would just blow up. it would
just sort of spit the card deck back at
0:38:08.630,0:38:09.799
you and say well
0:38:09.799,0:38:13.500
somewhere in there is a mistake that's sort of
in the general area of this card
0:38:13.500,0:38:15.549
and I can't deal with it. Fix it.
0:38:15.549,0:38:16.489
and of course
0:38:16.489,0:38:20.550
in those days it wasn't just a matter of hitting
carriage when you know make carriage return you have to
0:38:20.550,0:38:25.239
get your deck pull out the card, and type the
new one, put it back in and re-submit it
0:38:25.239,0:38:28.729
and heaven forbid you couldn't touch that
card reader you know, it had to be done by
0:38:28.729,0:38:29.970
an operator
0:38:29.970,0:38:32.869
so the card deck will read through it would
disappear and
0:38:32.869,0:38:36.800
you know if you're lucky a few minutes later
if you were not lucky a few hours later
0:38:36.800,0:38:37.849
you would get
0:38:37.849,0:38:39.570
a print out
0:38:39.570,0:38:43.419
which was what had happened and then you could
look at it and you know
0:38:43.419,0:38:47.209
I put a comma in the wrong place I guess
I get to do it all again
0:38:47.209,0:38:49.930
so
0:38:49.930,0:38:54.940
the thing you would need to do there for compiling and running a program
0:38:54.940,0:38:59.579
was you'd have to break into these steps. well
I need to run the the preprocessor
0:38:59.579,0:39:04.670
and so clean out whatever gunk that was left
over on that process from the previous user
0:39:04.670,0:39:06.240
put the preprocessor in there
0:39:06.240,0:39:10.530
and then read from this file here let's
say I gotta put it somewhere so create a
0:39:10.530,0:39:12.510
scratch file over on this disk and
0:39:12.510,0:39:17.299
it was excruciating detail like how many cylinders
and how many tracks and this and that
0:39:17.299,0:39:19.139
blocks blah blah blah
0:39:19.139,0:39:23.119
and don't forget any of those parameters 'cause
it'll spit it out if you do
0:39:23.119,0:39:26.890
and so then it would run the first step in that
if it's successful then you'd have sitting
0:39:26.890,0:39:28.899
in this scratch file that you had created
0:39:28.899,0:39:33.100
the output of the preprocessor and then
you'd load the first pass of the compiler
0:39:33.100,0:39:36.930
and you say now read from that scratch file
and create this other scratch file over here and
0:39:36.930,0:39:39.450
when that's successful and we need to delete that
one
0:39:39.450,0:39:43.830
and then load the second pass, put that back
into another scratch file and then we run this
0:39:43.830,0:39:45.950
assembler, and the optimizer then the
0:39:45.950,0:39:47.750
loader this and that
0:39:47.750,0:39:49.410
finally run the program
0:39:49.410,0:39:50.900
and if all goes well
0:39:50.900,0:39:57.029
you know at step sixteen out comes the answer
0:39:57.029,0:39:58.129
forty two. so UNIX
0:39:58.129,0:40:00.819
said, look this is silly
0:40:00.819,0:40:02.880
a lot of this is just
0:40:02.880,0:40:04.310
bookkeeping
0:40:04.310,0:40:07.249
and computers do bookkeeping really well
0:40:07.249,0:40:12.179
and you'll recall yeah but it's going to take
all these cycles it's like
0:40:12.179,0:40:16.309
computers are supposed to be labor-saving
devices right? so
0:40:16.309,0:40:20.150
they came up with this notion that they would
create processes on the fly as needed
0:40:20.150,0:40:21.159
you had
0:40:21.159,0:40:25.549
you had a preprocessor and two
steps of the compiler and then
0:40:25.549,0:40:27.109
optimizer and then a loader
0:40:27.109,0:40:29.410
we just create Boom seven processes
0:40:29.410,0:40:31.920
and we connect them together with pipes
0:40:31.920,0:40:35.180
and so we take the input and you know run
through in
0:40:35.180,0:40:38.270
through the pipes and you know out the end
you get the the
0:40:38.270,0:40:39.629
executable
0:40:39.629,0:40:40.030
and
0:40:40.030,0:40:42.880
we will simply create each of these processes
0:40:42.880,0:40:44.650
and
0:40:44.650,0:40:46.549
so you as a user just
0:40:46.549,0:40:49.479
type you know the C compiler and it just
0:40:49.479,0:40:52.429
forked these things, piped them together, got the result
0:40:52.429,0:40:53.640
and
0:40:53.640,0:40:57.509
then once it was done with these processes it
just threw them away so any time you'd create a
0:40:57.509,0:41:00.479
new process and it came to you pristine clean
0:41:00.479,0:41:04.239
and you needed a bunch of things it did
put everything in intermediate files
0:41:04.239,0:41:07.549
the fact of the matter is in the early days
0:41:07.549,0:41:08.129
those computers
0:41:08.129,0:41:11.910
didn't really have enough memory to support
all that stuff at once so
0:41:11.910,0:41:15.809
behind you those pipes were actually implemented
as files
0:41:15.809,0:41:19.319
but at least you didn't have to remember to create
them and delete them
0:41:19.319,0:41:20.200
and deal with them
0:41:20.200,0:41:24.020
as far as you were concerned it just looked like stuff
flowing through pipes and of course today it
0:41:24.020,0:41:24.490
just
0:41:24.490,0:41:27.989
does flow through pipes in memory
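The "create processes on the fly and connect them with pipes" idea can be sketched like this. It is a Python illustration under stated assumptions, not the lecture's code: it assumes a POSIX system with fork, and the two stages (producer and upper_filter) are invented stand-ins for real compiler passes.

```python
import os

def read_all(fd):
    """Read from fd until end of file and return everything as bytes."""
    chunks = []
    while True:
        chunk = os.read(fd, 4096)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)

def producer(out_fd):
    # Stage one: emit some text (stand-in for a preprocessor).
    os.write(out_fd, b"hello from stage one\n")

def upper_filter(in_fd, out_fd):
    # Stage two: transform whatever arrives (stand-in for a compiler pass).
    os.write(out_fd, read_all(in_fd).upper())

def run_pipeline():
    a_r, a_w = os.pipe()   # stage one -> stage two
    b_r, b_w = os.pipe()   # stage two -> parent

    pid1 = os.fork()
    if pid1 == 0:          # child: stage one
        try:
            os.close(a_r); os.close(b_r); os.close(b_w)
            producer(a_w)
        finally:
            os._exit(0)    # leave without running the parent's code

    pid2 = os.fork()
    if pid2 == 0:          # child: stage two
        try:
            os.close(a_w); os.close(b_r)
            upper_filter(a_r, b_w)
        finally:
            os._exit(0)

    # Parent: close the ends it does not use so the readers see end of file.
    os.close(a_r); os.close(a_w); os.close(b_w)
    result = read_all(b_r)
    os.close(b_r)
    os.waitpid(pid1, 0)    # the processes are thrown away once they finish
    os.waitpid(pid2, 0)
    return result

output = run_pipeline()
print(output)
```

No scratch files are named, created, or deleted by hand; the kernel does that bookkeeping and the processes vanish when the work is done.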
0:41:27.989,0:41:29.439
okay so
0:41:29.439,0:41:33.689
this notion then that that we're just gonna
create processes on the fly as needed and
0:41:33.689,0:41:35.559
connect them together as needed
0:41:35.559,0:41:38.039
it was a novel concept
0:41:38.039,0:41:43.599
and it wasn't that they had somehow mysteriously figured
out how to create processes cheaply
0:41:43.599,0:41:44.839
cause they hadn't
0:41:44.839,0:41:46.180
they were still
0:41:46.180,0:41:49.959
really expensive to create
0:41:49.959,0:41:52.210
but that extra effort
0:41:52.210,0:41:53.029
was
0:41:53.029,0:41:56.089
worth it because it was saving a lot of programming
time
0:41:56.089,0:41:59.809
so my favorite example is you run ls
0:41:59.809,0:42:01.810
so we have to create a process
0:42:01.810,0:42:04.259
load the ls binary into it
0:42:04.259,0:42:06.180
it prints a line or two on your screen
0:42:06.180,0:42:10.609
and we tear the entire thing down and return
all its resources back to the system
0:42:10.609,0:42:14.979
more than ninety percent of the cost of running
ls is creating and destroying the process
0:42:14.979,0:42:19.239
a tiny fraction of it is actually running
ls
0:42:19.239,0:42:24.259
but it goes so fast, who cares right
0:42:24.259,0:42:25.749
so the point is that
0:42:25.749,0:42:30.039
that concept of just creating things as
needed
0:42:30.039,0:42:31.780
again was very powerful
0:42:31.780,0:42:35.709
and is one that is just pervasive today
0:42:35.709,0:42:38.639
okay so what is a process actually made up
of
0:42:38.639,0:42:43.179
it gets some amount of CPU time or at
least we do dearly hope that it gets some
0:42:43.179,0:42:46.050
amount of CPU time, the lack of getting
CPU time
0:42:46.050,0:42:46.670
is what makes
0:42:46.670,0:42:47.979
a computer so sluggish
0:42:47.979,0:42:49.409
of course
0:42:49.409,0:42:51.920
this really boils down to scheduling
0:42:51.920,0:42:54.249
and we're going to talk about scheduling
0:42:54.249,0:42:56.279
probably more than you care to
0:42:56.279,0:42:59.219
in a couple weeks time
0:42:59.219,0:43:01.619
we have the asynchronous events
0:43:01.619,0:43:04.569
these are the external events that
0:43:04.569,0:43:05.659
are coming in
0:43:05.659,0:43:07.679
so
0:43:07.679,0:43:10.169
they may be either things that
0:43:10.169,0:43:14.339
were coming in from the outside world like
start, stop and quit
0:43:14.339,0:43:15.279
or
0:43:15.279,0:43:18.170
out-of-band data arrival notification that kind
of thing
0:43:18.170,0:43:22.339
or it may in fact be things that the program
is bringing down upon itself
0:43:22.339,0:43:25.590
such as a segmentation fault, a divide by zero
0:43:25.590,0:43:26.910
or some other
0:43:26.910,0:43:31.959
what would normally be viewed as incorrect
operation
0:43:31.959,0:43:35.849
and so we'll talk about that when we talk about
signals
0:43:35.849,0:43:37.039
every program
0:43:37.039,0:43:38.899
gets some amount of memory
0:43:38.899,0:43:42.659
it gets an initial amount when it starts
up and generally allocates more as it
0:43:42.659,0:43:45.229
goes along
0:43:45.229,0:43:49.429
this of course we will deal with very extensively
we'll spend an entire week on it
0:43:49.429,0:43:54.249
when we talk about how virtual memory is implemented
0:43:54.249,0:43:54.609
and
0:43:54.609,0:43:57.429
then we get I/O descriptors
0:43:57.429,0:44:02.259
I used to say that every program had to have
at least one I/O descriptor since
0:44:02.259,0:44:04.910
if it absolutely had no input
0:44:04.910,0:44:06.329
absolutely no output
0:44:06.329,0:44:09.049
then it was sort of pointless
0:44:09.049,0:44:12.900
of course I had to have one of my students
come up and point out to me there is a
0:44:12.900,0:44:13.849
class of programs
0:44:13.849,0:44:16.469
which don't need I/O descriptors
0:44:16.469,0:44:17.440
and that is
0:44:17.440,0:44:19.549
these things called benchmarks
0:44:19.549,0:44:23.249
they just compute something all we really care
about is how long it takes them to compute
0:44:23.249,0:44:24.959
we don't actually care what the answer is
0:44:24.959,0:44:26.019
In theory we don't
0:44:26.019,0:44:29.779
I personally like my benchmarks to print
something so I can see they're
0:44:29.779,0:44:31.489
computing the right thing
0:44:31.489,0:44:33.169
but in theory
0:44:33.169,0:44:35.919
that wouldn't be necessary
0:44:35.919,0:44:38.650
outside of that class of programs
0:44:38.650,0:44:42.670
everything needs some sort of descriptors and
of course we'll talk about descriptors
0:44:42.670,0:44:43.659
quite extensively
0:44:43.659,0:44:47.349
as we go through the I/O subsystem
0:44:47.349,0:44:50.969
okay so the executive summary is that processes
are
0:44:50.969,0:44:54.969
the fundamental service that is provided by
UNIX
0:44:54.969,0:44:58.430
and
0:44:58.430,0:45:02.849
what we're going to spend essentially the
next two and a half weeks working on
0:45:02.849,0:45:04.769
is
0:45:04.769,0:45:07.079
what makes up processes
0:45:07.079,0:45:10.180
we'll go into much more detail about each of these
four points
0:45:10.180,0:45:11.769
and
0:45:11.769,0:45:13.630
then how do we actually go about
0:45:13.630,0:45:14.390
providing that
0:45:14.390,0:45:16.639
bit of service
0:45:16.639,0:45:17.900
the next thing that I'm
0:45:17.900,0:45:22.210
going to do now is just go through and lay
out some of the terminology that
0:45:22.210,0:45:23.239
we have when
0:45:23.239,0:45:25.130
we're talking about processes
0:45:25.130,0:45:29.229
so this is sort of the big picture here we're
on page eighteen
0:45:29.229,0:45:30.669
and
0:45:30.669,0:45:33.669
you can see we have sort of three bits that
make up
0:45:33.669,0:45:36.640
the system
0:45:36.640,0:45:39.029
we have the currently running user process
0:45:39.029,0:45:41.180
and then what we call the top half of the kernel
0:45:41.180,0:45:43.699
and the bottom half of the kernel
0:45:43.699,0:45:47.049
now this would be a picture for a uniprocessor
0:45:47.049,0:45:49.299
so one CPU
0:45:49.299,0:45:51.209
if we had a multiprocessor
0:45:51.209,0:45:54.009
then we would have
0:45:54.009,0:45:57.130
one instance of the kernel
0:45:57.130,0:45:59.529
but multiple instances of the user process
0:45:59.529,0:46:02.879
but for any given CPU on a multiprocessor
0:46:02.879,0:46:05.709
it is running exactly one process
0:46:05.709,0:46:09.309
so you may think that we're running four or five
processes all at once
0:46:09.309,0:46:14.319
but the fact of the matter is that at any instant
in time there's only one process which is
0:46:14.319,0:46:16.299
actually running
0:46:16.299,0:46:18.609
and
0:46:18.609,0:46:21.429
that is the one that we have loaded in the system
0:46:21.429,0:46:25.199
now we give the illusion that we're running
lots of things because we switch between them
0:46:25.199,0:46:26.100
rather quickly
0:46:26.100,0:46:29.269
so it looks like things are happening in all
windows at once
0:46:29.269,0:46:31.430
but in reality
0:46:31.430,0:46:33.619
that's not really happening
0:46:33.619,0:46:36.440
okay so there is a set of properties that I want to
look at
0:46:36.440,0:46:40.899
that have to do with each one of these parts here
0:46:40.899,0:46:44.359
but just to sort of look at it from the
big picture perspective
0:46:44.359,0:46:45.970
what you see here
0:46:45.970,0:46:47.180
is
0:46:47.180,0:46:51.549
there is a boundary between the user process
and the top half of the kernel
0:46:51.549,0:46:54.949
which is really just like a glorified subroutine
call
0:46:54.949,0:46:59.539
it's a lot like calling into a library routine
like calling strcat, strcpy or something
0:46:59.539,0:47:00.319
like that
0:47:00.319,0:47:03.679
when you do a system call
0:47:03.679,0:47:05.650
we take that same set of parameters
0:47:05.650,0:47:08.009
now this is sort of
0:47:08.009,0:47:09.780
a brick wall here if you will
0:47:09.780,0:47:11.380
that is protecting
0:47:11.380,0:47:13.680
the top half of the kernel
0:47:13.680,0:47:15.299
from the application
0:47:15.299,0:47:18.899
I'll go more into some detail about how that
actually gets implemented
0:47:18.899,0:47:22.729
but in essence you can think of it
as if there is sort of this Wailing Wall with these little
0:47:22.729,0:47:24.990
chinks there and you can sort of push a request
through
0:47:24.990,0:47:28.230
and somebody on the other side sort of pulls that out,
looks at it and decides whether they're going
0:47:28.230,0:47:28.690
to
0:47:28.690,0:47:30.769
deign to provide service to you
0:47:30.769,0:47:34.229
and if they do then they sort of send it back
0:47:34.229,0:47:37.649
unlike a library where you can just sort
of reach in and walk around if you want to
0:47:37.649,0:47:38.290
you know,
0:47:38.290,0:47:40.950
with good programming practices you don't do that
but
0:47:40.950,0:47:43.049
you could
0:47:43.049,0:47:44.579
all right so
0:47:44.579,0:47:49.089
the top half of the kernel really looks
a lot like
0:47:49.089,0:47:50.509
a big library
0:47:50.509,0:47:53.509
it just happens to be a library of
routines
0:47:53.509,0:47:57.599
that deal with things where processes need
to interact with each other
0:47:57.599,0:48:01.399
in fact many people don't understand
what the difference is between the C
0:48:01.399,0:48:03.259
library and the top half of the kernel
0:48:03.259,0:48:08.020
if it's something that you're doing that
no other process needs to know about
0:48:08.020,0:48:09.799
then it can be in the C library
0:48:09.799,0:48:13.829
so if you call strcat to concatenate two
strings together
0:48:13.829,0:48:17.599
nobody else needs to know you're doing that
you don't need to coordinate with anybody
0:48:17.599,0:48:19.000
else that you're doing that
0:48:19.000,0:48:20.160
it's just happening
0:48:20.160,0:48:21.979
so that goes in the C library.
0:48:21.979,0:48:24.489
on the other hand if you're reading or writing
a file
0:48:24.489,0:48:28.029
there may be other processes that are also
reading and writing that file
0:48:28.029,0:48:29.910
and therefore that
0:48:29.910,0:48:31.579
has to be done by the kernel
0:48:31.579,0:48:33.120
because it can coordinate
0:48:33.120,0:48:37.189
all the different processes that are trying to access
that file.
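To make the distinction concrete, here is a minimal sketch in C. The function name `append_greeting` and the idea of appending a greeting to a file are illustrative assumptions, not something from the lecture. The `strcat` call works purely on this process's own memory, while `open`, `write`, and `close` are system calls that cross into the kernel, because the file may be shared with other processes.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Build a message in user space, then ask the kernel to append it to a file.
 * Returns the number of bytes written, or -1 on error.
 * The function name and its path argument are hypothetical.
 */
ssize_t append_greeting(const char *path)
{
    char msg[32] = "hello, ";

    /* Library routine: touches only this process's memory, no kernel entry. */
    strcat(msg, "world\n");

    /* System calls: the file may be shared, so the kernel coordinates access. */
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, msg, strlen(msg));
    close(fd);
    return n;
}
```

Calling `append_greeting` with a writable path should return the length of the message, since the whole string is handed to the kernel in one `write`.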
0:48:37.189,0:48:40.529
so the top half of the kernel is pretty straightforward
code
0:48:40.529,0:48:45.539
it looks a lot like any other library that
you would write if you look at top half kernel
0:48:45.539,0:48:49.640
code, you know, you see a read come in,
it's got these parameters, we muck around, we
0:48:49.640,0:48:53.719
get some data, we put it in the buffer, and
we return back
0:48:53.719,0:48:57.470
and in fact writing code for the top half of
the kernel is
0:48:57.470,0:48:59.729
not all that difficult to do
0:48:59.729,0:49:00.989
it's
0:49:00.989,0:49:01.959
you have
0:49:01.959,0:49:05.939
many of the same properties that you would
when you're writing user level application
0:49:05.939,0:49:07.529
code
0:49:07.529,0:49:11.779
the bottom half of the kernel is where things
start to get nasty
0:49:11.779,0:49:14.820
because the bottom half of the kernel is the part
of the system
0:49:14.820,0:49:18.769
that deals with all of the asynchronous events
in the system
0:49:18.769,0:49:22.179
it's things like device drivers,
0:49:22.179,0:49:23.779
timers
0:49:23.779,0:49:25.010
that level of thing
0:49:25.010,0:49:28.029
that are driven by hardware events
0:49:28.029,0:49:28.659
so
0:49:28.659,0:49:31.459
for example a packet arrives on the network
0:49:31.459,0:49:33.670
that causes an interrupt to come and
0:49:33.670,0:49:36.729
that will be handled by the bottom half of
the kernel
0:49:36.729,0:49:38.829
and historically
0:49:38.829,0:49:43.079
when an interrupt came in it preempted whatever
else was going on
0:49:43.079,0:49:45.400
and it ran until it finished and then it returned
0:49:45.400,0:49:46.539
and it could not
0:49:46.539,0:49:49.439
go to sleep to wait for resources or other
things
0:49:49.439,0:49:51.339
in current systems
0:49:51.339,0:49:54.549
you can actually go to sleep in the interrupt driver
and wait for
0:49:54.549,0:49:56.739
some other activity to complete
0:49:56.739,0:49:58.259
it is however
0:49:58.259,0:50:00.799
not a good idea to do that
0:50:00.799,0:50:01.909
because
0:50:01.909,0:50:06.739
the usual case of most device drivers is they
can finish whatever they're doing in an interrupt
0:50:06.739,0:50:08.579
without ever blocking
0:50:08.579,0:50:09.580
and so
0:50:09.580,0:50:13.649
when an interrupt comes in we assume that you're
not going to sleep
0:50:13.649,0:50:14.710
and if you actually
0:50:14.710,0:50:17.219
then go to sleep, it's: oh man,
0:50:17.219,0:50:20.469
you didn't tell us you were going to do this; we
have to go off and do a whole lot of other work
0:50:20.469,0:50:23.029
that we hadn't originally planned on doing
0:50:23.029,0:50:25.460
so if you go to sleep in a device driver
0:50:25.460,0:50:28.209
you are taking a very serious performance hit
0:50:28.209,0:50:31.019
so it's highly recommended that you don't
do that
0:50:31.019,0:50:33.130
but if you have to you can
0:50:33.130,0:50:35.809
and it's because of this historic behavior
0:50:35.809,0:50:39.899
of not being able to sleep in the bottom half
of the kernel
0:50:39.899,0:50:42.119
that you have certain properties that have
0:50:42.119,0:50:44.769
taken over in device drivers
0:50:44.769,0:50:45.940
and that is
0:50:45.940,0:50:50.369
that a device driver should be handed all
the resources it needs to get its job done
0:50:50.369,0:50:54.490
you don't tell a disk device driver
go read this
0:50:54.490,0:50:56.549
and put it somewhere
0:50:56.549,0:50:57.580
you have to say
0:50:57.580,0:50:59.410
Go read this particular block
0:50:59.410,0:51:02.650
here is a chunk of memory that I want that
data to be put in
0:51:02.650,0:51:03.959
and
0:51:03.959,0:51:06.169
notify me when it's done
0:51:06.169,0:51:06.970
because
0:51:06.970,0:51:10.660
things like allocating memory are classic
places where you end up having to go to sleep
0:51:10.660,0:51:12.939
to wait for stuff to happen
0:51:12.939,0:51:14.449
and
0:51:14.449,0:51:16.390
historically you couldn't do that
0:51:16.390,0:51:18.640
even currently you don't want to have to do that
0:51:18.640,0:51:23.400
so device drivers generally have all
resources preallocated
0:51:23.400,0:51:25.169
and then they can just go
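The idea of handing a driver everything it needs up front can be sketched as a request structure. This is a toy simulation, not real kernel code: `struct io_request`, `driver_read`, `mark_done`, and the in-memory `fake_disk` are all hypothetical names made up for illustration. The point is that the request names the block, supplies the destination memory, and carries a completion callback, so the driver never has to allocate anything or sleep.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 512
#define NUM_BLOCKS 4

/* Hypothetical request: everything the driver needs is supplied up front. */
struct io_request {
    unsigned long block;                /* which block to read */
    void *buf;                          /* caller-provided destination memory */
    size_t len;                         /* size of buf in bytes */
    void (*done)(struct io_request *);  /* completion notification */
    int completed;                      /* set by the completion callback */
};

/* Stand-in for the disk: a few blocks of zeroed fake data. */
static unsigned char fake_disk[NUM_BLOCKS][BLOCK_SIZE];

/*
 * Simulated driver entry point: it never allocates and never sleeps.
 * Returns 0 on success, -1 if the request is not fully specified.
 */
int driver_read(struct io_request *req)
{
    if (req->buf == NULL || req->len < BLOCK_SIZE || req->block >= NUM_BLOCKS)
        return -1;
    memcpy(req->buf, fake_disk[req->block], BLOCK_SIZE);
    if (req->done != NULL)
        req->done(req);                 /* "notify me when it's done" */
    return 0;
}

/* Example completion callback: just records that the I/O finished. */
void mark_done(struct io_request *req)
{
    req->completed = 1;
}
```

A caller would fill in the block number, point `buf` at memory it already owns, set `done` to `mark_done`, and then check `completed` after `driver_read` returns.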
0:51:25.169,0:51:27.279
the one place where this doesn't work
0:51:27.279,0:51:29.029
is the network
0:51:29.029,0:51:30.929
and in particular
0:51:30.929,0:51:34.630
you don't know when somebody's going to send
packets to you
0:51:34.630,0:51:37.040
you say well you're looking to open connections
0:51:37.040,0:51:39.360
but if you're doing something like IP forwarding
0:51:39.360,0:51:40.969
there's no
0:51:40.969,0:51:45.039
top half state that's dealing with these packets;
they're just coming in on one interface being
0:51:45.039,0:51:46.719
sent out on another interface
0:51:46.719,0:51:50.630
they never pass through any part of the top
half of the kernel
0:51:50.630,0:51:53.529
and so in the case of network device drivers
0:51:53.529,0:51:56.149
they need to allocate memory
0:51:56.149,0:51:56.640
and
0:51:56.640,0:51:58.829
if memory gets into short supply
0:51:58.829,0:52:01.689
and they try to allocate memory and it's not
available
0:52:01.689,0:52:05.049
they historically couldn't wait for memory to be
available
0:52:05.049,0:52:08.380
and even in practice today don't wait
0:52:08.380,0:52:09.580
for memory to become available
0:52:09.580,0:52:12.469
they simply drop the packet on the floor
0:52:12.469,0:52:18.109
it's like well I didn't have any place to
put it sorry oops
0:52:18.109,0:52:20.940
now that doesn't cause incorrect behavior
0:52:20.940,0:52:24.369
because the higher level protocols will retransmit
0:52:24.369,0:52:29.140
but it does cause great performance problems
because retransmission means that connections
0:52:29.140,0:52:29.879
stall
0:52:29.879,0:52:31.110
they have to back up
0:52:31.110,0:52:33.010
they have to resend data
0:52:33.010,0:52:33.739
and so on
0:52:33.739,0:52:38.739
so you really want to avoid dropping packets
if you can possibly help it
0:52:38.739,0:52:42.029
and consequently
0:52:42.029,0:52:43.420
we tend to
0:52:43.420,0:52:46.499
preallocate a certain amount of memory for
the network drivers
0:52:46.499,0:52:48.299
and
0:52:48.299,0:52:52.169
we try very hard to make sure that we're not
going to run out of memory but
0:52:52.169,0:52:54.869
if packets come fast enough and we can't deal
with them
0:52:54.869,0:52:57.940
as quickly as they are arriving then over a short period
of time
0:52:57.940,0:53:03.489
we get to the point where we simply have to start
dropping packets
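The drop-instead-of-wait policy can be sketched with a fixed pool of preallocated buffers. This is a simplified model under stated assumptions: `struct pkt_pool`, `POOL_SIZE`, and the three functions are hypothetical, and the pool only counts buffers rather than managing real memory. What it shows is that the receive path never blocks: if no buffer is free, the packet is counted as dropped and the function returns immediately.

```c
#define POOL_SIZE 4     /* hypothetical number of preallocated buffers */

struct pkt_pool {
    int free_count;         /* buffers still available */
    unsigned long dropped;  /* packets discarded because none were free */
};

void pool_init(struct pkt_pool *p)
{
    p->free_count = POOL_SIZE;
    p->dropped = 0;
}

/*
 * Called when a packet arrives. Returns 1 if a buffer was claimed,
 * 0 if the packet had to be dropped. It never waits for memory.
 */
int pool_receive(struct pkt_pool *p)
{
    if (p->free_count == 0) {
        p->dropped++;       /* no place to put it: drop on the floor */
        return 0;
    }
    p->free_count--;
    return 1;
}

/* Called later, once the rest of the system is finished with a packet. */
void pool_release(struct pkt_pool *p)
{
    if (p->free_count < POOL_SIZE)
        p->free_count++;
}
```

If packets arrive faster than `pool_release` is called, `free_count` reaches zero and every further arrival increments `dropped`, which is the situation described above.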
0:53:03.489,0:53:07.649
okay this is a part of the kernel that you do not wish to
write code for
0:53:07.649,0:53:10.919
because it is extremely difficult to
debug
0:53:10.919,0:53:12.759
you get these bugs where
0:53:12.759,0:53:18.779
the only time it happens is on the third Tuesday
when there's a full moon
0:53:18.779,0:53:19.300
and
0:53:19.300,0:53:24.199
we have a disk interrupt followed by %uh a
terminal character coming in
0:53:24.199,0:53:28.289
and a network packet arriving of size fifteen
twenty two
0:53:28.289,0:53:30.109
and when all those things happen
0:53:30.109,0:53:32.719
the system panics
0:53:32.719,0:53:37.380
and of course it panics
because you're following some bad pointer
0:53:37.380,0:53:40.969
something that should have been there
but was freed some time in the distant past
0:53:40.969,0:53:42.930
we are not sure when
0:53:42.930,0:53:44.049
and
0:53:44.049,0:53:47.400
trying to debug things like that is extremely
difficult
0:53:47.400,0:53:48.509
and you can
0:53:48.509,0:53:52.120
think well I think I found the problem but
it's not reproducible
0:53:52.120,0:53:55.530
you know you have to wait for the next third
Tuesday with a full moon and blah blah blah
0:53:55.530,0:53:56.950
to happen
0:53:56.950,0:53:57.469
and
0:53:57.469,0:54:01.449
you know so you sort of statistically
guess that you fixed it; you know, I was getting
0:54:01.449,0:54:03.510
this bug once every three days
0:54:03.510,0:54:06.099
and now it's gone for two weeks without happening
0:54:06.099,0:54:07.239
did you fix that?
0:54:07.239,0:54:08.969
or have you just been lucky
0:54:08.969,0:54:10.459
and it's
0:54:10.459,0:54:14.349
that coupled with the fact that you're
dealing with hardware
0:54:14.349,0:54:18.049
and hardware rarely works the way it's documented
to work
0:54:18.049,0:54:21.770
and so you know you're doing everything that
it says you're supposed to do
0:54:21.770,0:54:26.260
it still doesn't work because you didn't set
the fiddle bit over on that other place over
0:54:26.260,0:54:26.660
there
0:54:26.660,0:54:30.479
that's not documented anywhere but if it's
not set it doesn't work
0:54:30.479,0:54:33.769
occasionally
0:54:33.769,0:54:36.110
so this is another reason that you really want
to avoid
0:54:36.110,0:54:40.459
dealing with this part of the system if
you can possibly help it
0:54:40.459,0:54:44.369
okay but let's go through and look at some
of the properties here starting up at
0:54:44.369,0:54:45.789
the user process
0:54:45.789,0:54:47.980
we're running with
0:54:47.980,0:54:51.449
preemptive scheduling
0:54:51.449,0:54:53.409
now there's several caveats here
0:54:53.409,0:54:55.239
preemptive scheduling is the default
0:54:55.239,0:54:56.970
so called shared scheduler
0:54:56.970,0:55:01.360
that is what you normally use there are other
schedulers like the real time scheduler
0:55:01.360,0:55:02.869
where what I'm saying isn't true
0:55:02.869,0:55:05.709
we'll talk about some of the schedulers
later
0:55:05.709,0:55:09.930
but the usual scheduler that you're running
on under UNIX is a shared scheduler
0:55:09.930,0:55:13.229
and under the shared scheduler user applications
0:55:13.229,0:55:15.159
run with preemptive scheduling
0:55:15.159,0:55:17.449
and preemptive scheduling means that
0:55:17.449,0:55:20.019
you run at the whim of the system
0:55:20.019,0:55:21.420
if it wants you to run
0:55:21.420,0:55:22.140
you run
0:55:22.140,0:55:25.490
once you start running you have no guarantee
of how long you're going to run
0:55:25.490,0:55:29.370
it might let you run for three instructions
and then decide it doesn't like you anymore
0:55:29.370,0:55:31.150
it wants to run something else
0:55:31.150,0:55:35.920
or you might get to run for several seconds
in a row with no intervening
0:55:35.920,0:55:37.469
things interrupting you
0:55:37.469,0:55:39.719
you just don't know
0:55:39.719,0:55:40.969
and
0:55:40.969,0:55:42.839
really all you know is
0:55:42.839,0:55:43.569
that
0:55:43.569,0:55:48.239
they claim that they're using statistics
and that the statistics are fair
0:55:48.239,0:55:55.059
and so on average you're going to get a reasonable
amount of time but that's
0:55:55.059,0:55:57.129
up to the system you don't control that
0:55:57.129,0:55:58.439
the real point here
0:55:58.439,0:56:01.940
is that you don't have any way of creating
a critical section
0:56:01.940,0:56:04.950
you can't say okay I don't want to be interrupted
0:56:04.950,0:56:07.429
during this particular sequence of things
0:56:07.429,0:56:09.809
so you have to program
0:56:09.809,0:56:13.469
assuming that you may be interrupted at any
point
0:56:13.469,0:56:14.979
okay
0:56:14.979,0:56:18.909
the next thing is that when you're running
in a user process
0:56:18.909,0:56:20.719
you are running
0:56:20.719,0:56:24.150
with the processor in what's called unprivileged
mode
0:56:24.150,0:56:28.109
one of the requirements for running any kind
of a UNIX system
0:56:28.109,0:56:31.759
is that you have to have a processor that
supports privileged and unprivileged,
0:56:31.759,0:56:33.709
two different modes of operation
0:56:33.709,0:56:37.049
in privileged mode which is what the kernel
runs in
0:56:37.049,0:56:38.950
the entire repertoire
0:56:38.950,0:56:40.869
of the hardware is available
0:56:40.869,0:56:45.339
by this I mean you can set all the registers
you can fiddle with the memory management
0:56:45.339,0:56:47.460
unit you can initiate I/O
0:56:47.460,0:56:50.519
you can access any memory anywhere
0:56:50.519,0:56:51.919
etc
0:56:51.919,0:56:56.540
when you're running in unprivileged
mode which is what user processes run in
0:56:56.540,0:57:00.709
there's a large subset of the instructions which
you cannot execute
0:57:00.709,0:57:03.480
you cannot initiate I/O on
0:57:03.480,0:57:04.209
devices
0:57:04.209,0:57:06.770
you cannot change the memory mapping
0:57:06.770,0:57:10.209
you cannot access memory that's not part of
your address space
0:57:10.209,0:57:13.299
you cannot execute certain instructions
like halt
0:57:13.299,0:57:15.589
and
0:57:15.589,0:57:19.039
so in general you are protected
0:57:19.039,0:57:21.789
from manipulating anything that's outside of your
address space
0:57:21.789,0:57:23.759
this of course is desirable because
0:57:23.759,0:57:27.059
when you're running in this unprivileged
mode
0:57:27.059,0:57:28.300
you're protected
0:57:28.300,0:57:31.910
from other processes manipulating you
and they're protected from you manipulating
0:57:31.910,0:57:33.079
them
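One way to observe this protection from user space is to touch an address that is not part of the process's address space and see what happens. The sketch below forks a child so that the parent survives. The function name `signal_from_wild_pointer` is hypothetical, and the assumption that address 8 is unmapped holds on typical UNIX-like systems but is not guaranteed everywhere. The hardware traps the access, and the kernel turns that trap into a signal that kills the child.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Forks a child that writes through a wild pointer. Returns the number of
 * the signal that killed the child, 0 if the child was not killed by a
 * signal, or -1 if fork or waitpid failed.
 */
int signal_from_wild_pointer(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Assumed to be outside the address space (not mapped). */
        volatile int *wild = (volatile int *)8;
        *wild = 42;         /* the hardware traps here in unprivileged mode */
        _exit(0);           /* only reached if the write somehow succeeded */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On most systems the returned value is `SIGSEGV`; some platforms report `SIGBUS` for certain bad addresses. Either way, only the offending process dies and the rest of the system is untouched.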
0:57:33.079,0:57:36.430
for those of you that have had the misfortune
to have to use
0:57:36.430,0:57:39.339
early versions of Windows up to about ninety
eight
0:57:39.339,0:57:42.470
they always ran with the processor
running in privileged mode
0:57:42.470,0:57:44.009
even in applications
0:57:44.009,0:57:46.459
and so either maliciously or accidentally
0:57:46.459,0:57:50.000
you could stomp on other people's address space
or you could stomp on the kernel
0:57:50.000,0:57:53.020
and a lot of the blue screen of death was
people just
0:57:53.020,0:57:56.319
following wild pointers and trashing different
parts of the system
0:57:56.319,0:57:58.819
taking everything down
0:57:58.819,0:58:00.020
it also makes it
0:58:00.020,0:58:02.320
far easier to
0:58:02.320,0:58:05.459
implement things like viruses and worms and
other things because
0:58:05.459,0:58:09.619
a user application can rewrite the boot
block on the disk; they can just reach down
0:58:09.619,0:58:13.109
and manipulate the registers that allow them
to do whatever they want
0:58:13.109,0:58:16.730
whereas when you're running in unprivileged
mode you can't write those kinds
0:58:16.730,0:58:20.179
of things
0:58:20.179,0:58:24.119
so modern versions of Windows anything from about
2000 on
0:58:24.119,0:58:26.630
now run with privileged and unprivileged mode
0:58:26.630,0:58:28.649
but UNIX has always required that
0:58:28.649,0:58:30.219
and so when you're running a
0:58:30.219,0:58:31.319
user process
0:58:31.319,0:58:33.389
you cannot block, I mean
0:58:33.389,0:58:37.969
you cannot execute the instructions which
cause a context switch to occur
0:58:37.969,0:58:40.349
you can't pick what's going to run next
0:58:40.349,0:58:43.140
you can't make that thing run next all you can
do
0:58:43.140,0:58:45.189
is go to the operating system and say
0:58:45.189,0:58:49.269
hey I've got nothing to do. pick somebody else
to run
0:58:49.269,0:58:53.449
and the operating system is the thing that can
then execute the instructions which cause
0:58:53.449,0:58:57.609
a different process to be loaded
0:58:57.609,0:58:59.049
and run
0:58:59.049,0:59:03.400
all right, finally, while you're in a user application you're
running on a user stack
0:59:03.400,0:59:06.410
that's part of the user's address space
0:59:06.410,0:59:07.889
so
0:59:07.889,0:59:10.819
part of creating a process gives you a runtime
stack
0:59:10.819,0:59:14.369
as part of a virtual address space and so it
can be
0:59:14.369,0:59:18.199
more or less up to the limits of the hardware
as big as you want it to be
0:59:18.199,0:59:19.949
so if you are running on a thirty-two-bit processor
0:59:19.949,0:59:22.819
your stack can get to 2 gigabytes
0:59:22.819,0:59:23.319
and
0:59:23.319,0:59:26.839
what this means is that anytime you
allocate local variables
0:59:26.839,0:59:28.529
you don't have to worry about Oh
0:59:28.529,0:59:30.609
is that gonna overrun my stack?
0:59:30.609,0:59:31.610
so if you need
0:59:31.610,0:59:35.519
a hundred thousand double precision floating
point numbers
0:59:35.519,0:59:37.189
you can just as a local variable allocate
0:59:37.189,0:59:40.269
an array of size a hundred thousand of type
double
0:59:40.269,0:59:44.029
and it just decrements your stack pointer by
eight hundred thousand bytes
0:59:44.029,0:59:45.009
away you go
0:59:45.009,0:59:47.299
it's just virtual address space
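As a small illustration, the following C sketch declares exactly that kind of local array. The function names are hypothetical, and it assumes an 8-byte `double`, which makes the array 800,000 bytes. It also assumes the process's stack resource limit is larger than that, which is true of common defaults but can be changed by an administrator.

```c
#include <stddef.h>

#define N 100000

/* Size in bytes of the local array used below (800,000 with 8-byte doubles). */
size_t stack_array_bytes(void)
{
    return N * sizeof(double);
}

/*
 * Allocates a hundred thousand doubles as a local variable, fills them,
 * and returns their sum. The array lives on the user stack, which is
 * just part of the process's virtual address space.
 */
double sum_on_stack(void)
{
    double values[N];       /* one stack-pointer adjustment, no malloc */
    double sum = 0.0;

    for (size_t i = 0; i < N; i++)
        values[i] = 1.0;
    for (size_t i = 0; i < N; i++)
        sum += values[i];
    return sum;
}
```

In user space this just works; the same declaration inside the kernel would be a problem, because kernel stacks are small and fixed in size.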
0:59:47.299,0:59:49.020
as you'll see when we get into the kernel
0:59:49.020,0:59:50.210
that ceases to be the case