path: root/en_US.ISO8859-1/captions/2006/mckusick-kernelinternals/mckusick-kernelinternals-1.sbv
0:00:09.469,0:00:11.309
Hello, my name is Marshall Kirk McKusick

0:00:11.309,0:00:15.389
and I've been around as long as dinosaurs
and mainframes have ruled the world

0:00:15.389,0:00:18.429
which is to say the sixties and seventies

0:00:18.429,0:00:22.460
however, by the 1970s a new breed of mammals had begun to show up
on the scene

0:00:22.460,0:00:24.240
known as minicomputers

0:00:24.240,0:00:28.230
although they were just toys in the 1970s they would soon grow

0:00:28.230,0:00:31.689
and take over most of the computing market

0:00:31.689,0:00:33.150
In 1970 

0:00:33.150,0:00:37.910
at AT&T Bell Laboratories, two researchers, Ken
Thompson and Dennis Ritchie, began developing the

0:00:37.910,0:00:39.900
UNIX operating system

0:00:39.900,0:00:42.040
Ken Thompson, who was an alumnus of Berkeley,

0:00:42.040,0:00:46.100
came back on a sabbatical in 1975 bringing UNIX
with him

0:00:46.100,0:00:47.539
In the year that he was there

0:00:47.539,0:00:51.330
he managed to get a number of graduate students interested
in UNIX

0:00:51.330,0:00:53.940
and by the time he left in 1976


0:00:53.940,0:00:56.829
Bill Joy had taken over running the UNIX system

0:00:56.829,0:01:00.470
and in fact continuing to develop software for it.

0:01:00.470,0:01:04.339
Bill began packaging up the software that had
been developed under Berkeley UNIX

0:01:04.339,0:01:05.779
and distributing it

0:01:05.779,0:01:08.040
as the Berkeley Software Distributions

0:01:08.040,0:01:12.310
whose name was quickly shortened to simply BSD

0:01:12.310,0:01:16.330
BSD continued to be distributed with
yearly distributions for almost fifteen

0:01:16.330,0:01:17.490
years

0:01:17.490,0:01:21.920
initially under Bill Joy and later under others including
yours truly.

0:01:21.920,0:01:24.860
By the late 1980s interest had begun to grow

0:01:24.860,0:01:27.400
in freely redistributable software

0:01:27.400,0:01:30.170
so a number of us at Berkeley began separating
out

0:01:30.170,0:01:32.649
the AT&T proprietary bits of BSD 

0:01:32.649,0:01:35.710
from those parts that were freely redistributable.

0:01:35.710,0:01:40.590
By the time of the final distribution of BSD
in 1992

0:01:40.590,0:01:43.620
the entire distribution was freely redistributable.

0:01:43.620,0:01:45.909
I've given a capsule history here

0:01:45.909,0:01:48.009
but if you're interested in the entire story

0:01:48.009,0:01:50.789
I have this three-and-a-half hour epic

0:01:50.789,0:01:54.590
which is available from my website www.mckusick.com 

0:01:54.590,0:01:58.200
that gives the entire history of Berkeley.

0:01:58.200,0:02:00.239
Following the final distribution from Berkeley

0:02:00.239,0:02:01.450
two groups sprang up

0:02:01.450,0:02:03.600
to continue supporting BSD

0:02:03.600,0:02:08.080
the first of these was NetBSD whose primary
goal was to support

0:02:08.080,0:02:10.459
as many different architectures as possible

0:02:10.459,0:02:14.769
everything from your microwave oven all the way
up to your Cray X-MP

0:02:14.769,0:02:19.409
In fact today NetBSD supports nearly
sixty architectures.

0:02:19.409,0:02:22.419
The other group that sprang up was FreeBSD.

0:02:22.419,0:02:28.239
Their goal was to bring up BSD and support
as wide a set of devices as possible on the

0:02:28.239,0:02:29.719
PC architecture.

0:02:29.719,0:02:36.549
They also had a goal of trying to make the
 system as easy to install as possible to 

0:02:36.549,0:02:39.309
attract a wide group of developers

0:02:39.309,0:02:42.319
I chose to work primarily with the FreeBSD
group

0:02:42.319,0:02:43.740
both doing software

0:02:43.740,0:02:46.140
and also together with George Neville-Neil

0:02:46.140,0:02:51.069
writing this book "The Design and Implementation
of the FreeBSD Operating System".

0:02:51.069,0:02:52.060
Together with this book

0:02:52.060,0:02:53.959
I developed a course

0:02:53.959,0:02:56.500
which runs for twelve chapters

0:02:56.500,0:02:58.179
and thirty hours.

0:02:58.179,0:02:59.749
The purpose of this video

0:02:59.749,0:03:01.089
is to give you a taste

0:03:01.089,0:03:02.819
of that course.

0:03:02.819,0:03:07.249
What follows are excerpts from the first lecture
of the course

0:03:07.249,0:03:11.139
which of course you can also get from my website
www.mckusick.com.

0:03:11.139,0:03:13.069


0:03:13.069,0:03:17.739
Enjoy.

0:03:17.739,0:03:22.239
This class is nominally about FreeBSD
because well

0:03:22.239,0:03:26.379
that's what I know best and that's what
the textbook is organized around

0:03:26.379,0:03:29.979
but the fact of the matter is that it's really

0:03:29.979,0:03:32.339
a class about UNIX and that

0:03:32.339,0:03:36.539
really covers sort of the broad range of things
in the open source arena such as FreeBSD

0:03:36.539,0:03:37.689
and Linux

0:03:37.689,0:03:38.899
which of course

0:03:38.899,0:03:41.159
you use a lot

0:03:41.159,0:03:41.550
and

0:03:41.550,0:03:44.349
it also covers the commercial systems

0:03:44.349,0:03:46.950
Solaris, HP-UX,

0:03:46.950,0:03:49.279
AIX and so on.

0:03:49.279,0:03:52.419
I am going to tend more towards the open
side

0:03:52.419,0:03:56.389
open source side of things. So it's really
going to be more FreeBSD and Linux than it's

0:03:56.389,0:03:57.579
going to be

0:03:57.579,0:04:00.849
Solaris and HP-UX and so on.

0:04:00.849,0:04:06.959
For the most part at the level of this course
we're dealing with the interfaces to the system

0:04:06.959,0:04:07.329
and

0:04:07.329,0:04:11.599
the fact of the matter is that those interfaces are highly
standardized at this point

0:04:11.599,0:04:12.060
and

0:04:12.060,0:04:15.280
whether it's FreeBSD or Linux or Solaris
or whatever

0:04:15.280,0:04:19.460
the socket system call has to do the same
thing, it has to have the same arguments

0:04:19.460,0:04:20.150
in that,

0:04:20.150,0:04:23.909
it has to have the same effect

0:04:23.909,0:04:27.319
and so until you get down to the really nitty-gritty
details

0:04:27.319,0:04:29.600
of how they actually go about implementing
that

0:04:29.600,0:04:31.960
the differences are relatively minor.
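
A minimal sketch of that point, assuming Python's standard `socket` module as a thin stand-in for the C interface. The helper name `open_stream_socket` is made up for illustration. The call takes the same three arguments (address family, socket type, protocol) whichever of these systems it runs on.

```python
import socket


def open_stream_socket():
    """Create a TCP socket and report the family and type it was given."""
    # socket(domain, type, protocol): the same three arguments everywhere.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
    try:
        return sock.family, sock.type
    finally:
        sock.close()


family, kind = open_stream_socket()
print(family.name, kind.name)
```

How the kernel builds the socket underneath is where the systems differ, and none of that is visible at this level.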

0:04:31.960,0:04:35.830
So I would say that sixty to seventy percent
of the material that I'm covering

0:04:35.830,0:04:40.779
is just as true for FreeBSD as it would
be for Linux

0:04:40.779,0:04:42.580
or for Solaris

0:04:42.580,0:04:44.659
AIX is a little bit

0:04:44.659,0:04:45.629
sort of off in the weeds

0:04:45.629,0:04:48.709
as is HP-UX

0:04:48.709,0:04:51.099
but luckily we don't have to worry too much about
that.

0:04:51.099,0:04:54.569
Okay so

0:04:54.569,0:04:59.279
the other thing is that I'm going to assume that
all of you have used the system. I get

0:04:59.279,0:05:00.910
really sort of worried when people

0:05:00.910,0:05:04.249
you know raise their hands and say "Hey, what's a shell?"

0:05:04.249,0:05:07.990
or I don't
put a lot of code up but I put up one piece of code and someone said "Why

0:05:07.990,0:05:11.819
are there two pipe symbols in the middle of
that if statement?".

0:05:11.819,0:05:15.740
No we're not programming in the shell we're programming
in C.

0:05:15.740,0:05:19.970
So hopefully you can tell the difference between
Shell scripts and C code.

0:05:19.970,0:05:21.990
so okay but I am gonna assume

0:05:21.990,0:05:24.610
you haven't really looked inside the system.

0:05:24.610,0:05:28.289
So I am gonna start everything at a very
high level.

0:05:28.289,0:05:32.969
The problem is I have already discovered you come
from a lot of different sort of

0:05:32.969,0:05:33.819
backgrounds

0:05:33.819,0:05:35.180
and 

0:05:35.180,0:05:36.280
levels of knowledge

0:05:36.280,0:05:37.900
and so

0:05:37.900,0:05:42.620
the way that I find works best to sort of
be useful to everybody is a three pass

0:05:42.620,0:05:43.860
algorithm

0:05:43.860,0:05:49.060
so what I will do is start with a first pass, a very
broad brush high level

0:05:49.060,0:05:50.569
description of what's going on

0:05:50.569,0:05:54.719
and then I will go back and I'll go through the
same material again but at a lower level of

0:05:54.719,0:05:55.300
detail

0:05:55.300,0:05:59.939
then I finally go back and go through at a very nitty-gritty
low level of detail

0:05:59.939,0:06:04.649
and the effect of this is if you are learning new stuff
as I'm doing the high-level thing

0:06:04.649,0:06:08.649
you are gonna be utterly lost by the time I get to
the low level niggly details

0:06:08.649,0:06:10.699
but since I'm going to do it topic by topic

0:06:10.699,0:06:14.190
when I get to the end of one of those really
low level niggly details

0:06:14.190,0:06:17.900
I'll give you a clue as I will say "Brain
reset, I'm starting a new topic" so even if

0:06:17.900,0:06:19.330
you're completely lost

0:06:19.330,0:06:23.530
you can now start listening again plus I'm gonna get
the broad brush up again.

0:06:23.530,0:06:27.059
okay and for those of you that know a lot of
this stuff already

0:06:27.059,0:06:31.770
you'll probably find the broad brush rather boring

0:06:31.770,0:06:35.759
but by the time we get down to the really low level
details I think you'll actually

0:06:35.759,0:06:37.860
pick up some things that you will find

0:06:37.860,0:06:39.710
useful and interesting.

0:06:39.710,0:06:43.759
So in this way hopefully everybody will
get some

0:06:43.759,0:06:47.699
useful percentage of material out of the course.

0:06:47.699,0:06:49.599
I am gonna start out by just

0:06:49.599,0:06:53.089
walking through and giving you the

0:06:53.089,0:06:56.919
outline of what we're going to try and do here

0:06:56.919,0:07:01.169
As I said we're going to go roughly

0:07:01.169,0:07:03.270
just about two-and-a-half hours of lecture

0:07:03.270,0:07:04.729
about two hours forty minutes

0:07:04.729,0:07:06.499
per week

0:07:06.499,0:07:07.619
and

0:07:07.619,0:07:11.770
so we will start off this week with an introduction.

0:07:11.770,0:07:13.860
This is as I said we're going to start from the
top

0:07:13.860,0:07:15.749
and then just start working our way down

0:07:15.749,0:07:19.350
so the general thing I'm going to do is
to talk about the interface

0:07:19.350,0:07:21.439
which is something that you

0:07:21.439,0:07:25.319
are presumably fairly familiar with since
you've worked with the system

0:07:25.319,0:07:27.249
and then

0:07:27.249,0:07:29.739
you have to sort of lay out terminology

0:07:29.739,0:07:32.080
although we use normal English words

0:07:32.080,0:07:34.419
they have

0:07:34.419,0:07:38.580
sometimes rather bizarre meanings compared to their
common usage

0:07:38.580,0:07:39.220
and

0:07:39.220,0:07:42.330
so I will just sort of lay out the terminology
lay out the

0:07:42.330,0:07:45.750
way we talk about how the system is structured

0:07:45.750,0:07:50.780
and this week we will also talk about the
basic services "What is it that the kernel is

0:07:50.780,0:07:52.929
providing for us?"

0:07:52.929,0:07:54.060
and then of course

0:07:54.060,0:07:58.499
we'll proceed to dive down in and see how
that is done

0:07:58.499,0:07:59.970
so here in

0:07:59.970,0:08:01.400
Week number 2

0:08:01.400,0:08:05.450
we're gonna look at the system from the
perspective of 

0:08:05.450,0:08:07.039
something that

0:08:07.039,0:08:08.720
manages processes.

0:08:08.720,0:08:12.170
One way of looking at the kernel is it's really
just a

0:08:12.170,0:08:16.440
resource manager and the resources that
it's managing are things having to do with processes

0:08:16.440,0:08:19.460
So we'll look at a process, what the structure of
it is

0:08:19.460,0:08:20.649
and

0:08:20.649,0:08:23.559
talk about the different ways that they can
be structured.

0:08:23.559,0:08:28.379
A process can for example be an address space
and can have one thread running in it or can have

0:08:28.379,0:08:29.749
multiple threads running in it.
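
A small sketch of the "one address space, several threads" case, assuming Python's `threading` module. The names `visits` and `worker` are made up for illustration. Every thread appends to the same list because they all live in the one address space of the process.

```python
import threading

visits = []                     # one list, shared by every thread in the process
visits_lock = threading.Lock()  # serialize updates to the shared list


def worker(tag):
    """Record that the thread identified by tag ran."""
    with visits_lock:
        visits.append(tag)


threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(visits))
```

Separate processes would each get their own copy of `visits` instead of sharing one.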

0:08:29.749,0:08:34.620
so we'll talk about the different ways
that we think a process is.

0:08:34.620,0:08:38.480
We will look at the management of those processes


0:08:38.480,0:08:39.239
we've got

0:08:39.239,0:08:42.020
to lay out the bits and pieces that
need to be managed

0:08:42.020,0:08:44.660
and then talk about

0:08:44.660,0:08:47.190
how we do that.

0:08:47.190,0:08:51.740
We'll talk about jails. This is something
that you currently find only in FreeBSD

0:08:51.740,0:08:55.060
hasn't made it into

0:08:55.060,0:08:56.320
Linux yet although

0:08:56.320,0:09:01.630
the concept is being actively worked
on so my guess is that you'll see that

0:09:01.630,0:09:03.500
fairly soon.

0:09:03.500,0:09:06.360
we'll also then talk about scheduling

0:09:06.360,0:09:10.579
which is in essence how we decide what gets
to run, when it gets to run, how long it gets

0:09:10.579,0:09:13.500
to run, etc.

0:09:13.500,0:09:14.330
okay

0:09:14.330,0:09:19.020
The week after that we will go into virtual
memory.

0:09:19.020,0:09:23.800
Signals aren't really part of virtual memory 
but they didn't fit into next week's

0:09:23.800,0:09:26.400
material so I just dropped that in at the
beginning

0:09:26.400,0:09:29.850
but the bulk of Week 3 is going to
be

0:09:29.850,0:09:32.019
the management of Virtual Memory. So we've got

0:09:32.019,0:09:35.119
a bunch of physical memory, a bunch of
processes that are

0:09:35.119,0:09:37.940
trying to use their address spaces

0:09:37.940,0:09:39.590
and we will talk about

0:09:39.590,0:09:41.410
essentially how you will make that all work

0:09:41.410,0:09:43.510
It's called virtual memory because it's

0:09:43.510,0:09:47.420
sort of a cheat. We promise you the world and
then we deliver you 

0:09:47.420,0:09:51.480
as small a number of pages as we think we
can get away with.
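
One way to see the promise from user space, as a sketch assuming Python's `mmap` module and an arbitrary 64 MiB size picked for illustration. The process asks for a large anonymous region but only touches two pages of it. On most systems physical memory is typically handed out only as pages are touched, though the exact policy depends on the kernel.

```python
import mmap

PAGE = mmap.PAGESIZE
RESERVED = 64 * 1024 * 1024  # hypothetical size: 64 MiB of address space


def touch_two_pages():
    """Map a large anonymous region and write to its first and last page only."""
    region = mmap.mmap(-1, RESERVED)  # -1 means anonymous memory, no file behind it
    try:
        region[0] = 1                 # first page
        region[RESERVED - PAGE] = 2   # last page
        # A page we never wrote to reads back as zero.
        return len(region), region[0], region[PAGE], region[RESERVED - PAGE]
    finally:
        region.close()


size, first, untouched, last = touch_two_pages()
print(size // PAGE, "pages promised, 2 written")
```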

0:09:51.480,0:09:56.420
Okay. So the first three weeks then essentially
get us through

0:09:56.420,0:09:58.340
looking at the world as if it was all

0:09:58.340,0:10:00.560
about processes.

0:10:00.560,0:10:03.880
Then in Week 4 we change gears. We say
okay well you know

0:10:03.880,0:10:07.570
the kernel isn't just all about processes. You can sort of
look at it orthogonally and you can

0:10:07.570,0:10:10.000
say it's really just a giant I/O switch

0:10:10.000,0:10:12.910
it's just like a traffic cop that's just managing
these

0:10:12.910,0:10:14.860
I/O streams

0:10:14.860,0:10:15.450
and

0:10:15.450,0:10:18.610
so let's look at it from that perspective.

0:10:18.610,0:10:19.310
And

0:10:19.310,0:10:24.740
we'll start with special files, again this is
sort of the interface when you talk about UNIX

0:10:24.740,0:10:25.880
systems, when you talk about

0:10:25.880,0:10:27.950
what's normally the /dev

0:10:27.950,0:10:34.170
interface that gets you access
to the various I/O streams that are available

0:10:34.170,0:10:37.220
and we'll look at how that's organized and
the structure of it

0:10:37.220,0:10:41.840
which used to be fairly simple but in the
last decade has gotten

0:10:41.840,0:10:43.670
incredibly complicated.
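
A short sketch of what a special file looks like from user space, assuming a UNIX-like system where `/dev/null` exists. The helper name `describe` is made up for illustration. The file is opened and written with the ordinary file calls, and `stat` reports it as a device with a major and minor number.

```python
import os
import stat


def describe(path):
    """Classify a path as a character device, block device, or neither."""
    st = os.stat(path)
    if stat.S_ISCHR(st.st_mode):
        kind = "character device"
    elif stat.S_ISBLK(st.st_mode):
        kind = "block device"
    else:
        kind = "not a device"
    # st_rdev holds the device numbers that identify the driver and unit.
    return kind, os.major(st.st_rdev), os.minor(st.st_rdev)


kind, major, minor = describe("/dev/null")
print(kind, major, minor)

# The same open and write used for regular files work on the special file.
with open("/dev/null", "wb") as sink:
    written = sink.write(b"discarded")
```

The major and minor numbers printed here vary from system to system.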

0:10:43.670,0:10:48.540
We will also talk about pseudo terminals and
job control

0:10:48.540,0:10:53.330
this is about as interesting as watching the
grass grow but unfortunately it's

0:10:53.330,0:10:55.490
a major component of the system

0:10:55.490,0:10:59.520
and especially people that deal with system
administration have to know far more about

0:10:59.520,0:11:06.520
this than they probably ever thought they
wanted to.

0:11:06.900,0:11:11.430
Okay we will then continue in Week 5 with
the kernel I/O structure.

0:11:11.430,0:11:16.090
We will start with multiplexing of I/O. The
kernel of course has done this

0:11:16.090,0:11:17.360
always

0:11:17.360,0:11:22.110
but we're really talking more about how do
we export I/O multiplexing to

0:11:22.110,0:11:25.970
user applications.
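
A sketch of I/O multiplexing as a user application sees it, assuming Python's `select` and `socket` modules. The names `first_ready`, `quiet` and `busy` are made up for illustration. The program hands the kernel two descriptors and asks which of them can be read without blocking.

```python
import select
import socket


def first_ready():
    """Wait on two socket pairs and report which one has data to read."""
    quiet_a, quiet_b = socket.socketpair()
    busy_a, busy_b = socket.socketpair()
    try:
        busy_b.sendall(b"ping")  # only the "busy" pair has data pending
        readable, _, _ = select.select([quiet_a, busy_a], [], [], 1.0)
        names = ["busy" if s is busy_a else "quiet" for s in readable]
        return names, busy_a.recv(4)
    finally:
        for s in (quiet_a, quiet_b, busy_a, busy_b):
            s.close()


ready, data = first_ready()
print(ready, data)
```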

0:11:25.970,0:11:29.250
We will then move into auto configuration strategy

0:11:29.250,0:11:31.370
Auto configuration

0:11:31.370,0:11:32.770
is what happens

0:11:32.770,0:11:36.619
typically or historically I guess you
could say as the system boots.

0:11:36.619,0:11:39.500
so all that stuff that comes out about

0:11:39.500,0:11:40.810
what

0:11:40.810,0:11:43.550
hardware is on the machine and how it's all
interconnected

0:11:43.550,0:11:47.350
all of that is tied up in auto configuration

0:11:47.350,0:11:50.040
and that used to happen just once at boot

0:11:50.040,0:11:52.000
but in modern systems today

0:11:52.000,0:11:55.839
it's an ongoing process. It happens at boot
but it also happens

0:11:55.839,0:12:00.550
anytime you plug in a new I/O device, a
PCMCIA card,

0:12:00.550,0:12:03.680
or you remove a disk or you put in a new disk

0:12:03.680,0:12:07.010
or any sort of activity that changes the I/O

0:12:07.010,0:12:08.360
structure of the machine

0:12:08.360,0:12:10.870
auto configuration has to get fired back up

0:12:10.870,0:12:13.050
and figure out what's disappeared

0:12:13.050,0:12:18.330
and clean up and figure out what new has arrived
and configure it in.

0:12:18.330,0:12:19.320
and then we'll talk

0:12:19.320,0:12:23.870
a little bit about the configuration of the
device driver

0:12:23.870,0:12:27.390
this actually gets into an area that 

0:12:27.390,0:12:28.660
is

0:12:28.660,0:12:33.440
one well let me just give it as a bit
of advice to the class especially those of

0:12:33.440,0:12:36.780
you who work in system administration.

0:12:36.780,0:12:42.010
You really want to be careful that
you don't learn too much about device drivers

0:12:42.010,0:12:44.670
because there are really these three things that

0:12:44.670,0:12:48.580
it's not good to learn about and if you do
learn about it it's really good to keep it

0:12:48.580,0:12:49.740
to yourself

0:12:49.740,0:12:51.949
because if you become an expert or

0:12:51.949,0:12:54.960
viewed as an expert in any of these areas

0:12:54.960,0:12:59.370
you will become the designated stuckee for
that at your site and you'll never get to do

0:12:59.370,0:13:01.760
anything

0:13:01.760,0:13:02.610
but that

0:13:02.610,0:13:07.360
so the three things that I highly
recommend not learning very much about are

0:13:07.360,0:13:09.060
device drivers,

0:13:09.060,0:13:12.320
sendmail configuration files

0:13:12.320,0:13:13.970
or anything having to do

0:13:13.970,0:13:19.350
with LDAP or anything in
that general domain

0:13:19.350,0:13:22.660
because as I say 

0:13:22.660,0:13:24.900
that will become your life's work

0:13:24.900,0:13:25.920
and

0:13:25.920,0:13:32.920
there's other things that you might find more interesting.
"Do you have a question?"

0:13:33.870,0:13:36.659
so one of my students empathizes with my point

0:13:36.659,0:13:39.640
I believe you said you worked on that mail
system

0:13:39.640,0:13:43.120
so you might know something about
Sendmail configuration files but you don't

0:13:43.120,0:13:47.850
have to answer that

0:13:47.850,0:13:52.100
okay so we're going to talk about what a device
driver does and really just sort of the entry

0:13:52.100,0:13:53.170
points to it

0:13:53.170,0:13:57.180
but we're not going to talk about how you
write such a thing, how you debug such a thing

0:13:57.180,0:14:01.490
or much of anything about it. I actually used
to teach an entire class believe it or not

0:14:01.490,0:14:02.720
about device drivers

0:14:02.720,0:14:05.849
but then I realized the error of my ways and I have
since

0:14:05.849,0:14:12.580
 gone through and made a point of forgetting
every slide in that talk.

0:14:12.580,0:14:16.860
okay so then we will move on to the filesystem

0:14:16.860,0:14:21.540
and as always we'll start at the high level
talk about the interface what is it that is

0:14:21.540,0:14:23.020
exported out of the system

0:14:23.020,0:14:27.840
and then we will start diving down in to see
how do we go about implementing that

0:14:27.840,0:14:29.010
so

0:14:29.010,0:14:31.010
we'll start with the

0:14:31.010,0:14:32.560
so called 

0:14:32.560,0:14:33.680
Block I/O system

0:14:33.680,0:14:36.140
it's historically been called the buffer
cache

0:14:36.140,0:14:38.590
and you still hear it called that periodically

0:14:38.590,0:14:42.720
and the fact of the matter is that there isn't really
a buffer cache anymore, there is just one big

0:14:42.720,0:14:44.620
cache. It's the VM cache

0:14:44.620,0:14:47.810
and the Filesystem has a view into it
and

0:14:47.810,0:14:50.829
the processes have a view into it but at
the end of the day

0:14:50.829,0:14:54.660
you really don't want the same information
on two different

0:14:54.660,0:14:56.030
pages of memory

0:14:56.030,0:14:59.390
because that just leads to trouble.

0:14:59.390,0:15:03.390
But Filesystems think they have buffers and so
there's this maneuver where we make

0:15:03.390,0:15:06.149
these things that look like what historically
were buffers

0:15:06.149,0:15:08.830
that really just map into the VM system

0:15:08.830,0:15:11.720
but they're still managed in the way that
they have been

0:15:11.720,0:15:15.020
managed historically

0:15:15.020,0:15:20.670
okay We will then get down into Filesystem implementation
the local file system if you will

0:15:20.670,0:15:23.400
and into also

0:15:23.400,0:15:25.730
soft updates and snapshots.

0:15:25.730,0:15:26.440
 this

0:15:26.440,0:15:31.100
for the time being is something that you see
only in FreeBSD

0:15:31.100,0:15:35.310
the alternative to soft updates is journalling
which is more commonly used

0:15:35.310,0:15:39.630
for example it is what is used by ext3

0:15:39.630,0:15:41.179
and so i'll go through soft updates and

0:15:41.179,0:15:45.260
a lot of the issues in soft updates are the
same issues that you have to deal with in journalling

0:15:45.260,0:15:48.370
what is it that we're protecting and how do we
go about doing that

0:15:48.370,0:15:51.150
and the difference is in the detail.

0:15:51.150,0:15:54.630
There is actually a paper in the back of your
notes if this is something that interests

0:15:54.630,0:15:55.240
you

0:15:55.240,0:15:59.930
it's a comparison of journalling versus 
soft updates that was done

0:15:59.930,0:16:02.120
about five or eight years ago.

0:16:02.120,0:16:08.460
and not to spoil the punch line but the answer is
they both work about the same

0:16:08.460,0:16:12.500
Okay snapshots again is something that
if

0:16:12.500,0:16:15.920
you've worked with things like the Network
Appliance box you're probably quite

0:16:15.920,0:16:19.640
aware of what snapshots are and how they do
or don't work for you

0:16:19.640,0:16:21.959
this is the same functionality

0:16:21.959,0:16:27.380
in the Filesystem implemented in a
somewhat different way

0:16:27.380,0:16:28.449
okay so this

0:16:28.449,0:16:31.940
Week 6 is really going to be the local
file system

0:16:31.940,0:16:34.750
the disk connected to the machine
that we are dealing with.

0:16:34.750,0:16:39.140
Week 7 then we get into multiple 
Filesystem support so how do we abstract out that

0:16:39.140,0:16:41.190
Filesystem layer

0:16:41.190,0:16:46.430
and support Multiple Filesystems at the
same time so for example in FreeBSD

0:16:46.430,0:16:50.199
you can of course run with the traditional
Fast Filesystem

0:16:50.199,0:16:54.540
but if you happen to like the Linux Filesystem 
better or you have to share a disk

0:16:54.540,0:16:55.690
with a Linux machine

0:16:55.690,0:16:58.310
you can run the ext2 or ext3

0:16:58.310,0:17:01.020
and it will perfectly happily do that

0:17:01.020,0:17:01.620
so

0:17:01.620,0:17:05.589
we will have to look then at how do we provide an
interface so that we can plug in all these different

0:17:05.589,0:17:09.260
Filesystems that we want to support

0:17:09.260,0:17:12.250
another area of which there's been a great

0:17:12.250,0:17:15.309
deal of growth at least in code complexity


0:17:15.309,0:17:17.840
is so-called Volume Management

0:17:17.840,0:17:19.370
so in the

0:17:19.370,0:17:24.480
good old days a Filesystem lived on a disk or
piece of disk and that was that

0:17:24.480,0:17:26.130
but in this day and age 

0:17:26.130,0:17:31.150
that won't do any more so we aggregate disks
together by striping them or RAID 

0:17:31.150,0:17:31.980
arraying them

0:17:31.980,0:17:33.380
or various other things

0:17:33.380,0:17:39.210
and we need a whole layer in the system just to
manage those disks
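
To make striping concrete, here is a sketch of the simplest possible layout, assuming plain round-robin striping with no parity. The function name `locate` and its parameters are hypothetical and are not taken from any real volume manager. It maps a logical block number to a disk and a block on that disk.

```python
def locate(block, ndisks, stripe_blocks):
    """Map a logical block to (disk index, block on that disk)."""
    # Which stripe unit the block falls in, and how far into that unit.
    stripe, offset = divmod(block, stripe_blocks)
    disk = stripe % ndisks   # stripe units rotate across the disks
    row = stripe // ndisks   # full rows of stripe units already laid down
    return disk, row * stripe_blocks + offset


# Three disks, four blocks per stripe unit.
for logical in (0, 4, 13):
    print(logical, locate(logical, ndisks=3, stripe_blocks=4))
```

A real volume layer also has to cope with failed disks, resizing and metadata, which is where the code complexity comes from.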

0:17:39.210,0:17:44.280
we'll then get to, as an example of an alternative
Filesystem we're going to talk about the

0:17:44.280,0:17:46.530
Network Filesystem or NFS

0:17:46.530,0:17:48.500
but that's not because this is

0:17:48.500,0:17:51.090
the world's best remote file system

0:17:51.090,0:17:55.240
or the cleanest design or any of the
properties you might hope that

0:17:55.240,0:17:57.049
such a class as this one would have

0:17:57.049,0:17:58.600
but it's ubiquitous

0:17:58.600,0:18:00.210
very widely used

0:18:00.210,0:18:01.350
and

0:18:01.350,0:18:06.850
so we're going to talk about that one

0:18:06.850,0:18:07.740
okay we'll

0:18:07.740,0:18:10.970
then once again switch gears in Week 8

0:18:10.970,0:18:17.120
and turn our attention to networking and
Interprocess communication

0:18:17.120,0:18:18.200
and

0:18:18.200,0:18:23.210
again we'll start from the very top so we'll
go through, we'll go with concepts, the terminology

0:18:23.210,0:18:24.450
that gets used

0:18:24.450,0:18:30.230
and what's the difference between domain
based addressing and an address domain you know

0:18:30.230,0:18:30.910
we'll go through

0:18:30.910,0:18:34.910
 what the basic IPC services are,

0:18:34.910,0:18:39.080
essentially what are all the system calls that
have anything to do with networking

0:18:39.080,0:18:40.590
and

0:18:40.590,0:18:43.720
just sort of describe what each of them are
and I'm going to go through

0:18:43.720,0:18:45.830
a somewhat contrived example

0:18:45.830,0:18:49.840
that makes use of every one of those interfaces

0:18:49.840,0:18:52.860
and just to sort of show how they all connect
together

0:18:52.860,0:18:54.169
and for those of you that work

0:18:54.169,0:18:57.400
in networking or have done any kind of network
programming

0:18:57.400,0:19:00.480
if you're looking for a week to miss then
Week 8 is the one to miss 'cause that's

0:19:00.480,0:19:02.780
the sort of most basic

0:19:02.780,0:19:04.210
lecture that I'm going to give

0:19:04.210,0:19:07.910
If you are not sure whether or not you need to
go through that, there is

0:19:07.910,0:19:09.540
one of the papers in the back

0:19:09.540,0:19:12.620
it is an introduction to Interprocess communication

0:19:12.620,0:19:18.279
read that paper if you say yeah yeah yeah
yeah yeah you are done with Week 8.

0:19:18.279,0:19:20.590
on the other hand if you don't come to Week
8

0:19:20.590,0:19:22.790
and then in Week 9 I say

0:19:22.790,0:19:26.860
I call on you and say alright what is it

0:19:26.860,0:19:30.560
that the listen system call does and you
can't tell me

0:19:30.560,0:19:32.610
you're gonna get a demerit
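
For reference, a sketch of what the listen call does, assuming Python's `socket` module and a loopback address. The helper name `handshake_before_accept` and the backlog of 5 are made up for illustration. After `listen`, the kernel queues incoming connections on the socket, so a client can connect and send data before the server has called `accept`.

```python
import socket


def handshake_before_accept():
    """Show that a connection completes once the server is listening."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
        server.listen(5)               # mark as passive, queue up to 5 connections
        server.settimeout(2.0)
        port = server.getsockname()[1]

        # The client connects and sends before accept() is ever called.
        client = socket.create_connection(("127.0.0.1", port), timeout=2.0)
        try:
            client.sendall(b"hi")
            conn, _peer = server.accept()  # pull the queued connection off
            with conn:
                return conn.recv(16)
        finally:
            client.close()
    finally:
        server.close()


received = handshake_before_accept()
print(received)
```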

0:19:32.610,0:19:34.340
okay

0:19:34.340,0:19:37.770
then in Week 9 we will get into the actual 

0:19:37.770,0:19:41.419
networking implementation itself, we go
through system layers as we did

0:19:41.419,0:19:43.310
in all the other areas

0:19:43.310,0:19:44.130
and

0:19:44.130,0:19:48.330
we will spend a significant portion of that
class talking about routing

0:19:48.330,0:19:50.230
routing

0:19:50.230,0:19:53.610
for those of you that haven't had the pleasure
of dealing with it

0:19:53.610,0:19:55.540
is a black art

0:19:55.540,0:19:58.050
or at least a dark science

0:19:58.050,0:19:59.170
and

0:19:59.170,0:19:59.930
so

0:19:59.930,0:20:02.490
we'll talk about it

0:20:02.490,0:20:06.270
from the perspective first of all of what
do we do locally within the machine

0:20:06.270,0:20:10.090
and then what are some of the bigger strategies
that we can use for doing routing

0:20:10.090,0:20:11.910
enterprise 

0:20:11.910,0:20:14.840
wide routing or

0:20:14.840,0:20:20.190
area wide routing something like throughout the
state of California or throughout the US whatever

0:20:20.190,0:20:25.379
this again like device drivers is really
just sort of a nickel 

0:20:25.379,0:20:26.480
tour through the 

0:20:27.800,0:20:31.820
what the choices are, what the basic
strategies are that are used

0:20:31.820,0:20:33.989
If you're thinking you're going to walk out
of here

0:20:33.989,0:20:36.110
knowing how to set up routing well sorry

0:20:36.110,0:20:38.430
we are not going to get that far

0:20:38.430,0:20:41.559
but you should at least have a pretty good idea
of what the issues are

0:20:41.559,0:20:44.430
and what the general solutions are

0:20:44.430,0:20:48.950
okay then finally in Week 10 well not finally
but next few weeks and

0:20:48.950,0:20:52.380
we will go through the Internet Protocols

0:20:52.380,0:20:54.320
primarily TCP/IP

0:20:54.320,0:20:56.560
and this is

0:20:56.560,0:20:58.809
what are the algorithms that are used

0:20:58.809,0:21:01.030
and I'm putting a particular emphasis

0:21:01.030,0:21:03.050
for this particular class

0:21:03.050,0:21:05.080
on

0:21:05.080,0:21:07.730
changes that have been made in the protocols

0:21:07.730,0:21:14.310
to deal with a lot of the sort of attacks that
we've been seeing the SYN attacks and

0:21:14.310,0:21:16.880
that sort of thing

0:21:16.880,0:21:19.440
rather than just a straight

0:21:19.440,0:21:22.440
iteration of what the actual protocols
are

0:21:22.440,0:21:24.940
I'll talk primarily about IPv4

0:21:24.940,0:21:31.940
but I will also try and talk a bit about
IPv6 as well

0:21:33.510,0:21:35.850
all right so the first ten weeks are

0:21:35.850,0:21:38.100
sort of the kernel course

0:21:38.100,0:21:40.800
now we tack two weeks on at the end

0:21:40.800,0:21:42.010
to talk about

0:21:42.010,0:21:43.990
sort of the bigger picture of

0:21:43.990,0:21:48.240
system tuning, crash dump analysis, that level of
thing

0:21:48.240,0:21:52.940
The idea is to really consolidate what
we figured out or talked about in the first

0:21:52.940,0:21:54.710
ten weeks and

0:21:54.710,0:21:58.760
how that applies to tools that we have available
to us to

0:21:58.760,0:22:00.760
look at what the system is doing,

0:22:00.760,0:22:02.649
 analyze what the system is doing

0:22:02.649,0:22:03.650
and hopefully

0:22:03.650,0:22:04.720
improve

0:22:04.720,0:22:07.130
the performance of what the system is doing

0:22:07.130,0:22:07.750
and

0:22:07.750,0:22:12.169
for the most part the kind of tuning that I'm
talking about is not

0:22:12.169,0:22:14.740
going in and hack hack hacking your kernel 

0:22:14.740,0:22:16.510
because the fact of the matter is

0:22:16.510,0:22:18.600
most of the time you can't do that anyway

0:22:18.600,0:22:22.340
so it's more looking at it from the perspective
of saying

0:22:22.340,0:22:26.390
is this system running badly because it doesn't
have enough memory on it?

0:22:26.390,0:22:29.470
or is it running badly because there isn't enough
I/O capacity?

0:22:29.470,0:22:33.549
or is it running badly because it's got
enough I/O capacity but

0:22:33.549,0:22:35.940
certain drives are being overloaded

0:22:35.940,0:22:37.309
or is it

0:22:37.309,0:22:42.220
being overrun because we're simply trying
to do too much on this machine? etc.

0:22:42.220,0:22:45.440
so that's the sort of level of thing that we're
looking at

0:22:45.440,0:22:47.080
but tied into

0:22:47.080,0:22:52.130
a lot of concepts that we talked about before so we can talk
about active virtual memory

0:22:52.130,0:22:53.710
and what that means

0:22:53.710,0:22:55.120
and

0:22:55.120,0:22:58.750
essentially measure what it is and hopefully
then you will understand in the context of what

0:22:58.750,0:23:00.690
we talked about in the VM section

0:23:00.690,0:23:03.990
what that really means

0:23:03.990,0:23:07.460
the Crash dump analysis is one of these
topics that

0:23:07.460,0:23:08.730
you are gonna love or hate

0:23:08.730,0:23:12.530
if you actually have to deal with crash
dumps

0:23:12.530,0:23:13.679
people find it invaluable

0:23:13.679,0:23:15.580
and if you don't have to deal with Crash dumps

0:23:15.580,0:23:18.790
it's an incredible mass of boring detail

0:23:18.790,0:23:23.240
the only good part of it is that the
whole session is only about an hour long

0:23:23.240,0:23:25.529
If it interests you, listen closely

0:23:25.529,0:23:28.950
and if it bores you, well, it's only an hour long

0:23:28.950,0:23:32.880
okay lastly we'll talk a little bit about
security issues

0:23:32.880,0:23:36.250
again this is really more to the tools that
are available

0:23:36.250,0:23:40.750
to deal with security stuff as opposed to a
complete tutorial on

0:23:40.750,0:23:45.120
how to implement security so those of you
that deal with security

0:23:45.120,0:23:48.400
this is just gonna be sort of security one oh
one

0:23:48.400,0:23:50.029
for those of you

0:23:50.029,0:23:51.500
that have but

0:23:51.500,0:23:54.399
you'll have to deal with it but haven't really
thought about it

0:23:54.399,0:23:58.549
it'll probably scare you to death and
you wonder how to keep the machines from

0:23:58.549,0:24:02.840
being hijacked every day

0:24:02.840,0:24:08.030
Okay so that's in essence what we're going
to try and do here

0:24:08.030,0:24:15.030
anybody have any comments, questions, thoughts?
No? All right well.

0:24:16.130,0:24:17.840
Let's get started

0:24:17.840,0:24:22.180
we will begin on page fifteen with an
overview of the kernel.

0:24:22.180,0:24:26.040
Hopefully nobody's lost yet.

0:24:26.040,0:24:29.310
What's a kernel? All right.

0:24:29.310,0:24:31.370
so starting at the very top

0:24:31.370,0:24:33.070
the big broad brush

0:24:33.070,0:24:35.140
what we have is

0:24:35.140,0:24:38.330
a UNIX virtual machine and

0:24:38.330,0:24:41.660
virtual machines are actually something
that has been around

0:24:41.660,0:24:44.539
as a concept since the sixties

0:24:44.539,0:24:48.919
the difference is really just sort of the level
of the interface that people have dealt with

0:24:48.919,0:24:51.360
when they talk about virtual machines

0:24:51.360,0:24:53.610
in the 1960s

0:24:53.610,0:24:56.770
computers were these enormous things you would
have

0:24:56.770,0:24:58.870
your computer room would be something that'd be

0:24:58.870,0:25:01.909
three times the size of this conference
room if you had

0:25:01.909,0:25:03.230
a computer

0:25:03.230,0:25:05.530
the computer itself was

0:25:05.530,0:25:07.840
as tall as a refrigerator-freezer

0:25:07.840,0:25:08.950
imagine

0:25:08.950,0:25:13.909
five or eight or ten of these units
side by side that itself made up the computer

0:25:13.909,0:25:16.080
that would be one big 

0:25:16.080,0:25:20.030
for the core processor and one which
would be the floating point unit and several

0:25:20.030,0:25:24.080
of them that would be the memory, the core memory,
literally the core memory

0:25:24.080,0:25:29.110
and then there'd be other rows of these
disk drives which were about the size of a washing

0:25:29.110,0:25:29.660
machine

0:25:29.660,0:25:34.169
and then behind that since you couldn't store
everything on disks so

0:25:34.169,0:25:36.300
then you had rows of tape drives

0:25:36.300,0:25:37.880
and then you had this little

0:25:37.880,0:25:39.610
set of sort of

0:25:39.610,0:25:43.330
munchkins that would run around and tend
to the machine and they'd mount tapes and take

0:25:43.330,0:25:46.710
off tapes and mount disk packs and remove disk packs
because 

0:25:46.710,0:25:49.760
the drives themselves were very expensive and
so

0:25:49.760,0:25:53.110
you wouldn't have, as we have today,


0:25:53.110,0:25:56.090
one spindle that was dedicated just to one set
of platters

0:25:56.090,0:25:57.130
you could take out a 

0:25:57.130,0:25:59.460
set of platters and put in another

0:25:59.460,0:26:02.540
hundred-megabyte set of platters and these are
platters that are

0:26:02.540,0:26:05.280
this big around and it's like six or eight
of them and

0:26:05.280,0:26:09.140
giant head assemblies that come rumbling in and
out

0:26:09.140,0:26:12.440
anyway one of these giant giant machines

0:26:12.440,0:26:17.380
that cost many millions of dollars would run
at about ten

0:26:17.380,0:26:21.120
million instructions per second, 10 MIPS

0:26:21.120,0:26:21.630
and 10 MIPS

0:26:21.630,0:26:28.330
was more computing power than anybody
could possibly imagine using in a single application

0:26:28.330,0:26:28.880
just

0:26:28.880,0:26:31.050
by contrast you know this

0:26:31.050,0:26:34.070
four-year-old laptop here is probably on
the order of

0:26:34.070,0:26:36.440
one or two hundred MIPS

0:26:36.440,0:26:37.140
but anyway

0:26:37.140,0:26:40.760
people couldn't really view what we would
do with a lot of computing power

0:26:40.760,0:26:44.640
and the other thing was that you didn't have
a notion of sort of an operating system that had

0:26:44.640,0:26:45.890
applications running on it

0:26:45.890,0:26:46.760
because

0:26:46.760,0:26:50.160
everybody wanted to write straight to
the raw hardware

0:26:50.160,0:26:51.750
and so

0:26:51.750,0:26:55.900
what IBM, who was a big manufacturer
of machines in those days,

0:26:55.900,0:26:59.060
did was they came up with this thing called
VM

0:26:59.060,0:27:00.770
and this was a little

0:27:00.770,0:27:02.549
you wouldn't call it an operating system really

0:27:02.549,0:27:05.130
but what it did is it cloned

0:27:05.130,0:27:09.270
independent copies of the machine that worked just
like the original machines so you could boot

0:27:09.270,0:27:11.769
something that you thought of as an operating
system

0:27:11.769,0:27:13.380
on top of VM

0:27:13.380,0:27:16.750
so you'd take one of these ten-MIPS machines and
it would clone

0:27:16.750,0:27:20.050
six identical one-MIPS copies

0:27:20.050,0:27:22.030
and then you could boot

0:27:22.030,0:27:24.700
whatever you wanted on each one of those machines
so

0:27:24.700,0:27:29.510
if you were doing database stuff you would boot your
database because the database ran on the raw hardware

0:27:29.510,0:27:32.920
or if you're doing payroll you would boot up the payroll
program

0:27:32.920,0:27:37.950
or if you actually tried to service 
users you could boot a time sharing batch thing

0:27:37.950,0:27:40.790
that would read card images and print
stuff out

0:27:40.790,0:27:44.460
or they even had TSO the Time Sharing
Option where you could interactively sit

0:27:44.460,0:27:45.559
and type and send

0:27:45.559,0:27:47.560
stuff in and get answers back

0:27:47.560,0:27:48.570
 and

0:27:48.570,0:27:51.429
also you could boot TSO so whatever set
of

0:27:51.429,0:27:52.219


0:27:52.219,0:27:55.339
things you need you could boot them and they ran
independently as if they were running on their

0:27:55.339,0:27:56.470
own machine

0:27:56.470,0:28:03.150
but all VM did was give you an exact
raw copy of the hardware

0:28:03.150,0:28:04.529
so when UNIX came along

0:28:04.529,0:28:07.350
they sort of liked the notion of

0:28:07.350,0:28:11.509
providing the concept of independent
things that you could operate in

0:28:11.509,0:28:13.610
but they wanted it at a higher level

0:28:13.610,0:28:15.610
so you're looking really to do it

0:28:15.610,0:28:17.480
instead of at the raw hardware level

0:28:17.480,0:28:19.679
to do it at a process level

0:28:19.679,0:28:23.799
and the idea then was that the interface you
would program to would be what we think of as

0:28:23.799,0:28:26.090
a system call interface today

0:28:26.090,0:28:27.849
and the idea then was that

0:28:27.849,0:28:30.740
you would be given a process or set of processes

0:28:30.740,0:28:34.990
and those were independent. your process
couldn't affect

0:28:34.990,0:28:38.830
the address space of another process. You couldn't reach
over and mess around with their addresses,

0:28:38.830,0:28:41.030
you couldn't mess around with their I/O
channels

0:28:41.030,0:28:43.179
you could slow them down by

0:28:43.179,0:28:44.299
being a pig but

0:28:44.299,0:28:47.980
that was about the only way that you could affect
other processes

0:28:47.980,0:28:48.480
and

0:28:48.480,0:28:49.830
so

0:28:49.830,0:28:52.669
the interface that they had there

0:28:52.669,0:28:58.660
was one that had these characteristics:
it had a paged virtual address space

0:28:58.660,0:29:02.980
so you didn't have to know as in the old days how much physical
memory was on the machine and make your application

0:29:02.980,0:29:04.740
fit into that amount of memory

0:29:04.740,0:29:07.950
you just had what looked like a large

0:29:07.950,0:29:11.710
uniform address space even if the underlying
hardware had segments or some other

0:29:11.710,0:29:13.580
hardware brain damage

0:29:13.580,0:29:17.390
it looked to you like you just had a big uniform
address space and 

0:29:17.390,0:29:21.070
the size of your address space was independent
of the amount of memory that was on your machine

0:29:21.070,0:29:23.900
your address space could be bigger than the amount of
physical memory

0:29:23.900,0:29:26.499
'cause we sort of move pages around underneath

0:29:26.499,0:29:29.320
whatever part of the address space was actually
active

0:29:29.320,0:29:34.260
and there's obviously limits to this if
you are trying to run a one-gigabyte

0:29:34.260,0:29:35.630
application on top of

0:29:35.630,0:29:37.240
ten megabytes of memory

0:29:37.240,0:29:40.880
it's probably going to bring new meaning to
same day service

0:29:40.880,0:29:45.519
but if you're willing to wait long enough it
will eventually move the pages around and you will

0:29:45.519,0:29:49.740
progress through getting your application run

0:29:49.740,0:29:53.890
another thing was dealing with software
interrupts

0:29:53.890,0:29:55.789
in the old days

0:29:55.789,0:29:58.749
you had to understand how the hardware worked

0:29:58.749,0:30:03.900
in order to deal with exceptional conditions
so for example if you did a divide by zero

0:30:03.900,0:30:08.170
the hardware would jump through some
vector location or

0:30:08.170,0:30:08.630
something

0:30:08.630,0:30:12.799
and you had to know how that worked and make
sure that you had your program

0:30:12.799,0:30:16.510
usually some little bit of assembly language 
set up to deal with that

0:30:16.510,0:30:19.870
and UNIX said let's get away
from the hardware here

0:30:19.870,0:30:22.080
and so they did this thing called signals

0:30:22.080,0:30:25.700
and so they just defined a set of signals so that
if you do a divide by zero

0:30:25.700,0:30:29.529
you simply register a routine you
want to have called you don't have to know

0:30:29.529,0:30:31.220
how the hardware figured it out

0:30:31.220,0:30:36.740
you just know that that routine is going to get
called and you can deal with it at that point
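
A minimal sketch of the signal idea just described, written in Python because its signal module is a thin wrapper over the UNIX facility. The names `received`, `on_signal` and `demo_signal` are illustrative and not from the lecture, and SIGUSR1 stands in for divide by zero because Python reports that case as an exception rather than a signal.

```python
import os
import signal

received = []  # signal numbers seen by the handler (illustrative name)


def on_signal(signum, frame):
    # The routine we registered. It is simply called when the signal
    # arrives; nothing here depends on how the hardware noticed the event.
    received.append(signum)


def demo_signal():
    # Register the routine we want called for SIGUSR1.
    signal.signal(signal.SIGUSR1, on_signal)
    # Send the signal to ourselves, standing in for an exceptional event.
    os.kill(os.getpid(), signal.SIGUSR1)
    return list(received)
```

Calling `demo_signal()` registers the routine, raises the event, and returns the list of signal numbers the routine was handed.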

0:30:36.740,0:30:40.960
we've got a set of timers and counters to keep
track of what we're doing, this is really more

0:30:40.960,0:30:43.490
for accounting than anything else but

0:30:43.490,0:30:46.970
applications may want to have access to that.

0:30:46.970,0:30:51.720
we have a set of identifiers that we're
going to use for things like accounting,

0:30:51.720,0:30:54.830
protection and scheduling and so on

0:30:54.830,0:30:55.820
and one of the

0:30:55.820,0:31:00.320
early philosophies of UNIX was to try
and keep it simple.

0:31:00.320,0:31:02.630
operating systems have gotten very baroque

0:31:02.630,0:31:04.490
in particular the thing that

0:31:04.490,0:31:07.350
predated UNIX was a thing called
Multics

0:31:07.350,0:31:12.820
Multics was a joint project between
Honeywell, a big computer manufacturer of the

0:31:12.820,0:31:15.740
time 

0:31:15.740,0:31:17.129
AT&T Bell Laboratories

0:31:17.129,0:31:19.750
the big industrial laboratory at that time

0:31:19.750,0:31:21.380
and MIT

0:31:21.380,0:31:23.430
a big university then and

0:31:23.430,0:31:24.690
still today

0:31:24.690,0:31:29.259
and those three organizations got
together to try and build this 

0:31:29.259,0:31:31.400
time sharing operating system

0:31:31.400,0:31:32.280
and it

0:31:32.280,0:31:33.770
it just got bigger and

0:31:33.770,0:31:37.160
more grandiose and more complex and never
finished

0:31:37.160,0:31:38.979
because as soon as they sort of see

0:31:38.979,0:31:42.709
oh we know how to do that but we could
do this other thing too and so then they would tear it

0:31:42.709,0:31:43.429
apart and

0:31:43.429,0:31:46.440
they never really got to something that

0:31:46.440,0:31:48.210
could be put into production

0:31:48.210,0:31:49.919
and so the

0:31:49.919,0:31:50.570
AT&T

0:31:50.570,0:31:54.340
Bell Laboratories decided to pull out of
that project

0:31:54.340,0:31:55.940
and

0:31:55.940,0:32:00.000
two of the people that had been working on
that project, Ken Thompson and Dennis Ritchie

0:32:00.000,0:32:04.390
were sort of bummed because they were now
back to typing cards and putting them through

0:32:04.390,0:32:05.259
card readers and

0:32:05.259,0:32:07.960
they had gotten used to the idea that you could
actually

0:32:07.960,0:32:11.559
sit at an ASR-33 teletype and interact
with your computer

0:32:11.559,0:32:13.440
and so

0:32:13.440,0:32:18.230
they found an old PDP-7 sitting off in
the corner that had been abandoned

0:32:18.230,0:32:22.120
and started working on this little tiny operating
system which they called UNIX

0:32:22.120,0:32:26.549
which eventually moved to the PDP-11 and
became what we have today

0:32:26.549,0:32:28.050
but because it was

0:32:28.050,0:32:32.120
they were coming first of all from Multics
where everything had been done and 

0:32:32.120,0:32:34.110
in great grandiose detail

0:32:34.110,0:32:37.549
and because there fundamentally were two
 of them working on it and they wanted to get something

0:32:37.549,0:32:38.370
done and

0:32:38.370,0:32:40.130
within a year or so

0:32:40.130,0:32:41.529
one of their philosophies was

0:32:41.529,0:32:44.099
let's find the one way of doing things

0:32:44.099,0:32:48.180
let's not have eight ways from Sunday let's just
get the one way

0:32:48.180,0:32:53.860
and that's what we will provide. So what is
the sort of core set of things that we need.

0:32:53.860,0:32:58.620
well first thing is when it comes to identifiers,
let's not have you know

0:32:58.620,0:33:00.430
eighty thousand different identifiers

0:33:00.430,0:33:03.140
so they came up with process identifiers,

0:33:03.140,0:33:09.620
user identifiers and at that time a single group
identifier and later expanded

0:33:09.620,0:33:14.200
and they used that set of identifiers for everything
so it's used for accounting, used for making

0:33:14.200,0:33:17.410
protection decisions, used for scheduling
decisions
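
As a small illustration of how few identifiers are involved, the sketch below asks the kernel for the ones attached to the calling process. The function name `process_identity` is made up for this example; the `os` calls are the standard Python wrappers for the corresponding system calls.

```python
import os


def process_identity():
    """Return the identifiers the kernel keeps for the calling process."""
    return {
        "pid": os.getpid(),        # process identifier
        "uid": os.getuid(),        # user identifier
        "gid": os.getgid(),        # the original single group identifier
        "groups": os.getgroups(),  # the later, expanded group list
    }
```

The same handful of numbers feeds accounting, protection checks and scheduling.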

0:33:17.410,0:33:19.470
and

0:33:19.470,0:33:24.279
again it was the simplicity of the thing which
was what was driving their decision

0:33:24.279,0:33:28.840
but there are really sort of two key ideas
that they had

0:33:28.840,0:33:30.880
that really made the difference that

0:33:30.880,0:33:32.539
that's what set them apart

0:33:32.539,0:33:34.749
from what everybody else had done before them

0:33:34.749,0:33:35.450
and which

0:33:35.450,0:33:39.740
in retrospect is something that has been pervasive
more or less ever since

0:33:39.740,0:33:41.869
the first of these was the notion

0:33:41.869,0:33:44.840
that we have a uniform descriptor space

0:33:44.840,0:33:46.289
that is

0:33:46.289,0:33:51.250
given a descriptor it can reference
any I/O device

0:33:51.250,0:33:53.650
or even any kind of I/O channel

0:33:53.650,0:33:58.270
so you can have a descriptor for a terminal
or a descriptor for a file or a descriptor for

0:33:58.270,0:34:02.240
a disk or a descriptor for a pipe or a descriptor
for a socket

0:34:02.240,0:34:03.500
and

0:34:03.500,0:34:04.790
you don't need to know

0:34:04.790,0:34:07.940
what it references in order to be able to read
and write that thing

0:34:07.940,0:34:11.290
so if i hand you a descriptor 
you can read from that descriptor or you can write

0:34:11.290,0:34:13.259
to that descriptor

0:34:13.259,0:34:15.189
and

0:34:15.189,0:34:17.359
the correct thing will happen

0:34:17.359,0:34:19.089
and you'd say well

0:34:19.089,0:34:23.629
that's so obvious I mean how else could you
possibly think of doing it?

0:34:23.629,0:34:25.179
well predating UNIX

0:34:25.179,0:34:28.059
everything was done with

0:34:28.059,0:34:29.379
a little subsystem

0:34:29.379,0:34:33.419
that would open a file, read a file, write a
file, close a file

0:34:33.419,0:34:37.429
and there was another set of system calls which
would open a terminal, read a terminal, write a terminal,

0:34:37.429,0:34:38.089
close a terminal

0:34:38.089,0:34:39.210
and yet another one

0:34:39.210,0:34:42.409
which was create a pipe, read a pipe,
write a pipe and so on.

0:34:42.409,0:34:47.699
so if you are just a drop dead stupid
program like say cat

0:34:47.699,0:34:51.579
you would have to have code in there to say was
my input a terminal in which case I need to

0:34:51.579,0:34:53.159
use read terminal

0:34:53.159,0:34:57.419
or is it a file in which case I need
to use read file or is it a pipe in which case

0:34:57.419,0:34:59.189
I need to use read pipe

0:34:59.189,0:35:01.860
and so the program itself had to have all
this

0:35:01.860,0:35:02.859
coding in it

0:35:02.859,0:35:04.409
whereas when they went to

0:35:04.409,0:35:07.159
the uniform descriptor space

0:35:07.159,0:35:09.630
cat doesn't know, it doesn't need to know,
it just says

0:35:09.630,0:35:10.819
read my input,

0:35:10.819,0:35:13.979
write the output

0:35:13.979,0:35:17.059
and it works and we add a new type of descriptor

0:35:17.059,0:35:17.600
and

0:35:17.600,0:35:21.700
cat just continues to work just as it always
did.
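
A rough sketch of why a program like cat needs no per-device code: one copy loop built on plain read and write calls works on any descriptor. The names `copy_fd` and `demo_copy` are invented for this example, and the pipe and temporary file are just two convenient descriptor types to try it on.

```python
import os
import tempfile


def copy_fd(src_fd, dst_fd, chunk=4096):
    """Copy bytes from one descriptor to another until end of input."""
    total = 0
    while True:
        data = os.read(src_fd, chunk)
        if not data:
            break  # end of input, whatever kind of object it was
        view = memoryview(data)
        while view:
            # write may accept fewer bytes than offered, so keep going
            written = os.write(dst_fd, view)
            view = view[written:]
        total += len(data)
    return total


def demo_copy():
    # Source is a pipe, destination is a regular file; copy_fd never asks.
    read_end, write_end = os.pipe()
    os.write(write_end, b"hello from a pipe\n")
    os.close(write_end)
    with tempfile.TemporaryFile() as out:
        copied = copy_fd(read_end, out.fileno())
        os.close(read_end)
        out.seek(0)
        return copied, out.read()
```

The loop never inspects what kind of object sits behind either descriptor, which is the property being described.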

0:35:21.700,0:35:24.199
So this proved to be a very powerful construct

0:35:24.199,0:35:27.019
and pretty much every operating system after
UNIX

0:35:27.019,0:35:28.659
did that there's

0:35:28.659,0:35:30.210
one exception of a

0:35:30.210,0:35:32.549
large company in the Pacific Northwest

0:35:32.549,0:35:35.830
that still has a not quite uniform descriptor
space

0:35:35.830,0:35:38.380
but that's part of their legacy that really

0:35:38.380,0:35:39.900
they're working on that.

0:35:39.900,0:35:42.009
Longhorn will be here.

0:35:42.009,0:35:43.939
and anyway 

0:35:43.939,0:35:46.190
this set of facilities then

0:35:46.190,0:35:50.150
makes up the UNIX virtual machine

0:35:50.150,0:35:51.559
and

0:35:51.559,0:35:55.559
in some sense we still see virtual machines
being used today in fact we're seeing sort

0:35:55.559,0:35:56.749
of a reversion

0:35:56.749,0:36:01.429
back to some of the IBM stuff in things
like VMware

0:36:01.429,0:36:03.079
which is

0:36:03.079,0:36:07.029
essentially allowing you to go back to booting
native operating systems again so sort of

0:36:07.029,0:36:08.280
interesting to watch

0:36:08.280,0:36:09.060
that the sort of

0:36:09.060,0:36:12.919
pendulum going back and forth
of what's the correct layer

0:36:12.919,0:36:14.609
for doing

0:36:14.609,0:36:18.890
virtual machines

0:36:18.890,0:36:22.499
Okay? so far so good?

0:36:22.499,0:36:24.719
all right so i said that there were 

0:36:24.719,0:36:27.160
two key ideas that UNIX had

0:36:27.160,0:36:30.279
the first of these being the uniform descriptor
space

0:36:30.279,0:36:35.819
the second one which was really critical was
this notion of processes as a commodity 

0:36:35.819,0:36:37.309
item

0:36:37.309,0:36:40.220
so here on page seventeen I've tried to lay
it out

0:36:40.220,0:36:41.090
the

0:36:41.090,0:36:44.159
that the components that make up a process

0:36:44.159,0:36:45.759
and

0:36:45.759,0:36:50.359
what do I really mean when I say a process as
a commodity item

0:36:50.359,0:36:53.650
okay leading up to

0:36:53.650,0:36:54.689
UNIX

0:36:54.689,0:36:56.800
the systems that pre-dated it,

0:36:56.800,0:36:59.200
processes were these very large

0:36:59.200,0:37:02.169
heavyweight expensive things

0:37:02.169,0:37:02.779
and

0:37:02.779,0:37:04.539
if you look at

0:37:04.539,0:37:08.629
MVS which was the operating system
that ran on IBM for doing multiple processing

0:37:08.629,0:37:10.509
and

0:37:10.509,0:37:13.799
the system administrator would decide at boot
time

0:37:13.799,0:37:17.019
what degree of multiprocessing they wish
to support

0:37:17.019,0:37:18.140
so they'd say well

0:37:18.140,0:37:20.739
well, we'll let up to six things happen at once

0:37:20.739,0:37:22.490
and so as part of booting up

0:37:22.490,0:37:24.419
they would create six

0:37:24.419,0:37:25.349
processes

0:37:25.349,0:37:30.059
and now you as a user if you wanted to do
something let's say you wanted to

0:37:30.059,0:37:32.009
compile and run a program

0:37:32.009,0:37:34.960
you would be given a process

0:37:34.960,0:37:36.019
and it was up to you

0:37:36.019,0:37:39.369
to figure out how to stage what you needed
done

0:37:39.369,0:37:39.819
and

0:37:39.819,0:37:43.930
that this was often fairly complex

0:37:43.930,0:37:47.880
and so you would have to write out all the
steps that you wanted

0:37:47.880,0:37:50.300
in this wonderful thing called JCL

0:37:50.300,0:37:52.259
Job Control Language. 

0:37:52.259,0:37:56.650
Job Control Language was the sendmail configuration
file of the sixties

0:37:56.650,0:38:00.679
there were people whose sole job at the company
was knowing how to put this stuff together 'cause

0:38:00.679,0:38:04.189
all you had to do is get one extra space or
a missing comma

0:38:04.189,0:38:05.000
something in there

0:38:05.000,0:38:08.630
and the whole thing would just blow up. it would
just sort of spit the card deck back at

0:38:08.630,0:38:09.799
you and say well

0:38:09.799,0:38:13.500
somewhere in there is a mistake that's sort of
in the general area of this card

0:38:13.500,0:38:15.549
and I can't deal with it. Fix it.

0:38:15.549,0:38:16.489
and of course

0:38:16.489,0:38:20.550
in those days it wasn't just a matter of hitting
carriage return, you know; you had to

0:38:20.550,0:38:25.239
get your deck, pull out the card, type the
new one, put it back in and re-submit it

0:38:25.239,0:38:28.729
and heaven forbid you couldn't touch that
card reader you know, it had to be done by

0:38:28.729,0:38:29.970
an operator

0:38:29.970,0:38:32.869
so the card deck would be read through, it would
disappear and

0:38:32.869,0:38:36.800
you know if you're lucky a few minutes later
if you were not lucky a few hours later

0:38:36.800,0:38:37.849
you would get

0:38:37.849,0:38:39.570
a print out

0:38:39.570,0:38:43.419
which was what had happened and then you could
look at it and you know

0:38:43.419,0:38:47.209
I put a comma in the wrong place I guess
I get to do it all again

0:38:47.209,0:38:49.930
so

0:38:49.930,0:38:54.940
the thing you would need to do there for compiling and running a program

0:38:54.940,0:38:59.579
was you'd have to break it into these steps. well
I need to run the preprocessor

0:38:59.579,0:39:04.670
and so clean out whatever gunk was left
over on that process from the previous user

0:39:04.670,0:39:06.240
put the preprocessor in there

0:39:06.240,0:39:10.530
and then read from this file here, let's
say I gotta put it somewhere so create a

0:39:10.530,0:39:12.510
scratch file over on this disk and

0:39:12.510,0:39:17.299
it was excruciating detail like how many cylinders
and how many tracks and this and that

0:39:17.299,0:39:19.139
blocks blah blah blah

0:39:19.139,0:39:23.119
and don't forget any of those parameters 'cause
it'll spit it out if you do

0:39:23.119,0:39:26.890
and so then it would run the first step and
if it's successful then you'd have sitting

0:39:26.890,0:39:28.899
in this scratch file that you had created

0:39:28.899,0:39:33.100
the output of the preprocessor and then
you'd load the first pass of the compiler

0:39:33.100,0:39:36.930
and you say now read from that scratch file
and create this other scratch file over here and

0:39:36.930,0:39:39.450
when that's successful then we need to delete that
one

0:39:39.450,0:39:43.830
and then load the second pass, put that back
into another scratch file and then we run this

0:39:43.830,0:39:45.950
assembler, and the optimizer then the

0:39:45.950,0:39:47.750
loader this and that

0:39:47.750,0:39:49.410
finally run the program

0:39:49.410,0:39:50.900
and if all goes well

0:39:50.900,0:39:57.029
you know at step sixteen out comes the answer

0:39:57.029,0:39:58.129
forty-two. So UNIX

0:39:58.129,0:40:00.819
said, look this is silly

0:40:00.819,0:40:02.880
a lot of this is just

0:40:02.880,0:40:04.310
bookkeeping

0:40:04.310,0:40:07.249
and computers do bookkeeping really well

0:40:07.249,0:40:12.179
and you'll recall yeah but it's going to take
all these cycles it's like

0:40:12.179,0:40:16.309
computers are supposed to be labor-saving
devices right? so

0:40:16.309,0:40:20.150
they came up with this notion that they would
create processes on the fly as needed

0:40:20.150,0:40:21.159
you had

0:40:21.159,0:40:25.549
you've got a preprocessor and two
steps of the compiler and then

0:40:25.549,0:40:27.109
an optimizer and then a loader

0:40:27.109,0:40:29.410
we just create, boom, seven processes

0:40:29.410,0:40:31.920
and we connect them together with pipes

0:40:31.920,0:40:35.180
and so we take the input and you know run
through in

0:40:35.180,0:40:38.270
through the pipes and you know out the end
you get the

0:40:38.270,0:40:39.629
executable

0:40:39.629,0:40:40.030
and

0:40:40.030,0:40:42.880
we will simply create each of these processes

0:40:42.880,0:40:44.650
and

0:40:44.650,0:40:46.549
so you as a user just

0:40:46.549,0:40:49.479
type you know the C compiler and it just

0:40:49.479,0:40:52.429
forked these things, piped them together, got the result

0:40:52.429,0:40:53.640
and

0:40:53.640,0:40:57.509
then once it was done with these processes it
just threw them away so any time you'd create a

0:40:57.509,0:41:00.479
new process it came to you pristine clean

0:41:00.479,0:41:04.239
and if you needed a bunch of things it didn't
put everything in intermediate files

0:41:04.239,0:41:07.549
the fact of the matter is in the early days

0:41:07.549,0:41:08.129
those computers

0:41:08.129,0:41:11.910
didn't really have enough memory to support
all that stuff at once so

0:41:11.910,0:41:15.809
behind the scenes those pipes were actually implemented
as files

0:41:15.809,0:41:19.319
but at least you didn't have to remember to create
them and delete them

0:41:19.319,0:41:20.200
and deal with them

0:41:20.200,0:41:24.020
as far as you were concerned it just looked like stuff
flowing through pipes and of course today it

0:41:24.020,0:41:24.490
just

0:41:24.490,0:41:27.989
does flow through pipes in memory
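
To make the pipe idea concrete, here is a minimal two-stage sketch: a process is created on the fly, it writes into a pipe, and the parent reads the other end. The function name `pipeline_demo` and its message argument are illustrative; a real compiler driver would exec a different program in each stage.

```python
import os


def pipeline_demo(message):
    """Fork a child that writes `message` into a pipe; read it in the parent."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the producing stage of the pipeline.
        try:
            os.close(read_end)
            os.write(write_end, message)
        finally:
            os._exit(0)  # leave without running the parent's cleanup code
    # Parent: the consuming stage.
    os.close(write_end)
    chunks = []
    while True:
        data = os.read(read_end, 4096)
        if not data:
            break  # writer closed its end
        chunks.append(data)
    os.close(read_end)
    _, status = os.waitpid(pid, 0)  # the child is thrown away here
    return b"".join(chunks), os.WEXITSTATUS(status)
```

No scratch file is named, created or deleted by the caller; the kernel handles the buffering between the two stages.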

0:41:27.989,0:41:29.439
okay so

0:41:29.439,0:41:33.689
this notion then that we're just gonna
create processes on the fly as needed and

0:41:33.689,0:41:35.559
connect them together as needed

0:41:35.559,0:41:38.039
it was a novel concept

0:41:38.039,0:41:43.599
and it wasn't that they had somehow mysteriously figured
out how to create processes cheaply

0:41:43.599,0:41:44.839
'cause they hadn't

0:41:44.839,0:41:46.180
they were still

0:41:46.180,0:41:49.959
really expensive to create

0:41:49.959,0:41:52.210
but that extra effort

0:41:52.210,0:41:53.029
was

0:41:53.029,0:41:56.089
worth it because it was saving a lot of programming
time

0:41:56.089,0:41:59.809
so my favorite example is you run ls

0:41:59.809,0:42:01.810
so we have to create a process

0:42:01.810,0:42:04.259
load the ls binary into it

0:42:04.259,0:42:06.180
it prints a line or two on your screen

0:42:06.180,0:42:10.609
and we tear the entire thing down and return
all its resources back to the system

0:42:10.609,0:42:14.979
more than ninety percent of the cost of running
ls is creating and destroying the process

0:42:14.979,0:42:19.239
a tiny fraction of it is actually running
ls
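
The create, load, run, tear down cycle described for ls can be sketched with the fork, exec and wait calls. `run_program` is a hypothetical helper name, and exit code 127 for a failed exec is a shell convention adopted here as an assumption.

```python
import os
import sys


def run_program(argv):
    """Create a process, load a binary into it, wait, and return its exit code."""
    pid = os.fork()                    # create the process
    if pid == 0:
        try:
            os.execvp(argv[0], argv)   # load the binary; returns only on failure
        finally:
            os._exit(127)              # reached only if the exec failed
    _, status = os.waitpid(pid, 0)     # the process is torn down and reaped
    # Meaningful when the child exited normally rather than on a signal.
    return os.WEXITSTATUS(status)
```

For example, `run_program(["ls"])` would create a process, run ls in it, and return once that process has been destroyed.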

0:42:19.239,0:42:24.259
but it goes so fast, who cares right

0:42:24.259,0:42:25.749
so the point is that

0:42:25.749,0:42:30.039
that concept of just creating things as
needed

0:42:30.039,0:42:31.780
again was very powerful

0:42:31.780,0:42:35.709
and is one that is just pervasive today

0:42:35.709,0:42:38.639
okay so what is a process actually made up
of

0:42:38.639,0:42:43.179
it gets some amount of CPU time or at
least we do dearly hope that it gets some

0:42:43.179,0:42:46.050
amount of CPU time, the lack of getting
CPU time

0:42:46.050,0:42:46.670
is what makes

0:42:46.670,0:42:47.979
a computer so sluggish

0:42:47.979,0:42:49.409
of course

0:42:49.409,0:42:51.920
this really boils down to scheduling

0:42:51.920,0:42:54.249
and we're going to talk about scheduling

0:42:54.249,0:42:56.279
probably more than you care to

0:42:56.279,0:42:59.219
in a couple weeks time

0:42:59.219,0:43:01.619
we have the asynchronous events

0:43:01.619,0:43:04.569
these are the external events that

0:43:04.569,0:43:05.659
are coming in

0:43:05.659,0:43:07.679
so

0:43:07.679,0:43:10.169
they may be either things that

0:43:10.169,0:43:14.339
were coming in from the outside world like
start, stop and quit

0:43:14.339,0:43:15.279
oh

0:43:15.279,0:43:18.170
out-of-band data arrival notification that kind
of thing

0:43:18.170,0:43:22.339
or it may in fact be things that the program
is bringing down upon itself

0:43:22.339,0:43:25.590
such as a segmentation fault, a divide by zero

0:43:25.590,0:43:26.910
or some other

0:43:26.910,0:43:31.959
what would normally be viewed as incorrect
operation

0:43:31.959,0:43:35.849
and so we'll talk about that when we talk about
signals

0:43:35.849,0:43:37.039
every program

0:43:37.039,0:43:38.899
gets some amount of memory

0:43:38.899,0:43:42.659
it gets an initial amount when it starts
up and it generally allocates more as it

0:43:42.659,0:43:45.229
goes along

0:43:45.229,0:43:49.429
this of course we will deal with very extensively
we'll spend an entire week on it

0:43:49.429,0:43:54.249
when we talk about how virtual memory is implemented

0:43:54.249,0:43:54.609
and

0:43:54.609,0:43:57.429
then we get I/O descriptors

0:43:57.429,0:44:02.259
I used to say that every program had to have
at least one I/O descriptor since if

0:44:02.259,0:44:04.910
it had absolutely no input

0:44:04.910,0:44:06.329
absolutely no output

0:44:06.329,0:44:09.049
then it was sort of pointless

0:44:09.049,0:44:12.900
of course I had to have one of my students
come up and point out to me there is a

0:44:12.900,0:44:13.849
class of programs

0:44:13.849,0:44:16.469
which don't need I/O descriptors

0:44:16.469,0:44:17.440
and that is

0:44:17.440,0:44:19.549
these things called benchmarks

0:44:19.549,0:44:23.249
they just compute something; all we really care
about is how long it takes them to compute

0:44:23.249,0:44:24.959
we don't actually care what the answer is

0:44:24.959,0:44:26.019
in theory we don't

0:44:26.019,0:44:29.779
I personally like my benchmarks to output
something so I can see they're

0:44:29.779,0:44:31.489
computing the right thing

0:44:31.489,0:44:33.169
but in theory

0:44:33.169,0:44:35.919
that wouldn't be necessary

0:44:35.919,0:44:38.650
outside of that class of programs

0:44:38.650,0:44:42.670
everything needs some sort of descriptors and
of course we'll talk about descriptors

0:44:42.670,0:44:43.659
quite extensively

0:44:43.659,0:44:47.349
as we go through the I/O subsystem

0:44:47.349,0:44:50.969
okay so the executive summary is that processes
are

0:44:50.969,0:44:54.969
the fundamental service that is provided by
UNIX

0:44:54.969,0:44:58.430
and

0:44:58.430,0:45:02.849
what we're going to spend essentially the
next two and a half weeks working on

0:45:02.849,0:45:04.769
is

0:45:04.769,0:45:07.079
what makes up processes

0:45:07.079,0:45:10.180
we'll go into much more detail about each of these
four points

0:45:10.180,0:45:11.769
and

0:45:11.769,0:45:13.630
then how do we actually go about

0:45:13.630,0:45:14.390
providing that

0:45:14.390,0:45:16.639
bit of service

0:45:16.639,0:45:17.900
the next thing that I'm

0:45:17.900,0:45:22.210
going to do now is just go through and lay
out some of the terminology that

0:45:22.210,0:45:23.239
we have when

0:45:23.239,0:45:25.130
we're talking about processes

0:45:25.130,0:45:29.229
so this is sort of the big picture here, we're
on page eighteen

0:45:29.229,0:45:30.669
and

0:45:30.669,0:45:33.669
you can see we have sort of three bits that
make up

0:45:33.669,0:45:36.640
the system

0:45:36.640,0:45:39.029
we have the currently running user process

0:45:39.029,0:45:41.180
and then what we call the top half of the kernel

0:45:41.180,0:45:43.699
and the bottom half of the kernel

0:45:43.699,0:45:47.049
now this would be a picture for a uniprocessor

0:45:47.049,0:45:49.299
so one CPU 

0:45:49.299,0:45:51.209
if we had a multiprocessor

0:45:51.209,0:45:54.009
then we would have

0:45:54.009,0:45:57.130
one instance of the kernel

0:45:57.130,0:45:59.529
but multiple instances of the user process

0:45:59.529,0:46:02.879
but for any given CPU on a multiprocessor

0:46:02.879,0:46:05.709
it is running exactly one process

0:46:05.709,0:46:09.309
so you may think that we're running four or five
processes all at once

0:46:09.309,0:46:14.319
but the fact of the matter is that at any instant
in time there's only one process which is

0:46:14.319,0:46:16.299
actually running

0:46:16.299,0:46:18.609
and

0:46:18.609,0:46:21.429
that is the one that we have loaded in the system

0:46:21.429,0:46:25.199
now we give the illusion that we're running
lots of things because we switch between them

0:46:25.199,0:46:26.100
rather quickly

0:46:26.100,0:46:29.269
so it looks like things are happening in all
windows at once

0:46:29.269,0:46:31.430
but in reality

0:46:31.430,0:46:33.619
that's not really happening

0:46:33.619,0:46:36.440
okay so there is a set of properties that I want to
look at

0:46:36.440,0:46:40.899
that have to do with each one of these parts here

0:46:40.899,0:46:44.359
but just to sort of look at it from the
big picture perspective

0:46:44.359,0:46:45.970
what you see here

0:46:45.970,0:46:47.180
is

0:46:47.180,0:46:51.549
there is a boundary between the user process
and the top half of the kernel

0:46:51.549,0:46:54.949
which is really just like a glorified subroutine
call

0:46:54.949,0:46:59.539
it's a lot like calling into a library routine
like calling strcat, strcpy or something

0:46:59.539,0:47:00.319
like that

0:47:00.319,0:47:03.679
when you do a system call

0:47:03.679,0:47:05.650
we take that same set of parameters

0:47:05.650,0:47:08.009
now this is sort of

0:47:08.009,0:47:09.780
a brick wall here if you will

0:47:09.780,0:47:11.380
that is protecting

0:47:11.380,0:47:13.680
the top half of the kernel

0:47:13.680,0:47:15.299
from the application

0:47:15.299,0:47:18.899
I'll go more into some detail about how that
actually gets implemented

0:47:18.899,0:47:22.729
but in essence you can think of it
as there's sort of this Wailing Wall and these little

0:47:22.729,0:47:24.990
chinks there and you can sort of push a request
through

0:47:24.990,0:47:28.230
and somebody on the other side sort of pulls that,
looks at it and decides whether they're going

0:47:28.230,0:47:28.690
to

0:47:28.690,0:47:30.769
deign to provide service to you

0:47:30.769,0:47:34.229
and if they do then they sort of send it back

0:47:34.229,0:47:37.649
unlike a library where you can just sort
of reach in and walk around if you want to

0:47:37.649,0:47:38.290
you

0:47:38.290,0:47:40.950
with good programming practices you don't do that
but

0:47:40.950,0:47:43.049
you could

0:47:43.049,0:47:44.579
all right so

0:47:44.579,0:47:49.089
the top half of the kernel really looks
a lot like

0:47:49.089,0:47:50.509
a big library

0:47:50.509,0:47:53.509
it just happens to be a library of
routines

0:47:53.509,0:47:57.599
that deal with things where processes need
to interact with each other

0:47:57.599,0:48:01.399
in fact many people don't understand
what's the difference between the C

0:48:01.399,0:48:03.259
library and the top half of the kernel

0:48:03.259,0:48:08.020
if it's something that you're doing that
no other process needs to know about

0:48:08.020,0:48:09.799
then it can be in the C library

0:48:09.799,0:48:13.829
so if you call strcat to concatenate two
strings together

0:48:13.829,0:48:17.599
nobody else needs to know you're doing that
you don't need to coordinate with anybody

0:48:17.599,0:48:19.000
else that you're doing that

0:48:19.000,0:48:20.160
it's just happening

0:48:20.160,0:48:21.979
so that goes in the C library.

0:48:21.979,0:48:24.489
on the other hand if you're reading or writing
a file

0:48:24.489,0:48:28.029
there may be other processes that are also
reading and writing that file

0:48:28.029,0:48:29.910
and therefore that

0:48:29.910,0:48:31.579
has to be done by the kernel

0:48:31.579,0:48:33.120
because it can coordinate

0:48:33.120,0:48:37.189
all the different processes that are trying to access
that file.

0:48:37.189,0:48:40.529
so the top half of the kernel is pretty straightforward
code

0:48:40.529,0:48:45.539
it looks a lot like any other library that
you would write if you look at top half kernel

0:48:45.539,0:48:49.640
code, you know, you see a read come in,
it's got these parameters, we muck around, we

0:48:49.640,0:48:53.719
get some data, we put it in the buffer and
we return back

0:48:53.719,0:48:57.470
and in fact writing code for the top half of
the kernel is

0:48:57.470,0:48:59.729
not all that difficult to do

0:48:59.729,0:49:00.989
it's

0:49:00.989,0:49:01.959
you have

0:49:01.959,0:49:05.939
many of the same properties that you would
when you're writing user-level application

0:49:05.939,0:49:07.529
code

0:49:07.529,0:49:11.779
the bottom half of the kernel is where things
start to get nasty

0:49:11.779,0:49:14.820
because the bottom half of the kernel is the part
of the system

0:49:14.820,0:49:18.769
that deals with all of the asynchronous events
in the system

0:49:18.769,0:49:22.179
it's things like device drivers,

0:49:22.179,0:49:23.779
timers

0:49:23.779,0:49:25.010
that level of thing

0:49:25.010,0:49:28.029
that are driven by hardware events

0:49:28.029,0:49:28.659
so

0:49:28.659,0:49:31.459
for example a packet arrives on the network

0:49:31.459,0:49:33.670
that causes an interrupt to come and

0:49:33.670,0:49:36.729
that will be handled by the bottom half of
the kernel

0:49:36.729,0:49:38.829
and historically

0:49:38.829,0:49:43.079
when an interrupt came in it preempted whatever
else was going on

0:49:43.079,0:49:45.400
and it ran until it finished and then it returned

0:49:45.400,0:49:46.539
and it could not

0:49:46.539,0:49:49.439
go to sleep to wait for resources or other
things

0:49:49.439,0:49:51.339
in current systems

0:49:51.339,0:49:54.549
you can actually go to sleep in the interrupt driver
and wait for

0:49:54.549,0:49:56.739
some other activity to complete

0:49:56.739,0:49:58.259
it is however

0:49:58.259,0:50:00.799
not a good idea to do that

0:50:00.799,0:50:01.909
because

0:50:01.909,0:50:06.739
the usual case of most device drivers is they
can finish whatever they're doing in an interrupt

0:50:06.739,0:50:08.579
without ever blocking

0:50:08.579,0:50:09.580
and so

0:50:09.580,0:50:13.649
when an interrupt comes in we assume that you're
not going to sleep

0:50:13.649,0:50:14.710
and if you actually

0:50:14.710,0:50:17.219
then go to sleep, oh man

0:50:17.219,0:50:20.469
you didn't tell us you were going to do this, we
have to go off and do a whole lot of other work

0:50:20.469,0:50:23.029
that we hadn't originally planned on doing

0:50:23.029,0:50:25.460
so if you go to sleep in a device driver

0:50:25.460,0:50:28.209
you are taking a very serious performance hit

0:50:28.209,0:50:31.019
so it's highly recommended that you don't
do that

0:50:31.019,0:50:33.130
but if you have to you can

0:50:33.130,0:50:35.809
and it's because of this historic behavior

0:50:35.809,0:50:39.899
of not being able to sleep in the bottom half
of the kernel

0:50:39.899,0:50:42.119
that you have certain properties that have

0:50:42.119,0:50:44.769
taken over in device drivers

0:50:44.769,0:50:45.940
and that is

0:50:45.940,0:50:50.369
that a device driver should be handed all
the resources it needs to get its job done

0:50:50.369,0:50:54.490
you don't give a disk device driver 
Go read this

0:50:54.490,0:50:56.549
and put it somewhere

0:50:56.549,0:50:57.580
you have to say

0:50:57.580,0:50:59.410
Go read this particular block

0:50:59.410,0:51:02.650
here is a chunk of memory that I want that
data to be put in

0:51:02.650,0:51:03.959
and

0:51:03.959,0:51:06.169
notify me when it's done

0:51:06.169,0:51:06.970
because

0:51:06.970,0:51:10.660
things like allocating memory are classic
places where you end up having to go to sleep

0:51:10.660,0:51:12.939
to wait for stuff to happen

0:51:12.939,0:51:14.449
and

0:51:14.449,0:51:16.390
historically you couldn't do that

0:51:16.390,0:51:18.640
even currently you don't want to have to do that

0:51:18.640,0:51:23.400
so device drivers generally have all
resources preallocated

0:51:23.400,0:51:25.169
and then they can just go

0:51:25.169,0:51:27.279
the one place where this doesn't work

0:51:27.279,0:51:29.029
is the network

0:51:29.029,0:51:30.929
and in particular

0:51:30.929,0:51:34.630
you don't know when somebody's going to send
packets to you

0:51:34.630,0:51:37.040
you say well you're looking to open connections

0:51:37.040,0:51:39.360
but if you're doing something like IP forwarding

0:51:39.360,0:51:40.969
there's no

0:51:40.969,0:51:45.039
top half state that's dealing with these packets
they're just coming in on one interface being

0:51:45.039,0:51:46.719
sent out on another interface

0:51:46.719,0:51:50.630
they never pass through any part of the top
half of the kernel

0:51:50.630,0:51:53.529
and so in the case of network device drivers

0:51:53.529,0:51:56.149
they need to allocate memory

0:51:56.149,0:51:56.640
and

0:51:56.640,0:51:58.829
if memory gets into short supply

0:51:58.829,0:52:01.689
and they try to allocate memory and it's not
available

0:52:01.689,0:52:05.049
they historically couldn't wait for memory to be
available

0:52:05.049,0:52:08.380
and even in practice today don't wait

0:52:08.380,0:52:09.580
for memory to become available

0:52:09.580,0:52:12.469
they simply drop the packet on the floor

0:52:12.469,0:52:18.109
it's like well I didn't have any place to
put it sorry oops

0:52:18.109,0:52:20.940
now that doesn't cause incorrect behavior

0:52:20.940,0:52:24.369
because the higher level protocols will retransmit

0:52:24.369,0:52:29.140
but it does cause great performance problems
because retransmission means that connections

0:52:29.140,0:52:29.879
stall

0:52:29.879,0:52:31.110
they have to back up

0:52:31.110,0:52:33.010
they have to resend data

0:52:33.010,0:52:33.739
and so on

0:52:33.739,0:52:38.739
so you really want to avoid dropping packets
if you can possibly help it

0:52:38.739,0:52:42.029
and consequently

0:52:42.029,0:52:43.420
we tend to

0:52:43.420,0:52:46.499
preallocate a certain amount of memory for
the network drivers

0:52:46.499,0:52:48.299
and

0:52:48.299,0:52:52.169
we try very hard to make sure that we're not
going to run out of memory but

0:52:52.169,0:52:54.869
if packets come fast enough and we can't deal
with them

0:52:54.869,0:52:57.940
as quickly as they are arriving then over a short period
of time

0:52:57.940,0:53:03.489
we get to the point where we simply have to start
dropping packets

0:53:03.489,0:53:07.649
okay this is a part of the kernel that you do not wish to
write code for

0:53:07.649,0:53:10.919
because it is extremely difficult to 
debug

0:53:10.919,0:53:12.759
you get these bugs where

0:53:12.759,0:53:18.779
the only time it happens is on the third Tuesday
when there's a full moon

0:53:18.779,0:53:19.300
and

0:53:19.300,0:53:24.199
we have a disk interrupt followed by a
terminal character coming in

0:53:24.199,0:53:28.289
and a network packet arriving of size fifteen
twenty two

0:53:28.289,0:53:30.109
and when all those things happen

0:53:30.109,0:53:32.719
the system panics

0:53:32.719,0:53:37.380
and of course it's like, it panics
because you're following some bad pointer

0:53:37.380,0:53:40.969
something that should have been there
but was freed some time in the distant past

0:53:40.969,0:53:42.930
we are not sure when

0:53:42.930,0:53:44.049
and

0:53:44.049,0:53:47.400
trying to debug things like that is extremely
difficult

0:53:47.400,0:53:48.509
and you can

0:53:48.509,0:53:52.120
think well I think I found the problem but
it's not reproducible

0:53:52.120,0:53:55.530
you know you have to wait for the next third
Tuesday with a full moon and blah blah blah

0:53:55.530,0:53:56.950
to happen

0:53:56.950,0:53:57.469
and

0:53:57.469,0:54:01.449
you know so you sort of statistically
guess that you fixed it, you know, I was getting

0:54:01.449,0:54:03.510
this bug once every three days

0:54:03.510,0:54:06.099
and now it's gone for two weeks without happening

0:54:06.099,0:54:07.239
did you fix that?

0:54:07.239,0:54:08.969
or have you just been lucky

0:54:08.969,0:54:10.459
and it's

0:54:10.459,0:54:14.349
that coupled with the fact that you're
dealing with hardware

0:54:14.349,0:54:18.049
and hardware rarely works the way it's documented
to work

0:54:18.049,0:54:21.770
and so you know you're doing everything that
it says you're supposed to do

0:54:21.770,0:54:26.260
it still doesn't work because you didn't set
the fiddle bit over on that other place over

0:54:26.260,0:54:26.660
there

0:54:26.660,0:54:30.479
that's not documented anywhere but if it's
not set it doesn't work

0:54:30.479,0:54:33.769
occasionally

0:54:33.769,0:54:36.110
so this is another reason that you really want
to avoid

0:54:36.110,0:54:40.459
dealing with this part of the system if
you can possibly help it

0:54:40.459,0:54:44.369
okay but let's go through and look at some
of the properties here starting up at

0:54:44.369,0:54:45.789
the user process

0:54:45.789,0:54:47.980
we're running with 

0:54:47.980,0:54:51.449
preemptive scheduling

0:54:51.449,0:54:53.409
now there's several caveats here

0:54:53.409,0:54:55.239
preemptive scheduling is the default

0:54:55.239,0:54:56.970
so called shared scheduler

0:54:56.970,0:55:01.360
that is what you normally use there are other
schedulers like the real time scheduler

0:55:01.360,0:55:02.869
where what I'm saying isn't true

0:55:02.869,0:55:05.709
we'll talk about some of the schedulers
later

0:55:05.709,0:55:09.930
but the usual scheduler that you're running
under UNIX is a shared scheduler

0:55:09.930,0:55:13.229
and under the shared scheduler user applications

0:55:13.229,0:55:15.159
run with preemptive scheduling

0:55:15.159,0:55:17.449
and preemptive scheduling means that

0:55:17.449,0:55:20.019
you run at the whim of the system

0:55:20.019,0:55:21.420
if it wants you to run

0:55:21.420,0:55:22.140
you run

0:55:22.140,0:55:25.490
once you start running you have no guarantee
of how long you're going to run

0:55:25.490,0:55:29.370
it might let you run for three instructions
and then decide it doesn't like you anymore

0:55:29.370,0:55:31.150
it wants to run something else

0:55:31.150,0:55:35.920
or you might get to run for several seconds
in a row with no intervening

0:55:35.920,0:55:37.469
things interrupting you

0:55:37.469,0:55:39.719
you just don't know

0:55:39.719,0:55:40.969
and

0:55:40.969,0:55:42.839
really all you know is

0:55:42.839,0:55:43.569
that 

0:55:43.569,0:55:48.239
they claim that they're using statistics
and that the statistics are fair

0:55:48.239,0:55:55.059
and so on average you're going to get a reasonable
amount of time but that's

0:55:55.059,0:55:57.129
up to the system you don't control that

0:55:57.129,0:55:58.439
the real point here

0:55:58.439,0:56:01.940
is that you don't have any way of creating
a critical section

0:56:01.940,0:56:04.950
you can't say okay I don't want to be interrupted

0:56:04.950,0:56:07.429
during this particular sequence of things

0:56:07.429,0:56:09.809
so you have to program

0:56:09.809,0:56:13.469
assuming that you may be interrupted at any
point

0:56:13.469,0:56:14.979
okay

0:56:14.979,0:56:18.909
the next thing is that when you're running
in a user process

0:56:18.909,0:56:20.719
you are running

0:56:20.719,0:56:24.150
with the processor in what's called unprivileged
mode

0:56:24.150,0:56:28.109
one of the requirements for running any kind
of a UNIX system

0:56:28.109,0:56:31.759
is that you have to have a processor that
supports privileged and unprivileged

0:56:31.759,0:56:33.709
two different modes of operation

0:56:33.709,0:56:37.049
in privileged mode which is what the kernel
runs in

0:56:37.049,0:56:38.950
the entire repertoire

0:56:38.950,0:56:40.869
of the hardware is available

0:56:40.869,0:56:45.339
by this I mean you can set all the registers
you can fiddle with the memory management

0:56:45.339,0:56:47.460
unit you can initiate I/O

0:56:47.460,0:56:50.519
you can access any memory anywhere

0:56:50.519,0:56:51.919
etc

0:56:51.919,0:56:56.540
when you're running in unprivileged
mode which is what user processes run in

0:56:56.540,0:57:00.709
there is a large subset of the instructions which
you cannot execute

0:57:00.709,0:57:03.480
you cannot initiate I/O on

0:57:03.480,0:57:04.209
devices

0:57:04.209,0:57:06.770
you cannot change the memory mapping

0:57:06.770,0:57:10.209
you cannot access memory that's not part of
your address space

0:57:10.209,0:57:13.299
you cannot execute certain instructions
like halt

0:57:13.299,0:57:15.589
and

0:57:15.589,0:57:19.039
so in general you are protected

0:57:19.039,0:57:21.789
from manipulating anything that's outside of your
address space

0:57:21.789,0:57:23.759
this of course is desirable because

0:57:23.759,0:57:27.059
when you're running in this unprivileged
mode

0:57:27.059,0:57:28.300
you're protected

0:57:28.300,0:57:31.910
from other processes manipulating you 
and they're protected from you manipulating

0:57:31.910,0:57:33.079
them

0:57:33.079,0:57:36.430
for those of you that have had the misfortune
to have to use

0:57:36.430,0:57:39.339
early versions of Windows up to about ninety
eight

0:57:39.339,0:57:42.470
they always ran with the processor
running in privileged mode

0:57:42.470,0:57:44.009
even in applications

0:57:44.009,0:57:46.459
and so either maliciously or accidentally

0:57:46.459,0:57:50.000
you could stomp on other people's address space
or you could stomp on the kernel

0:57:50.000,0:57:53.020
and a lot of the blue screen of death was
people just

0:57:53.020,0:57:56.319
following wild pointers and trashing different
parts of the system

0:57:56.319,0:57:58.819
taking everything down

0:57:58.819,0:58:00.020
it also makes it

0:58:00.020,0:58:02.320
far easier to

0:58:02.320,0:58:05.459
implement things like viruses and worms and
other things because

0:58:05.459,0:58:09.619
a user application can rewrite the boot
block on the disk, they can just go right down

0:58:09.619,0:58:13.109
and manipulate the registers that allow them
to do whatever they want

0:58:13.109,0:58:16.730
whereas when you're running in unprivileged
mode you can't write those kinds

0:58:16.730,0:58:20.179
of things

0:58:20.179,0:58:24.119
so modern versions of Windows anything from about
2000 on

0:58:24.119,0:58:26.630
now run with privileged and unprivileged mode

0:58:26.630,0:58:28.649
but UNIX has always required that

0:58:28.649,0:58:30.219
and so when you're running a

0:58:30.219,0:58:31.319
user process

0:58:31.319,0:58:33.389
you cannot block, I mean

0:58:33.389,0:58:37.969
you cannot execute the instructions which
cause a context switch to occur

0:58:37.969,0:58:40.349
you can't pick what's going to run next

0:58:40.349,0:58:43.140
you can't make that thing run next all you can
do

0:58:43.140,0:58:45.189
is go to the operating system and say

0:58:45.189,0:58:49.269
hey I've got nothing to do. pick somebody else
to run

0:58:49.269,0:58:53.449
and the operating system is the thing that can
then execute the instructions which cause

0:58:53.449,0:58:57.609
a different process to be loaded

0:58:57.609,0:58:59.049
and run

0:58:59.049,0:59:03.400
alright, finally while you're in a user application you're
running on a user stack

0:59:03.400,0:59:06.410
that's part of the user's address space

0:59:06.410,0:59:07.889
so

0:59:07.889,0:59:10.819
part of creating a process gives you a runtime
stack

0:59:10.819,0:59:14.369
as part of a virtual address space and so it
can be

0:59:14.369,0:59:18.199
more or less up to the limits of the hardware
as big as you want it to be

0:59:18.199,0:59:19.949
so if you are running on a thirty-two-bit processor

0:59:19.949,0:59:22.819
your stack can get to 2 gigabytes

0:59:22.819,0:59:23.319
and

0:59:23.319,0:59:26.839
what this means is that anytime you
allocate local variables

0:59:26.839,0:59:28.529
you don't have to worry about Oh

0:59:28.529,0:59:30.609
is that gonna overrun my stack?

0:59:30.609,0:59:31.610
so if you need

0:59:31.610,0:59:35.519
a hundred thousand double precision floating
point numbers

0:59:35.519,0:59:37.189
you can just as a local variable allocate

0:59:37.189,0:59:40.269
an array of size a hundred thousand of type
double

0:59:40.269,0:59:44.029
and it just decrements your stack pointer by
eight hundred thousand bytes

0:59:44.029,0:59:45.009
away you go

0:59:45.009,0:59:47.299
it's just virtual address space

0:59:47.299,0:59:49.020
as you'll see when we get into the kernel

0:59:49.020,0:59:50.210
that ceases to be the case