author Murray Stokely <murray@FreeBSD.org> 2010-03-08 06:04:20 +0000
committer Murray Stokely <murray@FreeBSD.org> 2010-03-08 06:04:20 +0000
commit f985ae0fb8b63a75989f162e98686da4d6b4b00b
tree 4ea2ae4ced34331493718d6c0de835528d83bd90 /en_US.ISO8859-1/captions
parent 156a60afab5e271cd8323b6e7de89b56c032e83f
Second pass at human editing to improve the captions for this video
through work for hire on Amazon Mechanical Turk.

Sponsored by: FreeBSD Foundation
Notes: svn path=/head/; revision=35455
Diffstat (limited to 'en_US.ISO8859-1/captions')
-rw-r--r-- en_US.ISO8859-1/captions/2009/dcbsdcon/davis-isolatingcluster.sbv | 1594
1 files changed, 730 insertions, 864 deletions
diff --git a/en_US.ISO8859-1/captions/2009/dcbsdcon/davis-isolatingcluster.sbv b/en_US.ISO8859-1/captions/2009/dcbsdcon/davis-isolatingcluster.sbv
index b05e61955b..6b405900e9 100644
--- a/en_US.ISO8859-1/captions/2009/dcbsdcon/davis-isolatingcluster.sbv
+++ b/en_US.ISO8859-1/captions/2009/dcbsdcon/davis-isolatingcluster.sbv
@@ -1,21 +1,21 @@
0:00:15.749,0:00:18.960
-I do apologize for the
+I do apologize for the (other)
0:00:18.960,0:00:22.130
-for the Euro BSD Con slide. I've redone the
+for the EuroBSDCon slides. I've redone the
0:00:22.130,0:00:23.890
title page and redone the
0:00:23.890,0:00:27.380
and made some changes to the slides
-and they didn't make it for approval
+and they didn't make it through for approval
0:00:27.380,0:00:33.130
by this afternoon so
0:00:33.130,0:00:34.640
- okay so
+okay so
0:00:34.640,0:00:36.390
I'm gonna be talking about
@@ -40,32 +40,32 @@ who we are and
what our problem space is like because that
0:00:49.520,0:00:54.760
-%uh a dictates that that %uh has an effect
-are solutions base
+dictates that… has an effect
+on our solutions base
0:00:54.760,0:00:57.079
-I work for the aerospace corporation
+I work for the aerospace corporation.
0:00:57.079,0:00:58.609
-%uh we we work
+We work;
0:00:58.609,0:01:02.480
- we operate a federally-funded
+we operate a federally-funded
research and development center
0:01:02.480,0:01:05.400
-%uh in the area national security space
+in the area national security space
0:01:05.400,0:01:09.310
and in particular we work with the air force
space and missile command
0:01:09.310,0:01:13.090
-and with that the national reconnaissance
+and with the national reconnaissance
office
0:01:13.090,0:01:16.670
- and our engineers support a wide variety
+and our engineers support a wide variety
0:01:16.670,0:01:20.550
of activities within that area
@@ -80,15 +80,15 @@ a bit over fourteen hundred to correct
sorry twenty four hundred engineers
0:01:25.860,0:01:28.820
- in virtually every discipline we have
+in virtually every discipline we have
0:01:28.820,0:01:33.520
-as you would expect we have our rocket scientists
- we have people who build satellites
+as you would expect we have our rocket scientists,
+we have people who build satellites
0:01:33.520,0:01:37.439
we have people who build sensors that go on
-satellites people who study this sort of things
+satellites, people who study these sort of things
0:01:37.439,0:01:38.130
that you
@@ -100,16 +100,16 @@ see when you
use those sensors
0:01:40.819,0:01:42.040
-that sort of thing
+that sort of thing.
0:01:42.040,0:01:44.180
- we also have civil engineers and
+We also have civil engineers and
0:01:44.180,0:01:45.680
electronic engineers
0:01:45.680,0:01:46.649
-and process
+and process,
0:01:46.649,0:01:49.170
computer process people
@@ -119,17 +119,14 @@ so we literally do everything related to space
and all sorts of things that you might not
0:01:53.120,0:01:55.270
-expect to be related to space
+expect to be related to space,
0:01:55.270,0:01:58.820
-because we also for instance help build ground
-systems since satellites arent very useful if
+since we also for instance help build ground
+systems ‘cause satellites aren’t very useful if
0:01:58.820,0:02:00.680
-there isn't anything to talk to them
-
-0:02:00.680,0:02:02.540
-%um
+there isn't anything to talk to them;
0:02:02.540,0:02:04.090
and these engineers
@@ -149,7 +146,7 @@ you might not think of as an engineering
application but they are
0:02:17.229,0:02:22.249
-to Matlab programs or want to see code
+to Matlab programs or a lot of C code
0:02:22.249,0:02:23.960
or one of traditional parallel for us
@@ -161,11 +158,11 @@ serial code
and then
0:02:26.049,0:02:30.949
-large parallel applications either in house
-or genetic algorithms and that sort
+large parallel applications either in house;
+genetic algorithms and that sort
0:02:30.949,0:02:31.769
-of thing
+of thing,
0:02:31.769,0:02:32.900
or traditional
@@ -174,11 +171,7 @@ or traditional
the classic parallel code
0:02:34.749,0:02:37.599
-like you work around a crater or something material simulation
-
-0:02:37.599,0:02:40.119
-%uh or %uh
-
+like you work around a crate or something material simulation
0:02:40.119,0:02:41.459
or that or food flow
@@ -199,16 +192,16 @@ it
does come back and influence what we
0:02:51.529,0:02:55.999
-the sort of solutions we work at
+the sort of solutions we look at
0:02:55.999,0:03:00.499
-so the rest of the talk Im gonna talk about oops
+so the rest of the talk I’m gonna talk about rese…
0:03:00.499,0:03:05.259
-we skipped a slide, There we are. Thats a little better
+we skipped a slide, there we are, that’s a little better.
0:03:05.259,0:03:08.940
-what I'm interested in is I do high
+Now, what I'm interested in is I do high
performance computing
0:03:08.940,0:03:10.109
@@ -229,25 +222,25 @@ so
our primary resource at this point is
0:03:23.120,0:03:25.429
- the fellowship cluster
+the fellowship cluster
0:03:25.429,0:03:26.540
it's a for the
0:03:26.540,0:03:29.569
-named for the fellowship the ring
+named for the fellowship of the ring
0:03:29.569,0:03:30.449
-it's the
+so it's a…
0:03:30.449,0:03:32.520
-we're gonna wrap some nodes
+… eleven axel nodes
0:03:32.520,0:03:33.930
wrap the core systems
0:03:33.930,0:03:35.909
-%uh over here there's a
+over here there's a
0:03:35.909,0:03:39.659
Cisco a large Cisco switch. Actually today
@@ -257,88 +250,78 @@ there are around two sixty five oh nines if
you assess them
0:03:40.899,0:03:46.149
-and because we couldnt get the core density otherwise
+and because we couldn’t get the port density we wanted otherwise
0:03:46.149,0:03:50.219
and primarily the Gigabit Ethernet system runs
-FreeBSD currently 6.0 because we havent upgraded
+FreeBSD currently 6.0 ‘cause we haven’t upgraded
0:03:50.219,0:03:51.089
it yet
0:03:51.089,0:03:55.639
-planning to move to probably to 7.1
-maybe slightly past 7.1
+planning to move probably to 7.1
+or maybe slightly past 7.1
0:03:55.639,0:04:01.029
-%uh if we want to get the latest initial APM changes in
+if we want to get the latest HWPMC changes in
0:04:01.029,0:04:05.900
we use the Sun Grid Engine scheduler was one of
the two main options for open source
0:04:05.900,0:04:08.949
-resource managers on cluster the other one being
-that
-
-0:04:08.949,0:04:09.959
-the %uh
+resource managers on clusters the other one being
+the…
0:04:09.959,0:04:11.499
-Torp
+… the TORQUE
0:04:11.499,0:04:15.939
-and now the combination from cluster resources
+and now recombination from cluster resources
0:04:15.939,0:04:17.389
so we also have
0:04:17.389,0:04:18.079
- that's actually
+that's actually
0:04:18.079,0:04:22.090
-40 TB thats really the raw number on a sun thumper and
-
-0:04:22.090,0:04:23.219
-and
-
+40 TB that’s really the raw number on a sun thumper and
0:04:23.219,0:04:26.290
-that thirty two usable once you start using ---- two
+that’s thirty two usable once you start using RAID-Z2
0:04:26.290,0:04:30.939
since you might actually like to have your data
should a disk fail
0:04:30.939,0:04:32.969
-and with today's discs drade
+and with today's discs RAID…
0:04:32.969,0:04:34.009
-grade five
+RAID five
0:04:34.009,0:04:35.249
-doesn't really cut it
-
-0:04:35.249,0:04:37.379
-%um
+doesn't really cut it,
0:04:37.379,0:04:40.220
-we also have some other resources coming on but Im going to be
+And then we also have some other resources coming on but I’m going to be (concentrating on)
0:04:40.220,0:04:43.530
-two smaller clusters unfortunately probably running Sun x and
+two smaller clusters unfortunately probably running Linux and
0:04:43.530,0:04:45.900
- some SMPs but
+some SMPs but
0:04:45.900,0:04:49.990
-Im going to be concentrating here on the work we're
+I’m going to be concentrating here on the work we're
doing on our other
0:04:49.990,0:04:54.259
-our FreeBSD based cluster
+FreeBSD based cluster.
0:04:54.259,0:04:55.060
-first of all
+So, first of all
0:04:55.060,0:04:59.410
first of all I want to talk about why we want to
@@ -359,7 +342,7 @@ and
0:05:09.759,0:05:13.399
some fairly trivial experiments that we've done
-so far in terms of the it's enghancing the schedule or
+so far in terms of enhancing the schedule or
0:05:13.399,0:05:15.860
using operating system features
@@ -367,18 +350,15 @@ using operating system features
0:05:15.860,0:05:17.730
so you mitigate those problems
-0:05:17.730,0:05:19.349
-%um
-
0:05:19.349,0:05:20.110
-and %uh
+and
0:05:20.110,0:05:25.110
-then conclude with some future work
+then conclude with some feature work.
0:05:25.110,0:05:29.289
-obviously if you have a resource the size
-of the size of our cluster fourteen hundred
+So, obviously if you have a resource the size…
+the size of our cluster, fourteen hundred
0:05:29.289,0:05:30.970
cores roughly
@@ -390,20 +370,20 @@ you probably want to share it unless you
purpose built it for a single application
0:05:35.080,0:05:37.340
- you're going to want to have your users
+you're going to want to have your users
0:05:37.340,0:05:39.440
sharing it
0:05:39.440,0:05:42.909
-and you don't want to just say you know you get on Monday
+and you don't want to just say you know, you get on Monday
0:05:42.909,0:05:45.330
probably not going to be a very effective
option
0:05:45.330,0:05:49.270
-especially not when we have as many uses we
+especially not when we have as many users as we
do
0:05:49.270,0:05:53.849
@@ -430,7 +410,7 @@ anything we could need once
buy ten of them though
0:06:06.359,0:06:08.939
-if you really really needed it
+if we really, really needed it
0:06:08.939,0:06:09.680
dropping
@@ -439,20 +419,20 @@ dropping
small numbers of millions of dollars on
0:06:11.460,0:06:13.349
-computing resources wouldnt be
+computing resources wouldn’t be
0:06:13.349,0:06:15.039
impossible
0:06:15.039,0:06:20.829
but we can't go to you know just have every engineer
-who wants one just call Dell and say ship me ten racks
+who wants one just call up Dell and say ship me ten racks
0:06:20.829,0:06:24.030
it's not going to work
0:06:24.030,0:06:25.580
-and the other thing is that we cant
+and the other thing is that we can’t
0:06:25.580,0:06:28.360
we need to also provide quick turnaround
@@ -468,7 +448,7 @@ hogging it until they are done
because we have some users
0:06:34.720,0:06:37.099
-n then the next one can run
+and then the next one can run
0:06:37.099,0:06:40.949
because we have some users who'll
@@ -491,7 +471,7 @@ well so we've had to provide some ability for other
users to still get their work done
0:06:53.839,0:06:58.300
-so we can't just.. so we do have to have some share
+so we can't just… so we do have to have some sharing
0:06:58.300,0:07:00.619
however when you start to share any resource
@@ -512,14 +492,11 @@ can't get what they want
0:07:09.700,0:07:11.639
so you have to balance them a bit
-0:07:11.639,0:07:12.999
-%um
-
0:07:12.999,0:07:14.529
you know also
0:07:14.529,0:07:17.869
-%uh some jobs lie when they
+some jobs lie when they
0:07:17.869,0:07:20.870
request resources and they actually need
@@ -540,10 +517,10 @@ and if we don't have a mechanism to constrain
them
0:07:31.000,0:07:32.389
-we have problems
+we have problems.
0:07:32.389,0:07:34.270
-%uh likewise
+Likewise
0:07:34.270,0:07:37.109
once these users start to contend
@@ -552,7 +529,7 @@ once these users start to contend
that doesn't just result in
0:07:39.029,0:07:40.439
-the jobs taking
+the jobs taking,
0:07:40.439,0:07:43.360
taking longer in terms of wall clock time
@@ -561,20 +538,17 @@ taking longer in terms of wall clock time
because they are extremely slow
0:07:44.659,0:07:48.430
-but there's overhead related to that contention
-because they get swapped out due to to that pressure on
-
-0:07:48.430,0:07:49.219
-on
+but there's overhead related to that contention;
+they get swapped out due to pressure on
0:07:49.219,0:07:51.509
-on various systems
+various systems
0:07:51.509,0:07:52.550
if you really
0:07:52.550,0:07:57.039
-for instance run put of memory then you go into
+for instance run out of memory then you go into
swap and you end up wasting all your cycles
0:07:57.039,0:07:58.710
@@ -590,13 +564,13 @@ so there are
resource
0:08:04.219,0:08:08.139
-there are resource cost to the contention not merely
+there are resource costs to the contention not merely
0:08:08.139,0:08:11.979
-a delay in returning results
+a delay in returning results.
0:08:11.979,0:08:16.590
-so now I'm going to switch gears and start talk so I'm
+So now I'm going to switch gears and start talk… so I'm
going to talk a little bit about different
0:08:16.590,0:08:18.270
@@ -609,12 +583,9 @@ to the
0:08:20.610,0:08:22.339
these contention issues
-0:08:22.339,0:08:23.710
-and %uh
-
0:08:23.710,0:08:27.840
-%uh and and look at different ways of solving the
-problem.most of these are things that have
+and look at different ways of solving the
+problem. Most of these are things that have
0:08:27.840,0:08:29.440
already been done
@@ -626,16 +597,16 @@ but I just want to talk about
the different ways and then
0:08:32.990,0:08:35.710
-evaluate them in our context
+evaluate them in our context.
0:08:35.710,0:08:38.119
-so a classic solution to the problem is
+So a classic solution to the problem is
0:08:38.119,0:08:39.280
-Gang scheduling
+Gang Scheduling
0:08:39.280,0:08:44.139
- it's basically conventional Unex process
+It's basically conventional Unix process
context switching
0:08:44.139,0:08:46.560
@@ -643,7 +614,7 @@ written really big
0:08:46.560,0:08:50.339
you what you do is you have your parallel
-job thats running
+job that’s running
0:08:50.339,0:08:51.390
on a system
@@ -653,7 +624,7 @@ and it runs for a while
0:08:52.839,0:08:57.920
and then after a certain amount of time you basically
-shove it all you kick it off of all the nodes
+shove it all; you kick it off of all the nodes
0:08:57.920,0:08:59.940
and let the next one come in
@@ -695,7 +666,7 @@ or that sort of thing so
there there's a there's a lot of overhead
0:09:29.950,0:09:34.340
-associated with this.You take a long context switch
+associated with this. You take a long context switch
0:09:34.340,0:09:36.820
if all of your infrastructure supports this
@@ -738,10 +709,10 @@ or does it look like it's actually converging on
some sort of useful solution
0:10:10.860,0:10:13.980
-as they don't want to just wait till the end
+as they don't want to just wait till the end.
0:10:13.980,0:10:19.270
-down side of course is that this context
+Down side of course is that this context
switches costs are very high
0:10:19.270,0:10:22.460
@@ -762,16 +733,16 @@ with you know
communication libraries written on standard protocols
0:10:35.530,0:10:37.050
-the tools just arent there
+the tools just aren’t there
0:10:37.050,0:10:39.100
and so
0:10:39.100,0:10:40.860
-it's not very practical
+it's not very practical.
0:10:40.860,0:10:44.010
-also it doesn't really make a lot of sense with small jobs
+Also it doesn't really make a lot of sense with small jobs
0:10:44.010,0:10:47.789
and one of the things that we found is we have users who have
@@ -787,14 +758,14 @@ and they could write something that looked more like a
conventional parallel application where they
0:10:57.400,0:11:01.930
-you know wrote a schedule and set up an MPI a Message Pasting Interface
+you know wrote a Scheduler and set up an MPI a Message Passing Interface
0:11:01.930,0:11:05.400
and handed out tasks to pieces of their job and then you
could do this
0:11:05.400,0:11:09.280
-but then they would be running a schedule and they would
+but then they would be running a Scheduler and they would
probably do a bad job of it turns out it's actually
0:11:09.280,0:11:10.820
@@ -807,7 +778,7 @@ even a trivial case
and so what they do instead is they just select twenty
0:11:16.189,0:11:18.730
-twenty thousand jobs to great and say okay
+twenty thousand jobs to grid engine and say okay
0:11:18.730,0:11:21.330
whatever I'll deal with it
@@ -835,36 +806,36 @@ at least not in a
0:11:35.690,0:11:39.149
conventional gang scheduled environment where
-you do gang scheduling on the regularity of
+you do gang scheduling on the granularity of
0:11:39.149,0:11:40.940
jobs
0:11:40.940,0:11:44.140
-so from that perspective it wouldnt work very well
+so from that perspective it wouldn’t work very well.
0:11:44.140,0:11:48.380
-if you have all the pieces in place and you are
-doing a big parallel applications it in fact
+If you have all the pieces in place and you are
+doing a big parallel applications it is in fact
0:11:48.380,0:11:53.770
-an extremely effective approach
+an extremely effective approach.
0:11:53.770,0:11:56.290
-another option which is sort of related
+Another option which is sort of related
0:11:56.290,0:11:57.420
it's in fact
0:11:57.420,0:12:00.079
-take taking an even course with regularity
+take taking an even courser granularity
0:12:00.079,0:12:04.360
is single application or single project
-clusters or sub-clusters
+clusters or sub-clusters.
0:12:04.360,0:12:07.590
-%uh for instance this is used some national labs
+For instance this is used some national labs
0:12:07.590,0:12:11.910
where you're given a cycle allocation for a
@@ -877,13 +848,13 @@ and what your cycle allocation actually comes to you as is
here's your cluster
0:12:16.580,0:12:17.489
-here's a fun-ed
+here's a frontend
0:12:17.489,0:12:19.840
-here's this chunk of notes. they're yours. Go to it
+here's this chunk of notes, they're yours, go to it.
0:12:19.840,0:12:21.930
-Install your own OS. Whatever you want
+Install your own OS, whatever you want
0:12:21.930,0:12:25.580
it's yours
@@ -895,7 +866,7 @@ and then and at a sort of finer scale there's things such as
you could use Emulab
0:12:31.800,0:12:36.300
-which is the network emulation system but also does a less install and configuration
+which is the network emulation system but also does a OS install and configuration
0:12:36.300,0:12:39.300
so you could do dynamic allocation that way
@@ -908,7 +879,7 @@ Project Hedeby now actually I think it's
called service domain manager
0:12:44.040,0:12:46.500
- is the product size version
+is the productised version
0:12:46.500,0:12:50.010
or some Clusters on Demand
@@ -927,27 +898,27 @@ little
a more granular level than the
0:13:02.810,0:13:05.580
-the allocate them once a year approach
+the allocate them once a year approach
0:13:05.580,0:13:07.720
-none the less
+nonetheless
0:13:07.720,0:13:11.220
-lets you give people whole clusters to work with
+let’s you give people whole clusters to work with
0:13:11.220,0:13:12.920
nice one nice thing about it is
0:13:12.920,0:13:15.450
-the at the isolation between the processes
+the isolation between the processes
0:13:15.450,0:13:16.890
is complete
0:13:16.890,0:13:20.800
-so you dont have to worry about users stomping on each other.
-Its their own system. they can trash it all they
+so you don’t have to worry about users stomping on each other.
+It’s their own system, they can trash it all they
0:13:20.800,0:13:22.230
want
@@ -962,15 +933,14 @@ run the nodes into swap
well that's their problem
0:13:28.480,0:13:32.120
-but it also has the advantage that you can tailor the the images
+but it also has the advantage that you can tailor the images
0:13:32.120,0:13:36.980
on the nodes of the operative systems to
meet the exact needs of the application
0:13:36.980,0:13:40.560
-down side of course is its course
-theres a system whichgranularity. in our environment that doesn't work
+down side of course is its coarse granularity, in our environment that doesn't work
0:13:40.560,0:13:41.500
very well
@@ -979,7 +949,7 @@ very well
since we do have all of these all these different types of jobs
0:13:46.800,0:13:51.710
-context switches are also pretty expensive. certainly on the order of minutes
+context switches are also pretty expensive. Certainly on the order of minutes
0:13:51.710,0:13:54.690
Emulab typically claim something like ten minutes
@@ -988,7 +958,7 @@ Emulab typically claim something like ten minutes
there are some systems out there
0:13:57.970,0:14:03.320
-for instance if you use I think its open boot that
+for instance if you use I think it’s Open Boot that
they're calling it today. It used to be 1xBIOS
0:14:03.320,0:14:06.790
@@ -1004,17 +974,17 @@ mostly by getting rid of all that junk the BIOS writers wrote
and
0:14:12.890,0:14:17.770
-the OS speed is pretty fast if you dont have all
-that stuff to waylay you not
+the OS boots pretty fast if you don’t have all
+that stuff to waylay you,
0:14:17.770,0:14:19.940
but in practice on sort of
0:14:19.940,0:14:21.660
-on the shelf
+off the shelf hardware
0:14:21.660,0:14:24.400
-the context switches time quite high
+the context switches times’ are quite high
0:14:24.400,0:14:26.930
users of course can interfere with themselves
@@ -1034,27 +1004,27 @@ is that my users are
almost universally
0:14:37.830,0:14:40.410
-not trained as computer scientists are programmes
+not trained as computer scientists or programmers
0:14:40.410,0:14:42.550
-you know there's there's really no domain
-area
+you know they’re trained in their domain area
0:14:42.550,0:14:44.780
they're really good in that area
0:14:44.780,0:14:48.389
-their concepts of the way hardware works in the
+but their concepts of the way hardware works in the
way software works
0:14:48.389,0:14:55.389
-dont match reality in many cases
+don’t match reality in many cases
0:15:01.269,0:15:02.830
-its pretty rare in practice
+(inaudible question)
+It’s pretty rare in practice
0:15:02.830,0:15:06.700
-well I've heard one one lab that does it significantly
+well I've heard one lab that does it significantly
0:15:06.700,0:15:09.839
but it's like they do it on sort of a yearly
@@ -1070,7 +1040,7 @@ and you do typically have some sort of the deployment
system in place
0:15:18.340,0:15:20.680
-or in most types of cases actually
+or in those types of cases actually
0:15:20.680,0:15:22.359
usually your application comes with
@@ -1085,10 +1055,10 @@ on this project so this is
big resource allocation
0:15:36.000,0:15:39.780
-and %uh yet and I guess one other issue with this is there's no real easy
+And yeah I guess one other issue with this is there's no real easy
0:15:39.780,0:15:43.320
-way to capture on underutilized resources
+way to capture underutilized resources
for example
0:15:43.320,0:15:44.389
@@ -1099,13 +1069,13 @@ an application which you know say single-threaded
and uses a ton of memory
0:15:49.190,0:15:51.210
-on and is running on a machine
+and is running on a machine
0:15:51.210,0:15:55.040
the machines we're buying these days are eight core so
0:15:55.040,0:16:00.040
-thats wasting a lot of CPU cycles you're just
+that’s wasting a lot of CPU cycles you're just
generating a lot of heat doing nothing
0:16:00.040,0:16:03.890
@@ -1123,7 +1093,7 @@ sitting here that
need next to know need
0:16:11.560,0:16:15.910
-a hundred megabytes so we swap seven of
+a hundred megabytes so we slap seven of
those in along with the big job
0:16:15.910,0:16:18.580
@@ -1140,7 +1110,7 @@ obviously if the users have that application
next they can do it themselves
0:16:26.820,0:16:30.510
-it's not something where we can be easily
+but it's not something where we can easily
bring in
0:16:30.510,0:16:35.090
@@ -1148,23 +1118,20 @@ bring in more jobs and have a mix to
take advantage of the different
0:16:35.090,0:16:37.300
-resources
+resources.
0:16:37.300,0:16:39.940
-a related approach is to
+A related approach is to
0:16:39.940,0:16:43.950
to install virtualization software on the
equipment and this is this is
-0:16:43.950,0:16:44.980
-a
-
0:16:44.980,0:16:46.379
this is the essence of
0:16:46.379,0:16:49.800
-what Cloud computing is at the moment
+what Cloud Computing is at the moment
0:16:49.800,0:16:53.520
it's Amazon providing Zen
@@ -1173,28 +1140,28 @@ it's Amazon providing Zen
Zen hosting for
0:16:55.129,0:16:56.769
-relatively arbitrary yet
+relatively arbitrary
0:16:56.769,0:16:59.710
OS images
0:16:59.710,0:17:02.720
-it does have advantage that allows rapid deployment
+it does have the advantage that it allows rapid deployment
0:17:02.720,0:17:06.510
-in theory if your application is scaleable provides for
+in theory if your application is scalable provides for
0:17:06.510,0:17:08.259
-extremely high scaleability
+extremely high scalability
0:17:08.259,0:17:10.110
particularly if you
0:17:10.110,0:17:14.470
-arent us and therefore can possibly somebody else's hardware
+aren’t us and therefore can possibly use somebody else's hardware
0:17:14.470,0:17:16.520
-in in our application's face thats
+in our application's case that’s
0:17:16.520,0:17:18.790
not very practical so
@@ -1213,7 +1180,7 @@ you can have people with their own image in there
0:17:26.470,0:17:30.000
which is tightly resource constrained but you
-can run more than one of them on it. but no for instance
+can run more than one of them on a node. So for instance
0:17:30.000,0:17:31.170
you can give
@@ -1247,7 +1214,7 @@ afford to completely isolate say network bandwidth
at the bottom layer
0:17:49.520,0:17:51.580
-you can be some but
+you can do some but
0:17:51.580,0:17:56.170
if you go overboard you can spend all your time on accounting
@@ -1276,7 +1243,7 @@ operating system if you're using full virtualization
and that can allow
0:18:19.030,0:18:23.820
-allow obsolete core with your baseline core which is
+allow obsolete code with weird baselines to work which is
important in our space because
0:18:23.820,0:18:27.390
@@ -1307,7 +1274,7 @@ the ability to recover resources
as I was talking about before
0:18:45.290,0:18:49.530
-%uh but you can't do easily with sub-clusters because you cant just slip
+but you can't do easily with sub-clusters because you can’t just slip
0:18:49.530,0:18:50.360
another image
@@ -1316,7 +1283,7 @@ another image
on the on there and say are you can use anything and
0:18:52.910,0:18:56.730
-you know get that image ideal priority essentially
+you know give that image idle priority essentially
0:18:56.730,0:19:00.480
down side of course is that it is in complete
@@ -1348,18 +1315,18 @@ or your segment
of cache
0:19:16.390,0:19:18.390
-of cache base
+of cache space
0:19:18.390,0:19:24.809
-so users can in fact interfere with themselves and each other in this
+so users can’t in fact interfere with themselves and each other in this
environment
0:19:24.809,0:19:25.589
it's also
0:19:25.589,0:19:30.479
-%uh not really efficient for small jobs from the cost of running an
- entire arrest for every
+not really efficient for small jobs; the cost of running an
+entire OS for every
0:19:30.479,0:19:33.020
job is fairly high
@@ -1371,23 +1338,20 @@ even with
relatively light
0:19:34.710,0:19:38.250
-%uh you know it's like OS is you're still looking
+Unix like OSes is you're still looking
0:19:38.250,0:19:40.900
couple hundred megabytes in practice
0:19:40.900,0:19:46.240
-once you get everything up and running unless you get totally stripped
-down
-
-0:19:46.240,0:19:47.230
-and %uh
+once you get everything up and running unless you run something
+totally stripped down
0:19:47.230,0:19:49.460
-theres significant overhead
+there’s significant overhead
0:19:49.460,0:19:52.240
-theres CPU slowdown typically in the
+there’s CPU slowdown typically in the
0:19:52.240,0:19:55.360
you know typical estimates are in the twenty
@@ -1404,7 +1368,7 @@ possibly even lower
or higher
0:20:04.830,0:20:05.870
-and and just
+and just
0:20:05.870,0:20:09.920
you know the overhead because you have the whole OS there's a lot of a lot
@@ -1431,20 +1395,20 @@ we use the same memory but
at some level
0:20:25.220,0:20:29.309
-it's all going to get duplicated
+it's all going to get duplicated.
0:20:29.309,0:20:30.590
-a related option
+A related option
0:20:30.590,0:20:34.820
-comes from sort of the the internet havesting
+comes from sort of the internet havesting
industry which is to use virtual private
0:20:34.820,0:20:38.130
which is the technology from virtual private servers
0:20:38.130,0:20:42.110
-the example that everyone here is probably familiar with is jails where
+the example that everyone here is probably familiar with is Jails where
0:20:42.110,0:20:44.130
you can provide
@@ -1453,7 +1417,7 @@ you can provide
your own file system root
0:20:46.720,0:20:49.060
-your network interface
+your own network interface
0:20:49.060,0:20:50.620
and what not
@@ -1471,7 +1435,7 @@ that unlike full virtualization
the overhead is very small
0:20:58.680,0:21:01.030
-basically costs you
+basically it costs you
0:21:01.030,0:21:02.820
@@ -1484,39 +1448,36 @@ or an entry in few structures
there's some extra tests in their kernel but otherwise
0:21:10.220,0:21:14.900
-there's there's not a huge overhead for virtualization you don't need
+there's not a huge overhead for virtualization you don't need
an extra kernel for every
0:21:14.900,0:21:15.460
image
0:21:15.460,0:21:18.390
-so you get you get the difference here
+so you get the difference here
between
0:21:18.390,0:21:21.620
be able to run maybe
0:21:21.620,0:21:25.250
-you might be able to squeeze two hundred VMR images onto a machine
+you might be able to squeeze two hundred VMWare images onto a machine
0:21:25.250,0:21:29.620
-VMR people say no no don't do that but we have machines that are running
+VMWare people say no no don't do that but we have machines that are running
0:21:29.620,0:21:30.509
-nearly that many
-
-0:21:33.720,0:21:34.790
-they're what
+nearly that many.
0:21:34.790,0:21:38.289
-on the other hand there are people out there on thousands of
+On the other hand there are people out there who run thousands of
0:21:38.289,0:21:40.730
virtual hosts
0:21:40.730,0:21:43.170
-using this technique at a single machine so
+using this technique on a single machine so
0:21:43.170,0:21:45.200
big difference in resource use
@@ -1537,7 +1498,7 @@ that overhead is significant
you still do have some ability to tailor the
0:21:59.440,0:22:01.670
-images to jobs needs
+images to a job’s needs
0:22:01.670,0:22:03.309
you could have a
@@ -1546,16 +1507,16 @@ you could have a
custom root that for instance you could be running
0:22:05.400,0:22:07.380
-FreeBSD x6 in one
+FreeBSD 6.0 in one
0:22:07.380,0:22:08.650
in one
0:22:08.650,0:22:11.040
-virtual server and seven in another
+virtual server and 7.0 in another
0:22:11.040,0:22:15.090
-you have to be running of course seven kernel or eight kernel to make
+you have to be running of course 7.0 kernel or 8.0 kernel to make
that work
0:22:15.090,0:22:16.330
@@ -1565,14 +1526,14 @@ but it allows you to do that
we also in principle can do
0:22:18.500,0:22:23.080
-evil things like our sixty four-bit kernel and then thirty two bit
+evil things like our 64-bit kernel and then 32-bit
user spaces because
0:22:23.080,0:22:26.400
-say you have applications that you can't find the source to do anymore
+say you have applications that you can't find the source to anymore
0:22:26.400,0:22:31.830
-or wide worst wide worries you don't
+or libraries you don't
have the source to any more
0:22:31.830,0:22:32.990
@@ -1590,13 +1551,13 @@ virtualization
0:22:39.629,0:22:43.269
you don't have to virtualize things you don't
-care about so you dont have the overhead of
+care about so you don’t have the overhead of
0:22:43.269,0:22:45.520
-virtualizing everything
+virtualizing everything.
0:22:45.520,0:22:48.070
-downsides of course are incomplete isolation
+Downsides of course are incomplete isolation
0:22:48.070,0:22:50.690
you are running processes that on the same kernel
@@ -1608,22 +1569,22 @@ and they can interfere with each other
and there's dubious flexibility obviously
0:22:55.320,0:22:57.900
-I don't think anybody
+I don't think anyone
0:22:57.900,0:23:01.850
-should have the ability to run Windows in a jail
+should have the ability to run Windows in a jail.
0:23:01.850,0:23:02.860
-theres some
+There’s some
0:23:02.860,0:23:04.960
-Net BSD peak of support but
+NetBSD support but
0:23:04.960,0:23:10.510
-and I dont think it's really gotten to that point
+and I don’t think it's really gotten to that point.
0:23:10.510,0:23:12.420
-one one final area
+One final area
0:23:12.420,0:23:14.350
that sort of diverges from this
@@ -1638,7 +1599,7 @@ Unix solution to the problem
on this on single
0:23:20.580,0:23:22.070
-in a single machines
+in a single machine
0:23:22.070,0:23:22.800
which is
@@ -1654,11 +1615,11 @@ resource limits
a resource and typically
0:23:36.240,0:23:36.999
-schedule a
+scheduler a
0:23:38.340,0:23:41.510
-cluster schedulers support the common ones
+cluster schedulers support the common ones
0:23:41.510,0:23:43.150
so you can set a
@@ -1673,48 +1634,45 @@ and the schedulers typically provide
at least
0:23:51.350,0:23:54.740
-lot of support for
+launch support for
0:23:54.740,0:23:56.850
the limits on
0:23:56.850,0:24:01.900
-a given set of process. thats part of the job
+a given set of processes that’s part of the job
0:24:01.900,0:24:02.850
also the most
0:24:02.850,0:24:05.640
-you know there are a number of forms of research
+you know there are a number of forms of resource
partitioning that
0:24:05.640,0:24:07.170
-are available as
-
-0:24:07.170,0:24:08.100
-the in that
+are available
0:24:08.100,0:24:09.700
-as the standard feature
+as a standard feature
0:24:09.700,0:24:12.000
on so memory discs are one of them so
0:24:12.000,0:24:16.800
-if you want to create a file system space it's
-limited in size. Create a memory disc
+if you want to create a file system space that’s
+limited in size, create a memory disc
0:24:16.800,0:24:17.969
and back it
0:24:17.969,0:24:21.130
-and back it with a --- file
+and back it with an mmap'd file
0:24:21.130,0:24:22.520
-Quotas another mechanism
+or swap
0:24:22.520,0:24:24.570
-of partitioning that
+of partitioning
0:24:24.570,0:24:26.330
disc use
@@ -1727,22 +1685,22 @@ processes to it
a single process
0:24:32.010,0:24:34.540
-processor a set of processors
+processor or a set of processors
0:24:34.540,0:24:39.310
and so they can't interfere with each other
-with processes running on other processes
+with processes running on other processors
0:24:39.310,0:24:44.280
the nice thing about this first is that you're using existing
-facilities so you dont have to rewrite
+facilities so you don’t have to rewrite
0:24:44.280,0:24:46.170
-also new features
+lots of new features
0:24:46.170,0:24:49.590
-for each application
+for a niche application
0:24:49.590,0:24:52.790
and they tend to integrate well with existing schedulers
@@ -1752,48 +1710,45 @@ in many cases
parts of them are already implemented
0:24:55.940,0:24:59.650
-and in fact the experiments that we'll talk about this later are using
+and in fact the experiments that I'll talk about later are all using
this type of
0:24:59.650,0:25:02.160
-technique
+technique.
0:25:02.160,0:25:02.830
-cons are of course
+Cons are of course
0:25:02.830,0:25:04.850
incomplete isolation again
0:25:04.850,0:25:08.270
-and theres typically no unified framework
+and there’s typically no unified framework
0:25:08.270,0:25:12.310
-for the concept of a job when it comes to the center process
+for the concept of a job when a job is composed of a set of processes
0:25:12.310,0:25:16.710
-yeah there there are a number of data structures within the kernel for
+yeah there are a number of data structures within the kernel for
instance the session
0:25:16.710,0:25:18.120
which
0:25:18.120,0:25:19.499
-certain aggregate processes
+sort of aggregate processes
0:25:19.499,0:25:20.990
-but there isnt one
-
-0:25:20.990,0:25:22.230
-in
+but there isn’t one
0:25:22.230,0:25:24.800
-in BVSD at this point
+in BSD or Linux at this point
0:25:24.800,0:25:29.020
-allows you to place resource limits on this in a way that you can process
+which allows you to place resource limits on those in the way that you can on a process
0:25:29.020,0:25:32.520
---- did have support like that
+IRIX did have support like that
0:25:32.520,0:25:34.160
where they have a job ID
@@ -1802,10 +1757,10 @@ where they have a job ID
and there could be a job limit
0:25:36.210,0:25:38.280
- and slurs projects
+and Solaris projects
0:25:38.280,0:25:41.320
-pursue similar not not quite the same
+are sort of similar but not quite the same
0:25:41.320,0:25:43.149
processes or part of a project but
@@ -1817,10 +1772,10 @@ it's not quite the same inherited relationship
and typically
0:25:49.500,0:25:50.900
-there arent
+there aren’t
0:25:50.900,0:25:55.390
-limits on things like badwidth. there was
+limits on things like bandwidth. There was
0:25:55.390,0:25:56.430
a sort of a
@@ -1835,10 +1790,10 @@ nice type interface
on that I saw
0:26:01.950,0:26:03.720
-first thing as a research project
+posted as a research project
0:26:03.720,0:26:07.150
-many years ago I think it was that stage
+many years ago I think in the 2.x days
0:26:07.150,0:26:09.880
where you could say this process can have
@@ -1847,7 +1802,7 @@ where you could say this process can have
you know five megabits
0:26:11.580,0:26:12.530
-or or whatever
+or whatever
0:26:12.530,0:26:14.380
but I haven't really seen anything take off
@@ -1856,56 +1811,53 @@ but I haven't really seen anything take off
that would be a pretty neat thing to have
0:26:16.940,0:26:19.309
-but one other exception there
+actually one other exception there
0:26:19.309,0:26:22.230
-is on Irex again
+is on IRIX again
0:26:22.230,0:26:28.210
the XFS file system supported guaranteed data rates on file handles
you could say
0:26:28.210,0:26:30.140
-you know if you would say I need
+you could open a file and say I need
0:26:30.140,0:26:32.940
ten megabits read or ten megabits write
0:26:32.940,0:26:34.029
-whatever you say
+or whatever and it would say
0:26:34.029,0:26:35.529
-okay and go
+okay or no
0:26:35.529,0:26:39.279
-and and then you could read and write and
-it would do evil things file system
+and then you could read and write and
+it would do evil things at the file system layer
0:26:39.279,0:26:40.600
in some cases
0:26:40.600,0:26:43.940
-all to making sure that you could get that terrific data rate
-
-0:26:43.940,0:26:44.900
-by
+all to ensure that you could get that streaming data rate
0:26:44.900,0:26:49.710
-by keeping the file
+by keeping the file.
0:26:49.710,0:26:53.620
-so now Im going to talk about what we've done
+So now I’m going to talk about what we've done
0:26:53.620,0:26:59.510
- what we needed was the solution to handle
+what we needed was a solution to handle
a wide range of job types
0:26:59.510,0:27:01.570
-so all the options we looked at for instance
+So of the options we looked at for instance
0:27:01.570,0:27:04.990
-single application clusters of
+single application clusters or
project clusters
0:27:04.990,0:27:11.990
@@ -1920,18 +1872,18 @@ virtualize in order to be
efficient in terms of
0:27:18.179,0:27:22.060
-being able to handle our job nix and what not to handle
+being able to handle our job mix and what not and handle
the fact that our users
0:27:22.060,0:27:23.740
tend to have
0:27:23.740,0:27:27.730
-spikes in their in their use
+spikes in their use
0:27:27.730,0:27:32.799
-on a on a large scale so for instance we get those that show and say
-that they need to run for a month
+on a large scale so for instance we get GPS we’ll show up and say
+we need to run for a month
0:27:32.799,0:27:33.780
on and then
@@ -1951,28 +1903,28 @@ we really need the virtuals something
virtualized
0:27:44.850,0:27:47.120
-and then we got to pay the price of %uh
+and then we have to pay the price of
0:27:47.120,0:27:48.380
of the overhead
0:27:48.380,0:27:51.590
-and again it doesn't handle small jobs and that is a
+and again it doesn't handle small jobs well and that is a
0:27:51.590,0:27:54.050
-large portion of our job nix and
+large portion of our job mix so
0:27:54.050,0:27:55.180
-of that
+of the
0:27:55.180,0:27:58.070
-quarter million or something jobs line
+quarter million or something jobs we’ve run
0:27:58.070,0:27:59.700
on our cluster
0:27:59.700,0:28:02.490
-%uh I would guess that
+I would guess that
0:28:02.490,0:28:04.730
more than half of those were submitted
@@ -1989,28 +1941,25 @@ so they'll just pop up
0:28:11.400,0:28:14.030
the other method to have looked at
-0:28:14.030,0:28:14.800
-are up
-
0:28:14.800,0:28:16.750
-were using resource limits
+are using resource limits
0:28:16.750,0:28:19.060
the nice thing of course is they're achievable
with
0:28:19.060,0:28:21.429
-they acheive useful isolation
+they achieve useful isolation
0:28:21.429,0:28:26.289
-and the inflexible with under existing functionality or small
-extension so that's what we think
+and they’re implementable with either existing functionality or small
+extensions so that's what we’ve been
0:28:26.289,0:28:27.230
-concentrating on
+concentrating on.
0:28:27.230,0:28:29.740
-also been doing some thinking about
+We’ve also been doing some thinking about
0:28:29.740,0:28:31.809
could we use the techniques there
@@ -2022,27 +1971,27 @@ and combine them with jails
or related features
0:28:36.170,0:28:40.019
-it may be bulking up jails to be more like ------
+it may be bulking up jails to be more like zones in Solaris
0:28:40.019,0:28:44.150
-or containers I think they're calling this
+or containers I think they're calling them this
week
0:28:44.150,0:28:44.840
and
0:28:44.840,0:28:46.770
-so we're looking that
+so we're looking at that as well
0:28:46.770,0:28:50.840
to be able to provide
0:28:50.840,0:28:54.250
-to to to be able to provide pretty user operating environments
+to be able to provide per-user operating environments
0:28:54.250,0:28:59.200
-potentially isolating users from operating suffrance as we upgrade the kernel
+potentially isolating users from upgrades so for instance as we upgrade the kernel
0:28:59.200,0:29:03.469
and users can continue using it all the
@@ -2055,17 +2004,17 @@ application in
and handle the updates in libraries and what not
0:29:09.970,0:29:13.840
-they also have potential to provide strong isolation for security
+they also have the potential to provide strong isolation for security
purposes
0:29:13.840,0:29:18.740
-%uh which could be useful in the future
+which could be useful in the future.
0:29:18.740,0:29:20.159
-we do think that that
+We do think that
0:29:20.159,0:29:24.040
-of of its to of these of these mechanisms the nice thing is that
+of these mechanisms the nice thing is that
resource limit
0:29:24.040,0:29:26.150
@@ -2073,35 +2022,33 @@ the resource limits and partitioning scheme
0:29:26.150,0:29:29.860
as well as virtual private service are very
-similar imitation requirements
+similar implementation requirements
0:29:29.860,0:29:33.090
- set up a fair bit more expensive
+set up a fair bit more expensive
0:29:33.090,0:29:34.620
in the VPS case
-
0:29:34.620,0:29:38.780
-while nonetheless they're fairly similar
+but nonetheless they're fairly similar.
0:29:38.780,0:29:42.610
-what we've been doing is we've taken Sun Grid Engine we've taken Sun
-Grid Engine
+So, what we've been doing is we've taken the Sun Grid Engine
0:29:42.610,0:29:46.880
and we were originally intended to actually
-extend Sun Grid Engine and modify its demands
+extend Sun Grid Engine and modify its daemons
0:29:46.880,0:29:48.480
-to do work
+to do the work
0:29:48.480,0:29:51.150
on what we ended up doing instead is realize
that well
0:29:51.150,0:29:54.910
-we can actually starts flying alternate program
+we can actually specify an alternate program
to run instead of the shepherd
0:29:54.910,0:29:57.990
@@ -2111,30 +2058,30 @@ The shepherd is the process
that starts all
0:30:00.580,0:30:02.250
-starts the the script that
+starts the script that
0:30:02.250,0:30:03.380
-can reach job
+can for each job
0:30:03.380,0:30:04.920
on a given node
0:30:04.920,0:30:08.559
- it collects usage and forwards signals to the
+it collects usage and forwards signals to the
children
0:30:08.559,0:30:12.620
and also is responsible for starting remote
- components
+components
0:30:12.620,0:30:14.560
-so shepherd is started and then
+so a shepherd is started and then
0:30:14.560,0:30:17.640
- traditionally in seperate engine it starts out
+traditionally in Sun Grid Engine it starts out
0:30:17.640,0:30:19.910
-it's own --the event
+its own rsh daemon
0:30:19.910,0:30:20.800
and
@@ -2146,7 +2093,7 @@ jobs connect over
these days that for their own
0:30:23.670,0:30:25.870
-you're a mechanism which is
+mechanism which is
0:30:25.870,0:30:26.950
secure
@@ -2155,13 +2102,10 @@ secure
not using the
0:30:28.840,0:30:30.530
-arch old code
-
-0:30:30.530,0:30:31.920
-on
+crufty old rsh code.
0:30:35.370,0:30:37.970
-so what we've done is we've implemented a rapid script
+So what we've done is we've implemented a wrapper script
0:30:37.970,0:30:40.139
which allows a pre-command hook
@@ -2170,28 +2114,28 @@ which allows a pre-command hook
to run before the shepherd starts
0:30:42.559,0:30:47.170
-the command rapper send before we send shepherd before we can run like the N program
+the command wrapper so before we start shepherd we can run like the env program
0:30:47.170,0:30:49.150
-or the week and why not
+or we can run
0:30:49.150,0:30:50.430
-troops or whatever
+chroot or whatever
0:30:50.430,0:30:54.040
- to set up the environment that it runs in or CPU
+to set up the environment that it runs in or CPU
0:30:54.040,0:30:56.600
-that will show later
+sets as I’ll show later
0:30:56.600,0:30:58.750
-on and I first met her for cleanup
+and a post command hook for cleanup
0:30:58.750,0:31:03.940
-simply move because I felt like it
+it's implemented in Ruby because I felt like it.
0:31:03.940,0:31:07.830
-the first thing we implemented is memory backs temporary directors. the motivation for
+The first thing we implemented was memory backed temporary directories. The motivation for
0:31:07.830,0:31:08.700
this
@@ -2203,92 +2147,93 @@ is that
we've had problems for users will you know
0:31:12.180,0:31:15.510
-run slash ten on the nodes
+run slash temp out on the nodes
0:31:15.510,0:31:19.059
-where we have the figures is that they do have discs
+the way we have the nodes configured is that they do have discs
0:31:19.059,0:31:22.960
-and most of the disc is available as slash ten
+and most of the disc is available as slash temp
0:31:22.960,0:31:25.049
we had some cases
0:31:25.049,0:31:27.840
-particularly early on where users would fill the discs and not complete it
+particularly early on where users would fill up the discs and not delete it
0:31:27.840,0:31:32.300
-their job would crash and they wold forget to add clean up code or whatever
+their job would crash or they would forget to add clean up code or whatever
0:31:32.300,0:31:35.100
-other jobs would fail strangely
+and then other jobs would fail strangely
0:31:35.100,0:31:39.029
-you might expect that you just get a you would get a nice error message
+you might expect that you just get a nice error message
0:31:39.029,0:31:42.040
-being programmers
+programmers being programmers
0:31:42.040,0:31:42.909
-people would not
+people would not do their
0:31:42.909,0:31:44.630
-handle very correctly
+error handling correctly.
0:31:44.630,0:31:47.380
-now of course you have issues like for instance
+A number of libraries do have issues like for instance
0:31:47.380,0:31:49.600
-the PDL library
+the PVM library
0:31:49.600,0:31:52.600
unexpectedly fails and reports a completely strange error
0:31:52.600,0:31:54.759
-if it can't create a file to have
+if it can't create a file in temp
0:31:54.759,0:32:01.669
-because it needs to create an extra file in itself
+because it needs to create a UNIX domain socket
+so it can talk to itself.
0:32:01.669,0:32:03.360
-so what we've done here
+So, what we’ve done here
0:32:03.360,0:32:08.059
is it turns out that Sun Grid Engine actually creates a temporary
directory often the
0:32:08.059,0:32:11.730
-typically but you can change
+typically /tmp but you can change
that
0:32:11.730,0:32:14.490
-I think it's that's a
+and points temp dir to that
0:32:14.490,0:32:15.370
location
0:32:15.370,0:32:17.499
-we educated most of all users now
+we've educated most of all users now
0:32:17.499,0:32:21.360
-to use that location correctly and values
-that very cool
+to use that location correctly
+so they’ll use that variable
0:32:21.360,0:32:23.279
-they treat their files understand her
+they create their files under TMPDIR
0:32:23.279,0:32:24.950
and then when the job exits
0:32:24.950,0:32:26.569
-the Grid Engine deletes the temp dir
+the Grid Engine deletes the directory
0:32:26.569,0:32:28.510
and that all gets cleaned up
0:32:28.510,0:32:32.720
-the problem of course being that of multiples
-also warning on the same note same time
+the problem of course being that if multiple
+jobs are running on the same node at the same time
0:32:32.720,0:32:35.290
one of them could still fill temp
@@ -2298,7 +2243,7 @@ so the solution was pretty simple
we created a
0:32:38.759,0:32:41.420
-rapper script at the beginning of the job
+wrapper script at the beginning of the job
0:32:41.420,0:32:42.760
creates a
@@ -2307,10 +2252,10 @@ creates a
a
0:32:43.940,0:32:47.260
-memory file to swap back to MB file system
+memory file system, a swap-backed md file system
0:32:47.260,0:32:50.790
+of a user requestable size with a default
+of a user requestable size with the default
0:32:50.790,0:32:53.310
and
@@ -2319,16 +2264,16 @@ and
this has a number of advantages the biggest one of course is that
0:32:56.520,0:32:58.320
-it's their fixed size so we get
+it's fixed size so we get
0:32:58.320,0:32:59.449
you know
0:32:59.449,0:33:01.000
- the user gets
+the user gets
0:33:01.000,0:33:03.420
- what they asked for
+what they asked for
0:33:03.420,0:33:05.930
and once they run of space, they run out of space well
@@ -2352,7 +2297,7 @@ now that we're running swap back memory files systems for temp
the users who only use a fairly small amount of temp
0:33:24.560,0:33:28.190
- should see vastly improved performance
+should see vastly improved performance
because they're running in memory
0:33:28.190,0:33:32.980
@@ -2362,10 +2307,10 @@ rather than writing to disc
quick example
0:33:34.690,0:33:38.270
-we've a little job script herel
+we've a little job script here
0:33:38.270,0:33:39.830
-prints chapter and
+prints TMPDIR and
0:33:39.830,0:33:41.950
prints the
@@ -2374,7 +2319,7 @@ prints the
amount of space
0:33:43.080,0:33:46.210
-we consider job request saying that we want
+we submit our job request saying that we want
0:33:46.210,0:33:51.539
this is what we want hundred megabytes of
@@ -2390,10 +2335,10 @@ so the program doesn't
so the program ends at the end of it
0:33:57.620,0:33:58.709
-%uh for doing it
+for doing it
0:33:58.709,0:34:00.510
-heres a live demo
+here's a live demo
0:34:00.510,0:34:01.840
all and then
@@ -2408,16 +2353,16 @@ you can see it
does in fact it creates a memory file system
0:34:07.549,0:34:10.449
-I attempted to do as great code
+I attempted to do great code
0:34:10.449,0:34:13.409
-having a variable space that
+having a variable space
0:34:13.409,0:34:15.839
-having a variable space that is roughly what user asked for
+that is roughly what the user asked for
0:34:15.839,0:34:17.089
-the version I had
+the version that I had
0:34:17.089,0:34:20.739
when I was attempting this was not entirely
@@ -2425,34 +2370,34 @@ accurate
0:34:20.739,0:34:24.710
trying to guess what all the
-USFS overhead would be
+UFS overhead would be
0:34:24.710,0:34:25.889
as the result was
0:34:25.889,0:34:28.399
-%uh not quite consistent
+not quite consistent
0:34:30.790,0:34:33.899
I couldn't figure out easy function so
0:34:33.899,0:34:39.589
-it does a better job than it did to start with
+it does a better job than it did to start with, it’s not perfect
0:34:39.589,0:34:40.600
sometimes however
0:34:40.600,0:34:42.329
-today that that's a good case
+today that that's a good fix
0:34:42.329,0:34:43.550
-we're coming to it
+we're coming to
0:34:43.550,0:34:45.359
-deployed pretty soon
+deploy it pretty soon
0:34:45.359,0:34:47.159
-it's pretty easily
+it works pretty easily
0:34:47.159,0:34:48.570
well sometimes it's not enough
@@ -2464,17 +2409,18 @@ the biggest issue is that they were badly designed programs all
all over the world
0:34:52.720,0:34:54.919
-don't you step to like they're supposed to
+don't use TMPDIR like they're supposed to
0:34:54.919,0:34:59.319
in fact
0:35:10.099,0:35:12.759
+(inaudible question)
so there are all these applications
0:35:12.759,0:35:17.979
-there are a lot about patience still that need
-ten because they'll still need start up
+there are all these applications still that need
+temp say during start up
0:35:17.979,0:35:19.230
that sort of thing
@@ -2492,97 +2438,93 @@ so we have problems with these
realistically
0:35:26.290,0:35:27.799
-we cant change all of them
+we can’t change all of them
0:35:27.799,0:35:30.019
it's just not going to happen
0:35:30.019,0:35:31.950
-so we still have a lot of people
+so we still have problems with people
0:35:31.950,0:35:34.509
-running out resources
+running out of resources
0:35:34.509,0:35:35.819
-%uh so we probably
+so we probably
0:35:35.819,0:35:37.489
feel that
0:35:37.489,0:35:41.240
-was general solution is right a per job slash temp
+the most general solution is to write a per job slash temp
0:35:41.240,0:35:44.880
-the first was that her from the files system
-at its best
+and virtualize that portion of the files system
+in memory space
0:35:44.880,0:35:47.119
-we think there is some ways to me that
+and variate symlinks can do that
0:35:47.119,0:35:52.539
-and so he said okay let's give it a shot
+and so we said okay let's give it a shot
0:35:52.539,0:35:56.969
-just to inter these concepts for people who are unfamiliar with him
+just to introduce the concept of variant symlinks for people who aren’t familiar with them
0:35:56.969,0:36:00.280
-offering someone services recent ones that
-contained rules
+variant symlinks are basically symlinks that
+contain variables
0:36:00.280,0:36:02.389
-Richard Senator long time
+which are expanded at run time
0:36:02.389,0:36:05.549
-angeles half past be different for different
+it allows paths to be different for different
processes
0:36:05.549,0:36:06.969
for example
0:36:06.969,0:36:08.689
-you create the files
+you create some files
0:36:08.689,0:36:10.069
-on create
+you create
0:36:10.069,0:36:12.459
-they ask someone whose contents are
+a symlink whose contents are
0:36:12.459,0:36:18.329
-this veritable which has a the fall not shells
-fell the fall diet
+this variable which has the default shell value
0:36:18.329,0:36:18.990
and you
0:36:18.990,0:36:24.949
-he didn't get different results with different
-variables that
+get different results with different
+variable sets.
0:36:24.949,0:36:27.170
-what about her the implementation we've got
+So, to talk about the implementation we’ve done,
0:36:27.170,0:36:32.389
-it's drive from grateful and mission was to
-the data structures are gonna call
+it's derived from the DragonFly implementation, most of
+the data structures are identical
0:36:32.389,0:36:33.869
-authorities a number of changes
+however, I’ve made a number of changes
0:36:33.869,0:36:39.649
-the biggest one is that the two the concept
-of us troops and returned them entirely around
-
-0:36:39.649,0:36:40.409
-he added that
+the biggest one is that we took the concept
+of scopes and we turned them entirely around
0:36:40.409,0:36:45.329
-in trade as one of his assistants the which
+is overridden by a user scope and by a
+in there is a system scope which
+is over overridden by a user scope and by a
0:36:45.329,0:36:47.259
-progressive scope
+process scope
0:36:49.819,0:36:53.449
problem with that is if you
@@ -2597,39 +2539,38 @@ and
you decide you want to do something clever like have
0:36:59.459,0:37:02.219
-root file system which
-%uh
+a root file system which
0:37:02.219,0:37:06.109
-were slashed with points the different things
-for different %uh
+where slash lib points to different things
+for different
0:37:06.109,0:37:08.249
different architectures
0:37:08.249,0:37:11.849
-were seriously until the users come along
+well, works quite nicely until the users come along
and
0:37:11.849,0:37:14.189
-they're upset there are variable
+set their arch variable
0:37:14.189,0:37:15.629
up for you
0:37:15.629,0:37:18.900
-if you have CSX like the program and you don't
+if you have say a setuid program and you don't
defensively
0:37:18.900,0:37:22.319
and you don't implement correctly
0:37:22.319,0:37:24.900
-the obvious that they sat obviously you would
+the obvious bad things happen. Obviously you would
0:37:24.900,0:37:28.599
-Richard riordan often that I believe they
-did
+write your code to not do that I believe they
+did, but
0:37:28.599,0:37:31.700
there's a whole class of problems where
@@ -2647,50 +2588,47 @@ so by
reversing the order
0:37:38.509,0:37:41.849
-we can't we can reduce the risks
+we can reduce the risks
0:37:41.849,0:37:43.329
at the moment we don't
0:37:43.329,0:37:44.309
-haven't you sir
+have a user scope
0:37:44.309,0:37:47.530
I just don't like the idea of the users scope
to be honest
0:37:47.530,0:37:50.900
-from being they see you dont have to have
-poor user
+problem being that then you have to have
+per user state in kernel
0:37:50.900,0:37:55.509
-that just sits around forever
-you can never guard elected accepted the strain
+that just sort of sits around forever
+you can never garbage collect it except the
0:37:55.509,0:37:57.059
-of late
+administrative way
0:37:57.059,0:37:59.489
just doesn't seem like a great idea to me
0:37:59.489,0:38:00.700
-on it
+And jail scope
0:38:00.700,0:38:04.609
just hasn't been implemented
0:38:04.609,0:38:09.809
-because it wasn't entirely clear as to what the semantics should be
-
-0:38:09.809,0:38:11.010
-i also
+because it wasn't entirely clear what the semantics should be
0:38:11.010,0:38:14.719
I also added default variable support variable
-also shell style
+also shell style
0:38:14.719,0:38:16.999
-for
+variable support
0:38:16.999,0:38:19.169
to some extent undoes the scope
@@ -2702,7 +2640,7 @@ the scope change
in that
0:38:21.779,0:38:24.749
-the default variable becomes a system scope
+the default variable becomes a system scope
0:38:24.749,0:38:26.540
which is overridden by everything
@@ -2715,10 +2653,7 @@ in particular who wants implement their
slashed temp which varies
0:38:33.380,0:38:36.240
-we have to do something like this because the temp needs to work
-
-0:38:36.240,0:38:37.209
-if
+we have to do something like this because temp needs to work
0:38:37.209,0:38:42.059
if we don't have the job values set
@@ -2727,8 +2662,8 @@ if we don't have the job values set
I also decided to use
0:38:45.829,0:38:49.839
-percent instead of dollar signs to avoid
-confusion over shell variables because these
+percent instead of dollar sign to avoid
+confusion with shell variables because these
0:38:49.839,0:38:50.379
are
@@ -2737,150 +2672,144 @@ are
a separate namespace in the kernel
0:38:52.620,0:38:56.669
-can't do it in a nice to do all the evaluation in the
+we can't do it to main OS and do all the evaluation in the
user space
0:38:56.669,0:38:59.269
it's classic vulnerability
0:38:59.269,0:39:02.739
-the that in the database for instance
+in the CVE database for instance
0:39:02.739,0:39:08.109
-and when I was in the past and avoid confusion
-with but yet that's and worthy and for the
+and we’re not using @ to avoid confusion
+with AFS
0:39:08.109,0:39:09.819
-now BST implementation
+or the Net BSD implementation
0:39:09.819,0:39:11.019
-first is not allowed
+which does not allow
0:39:11.019,0:39:14.879
-he's a reduced rate of one set of core values
+user or administratively settable values
0:39:14.879,0:39:17.019
-that will be enough ballots for
+that support
0:39:17.019,0:39:20.359
-on I don't have any automated variables such
-as the %uh
+I don't have any automated variables such
+as
0:39:20.359,0:39:25.789
-the process is not you which is universally
-sat and that he is the implementation war
+the percent sys value which is universally
+set in the NetBSD implementation
0:39:25.789,0:39:26.750
-on
-
-0:39:26.750,0:39:28.039
-aids
+or
0:39:28.039,0:39:32.579
-hey I do i'd be very foolish they also have
-
+a UID variable which they also have
0:39:32.579,0:39:34.909
-and currently and it allows Senator
+and currently and it allows
0:39:34.909,0:39:40.880
-setting about using other processes yourself
-in your own insurance
+setting of values in other processes,
+you can only set them in your own and inherit it
0:39:40.880,0:39:42.699
-that may change but it's a
+that may change but
0:39:42.699,0:39:47.339
one of my goals here is because they were
-subtle ways to make no mistakes and
+subtle ways to make dumb mistakes and
0:39:47.339,0:39:48.930
-cause securities undergo days
+cause security vulnerabilities
0:39:48.930,0:39:52.479
-I I've attempted to storm the features that
+I've attempted to slim the feature set
down to the point where you
0:39:52.479,0:39:54.909
-at some reasonable chance of not
+have some reasonable chance of not
0:39:54.909,0:39:56.339
doing that
0:39:56.339,0:40:03.339
-if you start building systems on the for
+if you start building systems on them for deployment.
0:40:04.419,0:40:06.909
-the final area that we've worked on
+The final area that we've worked on
0:40:06.909,0:40:09.499
-he is moving away from the final system states
+is moving away from the file system space
0:40:09.499,0:40:12.559
-and the NCR see these sets of them
+and into CPU sets
0:40:12.559,0:40:16.379
-chuck roberts indictment alleges a out for
-them out
+Jeff Roberson implemented a program
0:40:16.379,0:40:20.699
-people may I see he said functionality which
+implemented a CPU set functionality which
allows you to
0:40:20.699,0:40:23.489
-but it also seems the issues that have been
-set
+create… put a process into a CPU set
0:40:23.489,0:40:24.879
-the affinity of that
+and then set the affinity of that
0:40:24.879,0:40:26.269
-see he said
+CPU set
0:40:26.269,0:40:29.189
-political observers also stopped in on this
+by default every process has an anonymous
0:40:29.189,0:40:33.059
-and on a CD set was it was stuffed in the
+CPU set that was stuffed into
one that was created by this
0:40:33.059,0:40:37.269
-in an apparent
+in a parent
0:40:37.269,0:40:38.619
-so what I hear
+so for a little background here
0:40:38.619,0:40:40.740
-it's here unless you can figure issue
+in a typical SGE configuration
0:40:40.740,0:40:42.769
-every day has won so what
+every node has one slot
0:40:42.769,0:40:44.429
-first you here
+per CPU
0:40:44.429,0:40:48.639
-there there are a number of other ways you
-can configure basically us lost is something
+There are a number of other ways you
+can configure it, basically a slot is something
0:40:48.639,0:40:50.019
-in jobs and money
+a job can run in
0:40:50.019,0:40:56.719
-%uh federal jobs crosses what's going to happen
-would be more than one swatch
+and a parallel job crosses slots
+and can be in more than one slot
0:40:56.719,0:41:01.359
-pop quizzes making the applications where
-persons who tends to spend a fair bit of time
+for instance in many applications where
+code tends to spend a fair bit of time
0:41:01.359,0:41:02.380
-waiting for Iran
+waiting for IO
0:41:02.380,0:41:06.209
you are looking at more than one slot per CPU so two slots per
0:41:06.209,0:41:08.089
-CPU is not uncommon
+core is not uncommon
0:41:08.089,0:41:10.869
but probably the most common configuration
@@ -2890,7 +2819,7 @@ and the one that
you get out of the box if you just install Grid Engine
0:41:13.719,0:41:16.739
- wants free CPU
+is one slot for each CPU
0:41:16.739,0:41:19.830
and that's how that's how we run because we
@@ -2901,7 +2830,7 @@ that whole CPU for whatever they want to do with
it
0:41:23.699,0:41:26.130
-So drums are allocated one or more slots
+so jobs are allocated one or more slots
0:41:26.130,0:41:27.599
if they're
@@ -2911,43 +2840,43 @@ depending on whether they're sequential or parallel jobs
and how many they ask for
0:41:33.189,0:41:37.239
-but there is but this is just a convection
+but this is just a convention
there's no actual connection between slots
0:41:37.239,0:41:39.119
and CPUs
0:41:39.119,0:41:40.829
-it's quite possible to
+so it's quite possible to
0:41:40.829,0:41:42.819
submit a non-parallel job
0:41:42.819,0:41:45.019
-that goes often spawns a zillion threads
+that goes off and spawns a zillion threads
0:41:45.019,0:41:48.369
-and sucks up the whole system
+and sucks up all the CPUs on the whole system
0:41:48.369,0:41:50.800
in some early versions of grid engine
0:41:50.800,0:41:53.569
-there actually was up
+there actually was
0:41:53.569,0:41:55.729
support for tying slots
0:41:55.729,0:41:58.669
-for CPU to set it up that
+to CPUs if you set it up that
way
0:41:58.669,0:42:02.979
-there is a sensible implementation for Iraq's
+there was a sensible implementation for IRIX
and then things got weirder and weirder as
0:42:02.979,0:42:06.010
-people try to implement it on other platforms
+people tried to implement it on other platforms
which had
0:42:06.010,0:42:07.030
@@ -2957,29 +2886,26 @@ vastly different
CPU binding semantics
0:42:09.839,0:42:12.359
-and at this point in time we broke it
+and at this point it’s entirely broken
0:42:12.359,0:42:14.959
on every platform as far as I can tell
0:42:14.959,0:42:18.759
-also the we decided okay we've got this rapper
+so we decided okay we've got this wrapper
let's see what we can do
0:42:18.759,0:42:21.009
-on in terms of making things work
-0:42:21.009,0:42:21.659
-certainly
+in terms of making things work.
0:42:21.659,0:42:27.119
-we now have have the rapper store allocations in the final system
+We now have the wrapper store allocations in the file system
0:42:27.119,0:42:31.239
-%uh three and nineteen ninety percent that
-allocation I've read them
+we have a not yet recursive allocation algorithm
0:42:31.239,0:42:33.369
-well we try to do years
+what we try to do is
0:42:33.369,0:42:34.690
find the best fit
@@ -2991,12 +2917,9 @@ fitting set of
adjacent cores
0:42:39.539,0:42:42.329
-and then if that doesn't work we take orders
+and then if that doesn't work we take the largest
and repeat
-0:42:42.329,0:42:43.519
-%um
-
0:42:43.519,0:42:45.180
and until we fix
@@ -3004,7 +2927,7 @@ and until we fix
or until we've got enough slots
0:42:47.300,0:42:50.800
-the goal is to minimize the fragments we haven't
+the goal is to minimize new fragments we haven't
done any analysis
0:42:50.800,0:42:52.269
@@ -3014,24 +2937,23 @@ to determine whether that's actually
an appropriate algorithm
0:42:55.179,0:42:56.289
-off hand it seems
+but off hand it seems
0:42:56.289,0:43:00.519
-you'd find another part of a privilege
+fine given I’ve thought about it over lunch.
0:43:00.519,0:43:02.810
-should forties lay down their arms the US
-is
+Should 40’s lay down their OSes
0:43:02.810,0:43:09.649
-on turns out the FreeBSD, CPU API
-and the last one
+turns out that the FreeBSD CPU set API
+and the Linux one
0:43:09.649,0:43:12.519
-differ only in very small details
+differ only in the very small details
0:43:12.519,0:43:13.599
-on that
+They’re
0:43:13.599,0:43:15.479
essentially exactly
@@ -3040,143 +2962,126 @@ essentially exactly
identical which is
0:43:17.569,0:43:20.489
-correct in terms pretty semantically
+convenient semantically,
+so converting between them is pretty straightforward
0:43:20.489,0:43:24.869
-so I think it is of interest to demonstrate
-the effectiveness of cebu said they also happen
+so converting between them is pretty straightforward,
+so I did a set of benchmarks
0:43:24.869,0:43:27.019
-to demonstrate the %uh
-
-0:43:27.019,0:43:28.089
-the %uh
+to demonstrate the
0:43:28.089,0:43:29.359
-rather they probably have
+effectiveness of CPU set,
+they also happen to demonstrate the wrapper
0:43:29.359,0:43:33.319
-the relevance
+but don’t really have any relevance
0:43:33.319,0:43:35.229
-it's all of the young box
-
-0:43:35.229,0:43:36.629
-%um
-
-0:43:36.629,0:43:38.289
-the and %uh
+used a little eight core Intel Xeon box
0:43:38.289,0:43:40.749
-%uh someone clearly is that
+7.1 pre-release that had
0:43:40.749,0:43:43.239
-John dalton and backward it's not
+John Bjorkman backported
0:43:43.239,0:43:46.640
-our CD set it up for me
+CPU set
0:43:46.640,0:43:49.039
-shortly before they released
+from 8.0 shortly before release
0:43:49.039,0:43:53.450
-well it's usually is supposed to be shortly
+well not so shortly, it's supposed to be shortly
before
0:43:53.450,0:43:55.579
-and that in essence six two
+and SGE 6.2
0:43:55.579,0:43:59.739
-I will use the simple intervention or so and
-greens
+we used the simple integer benchmarks
0:43:59.739,0:44:02.519
-program were tested
+an N-Queens program we tested
0:44:02.519,0:44:03.349
-for instance it
+for instance an 8 x 8 board
0:44:03.349,0:44:05.360
-this any play for place
+placed
0:44:05.360,0:44:08.069
-the queen so they can capture each other
+the 8 queens so they can’t capture each other
0:44:08.069,0:44:09.289
on the board
-0:44:09.289,0:44:11.039
-%um
-
0:44:11.039,0:44:13.680
-so it's a it's a simple symbol of benchmark
+so it's a simple load benchmark
0:44:13.680,0:44:18.800
-%uh that we ran a a small version of the problem
-is our as a measure to man the man to generate
-
-0:44:18.800,0:44:19.599
-one of the
+that we ran a small version of the problem
+as our measure command to generate
0:44:19.599,0:44:24.439
-greta we're close and that we have much longer
+load we ran a larger version that we ran for much longer
0:44:24.439,0:44:28.149
some results
0:44:28.149,0:44:30.129
-so for baseline do it for us
+so for baseline,
0:44:30.129,0:44:33.170
-I think the most interesting thing is to do
-a slot blot
+the most interesting thing is to do
+a baseline run
0:44:33.170,0:44:34.279
you see this
0:44:34.279,0:44:36.410
-some very it's not really very high
+some variance it's not really very high
0:44:36.410,0:44:38.979
-not surprising that doesn't really do anything
+not surprising it doesn't really do anything
0:44:38.979,0:44:40.979
-on accept socks see here
+except suck CPU see here
0:44:40.979,0:44:41.729
-so on
+Really not much
0:44:41.729,0:44:45.229
-going on what's going on
+going on
0:44:45.229,0:44:50.029
-they don't think in this case for about seven
-to one processes and a single
+in this case we’ve got seven
+load processes and a single
0:44:50.029,0:44:52.789
-a single assassin process morning
+a single test process running
0:44:52.789,0:44:55.160
-this is the slogans wait wait
+we see things slow down slightly
0:44:55.160,0:44:55.890
-and %uh
+and
0:44:55.890,0:44:58.389
the standard deviation goes up a bit
0:44:58.389,0:45:00.829
-to live with a deviation from baseline
+it’s a little bit of deviation from baseline
0:45:00.829,0:45:03.659
- the obvious explanation is this
+ the obvious explanation is clearly
0:45:03.659,0:45:07.339
-you know were just context switch is a bit
-more
-
-0:45:07.339,0:45:08.840
-and %uh and %uh
+we’re just context switching
+a bit more
0:45:08.840,0:45:10.349
because we don't have
@@ -3185,362 +3090,342 @@ because we don't have
CPUs that are doing nothing at all
0:45:12.410,0:45:15.559
-on this is there some extra load from the system
+there's some extra load from the system
as well
0:45:15.559,0:45:20.049
-since the kernel have to run and not contests
-have to run
+since the kernel has to run and
+background tasks have to run
0:45:20.049,0:45:23.150
-you know if this is a story about maybe a
-deposition story
+you know in this case we have a badly behaved application
0:45:23.150,0:45:26.579
-we have people across this is what would some
-couples the year
+we now have 8 load processes which would suck up all the CPU
0:45:26.579,0:45:28.879
-you know we try to run a marathon process
+and then we try to run our measurement process
0:45:28.879,0:45:30.639
-we see a
+we see a you know
0:45:30.639,0:45:32.739
-substantial performance the trees
+substantial performance decrease
0:45:32.739,0:45:35.570
-you know abandon the interest rates that's
-a
+you know about in the range we would expect
0:45:35.570,0:45:37.289
-see if any
+see if we had any
0:45:37.289,0:45:40.140
-the trees
+decrease
0:45:40.140,0:45:43.220
-we fired up because the views that
+we fired up with CPU set
0:45:43.220,0:45:44.249
-when I saw it
+quite obviously
0:45:44.249,0:45:46.190
the interesting thing here is to see it
0:45:46.190,0:45:49.429
-we didn't know statistically significant difference
+we’re getting no statistically significant difference
0:45:49.429,0:45:52.819
-not between the baseline news with with a
+between the baseline case with
0:45:52.819,0:45:56.539
-southern cross is it for you see you sense
-we don't see this very
+7 load processes if we use CPU sets
+we don't see this variance
0:45:56.539,0:45:58.520
-which is nice to know that this is it
+which is nice to know that this shows
0:45:58.520,0:45:59.509
that's it
0:45:59.509,0:46:02.869
-we have to we have to see a slight performance
+we actually see a slight performance
improvement
0:46:02.869,0:46:04.179
-and %uh
+and
0:46:04.179,0:46:05.579
-we %uh
+we
0:46:05.579,0:46:07.589
-we see a reduction in various
+we see a reduction in variance
0:46:07.589,0:46:11.569
-hans this issue he says action program performance
-even lot of love it
+so CPU set is actually improving performance
+even if we’re not overloaded
0:46:11.569,0:46:13.510
-and then you see a vote in the case
+and we see in the overloaded case
0:46:13.510,0:46:15.589
-it's it's a it's the same
+it's the same
0:46:15.589,0:46:20.319
-for the opposite the other on process is a
-stop on others he years
+for the other processes
+they’re stuck on other CPUs
0:46:20.319,0:46:22.820
-one interesting side note actually is there
+one interesting side note actually is that
0:46:22.820,0:46:26.719
-where is the what I was doing some tests early
-on
+when I was doing some tests early on
0:46:26.719,0:46:27.869
we actually saw
0:46:27.869,0:46:32.359
-the training base line in the base and a seat
-in the senate he just fired off with the original
+I tried doing the base line and
+the baseline with CPU set and if you just fired off with the original
0:46:32.359,0:46:33.869
-either of them which
+algorithm
0:46:33.869,0:46:34.540
-greta
+which
0:46:34.540,0:46:36.489
-grass seed use your own
+grabbed CPU0
0:46:36.489,0:46:39.339
-he's also if a performance decline
+you saw a significant performance decline
0:46:39.339,0:46:42.319
because there's a lot of stuff that ends up
-running on see user
+running on CPU0
0:46:42.319,0:46:43.819
-which %uh
+which
0:46:43.819,0:46:45.100
-what led to the year
+what led to the
0:46:45.100,0:46:49.890
-the for a conservationist you want to allocate
+quick observation you want to allocate
from the large numbers down
0:46:49.890,0:46:50.569
-so the issue
+so that you use
0:46:50.569,0:46:55.069
-you see used to not learning the random process
-is that it's gone forever we're getting all
+the CPUs which are not running the random processes
+that get stuck on zero
0:46:55.069,0:46:57.880
-the interruptions some architecture's
+or get all the interrupts in some architectures
0:46:57.880,0:47:02.199
-and it's a way to force the road project
+and avoid Core0 in particular.
0:47:02.199,0:47:04.029
so some conclusions
0:47:04.029,0:47:07.530
-all I think we have useful prefer concept
+I think we have useful proof of concept
we're going to be deploying
0:47:07.530,0:47:09.880
-I was certainly the man with the %uh
+certainly the
0:47:09.880,0:47:11.000
-memories are seeing
+memory stuff soon
0:47:11.000,0:47:13.329
-once we have to bring it to you at seven what
+once we upgrade to seven we’ll
0:47:13.329,0:47:15.959
-definitely be going to see few sets up as
-well
+definitely be deploying the CPU sets
0:47:15.959,0:47:16.849
-so it's a
+so it's
0:47:16.849,0:47:18.509
-both includes performance
+both improves performance
0:47:18.509,0:47:22.009
-in the contentious in the and contentious
+in the contended case and in the uncontended case
0:47:22.009,0:47:26.299
-we would like in the future to do some work
-with a personal private superstar
+we would like in the future to do some more work
+with virtual private server stuff
0:47:26.299,0:47:28.979
-in particular and the really interesting
+Particularly it would be really interesting
0:47:28.979,0:47:30.759
-you know when different
+to be able to run different
0:47:30.759,0:47:32.540
-different previous the persons in jails
+different FreeBSD versions in jails
0:47:32.540,0:47:37.660
-for to run up for instance up one several
-sentences in jail since eleven cents lost
+or to run for instance CentOS images
+in jails since we’re running CentOS
0:47:37.660,0:47:40.649
-on our allies assistance
+on our Linux based systems
0:47:40.649,0:47:43.240
there could actually be some really interesting
things there
0:47:43.240,0:47:45.759
-on in that process which one
+in that for instance we can run
0:47:45.759,0:47:50.989
-we think is actually teach reason except occasions
-it's never going to happen only if one takes
+we could potentially DTrace Linux applications
+it's never going to happen on native Linux
0:47:50.989,0:47:53.069
-we are simply there's another example work
+there's also another example where
0:47:53.069,0:47:56.269
-%uh also goes to Tsvangirai who recently
+Paul Sub who’s doing some benchmarking recently
0:47:56.269,0:48:01.039
-and what if the lights on the same farm workers
-seen if we don't have time to implement
+and relative to Linux on the same hardware
0:48:01.039,0:48:04.900
-all in basic matrix multiplication
-
+he was seeing a three and a half times improvement
0:48:04.900,0:48:07.230
-relative to current with with current because
+in basic matrix multiplication
0:48:07.230,0:48:08.549
-%um
+relative to current
0:48:08.549,0:48:11.849
-previously supervision functionality
+because of FreeBSD's superpages functionality
0:48:11.849,0:48:14.499
-that's been reduced the number of GOP entries
+where you vastly reduce the number of TLV entries
0:48:14.499,0:48:16.150
-on anonymity a stable
+in the page table
0:48:16.150,0:48:17.229
-and sir
+and so
0:48:17.229,0:48:21.109
-that sort of thing can apply even to allow
-me to sing population
+that sort of thing can apply even
+to our Linux using population
0:48:21.109,0:48:23.969
-a previous the summer winds there
-
-0:48:23.969,0:48:26.309
-on
+could give FreeBSD some real wins there
0:48:26.309,0:48:27.579
-Michael what did that work
+I’d like to look at
0:48:27.579,0:48:30.859
-on the whole on the planet who is leading
-uses proliferates
+more on the point of isolating users from kernel upgrades
0:48:30.859,0:48:32.620
one of the issues we've had is that
0:48:32.620,0:48:34.019
-you will need to win in the fall
+when you do a new bump
0:48:34.019,0:48:38.399
-we have reasons to depend on all sorts of
-a also to libraries immediate which
+we have users who depend on all sorts of libraries
+immediate which
0:48:38.399,0:48:41.380
-you know the vendors like to have them to
+you know the vendors like to rev them to
do
0:48:41.380,0:48:44.640
-stupid eight API briefing change is fairly
-regularly said
+stupid API breaking changes fairly
+regularly so
0:48:44.640,0:48:48.380
-it be nice for users if we can get all the
-benefits to cooperate
+it’d be nice for users if we can get all the
+benefits of kernel upgrades
0:48:48.380,0:48:51.699
-and you wouldn't have taken operate at their
-leisure
+and they could upgrade at their leisure
0:48:51.699,0:48:54.459
-so we're hoping to be that in future as well
+so we're hoping to do that in future as well
0:48:54.459,0:48:57.809
-all would like to see more women Sunday with
-type resources
-
-0:48:57.809,0:48:59.219
-%um
+we’d like to see more limits
+on bandwidth type resources
0:48:59.219,0:49:01.199
-for instance a limiting the amount of
-
-0:49:01.199,0:49:02.910
-%um
+for instance say limiting the amount of
0:49:02.910,0:49:05.649
-it's it's really what you want you know it's
-like you know
+it's fairly easy to know the amount
+of sockets I own
0:49:05.649,0:49:10.279
-all but it's her if you want a place of for
-a limit on networking with by a particular
+but it’s hard to place a total limit on
+network bandwidth
0:49:10.279,0:49:11.819
-process
+by a particular process
0:49:11.819,0:49:16.979
-all our store almost forced to resign one
-and ask how do you classify that traffic with
-
-0:49:16.979,0:49:17.649
-that
+when almost all of our storage is on NFS
+how do you classify that traffic
0:49:17.649,0:49:21.259
-after going to change the current somehow
-taking that
+without a fair bit of change to the kernel
+and somehow tagging that
0:49:21.259,0:49:23.799
-it's an interesting challenge
+it's an interesting challenge.
0:49:23.799,0:49:28.309
we'd also like to see it could be needed some
-you implement something like blacks the irish
+you implement something like
0:49:28.309,0:49:30.089
-job when you're out there
+the IRIX job ID
0:49:30.089,0:49:34.099
-so I was scheduled to just hang out at processes
-as part of a job
+to allow the scheduler to just
+tag processes as part of a job
0:49:34.099,0:49:36.309
-%uh currently
+currently
0:49:36.309,0:49:38.939
-I've heard it uses a clever but people past
+Grid Engine uses a clever but evil hack
0:49:38.939,0:49:40.010
-were they out
+where they add
0:49:40.010,0:49:42.509
-an extra boost to the process
+an extra group to the process
0:49:42.509,0:49:44.819
-and they just outrageous troops there
+and they just have a range of groups
0:49:44.819,0:49:48.209
available so they get inherited and the users
-can drop them said
+can’t drop them so
0:49:48.209,0:49:51.889
-thousand track process that it's about what
-happened with the correct limits on the number
+that allows them to track the process
+but it’s an ugly hack
0:49:51.889,0:49:57.499
-of groups that can become a real problem
+and with the current limits on the number of groups
+it can become a real problem
0:49:57.499,0:49:59.529
-actually for raising questions
+actually before I take questions
0:49:59.529,0:49:59.980
-argumentative
+I do want to put in
0:49:59.980,0:50:01.119
one quick point
@@ -3550,32 +3435,33 @@ the think it's not interesting you live in
the area and if you're looking for
0:50:05.100,0:50:06.430
-looking for jobs
+looking for a job
0:50:06.430,0:50:09.780
-we are trying to hire few people it's difficult
+we are trying to hire a few people it's difficult
to hire good
0:50:09.780,0:50:13.069
-we do have some some openings and you're looking
+we do have some openings and we're looking
for
0:50:13.069,0:50:17.409
-PSD people in general system ads
-people
+BSD people in general system
+Admin people
0:50:17.409,0:50:24.409
-so questions
+so questions?
0:50:38.419,0:50:40.989
-%um
+Yes
+(inaudible question)
0:50:40.989,0:50:45.719
-I would would expect that to happen
-but it's not something that attempted to test
+I would expect that to happen
+but it's not something I’ve attempted to test
0:50:45.719,0:50:50.570
-what I would really like is a topology allocator
+what I would really like is to have a topology aware allocator
0:50:50.570,0:50:53.179
so that you can request that you know I want
@@ -3584,139 +3470,129 @@ so that you can request that you know I want
I want to share cache or I don't want to share cache
0:50:56.229,0:51:00.170
-I want to share memory band width or i want to not share memory band width
+I want to share memory bandwidth or not share memory bandwidth
0:51:00.170,0:51:02.459
-actually open MPI of three
+Open MPI 1.3
0:51:02.459,0:51:08.469
-on the 1x side have a topology where a rapper for that CPU
+on the Linux side has a topology aware wrapper for their CPU
0:51:08.469,0:51:10.159
functionality
0:51:10.159,0:51:12.249
-makes it something called %uh
+makes it something called
0:51:12.249,0:51:14.139
the PLPA
0:51:14.139,0:51:15.259
-portable lenux
-
-0:51:15.259,0:51:16.519
-%um
+portable Linux
0:51:16.519,0:51:19.599
-CPU allocator.Is that what
+CPU allocator. Is that what
it's actually been
0:51:19.599,0:51:21.959
-what it would act the act on his part
+what the acronym is
0:51:21.959,0:51:25.400
in essence they have to work around the fact
that there were three standard
0:51:25.400,0:51:27.809
-there there were three different
+there were three different
0:51:27.809,0:51:31.759
-currently the eyes for the same siskel
+kernel APIs for the same syscall
0:51:31.759,0:51:38.759
-first the EU or takes to get all the letters
-to get himself some have used the same number
+for CPU allocation because all the vendors
+did it themselves somehow
0:51:38.769,0:51:44.969
-they're completely about some of these guys
-were for routine
+they're the same number but
+they’re completely incompatible
0:51:44.969,0:51:48.749
-when you first saw the application it calls
-and this is called a test of figure out which
+when you first load the application it calls
+the syscall and it tries to figure out which
0:51:48.749,0:51:50.579
one it is
0:51:50.579,0:51:52.719
-whitewater reserve returns depending on what
+by what errors it returns depending on what
0:51:52.719,0:51:56.139
are you missing and completely evil
0:51:56.139,0:52:00.859
-I think people support the heat very the eye
-and we should have their lives were that
+I think people should port their API
+and have their library work but
0:52:00.859,0:52:05.650
-we don't need to do that job because they
-didn't make that mistake
+we don’t need to do that junk
+because we did not make that mistake
0:52:05.650,0:52:12.650
-so I would like to see it of all its universal
-particular
+so I would like to see the
+topology aware stuff in particular
0:52:30.710,0:52:32.529
-yes larry King
+(inaudible question)
0:52:32.529,0:52:37.180
-the trick is to do you want to be leaving
-it's he's he's a limited application
-
-0:52:37.180,0:52:38.869
-then with
-
-0:52:38.869,0:52:39.500
-on
+the trick is it’s easy to limit application bandwidth
0:52:39.500,0:52:42.269
-there's no easy limit application
+fairly easy to limit application bandwidth
0:52:42.269,0:52:44.329
-the defense more difficult when you have to
+it becomes more difficult when you have to
0:52:44.329,0:52:45.430
-the if you're
+if your
0:52:45.430,0:52:49.759
-the new faces shared between application traffic
+interfaces are shared between application traffic
0:52:49.759,0:52:50.880
-%um
+and
0:52:50.880,0:52:53.049
-they had a fast
+say NFS
0:52:53.049,0:52:57.399
-getting classifying that is going to treat
-you have to take you what the data for particular
+classifying that is going to be trickier
+you have to tag you’d have to add a fair bit of code
0:52:57.399,0:53:04.399
-to to trace that down through the current
-interim I said certainly do
+to trace that down through the kernel
+certainly doable
0:53:12.069,0:53:15.499
-I I am
+(inaudible question)
0:53:15.499,0:53:18.389
-I I have talked contemplating doing just that
+I have contemplated doing just that
0:53:18.389,0:53:22.059
-or in fact %uh the other thing we consider
+or in fact the other thing we consider
doing
0:53:22.059,0:53:24.829
-morris a research project that is not practical
+more as a research project than as a practical thing
0:53:24.829,0:53:26.719
-jane would be actually help
+would be actually how
0:53:26.719,0:53:28.619
-with would be
+would be
0:53:28.619,0:53:30.029
-independent he lands
+independent VLANs
0:53:30.029,0:53:31.839
because then we could do
@@ -3725,129 +3601,118 @@ because then we could do
things like
0:53:32.459,0:53:35.489
-the peace process of the lambs' the couldn't
-even
+give each process a VLAN they couldn't even
0:53:35.489,0:53:37.979
-sheer at the internet where
+share at the internet layer
0:53:37.979,0:53:41.259
-once the images in place for instance we will
+once vimage is in place for instance we will
be able to do that
0:53:41.259,0:53:45.049
-and that say you know you've got your cases
-jurors whatever
+and that say you know you've got your interfaces
+it’s yours whatever
0:53:45.049,0:53:46.479
-on but then we can't win it
+but then we could limit it
0:53:46.479,0:53:49.959
-we cannot we could raise when did that occur
-on can also have
+we could rate limit that at the kernel
+we can also have
0:53:49.959,0:53:54.729
-we got of physically isolated we got a lot
-of this would have been at work as well
+we’d have a physically isolated
+we’d have a logically isolated network as well
0:53:54.729,0:53:57.589
-with some analysts which is we can actually
-raped women
+with some of the latest switches we could actually
+rate limit
0:53:57.589,0:54:04.589
-in this without the switch as well
+at the switch as well
0:54:19.939,0:54:22.369
-Bob so that's the first question
+(inaudible questions)
+so to the first question
0:54:22.369,0:54:26.190
-we get more and more visible sensitivity cannot
-miss foster's
+we don’t run multiple
0:54:26.190,0:54:27.639
-last night the oscars
+sensitivity data on these clusters
0:54:27.639,0:54:28.709
-I'm sorry
+unclassified cluster
0:54:28.709,0:54:30.460
-we've avoided that problem I think
+we've avoided that problem by
0:54:30.460,0:54:32.299
not allowing it
0:54:32.299,0:54:34.929
-but is it is a real issue
+But it is a real issue
0:54:34.929,0:54:36.939
-she's not one we've had to do it
-
-0:54:36.939,0:54:39.559
-%um
+it's just not one we've had to deal with
0:54:39.559,0:54:42.109
-in practice was stuff that sensitive
-
-0:54:42.109,0:54:43.059
-%um
+in practice with stuff that’s sensitive
0:54:43.059,0:54:47.579
has handling requirements that you can't touch
-the same program without a strong said
+the same hardware without a scrub
0:54:47.579,0:54:49.859
-you need a mystery
+you need a pretty
0:54:49.859,0:54:51.739
ridiculously aggressive
0:54:51.739,0:54:53.770
-you need a very close to a new letter to the
-end
+you need a very coarse granularity
0:54:53.770,0:54:57.240
-a ridiculous noted aging process that you
-never moved over there
+a ridiculous remote imaging process that you
+moved all of the data
0:54:57.240,0:55:00.959
-so if I were to do that until we get rid of
-that test
+so if I were to do that I would
+probably get rid of the disks
0:55:00.959,0:55:01.389
just
0:55:01.389,0:55:02.400
-the witness list
+go disc less
0:55:02.400,0:55:04.910
-don't get rid of my number-one failure case
+that would get rid of my number-one failure case
of
0:55:04.910,0:55:07.839
that would be pretty good but
0:55:07.839,0:55:09.419
-but havent done that
-
-0:55:09.419,0:55:10.609
-on
+but haven’t done it
0:55:10.609,0:55:13.819
-so we we've had occasional problems of NFS overloading
+NFS failures we've had occasional problems of NFS overloading
0:55:13.819,0:55:15.679
we haven't had real problem
0:55:15.679,0:55:19.279
-we're all local network is fairly tightly
+we're all local network it’s fairly tightly
contained so we haven't had problems with
0:55:19.279,0:55:20.539
things
0:55:20.539,0:55:21.819
-%uh with
+with
0:55:21.819,0:55:26.039
you know the server going down for extended
@@ -3857,21 +3722,21 @@ periods and causing everything to hang
it's been more an issue of
0:55:27.819,0:55:33.189
-I mean there there isn't there's a problem
-that Panache is described as in cast
+I mean there isn't there's a problem
+that Panasas describes as incast
0:55:33.189,0:55:36.109
-you can take out any NFS
+you can take out any NFS server
0:55:36.109,0:55:40.809
-I mean we have the BLueDrat guys come in and the
-PGA this stuff multiple ten-gate I said
+I mean we had the BlueArc guys come in and the
+FPGA-based stuff with multiple ten-gig links I said
0:55:40.809,0:55:42.049
you know I've got
0:55:42.049,0:55:46.779
-to do this and they said can we not try this with all your cluster
+to do this and they said can we not try this with your whole cluster
0:55:46.779,0:55:47.950
because if you got
@@ -3880,11 +3745,11 @@ because if you got
three hundred and fifty
0:55:49.370,0:55:52.599
-gigabit ethernet interface is going into
+gigabit ethernet interfaces going into
the system
0:55:52.599,0:55:56.589
-even ten gig you can saturate pre-turbulate
+Even ten gig you can saturate pretty trivially
0:55:56.589,0:55:57.120
so that level
@@ -3893,20 +3758,21 @@ so that level
there's an inherent problem
0:55:58.930,0:56:01.969
-on we need to handle that kind of band width
+if we need to handle that kind of bandwidth
we've
0:56:01.969,0:56:04.459
-got to get it a parallel file system
+got to get a parallel file system
0:56:04.459,0:56:06.069
get a cluster
0:56:06.069,0:56:12.289
-before doing streaming stuff we could go file some loners
+before doing streaming stuff we could go via SWAN or something
0:56:12.289,0:56:14.949
anyone else?
0:56:14.949,0:56:15.429
-thank you
+thank you, everyone
+(applause and end)
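
A minimal sketch of the core allocation approach described in the captions between 0:42:27 and 0:42:47, and again around 0:46:45: look for the smallest run of adjacent free cores that fits the request, otherwise take the largest run and repeat until enough slots are found, and prefer high-numbered cores so that CPU 0 is left alone. The talk does not show the wrapper's code, so the function names (`free_runs`, `allocate`), the tie-breaking rule, and the choice to take cores from the high end of a run are assumptions made for illustration, not the speaker's implementation.

```python
def free_runs(free):
    """Group free core IDs into runs of adjacent cores, e.g. {0, 1, 3} -> [[0, 1], [3]]."""
    runs = []
    for core in sorted(free):
        if runs and core == runs[-1][-1] + 1:
            runs[-1].append(core)
        else:
            runs.append([core])
    return runs


def allocate(free, need):
    """Pick `need` cores from `free`; return them sorted, or None if too few are free."""
    free = set(free)
    if need > len(free):
        return None
    chosen = []
    while need > 0:
        runs = free_runs(free)
        fitting = [r for r in runs if len(r) >= need]
        if fitting:
            # Best fit: the smallest run that is large enough.
            # Assumed tie-break: prefer the run with higher-numbered cores.
            run = min(fitting, key=lambda r: (len(r), -r[0]))
            take = run[-need:]  # take from the high end of the run
        else:
            # Nothing fits: take the largest run whole, then repeat for the rest.
            take = max(runs, key=lambda r: (len(r), r[0]))
        chosen.extend(take)
        free.difference_update(take)
        need -= len(take)
    return sorted(chosen)
```

For example, with cores 0-3 and 5-6 free, `allocate({0, 1, 2, 3, 5, 6}, 2)` returns `[5, 6]`, which leaves the larger run intact and keeps the job off CPU 0. Whether this greedy rule actually minimizes fragmentation is, as the speaker notes, not something that has been analyzed.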