Queueing Theory Applications, Articles, and Video Tutorials

Yes, Queueing Theory is for EVERYBODY: for you, your mama, your papa, and your baby’s mama. Queueing Theory Applications – for real. But really, there are applications of Queuing Theory everywhere…you don’t believe me? Continue reading. Have you ever waited in line somewhere and thought to yourself: “There’s got to be a better way than just wait here all of my life?” Well, you’re right. The science that looks into the aspects of waiting and in managing lines – at all sorts of venues from the Airport to Amusement Parks; from Restaurants to the Hospital – is called Queueing Theory. Even more effective is to use Queueing Theory with Lean – together they make for an effective approach to problem solving. Let’s dive in.

Introduction to Queueing Theory

Queueing Theory Applications

Queueing Psychology or Psychology of Waiting Lines

Queueing Theory Case Studies

Queueing Theory Video Transcript

Instructor: I’ll start off on just talking about queuing theories, an introductory class on the topic of Queuing Theory. And the idea is basically like this, if you have a queue and this will the schematics that most text books will use, we’ll have some server, something that handles work coming. And then the things that need work to be done queue up behind it. This could be a cashier at a supermarket and these are the people online. This could be a processor in a computer system. And these are processes sitting in the ready queue and the one that’s at the server is the one that’s in the running state. But the theory will end up being exactly the same. So some of the notation will be… its popular to use the Greek letter lambda to mean when you’re starting a queue like this, the rate that’s coming into the queue, like two per second or something like this. Then we need to know the speed of the processor or the cashier at the supermarket. At what speed can they get customers satisfied or done and leave the system?

So… mu ends up being the Greek letter that’s popular to use for this the rate at which the server can service. T ends up been the mean service time for each arrival. So the mean service time will be the reciprocal. If it can do three per second, then is one third of a second per person or per process. Another interesting thing we like to keep track of is what percentage of the time is this server in use? So we can have a queue where it’s been used 70% of the time and that will be pretty good. It will be great if we can get it up to 100%, because in an operating system sense every process has to be … has to do all of its processing before it can finish. So one process is running and then goes off to a disk drive, we’ll let another one in. And then this … the processor is still working. But if it ever gets to the point where everything is waiting for something and nothing can run, now we have this processor that’s just sitting there and not processing anything.

So we’d like to figure out what percentage of the time the server is running and trying to manipulate it for an hour so that it gets close to 100%. It can’t go over 100 obviously. So close to 100% is what we’d like to do. Other things that we might want to know, what is the mean number of customers in the system like at any given time, if something is running and what one is in the running state and the ready state and the total rate. We might want to analyze how many are in the queue at a time. The response time for any system would be the time form when you came in until the time you’re done. That would be the response time or the turnaround time. The through pit of the system is the amount of work it can get through per time so if it can do four jobs per minute that its through put.

What we’ll be doing a little bit later is we’re going to take queues like this and say the output of one queue is the input of another queue. So for example an operating system should have a process this running, it’s running for a while.And then it asks for a resource that’s not available so it gets moved to a queue but there’s other things in front of that and so we have to calculate how long does that queue take to run. And then once that’s done it might come back and feedback it to you. So we’re going to end up having like a network of queues where the output of one server goes into another queue and that one could wrap around to here.

A bottleneck device ends up being… we could have a network of queues running with processes bouncing around from queue to queue, and the one where the server gets as close to 100 … or the one that’s where it’s going to hit 100% is the point at which that queue is considered saturated. So if you think about this, suppose the amount of work coming into a queuing system is more than the rate at which it can get out, what would happen? Not only theoretically but what would happen in actuality, if we had a queue? Is anyone… just an idea… it’s an intuition thing but if the amount of work coming in… the rate at which work coming in couldn’t be processed fast enough, what eventually happen to the queue? Yeah.

Male: [inaudible 00:04:42]

Instructor: It’s will what?

Male: [inaudible 00:04:44].

Instructor: I mean yeah, it will like build off to infinity. It’ll be like more people are coming into the supermarket than the cashier can handle. Theoretically, there’ll be infinitely many customers not being satisfied. So when we network a bunch of queues together, if things are bouncing around whichever one of those queues where the server gets to a 100% use that’s it, it can’t… the whole system can’t handle anymore because one of the queuing systems hit its maximum utilization. And that queuing system would be the bottleneck. And then once the bottleneck hits 100%, we really can’t let more processes in because that queue would then theoretically build up to infinity and then nothing will get done. So once the bottleneck system reaches 100% utilization, that’s it, our system can’t take anymore. So when we analyze a queuing system, we’re going to use this notation. It’s called Kendall notation. And we’re going to say… we’re going to have the probability distribution for the arriving… the arrival distribution.

The second one could be the probability distribution for the service time. The third one could be the number of servers. The fourth one could be the capacity of the queue. Just the last example, there was an infinite queue but we could have a capacity on the queue. The N will end up being the number of customers in the system. We could have an open system where they could keep coming in from the outside world, or we could have a closed system, then we say there’s only 10 processes bouncing around. And then Z would be the queuing discipline like first come first serve. But there could be other queuing disciplines. So for example if we had a queue that was… we can call it A/B/C for example. What we’re doing is we’re saying not only do we have an arrival rate, which we’re going to use lambda to be the arrival rate, but the distribution at which they come in. So, have you guys taken a probability class?

So you know how like… you could have a completely random distribution like maybe… sometimes called the Poisson distribution, something happens at a rate of one per second. But how… when the next event from the previous one is completely unrelated. You could also have what’s the other extreme, which is a deterministic event. So you could have, one comes exactly every second. This is an arrival. So they both have an average of one per second but one of them comes every second and the other one comes totally random with an average of one per second. So just because we know the rate, one per second or 10 per second, we still need more information about how the distance between each one is. So this would be the distance between the arrivals, the distribution for the service time too. We could have a service time where like for example, a processor a lot of times, it runs for a fixed amount of time and then kicks somebody out.

So every time somebody goes into the processor, we could say it runs for a fixed amount of time and runs out so that might be a deterministic service time what… things that… processes that arrive to the queue come in randomly. So we could end up having something like a Markov distribution of where they come in and then maybe a deterministic distribution of when they go out. And then there’s all different types of, if you remember from your probability class, the different types of arrival times, and we’ll cover some of those in future examples. So just for example of using this notation, suppose we have a system like this. In this case our queue has exactly four seats in it. So it’s not like an infinite queue. That’s four seats. We have three servers. If we think of this as three cashiers at a supermarket, and then we have a finite number. This is a closed system. So, new jobs or new processes aren’t coming in from the outside. It’s a closed system.

And let’s say hypothetically, the amount of time they stand here is what we’re calling markovian, meaning they have Poisson [SP] distribution between them. So they could go into this queue at a markovian distribution but them let’s say the service take the same exact amount of time every time. So maybe that’s a deterministic service time. So we could say the arrival of markovian distribution of the service times are deterministic. There are three servers. The queue has four seats in it. It’s a closed system with a total of 12 processes going around and around. And when they arrive at the queue, they are processed in a first come first served. We could later do things like in the queue, we can say we could have priorities like high priority ones get pushed to front and little priorities gets pushed to the back. There are other things besides first come first served. But first come first served is like the intuitive one. And if you leave this out, it is assumed, it’s first come first served. It’s like the default.

And if you leave any of the other ones out, the 12… if you left that out, it’s assumed it’s an open system. Anyone can come in from the outside so it’s not a fixed number. The fourth… if you leave out the number of seats by default, it’s considered an infinite queue. So we called it a… a fuming system… one of the most common ones we’ll follow is the one that’s MM1, meaning an infinite queue one server and there’s a rate it’s coming in but the inter-arrival times between the rates are random. And the service time is also random. Okay, then there’s a famous result called Little’s law and some text books refer it as Little’s results. And it’s an intuitive result about arrivals into a process. So once we have here… lambda is our arrival rate. W of the queue is the mean time… process spent sitting in the queue. The number of items in the queue could be N and so queue… and then we could do the same thing for the whole system… the wait time and the number in our system.

And Little’s result… the Little’s law is basically a formula that says the number in the system equals the rate into the system times the wait time. So hopefully, this is an intuitive example. But if we have 10 people walk into a supermarket every minute and on average they stay there for 30 minutes. Then at a random point in time, how many people are in the supermarket? So I’m just hoping like the intuition so… it’s a rate of 10 per minute so you might think well maybe there’s 10. But that will be true if they stay for one minute. If they stay for two minutes, then there’ll be on average 20 people there and so on. And if on average they stay for 30 minutes, then it will be 10 times 30 which will be 300 people on average. Now if they came in schedule like, let’s say every minute exactly… let’s say 10 people come in exactly at the beginning of every minute and 10 people leave at exactly at the end of every minute. Then there might be only 10 people in at a time but if they can come in and stay random amounts of time some, some people can stay an hour some people leave after two minutes, but on average they stay for 30 minutes. Then this is how many people would be in the system on average.

And the principle applies to the whole system, which is the queue and the processor. And if we wanted to just focus on the queue, same principle applies for the queue. Okay, so take a look at this MM1 queue. So we left out a lot in that Kendall notation. We left out a lot of stuff. So the defaults are, first… it’s going to be first come first serve. The queue size is going to be infinite. The inter-arrival times are markovian. The service time distribution is markovian. One is for one server. It’s an open system. So there’s no number for number in the system. And so let’s say we consider an arrival rate, lambda of one per second and a service time of mu one per second. So let’s just think of some interesting questions. What is the utilization, which is the Greek letter ‘rho’ of the server. Another interesting question might be, what would be the mean time you’re in the system and maybe what is the mean time that you’re in the queue? So first of all, just for a minute, let’s get some intuition going on or maybe think about some bad initiation we don’t want to get used to.

If the rate into the system is one per second and this thing can get the processes served at a rate of one per second. So let’s say, one person is coming up per second to a cash register and the cash register can… well, one per second is quick. But the cashier can get the people out at a rate of one per second what… how big would the queue be? It’s just like giving intuitive guess so it’s…

Male: Coming in at one per second.

Instructor: Right.

Male: And it takes one second to process it.

Instructor: Yeah the processor or the cashier whatever you want to think of it as can move at a speed of one per second. If you give it no work, it obviously does no work but if you gave it like tons of work, it can get the work done at one per second.

Male: So on average is like maybe one…

Instructor: One.

Male: Or one being processed.

Instructor: Okay, well let’s say… okay so let’s do the example of queue that is DD1, meaning a deterministic arrival time like it’s perfectly scheduled, and when something arrives, it will take exactly the same amount of time every time. In that case, at time equals zero, one arrives and starts being served. At time equals one, the first one leaves as the second one arrives. so it never really gets into the queue, it jumps right into the server. And this will go on all day. So if it was a DD1 system, then there would always be one process in the server and nothing ever in the queue, because it’s perfectly scheduled. Now suppose it’s the other extreme, where it’s MM1. So the arrival rates that coming in a rate of one per second but they could really… there’s a total randomness to the time between arrivals. We could get three in one second and then we could wait 10 seconds and have nothing but on average it’s one per second.

And the same thing is true with the server, with that circumstance make it different. Now you can see there’s going to be times where things will be queuing up if they come in, in bunches or if the server like spent a really long time for one customer and then a really quick time for another one. While it’s spending a long time, the queue might build up. So you can see the difference that even though the rate how many come in per second and how many can get done per second, the distribution between the events actually will affect the queue. So let’s say… initiation might lead you to believe that since the arrivals are one per second, then the time is one per second. Therefore, every arrival is served in the exactly one second. And the pervious customer leaves when the new one comes in. So you might think that the queue is of size zero. But if you think about it, it really depends on these letters up here… could cause the queue increase in size and decrease in size.

But what about the utilization? If we take into account that at some times there will be some things in the queue, and other times there won’t because of randomness of the arrival and the completion, with the ‘rho,’ the percentage of time that the server is in use. Obviously, it can never go above 100%. So, it’s somewhere between zero and 100. Could it ever go… would you think it might go less than 50%… less than 100%? Do you think it might… there might be sometimes when it’s not being used? And there’s two answers to this. When the system first starts off, you might have a scenario where it’s not being used, like maybe the first customer comes in and it gets done immediately and then there’s a little wait and sitting there doing nothing and then maybe a second and third and fourth come in and now the queue starts to build up.

And the queue might build up to like three of four, then go back to zero, then build up to three or four and go back down to zero. But once it’s out of a steady state, it should eventually get to the point where it’s running at 100%. It’s just that it can’t… it couldn’t get the work out fast enough so the stuff will be queued up and there might be little areas where for a fraction of a second, it’s not being used. But it will be 100% or extremely close to 100% over a long period of time. So the utilization ends up being… we can think of a formula now as the rate in divided by the rate out. And that would be independent of the input distribution and output distribution. So for example, if we had a rate of one per second coming in and this thing can handle like two per second and it’s like twice as fast, then this thing will be in use for about 50% of the time. If one comes in per second and it can move at a speed of getting two out per second, this will be in use half the time and half the time, it will be sitting and waiting.

Okay, so then what we want to do little later on down the road is we want to start taking inputs from different areas and running them into the same queue. I just want to work on a little intuition here. But this could be an ND3 system. So the, three means three servers. So, if we had for some reason two different inputs coming into the same queue, what would be the rate into the queue? Hopefully, this is a pretty simple intuition. But if one source that was sending inputs into a queue was coming in at a rate of two per second, and another source was sending processes into the same queue at a rate of one per second, then from the point of this queue, how many are coming in? It will be the two added together, right? So, basically we’re just taking all the inputs sources and adding them together. So, let’s see the question, what is the average utilization of the server? Okay, the average utilization of the server we said was the rate in divided by the rate out. So, we got to figure out the rate in first, the rate in is these two added together.

So the rate into the queue is lambda one plus lambda two which is two per second plus one per second equals three per second. The speed at which the server is run is one quarter. So what that means is that it can do, the rate is one whatever job per second. Okay, so ‘mu’ is four. Wait a minute, this should be… ‘mu’ is four this should be … this should actually be … this is the reciprocal of the rate. So this should be… it can process four jobs per second, which means they each actually take one-fourth of a second. This should be a different variable. Meaning the reciprocal of ‘mu’ should be, probably X is the most common one. But anyway we could do four jobs per second on each of the three processors, which really means we have the ability… this system together has the ability to get 12 jobs per second done. And we’d have to some… have some traffic top here saying who’s next who’s next and distribute the jobs out. Assuming that process doesn’t take up any time, that instantly as soon as one of these becomes available we grab what’s on the queue and move it there.

As long as that’s the case, we don’t lose any time for that. We have the ability to handle 12 processes per second with our serving system. So three are coming in per second and 12 are going out per second. So on average each one of these servers will be in use 25% of the time and 75% of the time they’ll just be sitting there waiting for work to come in. And like I said, what we’re doing is when we start networking things together, we honestly cannot physically make any server go above 100% utilization. That will always be the cap. So let’s say, we have an example where this box is supposed to be a processor. I don’t know… have you guys ever used the tool Visio Pro Pictures, Microsoft India? But anyway they have icons for different things and this is the icon for processor. It’s like a… it’s probably… this is probably like 20 years old but it looks like the old computer towers, which have everything including the processor and disk drive in it.

But anyway, so if you get used to that… this is the processor and this is the disk drive. So suppose we’re coming into the processor and obviously inside the processor is a queue… things are queuing up inside the processor. And something runs on the processor for a while and one of two things will happen. Either it runs… so this is a pretty simple example. It will either run to completion. So in this case, there’s no time outs. It either runs to completion and leaves the system. So anytime a process enters into this queue, there’s a 50% chance it will run to completion and leave the system. And there’s a 50% chance it will run for a little while and then need a page from the disk drive. It’ll say “oh, I’m stuck. I need a page in memory. It’s not here. I can go up to the disk drive, bring it in and then once it comes back in, then I need to start running again.” So 50% of the time, it will go out to the disk drive and the disk drive will bring the page back in so the process will then start up again.

Okay, so in this system, we really… we now have two queuing systems, one which it’s output, a percentage of the time will go into another queuing system and a percentage of the time it will leave. So if we wanted to know on average… a question like this, on average how long would it take for a process that’s coming into our system to leave? How… what things would we have to calculate and add up? So, you know what I mean by the question? What we’re saying is in this case, a process could come in here, sit in this queue for a while, then run on a processor for a while, and maybe it’s done or maybe it’s not done. Maybe it will go then sit on this queue for a while, get served by this disk drive for a while, then come back and start all over again, and the question we’re going to try to answer is what’s the turnaround time. From the time you enter our system to the time you leave… just want to know on average how long does the process take to finish.

Like I said, this is a very almost over simplified example. But if we wanted to do that what would we have to calculate and then add up? How would you calculate it, yeah?

Male: You can find the average run time for each operation in the process.

Instructor: The average time it takes to get through here from here to here?

Male: Yeah.

Instructor: Okay.

Male: And then you just, I guess divide it by two or multiple by two that’s all I can say.

Instructor: So let’s just say hypothetically it takes on average two seconds to get through here. And the disk drive is slow, it takes eight seconds to get through here. So would it be 10? Would it be one visit to this queue plus one visit… so in other words just like in English, what are we looking for? We want to know how many times do you visit this queue because it’s not as simple as you just visited once and you’re done. Yeah?

Male: Okay, so it’s… if it only goes through the processor just…

Instructor: Right, right.

Male: But if it [inaudible 00:27:52] disk drive and then it goes back to the processor…

Instructor: And starts all over again.

Male: Oh that …

Instructor: Like as if nothing happened so that will take up a certain amount of time and then you’d have to actually start over again. So it’s an interesting question and it’s not… don’t try to think of a real process running on an operating system doing this. This is… a real one wouldn’t do this. A real picture would be more like, you hit the processor, many times you run and then time out. So you go right back into the processor queue. Sometimes you go out to a disk drive. Sometimes you’ll get put on a [inaudible 00:28:20] queue. That would be more realistic. So we’re just trying to put out an example together that’s not very realistic, just so we can follow it. So what we want to do is we want to ask ourselves how many times on average do we even enter this queue? Then, we want to ask ourselves how much time do we spend at this queue per visit, add that up or multiply that. Let’s say on average we visit this four times and it takes two seconds each. Well that’s… we’re going to spend eight seconds on average here.

And then let’s say, we visit this queue less often but maybe it takes more time. We’ll calculate how many times on average do we hit this queue and then add them all together. So for example, if the end numbers came out we spend… we visit this thing four times for two seconds each, we’re going to spend eight seconds here. And if we visit this one time for 12 seconds, we’ll spend on average 12 seconds each. So it’ll be eight plus 12, all of it would be 20 seconds. So, it’s how many times do we visit each queue times how much time do we on average spend there and then add up all the queues we visit and how many times we visit them. So that’s why we wanted to know… we want to be able to come up with a number… how many are in the queue in front of us? How many are… one will be in service at a time. So when we get put at the end of the queue, how long will it take us to get to front of the queue. So we know how long we spend per visit at each queuing station, yeah?

Male: You seem to make it look like an [inaudible 00:30:01].

Instructor: Yeah it ends up being… yeah, it ends up being… it’s a goldfish [SP] and how many times we hit the queue, time at the queue, add it all up. So yeah, it’s pretty simple [inaudible 00:30:09] of waiting. But we want to figure out how many times… how many times… so, how would you answer this question? This is a… you might recognize this question from your probability class. But if you come into this queue, how many times would you expect to visit the processor on average? Have you ever seen this question? You might have seen this with a coin in your probability class. But if you come into this processor, you’ve been… how many times are you going to visit? You’re going to definitely visit it once because you always thought of hitting a processor, 50% of the time you’ll visit it once and you’ll be done. So half the time you’re going to visit it once. And the other half, you’re going to visit it how many times? One, at least two and possibly more right, right?

Male: [inaudible 00:30:50] … 50% of the time… run time is processor time and 50% of the time, it’s processor time plus the disk drive.

Instructor: Oh, that’s not bad.

Male: Plus the average run time, I guess wouldn’t it?

Instructor: Well, it’s 50% time it’s one pass of the processor and the other 50 it’s processor plus disk plus…

Male: Plus an average run time.

Instructor: Plus the time it takes, assuming it’s starting all over again? Okay, yeah that’s a good way to think of it. But if we wanted to just… that’s what we’re doing here. So we’re saying… you’re taking the… you would take that number and then add up the time at each of these and multiply that by how many times that number comes up. So, we’re doing the same thing. If we wanted to calculate how many times we expect to hit this processor… the comparable probability question ism if you have a coin that has a 50, 50 chance of heads or tails. How many times would you have to flip it until you get the first heads? So the answer is, half the time it’s going to happen on the first try and half the time it will not happen on the first try and then you’ll have to start over. It doesn’t mean it’s not going to happen on the second try. So, it will be like, half the time it will be the first try and 25% of the time it will be the second try, and then 12 and a half percent of the time it will be on the third try and six and a quarter… so on.

But what does that all average out to. So you can add up that series. One plus half plus quarter plus an eighth plus a sixteenth, add that whole thing up. What does that add up to? So it’s one way you can do it with an infinite series and that would add up to two. But another way of thinking it is you can just do and use expected values. So we could say the question is, what is the expected number of visits to the disk drive and what is the expected number of visits to the CPU? So the disk drive… I got these backwards, sorry. Okay, the expected number to the disk drive would be half the time… for this picture, how many times do we go to the disk drive? Half the time, it will be zero and half the time it will be one plus your starting all over again. So, we can just numerically say the expected number of visits will be 80. Half the time it’s going to be one plus ‘E’ which is how many times you hit it assuming you’re starting over? So, the number of visits to the disk drive would be half of one plus whatever ‘E’ is. So now it’s inductive. So now, ‘E’ is equal to one half plus ‘E’ over one half and then we subtract the ‘E’ over a half on each side.

We have half an ‘E’ equals a half. So ‘E’ equals one. And if we did the same type of computation on the disk drive, how many times… I’m sorry, the CPU, how many times do we expect to go to the CPU? It’s a slightly different equation. We’re going to definitely hit it once for sure and we’re picking this up from the picture. We’re going to hit it once and then half the time we’re going to start completely over again. So, what does that add up to… the expected number of times. The expected number of times whatever that is, is equal to one plus half the time we’re going to hit it all over again. So ‘E’ is equal to one plus half of ‘E’. We’ll subtract the half of ‘E’ from… equals a half… one half of ‘E’ equals one. So ‘E’ equals two. So what we’re saying here is on average, a job… a process entering the system will hit the CPU on average two times. And what that really means… half the time it’s going to hit once, a quarter of the time it’s going to hit it twice, an eighth of the time it’s going to hit three times, a sixteenth of the time it’s going to hit it four times and so on.

But on average, it will hit this twice. And then doing the same math on average, we will hit the disk drive one time. Think about it, half the time we hit this… the CPU, half the time we hit the disk drive. So we actually could have calculated the CPU and then just take half of that number and that’s the number of times we hit the disk drive. So what we’re going to do is we’re going to ask ourselves on average how many times we hit each queuing system? We hit this one twice. We hit this one, one time. Now, we want to calculate how much time do we spend each visit here. Multiply that by two. How much time do we spend for each visit here, multiple that by one. And then add them all up and we’ve added up the whole queuing system. So, we know how many times we hit each queue. And now we’re not at the point where we can figure out how much time do we spend at each visit at the queue. So let’s just look at another system. Let’s say we have… this is from an older text book. This is terminals. You could think of this term… and this example is were 18 people sitting at a terminal and every once in a while, they’d hit enter and send something into our system.

And then our system takes it. It’s bouncing around in some queues and it eventually comes back. So this ends up being a closed system and the average person is thinking for, on average, 18 seconds. So, someone is monitoring your system and says. “I know that the average person thinks for 18 seconds. Then sends something into our queuing system. So in our system, I’m now giving… just to have a different notation, I’m now saying instead of this being lambda… I’m sorry, instead of these being ‘mu,’ I’m taking a reciprocal and saying it’s 0.01. So, the time it takes to service something is 0.01 seconds, which means it operates at rate of, it could do 100 per second. So just stating it in a different way. So now something comes into our CPU, 40% of the time it will go out to disk drive one, get some page to come in from memory and then go back to our processing queue, 50% of the time it will go to disk two and then come back to our system and 10% of the time it will finish. And what I mean by finish, it will come back to the person and they’ll see their answer and then they’ll go back to thinking for 18 seconds.

And then I’ll send in another process into our system. So, if we wanted to calculate how many times… how many visits do we do at each of these systems, it’s the same as the calculation before. It would end up being… well, let me just cover this thing intuitively. Every time you come into the CPU, 10% of the time you’re done and 90% of the time, 40% plus 50% get merged together and start over again. So, it will be like ‘E’ equals one plus 0.9 ‘E…’ 0.9% of the time you start over again. So equals one plus 0.9 ‘E’ would come out to be equals 10. The expected number of visits to the CPU… every time something comes out of here, it’s expected to generate 10 visits to the CPU, because 90% of the time it starts all over again, 10% of the time it finishes. So, if you do the same computation, you realize that everything coming out of this system gets multiplied, gets re-entered into the queue on average 10 times. So once we know what that is, we can take 40% of it and we know what the input into this queue is because this queue feeds queue directly. There’s nothing else coming into this queue.

This has only one input. So, if we know the rate into this one, we take 40% of it and we know the rate into this one. If we know the rate into this one, we take 50% of it and we know the rate into this. And then we want to know how much time is spent at each queue. So whatever the rate is into our overall system, let’s say we know… in this event, we don’t know what is the rate coming out of the terminals. The rate into the CPU is going to be 10 times that and then the service rate of that is… it can do 0.01 per second so 10 times 0.01. We can have… the percentage of the time that the CPU is in use is one tenth of whatever the rate into the system is. So again, these are more like the… this is a quintile theoretical example. We have some concrete numbers and then we have the rate into the overall system. So, the utilization… the percentage of time that disk drive one is running is four times the rate coming out of the terminals, times the services rate of disk drive one. So, basically whatever the rate is that jobs are being sent into our system, 0.352 times that number is the percentage of time that disk drive one is in operation.

So, disk drive one, the percentage of time this is doing something would be that number. So, whatever is in this queue, the queue sometimes might be empty and the disk drive might be sitting in idle and sometime it will be working. But the percentage of time that this is in use is going to be 0.352 times the rate that’s going into the system. If more work goes into the system, the rate would go up. If less work starts coming in, the rate will drop. And then the utilization… the percentage of time disk drive two is running, it’s five or half of the rate into the CPU. So it’s five times the rate coming into the CPU, is the rate going into disk drive two. And its speed… let me double check this. Yeah, its speed is 0.015. So, this is a slower… this is what… this is the rate… the time it takes to service one job so this is a slower disk drive than this one. So it’s, we have the fast disk drive and slow disk drive. And maybe our operating system has more pages on the one disk drive and bigger pages on the other, that’s why one is faster and slower than the other.

So anyway the utilization… the percentage of time that disk drive two is running would be the time into the system divided by the time out of the system. So, it’ll be 0.075 whatever the rate is into our overall system. So at this point even though we don’t know the rate into our overall system, if we knew this number, let’s say… so for example, let’s say the rate coming out of… all our people sending in processes was one per second. If it was one per second, then the rate one per second… the utilization of the CPU would be one tenth times one. The utilization… so this would be… the CPU would be in operation 10% of the time. The disk one would be in operation 35% of the time and disk two would be in operation 0.075% of the time. If we we went to two… or a rate of two per second, all of these rates would double, right? So, the question is what’s the maximum capacity… what is the most our system can handle overall? What is the most our system can handle? So, suppose we have three queuing systems and what did we say is the maximum utilization anyone can have, otherwise the queue is going to grow off to infinity? 100%, right?

Once anyone of our devices… it’s the… the processor on it is operating at 100%, that’s it. We can’t have more jobs into the system. Because then that queue out of all three of those devices, that queue would grow into infinity. If we started sending processes in at a faster rate. So, our overall system… our overall queuing system can handle at a maximum… whatever it takes to make the device that gets up to 100%, the fastest. As soon as one hits 100%, that’s it. We can’t add any more. So that’s going to be called… okay, so in this case because disk drive one is the one that will get to 100% the quickest that device is called the bottleneck device. And then the question is how much work would we have to give it so that it hits 100%? And that’s it, we can’t add more work into the system. So then what is the saturation point we can say, well, we spend 18 seconds thinking and then 10 seconds at the CPU for disk drive one and five at disk drive two. This would end up being the total response time. This is how much time it would take to respond and then if we take the total response time and divide it by the amount of time at the bottleneck device, we end up getting the calculation that we could add… we could have 52.63 jobs in this system.

Once we go above that, the bottleneck device, that queue will start growing to infinity. So. we really couldn’t handle more than that. This is an example we did a little bit earlier but suppose a process was running and it runs… if it has a page fold, we go out to the disk drive and we come back in. This is now queuing from the point of view… every instruction you run either the page is in memory or it’s not in memory, you have to go out and get it. So, whatever our hit ratio, that means we’re going to keep doing this, if the page is in memory and if we miss and we have to go out to the disk drive to get it that’s what our miss ratio is. And then we go back to the process and start running again. So, we want… in an analysis like this, we want the bottleneck device to be the CPU. The reason we want the bottleneck to be the CPU is our goal in managing our operating system is so that the CPU is as close to 100% as we can. We don’t want the CPU sitting idle while everything is queuing up on a disk drive. So, this is a theoretical model of the processor and the CPU.

So we want the CPU to be the bottleneck device. So once the maximum miss ratio for the processes in our system that allow us to get to the bottleneck of… have the CPU be the bottleneck. So the role of the CPU is lambda into the CPU divided by service rate of the CPU, which equals ‘t.’ We’re saying ‘t’ is the service rate of the CPU and ‘T’ is the amount of time spent at the disk drive. So, ‘t’ is the time spent at each visit of the CPU. So that’s the ‘t’ and ‘T’ and then the CPU would be the bottleneck if ‘t,’ the amount of time it takes to process an instruction of memory times the rate into the CPU is greater than or equal to ‘T’ times the miss ratio of the CPU. So, what we want is the ratio of ‘t’ to ‘T’ to equal… to be greater than or equal to the miss ratio. So, what does that end up being? Or maybe I’ll give a concrete example. Suppose the time to process an instruction that is in memory is one millisecond and the time to process an instruction that you have to go out to the disk drive, copy the page in and then run it, is one second.

So, this ratio one divided by 1000… this will be 1000. The miss ratio would have to be greater than 1:1000… I’m sorry the miss ratio would have to be less than 1:1000. So, if that was what the time for your system to process something out of memory is one millisecond and to get a page out of disk drive and then process the instruction is one second. We have to manage our operating system so that the miss ratio never goes above one divided by a thousand. If it goes above that, then the bottleneck would become the disk drive. So, as our operating system is managing… we know the speed of our process and we know the speed of our disk drive. We want to manage our processes’ hit and miss ratio. We want to increase the size of the cache or decrease the size of the cache so that the ratio of the time to process out of memory and the time to process out of the disk… we want the miss ratio to never go above that ratio. So, what that’s basically saying is when the operating system starts giving page frame out to a processor, “oh, we’ll give it three frames for this one five frames for that one.”

We want to make sure we give enough frames that the miss ratio never violates this. Otherwise, it doesn’t… it’s not to our advantage to let more processes into the system. How’s that? That makes sense that… we want to let as many processes into our system as long as the scenario where the CPU is sitting idle never happens. And we could let so many in that they will all get clogged up on the disk drive and then we’ve actually let too many in. So, there’ll come a point where letting processes in is good but we’ll hit a point where if we start letting more in there… each one is going to be slowing down the other ones and it defeats the purpose of letting more in. Okay.