In this chapter, we’ll talk about scatter diagrams. Just what is a scatter diagram? It’s a statistical tool that we use to understand the correlation between a change in an input, or X factor, and the output, or Y result, of our process. There can be many different X factors in a process that will have an impact on the Y output. So, for example, if we are evaluating the speed at which an ice slush drink dispenses into our cup at the convenience store, that may depend on the temperature, the weight, or the volume of the drink. As we make adjustments to the X factor, we observe what impact that has on our Y result.
Let’s discuss the correlation line. So, for example, if we are thinking about the temperature of the drink going into our Slurpee cup, we can make changes to the input, or X factor, in increments of ten degrees at a time. As we make each change, we monitor the output, or Y result, which here is the volume or weight of the drink, to see how the changes to X may result in changes to Y. In this case, we are determining whether there is a correlation between the two and, if so, whether it is moving upward or downward. We evaluate the pattern of the dots. Do they tend to group tightly? Do they tend to follow a line? And if so, what is the shape of this line? Is it nice and straight, as illustrated in this example, or is it curved? Or does it change depending on the variation that we are making to that process input?
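To make the idea of a correlation line concrete, here is a minimal sketch in Python that fits a straight line to a handful of points by ordinary least squares and reports whether it slopes upward or downward. Least squares is one common way to draw such a line; the lecture does not say which method it has in mind. The function name `fit_line` and every temperature and volume value below are made up for illustration and are not measured data.

```python
# Illustrative sketch: fit a straight line to scatter-diagram data by least squares.
# All numbers below are invented for demonstration; they are not measured data.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Spread of X around its mean; zero means every X value is the same.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are identical; no line can be fitted")
    # How X and Y move together around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical X factor: temperature, changed ten degrees at a time.
temperatures = [20, 30, 40, 50, 60]
# Hypothetical Y result: dispensed volume observed at each setting.
volumes = [11.8, 14.1, 16.3, 17.9, 20.2]

slope, intercept = fit_line(temperatures, volumes)
direction = "upward" if slope > 0 else "downward" if slope < 0 else "flat"
print(f"slope={slope:.3f}, intercept={intercept:.3f}, trend is {direction}")
```

A positive slope corresponds to dots that ascend from left to right, and a negative slope to dots that descend. The fit says nothing by itself about how tightly the dots group around the line, and a straight-line fit will not capture a curved pattern.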
Let’s understand more about highly correlated data. In a high positive correlation, for example, we see the following. The dots are closely grouped, ascending from left to right in a definite trend. The dots are in close proximity to the average line, meaning there is a close relationship. As we increase the X value, or input, there is an increase in the output value of Y.
Conversely, a high negative correlation looks very similar. The dots are closely grouped, but this time they are descending from left to right. The dots are in close proximity to the average line, indicating a close relationship. But as we increase the X value, or input, there is a decrease in the output value of Y.
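One common way to put a number on "high positive" versus "high negative" is the Pearson correlation coefficient, which ranges from -1 to +1. The lecture does not name this statistic, so treat the sketch below as one possible way to quantify what the diagrams show. The function name `pearson_r` and the sample values are assumptions made up for illustration.

```python
# Illustrative sketch: Pearson correlation coefficient for scatter-diagram data.
# The sample values are invented for demonstration; they are not measured data.
import math

def pearson_r(xs, ys):
    """Return Pearson's r: near +1 for a tight upward trend, near -1 for a tight downward one."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when X or Y never changes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

x_values = [1, 2, 3, 4, 5]
rising_y = [2.1, 3.9, 6.2, 7.8, 10.1]   # closely grouped, ascending left to right
falling_y = [10.1, 7.8, 6.2, 3.9, 2.1]  # closely grouped, descending left to right

print("high positive:", round(pearson_r(x_values, rising_y), 3))
print("high negative:", round(pearson_r(x_values, falling_y), 3))
```

The sign of r matches the direction of the dots, and the closer its magnitude is to 1, the more tightly the dots hug a straight line.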
Low correlations are illustrated by widely scattered data points. As we make incremental changes to our X value, in this case the temperature of our drink, we measure the output that we are getting in volume or weight of the drink. If the data is more scattered and we are only seeing a slightly ascending or slightly descending line, then we can draw the conclusion that we have a weak relationship. This means that we can’t reliably predict the impact of the process X factor on the output. Then we have what is called no correlation, with uniformly scattered points and no discernible line. Not even a weak relationship. As we change the temperature of our drink, the resulting measurements for volume or weight of the drink are all over the place. In this case we can draw no conclusion about the relationship between the X factor, or input, and the Y output.
In scatter diagrams we can also have a nonlinear, or curvilinear, correlation between the data. What’s interesting about this is that there are closely grouped points following a curve. The line shows both ascending and descending behavior, yet the relationship is very strong. We can use this to predict the impact at given points in the process. It’s important to understand that you can have what are called cliff and peak relationships. So, back to the example of varying the temperature of an ice slush drink. Let’s say that water is an important input of the process. Maybe as we increase the temperature, we reach a peak at the point where the water becomes steam. But then, as it becomes steam, the properties of the water change for a period of time in a way that impacts the weight and volume of the drink. And the cycle continues.
Finally, we have data points that we call outliers, and they are important to understand as well. Perhaps we have a nice strong correlation with most of the data, but we find a few data points that are way outside the groupings of data points. What’s going on there? There are a few possible explanations. It could be chance exceptions, or it could be just plain mistakes in recording the data. Perhaps the measurement tool we are using isn’t accurate enough, and we need to calibrate our equipment to bring our readings back under control. We could have human error in the testing. Or maybe there’s a factor in the environment that we did not have under control. For example, let’s say temperature is important. It’s a cold winter day, and we just had a blast of cold air when somebody opened the door, right when we were running these two experiments. That could have caused the outliers. Outliers should be thrown out or not used when we are designing process parameters.
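As a rough illustration of spotting points that sit way outside the main grouping, the sketch below fits a least-squares line and flags any point whose distance from that line is more than two standard deviations of all the distances. The two-standard-deviation cutoff, the function name `flag_outliers`, and the sample readings are assumptions chosen for demonstration; they do not come from the lecture, and other cutoffs or methods are equally reasonable.

```python
# Illustrative sketch: flag candidate outliers by their distance from a fitted line.
# The cutoff of 2 standard deviations is an assumption, not a rule from the lecture.
# The readings are invented for demonstration; they are not measured data.
import statistics

def flag_outliers(xs, ys, threshold=2.0):
    """Return the (x, y) points whose residual exceeds threshold * residual spread."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two equal-length lists with at least 3 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are identical; no line can be fitted")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual: how far each observed Y sits above or below the fitted line.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    spread = statistics.pstdev(residuals)
    if spread == 0:
        return []  # every point lies exactly on the line
    return [
        (x, y)
        for x, y, r in zip(xs, ys, residuals)
        if abs(r) > threshold * spread
    ]

temperatures = [10, 20, 30, 40, 50, 60, 70, 80]
# Mostly follows a steady upward trend, except for the reading taken at 50.
readings = [21, 39, 61, 80, 150, 119, 141, 160]

print(flag_outliers(temperatures, readings))
```

A flagged point is only a candidate. Before leaving it out, we would still ask the questions above: was it a recording mistake, a measurement tool that needs calibration, human error, or something in the environment such as that blast of cold air? Note also that a large outlier pulls the fitted line toward itself, so on small data sets a simple check like this can miss points.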