Question

The table represents data collected on the time spent studying (in minutes) and the resulting test grade.

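| Time Spent Studying (min) | 52 | 37 | 31 | 9 | 26 | 40 | 22 | 10 | 45 | 34 | 19 | 60 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Grade on Test | 95 | 84 | 72 | 58 | 77 | 86 | 72 | 43 | 90 | 81 | 62 | 98 |
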
Part 1: Create a scatter plot with the predicted line of best fit drawn on it. Determine the type of correlation (if any), and predict the model that will be used. 
Part 2: Find the line of best fit for the data either by hand or using technology. Explain your method. Find the predicted score for each time listed in the table.

anonymous · Answer

Part 3: Find the residuals, and decide if your model is a good fit. Explain your method.

amistre64 · Answer

well, what does your scatter plot look like?  can you work with excel?

anonymous · Answer

I already did a scatter plot I just need help with part 2 and part 3

amistre64 · Answer

a few things you will need to determine the line of best fit

sum of the y parts
sum of the x parts

sum of the xy parts
sum of the xx parts

and the number of points used

you will also need the average X and Y values.
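
A minimal sketch of gathering those seven values in Python, in case a spreadsheet is not available. The variable names (`sum_xy`, `mean_x`, and so on) are just illustrative choices; the data is copied from the table in the question.

```python
# Study time in minutes (x) and test grade (y), copied from the question's table.
x = [52, 37, 31, 9, 26, 40, 22, 10, 45, 34, 19, 60]
y = [95, 84, 72, 58, 77, 86, 72, 43, 90, 81, 62, 98]

n = len(x)                                       # number of points used
sum_x = sum(x)                                   # sum of the x parts
sum_y = sum(y)                                   # sum of the y parts
sum_xy = sum(xi * yi for xi, yi in zip(x, y))    # sum of the xy parts
sum_xx = sum(xi * xi for xi in x)                # sum of the xx parts
mean_x = sum_x / n                               # average X
mean_y = sum_y / n                               # average Y

print(n, sum_x, sum_y, sum_xy, sum_xx, mean_x, mean_y)
```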

anonymous · Answer

I dont understand, do you mind explaining? Like I know the line of best fit is when you draw a line through the points.

anonymous · Answer

One second, I will try to attach my scatter plot here.

amistre64 · Answer

the line of 'best' fit is the one that has the least amount of total error ...  as such, we could either develop the formulas needed for a slope and a point; or, we could just memorize the formulas already stated in the textbooks.
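
As a rough sketch of what "developing the formulas" would involve, assuming "total error" means the usual sum of squared vertical gaps between the points and the line (the name $S$ is only a label chosen here):

$$S(m,b)=\sum \left(y-(mx+b)\right)^2$$

Setting the two partial derivatives of $S$ to zero gives two equations in $m$ and $b$:

$$\sum y = m\sum x + nb$$
$$\sum xy = m\sum xx + b\sum x$$

Dividing the first by $n$ gives $b=\bar y-m\bar x$, and substituting that into the second and solving for $m$ gives $m=\frac{n\sum xy-\sum x\sum y}{n\sum xx-\sum x\sum x}$.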

amistre64 · Answer

either way, we need to know those 7 values to play with since:
$$y=mx+b$$
$$m=\frac{n\sum xy-\sum x\sum y}{n\sum xx-\sum x\sum x}$$
$$b=\bar y-m\bar x$$

amistre64 · Answer

i spose they generally name the equation as y hat:  $\hat y$, to distinguish it from the observed y values

anonymous · Answer

Ok here is the scatter plot

amistre64 · Answer

something that will make life easier if we have to do this by hand is to simply subtract the smallest value from all the points to reduce the size of the numbers involved.  its the same line, just moved.  but if this is done by a computer program, then thats irrelevant

amistre64 · Answer

(52	37	31	9	26	40	22	10	45	34	19	60) - 40

(12	-3	-9	-31	-14	0	-18	-30	5	-6	-21	20)

12-3-9-31-14+0-18-30+5-6-21+20

37-132 = -95,  average is -95/12
......................................................................



95	84	72	58	77	86	72	43	90	81	62	98

(95	84	72	58	77	86	72	43	90	81	62	98)-70

25	14	2	-12	7	16	2	-27	20	11	-8	28

25+14+2-12+7+16+2-27+20+11-8+28

2+76 = 78,  average is 78/12
........................................................................

these values will give us a shifted line by (-40,-70) that we will have to shift back

sum of xx,  and sum of xy of course would be more time consuming by hand
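
A small sketch to check the "same line, just moved" idea numerically. The helper `fit_line` is a hypothetical name, not something from the lesson; it just applies the textbook sum formulas to whatever lists it is given.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept from the textbook sum formulas."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    sxx = sum(a * a for a in xs)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = sy / n - m * sx / n
    return m, b

x = [52, 37, 31, 9, 26, 40, 22, 10, 45, 34, 19, 60]
y = [95, 84, 72, 58, 77, 86, 72, 43, 90, 81, 62, 98]

# Same shift as above: take 40 off every time and 70 off every grade.
x_shifted = [xi - 40 for xi in x]
y_shifted = [yi - 70 for yi in y]

m_shifted, b_shifted = fit_line(x_shifted, y_shifted)
m_direct, b_direct = fit_line(x, y)

# Shift back: y - 70 = m (x - 40) + b_shifted  =>  y = m x + (b_shifted - 40 m + 70)
b_restored = b_shifted - 40 * m_shifted + 70

print(m_shifted, m_direct)    # the slope is unchanged by the shift
print(b_restored, b_direct)   # the intercept matches once shifted back
```

The shift only makes the hand arithmetic smaller; the slope comes out the same either way and only the intercept needs adjusting.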

amistre64 · Answer

do you have excel or some equivalent program?

anonymous · Answer

I don't have excel, but I still don't understand what you're doing. What the lesson tells me about finding the line of best fit is to use two of the points on the line and the point-slope formula, y−y1=m(x−x1), to write the equation of the line, then convert it to slope-intercept form so that I can make predictions or answer questions about the data.

anonymous · Answer

So the two points i used was (52, 58) and (60,98) i found the slope and got 3/8 or 0.375 then the equation for the line i got was y=0.375x + 75.5

amistre64 · Answer

attachment

amistre64 · Answer

so you did not create the regression equation, you used a 2 point guesstimation

anonymous · Answer

those two points were from the data

amistre64 · Answer

$$\hat y=\frac{n\sum xy - \sum x \sum y}{n\sum xx - \sum x \sum x}(x-\bar x)+\bar y$$
$$\hat y=\frac{12(32120) - 385(918)}{12(15117) - 385(385)}(x-32.08333)+76.5$$

which simplifies to something like:

m = 0.9647
b = 45.5471
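
For anyone without a spreadsheet, the same arithmetic can be sketched in a few lines of Python. The sums are the ones plugged into the formula above; the variable names are only illustrative.

```python
# Values plugged into the regression formula above.
n = 12
sum_x, sum_y = 385, 918
sum_xy, sum_xx = 32120, 15117

# Slope from the sum formula, intercept from b = mean(y) - m * mean(x).
m = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x * sum_x)
b = sum_y / n - m * (sum_x / n)

print(f"yh = {m:.4f} x + {b:.4f}")
```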

amistre64 · Answer

when we guess a line of best fit by using 2 data points, it's generally not going to be a good fit.

anonymous · Answer

Oh okay

amistre64 · Answer

the slope between the points that you choose is more like 5 ...

  60,  98
-52, -58
--------
    8,  40 ... 40/8 = 5

anonymous · Answer

Wont it be 52,95   and 60,98

amistre64 · Answer

yes, i was just checking to see if you had used valid points to start with :)

amistre64 · Answer

i'd use 9,58 instead of 52,95, since it's the lowest/leftmost point

anonymous · Answer

oh ok

amistre64 · Answer

60, 98 
-9,-58
--------
51, 40  that gives us a more realistic slope closer to the best fit line

amistre64 · Answer

40/51 = 0.7843

anonymous · Answer

oh because it is closer to 0

amistre64 · Answer

not that it is closer to 0,  it's just a better fit to me, visually that is.

anonymous · Answer

ok

anonymous · Answer

now i have to find the equation of the line

amistre64 · Answer

notice your line in red,  versus my line in black.

anonymous · Answer

Right, so yours is more accurate.

amistre64 · Answer

the library im at 'upgraded' to the newer version of excel .... i like the older one better.

amistre64 · Answer

visually, mine is more accurate :)

anonymous · Answer

So what do i do now? I find the equation, correct?

anonymous · Answer

I got y=0.7843x +50.9413

amistre64 · Answer

with my points?  seems fair, let me double check

anonymous · Answer

ok

amistre64 · Answer

y = 0.7843x +50.9412

but yeah, thats good
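
A quick sketch of the two-point method, assuming the points (9, 58) and (60, 98) chosen above. `line_through` is a hypothetical helper name; exact fractions are used so the small difference in the last decimal place (50.9413 versus 50.9412) can be traced to rounding the slope before computing the intercept.

```python
from fractions import Fraction

def line_through(p, q):
    """Slope and intercept of the line through two points, as exact fractions."""
    (x1, y1), (x2, y2) = p, q
    m = Fraction(y2 - y1, x2 - x1)   # rise over run
    b = y1 - m * x1                  # from y - y1 = m (x - x1) with x = 0
    return m, b

m, b = line_through((9, 58), (60, 98))
print(m, float(m))   # exact slope and its decimal form
print(b, float(b))   # exact intercept and its decimal form

# Rounding the slope to 0.7843 first shifts the intercept slightly.
b_from_rounded_slope = 58 - 0.7843 * 9
print(round(b_from_rounded_slope, 4))
```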

anonymous · Answer

Okay, so after i got that, I dont know what to do next

amistre64 · Answer

lets call the line;  yh  instead of just y

y represents the actual data values that have been recorded,  yh is just a model that we think will predict the values.

anonymous · Answer

Ok

amistre64 · Answer

well, use your yh equation for each time value to determine how well it predicts it.  you are going to create a whole new set of data points which we should call yh.

amistre64 · Answer

we know when x=9, y=58 since we used that point to create the line to start with, and the last point will be exact as well.  the others are going to differ to some degree

anonymous · Answer

so we have to make up an x value?

amistre64 · Answer

no, the x values stay the same, we are trying to predict the outcome of a given time value;  not predict the time value itself

amistre64 · Answer

like this

anonymous · Answer

Oh

amistre64 · Answer

the residuals are just the difference between yh and y

anonymous · Answer

so  i would write 57.9999?

amistre64 · Answer

well, since we used an approximation for the slope fraction, yes ... there is going to be some slight differences when it comes to the points we used.

amistre64 · Answer

how accurate your yh is, is strictly up to you.

amistre64 · Answer

we are "Finding the predicted score for each time listed in the table".  by using our yh equation with the given time values

anonymous · Answer

I get it, but like i dont know what to write on the paper, so far i just have my equation work.

amistre64 · Answer

um, i used excel to find the yh values .... and posted a screenshot of it.

amistre64 · Answer

create a new table, and define your x and yh values.
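
Without a spreadsheet, the new table could be sketched like this. It assumes the two-point model yh = 0.7843x + 50.9412 from earlier in the thread; the function name `yh` and the rounding to two decimals are just choices made for the example.

```python
x = [52, 37, 31, 9, 26, 40, 22, 10, 45, 34, 19, 60]   # time spent studying (min)
y = [95, 84, 72, 58, 77, 86, 72, 43, 90, 81, 62, 98]   # observed grade

def yh(t):
    """Predicted grade for study time t, using the two-point model from the thread."""
    return 0.7843 * t + 50.9412

predicted = [round(yh(t), 2) for t in x]

# One row per quantity, matching the layout of the original table.
print("x :", x)
print("y :", y)
print("yh:", predicted)
```

The x values are reused exactly as given; only the yh row is new.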

anonymous · Answer

the yh is the y with the ^ on top?

amistre64 · Answer

yes :)

amistre64 · Answer

y hat

anonymous · Answer

so the values in yh is the predicted scores?

amistre64 · Answer

here is what has happened so far, so that you can see the big picture that you might be missing.

a data set was given, this formed a scattered set of points on the graph

we used 2 of those points to construct an equation that we can use to predict those points, and others.

we are now finding how our equation plays with the time values, so yes  yh is the predicted scores since we are using our equation to try to determine a given score with it.

amistre64 · Answer

realistically we should have our points stated in terms of (time,grade)  or simply g(t)

we dont have an equation for g(t), we have a bunch of scattered about time,grade points and so we construct a mathematical model that we hope we can use to some effect.

so we develop the grade prediction model as:   g' = 0.7843(t) +50.9412

we use g' to compare with the known time,grade points to determine how good or bad a fit our model is

anonymous · Answer

Im sorry its still confusing i get part of it but not fully and i dont know what to write on the paper, this assignment is holding me back from doing the others

amistre64 · Answer

i dont know what you need to write either, but my guess is that you need to write up something like the first 3 rows of this. http://assets.openstudy.com/updates/attachments/53c010f6e4b00f624a91300f-amistre64-1405100458010-untitled.jpg

anonymous · Answer

ok

amistre64 · Answer

the 4th row is the residuals, the difference between our observed and predicted values: 

yh - y

amistre64 · Answer

the last row, is just taking the differences and squaring them .... not sure if thats the process you are expected to determine the goodness of fit with tho
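
A sketch of those last two rows in Python, assuming the two-point model from the thread and keeping the thread's sign convention (yh - y). Many textbooks write the residual as y - yh instead; that flips the signs but leaves the squares unchanged. As one possible check on goodness of fit (an assumption, not necessarily what the lesson expects), the sum of squared residuals is also compared against the least-squares line.

```python
x = [52, 37, 31, 9, 26, 40, 22, 10, 45, 34, 19, 60]
y = [95, 84, 72, 58, 77, 86, 72, 43, 90, 81, 62, 98]

def residuals(m, b):
    """Predicted minus observed (yh - y) for the line yh = m x + b."""
    return [(m * xi + b) - yi for xi, yi in zip(x, y)]

def sum_sq(values):
    """Sum of squared values: the total squared error of a model."""
    return sum(v * v for v in values)

# Two-point model from the thread.
res_two_point = residuals(0.7843, 50.9412)

# Least-squares line from the sum formulas, for comparison.
n = len(x)
sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)
m_ls = (n * sxy - sum(x) * sum(y)) / (n * sxx - sum(x) ** 2)
b_ls = sum(y) / n - m_ls * sum(x) / n
res_ls = residuals(m_ls, b_ls)

# Time, residual, squared residual for the two-point model.
for t, r in zip(x, res_two_point):
    print(f"{t:>3}  {r:7.2f}  {r * r:8.2f}")

print("sum of squares, two-point    :", round(sum_sq(res_two_point), 2))
print("sum of squares, least squares:", round(sum_sq(res_ls), 2))
```

A smaller sum of squares means the line sits closer to the points overall; residuals that scatter around zero with no obvious pattern also suggest a linear model is reasonable.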