Repository of Reproducible Computations

Free Statistics

of Irreproducible Research!

Author's title

Author

*The author of this computation has been verified*

R Software Module

rwasp_linear_regression.wasp

Title produced by software

Linear Regression Graphical Model Validation

Date of computation

Wed, 12 Nov 2008 15:15:36 -0700

Cite this page as follows

Statistical Computations at FreeStatistics.org, Office for Research Development and Education, URL https://freestatistics.org/blog/index.php?v=date/2008/Nov/12/t1226528322em7ck35vrsogpoh.htm/, Retrieved Mon, 20 May 2024 07:28:00 +0000

Statistical Computations at FreeStatistics.org, Office for Research Development and Education, URL https://freestatistics.org/blog/index.php?pk=24452, Retrieved Mon, 20 May 2024 07:28:00 +0000

Original text written by user:

IsPrivate?

No (this computation is public)

User-defined keywords

Estimated Impact

171

Family? (F = Feedback message, R = changed R code, M = changed R Module, P = changed Parameters, D = changed Data)

F     [Testing Sample Mean with known Variance - Confidence Interval] [Case: The Pork Qu...] [2008-11-12 11:09:16] [8094ad203a218aaca2d1cea2c78c2d6e]
F RMPD    [Linear Regression Graphical Model Validation] [Various EDA Topic...] [2008-11-12 22:15:36] [1351baa662f198be3bff32f9007a9a6d] [Current]
F    D      [Linear Regression Graphical Model Validation] [Blok 8 opdracht 3 Q4] [2008-11-13 18:18:47] [6173c35e31b784a490c8cd5476f785d4] 

Feedback Forum

2008-11-14 15:57:37 [Katrijn Truyman]
Lots of explanation and information about how the analysis works, but once again no concrete numbers about your own data.
2008-11-17 08:13:03 [006ad2c49b6a7c2ad6ab685cfc1dae56]
You could have explained more about your own data, and you were supposed to make a Box-Cox normality plot.
2008-11-22 15:15:10 [Peter Van Doninck]
For this question, however, no clear conclusion was given either, only the theory. Nothing much can be derived from the link either. No Box-Cox normality plot was drawn. In practice, the normality plot will lead to the same conclusion as the linearity plot, and the transformation will not be effective here either. (I had accidentally posted this on Annelies's computation.)
2008-11-23 12:44:58 [Nathalie Daneels]
Evaluation of assignment 3 - Block 8 (Q4)
The student gave information about the Box-Cox normality plot but did not interpret his own findings. Moreover, the student did not produce the correct graph. I will add a link to the evaluation that does produce the correct graphs.
This could be an interpretation:
First, note that the Box-Cox normality plot is not the same as the Box-Cox linearity plot. The Box-Cox normality plot concerns the distribution of a single variable. With this graph, too, we must find the value of lambda at which the curve reaches its maximum. As with the Box-Cox linearity plot, lambda ranges from -2 to 2 (the horizontal axis). The vertical axis shows the correlation after the variable has been transformed. The value of lambda at which this correlation peaks makes the distribution of the time series look most like a normal distribution: the higher the correlation, the more closely the time series approximates the normal distribution. Here, the correlation refers to the normal QQ plot: it measures how closely the points of that plot (ordered data against normal quantiles) lie on a straight line.
In this case the curve reaches its maximum at lambda = -2. We assume that the curve would decrease beyond -2.
Next, the histograms show that the transformation did affect the distribution of the time series. Before the transformation the data were clearly right-skewed; afterwards they are closer to a normal distribution, except for the leftmost bar, which is rather isolated.
The normal QQ plot likewise shows that transforming the variable brings the time series closer to a normal distribution. The straight line in a normal QQ plot represents a perfectly linear relationship between the sample quantiles and the theoretical normal quantiles, so it lets us judge this "improved correlation" after the transformation. The second graph (the normal QQ plot of the transformed data) clearly shows that perfect linearity is approximated better after the transformation. So the transformation achieved its goal.
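The Box-Cox normality plot described in this evaluation can be sketched in R, the language of the module's own code. This is a minimal sketch under stated assumptions. It uses the usual Box-Cox transform (y^lambda - 1) / lambda for positive data, with log(y) at lambda = 0. The vertical axis is taken to be the probability plot correlation, meaning the correlation between the ordered transformed values and normal quantiles from ppoints(). The function names and the simulated log-normal sample are hypothetical illustrations, not FreeStatistics code.

```r
# Box-Cox transform for positive data; lambda = 0 is the log transform
boxcox_transform <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Probability plot correlation: correlation between the ordered transformed
# values and the matching normal quantiles (how straight the normal QQ plot is)
ppcc <- function(y, lambda) {
  z <- sort(boxcox_transform(y, lambda))
  cor(z, qnorm(ppoints(length(z))))
}

# Evaluate the correlation on a grid of lambda values from -2 to 2
boxcox_normality <- function(y, lambdas = seq(-2, 2, by = 0.1)) {
  r <- vapply(lambdas, function(l) ppcc(y, l), numeric(1))
  list(lambda = lambdas, correlation = r, best = lambdas[which.max(r)])
}

set.seed(1)
y <- rlnorm(100)                 # hypothetical right-skewed sample
res <- boxcox_normality(y)
print(res$best)
```

Plotting res$correlation against res$lambda draws the curve itself. For log-normal data the peak should lie near lambda = 0, the log transform. A peak at the boundary lambda = -2, as in the evaluation above, suggests only that the optimum lies at or beyond the edge of the searched range.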
2008-11-23 12:58:26 [Nathalie Daneels]
Evaluation of assignment 3 - Block 8 (Q4)

In the evaluation above I forgot to include the link where the correct graphs are produced:

http://www.freestatistics.org/blog/index.php?v=date/2008/Nov/23/t1227444988r2pg0x58urumjuy.htm
2008-11-23 13:08:59 [Nathalie Daneels]
Evaluation of assignment 3 - Block 8 (Q5)

I will also post the evaluation of Q5 here, because the student's document contained no link, so I cannot actually evaluate that answer.

For this question, the student did include the theory from EDA on maximum likelihood normal distribution fitting.

This is the link where the correct graph is produced:
http://www.freestatistics.org/blog/index.php?v=date/2008/Nov/23/t12274454986v2w31ewqj1cy81.htm

This could be the conclusion:
The software estimates the mean and standard deviation of the normal distribution that best fits the data. The graph therefore shows the estimated normal distribution that most closely matches the histogram.
In the table we only need to look at the estimated values of the "mean" and the "standard deviation".
The second column gives the standard deviation (standard error) of the estimated mean and of the estimated standard deviation, but these are not considered further here.
From the table we can read off the estimated mean (85) and standard deviation (8). The curve drawn on the graph is this fitted normal density. Because a normal distribution is symmetric, its mean lies in the middle of the histogram, which this figure confirms reasonably well.
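The maximum likelihood normal fit discussed here has closed-form estimates. The mean is estimated by the sample mean. The standard deviation uses divisor n rather than n - 1. Their large-sample standard errors, the "second column" mentioned above, are sigma / sqrt(n) and sigma / sqrt(2n). The function name and the simulated sample below are hypothetical. This is a sketch of the textbook estimator, not necessarily how the FreeStatistics module computes its table.

```r
# Maximum likelihood fit of a normal distribution (closed form)
fit_normal_ml <- function(x) {
  n <- length(x)
  mu <- mean(x)
  sigma <- sqrt(mean((x - mu)^2))          # ML estimate: divisor n, not n - 1
  list(estimate = c(mean = mu, sd = sigma),
       se = c(mean = sigma / sqrt(n),       # standard error of the estimated mean
              sd = sigma / sqrt(2 * n)))    # large-sample standard error of the estimated sd
}

set.seed(42)
x <- rnorm(200, mean = 85, sd = 8)          # hypothetical sample
fit <- fit_normal_ml(x)
print(fit)
```

To draw the fitted curve over a histogram, use hist(x, freq = FALSE) followed by curve(dnorm(x, fit$estimate[["mean"]], fit$estimate[["sd"]]), add = TRUE).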


Dataseries X:

Dataseries Y:

Summary of computational transaction
Computing time	3 seconds
R Server	'Gwilym Jenkins' @ 72.249.127.135

Source: https://freestatistics.org/blog/index.php?pk=24452&T=0


Simple Linear Regression
Statistics	Estimate	S.D. (standard error)	T-STAT (H0: coeff=0)	P-value (two-sided)
constant term	-16.0075737640161	20.7429696996494	-0.771710801095497	0.442846703119283
slope	1.67173296391877	0.243290159249537	6.87135463709453	2.01960359547115e-09
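The t-statistics and p-values in this table follow directly from the estimates and their standard errors, exactly as in the module's R code. Each t-statistic is the estimate divided by its S.D. The two-sided p-value comes from a t distribution with n - 2 degrees of freedom. The sketch below uses simulated data, which is hypothetical because the original X and Y series are not reproduced on this page. It checks the manual formula against R's own summary().

```r
# Hypothetical data: the original X and Y series are not reproduced on this page
set.seed(123)
x <- rnorm(50, mean = 85, sd = 8)
y <- -16 + 1.7 * x + rnorm(50, sd = 10)
m <- lm(y ~ x)

# Same steps as the module's R code: estimate / sqrt(diagonal of vcov(m)),
# then a two-sided p-value from a t distribution with n - 2 degrees of freedom
est   <- coef(m)
se    <- sqrt(diag(vcov(m)))
tstat <- est / se
pval  <- 2 * pt(abs(tstat), df = length(x) - 2, lower.tail = FALSE)

print(cbind(Estimate = est, S.D. = se, T.STAT = tstat, P.value = pval))
```

For the constant term above, -16.0076 / 20.7430 is about -0.772, the reported T-STAT. Using pt(..., lower.tail = FALSE) instead of 1 - pt(...) keeps full relative precision for very small p-values such as the slope's 2.02e-09.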

Source: https://freestatistics.org/blog/index.php?pk=24452&T=1


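Figures 1 to 9 (images not reproduced here; content as produced by the R code below):
Figure 1: Scatterplot, lowess, and regression line (y against x; lowess curve in red)
Figure 2: Scatterplot, lowess, and regression line (fitted values against x)
Figure 3: Scatterplot, lowess, and regression line (residuals against x)
Figure 4: Scatterplot, lowess, and regression line (fitted values against residuals)
Figure 5: Lag plot of the residuals, with regression line
Figure 6: Residual Histogram
Figure 7: Density Plot of the residuals
Figure 8: Residual Autocorrelation Function
Figure 9: Normal Q-Q plot of x, with reference line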
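Figures 2 to 4 each add an OLS line (m2, m3 and m4 in the R code) whose outcome is fixed by construction. Fitted values are an exact linear function of x. Least-squares residuals are uncorrelated with both x and the fitted values. As a result, the regression lines in Figures 3 and 4 are always flat, and only the red lowess curve can reveal problems such as curvature. The sketch below confirms this on hypothetical simulated data.

```r
set.seed(7)
x <- runif(60, 50, 120)                      # hypothetical explanatory series
y <- -16 + 1.7 * x + rnorm(60, sd = 15)      # hypothetical response
m <- lm(y ~ x)

m2 <- lm(fitted(m) ~ x)                # as in Figure 2: fitted values against x
m3 <- lm(residuals(m) ~ x)             # as in Figure 3: residuals against x
m4 <- lm(fitted(m) ~ residuals(m))     # as in Figure 4: fitted values against residuals

print(coef(m2))                        # reproduces coef(m); the fit is exact
print(coef(m3)[[2]])                   # slope is zero up to rounding error
print(coef(m4)[[2]])                   # slope is zero up to rounding error
```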

Parameters (Session):

par1 = 0 ;

Parameters (R input):

par1 = 0 ;

R code (references can be found in the software module):

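# x and y (Dataseries X and Y) and par1 (bandwidth of the residual density plot; 0 = default) are supplied by the FreeStatistics server
par1 <- as.numeric(par1)
library(lattice)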
z <- as.data.frame(cbind(x,y))
m <- lm(y~x)
summary(m)
bitmap(file='test1.png')
plot(z,main='Scatterplot, lowess, and regression line')
lines(lowess(z),col='red')
abline(m)
grid()
dev.off()
bitmap(file='test2.png')
m2 <- lm(m$fitted.values ~ x)
summary(m2)
z2 <- as.data.frame(cbind(x,m$fitted.values))
names(z2) <- list('x','Fitted')
plot(z2,main='Scatterplot, lowess, and regression line')
lines(lowess(z2),col='red')
abline(m2)
grid()
dev.off()
bitmap(file='test3.png')
m3 <- lm(m$residuals ~ x)
summary(m3)
z3 <- as.data.frame(cbind(x,m$residuals))
names(z3) <- list('x','Residuals')
plot(z3,main='Scatterplot, lowess, and regression line')
lines(lowess(z3),col='red')
abline(m3)
grid()
dev.off()
bitmap(file='test4.png')
m4 <- lm(m$fitted.values ~ m$residuals)
summary(m4)
z4 <- as.data.frame(cbind(m$residuals,m$fitted.values))
names(z4) <- list('Residuals','Fitted')
plot(z4,main='Scatterplot, lowess, and regression line')
lines(lowess(z4),col='red')
abline(m4)
grid()
dev.off()
bitmap(file='test5.png')
myr <- as.ts(m$residuals)
z5 <- as.data.frame(cbind(lag(myr,1),myr))
names(z5) <- list('Lagged Residuals','Residuals')
plot(z5,main='Lag plot')
m5 <- lm(z5)
summary(m5)
abline(m5)
grid()
dev.off()
bitmap(file='test6.png')
hist(m$residuals,main='Residual Histogram',xlab='Residuals')
dev.off()
bitmap(file='test7.png')
# lattice plots are only drawn when printed, so print() the densityplot explicitly
if (par1 > 0)
{
print(densityplot(~m$residuals,col='black',main=paste('Density Plot   bw = ',par1),bw=par1))
} else {
print(densityplot(~m$residuals,col='black',main='Density Plot'))
}
dev.off()
bitmap(file='test8.png')
acf(m$residuals,main='Residual Autocorrelation Function')
dev.off()
bitmap(file='test9.png')
qqnorm(x)
qqline(x)
grid()
dev.off()
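# Load the server's table helpers (table.start, table.row.start, table.element, table.row.end, table.end, table.save)
load(file='createtable')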
a<-table.start()
a<-table.row.start(a)
a<-table.element(a,'Simple Linear Regression',5,TRUE)
a<-table.row.end(a)
a<-table.row.start(a)
a<-table.element(a,'Statistics',1,TRUE)
a<-table.element(a,'Estimate',1,TRUE)
a<-table.element(a,'S.D.',1,TRUE)
a<-table.element(a,'T-STAT (H0: coeff=0)',1,TRUE)
a<-table.element(a,'P-value (two-sided)',1,TRUE)
a<-table.row.end(a)
a<-table.row.start(a)
a<-table.element(a,'constant term',header=TRUE)
a<-table.element(a,m$coefficients[[1]])
sd <- sqrt(vcov(m)[1,1])
a<-table.element(a,sd)
tstat <- m$coefficients[[1]]/sd
a<-table.element(a,tstat)
pval <- 2*(1-pt(abs(tstat),length(x)-2))
a<-table.element(a,pval)
a<-table.row.end(a)
a<-table.row.start(a)
a<-table.element(a,'slope',header=TRUE)
a<-table.element(a,m$coefficients[[2]])
sd <- sqrt(vcov(m)[2,2])
a<-table.element(a,sd)
tstat <- m$coefficients[[2]]/sd
a<-table.element(a,tstat)
pval <- 2*(1-pt(abs(tstat),length(x)-2))
a<-table.element(a,pval)
a<-table.row.end(a)
a<-table.end(a)
table.save(a,file='mytable.tab')
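The lag plot (test5.png) and the residual autocorrelation function (test8.png) both look for serial correlation in the residuals. Note that stats::lag(myr, 1) shifts the series back in time, so in z5 the first column actually holds the next residual. The regression still relates each residual to its predecessor. The sketch below uses explicit indexing instead, on hypothetical regression data with AR(1) errors, where phi = 0.8 is an assumption. It shows that the lag-plot slope is close to the lag-1 autocorrelation reported by acf().

```r
set.seed(2008)
n <- 200
x <- seq_len(n)
e <- as.numeric(arima.sim(model = list(ar = 0.8), n = n))   # hypothetical autocorrelated errors
y <- 2 + 0.5 * x + e
m <- lm(y ~ x)
r <- residuals(m)

# Lag plot regression: residual at time t against residual at time t - 1
prev <- r[-n]
curr <- r[-1]
lag_slope <- unname(coef(lm(curr ~ prev))[2])

# Lag-1 autocorrelation from acf(); element 1 is lag 0, element 2 is lag 1
acf1 <- acf(r, lag.max = 1, plot = FALSE)$acf[2]

print(c(lag_plot_slope = lag_slope, acf_lag1 = acf1))
```

With uncorrelated errors both numbers would be near zero. The dashed bounds that acf() draws are approximately plus or minus 1.96 / sqrt(n).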
