Repository of Reproducible Computations

Free Statistics

of Irreproducible Research!

Author's title

Author

*Unverified author*

R Software Module

rwasp_regression_trees1.wasp

Title produced by software

Recursive Partitioning (Regression Trees)

Date of computation

Mon, 12 Dec 2011 11:14:58 -0500

Cite this page as follows

Statistical Computations at FreeStatistics.org, Office for Research Development and Education, URL https://freestatistics.org/blog/index.php?v=date/2011/Dec/12/t1323706534qsb1c0bomtpby1v.htm/, Retrieved Fri, 03 May 2024 10:21:02 +0000

Statistical Computations at FreeStatistics.org, Office for Research Development and Education, URL https://freestatistics.org/blog/index.php?pk=154089, Retrieved Fri, 03 May 2024 10:21:02 +0000

QR Codes:

Paste this QR Code to cite your computation.

Original text written by user:

IsPrivate?

No (this computation is public)

User-defined keywords

Estimated Impact

117

Family? (F = Feedback message, R = changed R code, M = changed R Module, P = changed Parameters, D = changed Data)

-     [Recursive Partitioning (Regression Trees)] [] [2010-12-05 19:50:12] [b98453cac15ba1066b407e146608df68]
- R PD  [Recursive Partitioning (Regression Trees)] [] [2011-12-09 10:41:59] [22f8bc702946f784836540059d0d9516]
-         [Recursive Partitioning (Regression Trees)] [WS 10 Rec. Partit...] [2011-12-12 14:19:47] [74be16979710d4c4e7c6647856088456]
-   PD        [Recursive Partitioning (Regression Trees)] [Paper Deel 3 (Rec...] [2011-12-12 16:14:58] [d41d8cd98f00b204e9800998ecf8427e] [Current]

Feedback Forum

Post a new message

Dataseries X:

Download CSV

Histogram

Boxplots

439387	134	107	177551	171
405942	86	136	72875	133
405267	91	122	109249	70
394976	102	141	95066	115
390163	134	138	134796	145
382564	97	149	139537	184
374269	129	157	126893	160
371645	136	175	234853	238
370878	127	149	153824	155
368833	145	106	175681	282
358662	130	91	101817	210
351056	150	221	135062	159
349227	164	183	106385	144
348955	139	161	144068	164
347930	125	115	62464	165
346142	80	147	131106	161
344751	95	108	113587	118
343613	117	148	108179	190
334082	59	137	97967	80
325723	132	132	104155	112
322679	113	155	91677	145
319346	119	168	107549	156
310108	111	143	94584	147
309762	134	161	146648	165
308944	115	127	143950	256
305210	128	178	104434	152
292930	72	143	139206	106
292891	109	113	87408	107
291777	132	136	119906	162
287314	116	200	159803	157
287239	90	136	72117	145
285266	116	122	90694	130
284519	98	133	85008	119
283283	123	73	162627	213
280943	132	132	112368	192
279589	163	115	94588	127
277542	95	122	74844	76
276709	111	127	161961	179
275578	139	145	118850	161
273632	61	120	137639	137
273576	89	81	112669	102
269169	80	141	95684	106
268118	140	118	139141	197
268066	152	100	174110	202
266805	134	110	106271	139
265348	88	133	109840	155
264530	96	104	124252	147
264159	83	110	63583	73
262412	170	132	144645	181
256814	117	133	105416	140
255082	86	120	91072	125
249893	93	118	93071	132
248834	112	81	93587	152
247842	108	153	130307	111
243048	113	151	125369	143
242782	131	162	124550	170
242395	111	135	59938	107
241171	113	128	78912	125
237531	122	112	92288	102
236398	74	85	50548	89
236370	104	135	84891	133
232791	67	129	95329	122
230091	73	69	100174	98
225816	77	127	82124	86
224936	120	111	113864	222
219392	73	159	84892	104
219074	86	149	77826	103
212262	47	112	105612	122
210247	88	123	138599	152
209639	88	118	151611	153
209481	68	150	124254	107
207253	62	106	98073	108
207178	137	131	112215	199
205260	103	168	79738	93
204450	75	132	79146	117
202898	125	122	106816	145
202410	78	112	238712	57
201970	101	159	96056	155
201345	88	81	62853	113
200181	127	144	76302	134
199717	68	83	64631	95
199642	57	94	55062	80
199186	89	119	87026	121
197813	114	104	108461	162
195822	56	78	77457	113
193456	80	102	61263	136
192718	74	80	62486	101
192645	81	128	96740	108
190673	77	72	61680	88
189021	86	76	95216	192
189020	29	45	106221	109
186273	50	155	97392	158
183696	64	103	126938	106
182915	89	36	48029	82
181728	77	90	92059	127
180424	79	116	125976	128
180362	69	108	88766	84
179994	47	55	87656	67
179928	66	137	74783	85
178489	78	123	143372	123
176062	44	122	80849	150
174970	120	94	71764	114
174311	62	59	44244	72
173420	123	123	67486	109
173260	41	78	37238	16
170849	161	143	136051	163
166270	93	134	80613	124
166059	61	128	94747	65
162366	71	110	68007	92
161691	55	125	105793	41
161189	50	84	81964	74
157518	67	103	68966	103
157448	93	74	114029	157
157125	37	81	59591	89
156389	92	184	104880	120
150006	105	120	77993	132
147989	55	93	63995	90
147760	83	110	45635	132
146175	62	102	57139	68
145905	30	99	38692	38
145249	55	61	41449	68
143182	52	89	35224	66
141987	45	100	57793	60
140319	50	45	90586	52
138731	127	80	99971	87
138630	70	118	113402	104
137093	37	81	117129	97
136341	58	78	48194	57
135798	116	127	151244	61
135248	58	111	98956	126
133301	26	77	15986	27
130908	142	176	135096	108
118033	49	121	144253	15
114673	33	46	31081	41
107465	37	65	42087	33
106655	28	69	197426	60
101097	30	41	31701	35
94381	42	89	66089	50
87995	38	121	65702	28
80953	49	27	56622	59
78800	20	66	33032	21
73566	22	67	22618	26
65295	8	80	76173	49
65029	21	61	32551	21
61857	23	30	25162	31
46660	12	13	6179	13
43287	13	64	43750	49
38214	16	21	8773	16
33186	1	22	22807	5
31414	18	9	14116	39
24188	8	7	5950	7
23623	12	0	1168	11
21054	4	0	855	4
17547	0	4	3926	3
14688	4	0	6023	5
7199	7	0	1644	6
969	0	0	0	0
455	0	0	0	0
203	0	0	0	0
98	0	0	0	0
1	0	0	0	0
0	0	0	0	0
0	0	0	0	0
0	0	0	0	0

Summary of computational transaction
Raw Input	view raw input (R code)
Raw Output	view raw output of R engine
Computing time	3 seconds
R Server	'Gwilym Jenkins' @ jenkins.wessa.net

\begin{tabular}{lllllllll}
\hline
Summary of computational transaction \tabularnewline
Raw Input & view raw input (R code)  \tabularnewline
Raw Output & view raw output of R engine  \tabularnewline
Computing time & 3 seconds \tabularnewline
R Server & 'Gwilym Jenkins' @ jenkins.wessa.net \tabularnewline
\hline
\end{tabular}
%Source: https://freestatistics.org/blog/index.php?pk=154089&T=0

[TABLE]
[ROW][C]Summary of computational transaction[/C][/ROW]
[ROW][C]Raw Input[/C][C]view raw input (R code) [/C][/ROW]
[ROW][C]Raw Output[/C][C]view raw output of R engine [/C][/ROW]
[ROW][C]Computing time[/C][C]3 seconds[/C][/ROW]
[ROW][C]R Server[/C][C]'Gwilym Jenkins' @ jenkins.wessa.net[/C][/ROW]
[/TABLE]
Source: https://freestatistics.org/blog/index.php?pk=154089&T=0

Globally Unique Identifier (entire table): ba.freestatistics.org/blog/index.php?pk=154089&T=0

As an alternative you can also use a QR Code:

The GUIDs for individual cells are displayed in the table below:

Summary of computational transaction
Raw Input	view raw input (R code)
Raw Output	view raw output of R engine
Computing time	3 seconds
R Server	'Gwilym Jenkins' @ jenkins.wessa.net

10-Fold Cross Validation
	Prediction (training)			Prediction (testing)
Actual	C1	C2	CV	C1	C2	CV
C1	574	163	0.7788	62	21	0.747
C2	87	642	0.8807	17	74	0.8132
Overall	-	-	0.8295	-	-	0.7816

\begin{tabular}{lllllllll}
\hline
10-Fold Cross Validation \tabularnewline
 & Prediction (training) & Prediction (testing) \tabularnewline
Actual & C1 & C2 & CV & C1 & C2 & CV \tabularnewline
C1 & 574 & 163 & 0.7788 & 62 & 21 & 0.747 \tabularnewline
C2 & 87 & 642 & 0.8807 & 17 & 74 & 0.8132 \tabularnewline
Overall & - & - & 0.8295 & - & - & 0.7816 \tabularnewline
\hline
\end{tabular}
%Source: https://freestatistics.org/blog/index.php?pk=154089&T=1

[TABLE]
[ROW][C]10-Fold Cross Validation[/C][/ROW]
[ROW][C][/C][C]Prediction (training)[/C][C]Prediction (testing)[/C][/ROW]
[ROW][C]Actual[/C][C]C1[/C][C]C2[/C][C]CV[/C][C]C1[/C][C]C2[/C][C]CV[/C][/ROW]
[ROW][C]C1[/C][C]574[/C][C]163[/C][C]0.7788[/C][C]62[/C][C]21[/C][C]0.747[/C][/ROW]
[ROW][C]C2[/C][C]87[/C][C]642[/C][C]0.8807[/C][C]17[/C][C]74[/C][C]0.8132[/C][/ROW]
[ROW][C]Overall[/C][C]-[/C][C]-[/C][C]0.8295[/C][C]-[/C][C]-[/C][C]0.7816[/C][/ROW]
[/TABLE]
Source: https://freestatistics.org/blog/index.php?pk=154089&T=1

Globally Unique Identifier (entire table): ba.freestatistics.org/blog/index.php?pk=154089&T=1

As an alternative you can also use a QR Code:

The GUIDs for individual cells are displayed in the table below:

10-Fold Cross Validation
	Prediction (training)			Prediction (testing)
Actual	C1	C2	CV	C1	C2	CV
C1	574	163	0.7788	62	21	0.747
C2	87	642	0.8807	17	74	0.8132
Overall	-	-	0.8295	-	-	0.7816

Confusion Matrix (predicted in columns / actuals in rows)
	C1	C2
C1	66	16
C2	10	72

\begin{tabular}{lllllllll}
\hline
Confusion Matrix (predicted in columns / actuals in rows) \tabularnewline
 & C1 & C2 \tabularnewline
C1 & 66 & 16 \tabularnewline
C2 & 10 & 72 \tabularnewline
\hline
\end{tabular}
%Source: https://freestatistics.org/blog/index.php?pk=154089&T=2

[TABLE]
[ROW][C]Confusion Matrix (predicted in columns / actuals in rows)[/C][/ROW]
[ROW][C][/C][C]C1[/C][C]C2[/C][/ROW]
[ROW][C]C1[/C][C]66[/C][C]16[/C][/ROW]
[ROW][C]C2[/C][C]10[/C][C]72[/C][/ROW]
[/TABLE]
Source: https://freestatistics.org/blog/index.php?pk=154089&T=2

Globally Unique Identifier (entire table): ba.freestatistics.org/blog/index.php?pk=154089&T=2

As an alternative you can also use a QR Code:

The GUIDs for individual cells are displayed in the table below:

Confusion Matrix (predicted in columns / actuals in rows)
	C1	C2
C1	66	16
C2	10	72

Figure 1

PNG link

Postscript link

PDF link

Figure 2

PNG link

Postscript link

PDF link

Figure 3

PNG link

Postscript link

PDF link

Parameters (Session):

par1 = 1 ; par2 = Do not include Seasonal Dummies ; par3 = No Linear Trend ;

Parameters (R input):

par1 = 1 ; par2 = quantiles ; par3 = 2 ; par4 = yes ;

R code (references can be found in the software module):

library(party)
library(Hmisc)
par1 <- as.numeric(par1)
par3 <- as.numeric(par3)
x <- data.frame(t(y))
is.data.frame(x)
x <- x[!is.na(x[,par1]),]
k <- length(x[1,])
n <- length(x[,1])
colnames(x)[par1]
x[,par1]
if (par2 == 'kmeans') {
cl <- kmeans(x[,par1], par3)
print(cl)
clm <- matrix(cbind(cl$centers,1:par3),ncol=2)
clm <- clm[sort.list(clm[,1]),]
for (i in 1:par3) {
cl$cluster[cl$cluster==clm[i,2]] <- paste('C',i,sep='')
}
cl$cluster <- as.factor(cl$cluster)
print(cl$cluster)
x[,par1] <- cl$cluster
}
if (par2 == 'quantiles') {
x[,par1] <- cut2(x[,par1],g=par3)
}
if (par2 == 'hclust') {
hc <- hclust(dist(x[,par1])^2, 'cen')
print(hc)
memb <- cutree(hc, k = par3)
dum <- c(mean(x[memb==1,par1]))
for (i in 2:par3) {
dum <- c(dum, mean(x[memb==i,par1]))
}
hcm <- matrix(cbind(dum,1:par3),ncol=2)
hcm <- hcm[sort.list(hcm[,1]),]
for (i in 1:par3) {
memb[memb==hcm[i,2]] <- paste('C',i,sep='')
}
memb <- as.factor(memb)
print(memb)
x[,par1] <- memb
}
if (par2=='equal') {
ed <- cut(as.numeric(x[,par1]),par3,labels=paste('C',1:par3,sep=''))
x[,par1] <- as.factor(ed)
}
table(x[,par1])
colnames(x)
colnames(x)[par1]
x[,par1]
if (par2 == 'none') {
m <- ctree(as.formula(paste(colnames(x)[par1],' ~ .',sep='')),data = x)
}
load(file='createtable')
if (par2 != 'none') {
m <- ctree(as.formula(paste('as.factor(',colnames(x)[par1],') ~ .',sep='')),data = x)
if (par4=='yes') {
a<-table.start()
a<-table.row.start(a)
a<-table.element(a,'10-Fold Cross Validation',3+2*par3,TRUE)
a<-table.row.end(a)
a<-table.row.start(a)
a<-table.element(a,'',1,TRUE)
a<-table.element(a,'Prediction (training)',par3+1,TRUE)
a<-table.element(a,'Prediction (testing)',par3+1,TRUE)
a<-table.row.end(a)
a<-table.row.start(a)
a<-table.element(a,'Actual',1,TRUE)
for (jjj in 1:par3) a<-table.element(a,paste('C',jjj,sep=''),1,TRUE)
a<-table.element(a,'CV',1,TRUE)
for (jjj in 1:par3) a<-table.element(a,paste('C',jjj,sep=''),1,TRUE)
a<-table.element(a,'CV',1,TRUE)
a<-table.row.end(a)
for (i in 1:10) {
ind <- sample(2, nrow(x), replace=T, prob=c(0.9,0.1))
m.ct <- ctree(as.formula(paste('as.factor(',colnames(x)[par1],') ~ .',sep='')),data =x[ind==1,])
if (i==1) {
m.ct.i.pred <- predict(m.ct, newdata=x[ind==1,])
m.ct.i.actu <- x[ind==1,par1]
m.ct.x.pred <- predict(m.ct, newdata=x[ind==2,])
m.ct.x.actu <- x[ind==2,par1]
} else {
m.ct.i.pred <- c(m.ct.i.pred,predict(m.ct, newdata=x[ind==1,]))
m.ct.i.actu <- c(m.ct.i.actu,x[ind==1,par1])
m.ct.x.pred <- c(m.ct.x.pred,predict(m.ct, newdata=x[ind==2,]))
m.ct.x.actu <- c(m.ct.x.actu,x[ind==2,par1])
}
}
print(m.ct.i.tab <- table(m.ct.i.actu,m.ct.i.pred))
numer <- 0
for (i in 1:par3) {
print(m.ct.i.tab[i,i] / sum(m.ct.i.tab[i,]))
numer <- numer + m.ct.i.tab[i,i]
}
print(m.ct.i.cp <- numer / sum(m.ct.i.tab))
print(m.ct.x.tab <- table(m.ct.x.actu,m.ct.x.pred))
numer <- 0
for (i in 1:par3) {
print(m.ct.x.tab[i,i] / sum(m.ct.x.tab[i,]))
numer <- numer + m.ct.x.tab[i,i]
}
print(m.ct.x.cp <- numer / sum(m.ct.x.tab))
for (i in 1:par3) {
a<-table.row.start(a)
a<-table.element(a,paste('C',i,sep=''),1,TRUE)
for (jjj in 1:par3) a<-table.element(a,m.ct.i.tab[i,jjj])
a<-table.element(a,round(m.ct.i.tab[i,i]/sum(m.ct.i.tab[i,]),4))
for (jjj in 1:par3) a<-table.element(a,m.ct.x.tab[i,jjj])
a<-table.element(a,round(m.ct.x.tab[i,i]/sum(m.ct.x.tab[i,]),4))
a<-table.row.end(a)
}
a<-table.row.start(a)
a<-table.element(a,'Overall',1,TRUE)
for (jjj in 1:par3) a<-table.element(a,'-')
a<-table.element(a,round(m.ct.i.cp,4))
for (jjj in 1:par3) a<-table.element(a,'-')
a<-table.element(a,round(m.ct.x.cp,4))
a<-table.row.end(a)
a<-table.end(a)
table.save(a,file='mytable3.tab')
}
}
m
bitmap(file='test1.png')
plot(m)
dev.off()
bitmap(file='test1a.png')
plot(x[,par1] ~ as.factor(where(m)),main='Response by Terminal Node',xlab='Terminal Node',ylab='Response')
dev.off()
if (par2 == 'none') {
forec <- predict(m)
result <- as.data.frame(cbind(x[,par1],forec,x[,par1]-forec))
colnames(result) <- c('Actuals','Forecasts','Residuals')
print(result)
}
if (par2 != 'none') {
print(cbind(as.factor(x[,par1]),predict(m)))
myt <- table(as.factor(x[,par1]),predict(m))
print(myt)
}
bitmap(file='test2.png')
if(par2=='none') {
op <- par(mfrow=c(2,2))
plot(density(result$Actuals),main='Kernel Density Plot of Actuals')
plot(density(result$Residuals),main='Kernel Density Plot of Residuals')
plot(result$Forecasts,result$Actuals,main='Actuals versus Predictions',xlab='Predictions',ylab='Actuals')
plot(density(result$Forecasts),main='Kernel Density Plot of Predictions')
par(op)
}
if(par2!='none') {
plot(myt,main='Confusion Matrix',xlab='Actual',ylab='Predicted')
}
dev.off()
if (par2 == 'none') {
detcoef <- cor(result$Forecasts,result$Actuals)
a<-table.start()
a<-table.row.start(a)
a<-table.element(a,'Goodness of Fit',2,TRUE)
a<-table.row.end(a)
a<-table.row.start(a)
a<-table.element(a,'Correlation',1,TRUE)
a<-table.element(a,round(detcoef,4))
a<-table.row.end(a)
a<-table.row.start(a)
a<-table.element(a,'R-squared',1,TRUE)
a<-table.element(a,round(detcoef*detcoef,4))
a<-table.row.end(a)
a<-table.row.start(a)
a<-table.element(a,'RMSE',1,TRUE)
a<-table.element(a,round(sqrt(mean((result$Residuals)^2)),4))
a<-table.row.end(a)
a<-table.end(a)
table.save(a,file='mytable1.tab')
a<-table.start()
a<-table.row.start(a)
a<-table.element(a,'Actuals, Predictions, and Residuals',4,TRUE)
a<-table.row.end(a)
a<-table.row.start(a)
a<-table.element(a,'#',header=TRUE)
a<-table.element(a,'Actuals',header=TRUE)
a<-table.element(a,'Forecasts',header=TRUE)
a<-table.element(a,'Residuals',header=TRUE)
a<-table.row.end(a)
for (i in 1:length(result$Actuals)) {
a<-table.row.start(a)
a<-table.element(a,i,header=TRUE)
a<-table.element(a,result$Actuals[i])
a<-table.element(a,result$Forecasts[i])
a<-table.element(a,result$Residuals[i])
a<-table.row.end(a)
}
a<-table.end(a)
table.save(a,file='mytable.tab')
}
if (par2 != 'none') {
a<-table.start()
a<-table.row.start(a)
a<-table.element(a,'Confusion Matrix (predicted in columns / actuals in rows)',par3+1,TRUE)
a<-table.row.end(a)
a<-table.row.start(a)
a<-table.element(a,'',1,TRUE)
for (i in 1:par3) {
a<-table.element(a,paste('C',i,sep=''),1,TRUE)
}
a<-table.row.end(a)
for (i in 1:par3) {
a<-table.row.start(a)
a<-table.element(a,paste('C',i,sep=''),1,TRUE)
for (j in 1:par3) {
a<-table.element(a,myt[i,j])
}
a<-table.row.end(a)
}
a<-table.end(a)
table.save(a,file='mytable2.tab')
}

Free Statistics

Description of Statistical Computation

Tree of Dependent Computations

Dataset

Tables (Output of Computation)

Figures (Output of Computation)

Input Parameters & R Code