
WS10 - Recursive Partitioning

*The author of this computation has been verified*
R Software Module: /rwasp_regression_trees1.wasp
Title produced by software: Recursive Partitioning (Regression Trees)
Date of computation: Mon, 13 Dec 2010 12:41:50 +0000
 
Cite this page as follows:
Statistical Computations at FreeStatistics.org, Office for Research Development and Education, URL http://www.freestatistics.org/blog/date/2010/Dec/13/t1292244015spe7wfjdeml61cr.htm/, Retrieved Mon, 13 Dec 2010 13:40:16 +0100
 
BibTeX entries for LaTeX users:
@Manual{KEY,
    author = {{YOUR NAME}},
    publisher = {Office for Research Development and Education},
    title = {Statistical Computations at FreeStatistics.org, URL http://www.freestatistics.org/blog/date/2010/Dec/13/t1292244015spe7wfjdeml61cr.htm/},
    year = {2010},
}
@Manual{R,
    title = {R: A Language and Environment for Statistical Computing},
    author = {{R Development Core Team}},
    organization = {R Foundation for Statistical Computing},
    address = {Vienna, Austria},
    year = {2010},
    note = {{ISBN} 3-900051-07-0},
    url = {http://www.R-project.org},
}
 
Original text written by user:
 
IsPrivate?
No (this computation is public)
 
User-defined keywords:
 
Dataseries X:
25.94 23688100 39.18 3940.35 0.0274 144.7 5.45
28.66 13741000 35.78 4696.69 0.0322 140.8 5.73
33.95 14143500 42.54 4572.83 0.0376 137.1 5.85
31.01 16763800 27.92 3860.66 0.0307 137.7 6.02
21.00 16634600 25.05 3400.91 0.0319 144.7 6.27
26.19 13693300 32.03 3966.11 0.0373 139.2 6.53
25.41 10545800 27.95 3766.99 0.0366 143.0 6.54
30.47 9409900 27.95 4206.35 0.0341 140.8 6.5
12.88 39182200 24.15 3672.82 0.0345 142.5 6.52
9.78 37005800 27.57 3369.63 0.0345 135.8 6.51
8.25 15818500 22.97 2597.93 0.0345 132.6 6.51
7.44 16952000 17.37 2470.52 0.0339 128.6 6.4
10.81 24563400 24.45 2772.73 0.0373 115.7 5.98
9.12 14163200 23.62 2151.83 0.0353 109.2 5.49
11.03 18184800 21.90 1840.26 0.0292 116.9 5.31
12.74 20810300 27.12 2116.24 0.0327 109.9 4.8
9.98 12843000 27.70 2110.49 0.0362 116.1 4.21
11.62 13866700 29.23 2160.54 0.0325 118.9 3.97
9.40 15119200 26.50 2027.13 0.0272 116.3 3.77
9.27 8301600 22.84 1805.43 0.0272 114.0 3.65
7.76 14039600 20.49 1498.80 0.0265 97.0 3.07
8.78 1213 etc...
 
Output produced by software:



Summary of computational transaction
Raw Input       view raw input (R code)
Raw Output      view raw output of R engine
Computing time  6 seconds
R Server        'Sir Ronald Aylmer Fisher' @ 193.190.124.24


Goodness of Fit
Correlation  0.9268
R-squared    0.859
RMSE         28.6691
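
These three figures follow from the actuals and forecasts alone. According to the module's R code, the correlation is taken between forecasts and actuals, R-squared is its square, and RMSE is the root of the mean squared residual. A minimal sketch of that calculation, assuming a hypothetical helper named `goodness_of_fit` and using only the first eight observations of this computation, so the printed values will differ from the full-sample figures:

```r
# Hypothetical helper mirroring the three metrics in the Goodness of Fit table.
goodness_of_fit <- function(actuals, forecasts) {
  r <- cor(forecasts, actuals)          # Pearson correlation
  residuals <- actuals - forecasts      # same sign convention as the report
  c(correlation = r,
    r_squared   = r^2,
    rmse        = sqrt(mean(residuals^2)))
}

# First eight actuals and forecasts reported by this computation.
actuals   <- c(25.94, 28.66, 33.95, 31.01, 21.00, 26.19, 25.41, 30.47)
forecasts <- c(49.0725, rep(26.1283333333333, 7))

gof <- goodness_of_fit(actuals, forecasts)
print(round(gof, 4))
```

Note that R-squared is defined here as a squared correlation, which need not equal one minus the ratio of residual to total sum of squares.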


Actuals, Predictions, and Residuals
#  Actuals  Forecasts  Residuals
1  25.94  49.0725  -23.1325
2  28.66  26.1283333333333  2.53166666666667
3  33.95  26.1283333333333  7.82166666666667
4  31.01  26.1283333333333  4.88166666666667
5  21  26.1283333333333  -5.12833333333333
6  26.19  26.1283333333333  0.0616666666666674
7  25.41  26.1283333333333  -0.718333333333334
8  30.47  26.1283333333333  4.34166666666667
9  12.88  50.825  -37.945
10  9.78  119.258421052632  -109.478421052632
11  8.25  26.1283333333333  -17.8783333333333
12  7.44  26.1283333333333  -18.6883333333333
13  10.81  49.0725  -38.2625
14  9.12  13.5443478260870  -4.42434782608696
15  11.03  13.5443478260870  -2.51434782608696
16  12.74  13.5443478260870  -0.804347826086957
17  9.98  13.5443478260870  -3.56434782608696
18  11.62  26.1283333333333  -14.5083333333333
19  9.4  13.5443478260870  -4.14434782608696
20  9.27  10.39  -1.12
21  7.76  7.71666666666667  0.043333333333333
22  8.78  10.39  -1.61
23  10.65  13.5443478260870  -2.89434782608696
24  10.95  13.5443478260870  -2.59434782608696
25  12.36  13.5443478260870  -1.18434782608696
26  10.85  10.39  0.459999999999999
27  11.84  13.5443478260870  -1.70434782608696
28  12.14  10.39  1.75
29  11.65  10.39  1.26
30  8.86  7.71666666666667  1.14333333333333
31  7.63  7.71666666666667  -0.0866666666666669
32  7.38  7.71666666666667  -0.336666666666667
33  7.25  7.71666666666667  -0.466666666666667
34  8.03  7.71666666666667  0.313333333333333
35  7.75  7.71666666666667  0.0333333333333332
36  7.16  7.71666666666667  -0.556666666666667
37  7.18  7.71666666666667  -0.536666666666667
38  7.51  7.71666666666667  -0.206666666666667
39  7.07  115.362142857143  -108.292142857143
40  7.11  7.71666666666667  -0.606666666666666
41  8.98  7.71666666666667  1.26333333333333
42  9.53  10.39  -0.860000000000001
43  10.54  10.39  0.149999999999999
44  11.31  13.5443478260870  -2.23434782608696
45  10.36  10.39  -0.0300000000000011
46  11.44  13.5443478260870  -2.10434782608696
47  10.45  13.5443478260870  -3.09434782608696
48  10.69  13.5443478260870  -2.85434782608696
49  11.28  13.5443478260870  -2.26434782608696
50  11.96  13.5443478260870  -1.58434782608696
51  13.52  13.5443478260870  -0.0243478260869576
52  12.89  13.5443478260870  -0.654347826086957
53  14.03  13.5443478260870  0.485652173913042
54  16.27  13.5443478260870  2.72565217391304
55  16.17  13.5443478260870  2.62565217391304
56  17.25  13.5443478260870  3.70565217391304
57  19.38  13.5443478260870  5.83565217391304
58  26.2  50.825  -24.625
59  33.53  50.825  -17.295
60  32.2  50.825  -18.625
61  38.45  50.825  -12.375
62  44.86  50.825  -5.965
63  41.67  49.0725  -7.4025
64  36.06  50.825  -14.765
65  39.76  49.0725  -9.3125
66  36.81  13.5443478260870  23.2656521739130
67  42.65  26.1283333333333  16.5216666666667
68  46.89  26.1283333333333  20.7616666666667
69  53.61  49.0725  4.5375
70  57.59  50.825  6.765
71  67.82  49.0725  18.7475
72  71.89  49.0725  22.8175
73  75.51  119.258421052632  -43.7484210526316
74  68.49  50.825  17.665
75  62.72  50.825  11.895
76  70.39  50.825  19.565
77  59.77  50.825  8.945
78  57.27  50.825  6.445
79  67.96  50.825  17.135
80  67.85  50.825  17.025
81  76.98  50.825  26.155
82  81.08  49.0725  32.0075
83  91.66  119.258421052632  -27.5984210526316
84  84.84  119.258421052632  -34.4184210526316
85  85.73  119.258421052632  -33.5284210526316
86  84.61  119.258421052632  -34.6484210526316
87  92.91  119.258421052632  -26.3484210526316
88  99.8  119.258421052632  -19.4584210526316
89  121.19  119.258421052632  1.93157894736842
90  122.04  119.258421052632  2.78157894736843
91  131.76  119.258421052632  12.5015789473684
92  138.48  119.258421052632  19.2215789473684
93  153.47  119.258421052632  34.2115789473684
94  189.95  119.258421052632  70.6915789473684
95  182.22  119.258421052632  62.9615789473684
96  198.08  119.258421052632  78.8215789473684
97  135.36  119.258421052632  16.1015789473684
98  125.02  119.258421052632  5.76157894736842
99  143.5  119.258421052632  24.2415789473684
100  173.95  218.937222222222  -44.9872222222222
101  188.75  218.937222222222  -30.1872222222222
102  167.44  218.937222222222  -51.4972222222222
103  158.95  218.937222222222  -59.9872222222222
104  169.53  218.937222222222  -49.4072222222222
105  113.66  115.362142857143  -1.70214285714286
106  107.59  115.362142857143  -7.77214285714285
107  92.67  115.362142857143  -22.6921428571429
108  85.35  115.362142857143  -30.0121428571429
109  90.13  115.362142857143  -25.2321428571429
110  89.31  115.362142857143  -26.0521428571429
111  105.12  115.362142857143  -10.2421428571429
112  125.83  115.362142857143  10.4678571428571
113  135.81  115.362142857143  20.4478571428571
114  142.43  115.362142857143  27.0678571428572
115  163.39  115.362142857143  48.0278571428571
116  168.21  115.362142857143  52.8478571428572
117  185.35  218.937222222222  -33.5872222222222
118  188.5  115.362142857143  73.1378571428571
119  199.91  218.937222222222  -19.0272222222222
120  210.73  218.937222222222  -8.20722222222224
121  192.06  218.937222222222  -26.8772222222222
122  204.62  218.937222222222  -14.3172222222222
123  235  218.937222222222  16.0627777777778
124  261.09  218.937222222222  42.1527777777777
125  256.88  218.937222222222  37.9427777777778
126  251.53  218.937222222222  32.5927777777778
127  257.25  218.937222222222  38.3127777777778
128  243.1  218.937222222222  24.1627777777778
129  283.75  218.937222222222  64.8127777777778
130  300.98  218.937222222222  82.0427777777778
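
Only nine distinct forecast values appear in this table because a regression tree predicts one constant per terminal node, and for a tree fitted with equal case weights that constant is the mean of the actuals falling in the node. A small check in base R, using the eight observations whose forecast is 10.39 (rows 20, 22, 26, 28, 29, 42, 43 and 45); the variable names are illustrative:

```r
# Actuals of the eight observations that share the forecast 10.39.
node_actuals <- c(9.27, 8.78, 10.85, 12.14, 11.65, 9.53, 10.54, 10.36)

# The node forecast is the mean of the actuals in the node.
node_forecast <- mean(node_actuals)

# Residuals use the report's convention: actual minus forecast.
node_residuals <- node_actuals - node_forecast

print(node_forecast)
print(round(node_residuals, 2))
```

Because each forecast is a node mean, the residuals within any one node sum to zero, up to floating-point error.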
 
Charts produced by software:
http://www.freestatistics.org/blog/date/2010/Dec/13/t1292244015spe7wfjdeml61cr/20l8c1292244101.png
http://www.freestatistics.org/blog/date/2010/Dec/13/t1292244015spe7wfjdeml61cr/20l8c1292244101.ps

http://www.freestatistics.org/blog/date/2010/Dec/13/t1292244015spe7wfjdeml61cr/30l8c1292244101.png
http://www.freestatistics.org/blog/date/2010/Dec/13/t1292244015spe7wfjdeml61cr/30l8c1292244101.ps

http://www.freestatistics.org/blog/date/2010/Dec/13/t1292244015spe7wfjdeml61cr/4sv8x1292244101.png
http://www.freestatistics.org/blog/date/2010/Dec/13/t1292244015spe7wfjdeml61cr/4sv8x1292244101.ps


 
Parameters (Session):
par1 = 1 ; par2 = none ; par3 = 3 ; par4 = no ;
 
Parameters (R input):
par1 = 1 ; par2 = none ; par3 = 3 ; par4 = no ;
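
Judging from the R code in this report, par1 is the column index of the response, par2 is the method used to discretize it (none, kmeans, quantiles, hclust or equal), par3 is the number of classes, and par4 switches the cross-validation table on or off. With par2 = none, as here, the response stays numeric and a regression tree is fitted. For comparison, a minimal sketch of what the equal option would do to the first ten response values; the variable names are illustrative and not part of the module:

```r
# First ten values of the response (column 1 of the data series).
response <- c(25.94, 28.66, 33.95, 31.01, 21.00,
              26.19, 25.41, 30.47, 12.88, 9.78)

n_classes <- 3  # plays the role of par3

# Equal-width binning, labelled C1..C3 from lowest to highest interval.
classes <- cut(response, n_classes,
               labels = paste('C', 1:n_classes, sep = ''))

print(table(classes))
```

With a factor response of this kind the module fits a classification tree and reports a confusion matrix instead of actuals, forecasts and residuals.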
 
R code (references can be found in the software module):
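# party provides ctree(); Hmisc provides cut2()
library(party)
library(Hmisc)
# par1: column index of the response; par3: number of classes for discretization
par1 <- as.numeric(par1)
par3 <- as.numeric(par3)
# y is not defined in this listing; it is presumably supplied by the server
# with one variable per row, hence the transpose
x <- data.frame(t(y))
is.data.frame(x)
# Drop rows where the response is missing
x <- x[!is.na(x[,par1]),]
k <- length(x[1,])
n <- length(x[,1])
colnames(x)[par1]
x[,par1]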
# Optionally discretize the response into par3 classes
if (par2 == 'kmeans') {
  cl <- kmeans(x[,par1], par3)
  print(cl)
  # Order clusters by center so that labels C1, C2, ... follow increasing response
  clm <- matrix(cbind(cl$centers,1:par3),ncol=2)
  clm <- clm[sort.list(clm[,1]),]
  for (i in 1:par3) {
    cl$cluster[cl$cluster==clm[i,2]] <- paste('C',i,sep='')
  }
  cl$cluster <- as.factor(cl$cluster)
  print(cl$cluster)
  x[,par1] <- cl$cluster
}
if (par2 == 'quantiles') {
  # Quantile groups of roughly equal size
  x[,par1] <- cut2(x[,par1],g=par3)
}
if (par2 == 'hclust') {
  # Centroid-linkage hierarchical clustering on squared distances
  hc <- hclust(dist(x[,par1])^2, 'cen')
  print(hc)
  memb <- cutree(hc, k = par3)
  # Mean response per cluster, used to order the class labels
  dum <- c(mean(x[memb==1,par1]))
  for (i in 2:par3) {
    dum <- c(dum, mean(x[memb==i,par1]))
  }
  hcm <- matrix(cbind(dum,1:par3),ncol=2)
  hcm <- hcm[sort.list(hcm[,1]),]
  for (i in 1:par3) {
    memb[memb==hcm[i,2]] <- paste('C',i,sep='')
  }
  memb <- as.factor(memb)
  print(memb)
  x[,par1] <- memb
}
if (par2=='equal') {
  # Equal-width intervals
  ed <- cut(as.numeric(x[,par1]),par3,labels=paste('C',1:par3,sep=''))
  x[,par1] <- as.factor(ed)
}
table(x[,par1])
colnames(x)
colnames(x)[par1]
x[,par1]
# Numeric response: fit a regression tree on all other columns
if (par2 == 'none') {
  m <- ctree(as.formula(paste(colnames(x)[par1],' ~ .',sep='')),data = x)
}
# Presumably loads the table.* helper functions used below from a server-side file
load(file='createtable')
# Discretized response: fit a classification tree
if (par2 != 'none') {
  m <- ctree(as.formula(paste('as.factor(',colnames(x)[par1],') ~ .',sep='')),data = x)
  if (par4=='yes') {
    # Header rows of the cross-validation table
    a<-table.start()
    a<-table.row.start(a)
    a<-table.element(a,'10-Fold Cross Validation',3+2*par3,TRUE)
    a<-table.row.end(a)
    a<-table.row.start(a)
    a<-table.element(a,'',1,TRUE)
    a<-table.element(a,'Prediction (training)',par3+1,TRUE)
    a<-table.element(a,'Prediction (testing)',par3+1,TRUE)
    a<-table.row.end(a)
    a<-table.row.start(a)
    a<-table.element(a,'Actual',1,TRUE)
    for (jjj in 1:par3) a<-table.element(a,paste('C',jjj,sep=''),1,TRUE)
    a<-table.element(a,'CV',1,TRUE)
    for (jjj in 1:par3) a<-table.element(a,paste('C',jjj,sep=''),1,TRUE)
    a<-table.element(a,'CV',1,TRUE)
    a<-table.row.end(a)
    # Despite the label, this draws 10 independent random splits (each row is
    # assigned to training with probability 0.9) rather than 10 disjoint folds
    for (i in 1:10) {
      ind <- sample(2, nrow(x), replace=TRUE, prob=c(0.9,0.1))
      m.ct <- ctree(as.formula(paste('as.factor(',colnames(x)[par1],') ~ .',sep='')),data =x[ind==1,])
      # Accumulate predictions and actuals for training (i) and testing (x) rows
      if (i==1) {
        m.ct.i.pred <- predict(m.ct, newdata=x[ind==1,])
        m.ct.i.actu <- x[ind==1,par1]
        m.ct.x.pred <- predict(m.ct, newdata=x[ind==2,])
        m.ct.x.actu <- x[ind==2,par1]
      } else {
        m.ct.i.pred <- c(m.ct.i.pred,predict(m.ct, newdata=x[ind==1,]))
        m.ct.i.actu <- c(m.ct.i.actu,x[ind==1,par1])
        m.ct.x.pred <- c(m.ct.x.pred,predict(m.ct, newdata=x[ind==2,]))
        m.ct.x.actu <- c(m.ct.x.actu,x[ind==2,par1])
      }
    }
    # Training confusion table, per-class hit rates and overall hit rate
    print(m.ct.i.tab <- table(m.ct.i.actu,m.ct.i.pred))
    numer <- 0
    for (i in 1:par3) {
      print(m.ct.i.tab[i,i] / sum(m.ct.i.tab[i,]))
      numer <- numer + m.ct.i.tab[i,i]
    }
    print(m.ct.i.cp <- numer / sum(m.ct.i.tab))
    # Same for the testing rows
    print(m.ct.x.tab <- table(m.ct.x.actu,m.ct.x.pred))
    numer <- 0
    for (i in 1:par3) {
      print(m.ct.x.tab[i,i] / sum(m.ct.x.tab[i,]))
      numer <- numer + m.ct.x.tab[i,i]
    }
    print(m.ct.x.cp <- numer / sum(m.ct.x.tab))
    # One table row per actual class
    for (i in 1:par3) {
      a<-table.row.start(a)
      a<-table.element(a,paste('C',i,sep=''),1,TRUE)
      for (jjj in 1:par3) a<-table.element(a,m.ct.i.tab[i,jjj])
      a<-table.element(a,round(m.ct.i.tab[i,i]/sum(m.ct.i.tab[i,]),4))
      for (jjj in 1:par3) a<-table.element(a,m.ct.x.tab[i,jjj])
      a<-table.element(a,round(m.ct.x.tab[i,i]/sum(m.ct.x.tab[i,]),4))
      a<-table.row.end(a)
    }
    a<-table.row.start(a)
    a<-table.element(a,'Overall',1,TRUE)
    for (jjj in 1:par3) a<-table.element(a,'-')
    a<-table.element(a,round(m.ct.i.cp,4))
    for (jjj in 1:par3) a<-table.element(a,'-')
    a<-table.element(a,round(m.ct.x.cp,4))
    a<-table.row.end(a)
    a<-table.end(a)
    table.save(a,file='mytable3.tab')
  }
}
# Print and plot the fitted tree
m
bitmap(file='test1.png')
plot(m)
dev.off()
# Response by terminal node; where(m) returns each observation's terminal node id
bitmap(file='test1a.png')
plot(x[,par1] ~ as.factor(where(m)),main='Response by Terminal Node',xlab='Terminal Node',ylab='Response')
dev.off()
if (par2 == 'none') {
  # In-sample forecasts and residuals (actual minus forecast)
  forec <- predict(m)
  result <- as.data.frame(cbind(x[,par1],forec,x[,par1]-forec))
  colnames(result) <- c('Actuals','Forecasts','Residuals')
  print(result)
}
if (par2 != 'none') {
  # Confusion table: actual classes in rows, predicted classes in columns
  print(cbind(as.factor(x[,par1]),predict(m)))
  myt <- table(as.factor(x[,par1]),predict(m))
  print(myt)
}
bitmap(file='test2.png')
if(par2=='none') {
  # 2 x 2 panel of diagnostic plots
  op <- par(mfrow=c(2,2))
  plot(density(result$Actuals),main='Kernel Density Plot of Actuals')
  plot(density(result$Residuals),main='Kernel Density Plot of Residuals')
  plot(result$Forecasts,result$Actuals,main='Actuals versus Predictions',xlab='Predictions',ylab='Actuals')
  plot(density(result$Forecasts),main='Kernel Density Plot of Predictions')
  par(op)
}
if(par2!='none') {
  plot(myt,main='Confusion Matrix',xlab='Actual',ylab='Predicted')
}
dev.off()
if (par2 == 'none') {
  # detcoef holds the Pearson correlation; its square is reported as R-squared
  detcoef <- cor(result$Forecasts,result$Actuals)
  # Goodness of Fit table
  a<-table.start()
  a<-table.row.start(a)
  a<-table.element(a,'Goodness of Fit',2,TRUE)
  a<-table.row.end(a)
  a<-table.row.start(a)
  a<-table.element(a,'Correlation',1,TRUE)
  a<-table.element(a,round(detcoef,4))
  a<-table.row.end(a)
  a<-table.row.start(a)
  a<-table.element(a,'R-squared',1,TRUE)
  a<-table.element(a,round(detcoef*detcoef,4))
  a<-table.row.end(a)
  a<-table.row.start(a)
  a<-table.element(a,'RMSE',1,TRUE)
  a<-table.element(a,round(sqrt(mean((result$Residuals)^2)),4))
  a<-table.row.end(a)
  a<-table.end(a)
  table.save(a,file='mytable1.tab')
  # Actuals, Predictions, and Residuals table: one row per observation
  a<-table.start()
  a<-table.row.start(a)
  a<-table.element(a,'Actuals, Predictions, and Residuals',4,TRUE)
  a<-table.row.end(a)
  a<-table.row.start(a)
  a<-table.element(a,'#',header=TRUE)
  a<-table.element(a,'Actuals',header=TRUE)
  a<-table.element(a,'Forecasts',header=TRUE)
  a<-table.element(a,'Residuals',header=TRUE)
  a<-table.row.end(a)
  for (i in 1:length(result$Actuals)) {
    a<-table.row.start(a)
    a<-table.element(a,i,header=TRUE)
    a<-table.element(a,result$Actuals[i])
    a<-table.element(a,result$Forecasts[i])
    a<-table.element(a,result$Residuals[i])
    a<-table.row.end(a)
  }
  a<-table.end(a)
  table.save(a,file='mytable.tab')
}
if (par2 != 'none') {
  # Confusion matrix table for the classification tree
  a<-table.start()
  a<-table.row.start(a)
  a<-table.element(a,'Confusion Matrix (predicted in columns / actuals in rows)',par3+1,TRUE)
  a<-table.row.end(a)
  # Header row with the class labels
  a<-table.row.start(a)
  a<-table.element(a,'',1,TRUE)
  for (i in 1:par3) {
    a<-table.element(a,paste('C',i,sep=''),1,TRUE)
  }
  a<-table.row.end(a)
  # One row per actual class, one cell per predicted class
  for (i in 1:par3) {
    a<-table.row.start(a)
    a<-table.element(a,paste('C',i,sep=''),1,TRUE)
    for (j in 1:par3) {
      a<-table.element(a,myt[i,j])
    }
    a<-table.row.end(a)
  }
  a<-table.end(a)
  table.save(a,file='mytable2.tab')
}
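
The cross-validation branch of the listing above (active only when par4 = yes) draws ten independent random splits, assigning each row to the training set with probability 0.9, rather than partitioning the rows into ten disjoint folds. A minimal sketch contrasting the two schemes in base R; the sample size matches this computation, while the seed and variable names are assumptions made for illustration:

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
n <- 130     # number of observations in this computation

# Scheme used in the listing: each row goes to the test set (label 2)
# with probability 0.1, independently in every repetition, so test sets
# can overlap across repetitions and some rows may never be tested.
random_split <- sample(2, n, replace = TRUE, prob = c(0.9, 0.1))
test_rows_random <- which(random_split == 2)

# Conventional 10-fold scheme: shuffle balanced fold labels so that
# every row is held out exactly once.
fold_id <- sample(rep(1:10, length.out = n))
fold_sizes <- table(fold_id)

print(length(test_rows_random))
print(fold_sizes)
```

Under the random-split scheme the size of the test set varies from one repetition to the next, whereas the fold scheme holds out exactly 13 rows each time for n = 130.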
 





Copyright

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.

Software written by Ed van Stee & Patrick Wessa

