Patrick Wessa, Bart Baesens, Stephan Poelmans, Ed van Stee
K.U.Leuven Association, Integrated Faculty of Business and Economics
Workshop website: http://www.freestatistics.org/workshop/
This workshop may be useful for anyone with an interest in one of the following three topics:
Statistics Education within the pedagogical paradigm of social constructivism,
Educational Research based on objectively measured learning activities which have never been available before,
Empirical Research that is fully reproducible – this is called Reproducible Computing, which supports communication, collaboration, and dissemination of research results.
The main focus of this workshop is on Statistics Education or any type of education where students need to be able to interact with and communicate about empirical research results. In this sense, the workshop may be of interest to academics from various fields.
The second focus is on Educational Research and Quality Control (of the course environment). The workshop illustrates how learning outcomes (as measured by objective exams) can be related to (or predicted by) various factors such as objectively measured learning activities, learning attitudes, and social interaction/networking. The model that describes such a relationship is useful for research purposes and allows us to control and improve the quality of our education.
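To make the idea of relating exam outcomes to measured learning activities concrete, the following is a minimal sketch of a one-predictor least-squares fit. It is an illustration only and not the model used in the workshop: the variable names and the numbers are hypothetical, and a realistic analysis would involve several predictors and proper diagnostics.

```python
from statistics import mean


def fit_simple_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: number of computations each student reproduced
# during the course, and that student's exam score (out of 20).
reproduced_computations = [2, 5, 8, 11, 14, 20]
exam_scores = [8.0, 9.5, 11.0, 12.0, 14.5, 16.0]

intercept, slope = fit_simple_regression(reproduced_computations, exam_scores)
fit_quality = r_squared(reproduced_computations, exam_scores, intercept, slope)
print(f"score = {intercept:.2f} + {slope:.2f} * activity (R^2 = {fit_quality:.2f})")
```

A positive slope in such a fit indicates association only; it does not by itself show that the activity causes better exam results.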
Anyone with an interest in presenting Empirical Research results in a form that allows readers to fully reproduce and reuse the underlying computations may find this workshop useful too. The pedagogical focus of this workshop does not imply that Reproducible Computing technology is solely useful for educational purposes.
The novelty of our newly developed Reproducible Computing technology (footnote 1) lies in the fact that it empowers students (and the educator) to easily archive, exchange, reproduce, and reuse R computations [21], [22], [32]. This technological innovation allows us to create and maintain a learning environment that supports social constructivism, which can be shown to be very helpful in learning statistics [23], [33]. The basic idea is to create an environment where students are allowed to interact with each other (and the tutor) about (a series of) research-related activities (such as assignments or workshops) based on the R language [16] and the R Framework (footnote 2).
Within the context of ICT-based and math-related education, the academic community has shown great interest in the role and importance of social and individual constructivism ([19], [18], [4], [12]) and its implementation in statistics education in particular [13].
The following quote summarizes the importance and the great interest of educational researchers in constructivism [10]:
Constructivism is a philosophy that supports student construction of knowledge. Since students uniquely construct their knowledge, instructional strategies that support constructivist philosophies naturally advocate student understanding. Instructional trends in the mathematics and statistics education communities support the active-learning orientation of constructivist philosophy. I posit that, while not the only philosophy of teaching and learning, constructivism is one of the best such philosophies. One question remains: "How do instructional strategies that support student knowledge construction address the needs of all students?"
In September 2007, our early research results were presented at the Applied Statistics conference: the relationships between students' learning attitudes [11], social interaction (through group work and Peer Assessments), learning experiences [11], and exam scores were investigated [20]. One of the conclusions in the presentation was that social interaction through Peer Assessment (which is used as a "learning activity" rather than an "evaluation tool") was very beneficial for the learning experiences of students, which in turn are correlated with final exam performance.
In the presentation it was also concluded that the main disadvantage of the proposed constructivist approach to statistics education lies in the fact that students (and the educator) have to assess a series of workshop submissions that are (almost) irreproducible. Solving the difficulties that are involved in reproducing the research results from students is a "conditio sine qua non" if the constructivist approach to statistics learning is to be used on a large scale.
Another important aspect of this problem is related to the fact that educators are only able to assess the output (= submitted paper) when they request students to work on an assignment. The educator has a pretty good idea of what the learning goals are and what the end result should be. There is, however, no information about the learning/research process that leads to the result. Therefore, the educator is unaware of any difficulty that might have occurred during the process:
technical (computer-related) problems,
statistical pitfalls (do students understand every aspect of the analysis?),
plagiarism and free-riding (who is the real author of the submitted paper?),
heavy workload (how much time was spent on the analysis?), etc.
The Compendium Platform solves all of these problems through its underlying Reproducible Computing technology. The main benefits of Reproducible Computing are based on the fact that it effectively supports:
the creation of interactive learning environments where students are able to experiment, communicate, and collaborate
the dissemination of truly reproducible, empirical research (at no cost)
pedagogical research about statistics learning based on objectively measured learning activities that are otherwise unobservable
These are the goals of the workshop:
Understand the underlying technology – on an intuitive level (session 1).
Get familiar with the literature and research (session 1).
Explore the benefits of Reproducible Computing (session 1):
for statistics education, empirical papers, and master theses
empirical research
science dissemination
Learn how to (re)produce calculations and use them in derived research/work (session 2).
Learn how to create and use Compendia (in LaTeX scripts, Word processors, Presentations, Wikis) using archived computations in www.freestatistics.org (session 2).
Learn how Peer Assessment can be performed using the R framework integrated in www.freestatistics.org (session 2).
Learn to integrate R modules and the FreeStatistics archive in your learning environment (session 2).
How can we explore the information contained in the archive? What can we learn? How can we improve? (session 2).
Learn to create new R modules – or provide our team with the information that is necessary to create them for you (session 3).
Learn about the RC package which connects the R console to the Compendium Platform (session 3).
Learn to use the database of objectively measured learning activities to build statistical models (session 3).
Learn about the data mining tools that allow you to explore the database (e.g., social interaction between students during assignments).
Every session takes about 50 minutes, depending on the feedback we receive from registered participants. There is a 10-minute break between sessions. The detailed outline shown below is subject to change and primarily depends on the reported interests of participants. Registration is required and participants are asked to send us feedback about their interests through an online voting system: aspects with a high number of votes are emphasized during each session. Participants who use R scripts in education/research are encouraged to send us samples so that we can integrate them in the workshop.
Session 1. Brief description of the underlying technology and pedagogical aspects of Reproducible Computing:
A brief overview of pedagogical theories (Constructivism, Constructionism, Behaviorism, etc.) [2]
What is Reproducible Computing? [14], [17] How does it work? [1], [3], [5], [6], [7], [8], [21], [25], [32]
R Framework (www.wessa.net)
Statistical Computations Archive (www.freestatistics.org)
A real example of a constructivist statistics course (in Moodle)
Some ethical considerations
Some empirical results and past experiences, incl. testimonials [27].
Guidelines to build effective learning environments [29], [15]:
what can and what can't be done in an electronic learning platform (such as Blackboard, or Moodle)
how to create effective assignments and workshops
what about timing?
how many assignments?
do learning attitudes and gender matter? [9]
Guidelines to build reproducible course materials (two extensive examples are provided).
Some relevant, statistical considerations:
why objective measurements are better than self-reported data (biases and measurement errors, [26], [34]).
how to measure learning outcomes with exams and relate them to learning activities – an approach based on objective exam score transformations [28].
taking care of multiple dimensions in data mining and avoiding the type I error trap [31].
the data structure of social interaction and networking based on Reproducible Computing.
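The "type I error trap" mentioned above arises when many hypotheses are tested on the same data: the chance of at least one false positive grows quickly with the number of tests. The sketch below shows this growth for independent tests and one common remedy, Holm's step-down correction. This is an assumption chosen for illustration and not necessarily the approach taken in [31]; the function names are hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among n independent tests."""
    return 1 - (1 - alpha) ** n_tests


def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure.

    Returns a list of booleans (in the original order of p_values) that is
    True where the corresponding null hypothesis is rejected while keeping
    the family-wise error rate at or below alpha.
    """
    m = len(p_values)
    # Visit the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The threshold loosens as hypotheses are rejected: alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # stop at the first non-rejection
    return reject


# Twenty uncorrected tests at the 5% level.
print(round(familywise_error_rate(0.05, 20), 3))

# Hypothetical p-values from four exploratory tests.
print(holm_reject([0.01, 0.04, 0.03, 0.005]))
```

Holm's procedure is never less powerful than the plain Bonferroni correction, which is why it is often preferred when only control of the family-wise error rate is required.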
Session 2. Hands-on session:
(create your own calculations and use them in your research texts or courses). Note: a comprehensive multimedia tutorial will be made available on CD to all participants.
create new R modules (and publish them in www.wessa.net)
create a "derived" R module based on previously stored computations
how to submit a proposal for a new R module (that is created by our team)
implement R modules in your learning environment
generate computations with R modules (insert datasets, change parameters, change settings, tricks & treats)
how do sessions work?
save results for later use (in www.freestatistics.org)
search computations and generate simple reports
reproduce & reuse computations (with or without changes to the dataset, parameters or underlying R code)
insert the saved results in your research (how to create reproducible research in LaTeX, Word, OpenOffice, Presentations, Wikis)
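The archive, reproduce, and reuse steps listed above can be pictured with a small in-memory sketch. Everything in it is hypothetical (the class, its methods, and the identifier scheme) and it does not represent the actual interface of www.freestatistics.org; it only illustrates the idea that a stored computation keeps its code reference, data, and parameters together so that it can be re-executed, with or without changes.

```python
import hashlib
import json


def trimmed_mean(data, trim=0):
    """Mean after dropping the `trim` smallest and `trim` largest values."""
    ordered = sorted(data)
    kept = ordered[trim:len(ordered) - trim]
    return sum(kept) / len(kept)


class ComputationArchive:
    """Hypothetical in-memory stand-in for an archive of computations."""

    def __init__(self):
        self._store = {}

    def archive(self, function_name, data, parameters, result):
        """Store a computation and return a short content-based identifier."""
        record = {
            "function": function_name,
            "data": data,
            "parameters": parameters,
            "result": result,
        }
        payload = json.dumps(record, sort_keys=True).encode("utf-8")
        key = hashlib.sha256(payload).hexdigest()[:12]
        self._store[key] = record
        return key

    def reproduce(self, key, functions, data=None, parameters=None):
        """Re-run a stored computation, optionally with changed inputs.

        Returns the new result and whether it matches the archived result.
        """
        record = self._store[key]
        func = functions[record["function"]]
        use_data = record["data"] if data is None else data
        use_params = record["parameters"] if parameters is None else parameters
        new_result = func(use_data, **use_params)
        return new_result, new_result == record["result"]


functions = {"trimmed_mean": trimmed_mean}
archive = ComputationArchive()

data = [1, 2, 3, 4, 100]
params = {"trim": 1}
key = archive.archive("trimmed_mean", data, params, trimmed_mean(data, **params))

# Exact reproduction: same data and parameters.
print(archive.reproduce(key, functions))
# Reuse with a changed parameter: the outlier is no longer trimmed.
print(archive.reproduce(key, functions, parameters={"trim": 0}))
```

In a document, only the identifier (or a URL containing it) would need to be cited for a reader to repeat or vary the computation.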
Session 3. Hands-on session:
(requires some basic knowledge of the R or S-plus language)
The RC package:
store, retrieve, and share image files of R sessions
archive, search, retrieve, and share R scripts
convert and publish R scripts on the internet (such that the resulting web software can be used without downloading or installing anything)
security issues, moratorium dates
search features
Data mining tools to explore the Compendium Platform's database
What if I want to set up my own server?
Technical limitations
Fraud detection/prevention [30]
How to integrate Reproducible Computing Technology in other applications? An example based on an online stock market game. During a short demonstration, participants can actively trade shares on a real trading platform and analyze the stock prices in real time.
Outlook for the future:
joint research opportunities
Reproducible Computing for scientific publishing
towards a foundation of Reproducible Computing
The following requirements and limitations apply:
Attendees should preferably have an interest in Statistics Education. Attendees who wish to participate in session 3 of the workshop should have a basic knowledge of the R (or S-plus) language.
There is a maximum of 20 participants for sessions 2 and 3. There is no limitation for the first session.
Attendees of sessions 2 and 3 are encouraged to bring their own Wi-Fi-enabled laptop to benefit from the workshop.
Every participant is required to register before 31 July 2009 on the workshop home page. The registration application will be closed when the maximum number of participants is reached.
Printed documents and CDs will be made freely available to all registered attendees. In addition, attendees will be supported if they wish to implement aspects of this workshop in their education or research (free of charge). Additional copies (of printed materials or CDs) will be available for purchase.
The use of the Compendium Platform is free of charge for non-commercial purposes. Some restrictions might apply when using the Reproducible Computing infrastructure with (very) large student groups.
This research is funded by the OOF/13 2007 grant and supported by the K.U.Leuven Association.
[1]. de Leeuw J., “Reproducible research: the bottom line,” in Department of Statistics Papers, 2001031101, Department of Statistics, UCLA, 2001
[2]. Conole, G., Dyke, M., Oliver, M., and Seale, J.: Mapping pedagogy and tools for effective learning design, Computers & Education 43, 2004
[3]. Donoho D. L. and X. Huo, “Beamlab and reproducible research,” International Journal of Wavelets, Multiresolution and Information Processing, 2004
[4]. Eggen P. and D. Kauchak, Educational Psychology: Windows on Classrooms. Upper Saddle River, NJ: Prentice Hall, 5th ed., 2001
[5]. Gentleman R., “Applying reproducible research in scientific discovery,” BioSilico, 2005
[6]. Green P. J., “Diversities of gifts, but the same spirit,” The Statistician, pp. 423–438, 2003
[7]. Koenker R. and A. Zeileis, “Reproducible econometric research (a critical review of the state of the art),” in Research Report Series, no. 60, Department of Statistics and Mathematics Wirtschaftsuniversität Wien, 2007
[8]. Leisch F., “Sweave and beyond: Computations on text documents,” in Proceedings of the 3rd International Workshop on Distributed Statistical Computing, (Vienna, Austria), 2003
[9]. Milis, K., Wessa, P., Poelmans, S., Doom, C., and Bloemen, E.: The Impact of Gender on the Acceptance of Virtual Learning Environments, Proceedings of the International Conference of Education, Research and Innovation, International Association of Technology, Education and Development, 2008
[10]. Miller, J. B.: Examining the interplay between constructivism and different learning styles, www.stat.auckland.ac.nz/~iase/publications/1/8a4_mill.pdf, 2005
[11]. Moodle: A Free, Open Source Course Management System for Online Learning, http://www.moodle.org, 2008
[12]. Moreno L., C. Gonzalez, I. Castilla, E. Gonzalez, and J. Sigut, “Applying a constructivist and collaborative methodological approach in engineering education,” Computers & Education, vol. 49, pp. 891–915, 2007
[13]. Mvududu, Nyaradzo: A Cross-Cultural Study of the Connection Between Students' Attitudes Toward Statistics and the Use of Constructivist Strategies in the Course, Journal of Statistics Education 11(3), 2003
[14]. Peng R. D., F. Dominici, and S. L. Zeger, “Reproducible epidemiologic research,” American Journal of Epidemiology, 2006
[15]. Poelmans, S., Wessa, P., Milis, K., Bloemen, E., and Doom, C.: Usability and Acceptance of E-Learning in Statistics Education, based on the Compendium Platform, Proceedings of the International Conference of Education, Research and Innovation, International Association of Technology, Education and Development, 2008
[16]. R Development Core Team, R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2008. ISBN 3-900051-07-0
[17]. Schwab M., N. Karrenbach, and J. Claerbout, “Making scientific computations reproducible,” Computing in Science & Engineering, vol. 2, no. 6, pp. 61–67, 2000
[18]. Smith E., “Social constructivism, individual constructivism and the role of computers in mathematics education,” Journal of mathematical behavior, vol. 17, no. 4, 1999
[19]. Von Glasersfeld E., “Learning as a constructive activity,” in Problems of Representation in the Teaching and Learning of Mathematics, pp. 3–17, Hillsdale, NJ: Lawrence Erlbaum Associates, 1987
[20]. Wessa, P., Learning Attitudes, Peer Assessment, and Gender in the context of a Social Constructionist Statistics Course, Applied Statistics Conference, 2007
[21]. Wessa P., “Learning statistics based on the compendium and reproducible computing,” in Proceedings of the World Congress on Engineering and Computer Science (International Conference on Education and Information Technology), UC Berkeley, San Francisco, USA, 2008
[22]. Wessa P. and E. van Stee, Statistical Computations Archive (online software at http://www.freestatistics.org). K.U.Leuven Association, Belgium, 2008
[23]. Wessa P., “How reproducible research leads to non-rote learning within a socially constructivist e-learning environment,” in Proceedings of the 7th European Conference on e-Learning, (Cyprus), 2008
[24]. Wessa P., Free Statistics Software (online software at http://www.wessa.net). Office for Research Development and Education, 1.1.23-r2 ed., 2008
[25]. Wessa P., “A framework for statistical software development, maintenance, and publishing within an open-access business model,” Computational Statistics, 2008
[26]. Wessa P., “Measurement and control of statistics learning processes based on constructivist feedback and reproducible computing,” in Proceedings of the 3rd International Conference on Virtual Learning, (Constanta, Romania), 2008
[27]. Wessa, P.: Assessment of Reproducible Computing as an E-Learning Tool in Statistics Education, Proceedings of the World Conference on E-Learning in Corporate, Government, Healthcare, & Higher Education, 2008
[28]. Wessa, P.: Discovering Computer-Assisted Learning Processes based on Objective Exam Score Transformations, Proceedings of the World Congress on Educational Sciences, 2009
[29]. Wessa, P.: Designing Statistical Learning Environments with Educational Compendium Technology, Proceedings of Computer-Assisted Learning (CAL'09), 2009
[30]. Wessa, P., and Baesens B.: Fraud Detection in Statistics Education based on the Compendium Platform and Reproducible Computing, IEEE Proceedings of the World Congress on Computer Science and Information Engineering (CSIE), 2009
[31]. Wessa, P., and Baesens, B.: Explorative Data Mining of Constructivist Learning Experiences and Activities with Multiple Dimensions, Proceedings of the International Conference on Computer and Instructional Technologies, World Academy of Science, Engineering and Technology, 2009
[32]. Wessa, P.: Reproducible Computing: a new Technology for Statistics Education and Educational Research, IAENG Transactions on Engineering Technologies, American Institute of Physics, Eds: Rieger, Burghard, Amouzegar, Mahyar A., and Ao, Sio-Iong, *forthcoming*, 2009
[33]. Wessa, P.: How Reproducible Computing Leads to Non-Rote Learning Within Socially Constructivist Statistics Education, Electronic Journal of e-Learning 6, *forthcoming*, 2009
[34]. Wessa, P.: Quality Control of Statistical Learning Environments and Prediction of Learning Outcomes through Reproducible Computing, International Journal of Computers, Communications & Control 4(2), 2009
Footnote 1. The purpose of this project is to facilitate the creation, maintenance, and permanent storage of statistical computation objects that empower authors to publish reproducible and reusable research (Compendium) through a series of web services. A Compendium is defined as any document that contains references (URLs) to permanently stored objects that can be retrieved, recomputed, and reused in real time without the need to download or install anything on the client machine. The underlying philosophy is that referencing stored computations allows authors to create reproducible and reusable research. In addition, this mechanism effectively facilitates peer review and collaboration among students and scientists. The use of this system is free of charge for educational and research purposes.
Footnote 2. There are several fundamental problems with statistical software development in the academic community. In addition, the development and dissemination of academic software will become increasingly difficult due to a variety of reasons. To solve these problems, a new framework for statistical software development, maintenance, and publishing was developed: it is based on the paradigm that academic and commercial software should be both cost-effectively created/maintained and published with Marketing Principles in mind. The framework has been seamlessly integrated into a highly successful website (www.wessa.net) that operates as a provider of free web-based statistical software.