Coursera – an online education startup – has rapidly expanded its curriculum of statistics and data analysis courses. Today, there are already 33 modules directly linked to the field, excluding the courses where statistics and data science are solely used as a supportive tool (e.g. finance). These courses make use of multiple statistical software packages like Python, MATLAB and of course R.
I decided to make a list of all Coursera courses that use R, either as their first choice or as one of several statistical software packages that students may use for the homework assignments. Coursera does not publish full data on how many students enroll in its courses, but at least some courses reach well over a hundred thousand students each year.
To give some indication of their popularity, I list below all courses using R, ranked by the number of Facebook likes:
Given the unwillingness of Coursera’s search function, I had to manually draft the list above. Therefore, it is possible I overlooked some of the courses. Feel free to mention them in the comment section, and I will make sure to update the list. In case you are interested in taking (or teaching) interactive data analysis courses, make sure to have a look at our own educational startup DataMind.
While I expect that most of you are familiar with Coursera, here is a quick summary for those who are not: Coursera is one of the leading providers of Massive Open Online Courses (MOOCs). Today they have more than 100 institutional partners offering 500+ courses to over 5 million students worldwide. So despite being criticized by some, it is becoming more and more clear that they are here to stay.
www.R-fiddle.org is an early stage beta that provides you with a free and powerful environment to write, run and share R code right inside your browser. It even offers the option to include packages. Over the past few days it has been gaining more and more traction, and it was mentioned on the front page of Hacker News.
We designed it for those situations where you have code that you need to prototype quickly and then possibly share with others for feedback. All this without needing a user account, or any scrap projects or files! We even included a very easy-to-use ‘embed’ function for blogs and websites, so your visitors can edit and run R code on your own website or blog. This is the first version of R-fiddle, so do not hesitate to give us feedback.
Working together with the help of R-fiddle
You can use R-fiddle to share code snippets with colleagues when tossing around ideas, to track down that annoying bug, or to make your own variations on other people’s code. It’s easy: just go to http://www.R-fiddle.org, type your code, and get your public URL by pressing ‘share’. This is a lot easier for your potential troubleshooter or colleague, since (s)he can immediately run and check the code, save it once finished and share it again. So by sharing your R code through R-fiddle, you not only help others to better understand your code, but they can also help you!
Embedding an R-fiddle in your blog or website
Embedding the interactive code of your fiddle on a website or blog is easy. R-fiddle automatically generates a piece of code that you can then simply paste in your HTML at the desired place.
You can choose between two ways to embed the code: with or without the console. If you embed a fiddle with the console, your visitors can edit and run your code within the environment of your own site. If you embed a fiddle without the console, your visitors can see the code with a link to the r-fiddle website where they can edit and run it. For more information on how to embed interactive code, just check the documentation at http://www.r-fiddle.org/#/help
The R-fiddle working environment
Working with R-fiddle is very straightforward. The page consists of two sections. The main section of the site (on the left) is divided into two areas: the editor and the console. The editor is where you put your code. Both work just like the standard editor and console you are familiar with from your IDE; for example, the editor colour-codes the syntax. The right pane is the discussion area. Here others can comment on your code, make suggestions, or ask questions. You can immediately see the comments others made, making collaboration easy.
The R-fiddle buttons
The R-fiddle interface provides plenty of features to assist in your development. The buttons at the top of the page include:
Save: By clicking save you activate the Embed and Share buttons. You always have to click save first, that’s when R-fiddle knows things are getting serious.
Embed: This allows you to embed your code on your website and blog with the help of an iframe.
Share: This allows you to share code from the R-fiddle page with other users. You can share it through a web link, Facebook and Twitter. These users can then provide feedback or even adapt/fix your code within their own browser.
Run: Executes the code entered in the editor, and displays the results in the console area.
Graph: Here you can find the graphs that are possibly created by your code.
With this quick tour on R-fiddle, we hope to have given you a better understanding of what it provides and why you should use it. Please be aware that R-fiddle is a hosted application in beta, so performance can degrade during peak usage. As R-fiddle usage increases, we will add more servers to it asap. Check out www.R-fiddle.org today, and you will discover its power!
Last week, we launched the early stage beta version of our interactive online learning platform for R: DataMind.org. The development of this educational platform required the creation of a new IT infrastructure able to run R in the cloud. In this post, we share our approach and insights on the design of such an application and hope it can provide inspiration for the development of other web-based interfaces to R. Just for the record, the architecture we developed for DataMind.org is intended for relatively complex operations. If you want to run R in the cloud for simpler use cases, we recommend having a look at the powerful shiny package or the opencpu project (or, for integration in business processes, at the R Service Bus). These approaches are most useful for those interested in building web apps on top of R for specific use cases. If you want to do data analysis yourself while running R in the cloud with e.g. RStudio server, have a look at this great post from Tal Galili.
The DataMind platform consists of two parts: a front-end application, which our users see on their pc, laptop or tablet and that runs in their browser; and a collection of backend applications that handle all the interaction between users and the platform (see Figure below).
The back end falls into two parts that are both hosted in the cloud: a cluster of R servers and the DataMind web application. The former is hosted on Amazon EC2, a very popular pay-as-you-go virtualization platform allowing easy and cheap scaling of the R computation capacity. The DataMind web application itself is written in Ruby on Rails, an agile web framework that runs on the Ruby programming language, and is hosted on the flexible and well-known cloud application provider Heroku. The Ruby on Rails application handles user accounts, the management of the exercises’ data, the ability to create new exercises, etc. In our experience, Ruby on Rails is one of the most productive ways to build web applications. To manage the connection between the DataMind web application and the R servers we use the Rserve package on the R servers. This package makes an R installation remotely accessible. Finally, we ensure security with the help of the nice RAppArmor package of Jeroen Ooms.
As mentioned in the beginning, the platform is still in heavy development and we’re experimenting with new and different options on a daily basis. If you have any feedback or suggestions on how we can improve this educational platform, please do not hesitate to share them with us: email@example.com.
DataMind is the first free interactive online learning platform for R. Through an in-browser coding environment we offer exercise-based learning-by-doing. Our goal is to build a fun learning experience for data analysis and R, while allowing anyone to create courses! You can check out an early stage beta version at www.DataMind.org !
With DataMind, we focus on three things: (1) make the educational experience interactive and fun for students, (2) make the platform and the content available for free, and (3) stimulate content creation by the community (you! Drop us a line if you are interested in creating courses; the course creation interface is a work in progress). Our focus on interactivity and fun is driven by our belief that you learn data analytics by doing! We do not believe in copying the classroom online. That is why all our courses are constructed around an in-browser coding interface, allowing users to start coding R from day one with the help of instant feedback. Over time, challenges and competitions will be added to courses as well, so users can also interact with each other.
We were inspired to start this project by innovative start-ups that offer interactive web development courses. These start-ups put a focus on learning-by-doing through in-browser coding, elements of gamification, and community-provided content. It turned out this approach was a huge hit, but we got frustrated that it didn’t exist for R and data analysis. Having experience in teaching statistics, we were convinced data analytics education could greatly benefit from such a didactic approach that focuses on learning-by-doing. Next, the data science industry itself is experiencing a huge increase in popularity. And last but not least, we strongly believe data analytics and its visualisation need a somewhat tailored learning approach compared to web development.
So we started coding!
We are developing DataMind in such a way that it supports, and even stimulates, content creation by the community. The key success factor of an online learning platform is the strength of the available content. Today, R is used in many domains that are often relatively unrelated (e.g. finance and biostatistics). With community content generation, experts from these diverse fields can create and share interactive content much faster and of much higher quality than we could ever do ourselves. For you as a course creator, it’s a scalable way to spread knowledge, build reputation and provide a fun learning experience to your students. In other words, we need you!
Where do we stand today? At www.DataMind.org you can check out an early stage beta version of the platform and enroll in our first course ‘Summer of R’. ‘Summer of R’ is aimed at those new to R who want to master the basics so they can start doing their own analysis. Furthermore, we’re working very hard on the course creation interface so everyone can start creating interactive courses soon.
If you feel enthusiastic about this project and want to create interactive courses, whether for academic purposes, professional reasons or just for fun, or if you have suggestions, feedback or questions, do not hesitate to send an e-mail to firstname.lastname@example.org. (We would love feedback!)
Last week, I was working on an educational R project when I needed to consult the help files of different R packages and functions online. After some Google searches, it became clear that finding an easy-to-use tool was not as simple as I had expected. The closest I got were the websites Inside-R and R search, but as a user the experience wasn’t as “smooth” as what I was looking for. (I needed something really user-friendly for this educational project.) Therefore, inspired by the documentation websites of programming languages/frameworks such as Ruby on Rails and AngularJS, I decided to build an online documentation search interface for R myself, together with colleagues. Check the result on www.Rdocumentation.org!
Checking R documentation online instead of with the built-in R help function can often provide some extra benefits. First, you are able to search through the latest version of all R packages, even those that are not installed on your device. This makes it not only a help tool, but also a tool for discovery. Second, I added the discussion system ‘Disqus’. For every function and package, Disqus allows users to ask questions, add extra examples to the documentation, etc. Furthermore, today’s web development tools allow you to build a more user-friendly interface. Especially for an R beginner this can be helpful. And last but not least, since R is a “one letter word”, googling for “R” + “something” is always a challenge. Having all the documentation in one place can at least eliminate that frustration.
I wrote the code for www.Rdocumentation.org together with some colleagues. It is quite dirty code since it only needed to get the job done, but those interested can just send me a request. Also, while coding, we discovered the great staticdocs package of Hadley Wickham. It was not exactly what we needed, but maybe it can be used for other, similar initiatives. For all packages on CRAN, the help files were generated in HTML. Next, these HTML files were parsed and inserted into an SQL database. We opted for Ruby on Rails to build the web app that serves all the documentation on R packages and functions. Finally, using jQuery and Twitter Bootstrap, we built the instant search tool that allows you to see all R packages and R functions immediately while typing.
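For readers who want a feel for the "generate HTML, parse it, store it" steps, here is a minimal sketch in R. It is not the code behind Rdocumentation.org: the help file `toy_mean`, the package name `toypkg` and the column layout are all invented for illustration, and a plain data frame stands in for the SQL table. It only relies on the `tools` package that ships with R.

```r
library(tools)  # ships with R; provides parse_Rd() and Rd2HTML()

# A tiny, made-up help file so the sketch is self-contained (hypothetical content).
rd_text <- c(
  "\\name{toy_mean}",
  "\\alias{toy_mean}",
  "\\title{Toy Mean}",
  "\\description{Computes the mean of a toy vector.}",
  "\\usage{toy_mean(x)}",
  "\\arguments{\\item{x}{A numeric vector.}}"
)
rd_file <- tempfile(fileext = ".Rd")
writeLines(rd_text, rd_file)

# Step 1: generate the help page in HTML.
html_file <- tempfile(fileext = ".html")
Rd2HTML(parse_Rd(rd_file), out = html_file)

# Step 2: parse the HTML; here only the <title> element is extracted.
html <- readLines(html_file, warn = FALSE)
title_line <- grep("<title>", html, fixed = TRUE, value = TRUE)[1]
title <- sub(".*<title>(.*)</title>.*", "\\1", title_line)

# Step 3: store one row per help page.
# Assumption: a data frame stands in for the SQL table used on the real site.
docs <- data.frame(
  package = "toypkg",
  topic = "toy_mean",
  title = title,
  stringsAsFactors = FALSE
)
print(docs)
```

In a real pipeline the last step would write to a database instead, and the parsing step would pull out more than the title (usage, arguments, examples), but the overall shape stays the same.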
In this post, we briefly summarize and discuss the results of our survey on “R and education”. Before diving into the figures, we would like to express our sincere gratitude and appreciation to the 286 R enthusiasts that invested their valuable time to fill out this survey. Furthermore, you can download the complete dataset of the survey or browse an overview of all questions (see bottom of the post for more information), so feel free to do your own analysis, and share it. Note that the right panel of this page provides the answers to some open-ended questions in the survey.
Interestingly, respondents came from diverse backgrounds, both geographically as well as in terms of occupation. The left panel of Figure 1 illustrates respondents are mainly active as academics (50.5%), followed by professionals (30%) and students (19.5%). Academics from about 80 different universities, mainly located in the US and Europe, participated. About 24 respondents were R package authors.
The online survey was distributed through the R mailing lists and our personal contacts. Figure 1 demonstrates the geographical origin of the respondents. Individuals from all 4 continents participated, with the majority based in the US. Although there is selection bias when conducting an online survey in this way, we believe the current diversity of respondents is interesting and adds some flavor to the results.
Next, we first discuss the main takeaways regarding the respondents’ views on R in general. A more focused section follows on R and education. Finally, we discuss the next steps we want to take based on this survey’s results.
Why you love R and expect its market share to go up
Respondents (from the group “professionals that use R”) are very optimistic when asked about the future spread of R in the world, as illustrated in Figure 2. An impressive 79.7% of respondents expect future usage to go up in comparison to other statistical packages such as SAS and SPSS, only 11.9% expect it will remain stable, and just 3.4% of the respondents take a pessimistic view, expecting it will go down.
Figure 3 shows that respondents (from the group “professionals that use R”) mainly love R because of its functionality (86.2%) and the community (65.5%). Other reasons to love R cited under “other” include “many packages”, “cross platform” and “wonderful for graphics”. All that glitters is not gold though. When asked about their biggest frustration when using R, only 19% answer “Nothing, R is perfect”. The biggest frustrations reported by respondents are “the lack of documentation” (29.3%) and “the lack of consistency” (22.4%). A large number of respondents (34.5%) provided an open-ended response to this question as well. We listed those open-ended responses in the right panel of this page, together with the open-ended responses on what respondents consider the main disadvantages of R.
Major interest in online learning and teaching R
“R best matches the concept of ‘computational thinking’, a core idea that my students need”
Whether you are completely new to R, or you are a veteran with multiple years of experience, there is always room to learn and improve. As illustrated in Figure 4, one of the main ways to develop new R skills is through online resources such as websites and online communities. This is true for both academics (92.4%) and professionals (94.9%). The second most cited educational source is the built-in R help feature, mentioned by 77.2% of the academics and 83.1% of the professionals. Textbooks, which can be seen as a more traditional way to learn and teach, are placed third.
Today, numerous online courses on statistics are already making use of the R language to explain data analytics concepts. Some of the most noteworthy and successful examples are the Coursera courses from Roger D. Peng (Computing for Data Analysis) and Eric Zivot (Introduction to Computational Finance and Financial Econometrics). This proven need for online educational sources for statistics and R raises the question of whether it would be possible to identify different and even more engaging ways to learn R online. The ‘R in Education’ survey indicates over 75% of students are interested in taking online courses with an interactive component. Of the academic respondents, 68.6% show interest in online interactive courses and 13% would be willing to pay for these courses (see Figure 5). Our survey results are thus in line with the observation that online interactive courses as offered by codecademy.com, codeschool.com, etc. have gained enormous popularity recently.
Naturally, in open-source communities most things are developed and offered for free. As noted in the previous paragraph, interactive online courses would be a valuable addition to the current spectrum of R’s educational sources. Since our results indicate that demand for free courses would be high, the question arises: who will develop these free courses? A reasonable place to look is among people already developing free software, such as the R package authors. Indeed, 70% of R package authors in the survey indicated that, provided an easy-to-use development platform exists, they would be willing to create such interactive learning tools for their packages for free (note that the sample is small though). Therefore, it might be interesting to develop and eventually provide such a platform as a way to spread data analytics knowledge in general, and the R statistical programming language in particular.
New educational tools to teach R and statistics?
This survey largely confirmed our belief that there is a need for more online educational tools to teach R. These tools should take into account the added value of an interactive approach, as well as the characteristics and benefits of an open-source community. Therefore, we started working on an open interactive exercise platform for statistics and R.
To receive updates on our future progress, or if you are willing to provide us with feedback while building this learning platform, please leave your e-mail address below.
Download the full dataset of the survey here. The dataset is structured as follows: qla is a list in which each list-item contains the information of exactly one question in the survey. Each list-item in qla is itself again a list with the following items:
First list-item: The question asked
Second list-item: The answer possibilities
Third list-item: The data with the answers. Rows for respondents, columns for answers.
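As a rough illustration of how this structure can be navigated, the sketch below builds a toy stand-in for `qla` and reads out its three items. The question text, the answer options and the responses are invented, and the 0/1 coding of the answer matrix is an assumption; only the three-item layout comes from the description above.

```r
# Toy stand-in for the survey object. All values are made up for illustration,
# and the 0/1 coding (1 = answer selected) is an assumption.
qla <- list(
  list(
    "What is your occupation?",                    # first item: the question asked
    c("Student", "Academic", "Professional"),      # second item: the answer possibilities
    matrix(c(1, 0, 0,
             0, 1, 0,
             0, 1, 0,
             0, 0, 1),
           ncol = 3, byrow = TRUE)                 # third item: rows = respondents, columns = answers
  )
)

# Pick one question and unpack its three list-items.
q <- qla[[1]]
question <- q[[1]]
choices  <- q[[2]]
answers  <- q[[3]]

# Percentage of respondents selecting each answer.
shares <- round(100 * colMeans(answers), 1)
names(shares) <- choices

print(question)
print(shares)
```

Looping over `qla` with `lapply()` in the same way gives a quick overview of every question in the survey.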
NOTE: For privacy reasons we removed all information from the dataset that could result in identification of the respondents (e.g. emails, university affiliation, etc.). Please contact us in case we overlooked something.
We would like to offer our apologies for the following errors that ended up in the survey:
If you selected that R is more complex to learn than other statistical languages, one of the subsequent questions stated that you had indicated R was less complex to learn.
In order to better target the questions and to avoid making the survey even longer, we opted to mostly ask different questions to each type of respondent (Students/Academics/Professionals). Therefore, it is often not possible to compare the different types of respondents, which is a pity in hindsight.