RapidMiner R Extension


The RapidMiner R extension integrates RapidMiner with the widely used open source statistics package R. Besides providing new RapidMiner operators that execute R functions, it comes with a convenient R console that can also be used to generate R plots.

RapidMiner ExampleSets can be directly used as input to R operators, and are internally converted to the table representation of R. Furthermore, the user can execute arbitrary R scripts as RapidMiner operators. To save the user from defining frequently used scripts over and over again for each process, they can define such re-usable scripts as custom RapidMiner operators.

For the analysis of bio-data, the R Bioconductor packages are particularly relevant. These also work with this extension, and an example process has been posted to myExperiment.

Some pre-defined R functions can be used inside RapidMiner just like any other RapidMiner operator. For a list of these operators, see the tab labelled "Operators".

In order to include your own R scripts in RapidMiner, you need to use the Execute Script (R) operator. This operator has three parameters: script, inputs, and results. By clicking on the script button, you can enter the R script that will be executed by this operator. But how can you connect the script to your RapidMiner objects?

RapidMiner objects used by the script are delivered to the operator through its input ports. These objects will be converted to R objects where possible. For example, example sets are converted to R's data frames. The variable names under which these objects are made available to the script can be defined by clicking on the inputs button. After the script has finished, R objects can be retrieved from it, converted back to RapidMiner objects, and assigned to the operator's output ports. Again, the names of the output variables can be defined by clicking on the results button. In addition, you can specify how the conversion is done. Data frames can be converted back to RapidMiner's example sets. Alternatively, they can be treated as generic R objects. This means that RapidMiner does not interpret or visualize them in any way. However, you can still feed these objects into subsequent operators.
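
As an illustration of this data flow, the following is a minimal sketch of a script that could be entered via the script button. The variable names input_data and result are hypothetical: they stand for whatever names were defined via the inputs and results buttons. The fallback to R's built-in iris data set is only there so that the sketch also runs in a plain R console, outside of RapidMiner.

```r
# Hypothetical input name: inside RapidMiner, the operator is assumed to
# provide a data frame under the name configured via the inputs button.
# The fallback below only serves standalone testing in an R console.
if (!exists("input_data")) {
  input_data <- iris
}

# Standardize every numeric column (mean 0, standard deviation 1)
# and leave all other columns, such as nominal attributes, untouched.
numeric_cols <- sapply(input_data, is.numeric)
result <- input_data
result[numeric_cols] <- lapply(
  input_data[numeric_cols],
  function(x) (x - mean(x)) / sd(x)
)

# Hypothetical output name: 'result' would be listed via the results button
# and converted back to an example set or kept as a generic R object.
```

Because result is an ordinary data frame, it falls under the conversion described above and could either be turned back into an example set or be passed on as a generic R object.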

Arbitrary R scripts can be executed from within RapidMiner processes. In addition, several R modelling methods are directly available as RapidMiner operators, which makes them even easier to use:

  • k-Nearest Neighbor from kknn package: kknn
  • Naive Bayes from e1071 package: naiveBayes
  • Linear Discriminant Analysis from MASS package: lda
  • Quadratic Discriminant Analysis from MASS package: qda
  • Regularized Discriminant Analysis from klaR package: rda
  • Mixture Discriminant Analysis from mda package: mda
  • Logistic Regression from stats package: glm
  • Decision Tree from rpart package: rpart
  • Random Forest from randomForest package: randomForest
  • Boosting from ada package: ada
  • Boosting from adabag package: adaboost.M1
  • Gradient boosting machine from gbm package: gbm
  • Support Vector Machines from kernlab package: ksvm
  • Neural Network from nnet package: nnet
  • Simple linear regression from stats package: lm
  • Ridge regression from penalized package: penalized
  • Lasso regression from penalized package: penalized
  • K-Nearest-Neighbor regression from kknn package: kknn
  • Gradient boosting machine from gbm package: gbm
  • Gradient boosting with regression trees from mboost package: blackboost
  • Support Vector Machines from kernlab package: ksvm
  • Neural Network from nnet package: nnet
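
For orientation, this is what a direct call to one of the listed functions looks like in plain R. It is a minimal sketch that uses lda from the MASS package on R's built-in iris data set. The data set and the variable names are chosen for illustration only, and the sketch assumes that MASS is installed, which is the case in standard R distributions. It says nothing about how the corresponding RapidMiner operator calls the function internally.

```r
# MASS provides lda (Linear Discriminant Analysis).
library(MASS)

# Fit a model that predicts the species from all remaining columns.
model <- lda(Species ~ ., data = iris)

# Apply the model to the training data and compute the training accuracy.
predicted <- predict(model, iris)$class
accuracy <- mean(predicted == iris$Species)
print(accuracy)
```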

The following video demonstrates the usage of the RapidMiner R Extension:

Using the statistics package R within RapidMiner.

The R Extension can be downloaded from the Rapid-I Marketplace from within RapidMiner.

To use the R Extension of RapidMiner, you need to have R installed on your computer. You will need R version 2.11.1 or later. Since there have been some changes from R 2.11.x to 2.12.x, there are two separate guides.
If you don't have R installed, please refer to the homepage of the R Project (http://www.r-project.org) to get the latest download version. Please make sure that R is compiled for the same architecture as RapidMiner: a 32 bit RapidMiner can only access a 32 bit R, and a 64 bit RapidMiner needs a 64 bit R.
Please follow the instructions below to complete the installation on R 2.11.x:

  1. After you have installed a proper version of R, you need to open the R console to prepare R to support being accessed by RapidMiner. You have to install rJava available on CRAN by typing the following command on the console:
    install.packages("rJava")
  2. You should now type .libPaths() and note the listed directories. One of them is used to store the rJava package and you will need to remember it when RapidMiner starts. If finished successfully, R is prepared for RapidMiner.
  3. Now make sure that RapidMiner can find the program libraries of R by adding their directory to the PATH environment variable of your operating system: add <R installation directory>/bin to the PATH variable on Linux systems or <R installation directory>\bin on Windows machines. This directory must contain the dynamic link library of R, which is called libR.so, R.so or R.dll depending on your operating system, and all libraries it depends on.
  4. If you don't already have a valid JAVA_HOME environment variable pointing to the installation directory of your Java, you need to create it. If you don't have Java installed on your computer, point it to <RapidMiner installation directory>/jre. Make sure that Java is of the same architecture as R and RapidMiner, either 32 bit or 64 bit.
  5. If you don't already have a valid R_HOME environment variable pointing to the installation directory of R, you need to create it. Make sure that it points to the R directory that is actually used, so that the architecture matches.
  6. Now you can configure RapidMiner. If you click on Next, you will be asked to enter the path to the jri library that has been installed with rJava. It is located in one of the .libPaths() directories noted above, more precisely in <libpath directory>/rJava/jri/. The filename depends on your operating system: jri.dll on Windows and libjri.so or jri.so on Unix.
  7. Once you have selected the file, RapidMiner will exit, since the new environment variables are still unknown to it, and you need to restart the program manually. If you started RapidMiner from a console, please restart the console as well.
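
To find the directory asked for in step 6 without browsing every library path by hand, a sketch like the following can be run in the R console. It assumes the <libpath directory>/rJava/jri/ layout described above; the variable names are chosen for illustration only.

```r
# One candidate jri directory per library path reported by .libPaths().
candidate_dirs <- file.path(.libPaths(), "rJava", "jri")

# Keep only the candidates that actually exist on this machine.
found_dirs <- candidate_dirs[file.exists(candidate_dirs)]

if (length(found_dirs) == 0) {
  message("No rJava/jri directory found. Has install.packages(\"rJava\") completed?")
} else {
  # List the files so that jri.dll, libjri.so or jri.so can be spotted.
  print(list.files(found_dirs, full.names = TRUE))
}
```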

Please follow the instructions below to complete the installation on R 2.12.x:

  1. After you have installed a proper version of R, you need to open the R console to prepare R to support being accessed by RapidMiner. You have to install rJava available on CRAN by typing the following command on the console:
    install.packages("rJava")
  2. You should now type .libPaths() and note the listed directories. One of them is used to store the rJava package and you will need to remember it when RapidMiner starts. If finished successfully, R is prepared for RapidMiner.
  3. Now make sure that RapidMiner can find the program libraries of R by adding their directory to the PATH environment variable of your operating system: add <R installation directory>/bin/<architecture> to the PATH variable on Linux systems or <R installation directory>\bin\<architecture> on Windows machines. The architecture must match the architecture of RapidMiner. If you want to run a 32 bit RapidMiner, set the path to the i386 subdirectory, otherwise choose the x64 subdirectory. This directory must contain the dynamic link library of R, which is called libR.so, R.so or R.dll depending on your operating system, and all libraries it depends on. Make sure that you only have one version of each library in the PATH. Otherwise the first version found will be used, which might not be of the correct architecture.
  4. If you don't already have a valid JAVA_HOME environment variable pointing to the installation directory of your Java, you need to create it. If you don't have Java installed on your computer, point it to <RapidMiner installation directory>/jre. Make sure that Java is of the same architecture as R and RapidMiner, either 32 bit or 64 bit.
  5. If you don't already have a valid R_HOME environment variable pointing to the installation directory of R, you need to create it. Make sure that it points to the R directory that is actually used, so that the architecture matches.
  6. Now you can configure RapidMiner. If you click on Next, you will be asked to enter the path to the jri library that has been installed with rJava. It is located in one of the .libPaths() directories noted above, more precisely in <libpath directory>/rJava/jri/. The filename depends on your operating system: jri.dll on Windows and libjri.so or jri.so on Unix.
    On R 2.12.x there are 32 and 64 bit versions of the library present in the i386 or x64 subdirectories. Please make sure that the one in the top directory is of the correct architecture.
  7. Once you have selected the file, RapidMiner will exit, since the new environment variables are still unknown to it, and you need to restart the program manually. If you started RapidMiner from a console or another program, please restart the console or the program as well.
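
Since most problems with these steps come from mismatched architectures or missing environment variables, it can help to let R report what it sees. The following is a minimal sketch to be run in the R console of the installation that RapidMiner is supposed to use; the variable names are chosen for illustration only, and the values still have to be compared by hand with the architecture of RapidMiner and Java.

```r
# Installation directory of the running R, a candidate value for R_HOME.
r_home <- R.home()

# 32 or 64, derived from the pointer size of the running R process.
pointer_bits <- 8 * .Machine$sizeof.pointer

# Name of the architecture subdirectory in use; may be empty.
arch_subdir <- .Platform$r_arch

# Environment variables as seen by this R session (NA if unset).
java_home <- Sys.getenv("JAVA_HOME", unset = NA)
r_home_env <- Sys.getenv("R_HOME", unset = NA)

# Individual PATH entries, split on the platform's path separator.
path_entries <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]

cat("R installation directory:", r_home, "\n")
cat("R process is", pointer_bits, "bit; architecture subdirectory:", arch_subdir, "\n")
cat("JAVA_HOME:", java_home, "\n")
cat("R_HOME:", r_home_env, "\n")
cat("PATH entries mentioning this R installation:\n")
print(path_entries[startsWith(path_entries, r_home)])
```

Note that environment variables changed after the R console was started are only visible once the console has been restarted, in the same way as described for RapidMiner in step 7.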

If the new R Perspective has been added after the restart, everything is running fine. If not, this dialog will show up again. In that case, please check that the file is in the correct directory and is accessible by RapidMiner, that the path is set to the correct R version, and that all environment variables have been adapted correctly.
If you cannot get it to run, please refer to our community forum. You can prevent this message from popping up every time by deactivating the R Extension in the Help / Manage Extension Menu.