R Packages for Data Science – 2016 December

When I first started writing R scripts, I struggled to work out which packages I should use to make my code efficient and effective. After a couple of years of practice and searching, I found some. In this post, I share the ones I think are awesome.

The first list is from Analytics Vidhya, presented as a simple infographic:

[Image: infographics123]

Another list of recommended packages comes from Garrett Grolemund's "Quick list of useful R packages".

To load data

RODBC, RMySQL, RPostgreSQL, RSQLite – If you’d like to read in data from a database, these packages are a good place to start. Choose the package that fits your type of database.

XLConnect, xlsx – These packages help you read and write Microsoft Excel files from R. You can also just export your spreadsheets from Excel as .csv files.

foreign – Want to read a SAS data set into R? Or an SPSS data set? Foreign provides functions that help you load data files from other programs into R.

R can handle plain text files – no package required. Just use the functions read.csv, read.table, and read.fwf. If you have even more exotic data, consult the CRAN guide to data import and export.
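As a small illustration of the three base functions named above, here is a sketch that writes two throwaway files and reads them back. The file contents and column names are made up for the example.

```r
# Write a small CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "ann,90", "bob,85"), csv_path)

# read.csv assumes a header row and comma separators.
scores <- read.csv(csv_path, stringsAsFactors = FALSE)

# read.table is the general form; the header and separator are spelled out.
scores2 <- read.table(csv_path, header = TRUE, sep = ",",
                      stringsAsFactors = FALSE)

# read.fwf reads fixed-width fields: 3 characters of name, 2 of score.
fwf_path <- tempfile(fileext = ".txt")
writeLines(c("ann90", "bob85"), fwf_path)
fixed <- read.fwf(fwf_path, widths = c(3, 2),
                  col.names = c("name", "score"))

print(scores)
print(fixed)
```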

To manipulate data

dplyr – Essential shortcuts for subsetting, summarizing, rearranging, and joining together data sets. dplyr is our go-to package for fast data manipulation.

tidyr – Tools for changing the layout of your data sets. Use the gather and spread functions to convert your data into the tidy format, the layout R likes best.

stringr – Easy-to-learn tools for regular expressions and character strings.

lubridate – Tools that make working with dates and times easier.
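To give a feel for how these four packages fit together, here is a minimal sketch on an invented `sales` data frame (the data and column names are assumptions for illustration). It uses the `gather` and `spread` functions mentioned above; newer tidyr releases also offer other reshaping functions, but these two still work.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(lubridate)

# Hypothetical wide-format data: one row per store, one column per quarter.
sales <- data.frame(
  store = c("north", "south"),
  q1 = c(10, 20),
  q2 = c(15, 25),
  stringsAsFactors = FALSE
)

# tidyr: gather() stacks the quarter columns into key/value pairs (long format).
long <- gather(sales, key = "quarter", value = "units", q1, q2)

# dplyr: total units per store.
totals <- long %>%
  group_by(store) %>%
  summarise(total = sum(units))

# tidyr: spread() goes back from long to wide.
wide <- spread(long, key = quarter, value = units)

# stringr: pull the quarter number out of labels like "q1".
long$quarter_num <- as.integer(str_extract(long$quarter, "[0-9]+"))

# lubridate: parse a date string and read off its month.
start_month <- month(ymd("2016-12-01"))
```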

To visualize data

ggplot2 – R’s famous package for making beautiful graphics. ggplot2 lets you use the grammar of graphics to build layered, customizable plots.

ggvis – Interactive, web based graphics built with the grammar of graphics.

rgl – Interactive 3D visualizations with R

htmlwidgets – A fast way to build interactive (JavaScript-based) visualizations with R. Several packages implement htmlwidgets, for example d3heatmap and plotly, which both appear in my own list further down.

googleVis – Lets you use Google Chart Tools to visualize data in R. The motion chart in Google Chart Tools grew out of Gapminder’s Trendalyzer, the graphing software Hans Rosling made famous in his TED talk.
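As a rough illustration of the layered approach that ggplot2 is known for, the sketch below builds a plot from the built-in `mtcars` data by adding one layer at a time. The choice of variables and labels is only an example.

```r
library(ggplot2)

# Start with data and aesthetic mappings, then add layers with `+`.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = factor(cyl))) +      # layer 1: raw points
  geom_smooth(method = "lm", se = FALSE) +     # layer 2: linear trend line
  labs(x = "Weight (1000 lbs)",
       y = "Miles per gallon",
       colour = "Cylinders")

# Printing the object draws the plot on the active graphics device.
print(p)
```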

To model data

car – car’s Anova function is popular for making type II and type III Anova tables.

mgcv – Generalized Additive Models

lme4/nlme – Linear and Non-linear mixed effects models

randomForest – Random forest methods from machine learning

multcomp – Tools for multiple comparison testing

vcd – Visualization tools and tests for categorical data

glmnet – Lasso and elastic-net regression methods with cross validation

survival – Tools for survival analysis

caret – Tools for training regression and classification models
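For example, the Anova function from car mentioned at the top of this section can be applied to an ordinary `lm` fit. This is a minimal sketch on the built-in `mtcars` data; the model formula is an arbitrary choice for illustration.

```r
library(car)

# Fit a linear model with one numeric and one categorical predictor.
fit <- lm(mpg ~ wt + factor(cyl), data = mtcars)

# Type II Anova table: each term is tested after the other main effects.
tab <- Anova(fit, type = 2)
print(tab)
```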

To report results

shiny – Easily make interactive web apps with R. A perfect way to explore data and share findings with non-programmers.

R Markdown – The perfect workflow for reproducible reporting. Write R code in your markdown reports. When you run render, R Markdown will replace the code with its results and then export your report as an HTML, PDF, or MS Word document, or as an HTML or PDF slideshow. The result? Automated reporting. R Markdown is integrated straight into RStudio.

xtable – The xtable function takes an R object (like a data frame) and returns the LaTeX or HTML code you need to paste a pretty version of the object into your documents. Copy and paste, or pair up with R Markdown.
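Here is a short sketch of the xtable workflow described above, using the first rows of the built-in `mtcars` data as the object to export. Setting `print.results = FALSE` makes `print` return the markup as a string instead of writing it to the console.

```r
library(xtable)

# Convert a small data frame into an xtable object.
tab <- xtable(head(mtcars[, c("mpg", "cyl", "wt")], 3))

# Render the same table as HTML and as LaTeX, capturing the markup as strings.
html  <- print(tab, type = "html",  print.results = FALSE)
latex <- print(tab, type = "latex", print.results = FALSE)

cat(html)
```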

For Spatial data

sp, maptools – Tools for loading and using spatial data including shapefiles.

maps – Easy to use map polygons for plots.

ggmap – Download street maps straight from Google Maps and use them as a background in your ggplots.

For Time Series and Financial data

zoo – Provides the most popular format for saving time series objects in R.

xts – Very flexible tools for manipulating time series data sets.

quantmod – Tools for downloading financial data, plotting common charts, and doing technical analysis.
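To show what a zoo or xts object looks like in practice, here is a minimal sketch with three made-up daily values. It builds a zoo series, takes a rolling mean, and then uses the date-range subsetting that xts provides.

```r
library(zoo)
library(xts)

# A zoo series is a vector of values plus an ordered index (dates here).
dates <- as.Date("2016-12-01") + 0:2
z <- zoo(c(1, 2, 3), order.by = dates)

# Rolling mean over a window of 2 observations.
rm2 <- rollmean(z, k = 2)

# Convert to xts and subset with an ISO 8601 style range string.
x <- as.xts(z)
from_second <- x["2016-12-02/"]

print(rm2)
print(from_second)
```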

To write high performance R code

Rcpp – Write R functions that call C++ code for lightning fast speed.

data.table – An alternative way to organize data sets for very, very fast operations. Useful for big data.

parallel – Use parallel processing in R to speed up your code or to crunch large data sets.
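The sketch below shows two of these in action: a grouped summary in data.table syntax and a tiny parallel map with the parallel package. The worker count of 2 and the toy squaring function are arbitrary choices for the example. Rcpp is left out because it also needs a C++ compiler on the machine.

```r
library(data.table)
library(parallel)

# data.table: dt[i, j, by] does filtering, computing, and grouping in one call.
dt <- as.data.table(mtcars)
by_cyl <- dt[, .(mean_mpg = mean(mpg), n = .N), by = cyl]

# parallel: start two worker processes, map a function over the inputs, clean up.
cl <- makeCluster(2)
squares <- parLapply(cl, 1:4, function(i) i^2)
stopCluster(cl)

print(by_cyl)
print(unlist(squares))
```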

To work with the web

XML – Read and create XML documents with R

jsonlite – Read and create JSON data tables with R

httr – A set of useful tools for working with http connections
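As an example of the JSON round trip with jsonlite, this sketch parses a small hand-written JSON array into a data frame and serializes it back. The JSON content is invented for illustration.

```r
library(jsonlite)

# A JSON array of objects maps naturally onto a data frame.
json_text <- '[{"name":"ann","score":90},{"name":"bob","score":85}]'
people <- fromJSON(json_text)

# Going the other way: serialize the data frame back to JSON.
json_out <- toJSON(people, pretty = TRUE)

print(people)
print(json_out)
```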

To write your own R packages

devtools – An essential suite of tools for turning your code into an R package.

testthat – testthat provides an easy way to write unit tests for your code projects.

roxygen2 – A quick way to document your R packages. roxygen2 turns inline code comments into documentation pages and builds a package namespace.
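A minimal sketch of what roxygen2 comments and a testthat test look like side by side. `add_one` is a hypothetical function made up for this example. In a real package the function would live under `R/` and the test under `tests/testthat/`.

```r
library(testthat)

#' Add one to a number
#'
#' @param x A numeric vector.
#' @return `x` with 1 added to every element.
#' @export
add_one <- function(x) {
  x + 1
}

# A unit test: each expectation must hold for the test to pass.
test_that("add_one increments its input", {
  expect_equal(add_one(1), 2)
  expect_equal(add_one(c(1, 2)), c(2, 3))
})
```

The `#'` lines are the inline comments that roxygen2 turns into a documentation page, and the `@export` tag is what adds the function to the package namespace.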

And here is my own list:

Data Type Processing

  • dplyr  – Next generation tools for working with data frames
  • data.table  – R’s data.table package extends data.frame
  • jsonlite – A Robust, High Performance JSON Parser and Generator for R
  • stringr  – A fresh approach to string manipulation in R
  • fuzzyjoin – Join tables together on inexact matching

Database Connection

  • RODBC – An ODBC database interface
  • DBI – A database interface (DBI) for communication between R and RDBMSs
  • RMySQL – R interface to MySQL and MariaDB
  • ROracle – R interface to Oracle
  • RPostgreSQL – R interface to PostgreSQL
  • rmongodb – R driver for MongoDB
  • rredis – R driver for Redis
  • RCassandra – R direct interface (not Java) to Apache Cassandra
  • RHive – An R extension facilitating distributed computing via Apache Hive
  • RNeo4j – Neo4j Driver for R
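Most of the relational drivers above plug into DBI, so the calling code looks much the same whichever database sits behind it. Here is a minimal sketch that uses an in-memory SQLite database through the RSQLite driver (a driver from the earlier list, chosen only because it needs no server); the table name `cars` is an arbitrary choice.

```r
library(DBI)

# Connect to a throwaway in-memory SQLite database.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Copy a data frame into a table, then query it with plain SQL.
dbWriteTable(con, "cars", mtcars)
heavy <- dbGetQuery(
  con,
  "SELECT cyl, COUNT(*) AS n FROM cars WHERE wt > 3 GROUP BY cyl"
)

# Always release the connection when done.
dbDisconnect(con)

print(heavy)
```

Swapping in another DBI-compliant driver should mostly be a matter of changing the `dbConnect` call.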

Visualization

  • ggplot2  – An implementation of the Grammar of Graphics in R
  • ggalt – Extra Coordinate, Geoms, Statistical Transformations & Scales for ‘ggplot2’
  • ggtree – Visualization and annotation of phylogenetic trees
  • ggplot2 Extensions – ggplot2’s official extension mechanism
  • lattice – The lattice add-on package is an implementation of Trellis graphics for R
  • extrafont – Tools for using fonts in R graphics
  • showtext – Using Fonts More Easily in R Graphs
  • gganimate – Create easy animations with ggplot2
  • misc3d – A collection of miscellaneous 3d plots, including isosurfaces

Report & Export from R

  • rapport – An R package that facilitates creation of reproducible report templates
  • rmarkdown  – Supports dozens of static and dynamic output formats
  • slidify – Generate reproducible html5 slides from R markdown
  • ReporteRs – An R package for creating Microsoft Word and PowerPoint documents
  • bookdown – Tools to write HTML, PDF, ePub, and Kindle books

Web Processing Tools

  • shiny  – Easy interactive web applications with R
  • RCurl – General network (HTTP, FTP, etc.) client interface
  • XML  – Reading and creating XML (and HTML) documents (including DTDs)
  • Rfacebook – Facebook API for R

Parallel & Distributed Computing

Machine Learning

  • AnomalyDetection  – Twitter’s Anomaly Detection
  • ahaz – Regularization for semiparametric additive hazards regression
  • bigrf – Big Random Forest Learning
  • C50 – Decision Tree Learning
  • caret  – Classification and Regression Learning
  • e1071 – Misc Functions of the Department of Statistics, Probability Theory Group
  • forecast – Forecast by using ARIMA, ETS, STLM, TBATS, and Neural Network
  • h2o – Deep learning, random forests, GBM, KMeans, PCA, GLM
  • LogicReg – Logic Regression Learning
  • maptree – Mapping, pruning, and graphing tree models
  • mboost – Model-Based Boosting
  • randomForest – Breiman and Cutler’s random forests
  • rattle – Graphical User Interface for Data Mining in R
  • rpart – Recursive Partitioning and Regression Trees
  • RSNNS – Neural Networks in R using the Stuttgart Neural Network Simulator (SNNS)
  • svmpath – The SVM Path algorithm
  • varSelRF – Use random forest to select features
  • xgboost  – eXtreme Gradient Boosting Tree model, good speed and performance

Natural Language Processing

  • text2vec – Fast Text Mining Framework for Vectorization and Word Embeddings
  • openNLP – Apache OpenNLP interface
  • NLP – Basic classes and methods for Natural Language Processing
  • topicmodels – Topic modeling interface to the C code developed by David M
  • SnowballC – Snowball stemmers based on the C libstemmer UTF-8 library
  • quanteda – An R package for the Quantitative Analysis of Textual Data

Logging

  • log4r – A log4j-style logging system for R
  • logging – A pure R implementation of the ubiquitous log4j package

R Development Tools

  • Package Development List – Instructions of R Package Development
  • devtools  – Tools to make an R developer’s life easier
  • testthat – Unit testing for R code
  • roxygen2 – Processes source code and comments to produce Rd files
  • packrat – Dependency management system for R.
  • installr – Functions for installing software from within R
  • import – An Import Mechanism For R
  • modules – Replacing packages: An alternative module system for R
  • Rocker  – R configurations for Docker
  • drat – Creation and use of R repositories on GitHub or other hosts
  • lintr – Static Code Analysis for R
  • staticdocs – Generate static HTML documentation for R packages

R in HTML & Javascript

  • d3heatmap – A D3.js-based heatmap htmlwidget for R
  • DT – An R interface to the DataTables library
  • scatterD3 – D3 scatter plots
  • plotly  – Create interactive web graphics via plotly’s JavaScript graphing library

Interfaces with Other Programming Languages

  • Rcpp – Seamless R and C++ Integration
  • rJava – Interface to Java
  • jvmr – Interface to Java and Scala
  • rJython – Interface to Python/Jython
  • rPython – Call Python in R
  • runr – Call Julia and Bash in R
  • RJulia – Call Julia in R
  • RinRuby – A library for Ruby
  • R.matlab – Read and write MATLAB files
  • RcppOctave – Interface to Octave and MATLAB
  • RSPerl – Interface to Perl
  • V8 – Embedded JavaScript Engine for R
  • htmlwidgets – Bring the best of JavaScript data visualization to R
  • rpy2 – Python Interface to R

Finally, a comprehensive list of R packages is here.
