Securely store API keys in R scripts with the “secret” package

When we use an API key to access a secure service from R, or need to authenticate to a protected database, we have to store that sensitive information somewhere in our R code. The typical practice is to include those keys as strings in the code itself, but as you may have guessed, that is not secure. By doing so, we are storing our private keys and passwords in plain text somewhere on our hard drive. And since most of us use GitHub to collaborate on our code, we can also end up, unknowingly, publishing those keys in a public repo.

Now there is a solution to this: the “secret” package for R, developed by Gábor Csárdi and Andrie de Vries. The package integrates with OpenSSH, providing R functions that allow us to create a vault for keys on our local hard drive, define trusted users who can access those keys, and then include encrypted keys in R scripts or packages that can only be decrypted by the person who wrote the code, or by the people they trust.

Here is the presentation by Andrie de Vries at useR!2017, where he demoed this package, and here is the package itself.
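As a quick illustration, here is a minimal sketch of the vault workflow (the email address, secret name and value below are placeholders, and I'm assuming the default SSH key locations; see the package documentation for the details):

```r
# install.packages("secret")
library(secret)

# Create a vault (a directory of encrypted secrets) on the local disk
vault <- file.path(tempdir(), "vault")
create_vault(vault)

# Register a trusted user, identified by an SSH key.
# local_key() picks up your own key (e.g. ~/.ssh/id_rsa).
add_user("alice@example.com", local_key(), vault = vault)

# Store an API key, encrypted so that only the listed users can read it
add_secret("my_api_key", "s3cr3t-value",
           users = "alice@example.com", vault = vault)

# Later, anywhere in your scripts, decrypt it with your private key
api_key <- get_secret("my_api_key", key = local_key(), vault = vault)
```

Only the encrypted vault and the `get_secret()` call ever need to appear in your repo; the plain-text value stays out of your code.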


RStudio v1.0 is out

RStudio finally moved out of “beta” status last week, and the first official production version is now available. This is great news for all of us who use RStudio as our primary IDE for R programming.

Check out this link for the release history of RStudio and all the changes it has gone through over the last 6 years.

Some of the major new features added in this release are:

  • Support for R Notebooks, a new interactive document format combining R code and output. It’s similar to (but not based on) Jupyter Notebooks, in that an R Notebook includes chunks of R code that can be processed independently (as opposed to R Markdown documents, which are processed all at once in batch mode).
  • GUI support for the sparklyr package, with menus and dialogs for connecting to a Spark cluster, and for browsing and previewing the available Spark Dataframe objects.
  • Profiling tools for measuring which parts of your R code are consuming the most processing time, based on the profvis package.
  • Dialogs to import data from file formats including Excel, SAS and SPSS, based on the readr, readxl and haven packages.

Check out the official blog for more information about this release.
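Of these, the profiling tools are the easiest to try. Here is a small, self-contained sketch (the code being profiled is made up, just to generate some measurable work):

```r
# install.packages("profvis")
library(profvis)

# Opens an interactive profile (flame graph) in the RStudio IDE
profvis({
  df  <- data.frame(x = rnorm(1e5), y = rnorm(1e5))
  fit <- lm(y ~ x, data = df)   # where does the time go?
  summary(fit)
})
```

The resulting view breaks down time and memory by line of code, which is exactly what the new IDE integration surfaces.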

An R-based analysis of Cubs and Indians performance

Here is a great use of the Lahman package in R to analyse the historical performance of the Chicago Cubs and the Cleveland Indians.

This comes at the right time, after the nail-biting game yesterday.

In recognition of the event, and the fact that simple data analysis is all I can muster today, I thought I’d use the excellent Lahman package, which provides a trove of baseball statistics for R, to have a look at the historical performance of the two teams. 
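For a flavour of what the Lahman package makes possible, here is a small sketch comparing the two franchises’ historical win percentages. The franchise IDs “CHC” and “CLE” come from the package’s `Teams` table; treat this as illustrative, not as the linked analysis itself:

```r
library(Lahman)   # historical baseball statistics
library(dplyr)

Teams %>%
  filter(franchID %in% c("CHC", "CLE")) %>%   # Cubs and Indians
  mutate(win_pct = W / (W + L)) %>%
  group_by(franchID) %>%
  summarise(seasons      = n(),
            mean_win_pct = round(mean(win_pct, na.rm = TRUE), 3))
```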

Two Centuries of Population, Animated, using R

An interesting visualisation of the history of a growing United States: an animated map built using R.

The animated map above shows population density by decade, going back to 1790 and up to recent estimates for 2015. The transitions between decades are smoothed. This is approximate, but it gives a better idea of how the distribution of population changed.

The data used for this map is from the Census Bureau and made more accessible by NHGIS.

R moves up to 5th place in IEEE language rankings

IEEE has published its annual Top Programming Languages ranking. The report starts with the line “C is No. 1, but big data is still the big winner”, signalling the rise of R, the de facto programming language of big data analytics, including the cyber security domain.

I think this is an extraordinary result for a domain-specific language (big data and data science). Compared to the other four languages in the Top 5, which are general-purpose languages (C, Java, Python and C++), it’s a great feat, and a clear indication of the adoption, heavy use and relevance of R in today’s Information Age, where every device, system or “thing” (IoT) generates some form of data (logs). It also reflects the critical importance of data science, where R is the de facto language of data scientists, as a discipline today.

Some interesting lines from the report:

Another language that has continued to move up the rankings since 2014 is R, now in fifth place. R has been lifted in our rankings by racking up more questions on Stack Overflow—about 46 percent more since 2014. But even more important to R’s rise is that it is increasingly mentioned in scholarly research papers. The Spectrum default ranking is heavily weighted toward data from IEEE Xplore, which indexes millions of scholarly articles, standards, and books in the IEEE database. In our 2015 ranking there were a mere 39 papers talking about the language, whereas this year we logged 244 papers.

R’s steady growth in this and numerous other surveys and rankings over time reflects the growing importance of data science, applied using R. And the application of data science to cyber security, especially in detecting cyber attacks, is only becoming more relevant.
Conventional security monitoring tools that rely on rule-based detection engines (yes, they are called SIEMs!) are no longer enough to detect cyber attacks. Let’s face it: SIEM has come of age. Using machine learning to detect cyber attacks has become one of the most important developments in the cyber security domain in the last 10 years, and its relevance is at its peak in today’s world, where computer systems of all kinds churn out surplus amounts of data (also called “big data”). R is playing a very important role in helping security data scientists build “algorithmic models” that can better detect cyber attacks.

So I am very excited and happy to see R’s popularity and adoption growing year on year.

This is a core area of study I am currently focusing on, and I will be writing more about it here on my blog in the coming months.
