Introduction to “R” in Statistical Analysis and Data Science

Statistical analysis and data analytics are growing more popular every day. Alongside Python, the R programming language has gained a lot of popularity over the years because of its simple, easy-to-use approach. R was created by Ross Ihaka and Robert Gentleman. The name “R” was derived partly from the first letters of the authors’ first names and partly as a play on the name of the S programming language. Unlike Python, which is a general-purpose language, R is highly domain specific.

Why is R so important?

R is taught in universities all over the world and is used by many companies for important business operations. Data science and statistics applications have to deal with many types of data, and R can be used for tasks such as data cleaning, feature selection, and feature engineering. It can also connect to big data platforms such as Spark and Hadoop. R provides excellent features for data exploration and investigation. Apart from that, R lets you build attractive web applications: with the Shiny package, you can develop interactive dashboards straight from the console of your R IDE.

Differences between R and Python:

The main distinction between the two languages is in their approach to data science. Both open source programming languages are supported by large communities, continuously extending their libraries and tools. But while R is mainly used for statistical analysis, Python provides a more general approach to data wrangling.

R provides various packages for the graphical interpretation of data. Python also has libraries for visualization, but they are somewhat more complex to use than R's. R has libraries, such as ggplot2, that help in building publication-quality graphs.

To use R, developers and analysts typically start with RStudio. For Python, the Anaconda distribution is commonly used.

In the end, whether to use R or Python depends on the needs of the project you are going to work on and on the problem you are trying to solve.

Getting started with R is very simple. One needs to have basic math, statistics and programming knowledge.

Some Advantages of R:

R is platform independent: it runs on Windows, macOS, and Linux.

R has powerful tools for statistics. It offers a consistent, integrated set of tools that can be used for a wide range of tasks. Its vector notation, which applies an operation to every element of a vector at once, is a very powerful feature.

As mentioned earlier, R is open source, so there is no need to pay or buy a license to use it. Anyone can use R without limitations.

The R community is constantly growing, and new packages are created all the time.
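To make the point about vector notation concrete, here is a minimal sketch of vectorized operations in base R. The `scores` vector and its values are made up purely for illustration.

```r
# Hypothetical exam scores, used only for illustration
scores <- c(72, 85, 90, 64)

# Arithmetic is applied element by element, with no explicit loop
curved <- scores + 5

# Comparisons are vectorized too and return a logical vector
passed <- scores >= 70

# A logical condition can be used to select elements
high_scores <- scores[scores > 80]

print(curved)        # [1] 77 90 95 69
print(passed)        # [1]  TRUE  TRUE  TRUE FALSE
print(high_scores)   # [1] 85 90
print(mean(scores))  # [1] 77.75
```

Because the operation is written once for the whole vector, the code stays short and usually runs faster than an equivalent loop.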

Let’s understand some basic concepts of the R language

Before proceeding with this section, you should have a basic understanding of coding. Familiarity with any programming language will help you follow the R concepts below.

R Language data types

1. Data Types:

In every programming language, we store data in variables, and each variable has a data type. Memory is reserved to hold the data. Let us have a look at the main data types in R.

Logical Data Type

A logical value is either TRUE or FALSE. Let us see it in code.

# Store a logical value and print it with its class
var_1 <- FALSE
cat(var_1, "\n")
cat("The data type is: ", class(var_1), "\n\n")

Numeric Data Type

Decimal (floating-point) values in R have the numeric data type. It is the default type R uses for numbers in computations.

# Store a decimal value and print it with its class
var_2 <- 234.56
cat(var_2, "\n")
cat("The data type is: ", class(var_2), "\n\n")

Integer Data Type

Whole numbers, with no decimal part, can be stored as integers. The only difference from the numeric example is the “L” suffix, which tells R to store the value as an integer; without it, R stores a whole number such as 45 as numeric. Most programming languages have an integer data type, and R is no exception.

# The L suffix makes this an integer rather than a numeric value
var_3 <- 45L
cat(var_3, "\n")
cat("The data type is: ", class(var_3), "\n\n")
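The effect of the “L” suffix is easiest to see side by side. The following is a small sketch using only base R; the variable names `x`, `y`, and `z` are arbitrary.

```r
x <- 45    # no suffix: stored as numeric (double)
y <- 45L   # "L" suffix: stored as integer

print(class(x))  # [1] "numeric"
print(class(y))  # [1] "integer"

# The two values still compare as equal
print(x == y)    # [1] TRUE

# as.integer() converts a numeric value and drops the fractional part
z <- as.integer(45.9)
print(z)         # [1] 45
print(class(z))  # [1] "integer"
```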

Complex Data Type

R also supports complex numbers, which have a real and an imaginary part. The imaginary part is written with the suffix “i”. Let us have a look.

# Store a complex number (real part 34, imaginary part 3)
var_4 <- 34 + 3i
cat(var_4, "\n")
cat("The data type is: ", class(var_4), "\n\n")

Character Data Type

The character data type stores strings and single characters in R. Values are written inside quotation marks.

# Store a string and print it with its class
var_5 <- "R Programming"
cat(var_5, "\n")
cat("The data type is: ", class(var_5), "\n\n")

2. Variables:

A variable is a named memory location used to store a value in a program. Variables in R can store numbers (real and complex), words, matrices, and even tables.

# Variable example using the equal operator.
variable.1 = 6

# Variable example using the leftward operator.
variable.2 <- "Capable Machine"

# Variable example using the rightward operator.
13L -> variable.3

print(variable.1)
cat("variable.1 is ", variable.1, "\n")
cat("variable.2 is ", variable.2, "\n")
cat("variable.3 is ", variable.3, "\n")
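Variables are not limited to single values. As a rough sketch, the same assignment operator can store a vector, a matrix, or a table (a data frame in R). The names and values below are invented for illustration.

```r
# A numeric vector
ages <- c(21, 34, 28)

# A 2 x 3 matrix, filled column by column (R's default)
m <- matrix(1:6, nrow = 2, ncol = 3)

# A data frame: a table whose columns can have different types
people <- data.frame(
  name = c("Asha", "Ben", "Chen"),
  age  = ages
)

print(class(ages))    # [1] "numeric"
print(is.matrix(m))   # [1] TRUE
print(dim(m))         # [1] 2 3
print(class(people))  # [1] "data.frame"
print(nrow(people))   # [1] 3
```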
Decision Making:

Decision making is one of the most familiar concepts in coding, and the if-else statement is its most common form.
An if-else statement executes one block of code if a specified condition is true. If the condition is false, another block of code is executed.

# Create the variable quantity
quantity <- 10000

# Set the if-else statement.
# Note: else must be on the same line as the closing brace of the if block.
if (quantity > 7500) {
    print('Popular Blog on CapableMachine')
} else {
    print('Not Popular')
}
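When there are more than two outcomes, conditions can be chained with `else if`. The sketch below reuses the 7500 threshold from the example; the second threshold (2500) and the labels are assumptions added for illustration. It also shows `ifelse()`, the vectorized form that tests every element of a vector.

```r
quantity <- 5000

# Chain conditions; the first one that is TRUE wins
if (quantity > 7500) {
  label <- "Popular"
} else if (quantity > 2500) {
  label <- "Moderately popular"
} else {
  label <- "Not popular"
}
print(label)   # [1] "Moderately popular"

# ifelse() applies the test to each element of a vector
labels <- ifelse(c(10000, 100) > 7500, "Popular", "Not popular")
print(labels)  # [1] "Popular"     "Not popular"
```

In a script run at the top level, `else` has to sit on the same line as the closing brace of the preceding block; otherwise R treats the `if` statement as finished and reports an unexpected `else`.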

3. Loops:

A loop statement allows us to execute a statement or group of statements multiple times. R has three loop constructs: for, repeat, and while.

For Loop


This loop repeats a section of code a known number of times, once for each element of a vector or sequence.

for (value in sequence) {
    # statements inside the body of the loop
}
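In R, a for loop iterates over the elements of a vector or sequence rather than using an initialization, test, and update step. Here is a minimal runnable sketch; the variable names and the `fruits` vector are illustrative only.

```r
# Sum the numbers 1 to 5, printing the running total
total <- 0
for (i in 1:5) {
  total <- total + i
  cat("i =", i, "running total =", total, "\n")
}
print(total)  # [1] 15

# seq_along() gives the index positions of a vector
fruits <- c("apple", "banana", "cherry")
for (k in seq_along(fruits)) {
  cat(k, fruits[k], "\n")
}
```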

Repeat Loop

Like the other loops, repeat runs a section of code multiple times. It is a special kind of loop, however, because it has no built-in exit condition. To leave the loop, we include a break statement guarded by a user-defined condition.

repeat {
   # commands
   if (condition) {
      break
   }
}
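As a concrete sketch of the pattern, the loop below counts upward and breaks once a limit is reached. The counter name and the limit of 3 are arbitrary choices.

```r
count <- 0
repeat {
  count <- count + 1
  cat("Iteration", count, "\n")
  # Without this break the loop would never end
  if (count >= 3) {
    break
  }
}
print(count)  # [1] 3
```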

While Loop


This loop repeats a section of code an unknown number of times, for as long as its test condition remains true.

while (test_expression) {
   # statements
}
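The following sketch shows a case where the number of iterations is not fixed in advance: a value is halved until it is no longer greater than 1. The starting value of 100 is just an example.

```r
n <- 100
steps <- 0

# Keep halving while n is still greater than 1
while (n > 1) {
  n <- n / 2
  steps <- steps + 1
}

print(steps)  # [1] 7
print(n)      # [1] 0.78125
```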

4. Functions

A function is a section of code that performs a specific task. A function can be called and reused multiple times in the code. You can pass information to a function, and it can return a result. R has many built-in functions, but you can create your own functions too.

An R function is created with the keyword function. The syntax is as follows:

func_name <- function(arg_1, arg_2, ...) {
   # function body
}

Function Components

The different parts of a function are −

  • Function Name − The name under which the function is stored as an object in the R environment.
  • Arguments − The values that are passed to the function when it is called.
  • Function Body − The logic that defines what the function does.
  • Return Value − The value of the last expression evaluated in the function body.
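The components above can be seen together in a short sketch. The function `rect_area` and its arguments are hypothetical, chosen only to illustrate a name, arguments (one with a default value), a body, and a return value.

```r
# Function name: rect_area; arguments: height and width (width defaults to 1)
rect_area <- function(height, width = 1) {
  # Function body
  area <- height * width
  area  # the last expression evaluated becomes the return value
}

print(rect_area(4, 3))                   # [1] 12
print(rect_area(5))                      # [1] 5  (width uses its default)
print(rect_area(width = 2, height = 6))  # [1] 12 (arguments matched by name)
```

An explicit `return(area)` would work as well; relying on the last expression is simply the more common style in R.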

This was a brief overview of the R programming language. If you want to learn R in detail, I suggest following tutorials on YouTube or other online platforms.

Conclusion –

The R community is evolving as part of the rapid expansion of the data science field. Within the next several years we may expect many new machine learning start-ups to be created that aim at robust connectivity with R and other open-source analytical and Big Data tools. This is an exciting area of research and, hopefully, the coming years will shape and strengthen the position of the R language in this field. For more information on R, please visit: https://www.r-project.org/

Reference –
For more articles, visit my profile on Analytics Vidhya: Prateek Majumder

Learner | Engineering Student
