http://hughesbennett.co.uk/RProject
©2012 Hughes Bennett Education
Updated 2012-01-10 06:11:16

R Project


The R Project software provides a wide variety of statistical and graphical features, such as linear and nonlinear modelling, classical statistical tests and time-series analysis. It is similar to the S language often used for research in statistical methodology.

December 23, 2011

Descriptive statistics using R


Descriptive statistics are useful for making summary statements about values for a variable. These summary statements are often more convenient to share than the original data values. This article aims to describe calculations and scripts for using descriptive statistics with your own data. For example, if you're about to make a presentation, this article will help to create summary statements and graphs rather than displaying all data values.

The statistics calculations are intended for data with discrete values and a normal distribution. This approach keeps the article short and practical. Readers are advised to use a textbook for coverage of continuous values and specialized distributions.

R is free open source software for statistical computing and graphics, which is usually installed on a personal computer. In this article, we will use an online toolbox which has adapted the R software for use within a web browser. There is no need for software download or installation.

The examples in this article use data from the UK Premier League during the 2010-11 football season. Specifically, we will use the number of points earned by each football team. The values are displayed below in alphabetical order of the team, starting with 68 points for Arsenal.

68,48,39,43,39,46,71,54,49,58,80,71,46,46,47,62,47,33,42,40
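
As a minimal sketch of the kind of summary statements described above, the points values can be stored in a vector and passed to base R functions. The variable name team_points is an illustrative choice and does not come from the article.

```r
# Points earned by each team, in the order listed above
team_points <- c(68, 48, 39, 43, 39, 46, 71, 54, 49, 58,
                 80, 71, 46, 46, 47, 62, 47, 33, 42, 40)

length(team_points)   # number of teams
mean(team_points)     # arithmetic mean
median(team_points)   # middle value of the sorted data
sd(team_points)       # sample standard deviation
range(team_points)    # lowest and highest values
summary(team_points)  # minimum, quartiles, mean and maximum in one call
```

A single call to summary() is often enough for a presentation slide, while mean() and sd() are convenient when only one figure is needed.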


October 30, 2011

Simple math expressions in R


R scripts evaluate each expression and display the result, which is similar to the behaviour of a calculator. Usually, an expression is typed on its own line. Multiple expressions can be combined on the same line by using a ; (semicolon) to separate each expression. Anything that appears after the # symbol on a line is treated as a comment and is not evaluated.
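
The following short script is a sketch of these rules, using arbitrary numbers chosen only for illustration.

```r
2 + 3            # addition; the text after the # symbol is ignored
7 * 6; 10 / 4    # two expressions on one line, separated by a semicolon
(1 + 2) * 3^2    # parentheses and exponentiation
sqrt(16)         # built-in functions can be used inside expressions
```

Each expression prints its own result, so the second line displays two values.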

September 3, 2011

Getting help with R functions


R scripts use 3 convenient commands to display information about functions. The help() command displays all information about a function. The args() command displays a brief summary of the parameters to a function. The example() command displays examples of the function. Information may not be available for some functions and there may be some text that does not display properly in a web browser.
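
As a brief sketch, the three commands can be tried on the base R functions mean() and sd(). These two functions were picked only as familiar examples.

```r
help(mean)      # full documentation for the mean() function
args(sd)        # brief summary of the parameters of sd()
example(mean)   # run the examples from the mean() help page
```

How the help text is displayed depends on where R is running, so the output may look different in a web browser than on a personal computer.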

September 3, 2011

Types of research data


The specific attribute that is being measured or observed is known as a variable. For example, height, weight, annual revenue or annual expenses could be variables. There are 3 common scales for measuring values of a variable:
  • Cardinal
    This is the most common scale for measuring variables. Cardinal values may be ranked and compared in a meaningful way. Revenue and expenses are examples of cardinal data values because the amount of money can be ranked and the difference in values is meaningful in a financial or economic context.
  • Ordinal
    These values are similar to cardinal values in that they may be ranked. However, the important difference is that the size of the gap between ordinal values is not meaningful. For example, let's take 5 random companies and list them in order of lowest to highest annual profit. Each company will be assigned a rank of 1 to 5. The order of the rank values is meaningful to the extent that the company at rank 1 has less annual profit than the company at rank 2. However, it is not possible to conclude that the annual profit of the company at rank 2 is twice as high as that of the company at rank 1. The difference in annual profit between each rank is not constant or predictable.
  • Nominal
    These types of data have the most limitations for analysis. Nominal values are classified by categories and may not be ranked or compared in a meaningful way. For example, values such as yes and no, heads and tails, male and female, and Tory and Labour are nominal values. These values may not be ranked, e.g. yes does not necessarily come before or after no and male does not necessarily come before or after female. Additionally, these values may not be compared, e.g. heads are not necessarily higher or lower than tails and Tory is not necessarily higher or lower than Labour.

Numeric values may be described as continuous or discrete.
  • Continuous variables may take on any value within a range. For example, 20, -52 or 25.91 are all possible temperature values.
  • Discrete variables are limited to fixed values within a range. For example, the number of students in a room must be an integer value. It does not make sense to have 25.91 students in a room!

Data values (sample) are collected from the larger group (population) containing all possible elements. Analysis of the sample is used to draw conclusions (inferences) about the larger population.

Lists of data values for a variable can be represented in an R script by using the c command, which combines the values into a single vector.
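
The sketch below shows one way the data types described above might be stored with c. All values and variable names are made up for illustration. The use of factor() for nominal and ordinal values is an assumption on our part and is not covered in the text above.

```r
# Cardinal, discrete: number of students in five rooms (hypothetical values)
students <- c(25, 31, 18, 22, 27)

# Cardinal, continuous: temperature readings
temperature <- c(20, -52, 25.91)

# Nominal: categories with no natural order
coin <- factor(c("heads", "tails", "tails", "heads"))

# Ordinal: categories that can be ranked, but with no meaningful distance
profit_rank <- factor(c("low", "high", "medium"),
                      levels = c("low", "medium", "high"),
                      ordered = TRUE)

length(students)                  # how many values were collected
mean(temperature)                 # arithmetic is meaningful for cardinal data
table(coin)                       # nominal data can only be counted
profit_rank[1] < profit_rank[2]   # ordinal data can be ranked
```

Calling mean(coin) would not give a useful answer, which mirrors the limitation of nominal data described above.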

September 3, 2011

Correlation analysis using Pearson and Spearman coefficients


Correlation analysis can be used to examine whether two data variables change together in a consistent manner. This technique does not provide any information about whether one variable could be used to predict the other variable and it does not provide any indication about whether one variable is the cause of changes in the other variable.

For example, there is a correlation between the number of employees and the annual revenue of a company because both variables change together. However, the annual revenue is not determined or caused by the number of employees.

Linear correlation is measured by calculating the Pearson correlation coefficient. This coefficient is symbolized by r for a sample of data values and by the Greek letter ρ for a population. It is common practice to simply refer to this as the correlation coefficient.

The correlation coefficient varies between -1.00 and +1.00. An r value of 1 indicates a perfect positive linear correlation. This happens when the values of both variables increase together and their coordinates on a scatter plot form a straight line. An r value of -1 indicates a perfect negative linear correlation. This happens when the values of one variable increase while the values of the other variable decrease and their coordinates on a scatter plot form a straight line. The closer r is to zero, the weaker the linear relationship between the variables. The scatter plot of variables with an r value not equal to 1 or -1 does not form a straight line.
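
As a minimal sketch, the base R function cor() calculates the Pearson coefficient by default and the Spearman coefficient when method = "spearman" is given. The data values below are invented so that the extreme cases are easy to recognise.

```r
x <- c(1, 2, 3, 4, 5)

y_up    <- 2 * x + 1    # increases with x along a straight line
y_down  <- 10 - 3 * x   # decreases as x increases, along a straight line
y_curve <- x^3          # increases with x, but not along a straight line

cor(x, y_up)                           # perfect positive linear correlation
cor(x, y_down)                         # perfect negative linear correlation
cor(x, y_curve)                        # Pearson: strong, but less than 1
cor(x, y_curve, method = "spearman")   # Spearman: based on ranks only
```

The last two lines differ because the Spearman coefficient looks only at the rank order of the values, so any consistently increasing relationship scores 1 even when it is curved.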

September 3, 2011

Linear regression analysis


Regression analysis goes beyond correlation and attempts to identify the extent to which the independent variable determines (or predicts or forecasts) the dependent variable. Although this technique identifies a functional relationship between the independent and dependent variables, it does not require the independent variable to be the cause of changes in the dependent variable.

For example, it may be possible to use regression analysis to identify a linear relationship between a person's age and his or her annual income. In this example, age is the independent variable and annual income is the dependent variable. Age could be used to predict annual income, but annual income does not predict age! Furthermore, age is not the cause of changes in annual income.

Linear regression is represented as a function, with X as the independent variable and Y as the dependent variable:

Y = a + bX

In the regression function, b is the slope and a is the intercept value.

The regression function is calculated in an R script by using the lm command.
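
The sketch below fits Y = a + bX with lm, using age as X and annual income as Y. The five data points are hypothetical (income in thousands) and exist only to make the script runnable.

```r
# Hypothetical sample: age in years, annual income in thousands
age    <- c(20, 30, 40, 50, 60)
income <- c(21, 39, 62, 78, 101)

# Fit income = a + b * age
fit <- lm(income ~ age)

coef(fit)      # a is labelled (Intercept), b is labelled age
summary(fit)   # coefficients, residuals and R-squared

# Predict the income for a person aged 35
predict(fit, newdata = data.frame(age = 35))
```

The formula income ~ age reads as "income depends on age", which matches the roles of the dependent and independent variables in the regression function.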

September 3, 2011

Generating random numbers for use in simulations


An important task during simulations is the generation of random numbers to imitate changing input values. The discrete uniform distribution is useful for this purpose because each of its N possible outcomes has the same probability of 1/N.

R scripts can generate random numbers from the uniform distribution by using the runif(n, min = 0, max = 1) command, where min and max are optional and default to 0 and 1. Note that runif returns continuous values between min and max rather than a fixed set of discrete outcomes. In this command,
  • n is the quantity of random numbers to generate
  • min is the lower limit of the distribution
  • max is the upper limit of the distribution
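
A short sketch of the command follows. Note that runif() returns continuous values between min and max. For a discrete set of equally likely outcomes, the base R function sample() is one option. That function is our suggestion and is not mentioned in the text above, and the seed value is arbitrary.

```r
set.seed(42)   # arbitrary seed so that the script gives repeatable results

u <- runif(5)                      # 5 values between 0 and 1 (the defaults)
v <- runif(5, min = 10, max = 20)  # 5 values between 10 and 20

# Discrete case: 10 rolls of a fair die, each face has probability 1/6
dice <- sample(1:6, size = 10, replace = TRUE)

u
v
dice
```

Removing the set.seed() line makes each run of the script produce different numbers, which is usually what a simulation needs once it has been tested.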

September 3, 2011

Scatter plot for 2 data variables


Scatter plots can be displayed in an R script by using the plot command.
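
As a minimal sketch, the script below plots two made-up variables against each other. The data values, variable names and labels are all hypothetical.

```r
# Hypothetical data for six companies
employees <- c(12, 25, 40, 58, 75, 90)          # number of employees
revenue   <- c(1.1, 2.4, 3.2, 5.0, 5.9, 7.8)    # annual revenue in millions

plot(employees, revenue,
     main = "Revenue against number of employees",
     xlab = "Employees",
     ylab = "Annual revenue (millions)")
```

The first argument is drawn on the horizontal axis and the second on the vertical axis.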
