R Project
The R Project software provides a wide variety of statistical and graphical features, such as linear and nonlinear modelling, classical statistical tests and time-series analysis. It is similar to the S language often used for research in statistical methodology.
Reference documents
- Introduction to R
- R language definition
- Frequently asked questions about R
- r-help mailing list archives: Dec 2011 (Nov, Oct, Sep, Aug, Jul, Jun, May, Apr, Mar, Feb, Jan)
Bar chart express
Modify the script with your data and run it to create your own custom bar chart image! Save and use the generated image in your document.
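The original bar chart script is not reproduced here. As a minimal sketch, base R's barplot() can chart the 2010-11 points data used later in this article. The variable name team_points and the output file name bar_chart.pdf are assumptions. pdf() is used instead of png() because it does not need a graphics display.

```r
# Points per team, Premier League 2010-11, in alphabetical order of team
team_points <- c(68, 48, 39, 43, 39, 46, 71, 54, 49, 58,
                 80, 71, 46, 46, 47, 62, 47, 33, 42, 40)

# Hypothetical output file; replace with your own path
out_file <- file.path(tempdir(), "bar_chart.pdf")

pdf(out_file, width = 7, height = 5)      # open a PDF graphics device
barplot(team_points,
        names.arg = seq_along(team_points),  # 1 = Arsenal, then alphabetical
        main = "Points per team, 2010-11",
        xlab = "Team (alphabetical order)",
        ylab = "Points")
invisible(dev.off())                      # close the device and write the file
```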
Pie chart express
Modify the script with your data and run it to create your own custom pie chart image! Save and use the generated image in your document.
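A pie chart with one slice per team (20 slices) is hard to read, so this minimal sketch first groups the points data into 10-point bands with cut() and then draws one slice per band with pie(). The band boundaries and the file name pie_chart.pdf are assumptions chosen for illustration.

```r
team_points <- c(68, 48, 39, 43, 39, 46, 71, 54, 49, 58,
                 80, 71, 46, 46, 47, 62, 47, 33, 42, 40)

# Group teams into bands; (40,50] means more than 40 and at most 50 points
bands  <- cut(team_points, breaks = c(30, 40, 50, 60, 70, 80))
counts <- table(bands)                    # number of teams in each band

out_file <- file.path(tempdir(), "pie_chart.pdf")   # hypothetical path
pdf(out_file)
pie(counts, main = "Teams by points band, 2010-11")  # slice labels = band names
invisible(dev.off())
```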
Line chart express
Modify the script with your data and run it to create your own custom line chart image! Save and use the generated image in your document.
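Line charts suit values that follow an order. As a minimal sketch, this sorts the points data from highest to lowest, which approximates final league position (real tables also break ties on goal difference), and draws it with plot(type = "b"). The file name line_chart.pdf is an assumption.

```r
team_points <- c(68, 48, 39, 43, 39, 46, 71, 54, 49, 58,
                 80, 71, 46, 46, 47, 62, 47, 33, 42, 40)

# Highest points first, so the x-axis reads as position 1, 2, 3, ...
ranked <- sort(team_points, decreasing = TRUE)

out_file <- file.path(tempdir(), "line_chart.pdf")  # hypothetical path
pdf(out_file, width = 7, height = 5)
plot(ranked, type = "b",                  # "b" draws both points and lines
     main = "Points by position, 2010-11",
     xlab = "Position (ranked by points)",
     ylab = "Points")
invisible(dev.off())
```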
Scatter plot express
Modify the script with your data and run it to create your own custom scatter plot image! Save and use the generated image in your document.
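A scatter plot needs two variables, and the points data in this article has only one. This minimal sketch therefore uses cars, a dataset bundled with R (speed in mph and stopping distance in ft), to show plot() drawing one variable against another. Swap in two numeric vectors of equal length for your own data. The file name scatter_plot.pdf is an assumption.

```r
# cars ships with R's datasets package, which is attached by default
out_file <- file.path(tempdir(), "scatter_plot.pdf")  # hypothetical path

pdf(out_file)
plot(cars$speed, cars$dist,               # x values, then y values
     main = "Stopping distance vs speed",
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     pch = 19)                            # solid circular markers
invisible(dev.off())
```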
Descriptive statistics using R
Descriptive statistics are useful for making summary statements about the values of a variable. These summary statements are often more convenient to share than the original data values. This article describes calculations and scripts for applying descriptive statistics to your own data. For example, if you are about to give a presentation, this article will help you create summary statements and graphs instead of displaying every data value.
The statistical calculations are intended for data with discrete values and a normal distribution. This approach keeps the article short and practical. Readers are advised to consult a textbook for coverage of continuous values and specialized distributions.
R is free, open-source software for statistical computing and graphics, and is usually installed on a personal computer. In this article, we will use an online toolbox that has adapted R for use within a web browser, so there is no need to download or install any software.
The examples in this article use data from the UK Premier League 2010-11 football season: specifically, the number of points earned by each team. The values are listed below in alphabetical order of team name, starting with 68 points for Arsenal.
68,48,39,43,39,46,71,54,49,58,80,71,46,46,47,62,47,33,42,40
Central tendency
Mean, median and mode are 3 common measures of central tendency, which refers to the location (middle or center) where most of the values for a variable are typically found. The mean is the sum of all values divided by the count of values for a variable. It is the most commonly used measure of central tendency and is often referred to as the average value.
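Written as a formula and applied to the 20 points values listed above (their sum is 1029), the definition of the mean works out as:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
        = \frac{68 + 48 + 39 + \cdots + 42 + 40}{20}
        = \frac{1029}{20}
        = 51.45
```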
The R commands that carry out a calculation are collectively called a script. The script for calculating the mean contains 2 commands.
The script is short because R has a pre-defined command to calculate the mean. The first line of the script defines the values for the variable. All values must be included between the pair of parentheses and each value must be separated by a comma. Re-use the script by replacing the sample data with your own data.
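A minimal sketch of a two-command script like the one described above, assuming the variable name team_points (the original script's variable name is not shown here). It then adds median() and a hypothetical stat_mode() helper for the other two measures of central tendency, because R has no built-in function for the statistical mode.

```r
# First command: store the points for all 20 teams in a vector
team_points <- c(68, 48, 39, 43, 39, 46, 71, 54, 49, 58,
                 80, 71, 46, 46, 47, 62, 47, 33, 42, 40)

# Second command: the pre-defined mean() function
mean(team_points)                         # 51.45

# The same value from the definition: sum of values / count of values
sum(team_points) / length(team_points)    # 51.45

# Median: middle of the sorted values (average of the two middle
# values when the count is even, as it is here)
median(team_points)                       # 47

# Note: mode() returns the storage mode ("numeric"), not the statistical
# mode. stat_mode() is a hypothetical helper returning the most frequent
# value(s).
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}
stat_mode(team_points)                    # 46
```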
Types of research data
The specific attribute that is being measured or observed is known as a variable. For example, height, weight, annual revenue or annual expenses could be variables. There are 3 common scales for measuring values of a variable:
- Cardinal
This is the most common scale for measuring variables. Cardinal values may be ranked and compared in a meaningful way. Revenue and expenses are examples of cardinal data values because amounts of money can be ranked and the difference between values is meaningful in a financial or economic context.
- Ordinal
These values are similar to cardinal values in that they may be ranked. However, the important difference is that ordinal values may not be compared in a meaningful way. For example, take 5 random companies, list them in order from lowest to highest annual profit and assign each a rank from 1 to 5. The order of the ranks is meaningful to the extent that the company at rank 1 has less annual profit than the company at rank 2. However, the ranks alone cannot tell us whether the company at rank 2 earns twice as much as the company at rank 1, because the difference in annual profit between consecutive ranks is not constant or predictable.
- Nominal
These types of data have the most limitations for analysis. Nominal values are classified by categories and may not be ranked or compared in a meaningful way. Examples include yes/no, heads/tails, male/female and Tory/Labour. These values may not be ranked, e.g. yes does not necessarily come before or after no, and male does not necessarily come before or after female. Nor may they be compared, e.g. heads are not necessarily higher or lower than tails, and Tory is not necessarily higher or lower than Labour.
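In R, the three scales map fairly naturally onto three kinds of vector. This is a minimal sketch under that assumption: a plain numeric vector for cardinal data (using the first three teams from the points data), an ordered factor for ordinal data and an unordered factor for nominal data. The profit-rank labels and coin values are illustrative only.

```r
# Cardinal: numeric vector; differences between values are meaningful
team_points <- c(68, 48, 39)              # first three teams, 2010-11
diff(team_points)                         # -20 -9

# Ordinal: ordered factor; ranks can be compared, but the gap between
# ranks has no numeric meaning
profit_rank <- factor(c("low", "high", "medium"),
                      levels = c("low", "medium", "high"),
                      ordered = TRUE)
profit_rank[1] < profit_rank[2]           # TRUE: "low" ranks below "high"
sort(profit_rank)                         # low, medium, high

# Nominal: unordered factor; categories can only be counted or tested
# for equality
coin <- factor(c("heads", "tails", "heads"))
table(coin)                               # heads 2, tails 1
coin[1] == coin[3]                        # TRUE
```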
Numeric values may be described as continuous or discrete.
- Continuous variables may take on any value within a range. For example, 20, -52 or 25.91 are all possible temperature values.
- Discrete variables are limited to fixed values within a range. For example, the number of students in a room must be a whole number. It does not make sense to have 25.91 students in a room!
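In R, whether a variable is discrete is a property of its values rather than its storage type: c() stores whole numbers as double-precision values. A minimal sketch of checking that every value is a whole number, using the points data and the temperature examples above; the variable names are illustrative.

```r
team_points <- c(68, 48, 39, 43, 39, 46, 71, 54, 49, 58,
                 80, 71, 46, 46, 47, 62, 47, 33, 42, 40)

typeof(team_points)                       # "double": c() stores numbers as doubles
all(team_points == round(team_points))    # TRUE: every value is a whole number

# Convert to R's integer type if a true integer vector is needed
team_points_int <- as.integer(team_points)
typeof(team_points_int)                   # "integer"

# Continuous values, such as temperatures, may be fractional
temperatures <- c(20, -52, 25.91)
all(temperatures == round(temperatures))  # FALSE: 25.91 is fractional
```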
A set of data values (the sample) is collected from a larger group (the population) that contains all possible elements. Analysis of the sample is used to draw conclusions (inferences) about the population.
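As an illustration only, this minimal sketch treats the 20 teams as the whole population and draws a simple random sample of 5 of them with sample(). The seed and sample size are arbitrary choices; the sample mean will usually differ somewhat from the population mean of 51.45, which is why inferences from a sample carry uncertainty.

```r
# Treat all 20 teams as the population (illustration only)
population <- c(68, 48, 39, 43, 39, 46, 71, 54, 49, 58,
                80, 71, 46, 46, 47, 62, 47, 33, 42, 40)

set.seed(2011)                            # arbitrary seed, for reproducibility
sample_points <- sample(population, size = 5)  # random sample, no replacement

mean(sample_points)                       # sample mean: an estimate
mean(population)                          # population mean: 51.45
```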
A set of data values can be stored as a vector in an R script by using the c() function.
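A minimal sketch of c() in use: it combines individual values, or existing vectors, into one vector. It also illustrates one behaviour worth knowing when entering your own data: every element of a vector shares a single type, so mixing numbers and text turns all elements into text. The split into two halves is only for illustration.

```r
# c() combines individual values into a vector
first_half  <- c(68, 48, 39, 43, 39, 46, 71, 54, 49, 58)
second_half <- c(80, 71, 46, 46, 47, 62, 47, 33, 42, 40)

# ...and also combines existing vectors end to end
team_points <- c(first_half, second_half)
length(team_points)                       # 20

# Mixing numbers and text coerces every element to character
mixed <- c(68, "Arsenal")
typeof(mixed)                             # "character"
```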
http://hughesbennett.co.uk/RProject