Thursday, January 7, 2010

R 2.10.1 Statistical Suite

I have been busy teaching myself R scripts for the last 24 hours. R is a suite of statistical software that offers almost limitless capabilities. Instead of a limited point and click interface like excel it uses a scripting language known as "R". Excel is fine for its intended purpose but it is limited in its capacity for automation, presentation and repeatability. I therefore decided to take the plung and learn R and have been pleasantly surprised how easy it is to learn enough in a day to do something useful. I was prompted to do this when I realised the presentation shortcomings of excel were preventing me from developing the sort of graphs I wanted to display on this blog to review some of the data. I also wanted to be able to use a tool that would automatically download new data as it was released on FTP sites and graph it without having to manually do this each time. After reading a few scripts I decided to download R and teach myself and after a day I feel I have made good progress.

Once I get the basics sorted out to my satisfaction I will write a PERL script to automatically call my R scripts on a regular basis so the computer can do all the work and produce the graphs once I have developed the scripts - "look mum no hands!"

Exciting stuff.

You can follow me along as I get this little project up and running if you are interested in R or merely look at the graphs and stats.



The first graph I have produced is the Australian Mean Temperature Anomaly from the adjusted data available on the BOM server for the period 1 January 1910 to 31 December 2009. Unfortunately there is no raw data available so it is not much use by itself for investigating however as there are other data sets available it will be useful for comparative purposes. We will be drilling down to individual station data sets and seeing what we find concerning local conditions. I will also be posting a detailed series of Satellite data which is the only data I trust for reasons I will discuss as we go on over a period of time.

The chart I developed is a step graph of the average yearly anomaly based on BOM adjusted temperature data. I have applied a LOESS smoothing function which is the blue line you can see. I applied a LOESS smoothing parameter of q = 0.10 as it retained just enough of the randomness to prove of interest.

We will reserve any discussion of the graph for another post as I beg your indulgence to let me enjoy a self satisfied glow at having achieved some small mastery over R scripts today. The actual script is posted under the Read More link if you want to have a good laugh at my first script.


This is the script I knocked up tonight. There are many improvements to be made but as a quick hack I am pleased to say it works and produced the graph above. I also want to use the Cairo package to get the graphic output antialiased and prettied up.

Update: Well it looks like my blog trashes the formatting on my script so I will put an image of it up tomorrow instead to replace the mess below.

#################################################################
#################################################################
##                                                             ##
## Step chart of NASA GISS annual temperature anomaly (C) data ##
##                                                             ##
#################################################################
#################################################################

###############################
#  Step 1: SETUP SOURCE FILE  #
###############################

rm(list=ls()) 
link <- "C:\\Rdata\\Bom_Yrly_Anom_1910_2009.csv"
   script<- " C:/Rdata/Australian_Data_Script_V3.r"

#############################################################
#  Step 2: READ DATA FROM CSV FILE WITH FIRST ROW HEADINGS  #
#############################################################

   my_Data <- read.table(link,
  sep = ",", dec=".", skip=0,
  row.names = NULL, header = T,
  colClasses = rep("numeric", 2),
  na.strings = c("", "*", "-", -99.99, 99.9, 999.9))

#############################
#  STEP 3: MANIPULATE DATA  #
#############################

  # Construct chart title - nothing to manipulate this time
  Title <- paste("Annual Mean Surface Air Temperature Anomaly - Australia (1910 - 2009) \nBOM - High Quality Annual T Network (adjusted data)" )

#########################
#  STEP 4: CREATE PLOT  #
#########################

## GRAPHIC PARAMETERS
   win.graph(width = 10.5, height = 7, pointsize = 12) ### Sets size of graphics window ###
   par(bg = "white"); # sets background colour, default = transparent
   par(las = 1)       # Sets Y axis label orientation to vertical, set text font sizes
   par(cex.main=0.8); par(cex.sub=0.7); par(cex.lab = 0.8); par(cex.axis =.75);
   
## HIGH LEVEL GRAPHIC FUNCTION CALL
#usr <- par("usr") ,
#rect(usr[1], usr[3], usr[2], usr[4], col="cornsilk", border="black"), 
   
   plot(Anomaly ~ Year, my_Data,
         ylim = c(-1.25,1.25),
         type=c("s"),
         col = "dark grey",
         lwd = 2, #line width
         xlab = "Year",
         asp = "FULL",
         bty ="o",
         ylab = expression(paste("Annual Temperature Anomaly - ",degree, "C (Baseline: 1961-1990)")),
         main = Title,
         sub = "Source: BOM - @ http://reg.bom.gov.au/climate/change/hqsites/data/temp/meanT.040842.annual.anom.txt")

## LOW LEVEL GRAPHIC FUNCTION CALL
   lines(lowess(my_Data$Anomaly ~ my_Data$Year, f=0.1),col = "blue", lwd = 3)
   arrows(1961,-1.0, 1990, -1.0, code = 3, angle = 20, col = "dark grey")
   abline(h=0, col = "grey")          # o.o horizontal line
   text(1965, -.990, "Baseline\n Period\n1961-1990", cex = 0.6, pos=3)
   text(1931, -1.20, "LOESS smoothing, q = 0.10", cex =0.65, pos =1)
   points(c(1910,1921), c(-1.3,-1.3), col="blue",  type = "l", lwd = 2)
   text(2005, -1.2, "The Dog Ate My Data", cex=0.65, pos=1)
## Add script name &  print date to chart
   my_date <- format(Sys.time(),"%d/%m/%y ")
   mtext(script, side=1, line = -1, outer = T, adj = 0, cex = 0.65)
   mtext(my_date, side = 1, line = -1, outer = T, adj = 1, cex = 0.7)

###################
#  STEP 5: CLOSE  #
###################

#  todo

5 comments:

sierra said...

Put a PRE & /PRE tag around the whole region, and it will format as monospace with preserved linebreaks

Charles said...

Thanks Sierra. I will repost the code with the proper tags tonight. Much appreciated.

sierra said...

You also need to replace the angle brackets within the code with &lt; or &gt; entities, or they will be misinterpreted as HTML. Usually better in these cases to link to a separate file containing the raw text, in which case you don't need to encode characters for presentation within an HTML page.

Charles said...

Yep I see what you mean. The code seems to be displayed now however due to the line lengths I used and the format of the blog page it is still difficult to read. I'll have to set up files to link to as I have some better scripts just about ready to go when I get a chance to finalise them. More to follow shortly on the blog.

sierra said...

To prevent long lines of code from wrapping within the relatively narrow column, try adding this to your PRE tag:

<pre style="overflow:scroll">

Post a Comment