Friday, April 5, 2013

An Annotated R History Log - Part 3

data(package="ggplot2") - show the datasets in the package ggplot2.

require(ggplot2) - import the package into the workspace. Then you can get at the datasets like this:

diamonds - print the diamonds dataset out.

list.files() - list files in the working directory.

getwd() - what is the working directory.

setwd(val) - set the working directory to val.

read.csv("unemp.csv") - read the comma separated file unemp.csv from the working directory (would be better off assigning this to a variable with <-).

unemp[x] - show entry x of the unemp dataset. R indexes from 1, and for a data frame (which is what read.csv returns) unemp[x] selects column x.
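A minimal sketch of assigning the result of read.csv to a variable and then indexing it. The real layout of unemp.csv is not shown in this log, so the column names and numbers here are made up, and the file is written to a temporary directory just so the example runs on its own.

```r
# Write a small stand-in file so the example runs anywhere.
# The column names and values are hypothetical, not the real CSO data.
csv_path <- file.path(tempdir(), "unemp.csv")
writeLines(c("month,live_register",
             "2002M01,150000",
             "2002M02,152000",
             "2002M03,151000"), csv_path)

# Assign the result so it can be reused instead of just printed.
unemp <- read.csv(csv_path)

unemp[2]                 # second column, still a data frame
unemp[[2]]               # second column as a plain vector
unemp[1, ]               # first row
unemp$live_register[3]   # third value of a named column
```

The single bracket keeps the data frame wrapper, while the double bracket and $ pull the values out, which is usually what functions like mean() want.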

t(unemp) - transpose the unemp dataset.

sub() - perform text substitution using regular expressions. First match only. gsub for global.
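A quick sketch of the difference between sub and gsub. The label string is made up for illustration.

```r
label <- "2002 M01, 2002 M02"   # hypothetical text, not from the real file

first_only <- sub("M", "-", label)           # replaces only the first "M"
every_one  <- gsub("M", "-", label)          # replaces every "M"
no_years   <- gsub("[0-9]{4} ", "", label)   # regular expression: drop each 4-digit year and its trailing space

first_only   # "2002 -01, 2002 M02"
every_one    # "2002 -01, 2002 -02"
no_years     # "M01, M02"
```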

plot(ts(unemp), ylab="Live Register", xlab="Date", main="Irish Unemployment Numbers", type="o")
Plot a time series of the unemp dataset. The x and y axis labels are given along with a title. The chart type is o for overplotted (points with lines drawn through them).
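A sketch of the same kind of plot where ts() is told when the series starts and how many observations there are per year, so that R can label the x-axis with dates. The values are made up, and the January 2002 start and monthly frequency are assumptions about the data, not something recorded in this log.

```r
# Made-up monthly values (in thousands) standing in for the real series.
live_register <- c(150, 152, 151, 155, 158, 160,
                   163, 165, 162, 161, 164, 168,
                   170, 173, 171, 176, 180, 183)

# start = c(2002, 1) means January 2002; frequency = 12 means monthly data.
unemp_ts <- ts(live_register, start = c(2002, 1), frequency = 12)

plot(unemp_ts, ylab = "Live Register (thousands)", xlab = "Date",
     main = "Irish Unemployment Numbers", type = "o")
```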

summary(unemp) - provide summary information on the unemp dataset: min, max, median etc.



Thursday, April 4, 2013

An Annotated R History Log - Part 2


quit() - end the current R session. q() works too.

x<-c(1,2,4,5) - Create a vector containing the numbers given.

mean(x) - calculate the mean of those numbers

barplot(x) - chart the values of x. Starting to get interesting.

abline(h=mean(x)) - add an a-b line to the previous plot. The line is horizontal with a height of mean(x). Useful for adding 'targets' to a chart.
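Putting the last few commands together as a small sketch. The second line at 4.5 is a hypothetical 'target' added only to show the idea.

```r
x <- c(1, 2, 4, 5)

barplot(x, main = "Values of x")   # main is an optional title
abline(h = mean(x), lty = 2)       # dashed horizontal line at the mean, which is 3
abline(h = 4.5, col = "red")       # hypothetical target value
```

abline has to come after barplot because it draws on whatever plot is currently open.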

help(package=ggplot2) - an extended use of the help function. Provide a value to the package parameter.

data() - list the data sources available in the current session.

AirPassengers - One of the included datasets. Entering it like this just prints it.

plot(AirPassengers) - produce a plot of the number of air passengers over time. This is what it looks like. I am a long way from being able to quantify that seasonal variation, but it is an interesting thing to look at when I know how.
[Figure: AirPassengers plot]

women - dataset of women's height vs weight.

plot(women$height, women$weight) - plots a scatter of women's height vs weight, but unnecessary as the data is already in two columns. plot(women) would do the same job.
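A short sketch showing why plot(women) works: the dataset has exactly two numeric columns, so R puts the first on the x-axis and the second on the y-axis. The formula version is just another way to ask for the same picture.

```r
cols <- names(women)   # "height" "weight"
cols

plot(women)                          # height on x, weight on y
plot(weight ~ height, data = women)  # same scatter, written as a formula
```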

mdeaths - this one is the monthly deaths from lung disease in the UK.

Note that you can use tab to autocomplete a lot of stuff in R. For example datasets. There are a large number of them and tab allows you to pick out the ones you need nice and quick.

Note: Typing help(dataset) gives you information on the dataset, including sources.

uspop - population of the United States.

There are a lot of datasets that come with R. They are in nice formats unlike most of the stuff I have tried to play with so far. Hopefully with increased skill I will be able to press the data into the format I need to plot and manipulate it.

Wednesday, April 3, 2013

An Annotated R History Log

As part of my learning of the R statistical programming language I am reviewing the contents of my history() log at the end of each day's session. Here is a dump of that with an explanation of what I was trying to do:


help(solve) - get help on the solve function. Something to do with solving systems of equations. Not what I was looking for.

help() - help on 'help'. Not that useful.

help.start() - starts the help server and shows you the home page. Has some handy reference links, but just using Google whenever I get stuck seems to be doing the trick.

help(mean) - what is a mean.

example(mean) - this is handy. Gives you just the example section of the help. Without starting the http help server which can be a bit annoying.

x <- c(0:10, 50) - create a vector containing the numbers 0 to 10 and 50. c stands for combine.

x - this just prints the values of x. You can use print(x), but I am not sure why you would.

mean(x) - calculate the mean of the vector x.
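The vector from example(mean) is a nice illustration of how one big value drags the mean around. A small sketch, using the same x as above; median and the trim argument are standard parts of base R.

```r
x <- c(0:10, 50)

length(x)             # 12 values
mean(x)               # 8.75, pulled up by the 50
median(x)             # 5.5
mean(x, trim = 0.1)   # 5.5, drops the lowest and highest value before averaging
```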

help(sink) - this function sends R output to a destination other than the console. A file for example.
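A minimal sketch of sink in use. The destination is a temporary file picked only for this example.

```r
out_file <- tempfile(fileext = ".txt")   # hypothetical destination

sink(out_file)                   # from here on, printed output goes to the file
print(summary(c(1, 2, 4, 5)))
sink()                           # calling sink() with no arguments restores the console

captured <- readLines(out_file)  # read back what was captured
captured
```

Forgetting the closing sink() is an easy mistake: the session looks like it has stopped printing anything.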

ls()  - show all the objects in the current session.

rm(x) - remove a named object from the current session.

history() - show this history file. Be careful to use this version: history(max.show=Inf) if you want to see more than 25 lines.


This is all a bit basic at the moment, but hopefully with time it will get a lot more useful to people who want to try to learn R too.

Goal 3 - Get to 11 Stone

This is the third of 4 goals that I am aiming for over the next 3 months. In the first quarter of this year I set myself a goal of getting to 11.5 stone from about 12. I made that, but only just. I researched the best value to aim for. I reckoned that 'I look thin enough', or 'I look fat' are not precise measurements. I want to find out what I should weigh and then aim for that in the long term. 
I tried this http://www.bmicalculator.ie/ first. It said I was fine at 11.5 stone for my height, but above the mid point on the ideal BMI band. This seemed to suggest that I could be lighter and better off. Then I tried this one: http://nhlbisupport.com/bmi/bmi-m.htm

I got a number out of this: 23.6 as my current BMI. The range for my height is 18.5–24.9. The mid point of this range is 21.7. I then entered values into the weight box progressively until I got to 21.7. This gives me an ideal weight (assuming the midpoint is ideal) of 10.5 stone. Quite a bit of weight to lose.
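The trial and error with the weight box can be replaced by a bit of arithmetic. This is a rough sketch, assuming the usual definition BMI = weight in kg divided by height in metres squared, and working backwards from the 11.5 stone and 23.6 figures above to recover the height term. The variable names are my own.

```r
kg_per_stone <- 6.35029318           # 14 lb at 0.45359237 kg per lb

current_stone <- 11.5
current_bmi   <- 23.6
target_bmi    <- (18.5 + 24.9) / 2   # midpoint of the range, 21.7

# BMI = weight_kg / height_m^2, so height^2 can be recovered from the current figures.
height_sq_m <- (current_stone * kg_per_stone) / current_bmi

# Height does not change, so the target weight follows from the target BMI.
target_stone <- target_bmi * height_sq_m / kg_per_stone
round(target_stone, 1)               # 10.6
```

That lands a shade above the 10.5 stone I got by hand, which is about what you would expect given the calculator only reports BMI to one decimal place.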

All of this ignores things like body composition - 11.5 stone of fat is worse than... - activity levels and other important stuff. That said, being lighter is generally considered healthier. Rats live longer(!) This is some information about visceral fat. Dangerous stuff. Better to be safe than sorry I reckon.
Hence Goal 3. First week of the quarter is over now and I think I have done quite well. I did not miss any workout days (there are 5 per week). Will see if I can get to 11 stone which is stage 2 of my weight loss plan. I am doing quite a few weight training sessions at the moment - will be interested to see if this makes it easier or harder to get 'lighter'.

Monday, April 1, 2013

Another useful resource for learning R

This is a pretty good presentation from UCLA for people who need to use R. Have only started it, but it is moving neither too slow nor too fast. I reckon a few more tutorials like this and I will be able to manipulate really basic real-world data. There seems to be a lot of data cropping up in public these days, so hopefully I can take an objective look at some of it.

Sunday, March 31, 2013

Adding Wolfram Alpha Widgets to Your Blog

I watched this video from Stephen Wolfram. It is a bit of a promotional piece, but if I had done as much as this guy I reckon every time I opened my mouth it would sound a bit promotional.
There are some interesting widgets in Wolfram and you can drop them into your blog, web site etc. They are just bits of html that load from Wolfram's servers and allow users to input some variables. I have put the 'tell me about my city' one at the bottom of all the blog posts here. It is a full width widget, so I had to put it there so it wouldn't obscure everything I am trying to say. The gallery of available widgets is here and you can make your own if the long list does not suit you.

Thursday, March 28, 2013

Working with CSO data in R

The Central Statistics Office produces a wide range of data about the Irish people. It can be filtered and extracted in a number of formats including CSV. This makes it ideal for importing into R.
For no specific reason I took the unemployment data for Ireland from 2002 to the present. This is what a financial crash looks like:
I am an R newbie, so the x-axis is a bit messed up, but you get the idea. I will fiddle with this a bit more in the coming days to see if I can find out anything interesting.

Wednesday, March 27, 2013

Some Statistics Learning Resources

As part of my stats learning I decided to go over my old college text book. Bit out of date at this stage. It refers to pencil and paper and even log graph paper. That said it is well written and was on my shelf. It only took a couple of hours to go over it and refresh my mind. Turns out I understood it better than I thought at the time :) I didn't spend too much time on it as I reckon the online stuff is going to be better.

I started yesterday with the manuals for R. They are pretty good, but not the most exciting way to get to grips with the system. I tried out a course from Code School this morning and it is excellent. It's called 'Try R' and is sponsored by O'Reilly. The course is here. It's free and well worth the time so far.

O'Reilly gives you a bunch of resources related to big data - which I suppose is the reason why I am doing this. The O'Reilly Try-R page is here. I have started reading Big Data Now on my Kindle. It's a free e-book.

Hopefully I will get to the point of being able to import real data into R soon. There are plenty of open data initiatives around (this for example), so there is bound to be something interesting in there.

Tuesday, March 26, 2013

Goal 2 - 4 Goals

Bit of a mouthful that title. I am writing an Android app to track progress against the 4 goals. I started doing a bit of this a few weeks ago, so there is something there at this stage. It is however very basic. Not worth putting out into the wild just yet. Once it is even a bit useful I will release it and continue to evolve it at least over the next 3 months.
I have always been a bit of a goal tracker type person, so this is right up my street.

Goal 1 - Learn Statistics

I studied maths in college about 20 years ago and statistics was in there. It didn't interest me at the time and I never really got a handle on it. Stats seems to be everywhere now, so I am making the effort to get on top of it. I have started by downloading the R-Project software.
I have given this a bash before, but I think the discipline of the 3 month timeline in the 4 Goals books will give me the push to really get to grips with this.
Now that I have the software installed I am going to make my way through the manuals. At least as far as I think I need to. Drip, drip, drip as Seth says.
This might seem a bit like putting the cart before the horse. Learn the complex software package before the basics, but I am happy that I know the basics and don't need to bore myself unnecessarily. I'll dig into some stats sites once I know how to use the tools.

4 Goals with Zig Ziglar and Seth Godin

I have just finished my first 3 months using the Seth Godin/Zig Ziglar goal setting system. I have been doing stuff roughly like this for years, but this has been very successful. My 4 goals for the first quarter of this year were:

  • Lose half a stone of weight
  • Read a non-fiction book a week
  • Get 150 customers (more on that later)
  • Get proficient at JavaScript
I managed to meet the weight reduction goal. This was probably the one I most wanted. It was not easy, but the system definitely kept me on track.
I read just over 8 books in the quarter. A bit short of the 12 I had wanted, but I got a huge amount out of them. I have removed this as a goal for Q2, but am going to keep doing this anyway. 
The 150 customers did not happen at all. I am developing a product which I had intended to be on sale in the middle of the quarter. However the development has taken longer than I expected. I am just finishing off the details at the moment, so no 150 customers for Q1. I am carrying this goal over though. Hopefully this time I can make it happen. 
About 2 months in I stopped the JavaScript goal as I was happy with the level I had achieved. I switched to Android development. I am happy that I got some benefit from this goal/s, but I need to be more specific about what I want from these learning type of goals in future. 
I will go through my new Q2 goals tomorrow.