Monday, 24 September 2007

Checking the Premiership results

I decided to load the data via CSV files so that I could run some intermediate checks before loading it.

I only ran some rudimentary checks, first on each year separately and then on all of the Premiership results together. The checks included:
  • Counting the total number of matches in the file, which should equal (n*n)-n where n is the number of teams in the league: 20 from 1996 onwards, 22 before that
  • Counting the number of home and away games for each team and checking that they were equal
  • Depending on the year, checking that each team had played a multiple of 42 games (pre-1996) or 38 games (1996 onwards)
  • Looking at the list of team names and making sure that there were no synonyms; if there were, going back to my table of cleansing transformations
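
The single-season checks above could be sketched in Python roughly as follows. This is an illustration under assumptions, not the code actually used: the column names HomeTeam and AwayTeam and the functions read_fixtures and check_season are hypothetical, and the tiny four-team league at the end is made-up demo data.

```python
import csv
import io
from collections import Counter
from itertools import permutations


def read_fixtures(file_obj, home_col="HomeTeam", away_col="AwayTeam"):
    """Read (home, away) pairs from a CSV file object.

    The column names are assumptions; adjust them to match the real files.
    """
    return [(row[home_col], row[away_col]) for row in csv.DictReader(file_obj)]


def check_season(fixtures, n_teams):
    """Return a list of problems found in one season's fixtures."""
    fixtures = list(fixtures)
    problems = []

    # Every team hosts every other team once: (n*n)-n matches.
    expected_matches = n_teams * n_teams - n_teams
    if len(fixtures) != expected_matches:
        problems.append(
            f"expected {expected_matches} matches, found {len(fixtures)}"
        )

    home = Counter(h for h, _ in fixtures)
    away = Counter(a for _, a in fixtures)
    teams = sorted(set(home) | set(away))

    # A synonym (two spellings of one club) shows up as an extra name.
    if len(teams) != n_teams:
        problems.append(
            f"expected {n_teams} team names, found {len(teams)}: {teams}"
        )

    # Each team plays every other team twice: 38 games for 20 teams, 42 for 22.
    games_each = 2 * (n_teams - 1)
    for team in teams:
        if home[team] != away[team]:
            problems.append(
                f"{team}: {home[team]} home games but {away[team]} away games"
            )
        if home[team] + away[team] != games_each:
            problems.append(
                f"{team}: played {home[team] + away[team]} games, "
                f"expected {games_each}"
            )
    return problems


# Made-up demo: a complete four-team league written out as CSV text.
teams = ["A", "B", "C", "D"]
csv_text = "HomeTeam,AwayTeam\n" + "\n".join(
    f"{h},{a}" for h, a in permutations(teams, 2)
)
fixtures = read_fixtures(io.StringIO(csv_text))
print(check_season(fixtures, n_teams=4))  # prints []
```

For a real season, n_teams would be 20 or 22 depending on the year. For the combined file, the per-team game count would need to be checked as a multiple rather than an exact value.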

