Monday, 24 September 2007

Correcting team names

Because the RSSSF site has many different authors, the same team's name turns up with a number of different spellings and punctuation. Therefore, as part of the extraction process, I clean the team names so that they are consistent across years, using the following Perl associative array (hash):



# Map each team-name variant found on the RSSSF pages to one canonical name.
my %clean = (
    "A Villa"        => "Aston Villa",         # eng99 page
    "Blackburn"      => "Blackburn Rovers",    # eng02 page
    "Blackburn R"    => "Blackburn Rovers",    # eng99 page
    "Blackburn Rov"  => "Blackburn Rovers",    # eng98 page
    "Bolton"         => "Bolton Wanderers",    # eng02 page
    "Bradford"       => "Bradford City",       # eng01 page
    "Charlton"       => "Charlton Athletic",   # eng01 eng02 page
    "Charlton A"     => "Charlton Athletic",   # eng99 page
    "Coventry"       => "Coventry City",       # eng00 eng01 page
    "Coventry C"     => "Coventry City",       # eng00 eng01 page
    "Derby"          => "Derby County",        # eng00 eng01 eng02 page
    "Derby C"        => "Derby County",        # eng99 page
    "Derby Co"       => "Derby County",        # eng99 page
    "Derby Co."      => "Derby County",        # eng99 page
    "Ipswich"        => "Ipswich Town",        # eng00 eng01 eng02 page
    "Leeds"          => "Leeds United",        # eng00 eng01 eng02 page
    "Leeds U"        => "Leeds United",        # eng00 eng01 eng02 page
    "Leeds Utd"      => "Leeds United",        # eng99 page
    "Leicester"      => "Leicester City",      # eng00 eng01 eng02 page
    "Leicester C"    => "Leicester City",      # eng99 page
    "Man City"       => "Manchester City",     # eng01 page
    "Man Utd"        => "Manchester United",   # eng01 page
    "Man. Utd."      => "Manchester United",   # eng01 page
    "Man United"     => "Manchester United",   # eng01 page
    "Manchester"     => "Manchester United",   # eng00 page
    "Manchester U"   => "Manchester United",   # eng06 page
    "Manchester Utd" => "Manchester United",   # eng99 page
    "Middlesbro"     => "Middlesbrough",       # eng01 page
    "Newcastle"      => "Newcastle United",    # eng98 page
    "Newcastle U"    => "Newcastle United",    # eng00 eng01 eng02 page
    "Newcastle Utd"  => "Newcastle United",    # eng00 eng01 eng02 page
    "Nottm Forest"   => "Nottingham Forest",   # eng99 page
    "Nottingham F"   => "Nottingham Forest",   # eng99 page
    "Sheff. Wed."    => "Sheffield Wednesday", # eng00 page
    "Sheffield Wed"  => "Sheffield Wednesday", # eng00 page
    "Sheffield W"    => "Sheffield Wednesday", # eng00 page
    "Sheffield W."   => "Sheffield Wednesday", # eng00 page
    "Tottenham"      => "Tottenham Hotspur",   # eng00 eng01 eng02 page
    "Tottenham H"    => "Tottenham Hotspur",   # eng99 page
    "West Ham"       => "West Ham United",     # eng00 eng01 eng02 page
    "West Ham U"     => "West Ham United",     # eng99 page
    "West Ham Utd"   => "West Ham United",     # eng99 page
    "Wigan"          => "Wigan Athletic",      # eng06 page
);

The comment after each mapping shows the page(s) on which that variant appears. The names on the left-hand side are the variants, and the names on the right-hand side are the canonical names I map them to.
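As a rough sketch of how such a hash might be applied during extraction, the lookup below returns the canonical name when a variant is listed and otherwise passes the raw name through unchanged. The helper name `clean_team_name`, the whitespace trimming, and the abbreviated copy of `%clean` are assumptions made for illustration; they are not taken from the original extraction script.

```perl
use strict;
use warnings;

# Abbreviated copy of the %clean mapping (assumption: the real script uses the full hash).
my %clean = (
    "Man Utd"     => "Manchester United",
    "Sheff. Wed." => "Sheffield Wednesday",
    "Wigan"       => "Wigan Athletic",
);

# Hypothetical helper: map a raw team name scraped from a page to its canonical form.
# Names with no entry in %clean are assumed to be canonical already and are returned as-is.
sub clean_team_name {
    my ($raw) = @_;
    (my $name = $raw) =~ s/^\s+|\s+$//g;   # trim leading/trailing whitespace (assumed useful)
    $name =~ s/\s+/ /g;                     # collapse internal runs of whitespace
    return exists $clean{$name} ? $clean{$name} : $name;
}

my @raw     = ("Man Utd", "  Sheff. Wed. ", "Arsenal", "Wigan");
my @cleaned = map { clean_team_name($_) } @raw;
print join(", ", @cleaned), "\n";   # prints: Manchester United, Sheffield Wednesday, Arsenal, Wigan Athletic
```

Testing with `exists` rather than relying on the looked-up value means a name such as Arsenal, which has no entry, comes back unchanged, so only the variants listed in the hash are rewritten.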
