Sunday, January 10, 2016

Cutting Up DraftKing's csv, with awk and sed.

On Outsider Math I've published a few articles about using math and the R programming language to pick line ups. Now I think I am going to break down the how, of using gnu tools to cut up a DraftKings nfl and nhl csv. These methods will work on mac as well, and if you need to do it on windows you can install cygwin to get it working. Here is a one liner for the nfl, you would just type in your terminal window on linux or mac, and in your cygwin window on windows:

awk -F "\"*,\"*" '{print $1,"," $2,"," $3,"," $5}' DKSalaries.csv | sed "s/\"QB /QB/g" | sed "s/\"RB /RB/g" | sed "s/\"WR /WR/g" | sed "s/\"TE /TE/g" | sed "s/\"DST /DST/g" > nflredux

I do end up manually editing the header after I'm all done.Ok so let's start with the first part of the awk:

awk -F "\"*,\"*"

This says, I want to use a comma(,) as the delimiter, but it also says I want to take the quotes out as I go. That being said it will not take out the first quote in the csv.

{print $1,"," $2,"," $3,"," $5}' DKSalaries.csv

This says I want to cut out the fields in the order of first field, second field, third, and fifth. With a space and then a comma.
The next bit:

| sed "s/\"QB /QB/g"

Turn quote QB space or '"QB ' into QB. Same thing with RB, WR, TE, and DST.The ">nflredux"is saying output to nflredux.
Your output will be like this (like I said I usually end up manually editing the header):

"Position ,Name ,Salary ,AvgPointsPerGame
WR,Antonio Brown ,9300 ,25.167
WR,Odell Beckham Jr. ,9000 ,23.779
WR,Julio Jones ,8500 ,25.080
WR,DeAndre Hopkins ,8400 ,22.213
WR,Brandon Marshall ,7900 ,22.773
WR,Allen Robinson ,7800 ,20.213
RB,Devonta Freeman ,7600 ,22.429
QB,Cam Newton ,7500 ,25.491
WR,A.J. Green ,7500 ,18.220


So for the nfl it's pretty much the same deal, and here's the example for you:

awk -F "\"*,\"*" '{print $1,"," $2,"," $3,"," $5}' DKSalaries.csv | sed "s/\"G /G/g" | sed "s/\"LW /W/g" | sed "s/\"C /C/" | sed "s/\"RW /W/g" | sed "s/\"D /D/" > dknhl

If you need more help, look no further than here.

No comments: