Crunching CSVs with AWK for Fun and Profit
Gawk! (or possibly Mawk)

Normally you’d think crunching log data and CSVs is nowadays mostly abstracted away behind nifty analytics tools like ELK and Splunk. When living off the land, whether because these useful toolkits aren’t available to you or because you’re elbows deep in a client’s network performing a penetration test, AWK is still one of the most formidable tools for turning kludgey data into useful information. The best part is it ships with just about every *NIX distribution you’re likely to see in the wild.
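So that this is applicable to the broadest group possible, I’m going to try to keep this as close as possible to the syntax of GAWK, the GNU flavor of AWK. Be aware that Debian usually ships mawk as its default awk, and the FPAT variable used throughout this article is a GAWK extension, so you may need to install the gawk package first.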
For this article, we’re also going to be using the following data generated from Mockaroo:
time,ipv4_address,user_agent,droidspeek,department,shirt_size
2020-02-06T21:11:02Z,106.57.58.166,"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36",Proactive static help-desk,Accounting,XL
2019-09-01T18:00:21Z,17.110.49.47,"Mozilla/5.0 (Windows NT 5.2) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1",Secured web-enabled definition,Sales,XS
2019-10-28T10:32:08Z,89.221.150.185,"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.11 (KHTML, like Gecko) Ubuntu/11.04 Chromium/17.0.963.65 Chrome/17.0.963.65 Safari/535.11",Devolved systemic help-desk,Business Development,XS
2020-01-15T11:31:49Z,249.140.147.244,"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_4) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.65 Safari/535.11",Open-architected 5th generation adapter,Services,M
2019-09-20T16:11:25Z,26.93.197.68,"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/12.0.702.0 Safari/534.24",Assimilated composite access,Engineering,2XL
2020-06-25T15:32:18Z,122.65.87.156,Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20130401 Firefox/31.0,User-centric full-range artificial intelligence,Legal,M
2020-04-23T00:55:35Z,159.218.39.19,"Mozilla/5.0 (Macintosh; PPC Mac OS X 10_6_7) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.790.0 Safari/535.1",Synergized reciprocal Graphic Interface,Sales,3XL
2019-09-13T16:00:17Z,16.128.193.132,"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.20 (KHTML, like Gecko) Chrome/11.0.669.0 Safari/534.20",Synchronised disintermediate extranet,Legal,S
2019-08-05T03:43:58Z,137.240.27.42,"Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3",Team-oriented multimedia archive,Research and Development,S
2019-10-16T16:09:09Z,174.250.32.66,"Mozilla/5.0 (Windows NT 5.2) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.794.0 Safari/535.1",Programmable global monitoring,Business Development,S
2020-07-17T19:52:59Z,211.168.240.61,"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.68 Safari/534.24",Up-sized systematic matrix,Research and Development,L
2019-08-23T00:05:19Z,92.24.69.133,Mozilla/5.0 (X11; FreeBSD amd64) AppleWebKit/536.5 (KHTML like Gecko) Chrome/19.0.1084.56 Safari/1EA69,Optional contextually-based application,Accounting,L
2019-10-30T13:54:20Z,154.207.58.180,"Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1467.0 Safari/537.36",Balanced hybrid array,Sales,2XL
2019-12-24T13:59:16Z,18.221.55.62,"Mozilla/5.0 (Windows; U; Windows NT 6.1; ko-KR) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27",Ergonomic maximized productivity,Business Development,M
2019-08-22T21:36:27Z,45.213.213.2,"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1623.0 Safari/537.36",Visionary asynchronous analyzer,Services,3XL
2020-07-01T20:43:47Z,103.234.165.68,"Mozilla/5.0 (Windows NT 6.0) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.20 Safari/535.1",Horizontal needs-based algorithm,Support,2XL
2020-01-22T08:16:41Z,37.1.239.68,"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.6 Safari/537.11",Diverse fault-tolerant capability,Training,3XL
2020-07-01T14:09:10Z,26.188.166.130,"Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US) AppleWebKit/534.17 (KHTML, like Gecko) Chrome/11.0.652.0 Safari/534.17",Ergonomic logistical software,Services,S
2019-08-13T00:40:09Z,94.88.80.241,"Mozilla/5.0 (Windows NT 5.2) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.813.0 Safari/535.1",Cross-group heuristic task-force,Business Development,3XL
2020-03-27T08:46:45Z,196.127.150.166,"Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.1 (KHTML, like Gecko) Ubuntu/11.04 Chromium/14.0.825.0 Chrome/14.0.825.0 Safari/535.1",Future-proofed multi-state initiative,Marketing,3XL
For anyone who wants to grok AWK from the fundamentals, I’d highly recommend the Grymoire’s guide for AWK, which is where I taught myself the bulk of what I know about the program. They also have great guides to sed and regular expressions that are worth checking out too!
Dealing with field separators inside quoted strings
The most common question I see online around using AWK is how to deal with field separator characters inside quoted strings. Usually this ends up with a lot of less-than-ideal solutions which amount to using a different tool to cut the data. For the rest of this article, you’ll see the pattern below used to make sure that field separators (commas in the case of a CSV) inside quotes are not treated as field boundaries.
kitsutron@lappentoppen:~$ awk 'BEGIN {FPAT = "([^,]+|\"[^\"]+\")"} {print $3}' MOCK_DATA.csv
user_agent
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36"
"Mozilla/5.0 (Windows NT 5.2) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1"
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.11 (KHTML, like Gecko) Ubuntu/11.04 Chromium/17.0.963.65 Chrome/17.0.963.65 Safari/535.11"
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_4) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.65 Safari/535.11"
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/12.0.702.0 Safari/534.24"
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20130401 Firefox/31.0
"Mozilla/5.0 (Macintosh; PPC Mac OS X 10_6_7) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.790.0 Safari/535.1"
"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.20 (KHTML, like Gecko) Chrome/11.0.669.0 Safari/534.20"
"Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3"
"Mozilla/5.0 (Windows NT 5.2) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.794.0 Safari/535.1"
"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.68 Safari/534.24"
Mozilla/5.0 (X11; FreeBSD amd64) AppleWebKit/536.5 (KHTML like Gecko) Chrome/19.0.1084.56 Safari/1EA69
"Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1467.0 Safari/537.36"
"Mozilla/5.0 (Windows; U; Windows NT 6.1; ko-KR) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27"
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1623.0 Safari/537.36"
"Mozilla/5.0 (Windows NT 6.0) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.20 Safari/535.1"
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.6 Safari/537.11"
"Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US) AppleWebKit/534.17 (KHTML, like Gecko) Chrome/11.0.652.0 Safari/534.17"
"Mozilla/5.0 (Windows NT 5.2) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.813.0 Safari/535.1"
"Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.1 (KHTML, like Gecko) Ubuntu/11.04 Chromium/14.0.825.0 Chrome/14.0.825.0 Safari/535.1"
To break this down, first we’re using the BEGIN pattern to tell AWK we only want to run the next block inside curly braces ({}) once at the start of the script, before any input is read. Inside the first set of curly braces, we define the value of the FPAT variable. The GAWK manual covers FPAT under splitting fields based on field values rather than on separators.
What this means is the FPAT variable contains a regular expression which describes what a field looks like. What we’re passing it, ([^,]+|\"[^\"]+\"), says that a field is either a run of characters containing no commas ([^,]+) or a run of characters wrapped in double quotes (\"[^\"]+\"). Because both alternatives use +, an empty field (two commas in a row) is skipped rather than counted, which is fine for this data set since every column is populated.
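To see why FPAT matters, here is a minimal sketch comparing plain comma splitting with the FPAT pattern on a single made-up row. It assumes GNU AWK is installed as gawk; the sample row and the shell variable names are mine for illustration and are not part of the Mockaroo data.

```shell
# One hypothetical row with a comma inside a quoted field.
row='2020-01-01T00:00:00Z,10.0.0.1,"Mozilla/5.0 (KHTML, like Gecko)",Sales'

# Splitting on bare commas (-F,) cuts the quoted user agent in two.
naive_count=$(printf '%s\n' "$row" | gawk -F, '{print NF}')

# FPAT describes what a field looks like, so the quoted field stays whole.
fpat_count=$(printf '%s\n' "$row" | gawk 'BEGIN {FPAT = "([^,]+|\"[^\"]+\")"} {print NF}')
fpat_third=$(printf '%s\n' "$row" | gawk 'BEGIN {FPAT = "([^,]+|\"[^\"]+\")"} {print $3}')

echo "fields with -F,:  $naive_count"
echo "fields with FPAT: $fpat_count"
echo "third field:      $fpat_third"
```

With plain -F, the row is counted as five fields, while FPAT sees the four that are really there and returns the user agent in one piece, quotes included.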
The second block, {print $3}, will run for every line in the CSV because it isn’t preceded by a pattern like BEGIN or END. The statement inside it is a fairly pedestrian print statement which prints the third field.
Grep in AWK
Much in the same vein as the Useless Use of Cat award, a mistake I see with regularity is piping AWK into grep, i.e. awk '{print $5}' | grep "Accounting". Instead, you can use the if statement to do comparisons in AWK. The equality operator (==) conveniently also compares strings. The script below, for example, matches any row which has the exact string Accounting in its fifth field.
kitsutron@lappentoppen:~$ awk 'BEGIN {FPAT = "([^,]+|\"[^\"]+\")"} {if ($5 == "Accounting") {print}}' MOCK_DATA.csv
2020-02-06T21:11:02Z,106.57.58.166,"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36",Proactive static help-desk,Accounting,XL
2019-08-23T00:05:19Z,92.24.69.133,Mozilla/5.0 (X11; FreeBSD amd64) AppleWebKit/536.5 (KHTML like Gecko) Chrome/19.0.1084.56 Safari/1EA69,Optional contextually-based application,Accounting,L
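The same comparison can also be written as a bare pattern with no if at all, since AWK prints the row by default when a pattern has no action. Here is a small sketch of the two forms side by side. The three-row sample is hypothetical and has no quoted commas, so plain -F, is enough; against the real data you would keep the FPAT block and test $5 instead of $2.

```shell
# Hypothetical sample without quoted commas, so -F, is enough here.
sample='time,department,shirt_size
2020-02-06T21:11:02Z,Accounting,XL
2019-09-01T18:00:21Z,Sales,XS
2019-08-23T00:05:19Z,Accounting,L'

# if inside an action block, as in the command above.
with_if=$(printf '%s\n' "$sample" | awk -F, '{if ($2 == "Accounting") {print}}')

# The same test written as a pattern; a pattern with no action prints the row.
with_pattern=$(printf '%s\n' "$sample" | awk -F, '$2 == "Accounting"')

echo "$with_pattern"
```

Both forms select the same two Accounting rows; which one to use is mostly a matter of taste.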
It’s even possible to do fuzzy/pattern matching with regular expressions using the match operator (~) and a pattern. The statement $5 ~ /^Acc/ { print $0 } will match anything in the 5th field which starts with ‘Acc’, including the Accounting group. (The trailing .* in the command below is optional; /^Acc/ matches the same rows.)
kitsutron@lappentoppen:~$ awk 'BEGIN {FPAT = "([^,]+|\"[^\"]+\")"} $5 ~ /^Acc.*/ { print $0 }' MOCK_DATA.csv
2020-02-06T21:11:02Z,106.57.58.166,"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36",Proactive static help-desk,Accounting,XL
2019-08-23T00:05:19Z,92.24.69.133,Mozilla/5.0 (X11; FreeBSD amd64) AppleWebKit/536.5 (KHTML like Gecko) Chrome/19.0.1084.56 Safari/1EA69,Optional contextually-based application,Accounting,L
You can also print the header row by using the NR variable (the number of the current record) to match the first row and print it. So we don’t accidentally match the header row with the rest of our script, we then test against NR>1 to only match rows beyond the header for our string matching. This way we get the header as well as the matching data, but the header won’t accidentally be printed twice if the pattern between the slashes matches a header field.
kitsutron@lappentoppen:~$ awk 'BEGIN {FPAT = "([^,]+|\"[^\"]+\")"} NR==1 { print $0 } NR>1 && $5 ~ /^Acc.*/ { print $0 }' MOCK_DATA.csv
time,ipv4_address,user_agent,droidspeek,department,shirt_size
2020-02-06T21:11:02Z,106.57.58.166,"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36",Proactive static help-desk,Accounting,XL
2019-08-23T00:05:19Z,92.24.69.133,Mozilla/5.0 (X11; FreeBSD amd64) AppleWebKit/536.5 (KHTML like Gecko) Chrome/19.0.1084.56 Safari/1EA69,Optional contextually-based application,Accounting,L
Using AWK to uniq fields
AWK can also be used to uniq fields (as I found out browsing StackExchange) instead of having to perform arcane shenanigans to split, uniq, and then mash data back together. For example, if you wanted to uniq the user agent strings in our example data you could specify {!a[$3]++}. What this does is use an array called ‘a’, keyed by the value it finds in column 3, the user agent, on the current row. The increment operator (++) creates the key the first time a value is seen and bumps its counter on every repeat, so each distinct user agent ends up as exactly one key no matter how often it occurs. The not operator (!) comes from the common idiom of using !a[$3]++ as a pattern; inside an action block like this one its result is simply discarded. After processing the whole CSV, the END block is run to print everything. We assign each key one by one in a loop to the variable ‘b’, which we print to screen.
kitsutron@lappentoppen:~$ awk 'BEGIN {FPAT = "([^,]+|\"[^\"]+\")"} {!a[$3]++} END {for (b in a) {print b}}' MOCK_DATA.csv
"Mozilla/5.0 (Windows NT 5.2) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.813.0 Safari/535.1"
"Mozilla/5.0 (Macintosh; PPC Mac OS X 10_6_7) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.790.0 Safari/535.1"
"Mozilla/5.0 (Windows NT 5.2) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1"
"Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US) AppleWebKit/534.17 (KHTML, like Gecko) Chrome/11.0.652.0 Safari/534.17"
"Mozilla/5.0 (Windows; U; Windows NT 6.1; ko-KR) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27"
"Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3"
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36"
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.6 Safari/537.11"
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1623.0 Safari/537.36"
"Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1467.0 Safari/537.36"
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20130401 Firefox/31.0
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/12.0.702.0 Safari/534.24"
"Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.1 (KHTML, like Gecko) Ubuntu/11.04 Chromium/14.0.825.0 Chrome/14.0.825.0 Safari/535.1"
"Mozilla/5.0 (Windows NT 6.0) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.20 Safari/535.1"
"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.68 Safari/534.24"
"Mozilla/5.0 (Windows NT 5.2) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.794.0 Safari/535.1"
"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.20 (KHTML, like Gecko) Chrome/11.0.669.0 Safari/534.20"
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.11 (KHTML, like Gecko) Ubuntu/11.04 Chromium/17.0.963.65 Chrome/17.0.963.65 Safari/535.11"
user_agent
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_4) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.65 Safari/535.11"
Mozilla/5.0 (X11; FreeBSD amd64) AppleWebKit/536.5 (KHTML like Gecko) Chrome/19.0.1084.56 Safari/1EA69
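Since the value stored under each key is just a counter, the same array can double as a rough stand-in for sort | uniq -c. The sketch below tallies rows per department on a hypothetical five-row sample; the sample, the array name count, and the use of plain -F, (there are no quoted commas in it) are my assumptions for illustration.

```shell
# Hypothetical sample; column 2 stands in for the department field.
sample='time,department
2020-02-06T21:11:02Z,Accounting
2019-09-01T18:00:21Z,Sales
2019-08-23T00:05:19Z,Accounting
2019-10-30T13:54:20Z,Sales
2020-01-22T08:16:41Z,Training'

# NR>1 skips the header; count[$2]++ bumps the tally for each department.
# The order of a for-in loop is unspecified, so sort for stable output.
counts=$(printf '%s\n' "$sample" |
  awk -F, 'NR>1 {count[$2]++} END {for (d in count) print d "," count[d]}' |
  sort)

echo "$counts"
```

Printing count[d] alongside the key is the only change from the uniq example; the keys alone give the distinct values, the values give how often each one appeared.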
You can even concatenate multiple fields together, for example user agent and IP address:
kitsutron@lappentoppen:~$ awk 'BEGIN {FPAT = "([^,]+|\"[^\"]+\")"} {!a[$3","$2]++} END {for (b in a) {print b}}' MOCK_DATA.csv
"Mozilla/5.0 (Windows NT 5.2) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.813.0 Safari/535.1",94.88.80.241
"Mozilla/5.0 (Windows; U; Windows NT 6.1; ko-KR) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27",18.221.55.62
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/12.0.702.0 Safari/534.24",26.93.197.68
"Mozilla/5.0 (Macintosh; PPC Mac OS X 10_6_7) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.790.0 Safari/535.1",159.218.39.19
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.11 (KHTML, like Gecko) Ubuntu/11.04 Chromium/17.0.963.65 Chrome/17.0.963.65 Safari/535.11",89.221.150.185
"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.20 (KHTML, like Gecko) Chrome/11.0.669.0 Safari/534.20",16.128.193.132
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1623.0 Safari/537.36",45.213.213.2
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_4) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.65 Safari/535.11",249.140.147.244
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.6 Safari/537.11",37.1.239.68
"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.68 Safari/534.24",211.168.240.61
"Mozilla/5.0 (Windows NT 5.2) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1",17.110.49.47
"Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US) AppleWebKit/534.17 (KHTML, like Gecko) Chrome/11.0.652.0 Safari/534.17",26.188.166.130
"Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.1 (KHTML, like Gecko) Ubuntu/11.04 Chromium/14.0.825.0 Chrome/14.0.825.0 Safari/535.1",196.127.150.166
"Mozilla/5.0 (Windows NT 6.0) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.20 Safari/535.1",103.234.165.68
"Mozilla/5.0 (Windows NT 5.2) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.794.0 Safari/535.1",174.250.32.66
"Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3",137.240.27.42
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36",106.57.58.166
"Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1467.0 Safari/537.36",154.207.58.180
Mozilla/5.0 (X11; FreeBSD amd64) AppleWebKit/536.5 (KHTML like Gecko) Chrome/19.0.1084.56 Safari/1EA69,92.24.69.133
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20130401 Firefox/31.0,122.65.87.156
user_agent,ipv4_address
The downside is the data will be printed in no particular order, which is why the header row ends up at the bottom here. Reordering this is a pain given that AWK arrays are not like regular arrays in a language like C. Instead, they are associative arrays based on key/value pairs, much like a map or a hash in other languages, and a for (b in a) loop makes no promise about the order in which keys come back. As we’re not too fussed about anything other than the placement of the header row, we can get around having to reorder the whole lot by printing the parts of the header row we want with NR==1 {print $3","$2} and then loop through the body like normal with NR>1 {!a[$3","$2]++} before hitting our END block, just like we did with the grep example. What this is doing is running a one-off print for the first row, then adding everything else to our array if it hasn’t been seen already. At the end, we print the keys like normal as we’re not doing anything with the values.
kitsutron@lappentoppen:~$ awk 'BEGIN {FPAT = "([^,]+|\"[^\"]+\")"} NR==1 {print $3","$2}; NR>1 {!a[$3","$2]++}; END {for (b in a) {print b}}' MOCK_DATA.csv
user_agent,ipv4_address
"Mozilla/5.0 (Windows NT 5.2) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.813.0 Safari/535.1",94.88.80.241
"Mozilla/5.0 (Windows; U; Windows NT 6.1; ko-KR) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27",18.221.55.62
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/12.0.702.0 Safari/534.24",26.93.197.68
"Mozilla/5.0 (Macintosh; PPC Mac OS X 10_6_7) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.790.0 Safari/535.1",159.218.39.19
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.11 (KHTML, like Gecko) Ubuntu/11.04 Chromium/17.0.963.65 Chrome/17.0.963.65 Safari/535.11",89.221.150.185
"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.20 (KHTML, like Gecko) Chrome/11.0.669.0 Safari/534.20",16.128.193.132
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1623.0 Safari/537.36",45.213.213.2
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_4) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.65 Safari/535.11",249.140.147.244
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.6 Safari/537.11",37.1.239.68
"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.68 Safari/534.24",211.168.240.61
"Mozilla/5.0 (Windows NT 5.2) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1",17.110.49.47
"Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US) AppleWebKit/534.17 (KHTML, like Gecko) Chrome/11.0.652.0 Safari/534.17",26.188.166.130
"Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.1 (KHTML, like Gecko) Ubuntu/11.04 Chromium/14.0.825.0 Chrome/14.0.825.0 Safari/535.1",196.127.150.166
"Mozilla/5.0 (Windows NT 6.0) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.20 Safari/535.1",103.234.165.68
"Mozilla/5.0 (Windows NT 5.2) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.794.0 Safari/535.1",174.250.32.66
"Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3",137.240.27.42
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36",106.57.58.166
"Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1467.0 Safari/537.36",154.207.58.180
Mozilla/5.0 (X11; FreeBSD amd64) AppleWebKit/536.5 (KHTML like Gecko) Chrome/19.0.1084.56 Safari/1EA69,92.24.69.133
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20130401 Firefox/31.0,122.65.87.156
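If you do care about order, one alternative is to skip the END loop entirely and use the idiom as a pattern, so each row is printed the first time its key shows up. This is a sketch on a hypothetical four-row sample (no quoted commas, hence plain -F,); the array name seen is just a convention, not something AWK requires.

```shell
# Hypothetical sample; rows 1 and 3 share the same department and IP.
sample='time,ipv4_address,department
2020-02-06T21:11:02Z,10.0.0.1,Accounting
2019-09-01T18:00:21Z,10.0.0.2,Sales
2019-08-23T00:05:19Z,10.0.0.1,Accounting
2019-10-30T13:54:20Z,10.0.0.3,Sales'

# seen[key]++ returns the old count, so !seen[key]++ is true only the first
# time a key appears. Used as a pattern, it prints rows in input order.
deduped=$(printf '%s\n' "$sample" |
  awk -F, '!seen[$3 "," $2]++ {print $3 "," $2}')

echo "$deduped"
```

The header is simply the first unique key, so it lands at the top without needing a separate NR==1 rule, and the duplicate Accounting row is dropped.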
See GNU’s documentation on AWK for more details on how to crunch arrays.