LAgent: an agent framework in F# – Part IX – Counting words …


Download framework here.

All posts are here:

Let’s now use our mapReduce to do something more interesting: finding the frequency of words in several books. The agent that processes the output now needs to be a bit more complex.

open System
open System.Collections.Generic

let gathererF = fun msg (data:List<string * int>, counter, step) ->
                    match msg with
                    | Reduced(key, value)   ->
                        // Print a progress message once every 'step' words
                        if counter % step = 0 then
                            printfn "Processed %i words. Now processing %s" counter key
                        data.Add((key, value |> Seq.head))
                        data, counter + 1, step
                    | MapReduceDone         ->
                        // Clean up the list and print the 20 highest counts
                        data
                        |> Seq.distinctBy (fun (key, _) -> key.ToLower())
                        |> Seq.filter (fun (key, _) -> not(key = "" || key = "\"" || (fst (Double.TryParse(key)))))
                        |> Seq.toArray
                        |> Array.sortBy snd
                        |> Array.rev
                        |> Seq.take 20
                        |> Seq.iter (fun (key, value) -> printfn "%A\t\t%A" key value)
                        printfn "All done!!"
                        data, counter, step
let gatherer = spawnAgent gathererF (new List<string * int>(), 0, 1000)

Every time a new word is reduced, the result is added to a running list, and once every 1000 words a progress message is printed. When everything is done, the list is printed after some manipulation to reduce weirdness and limit the output to 20 items. BTW: there are at least two bugs in this code, maybe more (late night quick-and-dirty-see-if-the-algo-works kind of coding).
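The final step above leans on Seq.distinctBy, which keeps only the first entry it sees for each lower-cased key, so the counts for "And" and "and" are never added together. Here is a minimal sketch of a summariser that merges casings by summing instead. topWords is a hypothetical helper, not part of the LAgent framework, and whether this is one of the bugs mentioned above is only a guess.

```fsharp
// Hypothetical helper, not part of the LAgent framework: summarise
// (word, count) pairs without discarding the counts of other casings.
let topWords howMany (data: seq<string * int>) =
    data
    // Drop empty keys, lone quotes and anything that parses as a number.
    |> Seq.filter (fun (key, _) ->
        not (key = "" || key = "\"" || fst (System.Double.TryParse(key))))
    // Merge "And", "and", "AND" into a single lower-case entry.
    |> Seq.groupBy (fun (key, _) -> key.ToLowerInvariant())
    |> Seq.map (fun (key, pairs) -> key, pairs |> Seq.sumBy snd)
    // Highest counts first, then keep at most 'howMany' entries.
    |> Seq.sortByDescending snd
    |> Seq.truncate howMany
    |> Seq.toList
```

Seq.truncate is used rather than Seq.take so that a short input does not throw when it holds fewer than howMany words.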

We want to use as many processors as possible, so let’s split the books into chunks that can be processed in parallel. The code below roughly does it (I say roughly because it deals the lines out to the chunks round-robin instead of keeping them in the right order, but for this particular case it doesn’t matter).

let gatherer = spawnAgent gathererF (new List<string * int>(), 0, 1000)
let splitBook howManyBlocks fileName =
    let buffers = Array.init howManyBlocks (fun _ -> new StringBuilder())
    fileName
    |> File.ReadAllLines
    |> Array.iteri (fun i line -> buffers.[i % (howManyBlocks)].Append(line) |> ignore)
    buffers
let blocks1 = @"C:\Users\lucabol\Desktop\Agents\Agents\kjv10.txt" |> splitBook 100
let blocks2 = @"C:\Users\lucabol\Desktop\Agents\Agents\warandpeace.txt" |> splitBook 100
let input =
    blocks1
    |> Array.append blocks2
    |> Array.mapi (fun i b -> i.ToString(), b.ToString())
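Two things are worth knowing about splitBook as written: lines are dealt out round-robin, and because File.ReadAllLines strips the line breaks and Append adds no separator, the last word of one line ends up glued to the first word of the next line in the same buffer. If that matters, a possible alternative is sketched here, assuming F# 4.0 or later for Array.splitInto. splitLinesInOrder is a hypothetical name, and it takes the lines directly so it can be tried without a file.

```fsharp
// Hypothetical alternative to splitBook, not part of the LAgent framework.
// Keeps lines in their original order and preserves the line breaks.
let splitLinesInOrder howManyBlocks (lines: string[]) =
    lines
    // Contiguous chunks: at most 'howManyBlocks' of them, sizes differ by at most one.
    |> Array.splitInto howManyBlocks
    // Re-join each chunk with a newline so words on adjacent lines stay separate.
    |> Array.map (String.concat "\n")
```

It could be fed from a file with something like fileName |> System.IO.File.ReadAllLines |> splitLinesInOrder 100, which yields strings rather than StringBuilders.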

And let’s execute!!

mapReduce input map reduce gatherer 20 20 partitionF

On my machine I get the following, which could be the right result.

"a"        16147
"And"        13071
"I"        11349
"unto"        8125
"as"        6400
"her"        5865
"which"        5544
"from"        5378
"at"        5175
"on"        5155
"have"        5135
"me"        5068
"my"        4629
"this"        3782
"out"        3653
"ye"        3399
"when"        3312
"an"        2841
"upon"        2558
"so"        2489
All done!!
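One way to find out whether numbers like these are plausible is to count the words sequentially and compare. The sketch below does that without any agents. countWords and wordDelims are hypothetical names, and the delimiter set is an assumption, so the totals will only line up with the mapReduce run if the map function splits words the same way.

```fsharp
// Hypothetical cross-check, independent of the LAgent framework.
// The delimiter set is an assumption; the real map function may differ.
let wordDelims = [| ' '; ','; ';'; '.'; ':'; '?'; '!'; '('; ')'; '\n'; '\t'; '\r' |]

let countWords (texts: seq<string>) =
    texts
    // Split every text into words, skipping the empty pieces between delimiters.
    |> Seq.collect (fun (text: string) ->
        text.Split(wordDelims, System.StringSplitOptions.RemoveEmptyEntries))
    // Count identical words (case-sensitive, like the reduce keys).
    |> Seq.countBy id
    // Highest counts first.
    |> Seq.sortByDescending snd
    |> Seq.toList
```

With the input array built earlier, input |> Array.map snd |> countWords |> List.truncate 20 would give a list to hold against the output above.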
