Downloading stock prices in F# - Part III - Async loader for prices and divs
Luca Bolognese
Other parts:
- Part I - Data modeling
- Part II - Html scraping
- Part IV - Async loader for splits
- Part V - Adjusting historical data
- Part VI - Code posted
It is now time to load our data. There is a bit of uninteresting code to start with, but things get interesting afterward. Let’s start with functions that create the right URLs to download prices and dividends. We’ll talk about splits in the next installment.
let commonUrl ticker span =
    @"http://ichart.finance.yahoo.com/table.csv?s=" + ticker + "&a="
    + (span.Start.Month - 1).ToString() + "&b=" + span.Start.Day.ToString() + "&c="
    + span.Start.Year.ToString() + "&d=" + (span.End.Month - 1).ToString() + "&e="
    + span.End.Day.ToString() + "&f=" + span.End.Year.ToString()

let priceUrl ticker span = commonUrl ticker span + "&g=d&ignore=.csv"
let divUrl ticker span = commonUrl ticker span + "&g=v&ignore=.csv"
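As a rough sketch, the same URL can be built with sprintf, which avoids the chain of ToString() calls. The Span record below is a hypothetical stand-in for the type defined in Part I, and the ticker and dates are made up for illustration.

```fsharp
open System

// Hypothetical stand-in for the Span type from Part I.
type Span = { Start: DateTime; End: DateTime }

// Same query string as commonUrl, built with sprintf.
// The "- 1" on the months mirrors the original code.
let commonUrl (ticker: string) (span: Span) =
    sprintf "http://ichart.finance.yahoo.com/table.csv?s=%s&a=%d&b=%d&c=%d&d=%d&e=%d&f=%d"
        ticker
        (span.Start.Month - 1) span.Start.Day span.Start.Year
        (span.End.Month - 1) span.End.Day span.End.Year

let priceUrl ticker span = commonUrl ticker span + "&g=d&ignore=.csv"

// Made-up sample span, only to show the shape of the resulting URL.
let span = { Start = DateTime(2008, 1, 15); End = DateTime(2008, 9, 5) }
printfn "%s" (priceUrl "MSFT" span)
// prints http://ichart.finance.yahoo.com/table.csv?s=MSFT&a=0&b=15&c=2008&d=8&e=5&f=2008&g=d&ignore=.csv
```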
We will also need to construct an observation from a comma-delimited line of text. Again, for splits things will be harder.
let parsePrice (line: string) =
    let tokens = line.Split([|','|])
    { Date = DateTime.Parse(tokens.[0]);
      Event = Price ({ Open = money (Double.Parse(tokens.[1]));
                       High = money (Double.Parse(tokens.[2]));
                       Low = money (Double.Parse(tokens.[3]));
                       Close = money (Double.Parse(tokens.[4]));
                       Volume = volume (Double.Parse(tokens.[5])) }) }

let parseDiv (line: string) =
    let tokens = line.Split([|','|])
    let date = DateTime.Parse(tokens.[0])
    let amount = money (Double.Parse(tokens.[1]))
    { Date = date; Event = Div amount }
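To make the parsing concrete, here is a minimal sketch that runs the same logic on two made-up lines. The types are simplified, hypothetical stand-ins for the Part I model (plain floats instead of units of measure), and the sample lines are invented. One deliberate difference from the code above: the sketch parses with the invariant culture, because Double.Parse and DateTime.Parse otherwise use the current culture and can misread "25.65" on a machine that uses a comma as the decimal separator.

```fsharp
open System
open System.Globalization

// Hypothetical, simplified stand-ins for the Part I types (plain floats, no units of measure).
type PriceData = { Open: float; High: float; Low: float; Close: float; Volume: float }
type Event =
    | Price of PriceData
    | Div of float
type Observation = { Date: DateTime; Event: Event }

let inv = CultureInfo.InvariantCulture
let num (s: string) = Double.Parse(s, inv)

// Expects: date, open, high, low, close, volume
let parsePrice (line: string) =
    let tokens = line.Split(',')
    { Date = DateTime.Parse(tokens.[0], inv)
      Event =
        Price
            { Open = num tokens.[1]
              High = num tokens.[2]
              Low = num tokens.[3]
              Close = num tokens.[4]
              Volume = num tokens.[5] } }

// Expects: date, dividend amount
let parseDiv (line: string) =
    let tokens = line.Split(',')
    { Date = DateTime.Parse(tokens.[0], inv); Event = Div (num tokens.[1]) }

// Made-up sample lines, not real market data.
let p = parsePrice "2008-09-05,25.50,25.90,25.30,25.65,82305600"
let d = parseDiv "2008-08-19,0.11"
printfn "%A" p
printfn "%A" d
```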
Nothing noteworthy about this code. We have a couple of other infrastructure pieces before we get to the Async pieces. The next function is recursive. It takes a StringReader and reads lines out of it. For each line it calls a parsing function that takes the line as input and returns an object as output. The function gathers all such objects in the listOfThings list. If you are new to F#, the construct (parseLineFunc line :: listOfThings) means: execute parseLineFunc with argument line, take the result, and create a list that has the result as head and listOfThings as tail.
let rec loadFromLineReader (reader: StringReader) listOfThings parseLineFunc =
    match reader.ReadLine () with
    | null -> listOfThings
    | line -> loadFromLineReader reader (parseLineFunc line :: listOfThings) parseLineFunc
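One consequence of consing each result onto the head is that the final list comes out in the reverse order of the lines read. The small sketch below shows this with a made-up three-line input and an upper-casing function standing in for a real line parser. If the input were newest-first, the resulting list would end up oldest-first.

```fsharp
open System.IO

// Same function as in the post; the recursive call is in tail position,
// so long inputs do not grow the stack.
let rec loadFromLineReader (reader: StringReader) listOfThings parseLineFunc =
    match reader.ReadLine () with
    | null -> listOfThings
    | line -> loadFromLineReader reader (parseLineFunc line :: listOfThings) parseLineFunc

// Made-up input; ToUpper stands in for a real parsing function.
let reader = new StringReader("a\nb\nc")
let result = loadFromLineReader reader [] (fun (s: string) -> s.ToUpper())
printfn "%A" result   // ["C"; "B"; "A"]
```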
The next function is rather uninteresting. It just converts a string to a StringReader, cuts out the first line (the header) and calls loadFromLineReader.
let loadFromLineString text listOfThings parseLineFunc =
    let reader = new StringReader(text)
    reader.ReadLine () |> ignore // skip header
    loadFromLineReader reader listOfThings parseLineFunc
We now come to the first Async function. But what is an Async function? There are several technically correct definitions: it is an instance of the monad pattern, or it is a function that returns an Async object, or it is a way to release your thread to the thread pool. These definitions don’t help me much. I need something intuitive to latch onto.
The way that I personally visualize it is: there are things in the world that are very good at executing certain tasks and like to be hit by multiple parallel requests for these tasks. They’d like me to give them their workload and get out of their way. They’ll call me when they are done with it. These ‘things’ are disk drives, web servers, processors, etc. Async is a way to say: hey, go and do this, call me when you are done.
Now, you can call the asynchronous APIs directly, or you can use the nice F# language structures to do it. Let’s do the latter.
let loadWebStringAsync url =
    async {
        let req = WebRequest.Create(url: string)
        use! response = req.AsyncGetResponse()
        use reader = new StreamReader(response.GetResponseStream())
        return! reader.AsyncReadToEnd() }
This function retrieves a web page as a string asynchronously. Notice that even though the code looks rather normal, this function will likely be executed on three different threads. The first thread is the one the caller of the function lives on. The call to AsyncGetResponse returns that thread to the thread pool while we wait for a response from the web server. Once the response arrives, execution resumes on a different thread until AsyncReadToEnd. That call again returns the executing thread to the thread pool, and a thread-pool thread picks up the work once the string has been completely read. The good thing is that none of this is explicitly managed by the programmer. The compiler ‘writes the code’ to make it all happen. You just have to follow a set of simple conventions (i.e. putting exclamation marks in the right place).
The return type of this function is Async<string>, that is, a computation that yields a string when it is run.
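The thread hopping can be observed without touching the network. In the sketch below, fakeDownload is a hypothetical stand-in for a web request, with Async.Sleep playing the role of the slow I/O. It records the managed thread id before and after the wait. The ids vary from run to run and are not guaranteed to differ, so no particular output should be expected.

```fsharp
open System.Threading

// Hypothetical stand-in for a slow web request; no network access is needed.
let fakeDownload (label: string) =
    async {
        let before = Thread.CurrentThread.ManagedThreadId
        // The current thread is released while the "request" is in flight.
        do! Async.Sleep 100
        // Execution continues here, possibly on a different thread.
        let after = Thread.CurrentThread.ManagedThreadId
        return sprintf "%s: started on thread %d, resumed on thread %d" label before after }

// Nothing runs until the Async value is explicitly executed.
let message = fakeDownload "demo" |> Async.RunSynchronously
printfn "%s" message
```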
Async is somewhat contagious. If you are calling an Async function you have to decide whether to propagate the asyncness to your callers or remove it by executing the function. Often propagating it is the right thing to do, as your callers might want to batch your function with other async ones to be executed together in parallel. Your callers have more information than you do and you don’t want to short-circuit them. The following function propagates asyncness.
let loadFromUrlAsync url parseFunc =
    async {
        let! text = loadWebStringAsync url
        return loadFromLineString text [] parseFunc }
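The difference between propagating and removing asyncness can be sketched as follows. Everything here is illustrative: loadWebStringAsync is replaced by a hypothetical stub that returns canned text after a short delay, and countLinesAsync, countLines and the "url-a" style strings are made-up names. The last part shows why propagating helps: a caller can batch several computations with Async.Parallel and run them together.

```fsharp
// Hypothetical stub standing in for loadWebStringAsync: canned text, no network.
let loadWebStringAsync (url: string) =
    async {
        do! Async.Sleep 50
        return sprintf "header\nrow for %s" url }

// Propagates asyncness: the result is Async<int>, and nothing runs yet.
let countLinesAsync url =
    async {
        let! text = loadWebStringAsync url
        return text.Split('\n').Length }

// Removes asyncness: blocks the calling thread until the result is ready.
let countLines url = countLinesAsync url |> Async.RunSynchronously

// A caller can batch several propagated computations and run them together.
let counts =
    [ "url-a"; "url-b"; "url-c" ]
    |> List.map countLinesAsync
    |> Async.Parallel
    |> Async.RunSynchronously

printfn "%d" (countLines "url-a")   // 2
printfn "%A" counts                 // [|2; 2; 2|]
```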
Let’s see how the functions presented to this point compose to provide a way to load prices and dividends (splits will be shown afterward).
let loadPricesAsync ticker span = loadFromUrlAsync (priceUrl ticker span) parsePrice
let loadDivsAsync ticker span = loadFromUrlAsync (divUrl ticker span) parseDiv
This composition of functions is very common in functional code. You construct your building blocks and assemble them to achieve your final goal. Functional programming is good at almost forcing you to identify the primitive blocks in your code. All right, next in line is how to load splits.
Tags
- FSHARP
Comments
Keith
2008-09-13T11:36:17Z
Nice article. One potential improvement: why not use sprintf to avoid all those annoying ToString()s in the commonUrl function?
Luca Bolognese
2008-09-15T11:59:14Z
You are so very right. My excuse is that the code for the URL function is a cut and paste of some old C# code I have. That is not even an excuse, given that you can do much better in C# as well :)
Carl
2008-09-24T07:02:18Z
Very nice! I am learning a lot, please keep it up.
For better readability, I wrote your url functions as follows:
let commonHttpQuery ticker span =
    let query = new StringBuilder()
    Printf.bprintf query "s="
    Printf.bprintf query "%s" ticker
    Printf.bprintf query "&a="
    Printf.bprintf query "%d" (span.Start.Month - 1)
    Printf.bprintf query "&b="
    Printf.bprintf query "%d" span.Start.Day
    Printf.bprintf query "&c="
    Printf.bprintf query "%d" span.Start.Year
    Printf.bprintf query "&d="
    Printf.bprintf query "%d" (span.End.Month - 1)
    Printf.bprintf query "&e="
    Printf.bprintf query "%d" span.End.Day
    Printf.bprintf query "&f="
    Printf.bprintf query "%d" span.End.Year
    query.ToString()
This allows the same query to be used in commonUrl and splitUrl:
let commonUrl ticker span =
    let urlString query =
        let urlBuilder = new UriBuilder()
        urlBuilder.Scheme <- "http"
        urlBuilder.Host <- "ichart.finance.yahoo.com"
        urlBuilder.Port <- 80
        urlBuilder.Path <- "table.csv"
        urlBuilder.Query <- query
        urlBuilder.ToString()
    urlString (commonHttpQuery ticker span)

let splitUrl ticker span page =
    let urlString query =
        let urlBuilder = new UriBuilder()
        urlBuilder.Scheme <- "http"
        urlBuilder.Host <- "finance.yahoo.com"
        urlBuilder.Port <- 80
        urlBuilder.Path <- "q/hp"
        urlBuilder.Query <- query
        urlBuilder.ToString()
    urlString (commonHttpQuery ticker span) + sprintf "&g=v&z=66&y=%d" (66 * page)
lucabol
2008-09-24T11:32:39Z
Nice, thanks. I didn’t even know UriBuilder existed.
JohnG
2008-09-30T11:57:45Z
Error 1: The field, constructor or member ‘AsyncGetResponse’ is not defined. ???
lucabol
2008-09-30T12:21:54Z
You have to reference the FSharp PowerPack. I’m not posting the code yet because I’m working on the UI and want to post everything together.