bkkeepr blog

This is the development blog for bkkeepr. You can contact us over at Get Satisfaction.



Worldcat, xISBN, and preg_match_all

Some books are still cropping up as the dreaded “Unknown Work by Somebody”. This is because someone has supplied a valid ISBN that doesn’t appear anywhere at Amazon UK or US (in either format, see posts passim).

Now, so far these have all been actual books - the problem is, they’re not in English, so they don’t appear on the English-language Amazon. It is possible to query other Amazons (very easily - you just change the .co.X endpoint of AWS queries) but this is getting a bit much.

So I’ve implemented WorldCat’s xISBN as a last resort (specifically, the API getmetadata call) - it’s a lot more comprehensive than Amazon, and covers all languages. However, its free version is very limited in the number of calls you’re allowed to make, so it only gets called if Amazon has failed to come up with anything.
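The lookup order described above can be sketched roughly like this. It is only an illustration: lookup_book and the two stand-in closures are hypothetical names, not bkkeepr’s actual code, and the real Amazon and xISBN requests are replaced with dummies so the snippet runs on its own.

```php
<?php
// Try each source in order and stop at the first one that knows the ISBN.
// Each source is a callable returning ['title' => ..., 'author' => ...] or null.
function lookup_book(string $isbn, array $sources): array
{
    foreach ($sources as $lookup) {
        $book = $lookup($isbn);
        if ($book !== null) {
            return $book;
        }
    }
    // Nothing matched anywhere: fall back to the placeholder record.
    return ['title' => 'Unknown Work', 'author' => 'Somebody'];
}

// Stand-in for the Amazon lookups: pretend Amazon has never heard of the book.
$amazon = function (string $isbn): ?array {
    return null;
};

// Stand-in for the xISBN lookup, counting calls because the free tier is limited.
$xisbn_calls = 0;
$xisbn = function (string $isbn) use (&$xisbn_calls): ?array {
    $xisbn_calls++;
    return ['title' => 'Droomvader', 'author' => 'Matthew Sharpe'];
};

// xISBN goes last, so it is only reached when everything before it fails.
$book = lookup_book('9059360621', [$amazon, $xisbn]);
```

Putting xISBN at the end of the list is what keeps the call count down: any ISBN that Amazon recognises never reaches it.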

xISBN data doesn’t turn up in nice XML either - a metadata request returns something like this:

<rsp stat="ok"><isbn oclcnum="66720726" form="BA" year="2005" lang="dut" title="Droomvader" author="Matthew Sharpe ; vert. [uit het Engels] Paul van der Lecq." publisher="Cossee" city="Amsterdam">9059360621</isbn></rsp>

So instead of a standard XML parse, I had to use PHP’s preg_match_all and spend an hour struggling with regular expressions to get some code that looks like this:

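// Single-quoted patterns, so the double quotes inside them need no escaping.
preg_match_all('/(title\="(.*?)")/', $book_data, $matches);
$title = $matches[2][0]; // group 2, first match: the text between the quotes
preg_match_all('/(author\="(.*?)")/', $book_data, $matches);
$author = $matches[2][0];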

Which seems to do the job. I’m going to manually change the ‘Unknown Work’s that have appeared so far, and hope this prevents a lot more.
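One thing the snippet above doesn’t cover is a response with no title or author attribute at all, where $matches[2][0] would not exist. A minimal sketch of a more defensive version follows, assuming the response looks like the sample above. xisbn_attribute is a hypothetical helper name, and it uses preg_match rather than preg_match_all since only the first match is needed.

```php
<?php
// Hypothetical helper: pull one attribute value out of an xISBN response,
// or return null if the attribute isn't there.
function xisbn_attribute(string $book_data, string $name): ?string
{
    // \b stops "title" from matching inside a longer attribute name.
    $pattern = '/\b' . preg_quote($name, '/') . '="(.*?)"/';
    if (preg_match($pattern, $book_data, $match) !== 1) {
        return null;
    }
    // Attribute values may contain XML entities such as &amp;.
    return html_entity_decode($match[1], ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// The sample response from earlier in the post.
$book_data = '<rsp stat="ok"><isbn oclcnum="66720726" form="BA" year="2005" lang="dut" title="Droomvader" author="Matthew Sharpe ; vert. [uit het Engels] Paul van der Lecq." publisher="Cossee" city="Amsterdam">9059360621</isbn></rsp>';

// Fall back to the placeholder values when an attribute is missing.
$title  = xisbn_attribute($book_data, 'title') ?? 'Unknown Work';
$author = xisbn_attribute($book_data, 'author') ?? 'Somebody';
```

With the sample response, $title comes out as Droomvader, and a response lacking either attribute falls back to the placeholder instead of raising a notice.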