Tuesday, March 30, 2010

Yapeal, memory, and XML part 2

In the last blog post I covered some background on the extensions available in PHP for working with XML. In this one I'll be looking at the Eve APIs and, in a very general way, how much memory they use. I'll also cover how they affect Yapeal's memory usage as a result. Let's get to it.


Most of the Eve APIs return a well-defined amount of data, such as your current account balance, or a list of things, such as the implants in your current clone. For lists like implants or skills, the length can vary from 0 to some maximum (10 for implants and currently 390 for skills), but there is a limit that can be figured out and planned for. These limits are generally small, and the data takes up little memory no matter which of the XML extensions is used. Next you have APIs like Wallet Journals and Transactions, which can be considered large but still limited since they return at most 1000 entries at a time. Finally there are APIs like Asset List, Standings, and most of the corporation ones where, even if they aren't truly unlimited, the limits are much higher, and for our purposes it will be assumed they can become very large. As an example, recently while working with an application developer who was trying out Yapeal with a list of a few hundred API keys, we saw memory use climb to 128MB or more. That doesn't sound like much, but when added to the memory needed by MySQL, the web server, etc. on a shared host it does become a problem, and it caused Yapeal to run very slowly. Given that CCP is unlikely to change the APIs just for Yapeal, Yapeal needs to be changed so that it uses less memory and, hopefully, lessens the effects caused by a large amount of API data.

Now let's take a quick look at how Yapeal works with the API data. First it looks for a cached copy, either in files on the hard drive or in a DB cache table. If no cached copy exists, or the one it has is past its cachedUntil date, Yapeal tries to get a fresh copy using cURL, either from the API server or from a configured proxy. The XML is first returned as a string and then converted to a SimpleXML object. With simple rowset-type APIs the data is first converted to an associative array, which is then used to build an SQL 'upsert' (a combined insert with on-duplicate update) query. For more complex APIs the XML is converted into multiple arrays and upserted to one or more tables. You'll notice that there are times when up to 3 copies of the same data exist side by side. To make it worse, there are times when there is at least one other partial copy, caused by limitations in the ability to separate out only the part needed while building the arrays for complex APIs. From the above it is clear that if any of the large or very large APIs are complex, the memory requirement can grow very fast, and in most cases it actually multiplies inside Yapeal.
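
To make the copies easier to see, here is a rough sketch of the simple rowset case. It is written in Python purely for illustration and is not Yapeal's actual PHP code. The sample XML, the table name, and the function names rowset_to_rows and build_upsert are all made up for the example, and the XML is only a simplified guess at what a rowset response looks like.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified rowset response. In the flow described above this
# string would come from the cache or from cURL. It is one copy of the data.
SAMPLE_XML = """<eveapi version="2">
  <currentTime>2010-03-30 12:00:00</currentTime>
  <result>
    <rowset name="implants" key="typeID" columns="typeID,typeName">
      <row typeID="1" typeName="Example Implant A" />
      <row typeID="2" typeName="Example Implant B" />
    </rowset>
  </result>
  <cachedUntil>2010-03-30 13:00:00</cachedUntil>
</eveapi>"""


def rowset_to_rows(xml_text):
    """Parse the whole document, then copy every <row> into a dict."""
    # The parsed tree is another copy of the data, held alongside the string.
    root = ET.fromstring(xml_text)
    # The list of dicts is yet another copy (the "associative array" step).
    return [dict(row.attrib) for row in root.iter("row")]


def build_upsert(table, rows):
    """Build one MySQL-style upsert statement plus its flat parameter list."""
    if not rows:
        return None, []
    columns = list(rows[0].keys())
    # One "(%s, %s, ...)" group per row; values are passed separately.
    row_sql = "(" + ", ".join(["%s"] * len(columns)) + ")"
    updates = ", ".join("{0} = VALUES({0})".format(c) for c in columns)
    sql = "INSERT INTO {} ({}) VALUES {} ON DUPLICATE KEY UPDATE {}".format(
        table,
        ", ".join(columns),
        ", ".join([row_sql] * len(rows)),
        updates,
    )
    # The parameter list duplicates the values once more for the query.
    params = [row[c] for row in rows for c in columns]
    return sql, params


rows = rowset_to_rows(SAMPLE_XML)
sql, params = build_upsert("implants", rows)
```

With two rows this is trivial, but every step keeps its input alive while it builds its output, which is the same side-by-side pattern described above and is what hurts once a rowset holds many thousands of rows.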

OK, now that we understand in general the types of APIs, their memory needs, and some of the bad effects they can have in Yapeal, we have a good basis from which to start looking at which of the XML extensions from part 1 would best match Yapeal's needs. Since doing so will take a while (read: several more paragraphs), I'll leave that to the third and probably last part of this series.

3 comments:

  1. It seems XMLReader is the best way to go in terms of memory management. And instead of loading the entire XML into arrays, isn't it possible to insert into MySQL directly from the XML?

    Another thing you could do instead of compiling huge arrays is to use multiple MySQL queries, or just append query parts to a string (which might save you memory over an array).

  2. I think for the really important APIs (with a huge memory footprint) XMLReader is pretty much the way to go. I'd cache every file to the hard drive (and remove the MySQL caching) and open it with XMLReader from there, so the file never has to be completely in memory.
    I just did a test with very simple XML files (going through them and reading out attributes). For 10MiB and 100MiB files I got less than 1 MB of memory used (I didn't save the attributes into an array, so this is just the pure XML reading).

  3. First, sorry about not getting your comments posted sooner, but I'm still learning how everything works and didn't see them until today :P

    Thanks for the comments. I've been looking at some of the very same things and trying some of them out in a branch to see if there are any side effects or other things that would cause problems. I'm not going to go into everything I've tried or decided to do here, but I'll put it in the 3rd part, which I'm going to try to have out sometime later this week.
