Tuesday, March 30, 2010

Yapeal, memory, and XML part 2

In the last blog post I covered some background on the extensions available in PHP for working with XML. In this one I'll be looking at the Eve APIs and, in a very general way, how much memory they use. I'll also cover how they affect Yapeal's memory usage as a result. Let's get to it.


Most of the Eve APIs return a well-defined amount of data, like your current account balance, or a list of things, like the implants in your current clone. In the case of things like implants or skills the list can vary in length from 0 to some maximum number (10 for implants and currently 390 for skills), but there is a limit that can be figured out and planned for. These limits are generally small and take up little memory no matter which of the XML extensions is used. Next you have APIs like Wallet Journals and Transactions, which can be considered large but still limited since they return at most 1000 entries at a time. Finally there's stuff like Asset List, Standings, and most of the corporation ones where, even if they aren't truly unlimited, the limits are much higher, and for our purposes it will be assumed they can become very large. As an example, recently while working with an application developer trying out Yapeal who had a list of a few hundred API keys, we saw memory use climb to 128MB or more. That doesn't sound like much, but when added to the memory needed by MySQL, the web server, etc. on a shared host it does become a problem, and it caused Yapeal to run very slowly. Given that CCP is unlikely to change the APIs just for Yapeal, Yapeal needs to be changed so it uses less memory and hopefully lessens the effects caused by a large amount of API data.

Now let's take a quick look at how Yapeal works with the API data. First it looks for a cached copy, either in files on the hard drive or in the DB cache table. If there isn't one, or it's past the cachedUntil date found in the copy it has, it tries to get a fresh copy using cURL from the API server or a configured proxy. The XML is first returned as a string and then converted to a simpleXML object. With simple rowset type APIs the data is first converted to an associative array, which is then used to build a SQL 'upsert' (combined insert, or update on duplicate key) query. For more complex APIs the XML is converted into multiple arrays and upserted to one or more tables. You'll notice that there are times when up to 3 copies of the same data exist side by side: the string, the simpleXML object, and the array. To make it worse, there are times when there is at least one more partial copy, caused by limitations in the ability to separate out only the part needed while building the arrays in complex APIs. From the above it is clear that if any of the large or very large APIs are complex the memory requirement can grow very fast, and in most cases it actually multiplies inside Yapeal.
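To make that flow a bit more concrete, here is a minimal sketch of the simple rowset case. It is not Yapeal's actual code: the sample XML, the table name `exampleImplants`, and the helper functions `isCacheExpired()` and `buildUpsert()` are all made up for illustration, and the query uses MySQL's `INSERT ... ON DUPLICATE KEY UPDATE` form as one way to do an upsert.

```php
<?php
// Hypothetical sample in the general shape of a rowset style API reply.
// Element names, attribute names, and values are assumptions for illustration.
$xml = <<<XML
<eveapi version="2">
  <currentTime>2010-03-30 12:00:00</currentTime>
  <result>
    <rowset name="implants" key="typeID" columns="typeID,typeName">
      <row typeID="101" typeName="Example Implant A"/>
      <row typeID="102" typeName="Example Implant B"/>
    </rowset>
  </result>
  <cachedUntil>2010-03-30 13:00:00</cachedUntil>
</eveapi>
XML;

// Hypothetical helper: true once $now is past the cachedUntil timestamp.
function isCacheExpired($cachedUntil, $now)
{
    return strtotime($cachedUntil . ' UTC') < strtotime($now . ' UTC');
}

// Hypothetical helper: builds one multi-row upsert with '?' placeholders.
// Returns array($sql, $params); assumes every row has the same keys.
function buildUpsert($table, array $rows)
{
    $columns = array_keys($rows[0]);
    $group = '(' . implode(',', array_fill(0, count($columns), '?')) . ')';
    $updates = array();
    foreach ($columns as $column) {
        $updates[] = "`$column`=VALUES(`$column`)";
    }
    $sql = "INSERT INTO `$table` (`" . implode('`,`', $columns) . '`) VALUES '
        . implode(',', array_fill(0, count($rows), $group))
        . ' ON DUPLICATE KEY UPDATE ' . implode(',', $updates);
    $params = array();
    foreach ($rows as $row) {
        foreach ($columns as $column) {
            $params[] = $row[$column];
        }
    }
    return array($sql, $params);
}

// Copy 1 is the string in $xml. Copy 2 is the simpleXML tree built from it.
$doc = simplexml_load_string($xml);
$cachedUntil = (string) $doc->cachedUntil;

// Copy 3: one associative array per <row>, keyed by attribute name.
$rows = array();
foreach ($doc->result->rowset->row as $row) {
    $record = array();
    foreach ($row->attributes() as $name => $value) {
        $record[$name] = (string) $value;
    }
    $rows[] = $record;
}

list($sql, $params) = buildUpsert('exampleImplants', $rows);
echo $sql, PHP_EOL;
```

Until `$xml` and `$doc` are unset, the string, the tree, and the array all sit in memory together, which is the three-copies situation described above. The placeholders in `$sql` are meant to be bound to `$params` by whatever database layer is in use.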

Ok, now that we understand in general the types of APIs, their memory needs, and some of the bad effects they can have in Yapeal, we have a good basis from which to start looking at which of the XML extensions from part 1 would best match the needs of Yapeal. Since doing so will take a while (read: several more paragraphs) I'll leave that to the third and probably last part in this series.

Monday, March 29, 2010

Yapeal, memory, and XML part 1

Ok, as everyone that's been using Yapeal for a while knows, it has become a bit of a memory hog of late. Partly this has been caused by adding some APIs that can be very large, but some of it has been because of an early design decision to use simpleXML. Don't get me wrong, I love simpleXML and I might not have even started on the project that grew into Yapeal without it, but there are some things about it that are now causing issues. Let me take a little detour in this post into the options available for working with XML in PHP so everyone has some background.

  • DOM - It's been around for a long time and was designed by the same people that did the standards for XML, HTML, etc. It's based on seeing the XML document as a tree of elements with attributes and is considered OOP, but since it wasn't originally made for PHP its object model is somewhat different. It is very powerful and there are many things it can do that none of the other options can, but it's also hard to use, and the tree it builds has to all fit in memory, which starts to become a problem with large documents. You can find DOM libraries for most programming languages on most OS platforms, which makes it something everyone should learn, understand, and know how to use if they are going to work with XML a lot.
  • simpleXML - It was designed as an easier to understand and use alternative to DOM in PHP. It is an OOP design like the DOM but a better fit with the object model used in PHP since it was developed for it. It originally only allowed the document to be read, but writing has since been added and you can use it to make new documents as well. You can do almost everything with it that you can with DOM, but for creating complex documents etc. it's usually better to use DOM instead. You can find implementations of it in other languages now but it's mostly used in PHP. It uses an in-memory tree like DOM does because they share a common low level library. Because of this, passing a document between them is easy, and that is sometimes used to fill holes where simpleXML can't do stuff that DOM can, or where something is easier in DOM, or the other way around.
  • SAX - It's been around longer than the DOM has, and instead of viewing XML as a document and making it into a tree it views it as a bunch of pieces, or a stream of them, and fires 'events' for each of them that have to be handled. It is not OOP and has only a function based interface to the XML. With SAX you can only go forward through the XML, the programmer has to keep track of where they are in the document, and making sense out of all the 'events' is totally up to them. Most programmers find it very difficult to work with, and once DOM became available most of them switched because of this. One big advantage of SAX is that it normally uses very little memory, making it useful where either the target platform has very little or the document is very large and can't fit in memory. SAX can't be used to make XML documents, only read them. It is usually available as a library for all platforms and programming languages.
  • XSL - It's used to transform XML documents into either another type of XML document or some other type of text output. The stylesheet used to do the transform is itself an XML document. XSL is great for converting XML to HTML or a CSV file but finds little application outside of that. It's based on DOM as well, with many of the same memory issues, though there have been implementations based on SAX that allow for documents larger than memory.
  • XMLReader - How I've come to think of it is as simpleXML for SAX. Instead of having to react to a bunch of 'events' with no context, you iterate through a sequence of nodes (objects). Like SAX it has a small memory footprint since it only views the document in small pieces, but because they are objects they help the programmer by providing more information than just that something happened, like in SAX. You can check if the node is empty or contains something else, if it has attributes and what their values are, etc. All of these things and others make using it, and keeping track of where you are in the document as a programmer, much easier.
  • XMLWriter - It is a hybrid extension that lets you use either function or OOP access for creating XML documents. It's a complement to XMLReader and fits somewhere between simpleXML and the DOM in complexity.
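
To give a feel for the XMLReader style of working, here is a minimal sketch that walks a small rowset one node at a time and hands each row to a callback, so no tree of the whole document is ever built. The sample XML and the `streamRows()` helper are made up for illustration and are not part of Yapeal; only the `XMLReader` calls themselves come from the PHP extension.

```php
<?php
// Hypothetical sample in the general shape of a rowset; names and values
// are assumptions for illustration only.
$xml = <<<XML
<eveapi version="2">
  <result>
    <rowset name="assets" key="itemID" columns="itemID,typeID,quantity">
      <row itemID="1" typeID="34" quantity="1000"/>
      <row itemID="2" typeID="35" quantity="250"/>
      <row itemID="3" typeID="36" quantity="10"/>
    </rowset>
  </result>
</eveapi>
XML;

// Hypothetical helper: calls $callback with one associative array per <row>
// element and returns how many rows were seen.
function streamRows($xml, $callback)
{
    $reader = new XMLReader();
    $reader->XML($xml);
    $count = 0;
    // read() moves forward one node at a time; there is no going back.
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'row') {
            $record = array();
            if ($reader->hasAttributes) {
                // Step through the attributes of the current element.
                while ($reader->moveToNextAttribute()) {
                    $record[$reader->name] = $reader->value;
                }
                // Return to the element that owns the attributes.
                $reader->moveToElement();
            }
            call_user_func($callback, $record);
            ++$count;
        }
    }
    $reader->close();
    return $count;
}

// Usage: add up the quantities while holding only one row at a time.
$total = 0;
$rowCount = streamRows($xml, function (array $record) use (&$total) {
    $total += (int) $record['quantity'];
});
echo $rowCount, ' rows, total quantity ', $total, PHP_EOL;
```

In this sketch the whole document is still held as a string in `$xml`; pointing the reader at a file or URI with `XMLReader::open()` instead would avoid keeping even that copy around.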
Ok, that's the end of this post. In the next one I'll cover the different kinds of Eve APIs and how they differ, mostly in the amount of memory each uses.

Experimenting with Yapeal blog

Ok, I've decided to try a blog instead of just using the Eve Yapeal thread and wiki for stuff. I've never done one before or even really followed anyone else's, so there's no telling how much or what will end up here, or if I'll continue to use it, but I had some ideas that I thought might be better served by a blog than by something else.

One thing I was thinking of using it for is as a place to post some of my design ideas and any changes I'm thinking about making to Yapeal. By using a blog maybe people will be more willing to add feedback in the comments.

I might also use it to post a few tutorials about Yapeal or programming in general.

Anyway, those are some of my ideas, and I'm very open to any other ideas people might have about what they would like to see here.