Saturday, August 14, 2010

Programmers wanted

Just thought I'd let the couple of people who actually look at this blog know that I am looking for programmers to help with Yapeal's development. If you're interested, contact me via the owner info on the project.

Sunday, April 11, 2010

Yapeal, memory, and XML part 3

Hi all, sorry it took so long to get to this post, but while writing the last one I got some ideas and decided to try them out in Yapeal.

OK, let's summarize the other posts here. First, there are several ways to work with XML in PHP. Some work with the whole document at a time, like DOM, SimpleXML, and most versions of XSL. Others work with small pieces, like SAX and XMLReader.

Next, we know some APIs like Account Balance are small, some are large but limited like Wallet Journal, and others can become very large and aren't limited, like Asset List. We also know that as Yapeal now stands it does a lot of converting, copying, and duplicating of the XML, which multiplies the memory used. When Yapeal was first started it mostly did the small or large-but-limited APIs, so memory never became much of a problem and SimpleXML made it just that, simple. Since then Yapeal has gone through a lot of changes, but the logic used to process the XML has largely stayed the same, even after adding some of the largest unlimited APIs. PHP has also gone through some changes in PHP 5.3, and it now shows the memory use of some extensions that was hidden in prior versions. Putting all those things together with my only being able to test with a small number of accounts at a time (fewer than 5, and often only a couple), it's not too surprising that when we did some testing with a few hundred accounts there were a few issues. The biggest one was that it used around 128MB of memory, and there were a few other issues with the ordering of the APIs, which let earlier APIs keep later ones from having a chance to get their data. I'm not going to get into the ordering problems here, but I do plan on covering them in a future post.

One of the great things about SimpleXML, as well as DOM and XSL, is that you have XPath to help you cut small parts out of well-designed XML like a skilled surgeon with a scalpel. But because of some poor design in many of the APIs, IMHO trying to use it is more like trying to hack a piece of meat off a charging wild animal with a dull broadsword without getting trampled. Often in Yapeal, trying to use it ends up doing little more than cutting off the head and maybe getting the rest cut into quarters that still have the skin on them. As I said, XPath can let you get small parts, but in SimpleXML and DOM it does this by making a copy, or at least you end up having to make one yourself to use the result, which when added to the XML design of the larger APIs ends up being a problem. Just to make it clear, I really like XPath and SimpleXML, and most of the larger APIs actually don't have as many of the bad design issues as some of the other APIs, but it's very hard to work with them without using a lot of memory.
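To make the copying point a bit more concrete, here is a minimal sketch of pulling one rowset out of a document with XPath in SimpleXML. The sample XML, the rowset names, and the IDs are made up for illustration and this is not Yapeal's code. Note that the whole tree is already in memory, the XPath result is a separate array of elements, and the plain array built from it is yet another structure.

```php
<?php
// Made-up sample with two rowsets; names and IDs are invented for illustration.
$xml = '<eveapi version="2"><result>'
     . '<rowset name="skills" key="typeID" columns="typeID,level">'
     . '<row typeID="3300" level="5"/><row typeID="3301" level="3"/>'
     . '</rowset>'
     . '<rowset name="certificates" key="certificateID" columns="certificateID">'
     . '<row certificateID="1"/>'
     . '</rowset>'
     . '</result></eveapi>';

// The whole document is parsed into an in-memory tree first.
$doc = simplexml_load_string($xml);

// XPath picks out just the rows of one rowset; the result is a new array
// of SimpleXMLElement objects that lives alongside the tree.
$skillRows = $doc->xpath('//rowset[@name="skills"]/row');

// To use the values elsewhere (for example in a query) they usually get
// copied again into a plain PHP array.
$skills = array();
foreach ($skillRows as $row) {
    $skills[(string) $row['typeID']] = (int) $row['level'];
}

print count($skills) . ' skills extracted' . PHP_EOL;
```

With a handful of rows none of this matters, but the same pattern on a very large rowset keeps the tree, the XPath result, and the final array alive at the same time.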

So, given the above, the question becomes: would one of the other extensions work better in Yapeal? DOM would have the same problems as now but be harder to work with. XSL would work if it used a SAX-type backend, but the one in PHP doesn't, so it's not going to be helpful either. SAX could improve the memory issues, but it's very hard to use, I believe it would make Yapeal unmaintainable, and I don't see converting over to it. That leaves just two extensions, XMLReader and XMLWriter, and only one of them is made for reading XML rather than making it.

Let's look at XMLReader some more. It uses much less memory than SimpleXML, DOM, or XSL because it only deals with the XML in small pieces, and it's easier to use than SAX. So far it sounds like it could be a better fit for Yapeal, but "the truth is in the code", so to speak. To keep this post from getting any longer than needed, I'll just say that after working with XMLReader for almost two weeks now, it does seem to be a better fit for Yapeal. So far, in trying it together with some other changes I've made, I've seen memory use drop to less than half of what it was before.
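For anyone who hasn't used XMLReader, a minimal sketch of the idea looks something like the following. It assumes a made-up rowset-style document (the IDs and quantities are invented) and is only an illustration of the technique, not the code that went into Yapeal. Each row is handled as it streams past and then dropped, so only one row's worth of data is held at a time.

```php
<?php
// Made-up sample shaped like a rowset-style reply; IDs and quantities are invented.
$xml = <<<'XML'
<eveapi version="2">
  <result>
    <rowset name="assets" key="itemID" columns="itemID,typeID,quantity">
      <row itemID="1001" typeID="34" quantity="500"/>
      <row itemID="1002" typeID="35" quantity="20"/>
      <row itemID="1003" typeID="36" quantity="7"/>
    </rowset>
  </result>
</eveapi>
XML;

$reader = new XMLReader();
// A string keeps the sketch self-contained; $reader->open($path) would
// stream from a file instead of holding the whole document in a variable.
$reader->XML($xml);

$rowCount = 0;
$totalQuantity = 0;
while ($reader->read()) {
    // Only react to <row> elements; every other node is skipped.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'row') {
        $row = array();
        // Walk the attributes of the current <row> only.
        while ($reader->moveToNextAttribute()) {
            $row[$reader->name] = $reader->value;
        }
        // Step back from the last attribute to the element that owns it.
        $reader->moveToElement();
        // Use the row here (for example, queue it for an upsert), then let it go.
        $rowCount++;
        $totalQuantity += (int) $row['quantity'];
    }
}
$reader->close();

print "$rowCount rows, total quantity $totalQuantity" . PHP_EOL;
```

The loop never builds a tree, which is where the memory savings come from. The trade-off is that there is no XPath, so the code has to decide for itself which nodes it cares about.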

Now the only other thing that may be a problem, but shouldn't be, is whether most hosting sites include XMLReader. They should, because it has been a standard extension that is included and enabled by default since PHP 5.1.0, but as I found out after releasing Yapeal, some hosting sites don't even have SPL (Standard PHP Library) available. So I'd like to hear from some people to get an idea of how common it is, or what they needed to do to get their host to make it available. For anyone who wants to check for it, try this:
php -r 'print (extension_loaded("xmlreader") ? "YES!" : "NO") . PHP_EOL;'

Other than needing a different extension, this change shouldn't be visible outside of Yapeal itself in any way. Some of the other changes to work around the issues with the order in which Yapeal does the APIs may be more visible, though, as they may require some database changes in the util* tables that may have to be done manually. If they do, I'll write up the instructions with the SQL and post them.

That's it for now, see you down the blog.

Tuesday, March 30, 2010

Yapeal, memory, and XML part 2

In the last blog post I covered some background on the extensions available in PHP to work with XML. In this one I'll be looking at the Eve APIs and how much memory they use in a very general way. I'll also cover how they affect Yapeal's memory usage as a result. Let's get to it.


Most of the Eve APIs return a well-defined amount of data, like your current account balance, or a list of things, like the implants in your current clone. In the case of things like implants or skills the list can vary in length from 0 to some maximum (10 for implants and currently 390 for skills), but there is a limit that can be figured out and planned for. These limits are generally small, and the data takes up little memory no matter which of the XML extensions is used. Next you have APIs like Wallet Journal and Wallet Transactions, which can be considered large but still limited since they return at most 1000 entries at a time. Finally there's stuff like Asset List, Standings, and most of the corporation ones: even if they aren't truly unlimited, their limits are much higher, and for our purposes it will be assumed they can become very large. As an example, recently while working with an application developer who was trying out Yapeal with a list of a few hundred API keys, we saw memory use climb to 128MB or more. That doesn't sound like much, but when added to the memory needed by MySQL, the web server, etc. on a shared host it does become a problem, and it caused Yapeal to run very slowly. Given that CCP is unlikely to change the APIs just for Yapeal, Yapeal needs to be changed so it uses less memory and hopefully lessens the effects caused by a large amount of API data.

Now let's take a quick look at how Yapeal works with the API data. First it looks for a cached copy, either in files on the hard drive or in the DB cache table. If one doesn't exist, or it's past the cachedUntil date found in the copy it has, Yapeal tries to get a fresh copy using cURL from the API server or a configured proxy. The XML is first returned as a string, then converted to a SimpleXML object. With simple rowset-type APIs the data is first converted to an associative array, which is then used to build a SQL 'upsert' (combined insert and on-duplicate update) query. For more complex APIs the XML is converted into multiple arrays and upserted into one or more tables. You'll notice that there are times when up to 3 copies of the same data exist side by side. To make it worse, there are times when there is at least one additional partial copy, caused by limitations in the ability to separate out only the part needed while building the arrays in complex APIs. From the above it is clear that if any of the large or very large APIs are complex, the memory requirement can grow very fast, and in most cases it actually multiplies inside Yapeal.
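The string, then tree, then array, then query flow described above can be sketched roughly like this. Everything here is an assumption for illustration: the helper buildUpsert(), the table name example_rows, and the sample data are hypothetical and are not Yapeal's actual code or schema. The query uses MySQL's INSERT ... ON DUPLICATE KEY UPDATE form with placeholders, and no database connection is needed to run the sketch.

```php
<?php
// Hypothetical helper (not from Yapeal): builds a multi-row MySQL upsert with
// placeholders from a list of associative arrays that share the same keys.
function buildUpsert($table, array $rows)
{
    $columns = array_keys($rows[0]);
    $rowPlaceholder = '(' . implode(',', array_fill(0, count($columns), '?')) . ')';
    $updates = array();
    foreach ($columns as $col) {
        $updates[] = "`$col`=VALUES(`$col`)";
    }
    $sql = "INSERT INTO `$table` (`" . implode('`,`', $columns) . '`) VALUES '
         . implode(',', array_fill(0, count($rows), $rowPlaceholder))
         . ' ON DUPLICATE KEY UPDATE ' . implode(',', $updates);
    // Flatten the row values in the same order as the placeholders.
    $params = array();
    foreach ($rows as $row) {
        foreach ($columns as $col) {
            $params[] = $row[$col];
        }
    }
    return array($sql, $params);
}

// Copy 1: the raw XML string, as it would come back from cURL or the cache.
$xmlString = '<eveapi version="2"><result>'
           . '<rowset name="entries" key="refID" columns="refID,amount">'
           . '<row refID="1" amount="10.50"/><row refID="2" amount="-3.25"/>'
           . '</rowset></result></eveapi>';

// Copy 2: the SimpleXML tree built from that string.
$doc = simplexml_load_string($xmlString);

// Copy 3: the associative arrays built from the tree.
$rows = array();
foreach ($doc->result->rowset->row as $row) {
    $record = array();
    foreach ($row->attributes() as $name => $value) {
        $record[$name] = (string) $value;
    }
    $rows[] = $record;
}

// The query text and parameter list are one more pass over the same data.
list($sql, $params) = buildUpsert('example_rows', $rows);
print $sql . PHP_EOL;
```

At the point where the query is built, the string, the tree, the row arrays, and the parameter list can all exist at once, which is the multiplication effect on the larger APIs.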

OK, now that we understand in general the types of APIs, their memory needs, and some of the bad effects they can have in Yapeal, we have a good basis on which to start looking at which of the XML extensions from part 1 would be the best match for the needs of Yapeal. Since doing so will take a while (read: several more paragraphs), I'll leave that to the third and probably last part in this series.

Monday, March 29, 2010

Yapeal, memory, and XML part 1

OK, as everyone who's been using Yapeal for a while knows, it has become a bit of a memory hog of late. Partly this has been caused by adding some APIs that can be very large, but some of it is because of an early design decision to use SimpleXML. Don't get me wrong, I love SimpleXML and I might not have even started on the project that grew into Yapeal without it, but there are some things about it that are now causing issues. Let me take a little detour into the options available to work with XML in PHP in this post so everyone has some background.

  • DOM - It's been around for a long time and was designed by the same people who did the standards for XML, HTML, etc. It's based on seeing the XML document as a tree of elements with attributes and is considered OOP, but since it wasn't originally made for PHP its object model is somewhat different. It is very powerful and can do many things that none of the other options can, but it's also hard to use, and the tree it builds has to fit entirely in memory, which starts to become a problem with large documents. You can find DOM libraries for most programming languages on most OS platforms, which makes it something everyone should learn and understand if they're going to work with XML a lot.
  • SimpleXML - It was designed as an easier to understand and use alternative to DOM in PHP. It is an OOP design like DOM but a better fit with the object model used in PHP, since it was developed for it. It originally only allowed the document to be read, but writing has since been added and you can use it to make new documents as well. You can do almost everything with it that you can with DOM, but for creating complex documents etc. it's usually better to use DOM instead. You can find implementations of it in other languages now, but it's mostly used in PHP. It uses an in-memory tree like DOM does because they share a common low-level library. Because of this, passing a document between them is easy, and this is sometimes used to fill holes where SimpleXML can't do something that DOM can, or where something is easier in one than in the other.
  • SAX - It's been around longer than DOM has, and instead of viewing XML as a document and making it into a tree, it views it as a stream of pieces and fires 'events' that have to be handled for each of them. It is not OOP and has only a function-based interface to the XML. With SAX you can only go forward through the XML, the programmer has to keep track of where they are in the document, and making sense of all the 'events' is totally up to them. Most programmers find it very difficult to work with, and once DOM became available most of them switched because of this. One big advantage of SAX is that it normally uses very little memory, making it useful where either the target platform has very little or the document is too large to fit in memory. SAX can't be used to make XML documents, only to read them. It is usually available as a library for all platforms and programming languages.
  • XSL - Used to transform XML documents into either another type of XML document or some other type of text output. The stylesheet used to do the transform is itself an XML document. XSL is great for converting XML to HTML or a CSV file but finds little application outside of that. It's based on DOM as well, with many of the same memory issues, though there have been implementations based on SAX that allow for documents larger than memory.
  • XMLReader - I've come to think of it as SimpleXML for SAX. Instead of having to react to a bunch of 'events' with no context, you iterate through a sequence of nodes (objects). Like SAX it has a small memory footprint since it only views the document in small pieces, but because the pieces are objects they help the programmer by providing more information than just "something happened" as in SAX. You can check whether the node is empty or contains something else, whether it has attributes and what their values are, etc. All of these things and others make using it, and keeping track of where you are in the document, much easier for the programmer.
  • XMLWriter - It is a hybrid extension that lets you use either function or OOP access to create XML documents. It's a complement to XMLReader and fits somewhere between SimpleXML and DOM in complexity.
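To give a feel for the SAX style described in the list above, here is a small sketch using PHP's function-based XML Parser extension (xml_parser_create() and friends). The sample document and the goal of summing quantities in one rowset are made up for illustration. Notice that the handlers get no context at all, so the code has to carry its own state to know which rowset it is inside.

```php
<?php
// Made-up sample with two rowsets; only the "assets" rows should be counted.
$xml = '<eveapi version="2"><result>'
     . '<rowset name="assets" key="itemID" columns="itemID,quantity">'
     . '<row itemID="1001" quantity="500"/><row itemID="1002" quantity="20"/>'
     . '</rowset>'
     . '<rowset name="other" key="itemID" columns="itemID,quantity">'
     . '<row itemID="2001" quantity="999"/>'
     . '</rowset>'
     . '</result></eveapi>';

// All bookkeeping is up to the programmer: SAX only reports "a tag opened"
// and "a tag closed".
$state = array('inAssets' => false, 'total' => 0);

$onStart = function ($parser, $name, $attrs) use (&$state) {
    if ($name === 'rowset') {
        $state['inAssets'] = isset($attrs['name']) && $attrs['name'] === 'assets';
    } elseif ($name === 'row' && $state['inAssets']) {
        $state['total'] += (int) $attrs['quantity'];
    }
};
$onEnd = function ($parser, $name) use (&$state) {
    if ($name === 'rowset') {
        $state['inAssets'] = false;
    }
};

$parser = xml_parser_create();
// Case folding is on by default and would upper-case element and attribute names.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($parser, $onStart, $onEnd);

// Feed the document in small chunks, the way it could arrive from a stream.
foreach (str_split($xml, 32) as $chunk) {
    xml_parse($parser, $chunk, false);
}
xml_parse($parser, '', true); // signal the end of the document

print 'Total quantity in assets: ' . $state['total'] . PHP_EOL;
```

Even for something this small the state tracking is noticeable, and it grows quickly once the XML nests more deeply.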
OK, that's the end of this post. In the next one I'll cover the different kinds of Eve APIs and how they differ, mostly in the amount of memory each uses.

Experimenting with Yapeal blog

OK, I've decided to try a blog instead of just using the Eve Yapeal thread and wiki for stuff. I've never done one before or even really followed anyone else's, so there's no telling how much or what will end up here or whether I'll continue to use it, but I had some ideas that I thought might be better served by a blog than by something else.

One thing I was thinking of using it for is as a place to post some of my design ideas and any changes I'm thinking about making to Yapeal. By using a blog, maybe people will be more willing to add feedback in the comments.

I might also use it to post a few tutorials about Yapeal or programming in general.

Anyway, those are some of my ideas, and I'm very open to any other ideas people might have about what they would like to see here.