Monday, March 29, 2010

Yapeal, memory, and XML part 1

OK, as everyone who has been using Yapeal for a while knows, it has become a bit of a memory hog of late. Partly this has been caused by adding some APIs that can be very large, but some of it comes from an early design decision to use SimpleXML. Don't get me wrong, I love SimpleXML, and I might not have even started on the project that grew into Yapeal without it, but there are some things about it that are now causing issues. Let me take a little detour in this post into the options available for working with XML in PHP so everyone has some background.

  • DOM - It has been around for a long time and was designed by the W3C, the same people who did the standards for XML, HTML, etc. It's based on seeing the XML document as a tree of elements with attributes and is considered OOP, but since it wasn't originally made for PHP its object model is somewhat different. It is very powerful and can do many things that none of the other options can, but it's also hard to use, and the tree it builds has to fit entirely in memory, which starts to become a problem with large documents. You can find DOM libraries for most programming languages on most OS platforms, which makes it something everyone should learn and understand if they are going to work with XML a lot.
  • SimpleXML - It was designed as an easier to understand and use alternative to DOM in PHP. It is an OOP design like the DOM but a better fit with the object model used in PHP, since it was developed for it. It originally only allowed the document to be read, but writing has since been added and you can use it to make new documents as well. You can do almost everything with it that you can with DOM, but for creating complex documents it's usually better to use DOM instead. You can find implementations of it in other languages now, but it's mostly used in PHP. It uses an in-memory tree like DOM does because they share a common low-level library. Because of this, passing a document between them is easy, and that is sometimes used to fill the holes where SimpleXML can't do something that DOM can, or where one of them simply makes the job easier than the other.
  • SAX - It's been around longer than the DOM has. Instead of viewing XML as a document and making it into a tree, it views it as a stream of pieces and fires 'events' that have to be handled for each of them. In PHP it is not OOP; it has only a function-based interface to the XML. With SAX you can only go forward through the XML, the programmer has to keep track of where they are in the document, and making sense of all the 'events' is totally up to them. Most programmers find it very difficult to work with, and once DOM became available most of them switched because of this. One big advantage of SAX is that it normally uses very little memory, which makes it useful where either the target platform has very little memory or the document is very large and can't fit in it. SAX can't be used to make XML documents, only to read them. It is usually available as a library for all platforms and programming languages.
  • XSL - It is used to transform XML documents into either another type of XML document or some other type of text output. The stylesheet used to do the transform is itself an XML document. XSL is great for converting XML to HTML or a CSV file but finds little application outside of that kind of conversion. It's based on DOM as well and has many of the same issues with memory, though there have been implementations based on SAX that allow for documents larger than memory.
  • XMLReader - I've come to think of it as SimpleXML for SAX. Instead of having to react to a bunch of 'events' with no context, you iterate through a sequence of nodes (objects). Like SAX it has a small memory footprint, since it only views the document in small pieces, but because the pieces are objects they help the programmer by providing more information than SAX's bare "something happened". You can check if the node is empty or contains something else, if it has attributes and what their values are, etc. All of these things and others make using it, and keeping track of where you are in the document, much easier for the programmer.
  • XMLWriter - It is a hybrid extension that lets you use either function or OOP access for creating XML documents. It's a complement to XMLReader and fits somewhere between SimpleXML and the DOM in complexity.
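
To make the tree versus stream difference a bit more concrete, here's a minimal sketch. It's in Python rather than PHP, using what I'd consider the closest standard library equivalents: minidom for the DOM style, xml.sax for SAX, and ElementTree.iterparse standing in for an XMLReader-style pull parser. The sample XML and all the function names are made up for illustration and aren't taken from Yapeal or the Eve API.

```python
import io
import xml.sax
from xml.dom import minidom
from xml.etree import ElementTree

# Hypothetical sample document; not real API output.
SAMPLE = (
    '<result><rowset name="wallet">'
    '<row id="1" amount="10"/>'
    '<row id="2" amount="25"/>'
    '<row id="3" amount="7"/>'
    '</rowset></result>'
)


def total_with_dom(xml_text):
    """Tree style: the whole document is parsed into memory first."""
    doc = minidom.parseString(xml_text)
    rows = doc.getElementsByTagName("row")
    return sum(int(row.getAttribute("amount")) for row in rows)


class RowTotalHandler(xml.sax.ContentHandler):
    """Event style: the parser calls us and we keep our own state."""

    def __init__(self):
        super().__init__()
        self.total = 0

    def startElement(self, name, attrs):
        # Fired once per opening tag; we only care about <row>.
        if name == "row":
            self.total += int(attrs["amount"])


def total_with_sax(xml_text):
    handler = RowTotalHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.total


def total_with_pull(xml_text):
    """Pull style: we iterate over nodes and discard each one when done."""
    total = 0
    stream = io.BytesIO(xml_text.encode("utf-8"))
    for _event, elem in ElementTree.iterparse(stream, events=("end",)):
        if elem.tag == "row":
            total += int(elem.get("amount"))
        elem.clear()  # drop the element's contents once we are done with it
    return total


if __name__ == "__main__":
    print(total_with_dom(SAMPLE), total_with_sax(SAMPLE), total_with_pull(SAMPLE))
```

All three functions come up with the same total. What differs is how much of the document has to sit in memory at once and how much bookkeeping is left to the programmer.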
OK, that's the end of this post. In the next one I'll cover the different kinds of Eve APIs and how they differ, mostly in the amount of memory each uses.
