Experimenting with Web Game Development
  • Switched to IMMIX for Internal Garbage Collection

    I did a little bit of profiling on the iPhone and found a bit too much time was spent doing garbage collection. The hxcpp runtime has two modes – “Boehm GC with explicit statics” and “internal”. The former is from a standard and robust code base, while the latter uses built-in code with explicit marking. I added the second mode because Boehm GC was just too slow on the iPhone – not sure why, because it is pretty good on the other platforms (maybe I missed a configuration option).

    The internal GC has some restrictions that make it mainly suitable for games. First, no stack searching is done, so the collection must be triggered explicitly; this is most easily done once per frame. Second, it is not thread safe, which can be worked around. Within these confines, many different schemes can be tried. My first attempt could probably be termed “Naive Mark and Sweep”, and used free lists. On Windows/Mac this underperformed Boehm GC, but on the iPhone it worked better.
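
    For illustration, here is a minimal sketch of a naive mark-and-sweep heap with a free list and explicit roots. It is not the hxcpp code: `Obj`, `NaiveHeap`, `addRoot` and `collect` are hypothetical names, and every cell has the same shape to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// All names here are hypothetical; this is not the hxcpp source.
struct Obj {
    bool marked = false;      // set during the mark phase
    bool inUse = false;       // false while the cell sits on the free list
    std::vector<Obj*> refs;   // outgoing references, followed explicitly
};

class NaiveHeap {
public:
    NaiveHeap() = default;
    NaiveHeap(const NaiveHeap&) = delete;
    NaiveHeap& operator=(const NaiveHeap&) = delete;
    ~NaiveHeap() { for (Obj* o : cells_) delete o; }

    // Reuse a cell from the free list if possible, else grow the heap.
    Obj* alloc() {
        Obj* o;
        if (!freeList_.empty()) {
            o = freeList_.back();
            freeList_.pop_back();
        } else {
            o = new Obj();
            cells_.push_back(o);
        }
        o->inUse = true;
        return o;
    }

    // Roots are registered explicitly; the stack is never scanned.
    void addRoot(Obj* root) {
        assert(root != nullptr);
        roots_.push_back(root);
    }

    // Intended to be called once per frame, at a point where no live
    // object is reachable only from a local variable.
    // Returns the number of cells reclaimed.
    std::size_t collect() {
        std::vector<Obj*> work(roots_.begin(), roots_.end());
        while (!work.empty()) {              // mark phase
            Obj* o = work.back();
            work.pop_back();
            if (o->marked) continue;
            o->marked = true;
            for (Obj* r : o->refs) {
                if (r != nullptr) work.push_back(r);
            }
        }
        std::size_t freed = 0;
        for (Obj* o : cells_) {              // sweep phase
            if (o->inUse && !o->marked) {
                o->inUse = false;
                o->refs.clear();
                freeList_.push_back(o);
                ++freed;
            }
            o->marked = false;               // reset for the next cycle
        }
        return freed;
    }

private:
    std::vector<Obj*> cells_;     // every cell ever allocated
    std::vector<Obj*> freeList_;  // cells available for reuse
    std::vector<Obj*> roots_;     // explicit roots
};
```

    A game loop would call `collect()` at the end of each frame. Anything allocated during the frame and not reachable from a registered root goes back on the free list.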

    The current scheme is now “Simplified IMMIX”. It is simplified because it is single threaded, and I have not implemented overflow allocation, defragmentation (although there are hooks in there for moving) or any generational stuff. I think overflow allocation should be easy enough, and defrag should not be too hard in some form or other. The insertion of write barriers for generational control may also be straightforward using the “operator =”. I may also change the code generation to separate stack variables (locals, function args) from member variables, since in the current scheme stack variables never form roots and therefore would not need to use write barriers.
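
    The write-barrier idea could look roughly like the sketch below, under some loud assumptions: `Object`, `Member`, `rememberedSet` and the `old` flag are all made-up names, and hxcpp's real member representation is not shown in this post. Member slots are wrapped in a type whose `operator=` notes when an old object starts pointing at a young one, while plain local pointers skip the barrier entirely.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical base class; `old` marks objects promoted to the old generation.
struct Object {
    bool old = false;
    virtual ~Object() = default;
};

// Old objects that may hold pointers to young ones (hypothetical helper).
inline std::unordered_set<Object*>& rememberedSet() {
    static std::unordered_set<Object*> set;
    return set;
}

// A member slot that runs a write barrier on every pointer assignment.
template <typename T>
class Member {
public:
    explicit Member(Object* owner) : owner_(owner) { assert(owner != nullptr); }

    // Copying would carry the wrong owner along, so it is disabled.
    Member(const Member&) = delete;
    Member& operator=(const Member&) = delete;

    Member& operator=(T* value) {
        // Barrier: remember an old owner that now refers to a young object.
        if (value != nullptr && owner_->old && !value->old) {
            rememberedSet().insert(owner_);
        }
        ptr_ = value;
        return *this;
    }

    T* get() const { return ptr_; }

private:
    Object* owner_;     // the object this slot lives in
    T* ptr_ = nullptr;  // the stored reference
};

struct Node : Object {
    Member<Node> next{this};  // member variable: assignments hit the barrier
};
```

    A local such as `Node* tmp = n.next.get();` is an ordinary pointer, so assigning to it costs nothing extra. That is the saving from treating stack variables separately.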

    Anyhow, on the “Physaxe” test, which creates lots of small list objects per frame, the Naive GC got about 51fps, Boehm GC got about 65fps and IMMIX got about 69fps – so a bit of a win there. For this test, I triggered all collections exactly once per frame. The difference between Naive and IMMIX is significant, and this performance gain also translates to the iPhone, which is good news.

    Since the internal scheme is precise, I feel it should be able to outperform Boehm GC by a bit more, and maybe the extra could come from a generational system. The code is actually not that complex (1 cpp file, 1 header file) so any budding GC researchers may want to see what they can do.

    Currently, the internal GC is the default only for the iPhone, but you can try it on other platforms by changing the #define in hxGCInternal.h. The reason for this is the restrictions mentioned above – the easiest way to conform to these restrictions is to enable the “Collect Every Frame” in neash.Lib. To remove these restrictions, I will need to find some way of stopping the world (safe points?) and some way of capturing the stack (code mods to allow objects to push themselves on a shadow stack?), both of which are very doable, although I’m not sure of the effect on performance.
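
    One possible shape for the shadow-stack idea is sketched below. It is only a guess at how it might be done: `StackRoot`, `shadowStack` and `markFromShadowStack` are hypothetical names, and the sketch is single threaded. Each local reference registers its own address on entry to a scope and removes it on exit, so the collector can find stack roots without scanning the real stack.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Object {
    bool marked = false;
};

// Addresses of the local pointer variables currently in scope (hypothetical).
inline std::vector<Object**>& shadowStack() {
    static std::vector<Object**> stack;
    return stack;
}

// RAII guard: pushes a local's address on construction, pops it on scope exit.
class StackRoot {
public:
    explicit StackRoot(Object*& slot) { shadowStack().push_back(&slot); }
    ~StackRoot() {
        assert(!shadowStack().empty());
        shadowStack().pop_back();
    }
    StackRoot(const StackRoot&) = delete;
    StackRoot& operator=(const StackRoot&) = delete;
};

// Marks every object referenced from a registered local.
// Returns how many objects were newly marked.
inline std::size_t markFromShadowStack() {
    std::size_t newlyMarked = 0;
    for (Object** slot : shadowStack()) {
        Object* target = *slot;
        if (target != nullptr && !target->marked) {
            target->marked = true;
            ++newlyMarked;
        }
    }
    return newlyMarked;
}
```

    The cost is a push and a pop for every rooted local, which is where the performance worry comes from.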

  • Haxe on the iPhone – For Real!

    To progress this project a bit further, I needed a real device – so I convinced the little woman that an iPod touch would be a good thing to have around. She seems to have taken to it, so now I’m thinking I may need one each :) .

    After much phaffing about, I’ve finally managed to get stuff running on the actual device. I had to comment out quite a bit of NME, since I only used the base SDL, not all the extras. Boehm GC was also a bit tricky because I didn’t really know what I was doing, but I brought in some bits from the mono project and then disabled USE_MUNMAP because it caused it to crash. In the end, it seems to work – no crash, but then I may not have been running it long enough. I will have to try some memory thrashing later.

    One thing I found with Xcode is that if you ever change the project name/AppID settings then you really need to clean the project, exit Xcode and get back in. But the hardest part was working out where to go to get the developer certificate! I guess I’m a bit thick, or missed the memo, but it took me ages to get to the web form to create a certificate.

    So the big question is performance. In this demo, initially, it runs at about 2.5 frames a second (I don’t have a fps counter yet), but slows a bit later when things spread out. But this is using the high-quality, anti-aliasing software renderer. Next job is to hook up the OpenGLES renderer, then I’ll really know where I stand. But overall, a pretty positive result I think.

  • HXCPP 0.2 – Huge performance increase.

    I have switched hxcpp over from using ref-counting to using Boehm garbage collection. I have also added some additional performance improvements, such as integer-index field names to make interaction with neko more efficient.

    The overall result is that for the Physaxe demo, the frame rate went from 24 fps to 82 fps (in opengl mode). The swf file runs at about 35 fps, and neko at about 8 fps. This is about what I was hoping for from the first round, but I got there in the end.

    You can download the updated files here.

  • Boehm GC, virtual inheritance and finalizers.

    I’m trying to get a speedup for the cpp backend for haxe by using garbage collection. Initial results are very promising – potentially about twice as fast. However, I spent a good few hours getting to the bottom of a little problem. Boehm garbage collection is a very impressive piece of work – it has all sorts of magic that does magical things, such as deal with virtual inheritance. This was a bit of a surprise because you do not always get “real” pointers: when you store an object pointer, you may get one with an offset. However, it seemed to work. Until I added finalizers to the external draw objects used by the renderer. Apparently, you can only add finalizers to the “real” pointers (ie, those returned from “GC_MALLOC” et al), rather than a pointer to the same object related by virtual inheritance. The symptom was that the object gets finalized in the first “gc_collect”, even though it was still “used” as far as I was concerned. I guess this is not too surprising, and the fix was pretty easy, but the fact that everything else worked so well lulled me into not suspecting this initially.
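
    To make the pointer issue concrete, here is a small stand-alone sketch. It does not use Boehm GC at all: `std::malloc` stands in for “GC_MALLOC”, and `Base`, `Drawable`, `makeDrawable` and `allocationStart` are hypothetical names. With virtual inheritance, a pointer to a base subobject typically does not equal the address the allocator returned. In standard C++, `dynamic_cast<void*>` on a polymorphic pointer yields the start of the complete object, which is one way (not necessarily the fix used here) to get back to the “real” pointer.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// A small diamond with a virtual base, similar in spirit to a draw object.
struct Base {
    virtual ~Base() = default;
    int id = 0;
};
struct Left : virtual Base { int left = 1; };
struct Right : virtual Base { int right = 2; };
struct Drawable : Left, Right { int extra = 3; };

// std::malloc stands in for the collector's allocator in this sketch.
inline Drawable* makeDrawable() {
    void* raw = std::malloc(sizeof(Drawable));
    assert(raw != nullptr);
    return new (raw) Drawable();  // placement new: the object starts at raw
}

// Recover the start of the complete object from any polymorphic pointer
// into it. This is the address that was originally allocated.
template <typename T>
void* allocationStart(T* p) {
    return dynamic_cast<void*>(p);
}

inline void destroyDrawable(Drawable* d) {
    void* raw = allocationStart(d);
    d->~Drawable();
    std::free(raw);
}
```

    Given `Base* b = makeDrawable();`, the value of `b` is usually offset from the allocation, while `allocationStart(b)` returns the allocated address. That is the kind of pointer a finalizer would need to be registered against.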