-
Adding some type info to haXe
There are two good things about open source software; it’s open, and there’s source. So I had a hack with haXe and produced haxe-typed.exe, as a replacement for haxe.exe. This executable adds strong typing to strings, ints, floats and bools when they are local variables. It does not add types to function parameters and return types.
The results were promising:
AS3 haXe haXe typed CarDemo 7.6 ms 38.6 ms 38.9 ms SimpleLoop – for 115 ms 1218 ms 74 ms SimpleLoop – while 115 ms 1020 ms 76 ms PerfTest 1 – string manipulation, join 238 ms 383 ms 720 ms PerfTest 2 – string sort 620 ms 1284 ms 195 ms PerfTest 3 – string addition 125 ms 138 ms 130 ms PerfTest 4 – bit string manipulation 125 ms 134 ms 125 ms PerfTest 5 – floating point ops 2 ms 24 ms 2 ms PerfTest 6 – substr 2 ms 7 ms 7 ms PerfTest 7 – string indexOf 58 ms 75 ms 75 ms PerfTest 8 – Math.round/random 6 ms 38 ms 24 ms PerfTest 9 – integer addition 31 ms 343 ms 27 ms PerfTest 10 – string function calls 34 ms 103 ms 104 ms PerfTest 11 – MD5 68 ms 163 ms 171 ms PerfTest 12 – integer function calls 1 ms 14 ms 9 ms Here we see that where the processing is local, and in a tight loop, huge performance gains can be achieved. When functions calls are involved, there are no gains – sometimes it is even slower (presumably because of the conversion of specific type to generic type). Unfortunately, CarDemo makes extensive use of properties and object orientation nad therefore does not benefit at all from these optimisations.
HaXe even outperforms in a few tests – the integer loop and the sort function. My initial guess is haXe’s use of local variables for constants, rather than integer constants is what helps this loop, and I have no idea about the sort function. Perhaps there is a bug somewhere.
Implementation Use the above haxe-typed.exe at your own risk. There it was more hacking than science used to produce it. Also, there is a bunch of text-out that probably means nothing – this is basically my attempt at understanding what was going on. If you are after the source-code changes just let me know. The changes are probably not very nice to look at, or done very well – but at least I got it compiling.
The exe is not very widely tested (only on these 3 programmes !). When there is a type mismatch, the flash player throws an exception and the execution stops – often leaving you looking at a white screen. Use the “player/debug/SAFlashPlayer.exe”, provided with the flex SDK, to see these exceptions. More often than not, they are something like “Int and * can not be reconciled”. This means there is a bug in bytecode created by the modified haXe. The process of fixing the exe was to basically track these down one by one.
The code also relies on expressions having the correct type. This is generally the case with haXe, except for the case of Floats with constant ints, eg: “var f:Float = 2″. I had to change the source to “var f:Float = 2.0″ to get these to compile.
I had no reference to the AS3 assembly syntax other than what I could see in the haXe code and what I could deduce from the “abcdump.exe” of both haXe and flex outputs. The only principle for typing local varibles I can see is to initialise them with a specifically typed value. And this must be done at the “top” of the function – I’m not sure what logic is used to determine where the “top” finishes, but the following pseudo code works:
var i:Int = 0; while (something) { .... }Whereas the following does not define i as an integer:
While (something) { var i:Int = 0; .... }So the task of adding type information amounts to initialising all the local variables (including those that are limited in scope by blocks) at the top of the function.
I did not add strong types to object variables. This should be quite possible, since haXe knows the obejct types. External information suggests that this would be a very good thing to do, and I may get around to it one day. Maybe.
I also did not add types to function arguements or return values. I’m sure doing so would close the gap between haXe and AS3 native code, however the genswf9.ml code removes this type information reasonably early and it would take a bit of effort to get this working.
I found editing the ocaml code a bit of a slog. Firstly, “ml” style languages were new to me and it took me a while to grok it. It took me a while to workout how to even decompose the syntax into a series of statements (oh, it’s returning a function!). Secondly, the code is pretty much bereft of comments so it was hard to know where to start – I couldn’t really look a a bit of code in isolation whithout a good feeling for the whole code base. And Lastly, the ocaml penchant for terse variable names (eg, “let field_access ctx get f t e p =”).
This may be as far as I take these optimisations for now. The main goal was to see if the performance can be improved – and I think the answer is yes. I don’t think my changes are of high enough quality to re-submit them into the code base (in fact, I would almost certainly do a lot differently if I were to start again). It has been a great exercise and I’ve learnt a lot – including a new programming language.
-
Decompiling the differences
I have created about the simplest test I can, the “SimpleLoop” programmes. AS3 version HaXe version The difference in performace is dramatic – 10 to 20 fold. But all is not as bad as it seems…
I have been looking at the very interesting tool, abcdump.exe, as described here.
You can really get a feel for the differences between the outputs. And there is a simple explaination for the differences in file size – haXe includes a small library in the SWF. Fair enough.
I will concentrate on the “Run2″ test – both using “while” loops, rather than “for” loops. HaXe’s iterator-style for loops are slower than its while style loops – I’m hoping that this will not always be the case [Edit: actually the timings both vary and seem about the same]. (Although as a side note, I hope haXe will support both styles of for loops, since it makes porting easier amongst other reasons).
A quick look at the decompile of the Run2 function (two nested while… loops) shows the use of the command “iflt”, presumably “if less than”, which seems ideal for these loops. HaXe uses 3 statements here: coerce_a, lessthan, iffalse. I believe that haXe could easily use this optimisation, especially considering the for(i in 1…1000) style syntax. Also the increment operation. AS3 uses “inclocal_i”, where haXe uses 4 statements: getlocal, increment, coerce_a, setlocal. Again some low-hanging fruit for haXe to pick up.
Another trick is “pushshort” rather than “pushint” where size will allow, and it seems haXe integer constants are followed by “coerce_a”, whereas AS3 ones are not. AS3 used “convert_i” whereas haXe uses “coerce_a”. I’m not sure of the performace implications of this.
So, after some initial doubts, now I think haXe could get about a 10 fold increase in speed (in these very tight loops) pretty easily. HaXe (especially for flash 9) is very new, and I’m condifent these optimisation will come soon enough.
AS3 haXe function Run2():int /* disp_id 0*/ { // local_count=4 max_scope=1 // max_stack=2 code_len=60 0 getlocal0 1 pushscope 2 pushbyte 0 4 setlocal1 5 pushbyte 0 7 setlocal2 8 pushbyte 0 10 setlocal3 11 pushbyte 0 13 setlocal1 14 pushbyte 0 16 setlocal2 17 jump L1 L2: 21 label 22 pushbyte 0 24 setlocal1 25 pushbyte 0 27 setlocal3 28 jump L3 L4: 32 label 33 getlocal1 34 getlocal3 35 add 36 convert_i 37 setlocal1 38 inclocal_i 3 L3: 40 getlocal3 41 pushshort 10000 44 iflt L4 48 inclocal_i 2 L1: 50 getlocal2 51 pushshort 1000 54 iflt L2 58 getlocal1 59 returnvalue }function Run2():* /* disp_id 0*/ { // local_count=4 max_scope=1 // max_stack=2 code_len=70 0 getlocal0 1 pushscope 2 pushbyte 0 4 coerce_a 5 setlocal1 6 pushbyte 0 8 coerce_a 9 setlocal2 10 jump L1 L2: 14 label L1: 15 getlocal2 16 pushint 1000 // 0x3e8 18 coerce_a 19 lessthan 20 iffalse L3 24 pushbyte 0 26 coerce_a 27 setlocal1 28 pushbyte 0 30 coerce_a 31 setlocal3 32 jump L4 L5: 36 label L4: 37 getlocal3 38 pushint 10000 // 0x2710 40 coerce_a 41 lessthan 42 iffalse L6 46 getlocal1 47 getlocal3 48 add 49 coerce_a 50 setlocal1 51 getlocal3 52 increment 53 coerce_a 54 setlocal3 55 jump L5 L6: 59 getlocal2 60 increment 61 coerce_a 62 setlocal2 63 jump L2 L3: 67 getlocal1 68 returnvalue 69 returnvoid }

