Decompiling the differences

I have created about the simplest test I can, the “SimpleLoop” programmes.
[AS3 version](
[HaXe version](
The difference in performance is dramatic: 10- to 20-fold. But all is not as bad as it seems…

I have been looking at the *very* interesting tool, abcdump.exe,
[as described here](

You can really get a feel for the differences between the outputs. And there is a simple explanation for the difference in file size: haXe includes a small library in the SWF. Fair enough.

I will concentrate on the “Run2” test, where both versions use “while” loops rather than “for” loops. HaXe’s iterator-style for loops are slower than its while-style loops, and I’m hoping that this will not always be the case _\[Edit: actually the timings both vary and seem about the same\]_. (As a side note, I hope haXe will support *both* styles of for loop, since that makes porting easier, amongst other reasons.)
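To make the two loop styles concrete, here is a minimal sketch in haXe. The class and function names (`LoopStyles`, `sumFor`, `sumWhile`) are made up for illustration and are not taken from the SimpleLoop test itself.

```haxe
class LoopStyles {
    // Iterator-style loop: i takes the values 0, 1, ..., n - 1.
    public static function sumFor(n:Int):Int {
        var total = 0;
        for (i in 0...n) {
            total += i;
        }
        return total;
    }

    // Equivalent while-style loop with an explicit counter.
    public static function sumWhile(n:Int):Int {
        var total = 0;
        var i = 0;
        while (i < n) {
            total += i;
            i++;
        }
        return total;
    }

    static function main() {
        // Both styles compute the same sum, so this traces true.
        trace(sumFor(1000) == sumWhile(1000));
    }
}
```

The AS3-style `for (var i:int = 0; i < n; i++)` form has no direct haXe equivalent, which is why porting such loops means rewriting them in one of the two styles above.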

A quick look at the decompile of the Run2 function (two nested while loops) shows that AS3 uses the instruction “iflt”, presumably “if less than”, which seems ideal for these loops. HaXe uses 3 instructions here: coerce\_a, lessthan, iffalse. I believe that haXe could easily use this optimisation, especially considering the for(i in 1...1000) style syntax. The increment operation is similar: AS3 uses “inclocal\_i”, where haXe uses 4 instructions: getlocal, increment, coerce\_a, setlocal. Again, some low-hanging fruit for haXe to pick up.
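For reference, this is roughly what the Run2 source would look like in haXe. It is a reconstruction from the bytecode listings further down, not the original source, so the class name and variable names (`SimpleLoop`, `sum`, `i`, `j`) are assumptions. The listings do show the accumulator being reset at the top of each outer iteration, so I have kept that.

```haxe
class SimpleLoop {
    // Hypothetical reconstruction of Run2, inferred from the decompiled bytecode.
    public static function run2():Int {
        var sum = 0;          // local 1 in the listings
        var i = 0;            // local 2: outer counter
        while (i < 1000) {
            sum = 0;          // the bytecode resets local 1 on every outer pass
            var j = 0;        // local 3: inner counter
            while (j < 10000) {
                sum += j;     // getlocal1, getlocal3, add, setlocal1
                j++;          // inclocal_i in AS3; getlocal/increment/coerce_a/setlocal in haXe
            }
            i++;
        }
        // After the last pass, sum = 0 + 1 + ... + 9999 = 49995000.
        return sum;
    }

    static function main() {
        trace(run2());
    }
}
```

The inner loop body is only two statements, which is why a handful of extra instructions per iteration shows up so clearly in the timings.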

Another trick is to use “pushshort” rather than “pushint” where size allows. It also seems that haXe integer constants are followed by “coerce\_a”, whereas AS3 ones are not, and that AS3 uses “convert\_i” where haXe uses “coerce\_a”. I’m not sure of the performance implications of this.

So, after some initial doubts, I now think haXe could get about a 10-fold increase in speed (in these very tight loops) pretty easily. HaXe (especially for Flash 9) is very new, and I’m confident these optimisations will come soon enough.

AS3 (first listing) and haXe (second listing):
function Run2():int	/* disp_id 0*/
// local_count=4 max_scope=1
// max_stack=2 code_len=60
0     getlocal0     	
1     pushscope     	
2     pushbyte      	0
4     setlocal1     	
5     pushbyte      	0
7     setlocal2     	
8     pushbyte      	0
10    setlocal3     	
11    pushbyte      	0
13    setlocal1     	
14    pushbyte      	0
16    setlocal2     	
17    jump          	L1

21    label         	
22    pushbyte      	0
24    setlocal1     	
25    pushbyte      	0
27    setlocal3     	
28    jump          	L3

32    label         	
33    getlocal1     	
34    getlocal3     	
35    add           	
36    convert_i     	
37    setlocal1     	
38    inclocal_i    	3

40    getlocal3     	
41    pushshort     	10000
44    iflt          	L4

48    inclocal_i    	2

50    getlocal2     	
51    pushshort     	1000
54    iflt          	L2

58    getlocal1     	
59    returnvalue   	

function Run2():*	/* disp_id 0*/
// local_count=4 max_scope=1
// max_stack=2 code_len=70
0     getlocal0     	
1     pushscope     	
2     pushbyte      	0
4     coerce_a      	
5     setlocal1     	
6     pushbyte      	0
8     coerce_a      	
9     setlocal2     	
10    jump          	L1

14    label         	

15    getlocal2     	
16    pushint       	1000	// 0x3e8
18    coerce_a      	
19    lessthan      	
20    iffalse       	L3

24    pushbyte      	0
26    coerce_a      	
27    setlocal1     	
28    pushbyte      	0
30    coerce_a      	
31    setlocal3     	
32    jump          	L4

36    label         	

37    getlocal3     	
38    pushint       	10000	// 0x2710
40    coerce_a      	
41    lessthan      	
42    iffalse       	L6

46    getlocal1     	
47    getlocal3     	
48    add           	
49    coerce_a      	
50    setlocal1     	
51    getlocal3     	
52    increment     	
53    coerce_a      	
54    setlocal3     	
55    jump          	L5

59    getlocal2     	
60    increment     	
61    coerce_a      	
62    setlocal2     	
63    jump          	L2

67    getlocal1     	
68    returnvalue   	
69    returnvoid    	

2 Responses to Decompiling the differences

  1. Lea Hayes says:

    Hey there!

    It has been quite a while since you wrote this post, but do you think that the performance of basic for (i in 0...1000) style loops has been optimised in Haxe 3?

    Or do integer iterators still get instantiated which inevitably get garbage collected? I am concerned that simple loops will lead to lots of extra GC in my game.

    Many thanks

  2. Lea Hayes says:

    Never mind, it seems that the iterator instances are quite efficiently inlined 🙂

    Integer iteration sequences are more efficient, but to my surprise even object iteration is pretty efficient!!!topic/haxelang/HDHQBcMNdp8

