Experimenting with Web Game Development
  • Decompiling the differences

    I have created about the simplest test I can, the “SimpleLoop” programmes (AS3 version, haXe version). The difference in performance is dramatic: the haXe build is 10 to 20 times slower. But all is not as bad as it seems…

    I have been looking at the very interesting tool, abcdump.exe, as described here.

    You can really get a feel for the differences between the outputs. And there is a simple explanation for the difference in file size: haXe includes a small library in the SWF. Fair enough.

    I will concentrate on the “Run2” test, where both versions use “while” loops rather than “for” loops. HaXe’s iterator-style for loops are slower than its while loops, and I’m hoping that this will not always be the case [Edit: actually the timings both vary and seem about the same]. (As a side note, I hope haXe will support both styles of for loop, since it makes porting easier, amongst other reasons.)
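For readers who want to see what the loop does without reading bytecode, here is a minimal sketch of Run2 in Python, reconstructed from the AS3 disassembly further down rather than from the original source. The name `run2` and the `outer` and `inner` parameters are assumptions added for illustration; the listing hard-codes the bounds 1000 and 10000.

```python
def run2(outer: int = 1000, inner: int = 10000) -> int:
    """Loop structure read off the AS3 listing (local1..local3)."""
    total = 0              # local1: running sum
    j = 0                  # local2: outer counter
    while j < outer:       # "iflt L2" against 1000 in the listing
        total = 0          # the listing resets local1 on every outer pass
        i = 0              # local3: inner counter
        while i < inner:   # "iflt L4" against 10000 in the listing
            total = total + i
            i += 1
        j += 1
    return total


print(run2(3, 10))  # 45
```

With the default bounds the result is the sum 0 + 1 + … + 9999 = 49995000, but ten million iterations are slow in Python, so small bounds are used for the quick check.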

    A quick look at the decompile of the AS3 Run2 function (two nested while… loops) shows the use of the instruction “iflt”, presumably “if less than”, which seems ideal for these loops. HaXe uses 3 instructions for the same test: coerce_a, lessthan, iffalse. I believe that haXe could easily use this optimisation, especially considering the for(i in 1...1000) style syntax. The increment operation is a similar story: AS3 uses “inclocal_i”, where haXe uses 4 instructions: getlocal, increment, coerce_a, setlocal. Again, some low-hanging fruit for haXe to pick up.
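To put rough numbers on this, the sketch below tallies the opcodes executed on each pass of the inner loop. The opcode names are copied by hand from the two listings at the end of this post; the variable and function names are made up for the example.

```python
from collections import Counter

# Opcodes executed per pass of the inner loop, copied from the listings.
AS3_INNER = [
    "label", "getlocal1", "getlocal3", "add", "convert_i", "setlocal1",
    "inclocal_i", "getlocal3", "pushshort", "iflt",
]
HAXE_INNER = [
    "label", "getlocal3", "pushint", "coerce_a", "lessthan", "iffalse",
    "getlocal1", "getlocal3", "add", "coerce_a", "setlocal1",
    "getlocal3", "increment", "coerce_a", "setlocal3", "jump",
]


def summarise(ops):
    """Return the instruction count and a per-opcode tally."""
    return len(ops), Counter(ops)


as3_count, as3_ops = summarise(AS3_INNER)
haxe_count, haxe_ops = summarise(HAXE_INNER)

print(as3_count, haxe_count)                      # 10 16
print(as3_ops["coerce_a"], haxe_ops["coerce_a"])  # 0 3
```

The tally is only 16 to 10, so the raw instruction count by itself would not explain a 10 to 20 fold gap. A count like this cannot show which instructions are the expensive ones.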

    Another trick is to use “pushshort” rather than “pushint” where the size of the value allows. It also seems that haXe integer constants are followed by “coerce_a”, whereas AS3 ones are not, and where AS3 uses “convert_i” after the add, haXe uses “coerce_a”. I’m not sure of the performance implications of this.
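As an illustration of the constant-size trick, here is a small sketch of how a compiler might pick the push instruction for an integer constant. The function name is hypothetical, and the signed 8-bit and 16-bit ranges are an assumption about what pushbyte and pushshort can hold; the choices it makes do match what the AS3 listing shows for 0, 1000 and 10000.

```python
def choose_push_opcode(value: int) -> str:
    """Pick the narrowest push instruction for an integer constant.

    Assumes pushbyte covers signed 8-bit values and pushshort covers
    signed 16-bit values; anything larger falls back to pushint.
    """
    if -128 <= value <= 127:
        return "pushbyte"
    if -32768 <= value <= 32767:
        return "pushshort"
    return "pushint"


for constant in (0, 1000, 10000, 100000):
    print(constant, choose_push_opcode(constant))
```

In the haXe listing both 1000 and 10000 are emitted with pushint, where a selection like this would give pushshort.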

    So, after some initial doubts, I now think haXe could get about a 10 fold increase in speed (in these very tight loops) pretty easily. HaXe (especially for Flash 9) is very new, and I’m confident these optimisations will come soon enough.

    The two listings follow: AS3 first, then haXe.

    function Run2():int /* disp_id 0*/
    {
    // local_count=4 max_scope=1
    // max_stack=2 code_len=60
    0     getlocal0         
    1     pushscope         
    2     pushbyte          0
    4     setlocal1         
    5     pushbyte          0
    7     setlocal2         
    8     pushbyte          0
    10    setlocal3         
    11    pushbyte          0
    13    setlocal1         
    14    pushbyte          0
    16    setlocal2         
    17    jump              L1
    
    
    L2: 
    21    label             
    22    pushbyte          0
    24    setlocal1         
    25    pushbyte          0
    27    setlocal3         
    28    jump              L3
    
    
    L4: 
    32    label             
    33    getlocal1         
    34    getlocal3         
    35    add               
    36    convert_i         
    37    setlocal1         
    38    inclocal_i        3
    
    L3: 
    40    getlocal3         
    41    pushshort         10000
    44    iflt              L4
    
    48    inclocal_i        2
    
    L1: 
    50    getlocal2         
    51    pushshort         1000
    54    iflt              L2
    
    58    getlocal1         
    59    returnvalue       
    }
    
    

    function Run2():*   /* disp_id 0*/
    {
    // local_count=4 max_scope=1
    // max_stack=2 code_len=70
    0     getlocal0         
    1     pushscope         
    2     pushbyte          0
    4     coerce_a          
    5     setlocal1         
    6     pushbyte          0
    8     coerce_a          
    9     setlocal2         
    10    jump              L1
    
    
    L2: 
    14    label             
    
    L1: 
    15    getlocal2         
    16    pushint           1000    // 0x3e8
    18    coerce_a          
    19    lessthan          
    20    iffalse           L3
    
    24    pushbyte          0
    26    coerce_a          
    27    setlocal1         
    28    pushbyte          0
    30    coerce_a          
    31    setlocal3         
    32    jump              L4
    
    
    L5: 
    36    label             
    
    L4: 
    37    getlocal3         
    38    pushint           10000   // 0x2710
    40    coerce_a          
    41    lessthan          
    42    iffalse           L6
    
    46    getlocal1         
    47    getlocal3         
    48    add               
    49    coerce_a          
    50    setlocal1         
    51    getlocal3         
    52    increment         
    53    coerce_a          
    54    setlocal3         
    55    jump              L5
    
    L6: 
    59    getlocal2         
    60    increment         
    61    coerce_a          
    62    setlocal2         
    63    jump              L2
    
    L3: 
    67    getlocal1         
    68    returnvalue       
    69    returnvoid        
    }