Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The performance is not particularly great compared to Lua #163

Open
vtereshkov opened this issue Jun 17, 2022 · 6 comments
Open

The performance is not particularly great compared to Lua #163

vtereshkov opened this issue Jun 17, 2022 · 6 comments
Labels
enhancement New feature or request Umka 2

Comments

@vtereshkov
Copy link
Owner

Take something like this

fn main() {
    x := 0.0
    for i := 0.0; i <= 10000000.0; i += 1 {
        x += i
        x /= 2.0
    }
    printf("%f\n", x);
}

versus this

x = 0
for i=0,10000000 do
    x = x + i
    x = x / 2
end

print(x)

First is umka, second is lua. On my computer at least, umka is about 3x slower. Which is a shame because it has potential to be much faster than lua. I usually don't pay attention to the speed of scripting languages, since it's almost always not the issue, however since I'm writing almost the entire game in umka, the speed matters, and it shows: Most of the "self" execution time is spent in vmRun.
The above isn't a perfect benchmark, so here's few more (these benchmarks mostly tackle umka's "raw" speed, e.g. crunching speed):

fn fib(x: int): int {
    if x < 2 {
        return x
    }
    return fib(x-1) + fib(x-2)
}

fn main() {
    printf("%d", fib(40))
}

20 seconds
vs

x = 0
for i=0,10000000 do
    x = x + i
    x = x / 2
end

print(x)

10 seconds
(lua about 2x faster)


type (
    StringMod = interface {
        f(s: str): str
    }

    TypeA = struct {}
    TypeB = struct {}
)

fn (t: ^TypeA) f(s: str): str {
    return s + "a"
}

fn (t: ^TypeB) f(s: str): str {
    return "b"
}

fn apply(iface: StringMod, n: int): str {
    s := ""
    for i := 0; i <= n; i++ {
        s = iface.f(s)
    }
    return s
}

fn main() {
    a := TypeA{}
    b := TypeB{}
    apply(a, 100000)
    apply(b, 100000)
}

(interface version) 600ms

fn apply(f: fn (s: str): str, n: int): str {
    s := ""
    for i := 0; i <= n; i++ {
        s = f(s)
    }
    return s
}

fn main() {
    apply(fn (s: str): str { return s + "a"; }, 100000)
    apply(fn (s: str): str { return "b" }, 100000)
}

(function pointer version) 900ms

function apply(interface, n)
    s = ''
    for i=0,n do 
        s = interface.f(s)
    end
    return s
end

a = { f = function(s) return s .. 'a' end }
b = { f = function(s) return 'b' end }

apply(a, 100000)
apply(b, 100000)

("interface" version) 400ms.

function apply(f, n)
    s = ''
    for i=0,n do 
        s = f(s)
    end
    return s
end

apply(function(s) return s .. 'a' end, 100000)
apply(function(s) return 'b' end, 100000)

(function pointer version) 566ms

There's more I can show. But I just think there's room for improvement when it comes to performance. I've tried to re-implement umka with focus of VM performance, but the project is nowhere near as complete as umka. While it's faster than Lua in general, I don't think it's fair to compare it with umka yet. (https://github.com/skejeton/elka), It's pretty much abandoned//

Either way, good luck. And if something, just ask me. I'll be glad to respond and/or help.

Originally posted by @skejeton in marekmaskarinec/tophat#30 (comment)

@vtereshkov vtereshkov added the enhancement New feature or request label Jun 17, 2022
@vtereshkov
Copy link
Owner Author

Related issue: #140

@vtereshkov vtereshkov changed the title The performance is not particularly great The performance is not particularly great compared to Lua Jun 18, 2022
vtereshkov added a commit that referenced this issue Sep 3, 2022
vtereshkov added a commit that referenced this issue Sep 3, 2022
vtereshkov added a commit that referenced this issue Sep 3, 2022
@vtereshkov
Copy link
Owner Author

vtereshkov commented Jun 17, 2023

@skejeton Your Lua example:

x = 0
for i=0,10000000 do
    x = x + i
    x = x / 2
end

print(x)

can be even 10x faster than Umka if you declare x as local x = 0, so that it can be treated as a conventional variable rather than a table item stored in an upvalue.

Sad for Umka.

@skejeton
Copy link
Contributor

Update W.R.T. performance.

Here I'm using Lua 5.4 downloaded from official website vs Umka from the current main branch compiled with build_windows_msvc.bat on VS 2022.

Please note that the performance differences may be a bit weird because last time when I ran it, I ran it on a Linux machine, not Windows. Nonetheless:

Tests

fn main() {
    x := 0.0
    for i := 0.0; i <= 10000000.0; i += 1 {
        x += i
        x /= 2.0
    }
    printf("%f\n", x);
}

= 1.121s

x = 0
for i=0,10000000 do
    x = x + i
    x = x / 2
end

print(x)

= 0.423s


fn fib(x: int): int {
    if x < 2 {
        return x
    }
    return fib(x-1) + fib(x-2)
}

fn main() {
    printf("%d\n", fib(40))
}

= 21.338s

function fib(x)
    if x < 2 then 
        return x
    else
        return fib(x-1) + fib(x-2)
    end
end

print(fib(40))

= 11.116s


type (
    StringMod = interface {
        f(s: str): str
    }

    TypeA = struct {}
    TypeB = struct {}
)

fn (t: ^TypeA) f(s: str): str {
    return s + "a"
}

fn (t: ^TypeB) f(s: str): str {
    return "b"
}

fn apply(iface: StringMod, n: int): str {
    s := ""
    for i := 0; i <= n; i++ {
        s = iface.f(s)
    }
    return s
}

fn main() {
    a := TypeA{}
    b := TypeB{}
    apply(a, 100000)
    apply(b, 100000)
}

= 2.611s

function apply(interface, n)
    s = ''
    for i=0,n do 
        s = interface.f(s)
    end
    return s
end

a = { f = function(s) return s .. 'a' end }
b = { f = function(s) return 'b' end }

apply(a, 100000)
apply(b, 100000)

= 0.487s


fn apply(f: fn (s: str): str, n: int): str {
    s := ""
    for i := 0; i <= n; i++ {
        s = f(s)
    }
    return s
}

fn main() {
    apply(fn (s: str): str { return s + "a"; }, 100000)
    apply(fn (s: str): str { return "b" }, 100000)
}

= 2.532s

function apply(f, n)
    s = ''
    for i=0,n do 
        s = f(s)
    end
    return s
end

apply(function(s) return s .. 'a' end, 100000)
apply(function(s) return 'b' end, 100000)

= 0.462s


Remarks

For some reason interfaces function significantly slowly, than before. I'm not sure whether it's because of an OS change or Umka change, or some other factor. Best way to verify is to test on Jun 17 version.

@luauser32167
Copy link
Contributor

For this script

fn main() {
  var x: real = 0.0;
  var y: real = 0.0;
  for y <= 100000000.0 {
      x += y;
      x /= 2.0;
      y += 1.0;
  }
  printf("%f\n", x);
}

umka -asm -check script.um generates the following instructions for the while loop

000000012      4                   PUSH_LOCAL real -24
000000013      4                         PUSH real 1e+08
000000014      4                       BINARY <= real 8
000000015      4                      GOTO_IF 18

000000016      8                         GOTO 37

000000017      4                         GOTO 12

000000018      5               PUSH_LOCAL_PTR -16
000000019      5                          DUP
000000020      5                        DEREF real
000000021      5                   PUSH_LOCAL real -24
000000022      5                       BINARY + real 8
000000023      5                       ASSIGN real 8
000000024      6               PUSH_LOCAL_PTR -16
000000025      6                          DUP
000000026      6                        DEREF real
000000027      6                         PUSH real 2
000000028      6                       BINARY / real 8
000000029      6                       ASSIGN real 8
000000030      7               PUSH_LOCAL_PTR -24
000000031      7                          DUP
000000032      7                        DEREF real
000000033      7                         PUSH real 1
000000034      7                       BINARY + real 8
000000035      7                       ASSIGN real 8
000000036      8                         GOTO 17

The body of the while loop has 25 instructions.

For this script

function main()
  local x = 0.0
  local y = 0.0
  while y <= 100000000.0 do
      x = x + y
      x = x / 2.0
      y = y + 1.0
  end
  print(x)
end
main()

luac -p -l script.lua > script.lua.txt generates the following instructions for the while loop

  3 [4] LOADK     2 0 ; 100000000.0
  4 [4] LE        1 2 0
  5 [4] JMP       7 ; to 13
  6 [5] ADD       0 0 1
  7 [5] MMBIN     0 1 6 ; __add
  8 [6] DIVK      0 0 1 ; 2.0
  9 [6] MMBINK    0 1 11 0  ; __div 2.0
  10  [7] ADDK      1 1 2 ; 1.0
  11  [7] MMBINK    1 2 6 0 ; __add 1.0
  12  [7] JMP       -10 ; to 3

The body of the while loop has 10 instructions.
In fact the JMP instruction that exits the while loop is only executed on the last iteration (snippet from lvm.c):

    case OP_LT: case OP_LE:
    case OP_LTI: case OP_LEI:
    case OP_GTI: case OP_GEI:
    case OP_EQ: {  /* note that 'OP_EQI'/'OP_EQK' cannot yield */
      int res = !l_isfalse(s2v(L->top - 1));
      L->top--;
#if defined(LUA_COMPAT_LT_LE)
      if (ci->callstatus & CIST_LEQ) {  /* "<=" using "<" instead? */
        ci->callstatus ^= CIST_LEQ;  /* clear mark */
        res = !res;  /* negate result */
      }
#endif
      lua_assert(GET_OPCODE(*ci->u.l.savedpc) == OP_JMP);
      if (res != GETARG_k(inst))  /* condition failed? */
        ci->u.l.savedpc++;  /* skip jump instruction */   <=====
      break;
    }

Even if Umka could execute it's instructions way faster than Lua, that wouldn't help much because it still has to execute ~3x more instructions.

I think this is a problem in all implementations that execute bytecode (Umka, Puc/Lua, CPython, Perl, Quickjs, etc). One simply can not afford to write while loops in them, but instead must rely on built-in functions to the the heavy lifting/looping.

@marekmaskarinec
Copy link
Contributor

Perhaps we could benefit from adding more complex instructions? For example a variant of binary, which works on the top two elements of the stack. Would that be worth it?

@vtereshkov
Copy link
Owner Author

@luauser32167 You're right, the Umka virtual machine is not optimized enough. Please notice, however, that a one-to-one comparison between Umka and Lua 5 bytecode is hardly possible, since Umka is stack-based and Lua 5 is register-based.

@marekmaskarinec I sometimes think of having a single complex instruction for a for loop header in its most widely used form: for i := 0; i < expr; i++. AFAIK, even Lua 4 featured it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Umka 2
Projects
None yet
Development

No branches or pull requests

4 participants