
Packing Rhai into 128KB under no-std on STM32 #791

Open
liunianzmj opened this issue Dec 4, 2023 · 21 comments

Comments

@liunianzmj

[screenshot]
error:
[screenshot]

@schungx
Collaborator

schungx commented Dec 4, 2023

That is strange, as the exact same command line passed CI... The once_cell crate didn't need alloc to be specified before...

I'll look into it. Perhaps a new version of once_cell.

In the meantime, you can explicitly enable once_cell/alloc in your dependencies.

@liunianzmj
Author

Thank you very much, that problem has been solved. Now there is a new one: can this run on an STM32 microcontroller with 128KB of memory? So far the following error has occurred:
[screenshot]

@schungx
Collaborator

schungx commented Dec 4, 2023

The standard Rhai with all libraries is too large to fit inside 128KB.

You'd want to use a minimal build. Check this out: https://rhai.rs/book/start/builds/minimal.html

@liunianzmj
Author

[screenshot]
It still doesn't work.

@schungx
Collaborator

schungx commented Dec 4, 2023

Well, did you use Engine::new_raw?

Otherwise you'd be pulling in the entire standard library, which will be very large.

@schungx
Collaborator

schungx commented Dec 4, 2023

I would also say that if you want to keep arrays and maps, it would probably be difficult to fit Rhai inside 128KB, especially once your own code is added as well...

@liunianzmj
Author

I used Engine::new_raw and enabled the features as well, but it still does not work. What is the minimum memory required?
[screenshot]
[screenshot]

@liunianzmj
Author

How do I get Rhai to use this memory?
[screenshot]

@schungx
Collaborator

schungx commented Dec 5, 2023

Hhhmmm... No idea. I'm not an embedded programmer myself...

Maybe you can search online... You'll probably need a custom allocator that can allocate from a specific memory region.

BTW, I think I'd also need a way to turn off memory-heavy features like string interning and function resolution caching... Maybe I'll add that to Rhai.
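One way to read the custom-allocator suggestion: a global allocator whose whole heap is a single buffer that the linker places in the RAM bank you want. Below is a minimal sketch under that assumption, written as a bump allocator with a plain array standing in for the RAM bank. On a real part you would typically pin the buffer with `#[link_section = "..."]` to an output section that your linker script maps onto the target SRAM; that section name depends on your memory.x and is not given in this thread. `RegionAlloc` is a hypothetical name. It never frees memory, so it only suits quick experiments; Rhai allocates and frees constantly, so a real heap such as the `embedded_alloc::Heap` used in the test program further down this thread is the practical choice.

```rust
use std::alloc::{GlobalAlloc, Layout};
use std::cell::UnsafeCell;
use std::ptr::null_mut;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical bump allocator that serves every request from one fixed region.
/// On a microcontroller the region would be placed in a chosen RAM bank by the
/// linker; here it is simply an array stored inside the struct.
pub struct RegionAlloc<const N: usize> {
    region: UnsafeCell<[u8; N]>,
    next: AtomicUsize, // offset of the first unused byte in `region`
}

// Safety: `next` only moves forward via compare-exchange, so concurrent
// callers never receive overlapping blocks.
unsafe impl<const N: usize> Sync for RegionAlloc<N> {}

impl<const N: usize> RegionAlloc<N> {
    pub const fn new() -> Self {
        Self {
            region: UnsafeCell::new([0; N]),
            next: AtomicUsize::new(0),
        }
    }
}

unsafe impl<const N: usize> GlobalAlloc for RegionAlloc<N> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let base = self.region.get() as *mut u8;
        let mut current = self.next.load(Ordering::Relaxed);
        loop {
            // Round the absolute address up to the requested alignment.
            let addr = base as usize + current;
            let aligned = (addr + layout.align() - 1) & !(layout.align() - 1);
            let start = aligned - base as usize; // offset within the region
            let end = match start.checked_add(layout.size()) {
                Some(end) if end <= N => end,
                _ => return null_mut(), // region exhausted: report out-of-memory
            };
            match self
                .next
                .compare_exchange_weak(current, end, Ordering::Relaxed, Ordering::Relaxed)
            {
                Ok(_) => return base.wrapping_add(start),
                Err(observed) => current = observed, // raced with another caller; retry
            }
        }
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        // A bump allocator never reclaims individual blocks.
    }
}
```

To use it for real, you would declare a `static` instance and mark it with `#[global_allocator]`, exactly as the test program does with `Heap`.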

@liunianzmj
Author

liunianzmj commented Dec 6, 2023

Thank you very much. I have solved this problem, but I found that about 160KB of FLASH was added to the packaged program after adding Rhai. How can I reduce this?
[screenshot]

@schungx
Collaborator

schungx commented Dec 6, 2023

> about 160KB of FLASH was added to the packaged program

As I mentioned, it is quite difficult to fit inside 64K if you don't take out language features. I remember one user successfully packed it under 64K, but he had to disable array and object map support (no_index and no_object). However, that was a long time ago with a very old version, and much more code has been added since then.

If you want to pack under 128K, then I think it is possible...

Are you sure you're making a release build with optimization for size?

Also, have you stripped your binary? Unix symbol tables can be quite large.

@schungx changed the title from "An error occurs when you use a reference library in environment nostd" to "Packing Rhai into 128KB under no-std on STM32" on Dec 6, 2023
@liunianzmj
Author

liunianzmj commented Dec 6, 2023

Yes, I'm already making a release build optimized for size. How can I reduce the size further? I have enabled the no_index and no_object features, but it still needs 143KB.
[screenshot]

@schungx
Collaborator

schungx commented Dec 6, 2023

Have you stripped the binary?

@liunianzmj
Author

It's stripped, but that only made it about 1KB smaller.
[screenshot]

@schungx
Collaborator

schungx commented Dec 6, 2023

Try to do a strip my_bin.obj etc. to see if it gets smaller...

@schungx
Collaborator

schungx commented Dec 10, 2023

Just curious about the status.

Did you manage to squeeze it down further?

@chaosprint

another curious user here 😄

@cbiffle

cbiffle commented May 5, 2024

Since it seems like folks are still curious, here are my initial results on a Cortex-M7 (substantially similar to the M4). I'm not sure what the original poster was trying to do and whether this will be at all helpful to them, since they're asking about code size and then highlighting an SRAM, which is not typically where code lives on these parts.

> Try to do a strip my_bin.obj etc. to see if it gets smaller...

FWIW, stripping the binary generally doesn't help it fit into flash, because debug symbols are never written to flash on these systems. Only the text and the data initialization image are actually programmed.
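To make that point concrete, here is a minimal sketch that totals a firmware image's flash and RAM footprint from a section listing (as printed by a tool like `size -A`). The section names assume the usual cortex-m-rt linker script layout, the `Section` type and both functions are hypothetical, and classifying sections by name is a simplification; a real tool would inspect each section's ALLOC/LOAD flags.

```rust
/// One output section as listed by a tool such as `size -A` or `objdump -h`.
pub struct Section {
    pub name: &'static str,
    pub size: u64,
}

/// Bytes programmed into flash: vector table, code, read-only data, and the
/// load image of `.data`, which startup code copies into RAM.
/// Simplification: sections are classified by conventional name only.
pub fn flash_bytes(sections: &[Section]) -> u64 {
    sections
        .iter()
        .filter(|s| matches!(s.name, ".vector_table" | ".text" | ".rodata" | ".data"))
        .map(|s| s.size)
        .sum()
}

/// Statically allocated RAM: initialized `.data` plus zero-filled or
/// uninitialized statics. Debug sections and symbol tables count toward
/// neither total, which is why stripping shrinks the file but not the
/// flash footprint.
pub fn ram_bytes(sections: &[Section]) -> u64 {
    sections
        .iter()
        .filter(|s| matches!(s.name, ".data" | ".bss" | ".uninit"))
        .map(|s| s.size)
        .sum()
}
```

Feeding it a listing with a large `.debug_info` section leaves both totals unchanged, which matches the observation that stripping only saved about 1KB.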

All measurements are taken at opt-level = "z" and lto = true in release, on Rhai 1.18, using Engine::new_raw with no additional packages loaded except where noted. Rust 1.77.2.

Default no_std build: 421,692 bytes

Turning on size reduction features individually:

  • no_optimize: 421,444 bytes
  • f32_float: 415,968 bytes
  • only_i64: 412,380 bytes
  • no_custom_syntax: 412,188 bytes
  • unchecked: 410,024 bytes
  • only_i32: 407,428 bytes
  • no_position: 399,808 bytes
  • no_index: 397,480 bytes
  • no_module: 387,848 bytes
  • no_closure: 384,116 bytes
  • no_object: 369,972 bytes
  • no_float: 357,920 bytes
  • no_function: 337,168 bytes

Turning them all on gets the build down to 148,876, though it produces a language that doesn't meet my needs (no functions, for instance).
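The per-feature numbers are easier to compare as savings against the 421,692-byte baseline. Here is a small sketch of that arithmetic; the figures are copied from the measurements above and nothing else is assumed. It shows, for example, that no_function alone saves 84,524 bytes while no_optimize saves only 248, and that the individual savings add up to 368,252 bytes even though enabling everything saves 272,816, so the savings are not additive.

```rust
/// Baseline, all-features, and per-feature code sizes in bytes, as measured
/// above (Rhai 1.18, opt-level = "z", LTO, Engine::new_raw, Rust 1.77.2).
pub const BASELINE: u32 = 421_692;
pub const ALL_FEATURES: u32 = 148_876;
pub const SINGLE_FEATURE: &[(&str, u32)] = &[
    ("no_optimize", 421_444),
    ("f32_float", 415_968),
    ("only_i64", 412_380),
    ("no_custom_syntax", 412_188),
    ("unchecked", 410_024),
    ("only_i32", 407_428),
    ("no_position", 399_808),
    ("no_index", 397_480),
    ("no_module", 387_848),
    ("no_closure", 384_116),
    ("no_object", 369_972),
    ("no_float", 357_920),
    ("no_function", 337_168),
];

/// Bytes saved by each feature when it is the only one enabled.
pub fn savings() -> Vec<(&'static str, u32)> {
    SINGLE_FEATURE
        .iter()
        .map(|&(name, size)| (name, BASELINE - size))
        .collect()
}
```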

Turning back off only no_function (I like functions), no_index (I wanted arrays), and only_i32 (arrays of bytes, specifically): 240,008.

Adding CorePackage to that: 432,756
Adding StandardPackage instead: 693,800

Both get significantly smaller if you enable only_i32 but, for my use case, I kinda need u8.

I looked into what was responsible for the significant size increase when adding CorePackage to the image, and it turns out that the various packages' init routines are generating really big code. A typical one consists of repeated sequences like the one below, over and over, always ending in set_into_module_raw:

 8039a38:       f642 3003       movw    r0, #11011      @ 0x2b03
 8039a3c:       e9cd b603       strd    fp, r6, [sp, #12]
 8039a40:       f8ad 0088       strh.w  r0, [sp, #136]  @ 0x88
 8039a44:       f240 2002       movw    r0, #514        @ 0x202
 8039a48:       f8ad 00a0       strh.w  r0, [sp, #160]  @ 0xa0
 8039a4c:       f44f 7080       mov.w   r0, #256        @ 0x100
 8039a50:       46da            mov     sl, fp
 8039a52:       f244 0b20       movw    fp, #16416      @ 0x4020
 8039a56:       e9cd 440a       strd    r4, r4, [sp, #40]       @ 0x28
 8039a5a:       f2c2 0b00       movt    fp, #8192       @ 0x2000
 8039a5e:       f8cd 408a       str.w   r4, [sp, #138]  @ 0x8a
 8039a62:       2108            movs    r1, #8
 8039a64:       f8cd 408e       str.w   r4, [sp, #142]  @ 0x8e
 8039a68:       f8ad 4092       strh.w  r4, [sp, #146]  @ 0x92
 8039a6c:       f8ad 0098       strh.w  r0, [sp, #152]  @ 0x98
 8039a70:       9425            str     r4, [sp, #148]  @ 0x94
 8039a72:       9420            str     r4, [sp, #128]  @ 0x80
 8039a74:       464c            mov     r4, r9
 8039a76:       f8cd 8014       str.w   r8, [sp, #20]
 8039a7a:       f8cd 9008       str.w   r9, [sp, #8]
 8039a7e:       f89b 0000       ldrb.w  r0, [fp]
 8039a82:       2004            movs    r0, #4
 8039a84:       f7cd fd53       bl      800752e <<embedded_alloc::Heap as core::alloc::global::GlobalAlloc>::alloc>
 8039a88:       2800            cmp     r0, #0
 8039a8a:       f005 8677       beq.w   803f77c <<rhai::packages::arithmetic::ArithmeticPackage as rhai::packages::Package>::init+0x60b2>
 8039a8e:       f643 2134       movw    r1, #14900      @ 0x3a34
 8039a92:       e9c0 5500       strd    r5, r5, [r0]
 8039a96:       f6c0 0106       movt    r1, #2054       @ 0x806
 8039a9a:       ad0a            add     r5, sp, #40     @ 0x28
 8039a9c:       e9cd 012b       strd    r0, r1, [sp, #172]      @ 0xac
 8039aa0:       aa02            add     r2, sp, #8
 8039aa2:       9901            ldr     r1, [sp, #4]
 8039aa4:       ab2a            add     r3, sp, #168    @ 0xa8
 8039aa6:       f04f 0803       mov.w   r8, #3
 8039aaa:       4628            mov     r0, r5
 8039aac:       46cb            mov     fp, r9
 8039aae:       f88d 80a8       strb.w  r8, [sp, #168]  @ 0xa8
 8039ab2:       f005 fe67       bl      803f784 <rhai::module::FuncRegistration::set_into_module_raw>

I haven't looked at the code generator, but this sort of thing is pretty common in programs that haven't been written with text size in mind -- my guess is that you've got a code generator producing these init routines as a long series of unique Rust statements, instead of a compact routine driven by a table (which tends to be much smaller), or setting up the data structures entirely at compile time so they can go into ROM (which tends to be dramatically smaller).
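To illustrate the table-driven alternative described here, below is a hypothetical sketch: a toy `Module` keyed by function name and a static table of `(name, fn pointer)` pairs registered by a single loop. None of these names are Rhai's API, and Rhai's real registrations carry considerably more metadata per function, so this only shows the shape of the idea. The per-function data sits in a read-only table that can live in flash, and the code cost is one loop instead of one unrolled call sequence per function.

```rust
use std::collections::HashMap;

/// Native function signature for this toy registry (hypothetical).
type BinaryFn = fn(i64, i64) -> i64;

/// Hypothetical stand-in for a script module: name -> native function.
#[derive(Default)]
pub struct Module {
    functions: HashMap<&'static str, BinaryFn>,
}

impl Module {
    fn set_native_fn(&mut self, name: &'static str, f: BinaryFn) {
        self.functions.insert(name, f);
    }

    pub fn call(&self, name: &str, a: i64, b: i64) -> Option<i64> {
        self.functions.get(name).map(|f| f(a, b))
    }

    pub fn len(&self) -> usize {
        self.functions.len()
    }
}

fn add(a: i64, b: i64) -> i64 { a.wrapping_add(b) }
fn sub(a: i64, b: i64) -> i64 { a.wrapping_sub(b) }
fn mul(a: i64, b: i64) -> i64 { a.wrapping_mul(b) }
fn min(a: i64, b: i64) -> i64 { a.min(b) }
fn max(a: i64, b: i64) -> i64 { a.max(b) }

/// Registration data as a static table the linker can keep in read-only memory.
static ARITHMETIC_TABLE: &[(&str, BinaryFn)] = &[
    ("add", add),
    ("sub", sub),
    ("mul", mul),
    ("min", min),
    ("max", max),
];

/// One compact loop replaces a separate inlined registration per function.
pub fn init_arithmetic(module: &mut Module) {
    for &(name, f) in ARITHMETIC_TABLE {
        module.set_native_fn(name, f);
    }
}
```

Adding a function then costs one table entry rather than another copy of the registration sequence. Going further, the table itself could be sorted and searched directly, avoiding the runtime map entirely, which corresponds to the "set up at compile time" option.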

All in all, Package::init routines like this account for 105,782 bytes added when including CorePackage.
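The package figures can be put in proportion with a few lines of arithmetic. This sketch assumes that the 105,782-byte Package::init figure was measured against the same 240,008-byte configuration to which the CorePackage and StandardPackage sizes were added.

```rust
/// Sizes in bytes, as reported above.
pub const WITHOUT_PACKAGES: u32 = 240_008;
pub const WITH_CORE: u32 = 432_756;
pub const WITH_STANDARD: u32 = 693_800;
pub const CORE_INIT_ROUTINES: u32 = 105_782;

/// Flash cost of registering CorePackage on top of the raw engine.
pub fn core_cost() -> u32 {
    WITH_CORE - WITHOUT_PACKAGES
}

/// Flash cost of registering StandardPackage instead.
pub fn standard_cost() -> u32 {
    WITH_STANDARD - WITHOUT_PACKAGES
}

/// Percentage of CorePackage's cost that sits in its Package::init routines.
pub fn init_share_percent() -> f64 {
    f64::from(CORE_INIT_ROUTINES) * 100.0 / f64::from(core_cost())
}
```

Under that assumption, CorePackage adds 192,748 bytes and StandardPackage 453,792, and roughly 55% of CorePackage's cost is in its init routines.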

So, the system as it stands can fit into larger STM32 parts (I'm building these tests for the STM32H753 with 1MiB of flash) but the codebase doesn't appear to have been written with size (or startup time) in mind. (Which is fine! You haven't claimed otherwise. But I wanted to post these numbers for the next person who tries to fit this into a small microcontroller.)

In case you're curious, here's the test program. I derived it from the no_std example.

#![no_std]
#![no_main]

extern crate alloc;

use core::{mem::MaybeUninit, ptr::addr_of};

use panic_halt as _;
use rhai::{packages::Package, Engine, INT};
use stm32_metapac as _;
use embedded_alloc::Heap;

#[global_allocator]
static HEAP: Heap = Heap::empty();

#[cortex_m_rt::entry]
fn main() -> ! {
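    // Hand the global allocator a 16KB heap backed by a static buffer in RAM.
    {
        const HEAP_SIZE: usize = 16384;
        static mut HEAP_MEM: [MaybeUninit<u8>; HEAP_SIZE] = [MaybeUninit::uninit(); HEAP_SIZE];
        // addr_of! takes the buffer's address without creating a reference to the static mut.
        unsafe {
            HEAP.init(addr_of!(HEAP_MEM) as usize, HEAP_SIZE);
        }
    }

    // Raw engine: no packages are registered by default.
    let mut engine = Engine::new_raw();

    // this bit gets commented out to test size without Core
    let core_package = rhai::packages::CorePackage::new();
    core_package.register_into_engine(&mut engine);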

    loop {

        // Evaluate a simple expression: 40 + 2
        let _ = engine.eval_expression::<INT>("40 + 2").unwrap() as isize;
        cortex_m::asm::nop();
    }
}

@schungx
Collaborator

schungx commented May 6, 2024

> Both get significantly smaller if you enable only_i32 but, for my use case, I kinda need u8.

That's a very interesting observation!

> my guess is that you've got a code generator producing these init routines as a long series of unique Rust statements, instead of a compact routine driven by a table (which tends to be much smaller),

You're absolutely correct. That's what the code generator does: generates a bunch of individual function registration calls. And yes, they most probably can go into a table instead...

I'll experiment with that and report back.

@cbiffle

cbiffle commented May 6, 2024

> That's a very interesting observation!

Yeah, I'm specifically looking at scripting options for doing embedded handling of arrays of bytes. I felt like no_index and only_i32 would make that difficult -- but perhaps I don't totally understand the features (or Rhai, which I freely admit).

@schungx
Collaborator

schungx commented May 7, 2024

There is a built-in data type called Blob, which is an array of bytes.

I write device drivers with Rhai, so I added it many versions ago. Give it a spin.

4 participants