Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Randomly occurring libpng bug in png_read_IDAT_data #21930

Closed
curiousdannii opened this issue May 13, 2024 · 12 comments · Fixed by #21995
Closed

Randomly occurring libpng bug in png_read_IDAT_data #21930

curiousdannii opened this issue May 13, 2024 · 12 comments · Fixed by #21995

Comments

@curiousdannii
Copy link
Contributor

curiousdannii commented May 13, 2024

Version of emscripten/emsdk:

emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.59 (0e4c5994eb5b8defd38367a416d0703fd506ad81)
clang version 19.0.0git (https:/github.com/llvm/llvm-project df762a1643bb5b0b3c907611d118c82d4b68a39d)

I've found a randomly occurring bug. I uploaded a good build to https://curiousdannii.github.io/infocom-frotz/sfrotz.html?story=shogun, which successfully shows the title graphic. But most of the time when I build my app, I get errors in png_read_IDAT_data. To be clear, the randomness is at compilation time - once compiled it either always works or never does.

My code is at https://github.com/curiousdannii/infocom-frotz
Run ./build.sh to build it, or rm frotz/sfrotz.js && ./build.sh to relink it.
It actually appears that the randomness is happening at the linking stage, as simply running the linker like that is enough to switch between working and broken builds.

I was at my wit's end, even trying to rebuild libpng from source, unless I noticed that just occasionally it does work. Then I was at least a little relieved that if the compiler/linker isn't being consistent it's probably not my fault.

I'll attach two zips of the working and broken output, in case inspecting the .wasm would help.

dist-broken.zip
dist-working.zip

@sbc100
Copy link
Collaborator

sbc100 commented May 13, 2024

O, that sounds like a nasty issue. Do you have any idea what the error are? Are they related to memory layout? Or static data?
Miscompilation? What kind of errors are they?

Can you share the full set of link flags here in the bug?

@curiousdannii
Copy link
Contributor Author

curiousdannii commented May 15, 2024

The errors depend on what optimisation level I try. I'll just show the libpng parts of the stack trace..

O0: completely crashes/exits:

program exited (with status: 0), but keepRuntimeAlive() is set (counter=0) due to an async operation, so halting
execution but not exiting the runtime or preventing further async execution (you can use emscripten_force_exit,
if you want to force a true shutdown)
	...
	exitJS	@	sfrotz.js:4399
	imports.<computed>	@	sfrotz.js:9144
	$inflate.1	@	sfrotz.wasm:0x30421b
	$png_read_IDAT_data	@	sfrotz.wasm:0x32dc3f
	$png_read_row	@	sfrotz.wasm:0x3313ca
	$png_read_image	@	sfrotz.wasm:0x332372
	...

O1+: prints an error to console, and continues without displaying images:

sfrotz.js:1073 libpng error: IDAT: incorrect data check
	...
	$png_default_error	@	sfrotz.wasm:0x19644f
	$png_error	@	sfrotz.wasm:0x196293
	$png_chunk_error	@	sfrotz.wasm:0x1969aa
	$png_read_IDAT_data	@	sfrotz.wasm:0x1c7d5b
	$png_read_row	@	sfrotz.wasm:0x1cb1dd
	$png_read_image	@	sfrotz.wasm:0x1cc184
	...

The link command is (I don't know if there's a more detailed one than what is just shown by make):

/emsdk/upstream/emscripten/emcc src/common/frotz_common.a src/sdl/frotz_sdl.a src/blorb/blorblib.a 
src/common/frotz_common.a -o sfrotz.js --use-port=freetype --use-port=libjpeg --use-port=libpng --use-port=sdl2 
--use-port=sdl2_mixer --use-port=zlib --pre-js ../src/preamble.js --profiling-funcs -sALLOW_MEMORY_GROWTH 
-sASYNCIFY -sENVIRONMENT=web -lm

Last time I saw randomness in Emscripten output it was caused by a Python set(). I wonder if that sort of thing could be responsible again?

@curiousdannii
Copy link
Contributor Author

This isn't going to seem real...

I'm working more on my project, and then it stopped successfully building, ever. It went from building successfully maybe 20% of the time, to seemingly never. I've tried probably like 50 times. I roll back to my last commit, and it works. Eventually I've isolated the change that seems to stop it working: I was adding a new JS library: --js-library ../src/library.js. When I changed that to --js-library ../src/frotz-library.js it started building successfully again (well 20% of the time.)

@sbc100
Copy link
Collaborator

sbc100 commented May 17, 2024

Do you mean that changing just the name of the library changes the behaviour?

And when you say "building successfully" I assume you mean that that program runs successfully? Or are you saying there are times when that actual build fails?

If you can find the source the non-determinism that would be great. One thing to do would be to link with EMCC_DEBUG=1 and then check the checksums off all the intermediate files in /tmp/emscriptem_tmp.. figuring out which files differ between builds might give you a clue. For example emcc-00-base.wasm is the direct output of wasm-ld.. it would be good to know if that differs.

@curiousdannii
Copy link
Contributor Author

curiousdannii commented May 18, 2024

Yeah, just changing the name of my JS library seemed to change the behaviour. I know it's possible nothing actually changed and I just got really unlucky, but it sure seemed like it never built successfully when it was called "library.js", and it went back to the normal rate when it was called "frotz-library.js".

Yes, I mean that it runs successfully vs not running. It never has errors during the build.

Ah, EMCC_DEBUG=1 was the thing I knew existed but forgot about!

Possibly relevant things from a successful build:

ports:DEBUG: including port: freetype
ports:DEBUG:     (at /emsdk/upstream/emscripten/cache/ports/freetype)
ports:DEBUG: including port: sdl2
ports:DEBUG:     (at /emsdk/upstream/emscripten/cache/ports/sdl2)
ports:DEBUG: including port: ogg
ports:DEBUG:     (at /emsdk/upstream/emscripten/cache/ports/ogg)
ports:DEBUG: including port: libjpeg
ports:DEBUG:     (at /emsdk/upstream/emscripten/cache/ports/libjpeg)
ports:DEBUG: including port: zlib
ports:DEBUG:     (at /emsdk/upstream/emscripten/cache/ports/zlib)
ports:DEBUG: including port: vorbis
ports:DEBUG:     (at /emsdk/upstream/emscripten/cache/ports/vorbis)
ports:DEBUG: including port: sdl2_mixer
ports:DEBUG:     (at /emsdk/upstream/emscripten/cache/ports/sdl2_mixer)
ports:DEBUG: including port: libpng
ports:DEBUG:     (at /emsdk/upstream/emscripten/cache/ports/libpng)

link:DEBUG: linking: ['src/common/frotz_common.a', 'src/sdl/frotz_sdl.a', 'src/blorb/blorblib.a', 
'src/common/frotz_common.a', '-L/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten', 
'/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libpng.a', 
'/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libSDL2_mixer_ogg.a', 
'/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libvorbis.a', 
'/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libz.a', 
'/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libjpeg.a', 
'/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libogg.a', 
'/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libSDL2.a', 
'/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libfreetype.a',
'-lGL-getprocaddr', '-lal', '-lhtml5', '-lstubs-debug', '-lnoexit', '-lc-debug', '-ldlmalloc',
'-lcompiler_rt', '-lc++-noexcept', '-lc++abi-debug-noexcept', '-lsockets']

From an unsuccessful build:

ports:DEBUG: including port: zlib
ports:DEBUG:     (at /emsdk/upstream/emscripten/cache/ports/zlib)
ports:DEBUG: including port: libpng
ports:DEBUG:     (at /emsdk/upstream/emscripten/cache/ports/libpng)
ports:DEBUG: including port: freetype
ports:DEBUG:     (at /emsdk/upstream/emscripten/cache/ports/freetype)
ports:DEBUG: including port: sdl2
ports:DEBUG:     (at /emsdk/upstream/emscripten/cache/ports/sdl2)
ports:DEBUG: including port: ogg
ports:DEBUG:     (at /emsdk/upstream/emscripten/cache/ports/ogg)
ports:DEBUG: including port: libjpeg
ports:DEBUG:     (at /emsdk/upstream/emscripten/cache/ports/libjpeg)
ports:DEBUG: including port: vorbis
ports:DEBUG:     (at /emsdk/upstream/emscripten/cache/ports/vorbis)
ports:DEBUG: including port: sdl2_mixer
ports:DEBUG:     (at /emsdk/upstream/emscripten/cache/ports/sdl2_mixer)

link:DEBUG: linking: ['src/common/frotz_common.a', 'src/sdl/frotz_sdl.a', 'src/blorb/blorblib.a', 
'src/common/frotz_common.a', '-L/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten', 
'/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libSDL2_mixer_ogg.a', 
'/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libvorbis.a', 
'/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libjpeg.a', 
'/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libogg.a', 
'/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libSDL2.a', 
'/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libfreetype.a', 
'/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libpng.a', 
'/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libz.a',
'-lGL-getprocaddr', '-lal', '-lhtml5', '-lstubs-debug', '-lnoexit', '-lc-debug', '-ldlmalloc',
'-lcompiler_rt', '-lc++-noexcept', '-lc++abi-debug-noexcept', '-lsockets']

So it's definitely got things in a different order.

@curiousdannii
Copy link
Contributor Author

curiousdannii commented May 18, 2024

sha256sums from a working build:

5388af09f4cda627fd9f0f1543bbcf7ce460e5f6468552f23fd59f64aba5568a  /tmp/emscripten_temp/tmpq1530m4dlibemscripten_js_symbols.so
dd9b355408f7d9b1b773ac79b28fc749542ffa893932e69fafe5d9e85d3e743d  /tmp/emscripten_temp/emcc-00-base.wasm
02911def5ce22ab0be4a312a9d0c2e51b26a5c218b912104b8b6f81ef81f144e  /tmp/emscripten_temp/tmpqx1ttoe8.json
2ca59e48093ef69ce094d20ed16c21143db992562f2eb09ee2cc9ddf5d041ef4  /tmp/emscripten_temp/tmp8b6n_aw0.json
8bb9819ebd1b955c680111860ad23b4a7e3dcf3d36d066d0841c6ab634e6d857  /tmp/emscripten_temp/emcc-05-byn.wasm
be03a181b756203d8dbca1681f12576ed53dee58c04c38c00ddccbeee6241f4a  /tmp/emscripten_temp/emcc-02-wasm-emscripten-finalize.wasm
dd9b355408f7d9b1b773ac79b28fc749542ffa893932e69fafe5d9e85d3e743d  /tmp/emscripten_temp/emcc-01-strip.wasm
8bb9819ebd1b955c680111860ad23b4a7e3dcf3d36d066d0841c6ab634e6d857  /tmp/emscripten_temp/emcc-04-wasm-opt.wasm
64cdd2d953ca5503955cb5d5d88ab55297b690c64163aa68f656bc0e7e72ff60  /tmp/emscripten_temp/emcc-03-original.js
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  /tmp/emscripten_temp/emscripten.lock

From a broken build

5388af09f4cda627fd9f0f1543bbcf7ce460e5f6468552f23fd59f64aba5568a  /tmp/emscripten_temp/tmpnkt_xnlflibemscripten_js_symbols.so
11ee58fea76876499baef46f123fc75728eba38176fb2b16eacf3d87026d4567  /tmp/emscripten_temp/emcc-00-base.wasm
02911def5ce22ab0be4a312a9d0c2e51b26a5c218b912104b8b6f81ef81f144e  /tmp/emscripten_temp/tmp9hh3ncvs.json
69ccfe3ab8a9315efdc677f36f485c559aa0190d9bd51e06344e2fc0c59f1058  /tmp/emscripten_temp/emcc-05-byn.wasm
6a3a0eb4bbfe56b2281fa27c849fb7fe8d181ceaaea77a99f8d51963cd8e004e  /tmp/emscripten_temp/emcc-02-wasm-emscripten-finalize.wasm
11ee58fea76876499baef46f123fc75728eba38176fb2b16eacf3d87026d4567  /tmp/emscripten_temp/emcc-01-strip.wasm
69ccfe3ab8a9315efdc677f36f485c559aa0190d9bd51e06344e2fc0c59f1058  /tmp/emscripten_temp/emcc-04-wasm-opt.wasm
64cdd2d953ca5503955cb5d5d88ab55297b690c64163aa68f656bc0e7e72ff60  /tmp/emscripten_temp/emcc-03-original.js
2ca59e48093ef69ce094d20ed16c21143db992562f2eb09ee2cc9ddf5d041ef4  /tmp/emscripten_temp/tmppkxses69.json
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  /tmp/emscripten_temp/emscripten.lock

Looks like all the .wasms differ.

If a build works, the .wasm will be identical. I haven't checked to see whether there are multiple broken .wasms, or if it always fails in the same way.

@sbc100
Copy link
Collaborator

sbc100 commented May 18, 2024

So it looks like the issue is with the order of the native port libraries on the command line.

Firstly, we should fix that so that its determinisitic.

Secondly, you should probably figure out why one ordering works for you and one doesn't. Perhaps that means that perhaps some of those libraries are defining the same symbols names and racing to define them?

@curiousdannii
Copy link
Contributor Author

curiousdannii commented May 18, 2024

I was wondering if perhaps it's the image ports, as most people would be using sdl_image? Does SDL bundle it's own version of zlib, something like that?

Is there a compiler/linker option to log duplicate definitions? No, looks like it would be doing that already by default. So maybe it's excluding each other through conditional compilation?

@sbc100
Copy link
Collaborator

sbc100 commented May 20, 2024

You could confirm by doing something like -Wl,--trace-symbol=deflate to see if the definition of this symbol comes from a different library in the two cases.

@curiousdannii
Copy link
Contributor Author

Thanks, I tried that and got

/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libfreetype.a(ftgzip.c.o): lazy definition of inflate
/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libz.a(inflate.c.o): definition of inflate
/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libpng.a(pngrutil.c.o): reference to inflate

Looks like Freetype embeds a copy of zlib: https://github.com/emscripten-ports/FreeType/blob/master/src/gzip/inflate.c

So I've found a Freetype option FT_CONFIG_OPTION_SYSTEM_ZLIB. I was wondering why there doesn't seem to be any problem with all these libraries when building natively, but it would make sense for Ubuntu to have enabled that option. But it probably hasn't been enabled for the Emscripten port?

Oh, there's even a pull request for that! #18088

@sbc100
Copy link
Collaborator

sbc100 commented May 24, 2024

OK, thanks for debugging that!

Sounds like have two different bugs here:

  1. Port linking order is non-deterministic
  2. libfreetype embed its own zlib instead of using the emscripten port.

We should fix both of these. I'll take a look that (1) now.

@curiousdannii
Copy link
Contributor Author

curiousdannii commented May 24, 2024

Awesome, thanks!

For (2), you had switched to upstream Freetype (#18095), but then reverted it (#18205). It looks like the port is still not using upstream Freetype. This patch is needed: emscripten-ports/FreeType@40a760c
Presumably that would need to be resolved before #18088 could be merged.

sbc100 added a commit to sbc100/emscripten that referenced this issue May 24, 2024
In python `dict()` ordering is deterministic, but `set()` ordering is
not.  Annoying.

This change adds a simple `OrderedSet` and uses that for managing ports.

Fixes: emscripten-core#21930
sbc100 added a commit that referenced this issue May 27, 2024
In python `dict()` ordering is deterministic, but `set()` ordering is
not.  Annoying.

This change adds a simple `OrderedSet` and uses that for managing ports.

Fixes: #21930
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants