Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API alignment #442

Open
jordens opened this issue Jan 15, 2024 · 19 comments
Open

API alignment #442

jordens opened this issue Jan 15, 2024 · 19 comments

Comments

@jordens
Copy link
Contributor

jordens commented Jan 15, 2024

In the light of aiming for a stabilized API of v1.0 (#410) but having just done a significant API changeset in 0.8, I wanted to bring something up that has been bugging me several times.

Currently the heapless::Vec API is significantly different from both std::Vec and alternative no_std/no-alloc Vec implementations like tinyvec::ArrayVec or arrayvec::ArrayVec or coca. This is especially true for the important case where std::Vec would reallocate but its fixed size alternatives must either panic or become fallible: e.g. Vec::push().

heapless deviates from the others in its API and makes these fallible. The others are panicing and additionally/alternatively tend to offer a fallible API (ArrayVec::try_push()).

IMO it would be nice if we could overall

  • align the API with the std::Vec API and tinyvec
  • offer a fallible API (try_push() etc).

That would make porting a bit easier. And it could allow heapless::Vec to be a drop-in replacement for tinyvec::ArrayVec. The only (AFAICT) remaining difference would be under the hood: the use of MaybeUninit.

The API difference comes up from time tom time e.g. #440.
There is demand for distinct operations that are both infallible and non-panicing: #341

There may also be other places in heapless where the API choice diverges from the std analogue.

@newAM
Copy link
Member

newAM commented Jan 15, 2024

I agree with this. It's going to be an annoying breaking change, but std::vec::Vec::try_push will exist at some point for the Linux kernel and we should align with that.

There are definitely other places where the API diverges, String::from_utf8 for example, and not having String::from because from traits cannot panic.

In general we're going to have the same problems the Linux kernel has with std:: for fallible vs infallible allocations. This crate is going to be used for embedded systems and we should make the difference between panicky and non-panicky functions clear somehow.

@Dirbaio
Copy link
Member

Dirbaio commented Jan 15, 2024

I'm not sure if I agree with this.

It's true std's .push() panics on out of memory, so technically making heapless .push() panic if full would be consistent. However, the use cases for both crates are qiute different, I believe making .push() panic on heapless would cause more issues than it solves.

In std, you treat RAM as a practically-infinite single pool. When it does fill up, your program's state is thoroughly hosed because any other allocation will also fail, and pretty much everything in std rust does allocations. The kernel handles OOM terribly too, it can hose your entire machine, not just your process. So panicking in std vec makes sense.

In haepless, it's much more likely to run into .push() full, because you're not treating RAM as infinite, you've set a finite (and likely small) size to the Vec. Also, when you do run into full, it's only that Vec that filled up, not your entire RAM. Your firmware's state is not thorougly hosed. Most likely you received too much data from something, like from one particular network connection, and you can and want to tear that down gracefully but not crash the entire firmware.

so IMO crashing on heapless vec full is never the right choice (except in extremely quick-and-dirty firmware prototypes).

@jordens
Copy link
Contributor Author

jordens commented Jan 15, 2024

I think it's important to keep in mind that we're not so much talking about an overall functionality change, but more about naming. For the case of push() the envisioned changes (in a downstream crate) would be two renaming ops:

  • push().unwrap(), push().expect() etc -> push()
  • else push() -> try_push()

For many cases, try_push() would indeed be the right choice and the docs (and doctests!) should make that clear.

@Dirbaio
Copy link
Member

Dirbaio commented Jan 15, 2024

People will use the more "obvious" method named .push() first. If we make it panicky, we're pushing users to use the wrong one, making them write less reliable software.

You mention ease of porting. However to port code to heapless well you want to handle "full" errors. If heapless push() is fallible, we're forcing users to handle these. If we make it infallible and panicky, porting code becomes easier "on the surface" only, it will result in code that does compile, but is easy to crash.

@jordens
Copy link
Contributor Author

jordens commented Jan 16, 2024

I don't think a panicing API can be that easily disregarded as "wrong". For example, indexing a slice (which is also the obvious approach to retrieving an item) would then be wrong as well. It's just as easy to crash! And with your line of argument, you'd need to convert indexing operations into get() all over.

It's clear to every user of heapless that moving from std::Vec is precisely not a trivial drop-in replacement. The reasoning about the vector size and the way to work with it stays the same, independent on whether push() is panicing or fallible.

And the porting consideration should not be focused on std::Vec but also on e.g. tinyvec::ArrayVec which did indeed make the choice proposed above, and which has a gigantic user base compared to heapless. One might even wonder if one shouldn't just prefer tinyvec::ArrayVec over heapless::Vec. The maybeuninit optimization does not seem to be a showstopper for tinyvec at all.

@Dirbaio
Copy link
Member

Dirbaio commented Jan 16, 2024

I don't think a panicing API can be that easily disregarded as "wrong". For example, indexing a slice (which is also the obvious approach to retrieving an item) would then be wrong as well. It's just as easy to crash! And with your line of argument, you'd need to convert indexing operations into get() all over.

Slice indexes are usually controlled by the "algorithm" your code is implementing, Index out of range panics are almost always bugs in your code, ie "your" fault.

Heapless vec push full errors are usually controlled by "the environment": number of records from deserializing a network message, number of button presses the user did in the PIN code keypad... It's common for your code to be entirely correct and still run into "full" errors, due to none of your fault, just because the environment you're interacting with fed you too much data.

It's clear to every user of heapless that moving from std::Vec is precisely not a trivial drop-in replacement. The reasoning about the vector size and the way to work with it stays the same, independent on whether push() is panicing or fallible.

Sure reasoning is the same, but making .push() fallible allows offloading the reasoning to the compiler. You get code that doesn't compile until you handle all possible "full" errors. I believe if we can design the API so the compiler helps you catch errors, we should.

And the porting consideration should not be focused on std::Vec but also on e.g. tinyvec::ArrayVec which did indeed make the choice proposed above, and which has a gigantic user base compared to heapless. One might even wonder if one shouldn't just prefer tinyvec::ArrayVec over heapless::Vec

A crate having many downloads doesn't automatically mean all API decisions they made are correct. Also, I don't think "I want .push() that panics" is what's getting people to use arrayvec instead of heapless.

@jordens
Copy link
Contributor Author

jordens commented Jan 16, 2024

I'd be very surprised if that statement (that indices are not user/external data while push() traffic is triggered by user/external data) holds water. There is nothing in the code or the docs that supports this AFAICT. And it's trivial to come up with plenty of counter-examples.

@Dirbaio
Copy link
Member

Dirbaio commented Jan 16, 2024

pretty much every single protocol does "send N things" or "send a variable-length string", while it's relatively rare to send indices.

@jordens
Copy link
Contributor Author

jordens commented Jan 16, 2024

Come on! "rare"? When receiving said data, you'll use some length data passed to you to index or split_at() your receive buffer.

@Dirbaio
Copy link
Member

Dirbaio commented Jan 16, 2024

Indices used by the indexing operator [] rarely come directly from the network, they're produced by your program. Therefore it makes more sense for the [] operator to panic on out of bounds.

lengths do come from the network. "read len, then read len things" is common, but that's not an index. You wouldn't feed len to my_vec[len]. You'd call .push() len times. If len is too high you will fill up the Vec, at which point you don't want to panic, you want to somehow handle the error gracefully.

@jordens
Copy link
Contributor Author

jordens commented Jan 16, 2024

You misread me. I was referring to indexing a receive buffer or splitting it at an index. This is both common and panicing. Hence a trivial counterexample to your claim that indices are trusted.

In any case we've strayed far away from the original point I was making: that it may be preferable to stick with the API of std::Vec (panicing push()) and others (fallible ArrayVec::try_push()) and instead of deviating from them.

@reitermarkus
Copy link
Member

Another question regarding aligning the API with std type came up here: #425 (comment)

@sosthene-nitrokey
Copy link
Contributor

sosthene-nitrokey commented Feb 7, 2024

I disagree with the idea of aligning APIs for the purpose of the APIs being the same, when the context in which this crates is used is wildly different from std.

The reason that std fails to push an element is wildly different from why heapless::Vec does. As pointed by other in this conversation: std's Vec failing to allocate is most likely irrecoverable, because OOM is in most case irrecoverable.

In a lot of cases, users will need to put arbitrary limit on the data stored in the Vecs to prevent DOS, and they wouldn't rely on the Vec's failure to allocate to signal that the data is dangerously big, so the need to have their own checks for that: https://jfrog.com/blog/watch-out-for-dos-when-using-rusts-popular-hyper-package/ .
Vec itself (or bytes::Bytes) cannot do those checks because use cases vary and it wouldn't be zero cost. So users need to implement their own.

In the case of heapless on the other hand, users know that the Vec has a limited size that is already allocated, so manually checking for a limit lower than that is just wasting space. Therefore, users will set the capacity appropriately and rely the push failing to signal that the buffer should not be larger. 

To give a practical example, in our case at Nitrokey, we would be using try_push()? almost exclusively, and we would most likely need something that allows us to enforce that a panicking push() is never used, because there are only a dozen of places where this is actually the behaviour we want, and all other uses would lead to crashes being possible to trigger from external input.

@reitermarkus
Copy link
Member

Given that we have a bunch of types that don't have an equivalent in std, I don't think it is necessary to be fully compatible with std.

For example, if Vec::push is infallible, does Queue::enqueue also need to be infallible and do we need Queue::try_enqueue in addition to that?

Also, currently the fallible version of Vec::push in std is called Vec::push_within_capacity. I doubt we want to call it that (assuming it is stabilized with this name) when this is supposed to be the preferable method to use in most cases. In fact, the name would be redundant because there is no way to push outside of the capacity.

@jordens
Copy link
Contributor Author

jordens commented Feb 7, 2024

I disagree with the idea of aligning APIs for the purpose of the APIs being the same, when the context in which this crates is used is wildly different from std.

Exactly. The context is different and there needs to be a fallible op. But the context doesn't demand push() to be fallible and it doesn't suggest that the (desirable) fallible op should be called push(). Instead the context and prior art would suggest that it should be named try_push().
Compare it with indexing which is panicing in any context. There is a separate fallible indexing API with get(). Or if you can't do an infallible From, you impl TryFrom.

we would most likely need something that allows us to enforce that a panicking push() is never used

Presumably you need to do that audit anyway and (hopefully) already do it for all the other panics (unwrap(), slice indices and ranges, or all the other already panicing ops in heapless::Vec: replace, insert, FromIterator...).

A compromise could be to (just) rename the current push() to try_push() (with deprecation schedule of an alias if need be) and keep the signature as it is now.

do we need Queue::try_enqueue in addition to that?

Sure. And other places as well.

The fallible version of Vec::push in std is called Vec::push_within_capacity. I doubt we want to call it

Right. The name would be try_push() as seen elsewhere. The reason to not call it push_within_capacity() is, as you point out, due to its different context (apart from it being annoyingly long).

@Dirbaio
Copy link
Member

Dirbaio commented Feb 8, 2024

Just renaming push for try_push would be breaking all heapless users for no benefit. It doesn't achieve the goal of easy porting from std, since push() is still missing. (And we should not have panicking push(), as discussed above I believe it's a footgun with almost no legitimate uses).

@sosthene-nitrokey
Copy link
Contributor

The reason for naming thing try_* is when * exists but is considered infallible. Here we do not want to have push panic because of how dangerous it would be. A lot of the feedback goes against having push be panicking. If we don't have a push, having a try_push does not make sense. For example the std::io::Read trait does not have a try_read, only a fallible read because in this context (io), handling failure is almost always a requirement.

@robin-nitrokey
Copy link

robin-nitrokey commented Feb 8, 2024

Presumably you need to do that audit anyway and (hopefully) already do it for all the other panics (unwrap(), slice indices and ranges, or all the other already panicing ops in heapless::Vec: replace, insert, FromIterator...).

There are clippy lints for direct unwrap and expect calls that make it easy to audit that. Regarding slice indexing and heapless::Vec::insert, I agree with Dirbaio’s reasoning that indexing errors are a different kind of error than capacity errors. FromIterator is indeed problematic, but it is much less common than push and it is not possible to implement it in a fallible way. heapless::Vec::replace does not exist AFAIS.

@sosthene-nitrokey
Copy link
Contributor

I didn't even know that FromIterator was implemented for Vec and could panic.

I wonder if it wouldn't be an improvement to actually implement it on Result<Vec<T, N>, ()> rather than Vec directly. (Sadly this conflicts with the orphan rules, but we could make it somehow fallible, with a custom Result enum.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants