-
Notifications
You must be signed in to change notification settings - Fork 651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace static mut
s with new CoreLocal
construct
#3945
base: master
Are you sure you want to change the base?
Conversation
@lschuermann and I discussed this, so summarizing the feedback and take aways here:
Ideally, we need to do at least one of:
The current implementation in this PR makes the constructor unsafe, but does not yet explain why it is unsafe, and what need to be ensured (e.g. that it is private to the module, and that it is only accessible by an ISR). |
Porting a few instances of |
io::DEBUG_INFO.with(|debug_info| debug_info.put(io::DebugInfo { | ||
chip, | ||
processes, | ||
process_printer, | ||
})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This passes references to these to a potential future panic "thread" while other references to them live on and continue to be used.
Do we have a good sense for whether this is safe? I agree that the old approach is probably unsafe, so this is not necessarily making things worse. But it is not obvious to me that this is safe, and if it is not I sort of think we shouldn't make it look like it may be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fundamental danger with this infrastructure is that we're providing a reference of T: ?Sync
in a Sync
wrapper, and I hardly think that chip
or processes
are Sync
. So strictly speaking this is probably not sound as would be required by Rust.
This is a hard problem to solve, and while the introduction of CoreLocal
does not change our premise and actual safety guarantees here, it's properties around Sync
are what's problematic... In fact, I never thought of panics as being virtually a different thread (which they definitely are!), but only ISRs.
I think that this here is symptomatic of a larger body of unsoundness in Tock when it comes to reasoning about which state can be assumed to be valid in these non-single-thread-synchronous circumstances. A panic may occur precisely because an invariant in some struct reachable through those reference is violated, and Rust doesn't give us particularly many guarantees around their validity beyond that point. So, it's really not only that chip
and processes
are !Sync
, but that they may not be valid at all in a panic context. An unergonomic, expensive and ugly fix is to move all state required in the panic handler into some "atomic" statics (perhaps wrapped in a CoreLocal
), that are Sync
, have a Cell
-like interface, can only wrap datatypes that are guaranteed to be updated atomically w.r.t. a single hardware thread, and are unconditionally valid.
In short, I don't think we can or should fix this in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Panic is not in a different thread
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hardly think that chip or processes are Sync
Why? Multiple points of entry does not mean they are multiple threads of execution. They are not multiple threads of execution and that is pretty important to the Tock kernel functioning correctly generally.
There are exactly two kinds of software things that run concurrently with the kernel:
- Actual Processes, which do not have access to kernel memory
- ISRs which are specifically designed to do almost nothing except switch back to the kernel thread and retain the interrupt flag. They really should not touch any memory shared with the broader kernel.
Everything else is single threaded. If it is not single threaded, it is a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hardly think that chip or processes are Sync
@alevy Aside from your actual comment text which I agree with, the aforementioned variables chip
and process
won't be sync because at least, as far as I know, they all contain some form of core::cell::UnsafeCell
that is !Sync
.
And, as you pointed out below, currently panic handlers are exactly invoked as part of ISRs, will access precisely this !Sync
state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh... I retract! You're of course right. I totally misunderstood that you were referring to the variables rather than the crates/modules. Sorry!
Agree with @alevy's summary, just to illustrate the point of why we reason about ISRs / panic contexts at all: The fundamental assumption in this infrastructure is that, due to Tock's single-threaded nature, deriving |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should go in kernel/utilities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, agree.
} | ||
|
||
impl<T> CoreLocal<T> { | ||
pub fn with<F, R>(&self, f: F) -> R where F: FnOnce(&T) -> R { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is with
standard in some context? I would expect it to be flipped ie closure.with(static_mut)
(although I know that doesn't make sense). .access()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's how thread_local
works in Rust's std
kernel/src/core_local.rs
Outdated
impl<T> CoreLocal<T> { | ||
pub const unsafe fn new_single_core(val: T) -> Self { | ||
CoreLocal(UnsafeCell::new(val)) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
impl<T> CoreLocal<T> { | |
pub const unsafe fn new_single_core(val: T) -> Self { | |
CoreLocal(UnsafeCell::new(val)) | |
} | |
} | |
impl<T> CoreLocalSingleCore<T> { | |
pub const unsafe fn new(val: T) -> Self { | |
CoreLocal(UnsafeCell::new(val)) | |
} | |
} |
?
Otherwise this reads like it is creating a CPU core.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not how I read it. A new_<something>()
in Rust generally describes creating a particular variant of a thing (in this case, creating a single-core variant of a CoreLocal
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that new_X is a variant, but it's almost always an adjective (eg unchecked) not a noun (core).
https://doc.rust-lang.org/std/?search=new_
There seem to be exceptions for creating a slice.
Maybe this should be new_single
? Although I don't see the need for the modifier if there is only one new() method.
@@ -41,10 +46,43 @@ use crate::utilities::cells::NumericCellExt; | |||
/// is less than this threshold. | |||
pub(crate) const MIN_QUANTA_THRESHOLD_US: u32 = 500; | |||
|
|||
#[derive(Copy, Clone)] | |||
pub struct StaticSlice<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A [u8]
is not Sized
, so can't be stored in a bunch of places. In Rust std
statics (like thread_local
) end up using Box
or Vec
. StaticSlice
is basically the a fixed-sized Vec
(it has a fixed length, but that length is stored dynamically), so it is Sized
and can be stored in statics directly, like a fixed-sized array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having trouble seeing the semantic difference between a StaticSlice<T>
and a &'static mut [T]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be no semantic difference, only that [T]
is not Sized
so cannot be declared in a context that needs Sized
(such as a a static variable), and we can't create a &'static mut [T]
statically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. The issue is not Sized
but that mut
references are not allowed in a static
item's type. Maybe a different name and/or a link to rust-lang/rust#57349 would make this more clear?
Some other improvement ideas:
StaticSlice
doesn't actually need to be slice-specific. You can make it generic for any type and let the user choose whether to store a slice in it.- You can store a
&'static mut T
instead of a raw pointer to avoidunsafe
inDeref
andDerefMut
.
Demonstration: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=8de22b70edcbedaffccc382fd35ddd11
@@ -83,6 +83,7 @@ impl<'a> AppCredentialsPolicy<'a> for AppCheckerSimulated<'a> { | |||
} | |||
} | |||
|
|||
#[allow(dead_code)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a warning we get on newer compilers that for some reason maybe we weren't getting before? Either way, the code is never used and should probably actually be removed, but I didn't want to mess with that stuff in this PR. (Document in the changing commit, but I know that's somewhat hidden)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be pub
.
processes: &'a CoreLocal<MapCell<StaticSlice<Option<&'static dyn Process>>>>, | ||
iterator: usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We lose the comment. Why this change?
At the moment seems to imply we need to guard against something else, but I don't see the need to make the scope larger than our current issue. Since this is pressing (we can't upgrade nightly), I think we need to keep the scope narrow.
This seems like something that should be wrapped in our type. Why put the complexity on to users? It seems like the is the purpose of
I view our job as kernel developers to manage the complexity of using unsafe correctly. Exposing more APIs where the burden is on other developers to reason about |
Three reasons:
|
CI scripts would be nice, if we can figure out what and how to check for that. I think this ultimately boils down to fairly simple rules, though I've not finished thinking through these and partially needs to be validated by implementing the rest of the port:
I'll note that this is very very similar to how The high bit, ISRs are really not a point of extensibility for Tock and should be handled with the utmost care. This, of course, doesn't change that having automatic ways of validating our the safety requirements would be really good. |
Let me try to articulate the safety invariants for A An aside for justificationThis kind of co-reliance between the system's execution model and the language-enforced APIs is fundamental. The notion of a thread, what constitutes a thread, and how to distinguish different threads is entirely an OS-level construct. For example, the safety of The hidden kernel threadsTock does actually have one other kind of software that runs truly-concurrently (i.e. it may run at any point, including during "critical sections") with the kernel---interrupt service routines (ISRs). There's an important invariant for ISRs (which may not always be maintained at the moment): ISRs must not share any data with the kernel except the variables that signal a syscall, hardfault, etc just fired (even those are generally controlled within the arch crate, but they are accessed in both ISR and kernel threads). Those do not use There are at least two notable exceptions to this rule right now. As @hudson-ayers points out, Supporting more than one thread-of-executionSupporting more than one thread (e.g. ISRs) safely is a tricky and potentially heavy-weight proposition. There needs to be some way of determining whether we are currently in---e.g. are we in ISR or the kernel main-thread, and which ISR. We could do this in architecture specific ways---e.g. use the armv7m IPSR register (which denotes the exception level), and either deny access if it's not the kernel main-thread, or have a separate copy for each ISR (that's a lot of memory....). This implementation argues that a better strategy is to eliminate access to CoreLocals from ISRs, use an unsafe constructor for CoreLocal to denote that the user must ensure that such a static is not accessible from another core or from an ISR, and to have a simple set of "rules" to follow that ensure this by convention---and rely on ISRs themselves being highly subtle from a safety perspective (and requiring unsafe to implement in the first place) to avoid violating these rules in ISR context. |
I don't think this is problem worth thinking about; if your ship's sinking, poking a hole in the hull isn't going to matter.
Agreed.
This seems like it needs more explanation as to what the expectation is.
Agreed.
Sounds good in theory, but what are they?
Very much agreed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot in this PR (two new kernel types, changes to HILs, wrapping static muts in cells, and of course the headline change). How should we make forward progress?
} | ||
None | ||
self.processes.with(|ps| { | ||
ps.map_or(None, |ps| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does if let Some(ps) = ps.get()
not work here?
/// Main object for the kernel. Each board will need to create one. | ||
pub struct Kernel { | ||
/// This holds a pointer to the static array of Process pointers. | ||
processes: &'static [Option<&'static dyn process::Process>], | ||
pub(crate) processes: &'static CoreLocal<MapCell<StaticSlice<Option<&'static dyn process::Process>>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why pub
?
/// Main object for the kernel. Each board will need to create one. | ||
pub struct Kernel { | ||
/// This holds a pointer to the static array of Process pointers. | ||
processes: &'static [Option<&'static dyn process::Process>], | ||
pub(crate) processes: &'static CoreLocal<MapCell<StaticSlice<Option<&'static dyn process::Process>>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned about adding MapCell
here. Nothing should ever call .take()
, but that is an option now. What we need is a cell that can't lose its contents.
In particular, I'd like to see safety requirements of |
@@ -41,10 +46,43 @@ use crate::utilities::cells::NumericCellExt; | |||
/// is less than this threshold. | |||
pub(crate) const MIN_QUANTA_THRESHOLD_US: u32 = 500; | |||
|
|||
#[derive(Copy, Clone)] | |||
pub struct StaticSlice<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having trouble seeing the semantic difference between a StaticSlice<T>
and a &'static mut [T]
.
}; | ||
self.registers.packetptr.set(buf.as_ptr() as u32); | ||
self.registers.pcnf1.modify(PacketConfiguration1::MAXLEN.val(buf.len() as u32)); | ||
slice::from_raw_parts_mut(old_base, old_len) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like old_base
can be null at the slice::from_raw_parts_mut
invocation, which is UB.
Please add a safety comment for slice::from_raw_parts_mut
, and minimize the size of this unsafe
block (is slice::from_raw_parts_mut
the only unsafe operation here?).
@@ -0,0 +1,17 @@ | |||
use core::cell::UnsafeCell; | |||
|
|||
pub struct CoreLocal<T>(UnsafeCell<T>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why we need UnsafeCell
here, though maybe the documentation will clarify that when it is added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't, I believe! I'm confident this is an artifact from our earlier design. We may want to re-introduce it in the future, if we ever add an unsafe fn with_mut
, where developers themselves must ensure non-reentrance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that having an UnsafeCell
here will place all static CoreLocal
s into RAM. Without UnsafeCell
, if T
does not have interior mutability, then CoreLocal<T>
can live in .rodata
. I don't know how common it is to have immutable data in a CoreLocal
-- presumably rare -- but that cost is probably worth thinking about if we decide to reintroduce (or keep) UnsafeCell
.
kernel/src/core_local.rs
Outdated
impl<T> CoreLocal<T> { | ||
pub const unsafe fn new_single_core(val: T) -> Self { | ||
CoreLocal(UnsafeCell::new(val)) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not how I read it. A new_<something>()
in Rust generally describes creating a particular variant of a thing (in this case, creating a single-core variant of a CoreLocal
).
Catching up on phone on the train ride home, so haven't read or thought through all the subtleties here, but I feel it's worth pointing out that ISRs need not necessarily risk running concurrently with accesses to CoreLocals; i.e., it is a single assembly instruction to turn all interrupts off and similarly one to reenable them. We have the fundamental primitive available if needed to meet Rust's safety requirements IIUC. (There are further complexities to such an approach, and overhead that isn't strictly necessary if we can otherwise prove ISRs won't access the memory, but a "Rust-legal Sync" is somewhat trivially available from this design IIUC) |
One of the main issues with this is that this requires architecture-specific code (even to call the In our more "advanced"/multi-core design we have a trait-based approach that is actually able to resolve the core-ID at runtime (and thus inject architecture-specific code), but that's not entirely free (uses trait objects) and seems overengineered for this particular use-case. |
7dd7d9c
to
7ede432
Compare
Probably should just remove the dead code? But, it's not relevant to what I'm working on right now.
This required changing the ble_advertising HIL as well as the ble_advertising driver in ways that I'm not certain will work. (likely looses buffers). Need to test
@@ -0,0 +1,17 @@ | |||
use core::cell::UnsafeCell; | |||
|
|||
pub struct CoreLocal<T>(UnsafeCell<T>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that having an UnsafeCell
here will place all static CoreLocal
s into RAM. Without UnsafeCell
, if T
does not have interior mutability, then CoreLocal<T>
can live in .rodata
. I don't know how common it is to have immutable data in a CoreLocal
-- presumably rare -- but that cost is probably worth thinking about if we decide to reintroduce (or keep) UnsafeCell
.
/// | ||
/// By convetion, users should declare [`CoreLocal`] variables in | ||
/// scopes that are inaccessible to ISRs or signal handler, | ||
/// e.g. in module-private or function-local scopes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is somewhat self-contradictory. It first says "must only be created ... on systems ... without preemptive threads". It then explains that single-core systems can have preemptive threads in the form of ISRs and signal handlers, and implies it is okay to use a CoreLocal
on such systems as long as you hide the CoreLocal
from ISRs and signal handlers.
@@ -41,10 +46,43 @@ use crate::utilities::cells::NumericCellExt; | |||
/// is less than this threshold. | |||
pub(crate) const MIN_QUANTA_THRESHOLD_US: u32 = 500; | |||
|
|||
#[derive(Copy, Clone)] | |||
pub struct StaticSlice<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. The issue is not Sized
but that mut
references are not allowed in a static
item's type. Maybe a different name and/or a link to rust-lang/rust#57349 would make this more clear?
Some other improvement ideas:
StaticSlice
doesn't actually need to be slice-specific. You can make it generic for any type and let the user choose whether to store a slice in it.- You can store a
&'static mut T
instead of a raw pointer to avoidunsafe
inDeref
andDerefMut
.
Demonstration: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=8de22b70edcbedaffccc382fd35ddd11
I'm trying to understand the balance of flexibility with specificity of If there is a clear need for defining types for a more complex multi-threaded platform, what does the upgrade path look like for users who want to use the hypothetical multicore and/or multithread versions? If the kernel uses It seems like the current CoreLocal gets us to an unclear middleground: the type & constructor imply multiX support based on their naming, but don't indicate how you would use them to run on multiX systems. One idea is to name the type based on how it should be used, not how it is implemented. Maybe The other direction is to double down on specificity and implementation, and not plan for a more generic approach right now. Maybe |
Pull Request Overview
This pull request fixes #3841 by replacing most global
static mut
throughout the kernel, chips, capsules, and board crates with a new construct calledCoreLocal
.CoreLocal
is similar in principle to Ruststd
'sLocalKey
(obtained from thethread_local!
macro). It allows access to its internally stored data within a closure passed towith
, which obtains a temporary shared reference. It is markedSync
, under the assumption that it is accessed in a thread-safe way---e.g. one copy per CPU core, as the name implies---allowing it to be stored in a global (non-mut) static.This implementation only includes support for single-core, single-threaded, platforms (as all of Tock for the moment) via the unsafe
new_single_core
constructor, though extending to multiple cores should be relatively simple.In order to support mutation (which is ostensibly the purpose of
static mut
in many cases), users should use some form of single-threaded mutual exclusion, such as aMapCell
,TakeCell
,Cell
, etc.This PR converts most uses of
static mut
to use this new construct, thus avoiding various annoying warnings and probably some real unsoundness. This comes at virtually no cost to code size, but some minor cost to memory (around 100 bytes on the platforms I looked at).Testing Strategy
Still need to test. This changes some subtle parts of the kernel (deferred calls, the schedulers, process implementation) in ways that, if I didn't take enough care to be precise about the logic, could likely result in bugs (though probably not safety bugs).
TODO or Help Wanted
Documentation Updated
/docs
, or no updates are required.Formatting
make prepush
.