Description
Rust does not provide a way to run code before `fn main()`, but on many platforms we can achieve it anyway using linker tricks and global constructors, as done in `ctor`, `inventory`, and the standard library itself.
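As a rough illustration of the kind of linker trick involved, here is a minimal sketch assuming a Linux/ELF target and Rust 1.82 or later (for the `#[unsafe(link_section = ...)]` attribute syntax). A function pointer placed in the `.init_array` section is called by the program's startup code before `main`. This only shows the general mechanism; it is not the actual expansion of `ctor`'s macro, and the names `CTOR_RAN`, `mark_ctor_ran` and `MARK_CTOR_RAN` are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Set by the constructor below, so `main` can check whether it already ran.
static CTOR_RAN: AtomicBool = AtomicBool::new(false);

// The function the startup code calls before `main`. It is unused on
// targets where the section entry below is compiled out.
#[allow(dead_code)]
extern "C" fn mark_ctor_ran() {
    CTOR_RAN.store(true, Ordering::SeqCst);
}

// On Linux (ELF), a function pointer stored in `.init_array` is run during
// process startup, before `main`. `#[used]` keeps this otherwise
// unreferenced static from being discarded by the compiler.
#[cfg(target_os = "linux")]
#[used]
#[unsafe(link_section = ".init_array")]
static MARK_CTOR_RAN: extern "C" fn() = mark_ctor_ran;
```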
Is this safe to do, or should it require unsafe? If so, what are the safety requirements that one needs to uphold inside a global constructor?
In isolation, this feature seems to be safe (?), but one reason to make it unsafe could be to let libraries assume that they are never called from within a constructor. Consider the following library:
use std::num::NonZero;
use ctor::ctor;
static mut FOO: i32 = 0;
/// Do some "heavy" work to initialize `FOO`.
#[ctor]
fn setup_foo() {
    // SAFETY: `get_foo` is not running at the same time as this constructor, so an unsynchronized write is fine.
    unsafe { FOO = 42 };
}
/// Get the value from `FOO`.
pub fn get_foo() -> NonZero<i32> {
    // SAFETY: The constructor has run before `get_foo`.
    let foo = unsafe { FOO };
    // SAFETY: The variable was initialized to a non-zero value.
    unsafe { NonZero::new_unchecked(foo) }
}

The safety comments here rely on:
- The constructor not executing concurrently with anything else.
- The constructor executing before `get_foo`.
But those assumptions are actually false, since the user may write:
#[ctor]
fn user_constructor() {
    std::thread::spawn(|| {
        get_foo();
    });
}

And, depending on linker ordering, `user_constructor` may run first (unless we can somehow guarantee that `setup_foo` runs before `user_constructor`?), in which case the spawned thread's unsynchronized read of `FOO` can race with the write in `setup_foo`, and `FOO` may not yet hold a non-zero value when it is read.
The correct way to write it then would be:
use std::num::NonZero;
use std::sync::atomic::{AtomicI32, Ordering};
use ctor::ctor;
static FOO: AtomicI32 = AtomicI32::new(0);
#[ctor]
fn setup_foo() {
    // Must use atomics: a thread spawned by another constructor may read `FOO` concurrently.
    FOO.store(42, Ordering::Relaxed);
}
pub fn get_foo() -> NonZero<i32> {
    let foo = FOO.load(Ordering::Relaxed);
    // Must check that the variable is actually initialized: this may run before `setup_foo`.
    NonZero::new(foo).unwrap()
}

This does reduce the utility of `#[ctor]` by a fair margin, since the performance benefit of using a global constructor shrinks, and at that point you'd probably just want to use `OnceLock` instead.
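For comparison, here is a minimal sketch of the `OnceLock` alternative, assuming the "heavy" work can simply be deferred to the first call. `compute_foo` is a hypothetical stand-in for whatever `setup_foo` did. Because the value is initialized on first use, `get_foo` no longer depends on constructor ordering, and concurrent first calls are synchronized by `OnceLock` itself.

```rust
use std::num::NonZero;
use std::sync::OnceLock;

// Initialized lazily on first access instead of by a global constructor.
static FOO: OnceLock<NonZero<i32>> = OnceLock::new();

// Hypothetical stand-in for the "heavy" initialization work.
fn compute_foo() -> NonZero<i32> {
    NonZero::new(42).expect("42 is non-zero")
}

/// Get the value of `FOO`, computing it on the first call from any thread.
pub fn get_foo() -> NonZero<i32> {
    *FOO.get_or_init(compute_foo)
}
```

The trade-off is that the work now happens on the first call rather than eagerly at startup, which is exactly what `#[ctor]` was meant to provide.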
One noteworthy detail is that when the library is a dynamic library (e.g. when linking libc or libX11), this is a non-issue: the loader runs a library's constructors before those of the code that depends on it, so any global constructors in a dynamic library are guaranteed to run before any code that depends on that library.
For context, I'm opening this because the ctor crate has made the #[ctor] attribute require unsafe, see mmastrac/rust-ctor#159, but deciding "running code before main is unsafe for xyz reason" is a decision that affects more than just ctor (e.g. it affects inventory as well), so I thought it might be something that t-opsem / the UCG has opinions on?
See also #397.