Rust Learning Note: Unsafe Rust
This article is a summary of Chapter 4.9.1 of Rust Course (course.rs/)
Code inside unsafe allows behaviors that are not permitted by the compiler in safe Rust, including:
Dereferencing a raw pointer
Invoking an unsafe or external function
accessing or mutating a mutable static variable
implementing an unsafe trait
skipping borrowing checking
These functionalities allows for more convenient and efficient implementations of certain tasks, and prevents misjudgements of the compiler. However, since some of the checkings by the compiler are skipped, the developer is responsible for the potential bugs lead by unsafe Rust. In general, the unsafe block should be as small as possible.
Dereference Raw Pointer
Raw pointers are similar to references in their functions. However, raw pointers have these differences from references and smart pointers:
Raw pointers can bypass borrowing rules in Rust. The coexistence of multiple mutable and immutable pointers is allowed
Raw pointers may point to invalid memory
Raw pointers can be null
Raw pointers will not be dropped automatically
let mut num = 5;
let r1 = &num as *const i32;
let r2 = &mut num as *mut i32;
unsafe {
println!("r1: {}", *r1);
}
In the code above we created an immutable raw pointer r1 and a mutable raw pointer r2. In this case, it is allowed to use the immutable pointer r1 even when a mutable pointer is defined. Note the the creation of raw pointers is not unsafe, but the dereferencing of raw pointers is.
We can also create raw pointers from memory address. Note that we should first get the address and then use still. Prevent making up a memory address as it will likely lead to unexpected behaviors or segmentation fault error.
use std::{slice::from_raw_parts, str::from_utf8_unchecked};
fn get_memory_location() -> (usize, usize) {
let string = "Hello World";
let pointer = string.as_ptr() as usize;
let length = string.len();
(pointer, length)
}
fn get_str_at_location(pointer: usize, length: usize) -> &'static str {
unsafe { from_utf8_unchecked(from_raw_parts(pointer as *const u8, length)) }
}
fn main() {
let (pointer, length) = get_memory_location();
let message = get_str_at_location(pointer, length);
println!("The {} bytes at 0x{:X} stored: {}", length, pointer, message);
// let message = get_str_at_location(1000, 10); Dangerous Behavior
}
We can derefence a raw pointer with *. This must be in unsafe block since we cannot guarantee the memory safety of raw pointers.
let a = 1;
let b: *const i32 = &a as *const i32;
unsafe {
println!("{}", *c);
}
Finally, we can create raw pointers from smart pointers using reference or method into_raw()
let a: Box<i32> = Box::new(10);
let b: *const i32 = &*a;
let c: *consy i32 = Box::into_raw(a);
Invoke Unsafe Functions and Methods
Unsafe function definition is to indicate that the function may contain undefined behaviors. The calling of unsafe functions must also be embedded in unsafe block. There is no need to contain addition unsafe blocks inside an unsafe function.
unsafe fn dangerous() {}
fn main() {
unsafe { dangerous(); }
}
FFI
FFI (Foreign Function Interface) allows using functions from another language in Rust. Code that uses FFI need to be inside unsafe block since other languages may not implement the rules of Rust compiler.
extern "C" {
fn abs(input: i32) -> i32;
}
fn main() {
unsafe {
println!("{}", abs(-3));
}
}
In this code, we define the abs function in C in extern keyword. To invoke this function, the code need to be in unsafe block. In extern "C" we list the function signiture of the external function to invoke, and "C" refers to the ABI (Application Binary Interface) of the external function. ABI defines how to make function calls on an assembly level.
Similarly, we can also define rust functions that can be used by other languages:
#[no_mangle]
pub extern "C" fn call_from_c() {
println!("Call Rust function from C");
}
In the code above, we allow the function call_from_c to be called in C language. The annotation #[no_mangle] tells the compiler not to change the function name at compiler time so the name can be correctly identified by other languages.
Access Static Variables
A static variable is a global variable that can be accessed throughout the program. Accessing and mutating a static variable is unsafe since it may lead to data competing in multithreading. We should only static variables only in single thread.
static mut REQUEST_RECV: usize = 0;
fn main() {
unsafe {
REQUEST_RECV += 1;
assert_eq!(REQUEST_RECV, 1);
}
}
Access Union
Union is similar to struct except that all of its attributes share the same memory, that is, when one attribute is mutated, the other attributes will also be overwritten. Thus, union is considered as unsafe.
#[repr(C)]
union MyUnion {
f1: u32,
f2: f32
}
In this code, since f1 and f2 share the same memory space. If we assign a value to f1 and access f2, what we get is a f32 representation of value in f1.