Dare to wonder and make wonders?
Drop us a message.
In the previous blog post, we got the idea about the ownership concept and how the Rust compiler works. This blog post will acquaint us with smart pointers and basic concepts to write safe and clean programs. It will be shown through examples in parallel in C++ and Rust.
Even if you are not familiar with C++, keep reading! 🤓
In C++, the developer has complete control over resources, their movement in memory, allocation, and memory deallocation.
The resource represents memory space available to the program for allocation and deallocation - for example, heap, file management, network sockets, and mutex.
Let's start with an example of manual memory management in C++:
class Memcheck {};
void fn_tmp() {
int x = 0;
if (x == 0) {
throw std::invalid_argument("Can't be zero");
}
}
int main() {
Memcheck *memcheck = new Memcheck;
fn_tmp();
delete memcheck;
return 0;
}
The object memcheck is dynamically created, and space is allocated on the heap. An object created this way will not release resources after it goes out of scope. Additionally, fn_tmp() throws an error, so delete memcheck line will never be executed. In this code example, a memory leak occurs, as we cannot use the memory space occupied by the memcheck.
Any dynamically allocated memory space should be released after the object no longer uses it.
It is crucial to remember that when an object is created using the keyword new, we must free up the space explicitly with the delete keyword.
With manual memory management, there is a bigger chance for errors like double-free memory, memory leak, or multiple pointers referencing the same object. The real question is - If there are numerous pointers for the same resources, which of them is responsible for freeing resources???
Let’s see how we can relax developers from memory management headaches.
Resource Acquisition is Initialisation, less known Scope bound resource management, is a recommended pattern to follow if we want to write secure programs in C++.
class A {};
class RAII {
private:
A* a;
public:
RAII(A* _a): a(_a) {}
~RAII() {
delete a;
}
};
void fn_tmp() {
int x = 0;
if (x == 0) {
throw std::invalid_argument("Can't be zero");
}
}
int main() {
RAII raii = RAII(new A());
fn_tmp();
return 0;
}
In this example, after fn_tmp() is executed, all resources used by raii object free up regardless of an error because the destructor is called automatically after the raii object's lifetime ends.
The RAII pattern reduces the headache for developers in the sense that now the only task is to implement the appropriate destructor for each class. This pattern also increases the readability of the code, considering that we do not have to explicitly use the delete keyword for each newly created object, which also reduces the complexity of the code. Therefore, the possibility of a memory leak or another error is reduced. The smart pointers are one of the RAII applications that automatically release memory.
Ownership-based resource management is not a pattern nor a best practice that should follow in writing a clean and secure program; instead, it is a built-in feature of the Rust language. Indeed, the motivation for this feature is RAII to relieve the developer of memory management.
let str = String::from("3327");
let a = Box::new(A{});
The variables str and a are dynamically allocated on the heap. After the end of their life cycle, resources in the memory are deallocated. It happens regardless of an error.
There are just a few lines of code, and the developer has no responsibilities regarding memory management, as the ownership concept automatically takes care of memory release.
Let's recall the previous blog, where we learned Rust concepts: move, clone, and copy. Ownership and smart pointer are closely related, and we will see exactly how in the following examples.
A smart pointer is a concept that automatically clears memory, checks whether data is out of bounds, and what happens with the object in memory.
In C++ these are unique_ptr<T> and shared_ptr<T>
In Rust these are Box<T>, RC<T>, Cell<T>, RefCell<T>, Arc<T>, RwLock<T> and Mutex<T>
So let the battle begin!
Unique pointer
std::unique_ptr<UniqueExample> unique = std::make_unique<UniqueExample>();
std::unique_ptr<UniqueExample> unique2 = unique; //error
A unique pointer represents a single owner of a resource. When we use unique_ptr,we can’t create another pointer that points to the same instance. Moreover, when the function completes, the unique_ptr destructor is called, which cleans up all resources.
let a = Box::new(BoxExample{});
let b = a;
The above example in Rust has no error; since the a Box moves to b, and the a has no longer Box in which the type is BoxExample, the Rust compiler does not allow the a to be used anymore. When the Box goes out of scope, automatic clean-up happens.
We use the Box smart pointer for resources that we want to allocate on the heap. For example, if we want to save primitive types on the heap. Box guarantees to be non-null, i.e., It will always point to some valid memory location. We use Box to improve performance by storing large amounts of data on the heap in a Box, and a pointer on the stack. Moving Box gives you another T inside another Box; when we move the Box, it will allocate a new Box for the same T on the heap. Note there is only one owner at a time.
To achieve this in C++, move function must be implicitly called:
std::unique_ptr<UniqueExample> a = std::make_unique<UniqueExample>();
std::unique_ptr<UniqueExample> b = move(a);
The a is no longer valid; its life cycle has ended. The new owner is b, and it is in charge of memory deallocation.
It is allowed that two variables share ownership of resources, so let's look at the following examples.
std::shared_ptr<SharedExample> shared = std::make_shared<SharedExample>();
std::shared_ptr<SharedExample> shared2 = shared;
When the first shared pointer is created reference_counter = 1, there is one reference to the object shared on the heap; we make the second shared_ptr and assign it shared -> reference_counter = 2. The two shared pointers point to the same SharedExample allocated on the heap.
When the function is finished, the destructor reduces the reference_counter is called. And, when reference_counter = 0, then the memory on the heap is cleared!
let mut room = Rc::new(3327);
{
let _first_room = room.clone();
assert_eq!(Rc::get_mut(&mut room), None);
let _second_room = room.clone();
assert_eq!(Rc::strong_count(& room), 3);
}
assert_eq!(Rc::strong_count(&room), 1);
let room_3327 = Rc::get_mut(&mut room).unwrap();
*room_3327 = 3327;
assert_eq!(*room, 3327);
}
The Rc<T> prohibits mutating variables - when we clone() Rc, it produces a new pointer to the same allocation on the heap, and we cannot mutate that value on the heap.
Notice that we have the ability for multiple owners of the variable, but no owner is allowed to take ownership of the variable unless there are currently no other owners. So, in the above example, we can borrow a room as immutable to _first_room, but we can’t change its value as the current number of pointers is 2. When there is only one reference to the room allocated on the heap, we can borrow it as mutable in the example that is room_3327, where we change the value. The memory is cleared when the last Rc pointers go out of scope.
If we have an immutable structure, with the Cell<T> we can change the data within that structure.
let room = Cell::new(3327);
let room_ref = &room;
let room_3327 = room.get();
{
room.set(3328);
}
let room_3328 = room.get();
assert_eq!(room_3327, 3327);
assert_eq!(room_3328, 3328);
assert_eq!(room_ref.get(), 3328);
Cell<T> is easy to use, calling set and get methods to control the access and change of T.
The limitation of Cell<T> is that it needs to implement the Copy trait. If this limits your use case, you should consider using RefCell<T>.
RefCell<T> works with references, storing anything you want. It has methods that borrow either mutable or immutable references to T.
let ref_cell = RefCell::new("3327");
let _room_3327 = ref_cell.borrow();
assert!(ref_cell.try_borrow().is_ok());
assert!(ref_cell.try_borrow_mut().is_err());
Cell<T> is faster, as it does not do anything complicated, while RefCell<T> is slower, as it keeps track of how many times T has been borrowed and panics if it notices a problem.
Just remember that data can be different than expected as Cell and RefCell are known as "interior mutability".
In case we need thread-safe multiple ownership in Rust, we use Arc<T>, RwLock<T>, and Mutex<T> smart pointers.
C++ is a language in which we can choose whether to manually manage memory or use smart pointers - RAII pattern. We have to use smart pointers in Rust because the ownership rules are checked during compilation. The security of Rust mainly comes from the compile-time check, and in addition to smart pointers, we also have raw pointers that do not give any guarantees.
Memory management is essential for writing safe and clean programs. That is why you should give extra attention to understanding smart pointers, when, and how to use them!
Stay tuned; a blog about unsafe Rust is coming.
COMMENTS (0)