Rust for C++ developers part 3: Ownership and Borrowing
This is the third post in my Rust for C++ developers series. With references and traits covered, now is a good time to learn about ownership and borrowing. Rust is a systems programming language, and as such memory management is a very important topic. I assume you are already familiar with what the stack and the heap are and how they work in general. If not - you can read this section from the Rust book. If you are not familiar with references in Rust and you have missed part 2, I strongly recommend reading it now. You need to understand how references work in Rust to grasp this post. But before talking about borrowing let's discuss memory ownership.
What is ownership
Ownership is something very explicit in Rust. Each piece of memory has got a specific owner responsible for its deallocation. When you declare a variable it becomes the owner of the memory it allocates (no matter if it is on the stack or on the heap). You can create a reference to the variable (the owner), but the reference doesn't own the memory and can't drop it. It only borrows it.
For example you can have a variable and a mutable reference to it. Effectively they point to the same memory; the mutable reference can modify it but doesn't own it. It only borrows it. If the reference goes out of scope the memory doesn't get deallocated, and no matter what you do safe Rust won't let you drop the memory via the reference. The only way to deallocate the memory is to drop the owner, which is the initial variable that allocated it. Furthermore the compiler guarantees that the borrowers don't interfere with each other or with the owner of the memory. This includes making sure that the owner lives long enough, so the memory is not freed while any of the borrowers still needs it. All this is checked during compilation thanks to the borrow checker.
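A minimal sketch of the owner/borrower relationship described above. The names `append_one` and `demo` are made up for this illustration and are not part of any library:

```rust
// Hypothetical helper: it borrows the vector mutably but never owns it.
fn append_one(numbers: &mut Vec<i32>) {
    numbers.push(1);
} // the borrow ends here; the vector itself is untouched

fn demo() -> Vec<i32> {
    let mut owner = Vec::new(); // `owner` owns the heap allocation
    {
        let borrower = &mut owner; // mutable borrow, no ownership transfer
        append_one(borrower);
    } // `borrower` goes out of scope; nothing is deallocated
    owner.push(2); // the owner is fully usable again
    owner // ownership moves to the caller
}

fn main() {
    println!("{:?}", demo());
}
```

The vector survives the end of the borrower's scope because only the owner can free it.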
How Rust works with memory
Rust manages memory in a way similar to the RAII idiom in C++. The memory is allocated during the initialisation of the owner and is released when the owner goes out of scope or is dropped manually. The concept of scopes is the same as in C++, including the way to create an explicit scope with { .. }.
In C++ the 'default' operation is copy. If you want to move something you have to use std::move explicitly (I'm ignoring compiler optimisations like RVO, for example). In Rust it's the other way around - everything besides simple types and types that implement the Copy trait is moved by default.
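To make the difference concrete, here is a small sketch (the function name `demo` is only for illustration). An `i32` is `Copy`, so assignment copies it; a `String` is not, so assignment moves it:

```rust
fn demo() -> (i32, String) {
    let x = 5;
    let y = x; // i32 implements Copy: `x` stays valid after this line

    let s = String::from("hello");
    let t = s; // String is not Copy: ownership moves from `s` to `t`
    // println!("{}", s); // would not compile: error[E0382], borrow of moved value `s`

    (x + y, t)
}

fn main() {
    let (sum, text) = demo();
    println!("{} {}", sum, text);
}
```

Unlike a moved-from object in C++, the moved-from variable `s` is not left in a "valid but unspecified" state. The compiler simply refuses any further use of it.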
Rust hasn't got constructors and destructors. You can initialise a struct directly but the idiomatic way is to add an associated function pub fn new(...) -> Self which does the field initialisation. This is the alternative to constructors, but again it is not enforced by the language. You can pick another name for the function (and have multiple 'constructors') or initialise the fields directly. The language/compiler doesn't care.
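As a sketch of this convention, assuming a made-up `Point` type (neither the type nor the `origin` function comes from the standard library):

```rust
// Hypothetical type used only to illustrate the `new` convention.
#[derive(Debug, PartialEq)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

impl Point {
    // The conventional 'constructor': a plain associated function.
    pub fn new(x: i32, y: i32) -> Self {
        Self { x, y }
    }

    // A second 'constructor' under a different name.
    pub fn origin() -> Self {
        Self { x: 0, y: 0 }
    }
}

fn main() {
    let a = Point::new(1, 2);
    let b = Point::origin();
    let c = Point { x: 1, y: 2 }; // direct initialisation works too
    println!("{:?} {:?} {:?}", a, b, c);
}
```

All three ways produce ordinary values; `new` has no special meaning to the compiler.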
If you need to perform some actions when the object is destroyed, you must implement the Drop trait for it. It consists of a single drop function, which gets executed either when the instance goes out of scope or when you destroy it manually by passing it to the free function drop() (the trait method itself can't be called directly). On the other hand, if you want something specific to be performed when the object is cloned, you must implement the Clone trait for it. Note that implementing this trait doesn't change Rust's 'move by default' behaviour.
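The following sketch shows when `Drop::drop` runs. The `Guard` type and its shared log are assumptions made up for this example so that the order of destruction can be observed:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical type that records its own destruction in a shared log.
struct Guard {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Guard {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn demo() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _a = Guard { name: "a", log: Rc::clone(&log) };
        let b = Guard { name: "b", log: Rc::clone(&log) };
        drop(b); // manual drop: Drop::drop for `b` runs right here
        log.borrow_mut().push(String::from("end of scope"));
    } // `_a` goes out of scope and is dropped here
    let entries = log.borrow().clone();
    entries
}

fn main() {
    for line in demo() {
        println!("{}", line);
    }
}
```

The log ends up as "drop b", "end of scope", "drop a", which matches the RAII behaviour a C++ developer would expect from a destructor.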
If copying your object is a cheap operation and you want to enforce 'copy by default' instead of 'move by default' you have to implement Copy for it. Copy has Clone as a supertrait, so every Copy type must also implement Clone, but additionally it makes the type copy by default. This is an important difference. Generally my advice is to avoid implementing Copy unless you have got a very good reason for it.
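A short sketch of the difference between the two traits, using two made-up types (`Pixel` and `Image` are illustrations, not library types). Both traits are derived here instead of being implemented by hand:

```rust
// Small and cheap to duplicate: Copy makes assignment an implicit copy.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Pixel {
    r: u8,
    g: u8,
    b: u8,
}

// Owns heap memory: only Clone, so duplication must be requested explicitly.
#[derive(Debug, Clone, PartialEq)]
struct Image {
    pixels: Vec<Pixel>,
}

fn demo() -> (Pixel, Pixel, usize, usize) {
    let p = Pixel { r: 255, g: 0, b: 0 };
    let q = p; // implicit copy: `p` is still usable

    let img = Image { pixels: vec![p, q] };
    let copy = img.clone(); // explicit, potentially expensive duplication
    let moved = img; // still a move: Clone did not change the default
    // println!("{:?}", img); // would not compile: `img` was moved

    (p, q, copy.pixels.len(), moved.pixels.len())
}

fn main() {
    println!("{:?}", demo());
}
```

Note that `Image` could not derive `Copy` even if we wanted it to, because `Vec` is not `Copy`.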
Borrow checker and lifetimes
The compiler uses reference lifetimes to figure out when and how a reference is used and to determine if two (or more) references interfere with each other or with the owner of the memory. Thanks to these checks the compiler can catch if a dangling reference is about to be dereferenced or a piece of memory is changed via another reference and issue an error during compilation. Let's see how lifetimes work.
Reference lifetimes
Each variable in Rust has got an associated lifetime. It represents a section in the code where the variable is needed and must be valid. By valid I mean not deallocated (for a regular variable) or not dangling (for a reference). By tracking these lifetimes the compiler checks for conflicts and raises errors if something suspicious happens.
The lifetime of a reference/variable starts when it is created and ends when it is no longer required. This sounds very abstract so let's see some examples.
Let's start with a simple example of a variable and a reference to it. We will build upon it to explore different scenarios:
    fn main() {
        let a = 10;
        let r = &a;
        println!("{}", *r);
    }
Here a and r are in the same scope and they have got identical lifetimes. a can't be dropped before r and leave it dangling so we are safe. At this point you might start feeling that a lifetime of a variable/reference is the same as the scope it lives in. This is a good intuition but not entirely correct. You'll see why in the following examples. First let's try to create a dangling reference and see how the compiler reacts:
    fn main() {
        let r;
        {
            let a = 10;
            r = &a;
        }
        println!("{}", *r);
    }
We have got a scope which defines a and a reference r outside it. We initialise r in the inner scope and try to use it in the outer scope. However at the moment we want to dereference r, a will already be dropped and r will be dangling. That's why we have got a compilation error:
    error[E0597]: `a` does not live long enough
     --> src/main.rs:5:13
      |
    4 |         let a = 10;
      |             - binding `a` declared here
    5 |         r = &a;
      |             ^^ borrowed value does not live long enough
    6 |     }
      |     - `a` dropped here while still borrowed
    7 |     println!("{}", *r);
      |                    -- borrow later used here

    For more information about this error, try `rustc --explain E0597`.
So far this looks logical. But let's try to remove the last println! line. The code compiles just fine. Why? r is still a dangling reference. Why is there no compilation error? r is indeed dangling but the compiler is smart enough to realise that at this point it is not used anymore. That's what I meant when I said that the scope where a variable lives is not necessarily equal to its lifetime. In that case the lifetime of r starts when it is initialised with &a and, because we have removed the println! macro call, it ends right there, as r is never used again. Let's try to visualise how the compiler tracks the lifetimes:
The blue boxes represent initialisation and drop points. Yellow boxes represent potential problems with the reference (e.g. the target being dropped in this case). If we start from the point where r is last used (the red box), follow the control flow upward and inspect what happens with the reference and the source, we can detect potential problems. And in this case we do have one - r becomes dangling before we try to read from it. Removing the println! line though shortens the lifetime of the reference and avoids the dangling reference problem.
Multiple references to the same object
Another area covered by the borrow checker is the concurrent usage via references. Concurrent in this context doesn't mean 'used in multiple threads' but having multiple references to a single variable at the same time. Rust has got two rules:
- Multiple shared references can co-exist (in terms of lifetimes) together.
- There can be only one mutable reference to a variable at any given moment, and if such a reference exists no shared references can co-exist (again in terms of lifetimes) with it. Additionally the memory can't be accessed via the owner within the lifetime of a mutable reference.
This sounds abstract too so let's see some examples:
    fn main() {
        let a = 5;
        let ra = &a;
        let rra = &a;
        println!("{} {} {}", a, *ra, *rra);
    }
This is okay, we have got multiple shared references and we use them together. Now let's see an example where mutable and shared references are used together:
    fn main() {
        let mut a = 5;
        let ra = &a;
        println!("ra: {}", *ra);
        let rra = &mut a;
        *rra = 6;
        println!("a: {}", a);

        // println!("{}", *ra);
    }
This code also works fine. Remember that the lifetime of a reference starts with its initialisation and ends at its last usage. In this case ra is created, used and left alone, so it is safe to create and use rra after it. However uncommenting the last line will extend ra's lifetime, it will overlap with rra and we will have a compilation error:
    error[E0502]: cannot borrow `a` as mutable because it is also borrowed as immutable
     --> src/main.rs:5:15
      |
    3 |     let ra = &a;
      |              -- immutable borrow occurs here
    4 |     println!("ra: {}", *ra);
    5 |     let rra = &mut a;
      |               ^^^^^^ mutable borrow occurs here
    ...
    9 |     println!("{}", *ra);
      |                    --- immutable borrow later used here

    For more information about this error, try `rustc --explain E0502`.
If there is a mutable reference to a variable, the initial variable can't be read or modified. For example this code won't compile:
    fn main() {
        let mut a = 5;
        let ra = &mut a;
        println!("a: {}", a);
        *ra = 6;
    }
Here we try to read from a during the lifetime of the mutable reference ra, which is invalid. We get this compilation error:
    error[E0502]: cannot borrow `a` as immutable because it is also borrowed as mutable
     --> src/main.rs:4:23
      |
    3 |     let ra = &mut a;
      |              ------ mutable borrow occurs here
    4 |     println!("a: {}", a);
      |                       ^ immutable borrow occurs here
    5 |     *ra = 6;
      |     ------- mutable borrow later used here
      |
      = note: this error originates in the macro `$crate::format_args_nl` which comes from the expansion of the macro `println` (in Nightly builds, run with -Z macro-backtrace for more info)

    For more information about this error, try `rustc --explain E0502`.
In a nutshell, mutable references are exclusive. While one exists - you can't do much with the variable from other places.
Lifetimes and code paths
Finally I want to show how lifetimes spread over different code paths. Have a look at this code:
    fn main() {
        let cond = 3;
        let mut a = 5;
        let ra = &mut a;
        if cond == 3 {
            println!("a: {}", a);
        } else {
            *ra = 6;
        }
    }
It compiles just fine. The lifetime of ra spans from its definition to the assignment in the else clause. a is read in the if clause. This means that the lifetime of ra doesn't interfere with the read of a, as they happen in different branches. The lifetime of ra is marked with a yellow line on the following diagram:
Furthermore we can use a after the if statement without a problem because the lifetime of the mutable reference ends in the else. However using ra at the same place will yield an error, because we will extend the lifetime of the mutable reference and it will overlap with the usage of a in the if block. Let's see this in practice:
    fn main() {
        let cond = 3;
        let mut a = 5;
        let ra = &mut a;
        if cond == 3 {
            println!("a: {}", a);
        } else {
            *ra = 6;
        }
        println!("ra: {}", *ra);
    }
The lifetime becomes:
The lifetime of ra (again the yellow line) spans over both branches and reading a in the if statement will create a conflict. Luckily the compiler generates a nice error message in this case too.
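One possible way out of such a conflict, sketched here as an assumption about what the code wants to achieve and not as the only fix, is to keep each mutable borrow as short as possible. The function name `bump` and the `cond` parameter are made up for this example:

```rust
fn bump(cond: i32) -> i32 {
    let mut a = 5;
    if cond == 3 {
        println!("a: {}", a); // no mutable borrow is alive in this branch
    } else {
        let ra = &mut a; // this borrow lives only inside the else branch
        *ra = 6;
    }
    let ra = &mut a; // a fresh mutable borrow after both branches
    *ra += 1;
    a // `ra` is not used anymore, so reading `a` is fine
}

fn main() {
    println!("{} {}", bump(3), bump(0));
}
```

Each borrow starts right before it is needed and ends at its last use, so none of them overlaps with a read through the owner.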
Why is the borrow checker so strict in a single-threaded context?
At this point you probably think "what's the benefit of all these restrictions in a single threaded context?". Multiple pointers can coexist in C/C++ and you can use them safely if you are careful. The 'careful' part is the dangerous one - are you sure you are always careful? Let's have a look at an example:
    fn main() {
        let mut a = vec![0, 1, 2];
        let ra = &mut a;
        drop(a);
        println!("{}", ra.get(0).unwrap());
    }
This code doesn't compile but let's pretend for a second it does. a is a Vec, but the only important thing is that Vec doesn't implement Copy, so passing a to drop really moves and destroys it. We create a mutable reference to it in a single-threaded context and drop a immediately after that. At this point we get a compilation error, but let's imagine Rust allows us to do this. After the drop we try to access the vector via ra, which is now a dangling reference. What should Rust do at this point? Of course it could detect that a is dropped and prevent us from using the dangling reference, if we care about safety. But this would be very complicated and honestly - how often do you need to write such code? My view is that the limitations which the borrow checker enforces on you are the price you pay for using safe Rust. If you do need to do things like this - unsafe Rust is your friend.
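For comparison, here is a sketch of how the same intent can usually be expressed in safe Rust: finish with the reference first and drop the owner afterwards. The function name `demo` is an assumption made for this illustration:

```rust
fn demo() -> i32 {
    let mut a = vec![0, 1, 2];
    let first = {
        let ra = &mut a; // the mutable borrow starts here
        ra[0] += 10;
        ra[0] // i32 is Copy, so the value is copied out of the vector
    }; // the borrow ends here
    drop(a); // fine now: nothing borrows `a` anymore
    first
}

fn main() {
    println!("{}", demo());
}
```

The value we care about is copied out before the vector is destroyed, so no reference ever outlives its target.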
Conclusion
Ownership is not a concept unique to Rust. You can achieve the same effect with C++ and its smart pointers. However C++ doesn't stop you from managing memory manually. Not that this is a bad thing - quite the opposite. But you need to know what you are doing. You can opt out of this power but it requires discipline.
Safe Rust however forces you to use the single ownership pattern and if you break it you will get a compilation error. If you need the freedom of C/C++ memory management you need to use unsafe Rust (which is a bigger and very interesting topic). I like this separation. I prefer not to have too much power if I don't need it.
If you are used to move semantics in C++ you are already thinking like a borrow checker. You know that you can't move out of a variable and then access it and you are careful not to do this by mistake. Or even better you are using a tool like clang-tidy to do this job for you. Rust has this feature integrated in the compiler. It can be frustrating in the beginning but long term I believe it will make you write better code.
Finally if you want to learn more about the lifetimes and the borrow checker have a look at the whole Chapter 4 and Section 10.3 from the Rust book.