Back in the C++ doldrums

In the early part of my programming career, I spent a lot of time working with C and C++, from batch load scheduling systems running on different Unix flavours to hyper-spectral image processing software used by geologists running on Windows NT. There were always other languages around, but nothing could come close to C or C++ when running complicated algorithms over large data sets. Maybe Fortran, but who wants to write anything in that?

I fondly remember those days of core dumps and debugging buffer over and under-runs at runtime after the debug info was stripped. It was fun.

The world has since moved on. For better and for worse, but mostly better. Every once in a while I get a bit of nostalgia and think I wrote a lot of C and C++ in the past and can easily do it again. Then I get to the point where I realise I have to allocate and deallocate all that memory myself. Or get excited because closures have been added to C++. So I try a little example program with a closure and get a core dump for my trouble. Almost feels like something from another millennium.

Enter Go

I was really excited to read about Go being a general replacement for C. There was still a need for a systems programming language that can produce binaries that don’t need a runtime environment to be installed before the program could run and that would be quite performant. Other languages seem to be quite bloated when installed on a target system, whether due to all the NPM packages, JAR files, Python packages or Ruby gems. Even .Net needs a runtime installed for it to work.

But I felt like there was something that did not gel with me with Go.

I remember a comment from Lars Bak (I think it was him) on Dart, that creating a programming language was way more difficult than building the most performant Javascript engine ever, because everyone has an opinion when it comes to programming languages. Especially with syntax. Not many people have opinions on the technical details of Javascript engines.

But I still felt like I wanted something more from Go. It almost felt like the designers of the language believed there was a large pool of C++, Java and C# programmers out there. If you created something a little bit too different from what they were used to, they would all go charging off into the blue yonder like a massive herd of wildebeest. And then get eaten by crocodiles.

There was a missed opportunity to create something great. I was a bit sad.

And then Rust came along

Here was a language that, like Go, could serve as a C/C++ replacement, but did not shy away from being different. It created fairly small binaries that did not have any runtime dependencies, apart from a standard C library. It was object-orientated (but with no inheritance), and did not have garbage collection but utilised smart pointers and lifetimes tracked by the compiler. It had memory and thread safety enforced by the compiler. And it was the first language I’ve used where the trait bounds on generic types just made sense.

In over 3 years of writing Rust code, I’ve never experienced a core dump or had to explicitly allocate a single byte of memory.

The Good Things

There are a lot of awesome things that I really love about Rust.

Tuples

I love the Tuples and Enums in Rust. You have four basic data structures: Structs, Tuples, Arrays and Enums. Tuples are a grouping of values with a variety of types into one compound type, and the values are accessed by index. They are generally used as immutable groupings of data. You just declare them using parentheses like (1, 1.0, "hello tuple") and that now has a type of (i32, f64, &str), as integer literals default to i32.

In languages like Javascript or Ruby you may need to associate values using arrays or maps, and I now really miss being able to return two values from a function with (1, "hello") when outside of Rust. Even when using arrays in other languages, they may be backed by linked lists which would make them not that efficient for passing data around. Some languages like Kotlin have Pairs and Triples and Scala has a Tuple class (actually, Scala has 22 Tuple classes!), but they don’t feel as easy to use as the Tuples in Rust as they are implemented using class generics.

For instance, if you iterate over a HashMap, each item is the key-value pair as a Tuple. Iterator adaptors like map are lazy, so this example uses for_each to make sure the closure actually runs:

hashmap.iter().for_each(|v| println!("key = {}, value = {}", v.0, v.1));

Or with destructuring (note that you need the parentheses as that denotes the Tuple pattern):

hashmap.iter().for_each(|(key, value)| println!("key = {}, value = {}", key, value));

For functions and expressions that don’t return a value, you can use the empty Tuple ().
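As a minimal sketch of both uses described above (returning more than one value from a function, and destructuring the key-value tuples from a HashMap), something like the following should work. The function names min_max and describe are made up for this illustration.

```rust
use std::collections::HashMap;

// Hypothetical helper: returns the smallest and largest value as a tuple,
// or None when the slice is empty.
fn min_max(values: &[i32]) -> Option<(i32, i32)> {
    let first = *values.first()?;
    let mut result = (first, first);
    for &v in values {
        if v < result.0 {
            result.0 = v;
        }
        if v > result.1 {
            result.1 = v;
        }
    }
    Some(result)
}

// Hypothetical helper: each item produced by iter() is a (&key, &value) tuple,
// which the closure destructures directly in its parameter list.
fn describe(map: &HashMap<String, i32>) -> Vec<String> {
    let mut lines: Vec<String> = map
        .iter()
        .map(|(key, value)| format!("key = {}, value = {}", key, value))
        .collect();
    // HashMap iteration order is not defined, so sort for a stable result.
    lines.sort();
    lines
}
```

The caller can destructure the returned tuple in the same way, for example with if let Some((low, high)) = min_max(&values).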

Enum types, pattern matching and destructuring

Rust has very good pattern matching, which is very useful when combined with destructuring. It’s pretty much equivalent to Scala with case classes in this regard when used with Enums. Enums are types where you enumerate all their possible variants. And the variants can have values associated with them as well as be different structures.

The Option type is an example of this. It is identical to the one in Scala, where it has two variants: Some(T) and None. So you can then use pattern matching like:

match val {
  Some(v) => println!("We have a value: {}", v),
  None => println!("No value was provided")
}


Here, if the variant is a Some, the first branch is executed with the value inside the Some extracted (if that value is not a Copy type, it is moved out, so the original optional can no longer be used afterwards).

You can also do it with an if statement:

if let Some(v) = val {
  println!("We have a value: {}", v);
}
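
To make the two forms above concrete, here is a small self-contained sketch. The function names are invented for the example; only Option itself comes from the standard library.

```rust
// Hypothetical lookup: returns the first even number, if there is one.
fn first_even(values: &[i32]) -> Option<i32> {
    values.iter().copied().find(|v| v % 2 == 0)
}

// Pattern matching has to handle both variants, or the code will not compile.
fn describe(val: Option<i32>) -> String {
    match val {
        Some(v) => format!("We have a value: {}", v),
        None => "No value was provided".to_string(),
    }
}

// `if let` only looks at the variant we care about; the else branch is optional.
fn double_or_zero(val: Option<i32>) -> i32 {
    if let Some(v) = val {
        v * 2
    } else {
        0
    }
}
```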


You can do very powerful things with this. I’ve been using Rust for a Pact implementation (https://github.com/pact-foundation/pact-reference), and I use a JSON parsing library (serde_json) that will parse a JSON document into an Enum. Which makes total sense, as JSON only has six kinds of value (with true and false both covered by the boolean), so the serde enum has six variants. Here is the structure that a whole JSON document can be mapped to (you can see the original here: https://github.com/serde-rs/json/blob/master/src/value/mod.rs#L111)

pub enum Value {
    /// Represents a JSON null value.
    Null,
    /// Represents a JSON boolean.
    Bool(bool),
    /// Represents a JSON number, whether integer or floating point.
    Number(Number),
    /// Represents a JSON string.
    String(String),
    /// Represents a JSON array.
    Array(Vec<Value>),
    /// Represents a JSON object.
    Object(Map<String, Value>),
}
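
To show how such an enum is consumed, here is a sketch that walks a document recursively with pattern matching. It uses a cut-down stand-in enum (Json) rather than the real serde_json type so that it needs no external crates, and the render function is an assumption made for this example, not part of serde_json.

```rust
// A cut-down stand-in for serde_json's Value (no Object variant, and
// numbers are plain f64), so the sketch compiles without any crates.
enum Json {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Json>),
}

// Renders a value back to text. Each variant is handled by its own arm,
// and the data inside the variant is bound to a name by the pattern.
// Note: string escaping is deliberately left out to keep the sketch short.
fn render(value: &Json) -> String {
    match value {
        Json::Null => "null".to_string(),
        Json::Bool(b) => b.to_string(),
        Json::Number(n) => n.to_string(),
        Json::String(s) => format!("\"{}\"", s),
        Json::Array(items) => {
            let parts: Vec<String> = items.iter().map(render).collect();
            format!("[{}]", parts.join(","))
        }
    }
}
```

If a new variant is added to the enum later, the compiler will flag every match that does not handle it.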

 

Optionals and Results

I’ve already introduced the Option type, but Rust has another Enum type that is used for error handling: the Result<T, E>.

There are two ways of dealing with errors in Rust. Unrecoverable errors with panics (a.k.a. exceptions in other languages) and recoverable errors with Result. Unlike most modern languages, exceptions (panics) are exceptional in Rust and should be used for those situations where there should be a general panic: delete all the data, send the whole family to the bomb shelter and set the house on fire. Not for everyday errors. You know, the common ones that are not really errors, but just real states in your business logic. Like when a user enters a non-integer in an integer field. No need to set the house on fire for that.

Like Option, Result has two variants, but each variant has a value associated with it. It’s defined as:

enum Result<T, E> {
    Ok(T),
    Err(E),
}

 

The idea is that if you have a function (or block of code) that can potentially fail, you wrap the result in the Result enum, where T is the type your function returns and E is the error that can occur.

So instead of throwing exceptions, you can now return results that may be failures. You can then pattern match on those. But there is an even more useful operator added to Rust that works with results: the error propagation operator ?.

Say you have a function that does a few things that could fail. A real world example of this is a function that needs to read in a JSON file, generate an HTTP request with the contents of the file, get the response, and then write that out to another file. This has four steps that could each fail. With the error propagation operator, we can write that as something like:


let mut f = File::open("input.json")?;
let mut s = String::new();
f.read_to_string(&mut s)?;
let client = Client::new();
let response = client.post("http://httpbin.org/post")
    .body(s)
    .send()
    .await?;
let mut out_file = File::create("output.json")?;
out_file.write_all(s)?;

 

This example code won’t actually compile (see the section on not so good things), but what the error propagation operator is doing here is checking the result: if it is Ok then it unwraps the value, otherwise it immediately returns the error from the current function. It has a similar flow to throwing an exception in other languages, just without the stack unwind.
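
As a smaller sketch that does compile, here is the operator next to roughly what it expands to. The function names are invented for the example, and it sticks to a single error type (ParseIntError from the standard library) so no conversion between error types is needed.

```rust
use std::num::ParseIntError;

// Written out by hand: check the result, unwrap on Ok, return early on Err.
fn parse_port_explicit(text: &str) -> Result<u16, ParseIntError> {
    let port = match text.trim().parse::<u16>() {
        Ok(value) => value,
        Err(err) => return Err(err),
    };
    Ok(port)
}

// The same logic with `?`. (The real operator also passes the error
// through From::from, which is a no-op when the types already match.)
fn parse_port(text: &str) -> Result<u16, ParseIntError> {
    let port = text.trim().parse::<u16>()?;
    Ok(port)
}

// Several fallible steps in a row: the first failure ends the function.
fn sum_fields(line: &str) -> Result<i32, ParseIntError> {
    let mut total = 0;
    for field in line.split(',') {
        total += field.trim().parse::<i32>()?;
    }
    Ok(total)
}
```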

Compiler errors

The Rust compiler has the best error messages I have seen in any compiler. They have spent a lot of effort to make them helpful, including dedicating one release of Rust to just improving the error messages.

error[E0515]: cannot return value referencing function parameter `version`
  --> pact_verifier_cli/src/args.rs:10:3
    |
10  | /   App::new(program)
11  | |     .version(version.as_str())
    | |              ------- `version` is borrowed here
12  | |     .about("Standalone Pact verifier")
13  | |     .version_short("v")
...   |
165 | |       .empty_values(false)
166 | |       .help("URL of the build to associate with the published verification results."))
    | |______________________________________________________________________________________^ returns a value referencing data owned by the current function


The error below gives help text on what you need to do to fix the issue.

error[E0621]: explicit lifetime required in the type of `version`
  --> pact_verifier_cli/src/args.rs:10:3
    |
9   |   pub(crate) fn setup_app<'a, 'b>(program: String, version: &str) -> App<'a, 'b> {
    |                                                             ---- help: add explicit lifetime `'b` to the type of `version`: `&'b str`
10  | /   App::new(program)
11  | |     .version(version)
12  | |     .about("Standalone Pact verifier")
13  | |     .version_short("v")
...   |
165 | |       .empty_values(false)
166 | |       .help("URL of the build to associate with the published verification results."))
    | |______________________________________________________________________________________^ lifetime `'b` required


Here is the compiler telling us that we are missing the await call. I’ve seen lots of issues with Javascript because we forgot to await on an async function call. The Rust compiler even shows you the change you need to make.

error[E0308]: mismatched types
  --> pact_verifier/src/pact_broker.rs:382:5
    |
382 |     self.send_document(url, body, Method::POST)
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |     |
    |     expected enum `std::result::Result`, found opaque type
    |     help: consider using `.await` here: `self.send_document(url, body, Method::POST).await`
    |
   = note: expected enum `std::result::Result<(), pact_broker::PactBrokerError>`
           found opaque type `impl core::future::future::Future`

 

Type inference

Rust also has really good type inference. You generally don’t need to provide the types for variables, as the compiler can infer them from how the variable is used. There are a few places where you need to provide the type because the compiler can’t infer it. One example is collecting an iterator of strings:

let file_names = parameters.iter().map(|param| param.name).collect();


Here the problem is we have an iterator of Strings, but we can collect them into either a Vec<String> or a String itself. The compiler won’t know which one to choose, because in Rust a String is also a collection (a collection of chars).
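
A sketch of the two usual ways to tell the compiler which collection is wanted. The Parameter struct is a hypothetical stand-in for whatever the parameters list holds in the real code.

```rust
// Hypothetical stand-in for the items in `parameters`.
struct Parameter {
    name: String,
}

// Option 1: annotate the variable, and collect() picks Vec<String>.
fn names_as_vec(parameters: &[Parameter]) -> Vec<String> {
    let file_names: Vec<String> = parameters
        .iter()
        .map(|param| param.name.clone())
        .collect();
    file_names
}

// Option 2: name the target type on the call itself (the "turbofish").
// Collecting Strings into a String concatenates them.
fn names_as_string(parameters: &[Parameter]) -> String {
    parameters
        .iter()
        .map(|param| param.name.clone())
        .collect::<String>()
}
```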

Macros

With C and C++, macros were used a lot, but they were also very painful to work with when they expanded to invalid code. They were basically text substitution that was run by the preprocessor before the compiler proper, and there was not much in the way of seeing what the expanded source code looked like.

In Rust the macros are actually run against the token stream. This makes them really good for building DSLs. You can create a macro that accepts totally non-Rust code.

A simple example of this is the hashmap macro from the maplit crate:

let map = hashmap!{
    "a" => 1,
    "b" => 2,
};


Here the “a” => 1 syntax can normally only be used in the case arm of a pattern match statement.

Another example is the json macro from the serde_json crate:

let john = json!({
        "name": "John Doe",
        "age": 43,
        "phones": [
            "+44 1234567",
            "+44 2345678"
        ]
});


That allows you to create a JSON document in Rust code that looks very similar to JSON.
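
To give a feel for how such a macro is written, here is a minimal sketch of a map-building macro using macro_rules!. It is not the maplit implementation; the name simple_map and its exact pattern are assumptions made for this example.

```rust
use std::collections::HashMap;

// Hypothetical macro in the spirit of maplit's hashmap!.
// The pattern matches zero or more `key => value` pairs separated by
// commas, with an optional trailing comma.
macro_rules! simple_map {
    ($($key:expr => $value:expr),* $(,)?) => {{
        let mut map = ::std::collections::HashMap::new();
        // This line is repeated once for every pair that was matched.
        $( map.insert($key, $value); )*
        map
    }};
}

fn build() -> HashMap<&'static str, i32> {
    simple_map! {
        "a" => 1,
        "b" => 2,
    }
}
```

Because the macro works on tokens rather than text, a malformed invocation is reported as an error at the call site rather than as a confusing error in the expanded code.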

Generics and Generic Type Bounds

Reasoning about errors with generic types has always been a pain. Scala was the worst. I found myself having to keep changing Scala code until I got a different error, because I could not understand the error the compiler was giving me.

Kotlin has a different system that seems to be easier to understand. A generic type can be specified as either in or out. “In” types are what is passed in as parameters, and “out” types are what are returned. This works ok for most cases (like a class that has a function that receives a parameter and returns a different one). But I found the model more difficult to understand for other cases (like a class that has a generic type as its inner state, where it neither receives it nor returns it).

Rust generic types seem to just make sense. You can provide trait bounds for any generic type, and this just means the types that can be used for the generic type have to implement the defined traits. It is kind of like duck-typing.

For example, in Rust you can define a struct like:

struct MyStruct<T> where T: Clone + std::fmt::Display {
    value: T, // the type parameter has to be used somewhere in the struct
}


What this is doing is putting a bound on the types you can use with MyStruct. It means you can use any type, as long as that type has implemented the Clone and Display traits (traits in Rust are similar to interfaces in other languages). Very duck-typey.

Actually, saying that is doing it an injustice because it’s compile time checked. I find in more dynamic languages we talk about things that quack like a duck, but then you get runtime errors because you received a rhinoceros and an airplane. With Rust you will only ever get an airplane if it is a quacking type of airplane. The compiler will make sure of that.
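
A small sketch of what the bounds buy you in practice. The Labelled struct and its methods are invented for this example; only Clone and Display come from the standard library.

```rust
use std::fmt::Display;

// Any T is accepted, as long as it implements both Clone and Display.
struct Labelled<T>
where
    T: Clone + Display,
{
    value: T,
}

impl<T> Labelled<T>
where
    T: Clone + Display,
{
    // Allowed because the Display bound guarantees `{}` formatting works.
    fn describe(&self) -> String {
        format!("value = {}", self.value)
    }

    // Allowed because the Clone bound guarantees clone() exists.
    fn duplicate(&self) -> (T, T) {
        (self.value.clone(), self.value.clone())
    }
}

// Labelled { value: vec![1, 2] } would be rejected at compile time,
// because Vec<i32> does not implement Display.
```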

The less than Good Things

Everything has them. Even Rust. There are some really annoying things about the language. I’ll go through a few that I have found.

Writing tests in Rust

This is one of the big issues I have with Rust. I find dynamic languages more intuitive for writing tests than statically typed ones. In one of my other projects I use Kotlin, but the tests are all written in Groovy.

Tests in Rust are just functions annotated with the #[test] attribute. For instance:

#[test]
fn match_request_returns_a_not_found_for_no_interactions() {
    let request = Request::default();
    let interactions = vec![];
    let result = match_request(&request, &interactions);
    expect!(result).to(be_equal_to(MatchResult::RequestNotFound(request)));
}


You can mix the test code with the actual code, if you want to. I prefer having the tests in a separate module, but you can have all the tests immediately following the actual function.
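
As a sketch of the separate-module layout, the common convention is a tests module in the same file that is only compiled when running cargo test. The add function here is a made-up placeholder for the code under test.

```rust
// Placeholder for the code under test.
fn add(a: i32, b: i32) -> i32 {
    a + b
}

// Only compiled when building tests, so it adds nothing to the release binary.
#[cfg(test)]
mod tests {
    // Pull in everything from the parent module, including private functions.
    use super::*;

    #[test]
    fn add_sums_two_numbers() {
        assert_eq!(add(2, 3), 5);
    }
}
```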

The problem starts when the borrow checker stops you reusing variables. It can make writing tests annoying. It’s one thing for application code to be type and memory safe, but with test code it doesn’t need to be. If there is a memory or type issue in the test code, it will just fail the test, which is what it is there for anyway.

Fighting the borrow checker

This is the big one with Rust. The borrow checker doesn’t allow variables to be reused once ownership has passed to another function. You’ll need to clone them so you get your own copy. It feels like when you give a book to a friend, you can never read that book again as they own it now and they can’t give it back.
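
A minimal sketch of the three options (move, borrow, clone), with invented function names. Borrowing is often enough when the called function only needs to read the value.

```rust
// Takes ownership: the caller can no longer use the String afterwards.
fn consume(book: String) -> usize {
    book.len()
}

// Borrows: the caller keeps ownership and can keep using the value.
fn peek(book: &str) -> usize {
    book.len()
}

fn lend_and_keep() -> (usize, usize, String) {
    let book = String::from("The Rust Book");
    // Lending a reference leaves `book` usable.
    let borrowed_len = peek(&book);
    // Cloning gives the callee its own copy to own.
    let cloned_len = consume(book.clone());
    // Calling consume(book) at this point would move `book`, and returning
    // it below would then fail with error E0382 (use of moved value).
    (borrowed_len, cloned_len, book)
}
```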

Here is an example of an error that I wasn’t sure of how to resolve:

error[E0515]: cannot return value referencing function parameter `version`
  --> pact_verifier_cli/src/args.rs:10:3
    |
10  | /   App::new(program)
11  | |     .version(version.as_str())
    | |              ------- `version` is borrowed here
12  | |     .about("Standalone Pact verifier")
13  | |     .version_short("v")
...   |
165 | |       .empty_values(false)
166 | |       .help("URL of the build to associate with the published verification results."))
    | |______________________________________________________________________________________^ returns a value referencing data owned by the current function


In this case, I’m using the clap crate to provide the handling of the command line parameters. One of the things that clap provides is handling the version (-v) parameter. You just need to provide the version of the application. Except, we can’t pass the version to clap as it is borrowed.

The problem here is that we’re passing a version in (it’s a string slice), and then passing the App struct from the function. Hence, we’re returning a value from the function (App) that references data owned by the function (version). The fix in this case is not clear, but adding a lifetime to the version that is linked to the App resolved the issue. Variables in Rust have two basic values: the type of the variable and its lifetime value. Lifetimes are what the compiler uses to know when it is safe to delete something. Most of the time these are inferred by the compiler, but sometimes you need to define what the lifetime value of a variable is.

Here is the fix:

fn setup_app<'a, 'b>(program: String, version: &'b str) -> App<'a, 'b>


This function is defining two lifetimes which are required for the App struct. Then it links one of the lifetimes ('b) to the version reference. So now the compiler knows that the version string slice should not be dropped before the App is.
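
The same idea in a self-contained sketch. Banner is a hypothetical stand-in for clap's App, reduced to a struct that stores a borrowed version string. In a function this simple the compiler could infer the lifetime by itself; it is written out here only to show what the annotation expresses.

```rust
// Hypothetical stand-in for clap's App: it holds a borrowed string slice,
// so the struct needs a lifetime parameter for that borrow.
struct Banner<'b> {
    name: String,
    version: &'b str,
}

// The same 'b appears on the parameter and on the return type, which tells
// the compiler that the Banner must not outlive the version string.
fn setup_banner<'b>(name: String, version: &'b str) -> Banner<'b> {
    Banner { name, version }
}

fn render(banner: &Banner<'_>) -> String {
    format!("{} {}", banner.name, banner.version)
}
```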

Here is another error you can get: moved values.

error[E0382]: use of moved value: `hal_client`
  --> pact_verifier/src/pact_broker.rs:643:15
    |
634 |   let hal_client = hal_client.clone().with_doc_context(links)?
    |       ---------- move occurs because `hal_client` has type `pact_broker::HALClient`, which does not implement the `Copy` trait
...
643 |         match hal_client.put_json(hal_client.parse_link_url(&link, &template_values)?, "{}".to_string()).await {
    |               ^^^^^^^^^^ value moved here, in previous iteration of loop
 


This is an example of an error that can be very confusing for a new user. We are using a for loop in this case, and need to post some JSON to another server for each item in the loop. But the hal_client object is moved in the first loop iteration, and then can’t be used in the later iterations.

The cause of this particular error is not to do with the hal_client variable at all, but the put_json function. It was defined as:

async fn put_json(self, url: String, body: String) -> Result<(), PactBrokerError>


Which means it moves itself (the self parameter) into the function, and then can’t be used again. One solution would be to change the parameter to &self so it borrows a reference to itself.

That fix is fine when you have control, but when it is in another crate, you would probably have to clone the object. 
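
Both fixes can be sketched with a toy client. The Client struct and its methods are invented for this example and are not the HALClient from the error above.

```rust
#[derive(Clone)]
struct Client {
    base: String,
}

impl Client {
    // Takes `self` by value: calling this moves the client into the method.
    fn into_url(self, path: &str) -> String {
        format!("{}/{}", self.base, path)
    }

    // Takes `&self`: the client is only borrowed and can be reused.
    fn url(&self, path: &str) -> String {
        format!("{}/{}", self.base, path)
    }
}

// Fix 1: when you control the method, borrow instead of moving.
fn urls_borrowing(client: &Client, paths: &[&str]) -> Vec<String> {
    paths.iter().map(|path| client.url(path)).collect()
}

// Fix 2: when the by-value method is in someone else's crate,
// clone the client on each iteration and let the clone be consumed.
fn urls_cloning(client: &Client, paths: &[&str]) -> Vec<String> {
    paths
        .iter()
        .map(|path| client.clone().into_url(path))
        .collect()
}
```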

Here is another example:

 
error[E0382]: borrow of moved value: `tag`
  --> pact_verifier/src/pact_broker.rs:646:69
    |
638 |       for tag in provider_tags {
    |           --- move occurs because `tag` has type `std::string::String`, which does not implement the `Copy` trait
...
641 |           "tag".to_string() => tag
    |                                --- value moved here
...
646 |             error!("Failed to push tag {} for provider version {}", tag, version);
    |                                                                     ^^^ value borrowed here after move
 


This is the same bit of code that had the previous error. Here we have a for loop, which is building up a hashmap, passing that to the put_json function, and displaying an error if something goes wrong. But we can’t use the tag variable in the error message because it has already been used to create the hash map.

Here is an issue I had with an XML library that I was using. I wanted to generate some XML from a DOM into a String, but the write method of that library did not take the buffer as a reference. So you could create a buffer, pass it in to be filled but then you couldn’t use it because the buffer was now owned by the write function.

 
error[E0382]: use of moved value: `buff`
  --> pact_matching/src/generator_tests.rs:360:29
    |
358 |   let buff = Vec::new();
    |       ---- move occurs because `buff` has type `std::vec::Vec<u8>`, which does not implement the `Copy` trait
359 |   dbg!(xml_handler.value).write(buff);
    |                                 ---- value moved here
360 |   expect!(String::from_utf8(buff).unwrap()).to_not(be_equal_to("".to_string()));
    |                             ^^^^ value used here after move
 

In this case I had to write the XML to a file, and then read it in afterwards. And, of course, I had to clone the file handle because I couldn’t re-use the one that has been used to write the file!
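
One thing that may help in situations like this: the standard library implements Write for mutable references to writers, so if a function is generic over W: Write and takes the writer by value, passing &mut buff moves only the reference. The sketch below uses a made-up write_xml function as a stand-in; whether it applies to a given library depends on how that library declares its write method.

```rust
use std::io::{self, Write};

// Hypothetical stand-in for a library function that takes its writer by value.
fn write_xml<W: Write>(mut writer: W, body: &str) -> io::Result<()> {
    writer.write_all(b"<doc>")?;
    writer.write_all(body.as_bytes())?;
    writer.write_all(b"</doc>")
}

fn render_to_string(body: &str) -> io::Result<String> {
    let mut buff = Vec::new();
    // `&mut Vec<u8>` also implements Write, so only the reference is moved
    // into write_xml and `buff` is still ours when the call returns.
    write_xml(&mut buff, body)?;
    Ok(String::from_utf8_lossy(&buff).into_owned())
}
```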

Chained Iterator calls or try with error types

One of the areas where I have difficulty is when we have chained different iterator operations (for instance you might do a map, then a filter, then a group_by), and the types don’t match. This is especially the case when using results where the error types are different.

Another example would be with the error propagation operator. We introduced the example before of a function that needs to read in a JSON file, generate an HTTP request with the contents of the file, get the response, and then write that out to another file. In theory, you could write something like (pseudo code):

let val = read("/tmp/file.json")?;
let result = client.post(url, val)?;
write(output_file, result)?;


The error propagation operator is meant to allow you to not worry about the errors, because they would be returned immediately from the function. But the problem is that the error type returned from the function will probably be different from the type returned by the file functions as well as the type returned by the HTTP client library. So you probably end up having to write that code as:

let val = read("/tmp/file.json").map_err(|err| err.to_string())?;
let result = client.post(url, val).map_err(|err| format!("{}", err))?;
write(output_file, result).map_err(|err| err.to_string())?;


And now it’s not as nice as it could have been.
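
One way that is sometimes used to cut down on the map_err calls is to define a single error enum for the function and implement From for each underlying error type, because the ? operator applies From::from to the error before returning it. The sketch below assumes that approach; AppError and parse_bytes are invented names, and it uses two standard library error types instead of a real file and HTTP client.

```rust
use std::fmt;
use std::num::ParseIntError;
use std::str::Utf8Error;

// One error type for the whole function, with a variant per failure source.
#[derive(Debug)]
enum AppError {
    Encoding(Utf8Error),
    Number(ParseIntError),
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::Encoding(err) => write!(f, "invalid UTF-8: {}", err),
            AppError::Number(err) => write!(f, "invalid number: {}", err),
        }
    }
}

// These conversions are what `?` calls behind the scenes.
impl From<Utf8Error> for AppError {
    fn from(err: Utf8Error) -> Self {
        AppError::Encoding(err)
    }
}

impl From<ParseIntError> for AppError {
    fn from(err: ParseIntError) -> Self {
        AppError::Number(err)
    }
}

// Two steps with different error types, and no map_err in sight.
fn parse_bytes(raw: &[u8]) -> Result<i64, AppError> {
    let text = std::str::from_utf8(raw)?;
    let number = text.trim().parse::<i64>()?;
    Ok(number)
}
```

The cost is the boilerplate of the enum and the From implementations, which has to be written once per error type.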

There is no inheritance

Rust is an Object-Orientated language, but it does not have the inheritance that is common in other OO languages. You can implement traits, but you can’t have a common base class. There are problems that are easily solved with inheritance (like using the strategy pattern), and if you are converting code bases from languages like C++, Java or C#, you’ll probably not be able to convert them without re-writing large sections.

I think this is the main issue when dealing with errors (as described in the previous section). In other languages, HTTP Error would extend IO Error, which would in turn extend Runtime Error. So you could just use one of the base error classes and you don’t have the issues with all the types not matching.

Slow Compiler!

The other things are painful, but I suspect that as I get better at the language they will become less of a problem. The really slow compile times are unlikely to be one of them, and they hurt the TDD red/green cycle.

The Rust compiler has to do a lot of work, and they have a policy of not adding anything to the language that will affect runtime performance. So I think this is an example of where the language designers have chosen runtime performance over development performance, which is the opposite of Go, where the compiler is really fast.

Conclusion

I’ve really enjoyed my time writing this in Rust. It’s one of the few languages that has changed the way I solve problems in other languages. But it has quite a steep learning curve. As one of my colleagues said “Rust is really interesting, but it is a big language.”

Rust may not be suitable for larger enterprises with large code bases, mainly because of the slow compiler. It is a statically typed language, so that does help, but it feels more likely to be a language that is good to embed into larger code bases.

For multiple teams potentially migrating from C/C++ it is probably a good language to use to replace parts where performance or runtime safety is important. For C, you could probably replace all of it with Rust. But if you have large C++ class hierarchies, it may be too much work and it would be better to embed it into the C++ project.

I don’t think I would use Rust for everything, probably not for building websites or restful web services. But it definitely has a sweet spot, particularly for single binary command line utilities, or for embedding in other languages. These are the two things I’ve used it for. It is also good for solutions where performance is key like messaging or stock market trading. 

But it probably won’t be a language for everyone. Which is a good thing.