Daily Coding Problem 6

Software is beautiful again!

Daily Coding Problem 6 - Rust

Problem 6 - XOR Linked List

The problem statement is as follows:

An XOR linked list is a more memory efficient doubly linked list. Instead of each node holding next and prev fields, it holds a field named both, which is an XOR of the next node and the previous node. Implement an XOR linked list; it has an add(element) which adds the element to the end, and a get(index) which returns the node at index.

If using a language that has no pointers (such as Python) you can assume you have access to get_pointer and dereference_pointer functions that converts between nodes and memory addresses.

NOTE: All the code in this post, including this write-up itself, can be found or generated from the GitHub repository.

Preface to Solution

This problem represents a large gap in my recent blog posts because, when I encountered this problem, I saw it as an opportunity to learn Rust.

For those who are not familiar, here is a short description from Mozilla Research:

Rust is an open-source systems programming language that focuses on speed, memory safety and parallelism. Developers are using Rust to create a wide range of new software applications, such as game engines, operating systems, file systems, browser components and simulation engines for virtual reality.

Rust is a "low-level" language whose syntax is quite close to C/C++. However, it also provides a number of higher-level concepts from other object-oriented and functional programming languages.

In particular, Rust has a very strong concept of memory safety through concepts of ownership and borrowing. While there are lots of other articles about this feature and its implementation, here is how it finally made sense to me: Rust's memory safety is somewhat similar to a pessimistic database lock. Because mutability is very explicit in Rust, there can be as many readers as desired for a single location in memory. However, when a writer wants to mutate that memory, it receives and keeps exclusive access to that memory until the work is complete.

As you can imagine, doubly linked lists are particularly annoying in Rust because a common question that comes up is 'Who owns the current node, the previous or the next node'. There is an excellent article about linked lists in Rust for those wanting a correct and much better introduction to linked lists in Rust.

It took me the better part of the last few months to learn enough about Rust to get this exercise working. I gave up pretty much once a month, wrote this solution partially in Go and in F#, and did a lot of experimentation to get to this point. In the end, I was not able to make my code completely 'safe' in Rust, but I am not sure that is possible with an XOR linked list. In any case, all feedback is welcome and greatly appreciated.

Solution

Data model

The key to this problem, in my mind, is to design the correct data structure that can work with an XOR linked list while also ensuring that objects are not prematurely destroyed (a problem in Rust since the language calls the destructor as soon as the last reference to an object goes out of scope).

Going with the classic model, a linked list is made of a series of nodes. My choice was to create two data structures that, together, represent an XOR linked list:

  1. An XORLinkedListNode is a struct (similar to a F# record, a Java class without methods, or a data transfer object) that holds the following information:
    1. element holds an integer, representing the value of the node.
    2. both holds the XOR'ed memory address.
  2. An XORLinkedList is a struct that holds the following information:
    1. nodes keeps a reference to all XORLinkedListNode objects so that Rust does not drop while they are still part of the list.

In addition to the above, there are two additional considerations to keep in mind:

  • The Rust compiler is free to move objects around in memory for efficiency purposes, whether on the stack or the heap. Because of the strong memory safety that Rust provides, this is considered a relatively safe action. However, if the compiler chooses to move items around, that would invalidate the both pointer. To prevent this, each node has a 3rd element in its struct called _pin.
    • _pin forces Rust to pin an object to a memory address. More information can be found here.
  • By default, Rust creates objects on the stack. To create objects on the heap so that they outlive the execution of a function, Rust provides boxed values.
    • The types of nodes in the XORLinkedList is Vec<Pin<Box<XORLinkedListNode>>>, i.e. a vector of boxed, pinned XORLinkedListNode values.

XORLinkedListNode

XORLinkedListNode provides two methods. THe first is the equivalent of a simple constructor that creates new nodes. The second is a set_both function that changes the value of both for a boxed, pinned node.

unsafe fn set_both(new_both: usize, node: &mut Pin<Box<XORLinkedListNode>>)
{
  let x = node.as_mut();
  Pin::get_unchecked_mut(x).both = new_both;
}

XORLinkedList

XORLinkedList provides the two methods requested in the original problem statement, add and get.

add is responsible for two key actions:

  1. Creating the new node.
  2. Fixing the both attribute of the last node in a non-empty linked list.
// Adds an item to the end of the linked list.
fn add(&mut self, v: i32) -> &XORLinkedList {
  // Linked list is empty, so add the value with an `EMPTY` both
  // pointer.
  if self.nodes.is_empty() {
    // First, create the new node.
    let n = XORLinkedListNode::new(v, None);

    // Add node to the internal vector.
    self.nodes.push(n);

    self
  } else {
    // Convenience variable for current length.
    let l = self.nodes.len();

    // Create new tail node with `both` set to the address of the old tail.
    // Then, adds the new tail to the internal vector.
    self.nodes.push(XORLinkedListNode::new(
      v,
      Option::from(self.nodes[l - 1].deref()),
    ));

    // Fixes the old tail's `both` to point to the address of the new tail.
    unsafe {
      XORLinkedListNode::set_both(
        self.nodes[l - 1].both ^ address(self.nodes[l].deref()),
        &mut self.nodes[l - 1],
      );
    }

    self
  }
}

get uses a recursive helper function to walk the list and retrieve the desired index. In its current implementation, get will panic if the requested index is outside the length of the linked list.

// Returns the `XORLinkedListNode` at the given index. This will panic if
// the index is outside the length of the linked list.
fn get(&self, index: usize) -> &XORLinkedListNode {
  // (Tail-)Recursive function that returns the node at the given index. `n`
  // represents the current node under investigation. `idx` represents the
  // index being counted down (0-based). `prev_address` represents the
  // memory address of the previous node, so that it can be XORed with
  // `n.both` to get the next node's address.
  fn get_helper(
    n: &XORLinkedListNode,
    idx: usize,
    prev_address: usize,
  ) -> &XORLinkedListNode {
    if idx == 0 {
      n
    } else {
      // Get the address of the next node.
      let next_address = n.both ^ prev_address;
      let next_n = next_address as *const XORLinkedListNode;
      get_helper(unsafe { &(*next_n) }, idx - 1, address(n))
    }
  }

  // Call `get_helper`. The previous node of the first entry in the list is
  // always `EMPTY`.
  get_helper(&self.nodes[0], index, EMPTY)
}

Utility methods

The code uses one utility constant and one utility function to make coding easier.

The first is a constant called EMPTY which represents the so-called empty address, i.e. 0x0.

The second is a function that returns the address of an object.

// Gets the memory address of an object.
fn address<T>(elt: &T) -> usize {
  usize::from_str_radix(format!("{:p}", elt).trim_start_matches("0x"), 16)
    .unwrap()
}

Testing

Testing in Rust was extremely easy, thanks to built-in support using the cargo test command.

For this exercise, I did not perform any property-based testing. By the end, I just wanted to get my code working. However, this exercise really highlighted the value of unit testing. It was through testing that I realized that Rust is free to move objects around in memory, even if they were allocated on the heap. I don't know if other languages also have this behavior, but it was very surprising to me. That is what led me down the rabbit hole of both pinned and boxed values.

Conclusion

Even though this was my first coding challenge in Rust, I don't think I could have picked a better problem to help me learn about some key concepts in Rust:

  1. Heap allocations
  2. Memory safety
  3. Unsafe functions
  4. Immutability vs. mutability

I am planning to write the next few solutions in Rust to build up my knowledge of the language.

See you in the next one!