Be Careful When Copying Mutable Data Types

Jeremy Grifski - May 6 '19 - - Dev Community

Recently, I was working on an article about list comprehensions in Python when I thought it would be helpful to talk a little bit about making copies of variables. In particular, I want to take a moment address some of the risks when copying mutable data types.

Immutability

Before we talk about copying variables, it’s important discuss an important programming language feature called immutability. Immutability describes a variable which cannot be changed. In other words, immutable variables are constants.

More specifically, immutability implies that a variable cannot be mutated. For example, an immutable string cannot have any characters changed or removed without creating a completely new string in the process. We often see this when working with numbers in a language like Java or Python:

num = 5 
copy = num
Enter fullscreen mode Exit fullscreen mode

Naturally, we would expect that anything that happens to copy has no effect on num. That’s because numbers are typically immutable. In other words, the 5 that is stored in num has an identity that is unique from the 5 that is stored in copy.

Unfortunately, in most programming languages, immutability has very limited support. As a result, variables beyond numbers and strings are typically mutable which means the code snippet above will fail to create a copy. Instead, you’ll get what’s called “spooky action at a distance” in quantum entanglement. In other words, whatever you do to one variable will happen to the other variable.

The Basics of Copying

Since most languages lack support for immutability, we’re stuck dealing with the consequences when creating copies. In particular, we have to create new variables with all the same properties of the variable we’d like to copy by hand. In the following subsections, we’ll look at how this plays out.

Copying a List in Python

If we wanted to copy a list in Python, we might try the following:

my_list = [1, 2, 3] 
my_copy = my_list
Enter fullscreen mode Exit fullscreen mode

If we poke around, we’ll notice that both lists are in fact the same. What a big success, right? Maybe we should take another look:

my_copy[1] = 7 
print(my_list) # Prints [1, 7, 3]... uh oh!
Enter fullscreen mode Exit fullscreen mode

As we can see, lists in Python are mutable. When we created a “copy,” we actually copied the reference—not the contents of the list.

To create a proper copy, we need to iterate over the list and add each element to a new list:

my_copy = [item for item in my_list]
Enter fullscreen mode Exit fullscreen mode

Here, we used a list comprehension to create a copy of the original list. Now when we manipulate the new list, we don’t have to worry about corrupting the old list. But, is this enough?

Copying Nested Lists in Python

As it turns out, a list comprehension isn’t guaranteed to perform a proper copy. For example:

my_list = [[1, 2], [2, 7]] 
my_shallow_copy = [item for item in my_list]
Enter fullscreen mode Exit fullscreen mode

Here, we’ve created a shallow copy of my_list. While the new list has a unique identity from the original list, the contents of both lists are the same. In other words, the following is safe:

my_shallow_copy.append([5, -4]) 
print(my_list) # Prints [[1, 2], [2, 7]]
Enter fullscreen mode Exit fullscreen mode

However, changing any of the nested elements will result in corruption of both lists:

my_shallow_copy[0][1] = -4 
print(my_list) # prints [[1, -4], [2, 7]]... uh oh!
Enter fullscreen mode Exit fullscreen mode

If we want perform a deep copy in this case, we have to copy the nested lists as well:

my_deep_copy = [[item for item in sub_list] for sub_list in my_list]
Enter fullscreen mode Exit fullscreen mode

Naturally, this leads us to writing a recursive function that can handle an n-dimensional matrix:

def deep_copy(item): 
  if type(item) is list: 
    return [deep_copy(sub_list) for sub_list in item] 
  else: 
    return item
Enter fullscreen mode Exit fullscreen mode

Of course, even this deep copy function can only go so far. What if our lists contain mutable objects?

Copying Mutable Objects in Python

At this point, we’re fairly comfortable with copying immutable data types like numbers and strings as well as mutable data types like lists, but what if the data types we’re dealing with are something else? For example, what if we make our own class as follows:

class Votes: 
  def __init__(self): 
    self.pro = list() 
    self.anti = list()
Enter fullscreen mode Exit fullscreen mode

Here, we’ve created some class that represents a set of votes which maintains two lists: pro (for) and anti (against). We can populate those lists with unique IDs that represent voters:

town_votes = Votes() 
town_votes.pro.append("109437139") 
town_votes.pro.append("476524275") 
town_votes.pro.append("794314532") 
town_votes.anti.append("420901790")
Enter fullscreen mode Exit fullscreen mode

Great, now we can do fun things like count the votes for and against:

len(town_votes.pro) # 3 
len(town_votes.anti) # 1
Enter fullscreen mode Exit fullscreen mode

Now, let’s say we have several people counting the votes, so we can make sure we got it right. For security purposes, we want to create a deep copy of the town_votes objects, so corrupt individuals don’t ruin the counts for everyone. If they try, they should fail during the final check.

Of course, how do we copy our town_votes object? For example, would something like this work:

duplicate = town_votes
Enter fullscreen mode Exit fullscreen mode

Of course not. We’ve only copied the reference which results in the same problem we had with lists. But, what if we make a new Votes object and duplicate its references:

duplicate = Votes() 
duplicate.pro = town_votes.pro 
duplicate.anti = town_votes.anti
Enter fullscreen mode Exit fullscreen mode

Sure, we now have a new Votes object, but there’s still a problem: the pro and anti lists are the same. In other words, we’ve only created a shallow copy of our Votes object. Luckily, we know a thing or two about cloning lists:

duplicates.pro = [id for id in town_votes.pro] 
duplicates.anti = [id for id in town_votes.anti]
Enter fullscreen mode Exit fullscreen mode

Now, we have a deep copy of our town_votes object. If someone were to come along and tamper with the copy, we’d still be okay.

Copying Constructors

What we just managed to accomplish with the Votes object is known as a deep copy. Naturally, the process scales up quickly depending on how many references our object is storing. What can make matters worse is if those references store references. To deal with this, it’s not uncommon for libraries to implement what is known as a copy constructor:

def __init__(self, to_copy=None): 
  if to_copy: 
    self.pro = [id for id in to_copy.pro] 
    self.anti = [id for id in to_copy.anti] 
  else: 
    self.pro = list() 
    self.anti = list()
Enter fullscreen mode Exit fullscreen mode

Then, if we ever want a deep copy of our Votes object, we’ll provide it as the input to the constructor. And, if our votes lists contained references (like hypothetical Voter objects), we could call their copy constructor directly from the list comprehension:

def __init__(self, to_copy=None): 
  if to_copy: 
    self.pro = [Voter(id) for id in to_copy.pro] 
    self.anti = [Voter(id) for id in to_copy.anti] 
  else: 
    self.pro = list() 
    self.anti = list()
Enter fullscreen mode Exit fullscreen mode

Of course, there are challenges when performing a deep copy. Perhaps the most dangerous are circular references where one object points to another and the other points back. During copying, both objects would need to construct each other in an infinite loop. To deal with that, you usually need to maintain some sort of reference lookup table to see if you’ve ever duplicated that object in the past.

In any case, Python does provide copying libraries that can handle all this fun stuff for you within reason. I won’t go into that here because I wasn’t planning to write a Python article, but you’re welcome to dig into the documentation yourself.

Attack of the Clones

At this point, I hope you’re more comfortable with concepts like immutability and cloning. These concepts apply to just about every popular language used today like C, C++, JavaScript, and Java. You’d be hard pressed to find a language which implements total immutability, but there are a few that exist. I believe most functional languages try to avoid the notion of state, so you might be able to avoid this cloning issue using languages like Haskell.

While you’re here, I recommend checking some of the following articles:

And, if you’re feeling extra generous, check out the membership page for subscription information. Every little bit helps!

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .