What is copiable?
What is copiable anyway?
Python is garbage collected and has a reference counting system. This means that when you create an object, it is stored in memory and a reference to it is stored in a variable. When you assign a variable to another variable, the reference count for the object is incremented. When you delete a variable, the reference count is decremented. When the reference count reaches zero, the object is deleted from memory.
This is a very simple explanation of how Python works. There are many more details that I will not go into here. The point is that when you assign a variable to another variable, you are not creating a copy of the object. You are creating a new reference to the same object. This is important to understand because it can lead to some unexpected behavior.
Questions I had:
- What happens when you assign a variable to another variable?
- What happens when you return a complex object (i.e. a class) as part of a tuple from a function?
- What happens when you spin up a subprocess, call a method you defined in one class, and give it an object as an argument?
Toy models:
Let’s create a super simple class that allows us to print the underlying ID (i.e. a pointer address kind of unique assignment for each object).
class Copyable:
def id(self):
return id(self)
def __repr__(self):
return f"Copyable({self.id()})"
Simple copy assignments
a = Copyable()
b = a
print(a)
print(b)
>>> Copyable(140703000000000)
>>> Copyable(140703000000000)
As we can see, normal assignments just do a “pointer” copy and both variables point to the same object. This is what we expect.
What about a list of objects?
a = [Copyable(), Copyable()]
b = a
print(a)
print(b)
>>> [Copyable(140703000000000), Copyable(140703000000001)]
>>> [Copyable(140703000000000), Copyable(140703000000001)]
As we can see, list of objects do a simple element wise pointer
copy.
What about returning from functions?
def return_copyable():
a = Copyable()
print(a)
return a
b = return_copyable()
print(b)
>>> Copyable(140703000000000)
>>> Copyable(140703000000000)
As we can see, returning from functions does a simple pointer
copy and the lifetime of the function scoped variable gets extended to the lifetime of the returned variable.
What about returning from functions with a tuple instead of a single object?
Tuples are non-mutable, but that does not mean they are storing a copy of the object. They are still just storing a reference to the object. In C++ lingo, this is like storing a const pointer to the object.
def return_copyable() -> Tuple[Copyable, str]:
a = Copyable()
print(a)
return a, "hello"
b, c = return_copyable()
print(b)
>>> Copyable(140703000000000)
>>> Copyable(140703000000000)
As we can see, even wrapping it with a tuple does not change the behavior.
So, how do we actually make a true copy?
copy
module in Python does just that:
from copy import deepcopy
a = Copyable()
b = deepcopy(a)
print(a)
print(b)
>>> Copyable(140703000000000)
>>> Copyable(140703000000001)
As we can see, deepcopy
will construct a new object and assign it to the new variable, and then ensure that the fields of this new object matches the previous one. This is a true copy.