Dead Simple Python: Data Typing and Immutability

Jason C. McDonald - Jan 17 '19 - Dev Community

Like the articles? Buy the book! Dead Simple Python by Jason C. McDonald is available from No Starch Press.


I received a lovely comment on this series from Damian Rivas...

I just read the first two parts that are currently released. I gotta say, with Python being around the 5th language I dive into, I really appreciate this style of teaching! It's hard finding teaching material that doesn't start with "what is a variable" lol.

Hate to disappoint, Damian, but I couldn't avoid variables forever!

Okay, okay, I'm not going to bore y'all with explanation #2,582,596 of variables. You're smart people, I'm sure you know all about them by now.

But now that we're all set up to write code, I think it's worth touching on what a variable is in Python. While we're at it, we'll take a look at functions, strings, and all that other dull, boring stuff...which may well turn out not to be that boring under the hood. There's a lot of information here, but I believe it makes the most sense when understood together.

Welcome to Python. Please mind the furniture on your way down the rabbit hole.

A Pedantic Point

I use the term "variables" throughout this entire series, mainly because that's the standard term across languages. It is valid to use that term in Python, and it is even acknowledged in the official documentation.

However, the technical term for a Python variable is actually a name; that relates to the entire concept of "name binding" I'll be referring to later on.

Use whichever term you're comfortable with. Just understand that Python "variables" are officially referred to as "names," and you're liable to hear both.

Where's The Datatypes?!?

In summer 2011, I sat on the porch swing in Seattle, Washington and logged onto Freenode IRC. I had just decided to switch languages from Visual Basic .NET to Python, and I had some questions.

I joined #python and jumped right in.

How do you declare the data type of a variable in Python?

Within moments, I received a response which I consider to be my first true induction into the bizarre world of programming.

<_habnabit> you're a data type

He and the rest of the room regulars were quick to fill me in. Python is a dynamically typed language, meaning I don't have to go and tell the language what sort of information goes in a variable. I don't even have to use a special "variable declaration" keyword. I just assign.

netWorth = 52348493767.50

At that precise moment, Python became my all-time favorite language.

Before we get carried away, however, I must point out that Python is still a strongly-typed language.

Um, dynamically typed? Strongly typed? What does all that mean?

  • Dynamically typed: the data type of a variable (object) is determined at run time. Contrast with "statically typed," where we declare the object's data type initially. (C++ is statically typed.)

  • Strongly typed: the language has strict rules about what you can do to different data types, such as adding an integer and a string together. Contrast with "weakly typed," where the language will let you do practically anything, and it'll figure it out for you. (JavaScript is weakly typed.)

(If you want a more advanced explanation, see Why is Python a dynamic language and also a strongly typed language).

So, to put that in other terms: Python variables have data types, but the language automatically figures out what that data type is.

So, we can reassign a variable to contain whatever data we want...

netWorth = 52348493767.50
netWorth = "52.3B"

...but we are limited on what we can do to it...

netWorth = 52348493767.50
netWorth = netWorth + "1 billion"
>>> Traceback (most recent call last):
>>>  File "<stdin>", line 1, in <module>
>>> TypeError: unsupported operand type(s) for +: 'float' and 'str'

If we ever need to know what type of variable something is, we can use the type() function. That returns the class the variable is an instance of, which the interactive shell will print out for us. (In Python, everything is an object, so get your object-oriented hat on.)

netWorth = 52348493767.50
type(netWorth)
>>> <class 'float'>

We may actually want to check what the datatype is before we do something with it. For that, we can pair the type() function with the is operator, like this:

if type(netWorth) is float:
    swimInMoneyBin()

However, in many cases, it may be better to use isinstance() instead of type(), as that will account for subclasses and inheritance (object-oriented programming, anyone?). As a bonus, the function itself returns True or False...

if isinstance(netWorth, float):
    swimInMoneyBin()
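
To see the difference subclasses make, here's a minimal sketch. Money is a made-up class for illustration (it isn't part of Python or this series); it simply inherits from float.

```python
# Hypothetical class, invented for this example: a subclass of float.
class Money(float):
    pass

netWorth = Money(52348493767.50)

# type() reports the exact class, so this comparison fails for a subclass.
exactMatch = type(netWorth) is float

# isinstance() also accepts subclasses of the class we ask about.
familyMatch = isinstance(netWorth, float)

print(exactMatch, familyMatch)
```

The type() check turns down a perfectly good float-like object, while isinstance() accepts it.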

Now, the fact is, we rarely check with isinstance(). Pythonistas prefer the philosophy of duck typing; that is, instead of checking what the type is, we simply look for the features we need on the object. If it looks like a duck, walks like a duck, quacks like a duck, it must be a duck. Never mind if it's actually a robotic duck, or a moose in a duck costume; if it has the traits we need, the rest is usually a moot point.
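
Here's one way duck typing might look in practice. All the names below (Duck, RoboticDuck, Moose, makeItQuack) are invented for this sketch; the point is only that the function never asks what type it was handed.

```python
class Duck:
    def quack(self):
        return "Quack!"

class RoboticDuck:
    def quack(self):
        return "BEEP-quack."

class Moose:
    pass  # no quack() method here

def makeItQuack(thing):
    # No type check: just try the behavior we need.
    try:
        return thing.quack()
    except AttributeError:
        return "That's no duck."

results = [makeItQuack(Duck()), makeItQuack(RoboticDuck()), makeItQuack(Moose())]
print(results)
```

Duck and RoboticDuck share no common parent class, yet both work, because all we care about is that quack() exists.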

The Immutable Truth

Since I just introduced that is operator, we'd better clear something up: is and == do not do the same thing!

A lot of Python novices discover that this works...

richestDuck = "Scrooge McDuck"

if richestDuck is "Scrooge McDuck":
    print("I am the richest duck in the world!")

if richestDuck is not "Glomgold":
    print("Please, Glomgold is only the SECOND richest duck in the world.")

>>> I am the richest duck in the world!
>>> Please, Glomgold is only the SECOND richest duck in the world.

"Oh, that's cool!" said a certain young developer in Seattle. "So, in Python, I just use is and is not for comparisons."

WRONG, WRONG, WRONG. Those conditional statements worked, but not for the reason I thought. (Python 3.8 and later will even emit a SyntaxWarning about using is with a literal.) This faulty logic surrounding is falls apart as soon as you try this...

nephews = ["Huey", "Dewey", "Louie"]

if nephews is ["Huey", "Dewey", "Louie"]:
    print("Scrooge's nephews identified.")
else:
    print("Not recognized.")

>>> Not recognized.

"Wait, WHAT?" You might poke at this a bit, even confirming that nephews is nephews evaluates to True. So what in dismal downs is going on?

The trouble is, the is operator checks to see if the two operands are the same instance, and Python has these funny things called immutable types.

Oversimplifying it, when you have something like an integer or a string, only one of that piece of data actually exists in the program's memory at once. Earlier, when I created the string "Scrooge McDuck", there was only one in existence (isn't there always?) If I say...

richestDuck = "Scrooge McDuck"
adventureCapitalist = "Scrooge McDuck"

...we would say that both richestDuck and adventureCapitalist are bound to this one instance of "Scrooge McDuck" in memory. They're like a couple of sign posts, both pointing to the exact same thing, of which we only have one.

To put that another way, if you're familiar with pointers, this is a little like that (without the scary sharp edges). You can have two pointers to the same place in memory.

If we changed one of those variables, say richestDuck = "Glomgold", we'd be rebinding richestDuck to point to something different in memory. (We'd also be full of beans for claiming Glomgold is that rich.)

Mutable types, on the other hand, can store the same data multiple times in memory. Lists, like ["Huey", "Dewey", "Louie"], are one of those mutable types, which is why the is operator reported what it did earlier. The two lists, although they contained the exact same information, were not the same instance.
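
A short sketch of binding, mutating, and rebinding with a list may help. The names triplets and lookalikes are my own additions for illustration.

```python
nephews = ["Huey", "Dewey", "Louie"]
triplets = nephews                       # a second name bound to the SAME list
lookalikes = ["Huey", "Dewey", "Louie"]  # a separate list with equal contents

sameInstance = nephews is triplets
equalButSeparate = (nephews == lookalikes, nephews is lookalikes)

# Mutating through one name shows up through the other: there's only one list.
triplets.append("Webby")

# Rebinding points the name somewhere else; the original list is untouched.
triplets = ["Phooey"]

print(sameInstance, equalButSeparate, nephews, lookalikes, triplets)
```

After this runs, nephews has gained "Webby" (because triplets was the same list at the time), lookalikes is unchanged, and triplets now refers to a completely different list.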

Technical Note: You should be aware that immutability isn't actually related to sharing only one instance of a thing, although that's a common side effect. It's a useful way to imagine it, but don't rely on it to always be so. Multiple instances can exist. Run this in an interactive terminal to see what I mean...

a = 5
b = 5
a is b
>>> True
a = 500
b = 500
a is b
>>> False
a = 500; b = 500; a is b
>>> True

The actual truth behind immutability is a lot more complicated. My Freenode #python friend Ned Batchelder (nedbat) has an awesome talk about all this, which you should totally check out.

So, what are we supposed to use instead of is? You'll be happy to know, it's just good old fashioned ==.

nephews = ["Huey", "Dewey", "Louie"]

if nephews == ["Huey", "Dewey", "Louie"]:
    print("Scrooge's nephews identified.")
else:
    print("Not recognized.")

>>> Scrooge's nephews identified.

As a rule, you should always use == (etc.) for comparing values, and is for comparing instances. Meaning, although they appear to work the same, the earlier example should actually read...

richestDuck = "Scrooge McDuck"

if richestDuck == "Scrooge McDuck":
    print("I am the richest duck in the world!")

if richestDuck != "Glomgold":
    print("Please, Glomgold is only the SECOND richest duck in the world.")

>>> I am the richest duck in the world!
>>> Please, Glomgold is only the SECOND richest duck in the world.

There's one semi-exception...

license = "1245262"
if license is None:
    print("Why is Launchpad allowed to drive, again?")

It's somewhat common to check for a non-value with foo is None because there's only one None in existence. Of course, we could also just do this the shorthand way...

if not license:
    print("Why is Launchpad allowed to drive, again?")

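Either way works for this example, but they're not quite the same check. not license is also True for an empty string, 0, or an empty list, so use is None (which PEP 8 recommends for comparisons against None) whenever you specifically mean "no value at all."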
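
One thing worth seeing for yourself: is None and not aren't asking quite the same question. Here's a quick sketch, where describe is a made-up helper and not anything from Python itself.

```python
def describe(value):
    # Returns a pair: (is it exactly None?, is it "falsy"?)
    return (value is None, not value)

noneResult = describe(None)
emptyResult = describe("")
zeroResult = describe(0)
licenseResult = describe("1245262")

print(noneResult, emptyResult, zeroResult, licenseResult)
```

Only None passes both checks. An empty string and 0 are falsy without being None, which matters if an empty value is a legitimate input in your program.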

Word of Warning: Hungarian Notation

When I was still new to the language, I got the "brilliant" idea to use Systems Hungarian notation to remind me of my intended data types.

intFoo = 6
fltBar = 6.5
strBaz = "Hello, world."

Turns out, that idea was neither original nor brilliant.

To begin with, Systems Hungarian notation is a rancid misunderstanding of Apps Hungarian notation, itself the clever idea of Microsoft developer Charles Simonyi.

In Apps Hungarian, we use a short abbreviation at the start of a variable name to remind us of the purpose of that variable. He used this, for example, in his development work on Microsoft Excel, wherein he would use row at the start of any variable relating to rows, and col at the start of any variable relating to columns. This makes the code more readable and seriously helps with preventing name conflicts (rowIndex vs colIndex, for example). To this day, I use Apps Hungarian in GUI development work, to distinguish between types and purposes of widgets.

Systems Hungarian, however, misses the entire point of this, and prepends an abbreviation of the data type to the variable, such as intFoo or strBaz. In a statically typed language, it's bright-blazingly redundant, but in Python, it might feel like a good idea.

The reason it isn't a good idea, however, is that it robs you of the advantages of a dynamically typed language! We can store a number in a variable one moment, and then turn around and store a string in it the next. So long as we're doing this in some fashion that makes sense in the code, this can unlock a LOT of potential that statically typed languages lack. But if we're mentally locking ourselves into one pre-determined type per variable, we're effectively treating Python like a statically typed language, hobbling ourselves in the process.

All that to say, Systems Hungarian has no place in your Python coding. Frankly, it doesn't have a place in any coding. Strike it from your arsenal immediately, and let's never speak of this again.

Casting Call

Let's take a break from the brain-bending of immutability, and touch on something a little easier to digest: type casting.

No, not the kind of type casting that landed David Tennant the voice role of Scrooge McDuck...although he is completely awesome in that role.

I'm talking about converting data from one data type to another, and in Python, that's about as easy as it gets, at least with our standard types.

For example, to convert an integer or float to a string, we can just use the str() function.

netWorth = 52348493767.50
richestDuck = "Scrooge McDuck"
print(richestDuck + " has a net worth of $" + str(netWorth))
>>> Scrooge McDuck has a net worth of $52348493767.5

Within that print(...) call, I was able to concatenate (combine) all three pieces into one string to be printed, because all three pieces were strings. print(richestDuck + " has a net worth of $" + netWorth) would have failed with a TypeError because Python is strongly-typed (remember?), and you can't combine a float and a string outright.

You may be a bit confused, because this works...

print(netWorth)
>>> 52348493767.5

That's because the print(...) function automatically handles the type conversion in the background. But it can't do anything about that + operator - that happens before the data is handed to print(...) - so we have to do the conversion there ourselves.

Naturally, if you're writing a class, you'll need to define those functions yourself, but that's beyond the scope of this article. (Hint, __str__() and __int__() handle casting the object to a string or integer, respectively.)
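
For the curious, here's a minimal sketch of what that hint might look like. MoneyBin is a hypothetical class invented for this example.

```python
class MoneyBin:
    # Hypothetical class, just for illustration.
    def __init__(self, owner, dollars):
        self.owner = owner
        self.dollars = dollars

    def __str__(self):
        # Called by str(), and by print() behind the scenes.
        return f"{self.owner}'s money bin (${self.dollars})"

    def __int__(self):
        # Called by int(); drops the cents.
        return int(self.dollars)

moneyBin = MoneyBin("Scrooge McDuck", 52348493767.50)
asText = str(moneyBin)
asNumber = int(moneyBin)

print(asText)
print(asNumber)
```

With those two methods in place, str(moneyBin) and int(moneyBin) work just like they do on the standard types.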

Hanging By A...String

While we're on the subject of strings, there are a few things to know about them. Most confusing of all, perhaps, is that there are multiple ways of defining a string literal...

housekeeper = "Mrs. Beakley"
housekeeper = 'Mrs. Beakley'
housekeeper = """Mrs. Beakley"""

We can wrap a literal in single quotes '...', double quotes "...", or triple quotes """...""", and Python will treat it (mostly) the same way. There's something special about that third option, but we'll come back to it.

The Python style guide, PEP 8, addresses the use of single and double quotes:

In Python, single-quoted strings and double-quoted strings are the same. This PEP does not make a recommendation for this. Pick a rule and stick to it. When a string contains single or double quote characters, however, use the other one to avoid backslashes in the string. It improves readability.

This comes in handy when we deal with something like this...

quote = "\"I am NOT your secretary,\" shouted Mrs. Beakley."
quote = '"I am NOT your secretary," shouted Mrs. Beakley.'

Obviously, that second option is much more readable. The backslash before the quotes means we want that literal character, rather than having Python treat it like the boundary of a string. However, because the quotes we wrap the string in have to match, if we wrap in single quotes, Python will just assume the double quotes are characters in the string.

The only time we'd really need those backslashes would be if we had both types of quotes in the string at once.

print("Scrooge's \"money bin\" is really a huge building.")
>>> Scrooge's "money bin" is really a huge building.

Personally, in cases like that, I prefer to use (and escape) the double quotes, because they don't escape my attention like an apostrophe will tend to do.

But remember, we also have those triple quotes ("""), which we could use here too.

print("""Scrooge's "money bin" is really a huge building.""")
>>> Scrooge's "money bin" is really a huge building.

Before you start wrapping all your strings in triple quotes for convenience, however, remember that I said there was something special about them. In fact, there are two things.

First, triple quotes are multiline. In other words, I can use them to do this...

print("""\
Where do you suppose
    Scrooge keeps his
        Number One Dime?""")
>>> Where do you suppose
>>>     Scrooge keeps his
>>>         Number One Dime?

Everything, including newlines and leading whitespace, is literal in triple quotes. The only exception is if we escape something using a backslash (\), like I did with that newline at the beginning. We typically do that, just to make the code cleaner.

The built-in textwrap module has some tools for working with multi-line strings, including ones that allow you to have "proper" indentation without it being included (textwrap.dedent).
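
As a rough sketch of textwrap.dedent in action: the function name numberOneDime is made up, and the only real API used is textwrap.dedent from the standard library.

```python
import textwrap

def numberOneDime():
    # The text is indented to line up with the code. dedent() strips the
    # leading whitespace that every line has in common (8 spaces here).
    return textwrap.dedent("""\
        Where do you suppose
            Scrooge keeps his
                Number One Dime?""")

poem = numberOneDime()
print(poem)
```

The relative indentation between the lines survives; only the shared margin is removed.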

The other special use of triple quotes is in creating docstrings, which provide basic documentation for modules, classes, and functions.

def swimInMoney():
    """
    If you're not Scrooge McDuck, please don't try this.
    Gold hurts when you fall into it from forty feet.
    """
    pass

These are often mistaken for comments, but they're actually valid code that is evaluated by Python. A docstring must be the first statement in whatever it's about (such as a function), and by convention is wrapped in triple quotes. Later, we can access that docstring in one of two ways, both shown here:

# This always works
print(swimInMoney.__doc__)

# This is mostly useful in the interactive shell
help(swimInMoney)

Special String Types

I want to briefly touch on two other types of strings Python offers. Actually, they're not really different types of strings - they're all immutable instances of the class str - but the string literal is processed a bit differently by the language.

Raw strings are preceded with an r, such as...

print(r"I love backslashes: \ Aren't they cool?")

In a raw string, the backslash is treated like a literal character. Nothing can be "escaped" inside of a raw string. This has implications for what type of quotes you use, so beware.

print("A\nB")
>>> A
>>> B
print(r"A\nB")
>>> A\nB
print(r"\"")
>>> \"

This is particularly useful for regular expression patterns, where you're likely to have plenty of backslashes that you want as part of the pattern, not interpreted out by Python before it gets there. Always use raw strings for regular expression patterns.
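
Here's a small sketch of why raw strings and regular expressions get along so well. It uses the standard library's re module; the sample text and variable names are my own.

```python
import re

text = "Bin #1 holds 3 cubic acres; bin #2 holds 0."

# \d+ means "one or more digits". In a raw string, the backslash reaches
# the regex engine untouched.
binNumbers = re.findall(r"#(\d+)", text)

# Matching one literal backslash takes two characters in a raw string,
# but four in a regular string literal.
rawPattern = r"\\"
plainPattern = "\\\\"
samePattern = rawPattern == plainPattern

hits = re.findall(rawPattern, r"C:\ducks\scrooge")

print(binNumbers, samePattern, len(hits))
```

Both patterns are the exact same two-character string; the raw version is just far easier to read and get right.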

Gotcha Alert: If the backslash is the last character in your raw string, it'll still act to escape out your closing quote, and create a syntax error as a result. That has to do with Python's own language lexing rules, not with strings.

The other "type" of string is a formatted string, or f-string, which is new as of Python 3.6. It allows you to insert the values of variables into a string in a very pretty way, without having to bother with concatenation or conversion like we did earlier.

We precede the string with an f. Inside, we can substitute our variables by wrapping them in {...}. We put it all together like this...


netWorth = 52348493767.50
richestDuck = "Scrooge McDuck"
print(f"{richestDuck} has a net worth of ${netWorth}.")
>>> Scrooge McDuck has a net worth of $52348493767.5.

You're not just limited to variables in those curly braces ({...}) either! You can actually put just about any valid Python expression in there, including math, function calls, method calls...whatever you need.
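
A few examples of what can go inside the braces, as a sketch. The last line uses a format specifier after a colon, which this article doesn't cover; see PEP 498 for the details.

```python
netWorth = 52348493767.50
richestDuck = "Scrooge McDuck"

# A method call inside the braces.
shouted = f"{richestDuck.upper()} is rich!"

# Math inside the braces.
halved = f"Half of it is ${netWorth / 2}"

# A format specifier: thousands separators and two decimal places.
pretty = f"${netWorth:,.2f}"

print(shouted)
print(halved)
print(pretty)
```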

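Compared to the older str.format() method and % formatting (neither of which I'll be covering here), f-strings are generally faster. That's because the string literal is parsed when the code is compiled, rather than each time it runs; the expressions inside the braces are still evaluated at run time.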

Formatted strings were defined by PEP 498, so go there for more information.

Functions

While we're getting basic stuff out of the way, let's talk a bit about Python functions. I won't insult your intelligence by redefining "functions" yet again. It'll suffice to provide a basic example.

def grapplingHook(direction, angle, battleCry):
    print(f"Direction = {direction}, Angle = {angle}, Battle Cry = {battleCry}")

grapplingHook(43.7, 90, "")

def says we're defining a function, and then we provide the name, and the names of the arguments in parentheses. Yawn.

Let's make this a bit more interesting. (The following works in Python 3.6 and later.)

def grapplingHook(direction: float, angle: float, battleCry: str = ""):
    print(f"Direction = {direction}, Angle = {angle}, Battle Cry = {battleCry}")

grapplingHook(angle=90, direction=43.7)

Believe it or not, that's valid Python! There's a lot of nifty little goodies in there, so let's break it down.

Calling Functions

When we call a function, we can obviously provide the arguments in the order they appear in the function definition, like in the first example: grapplingHook(43.7, 90, "").

However, if we want, we can actually specify which argument we're passing which values to. This makes our code more readable in many cases: grapplingHook(angle=90, direction=43.7). Bonus, we don't actually have to pass the arguments in order, so long as they all have a value.

Default Arguments

Speaking of which, did you notice that I left out the value for battleCry in that second call, and it didn't get mad at me? That's because I provided a default value for the argument in the function definition...

def grapplingHook(direction, angle, battleCry = ""):

In this case, if no value is provided for battleCry, then the empty string "" is used. I could actually put whatever value I wanted there: "Yaargh", None, or whatever.

It's pretty common to use None as a default value, so you can then check if the argument has a value specified, like this...

def grapplingHook(direction, angle, battleCry = None):
    if battleCry:
        print(battleCry)

But then, if you're just going to do something like this instead...

def grapplingHook(direction, angle, battleCry = None):
    if not battleCry:
        battleCry = ""
    print(battleCry)

...at that point, you might as well just give battleCry that default value of "" from the start.

Gotcha Alert: Default arguments are evaluated once, and shared between all function calls. This has weird implications for mutable types, like an empty list []. Immutable stuff is fine for default arguments, but you should avoid mutable default arguments.
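
To make that gotcha concrete, here's a sketch with two made-up functions, addNephew and addNephewSafely. The first uses a mutable default; the second uses the common None-as-placeholder workaround.

```python
def addNephew(name, roster=[]):
    # The SAME list object is reused on every call that omits roster.
    roster.append(name)
    return roster

first = addNephew("Huey")
second = addNephew("Dewey")
sharedList = first is second

def addNephewSafely(name, roster=None):
    # None is only a stand-in; build a fresh list on each call.
    if roster is None:
        roster = []
    roster.append(name)
    return roster

third = addNephewSafely("Huey")
fourth = addNephewSafely("Dewey")

print(second, sharedList, third, fourth)
```

In the first version, "Dewey" lands in the same list that already holds "Huey", even though the two calls look unrelated. The second version gives each call its own list.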

Gotcha Alert: You must list all your required arguments (those WITHOUT default values) before your optional arguments (those WITH default values). (direction=0, angle, battleCry = None) is NOT okay, because the optional argument direction comes before required angle.
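
If you'd like to watch Python reject that signature without crashing your script, here's a small sketch. It hands the bad definition to the built-in compile() as a string; badSource and rejected are just names I made up.

```python
# The invalid signature from above, stored as text so this file still runs.
badSource = "def grapplingHook(direction=0, angle, battleCry=None): pass"

try:
    compile(badSource, "<example>", "exec")
    rejected = False
except SyntaxError:
    # Python refuses the definition before any of it runs.
    rejected = True

print(rejected)
```

The error happens at compile time, so you'll find out about this mistake as soon as the module loads, not when the function is first called.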

Type Hinting and Function Annotations

If you're familiar with statically typed languages like Java and C++, this might make you a little excited...

def grapplingHook(direction: float, angle: float, battleCry: str = "") -> None:

But this doesn't do what you think it does!

We can provide type hints in Python 3.5 and later, which offer exactly that: hints about what data type should be passed in. Similarly, the -> None part before the colon (:) hints at the return type.

However...

  • Python will not throw an error if you pass the wrong type.
  • Python will not try to convert to that type.
  • Python will actually just ignore those hints and move on as if they aren't there.
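
Don't take my word for it; here's a sketch that passes deliberately wrong types to the hinted function. The __annotations__ attribute is where Python stores the hints on the function object.

```python
def grapplingHook(direction: float, angle: float, battleCry: str = "") -> None:
    print(f"Direction = {direction}, Angle = {angle}, Battle Cry = {battleCry}")

# Wrong types on purpose: no error is raised, and nothing is converted.
result = grapplingHook("north", [90], battleCry=42)

# The hints are recorded on the function, but nothing enforces them at run time.
hints = grapplingHook.__annotations__
print(hints)
```

The call goes through without complaint. The hints are still there for tools and humans to read, but the interpreter itself doesn't act on them.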

So what's the point? Type hinting does have a few advantages, but the most immediate is documentation. The function definition now shows what type of information it wants, which is especially helpful when your IDE auto-magically shows hints as you type arguments in. Some IDEs and tools may even warn you if you're doing something weird like, say, passing a string to something type-hinted as an integer; PyCharm is very good at this, in fact! Static type checkers like Mypy also do this. I'm not going into those tools here, but suffice it to say, they exist.

I should make it extra clear, those type hints above are a type of function annotation, which has all sorts of neat use cases. Those are defined in more detail in PEP 3107.

There are a bunch more ways you can use type hinting, even beyond function definitions, with the typing module that was added in Python 3.5.

Overloaded Functions?

As you might guess, since Python is dynamically typed, we don't have much of a need for overloaded functions. Thus, Python doesn't even provide them! You generally can only have one version. If you define a function with the same name multiple times, the last version we defined will just shadow (hide) all the others.

Thus, if you want your function to be able to handle many different inputs, you'll need to take advantage of Python's dynamically typed nature.

def grapplingHook(direction, angle, battleCry: str = ""):
    if isinstance(direction, str):
        # Handle direction as a string stating a cardinal direction...
        if direction == "north":
            pass
        elif direction == "south":
            pass
        elif direction == "east":
            pass
        elif direction == "west":
            pass
        else:
            # Anything else is an invalid direction.
            raise ValueError(f"Invalid direction: {direction}")
    else:
        # Handle direction as an angle.
        pass

Note, I left the type hint off direction above, as I'm handling multiple possibilities for it. That was honestly a terrible example, but you get the idea.

Gotcha Alert: Now, while that was perfectly valid, it is almost always a "code smell" - a sign of poor design. You should try to avoid isinstance() as much as possible, unless it is absolutely, positively, the best way to solve your problem...and you may go an entire career without that ever being the case!

Return Types

If you're new to Python, you may have also noticed something missing: a return type. We don't actually specify one outright: we simply return something if we need to. If we want to leave the function mid-execution without returning anything, we can just say return.

def landPlane():
    if getPlaneStatus() == "on fire":
        return
    else:
        # attempt landing, and then...
        return treasure

That bare return is the same as saying return None, while return treasure will return whatever the value of treasure is. (By the way, that code won't work, since I never defined treasure. It's just a silly example.)

This convention makes it easy for us to handle optional returns:

treasure = landPlane()
if treasure:
    storeInMoneyBin(treasure)

NoneType is truly a wonderful thing.

Gotcha Alert: You'll notice, all the other functions in this guide lacked return statements. A function automatically returns None if it reaches the end without finding a return statement; no need to tack one on the end.
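
A tiny sketch to confirm it. Both functions here are throwaway examples of mine.

```python
def swimInMoneyBin():
    print("Splash!")  # no return statement anywhere

def bareReturn():
    return  # leaves the function without giving a value

results = (swimInMoneyBin(), bareReturn())
print(results)
```

Both calls hand back None, whether the function falls off the end or hits a bare return.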

Type Hinting and Defaults

When using type hinting, you may be tempted to do this...

def addPilot(name: str = None):
    if name is not None:
        print(name)
    else:
        print("Who is flying this thing?!?")

This used to be acceptable, but it is no longer considered officially correct. Instead, you should use Optional[...] to handle this situation.

from typing import Optional

def addPilot(name: Optional[str] = None):
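
Filled out into something runnable, that might look like the sketch below. Returning the message (and the -> str hint that goes with it) is my own addition so the result is easy to inspect; the original function only prints.

```python
from typing import Optional

def addPilot(name: Optional[str] = None) -> str:
    # Optional[str] means "either a str or None".
    if name is not None:
        message = name
    else:
        message = "Who is flying this thing?!?"
    print(message)
    return message

withPilot = addPilot("Launchpad")
withoutPilot = addPilot()
```

Note that Optional describes the type, not whether the argument can be omitted; it's the = None default that makes the argument optional to pass.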

Review

I hope you feel a bit less confused by Python's type system, and that you didn't bump your head on too many chairs during your trip down the rabbit hole. Here are the highlights again:

  • Python is dynamically typed, meaning it figures out the data type of an object during run time.

  • Python is strongly typed, meaning there are strict rules about what you can do to any given data type.

  • Many data types in Python are immutable, meaning a value can't be changed in place once it's created. Python often keeps only one copy of such data in memory, with each variable just pointing to it, but that's a side effect you shouldn't rely on. Mutable types, on the other hand, can be changed in place.

  • is checks if the operands are the same instance of an object, while == compares values. Don't confuse them.

  • Systems Hungarian notation (e.g. intFoo) is a bad idea. Please don't do that.

  • You can wrap strings in single ('...') or double quotes ("...").

  • Triple quote strings ("""...""") are for multiline strings. They can also be used for docstrings, documenting a function, class, or module.

  • Raw strings (r"\n") treat any backslash as literal. This makes them great for regular expression patterns.

  • Formatted strings (f"1 + 1 = {1+1}") let us magically substitute the result of some code into a string.

  • Default values can be specified for function arguments, making them optional arguments. All optional arguments must come AFTER required arguments.

  • Type hinting lets you "hint" what type of data should be passed into a function argument, but this will be treated as a suggestion, not a rule.

As usual, you can find out lots more about these topics in the Python documentation.


Thank you to deniska, grym, and ikanobori (Freenode IRC #python) for suggested revisions.
