In this article I'm revisiting the concept of type equality. Type equality is a topic that software engineers learn early on in their careers. Similar to any other profession, it's beneficial to go back to the basics for practice. Professional basketball players practice layups before each game. Professional programmers should work at the basics as well. I spent this past week re-learning type equality in 13 different languages. In the process I've reaffirmed my knowledge and gained new insights. The rest of this article discusses my findings.
Each programming language has its own intricacies with regard to analysing types for equality. Types are the blueprints for values in programming languages. A type defines the characteristics of a value and differentiates it from values of other types [1]. Equality is a test to see if two values are equal. There are two main forms of equality: reference equality and value equality.
Forms of Equality
Tests for reference equality check to see if two variables, primitives, or objects refer to the same space in memory. When two variables are referentially equal, altering the value of one impacts the other since they refer to the same bytes in memory. For example, var a = "value" and var b = a results in two variables that point to the same memory location in many languages.
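The aliasing described above can be sketched in Python. This is a minimal illustration, not code from the original article: it uses a list instead of a string because a list can be changed in place, which makes the shared memory visible, and the variable names are illustrative only.

```python
# Two names bound to one list object (the names are illustrative).
a = ["value"]
b = a  # b refers to the same object as a; no copy is made

# Changing the list through b is visible through a as well.
b.append("another value")

same_object = a is b  # reference equality check

print(same_object)  # True
print(a)            # ['value', 'another value']
```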
Tests for value equality check to see if two variables, primitives, or objects are logically the same. For example, two integers containing the value 2 are deemed equal in value, since 2 = 2. Another example is two objects a and b where the properties in a have the same values assigned to them as the properties in b. These two objects would pass a value equality check.
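As a rough Python sketch of the two cases above, the dictionaries below stand in for objects a and b. The property names color and yards are hypothetical and chosen only for illustration.

```python
# Two separately built dictionaries stand in for objects a and b.
# The property names are hypothetical.
a = {"color": "red", "yards": 210}
b = {"color": "red", "yards": 210}

integers_equal = 2 == 2  # two integers holding the value 2
values_equal = a == b    # compares keys and their values
same_object = a is b     # compares identity, not contents

print(integers_equal)  # True
print(values_equal)    # True
print(same_object)     # False
```

The two dictionaries pass the value equality check even though they occupy different places in memory.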
In many languages value equality is dependent on the two values conforming to the same type. However, some languages lift this restriction in certain circumstances.
Languages that allow values of different types to be equal are generally considered loosely typed. Languages where it's impossible or very rare for values of different types to be equal are generally considered strongly typed. Loosely typed and strongly typed languages should not be confused with dynamically and statically typed languages.
A language that is strongly typed has very strict type rules. In order for a value of one type to be converted to another type, an explicit conversion mechanism must be visible in the code. For example, Java is a strongly typed language where explicit casts are used to convert from one type to another, such as double two = 2.0 and int twoInt = (int) two;. However, there are still a few cases in Java where implicit type coercion occurs, such as converting an int to a double or boxing and unboxing primitives. Languages such as C, Java, and Python are generally considered strongly typed. As with loosely typed languages, there is no explicit rule for whether or not a language is strongly typed. It's mostly up to personal interpretation of the language.
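Python, one of the strongly typed languages named above, offers a compact way to see the "explicit conversion must be visible" rule. The sketch below is an illustration under that framing; the variable names are made up for the example.

```python
text = "2"
number = 2

# A str and an int are never equal without a visible conversion.
mixed_equal = text == number

# The conversion has to be spelled out in the code.
converted_equal = int(text) == number

# Mixing the two types in an operation is rejected outright.
try:
    text + number
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

print(mixed_equal)      # False
print(converted_equal)  # True
print(mixing_allowed)   # False
```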
Java has two different categories of types - primitives and objects. Primitive types are checked for value equality with the == operator. Object types are checked for reference equality with the == operator and value equality with the equals() method found in the Object class.
The == operator tests the values in two variables' memory locations for equality. Primitive types hold their values directly in their assigned memory space. Therefore, == tests for value equality with primitives.
There is no way to test reference equality with primitives. Primitives also can't use the equals() method since they aren't objects and can't have methods.
The following code demonstrates how String objects are tested for equality. String literals are unique because Java caches them in the same memory location if they have equal values. As you will soon see, many other languages use this optimization for strings as well.
However, for most objects == tests for reference equality and equals() tests for value equality. I created a custom Yarn class to demonstrate how object equality works. For value equality to work properly, Yarn overrides the equals() method from Object.
Variables yarn1 and yarn2 are the only Yarn instances that pass reference and value equality tests.
While Java is generally a strongly typed language, there are a few occasions where type coercion occurs with the == operator. This is usually due to boxing and unboxing primitives along with comparing numeric primitives [2]. Here are a few examples:
When testing object types for equality, the == and === operators are much more predictable. They both test objects for reference equality.
If we want to test objects for value equality, a custom function must be created. The following function tests objects for value equality one level deep [3].
Python has two ways to test for equality - the == operator and the is operator. Value equality is tested with == and reference equality is tested with is. Python equality is easier to reason about because all types are objects. Python is also strongly typed, so values of unrelated types, such as the integer 1 and the string "1", are never coerced into being equal. Numeric types are the exception: 1 == 1.0 is true.
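Because every Python value is an object, even an integer carries the __eq__ method that == relies on. The sketch below is only an illustration of that point; it calls the method directly, which ordinary code rarely needs to do.

```python
two = 2

is_object = isinstance(two, object)  # every value is an object

# == is backed by the __eq__ method, which integers also have.
direct_call = two.__eq__(2)

# For an unrelated type, int.__eq__ declines to answer...
mismatch_call = two.__eq__("2")

# ...and the == operator then settles on "not equal".
operator_result = two == "2"

print(is_object)        # True
print(direct_call)      # True
print(mismatch_call)    # NotImplemented
print(operator_result)  # False
```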
Similar to Java, Python caches string literals in the same memory location, causing the results of == and is to be the same.
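The caching of literals is an implementation detail of the interpreter, so the sketch below avoids relying on it. Instead it uses sys.intern from the standard library, which explicitly places a string in the shared table, to show two equal strings ending up as one object. The variable names are illustrative.

```python
import sys

literal = "yarn"
built = "".join(["ya", "rn"])  # equal text assembled at run time

# The two strings hold the same characters.
equal_values = literal == built

# Interning maps equal strings to a single shared object.
interned_same = sys.intern(built) is sys.intern(literal)

print(equal_values)   # True
print(interned_same)  # True
```

Whether built is literal holds before interning depends on the interpreter, which is why is should not be used to compare string contents.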
I created a custom Yarn class to demonstrate the normal behavior of == and is. Python provides a built-in __eq__ method that classes can override. This method determines the behavior of the == operator when testing value equality.
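The original Yarn listing is not reproduced here, so the following is a stand-in sketch rather than the author's class. The fields color and yards are assumptions made for illustration.

```python
class Yarn:
    """Hypothetical Yarn type; the fields are assumed for illustration."""

    def __init__(self, color, yards):
        self.color = color
        self.yards = yards

    def __eq__(self, other):
        # Decline to compare against unrelated types.
        if not isinstance(other, Yarn):
            return NotImplemented
        return (self.color, self.yards) == (other.color, other.yards)

    def __hash__(self):
        # Keep hashing consistent with the overridden __eq__.
        return hash((self.color, self.yards))


yarn1 = Yarn("red", 210)
yarn2 = yarn1              # same object as yarn1
yarn3 = Yarn("red", 210)   # equal fields, separate object
yarn4 = Yarn("blue", 210)  # different fields

print(yarn1 == yarn2, yarn1 is yarn2)  # True True
print(yarn1 == yarn3, yarn1 is yarn3)  # True False
print(yarn1 == yarn4, yarn1 is yarn4)  # False False
```

Without the __eq__ override, == would fall back to comparing identity, and yarn1 == yarn3 would be False.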
Bash is an outlier in the sense that it's an untyped language. All variables in Bash are plain text that can be interpreted differently depending on the context. Bash provides a couple of operators for determining equality of plain text. All these operators test value equality. There is no way that I know of to test reference equality.
The = and == operators test for plain text equality. The -eq operator tests for integer equality.
You can also easily test arrays for value equality in Bash:
C has a single == operator for testing type equality. == can be used to test integer, floating point, and pointer types for equality. Integer and floating point equality operations test for value equality. Pointer equality operations test for reference equality. C is generally considered strongly typed, but the usual arithmetic conversions still apply during these tests, so comparing an int with a double converts the int before the comparison.
One of the small differences between the == operator in C and other languages is that in C it returns a number instead of a boolean value. == returns 1 when the two items are equal and 0 when they aren't.
As I mentioned, comparing two pointers with the == operator tests for reference equality. Similar to other languages, C caches string literals so they point to the same memory location. This is proven in the following code:
Strings in C can be represented as character pointers (char*) or character arrays. Since character arrays always claim a new slice of memory, reference equality will fail when comparing them to a character pointer.
C is an imperative programming language and doesn't support classes or objects. The basic construct it does support is structs. Unfortunately the == operator doesn't work with structs. If you try using it, a compile-time error will occur. To test structs for value equality, each item in the struct must be checked for equality individually. There is also a memcmp() function available to test structs for value equality, however it isn't always reliable [5, 6]. To test structs for reference equality, you just need to compare the pointers to the structs.
Similar to C, C++ has a single == operator for testing type equality. However, C++ is an object-oriented language that allows for operator overloading. Because of this, all structs and classes can use the == operator with their own custom logic for value equality. Pointer types are still used for reference equality just like C.
You can see an example of equality between objects in C++ on GitHub.
Also, the == operator in C++ returns a boolean type instead of an integer type as it does in C.
C# provides an == operator, an Equals() method, and a ReferenceEquals() method for testing equality. When working with primitives, == and Equals() test for value equality. C# is strongly typed so there is no type coercion when testing for equality.
C# is often compared to Java due to similarities in their structure. However, there are some differences when it comes to equality. While Java primitives can't use the equals() method, C# primitives can. This is because primitives in C# are aliases for structs.
When working with custom structs or classes, Equals() tests for value equality and ReferenceEquals() tests for reference equality. The == operator is a trickier situation in C#. Unlike Java, C# permits operator overloading. This means a class designer can decide if they want to overload == or not. Depending on this decision, == may test for value equality or reference equality.
For example, the built-in Uri class overloads the == operator so that it tests for value equality. I also created a custom Yarn class which doesn't overload == so that it maintains reference equality. The consequences can be seen below:
Luckily you can still use the ReferenceEquals() method in case the == operator is overloaded. If you want to see more examples of equality in C#, check out my GitHub page.
Groovy takes Java's type equality system and makes some major changes to it. Groovy uses == to test for value equality for all values. This is different from Java, which uses == for reference equality with object types. You can still use Object.equals() in Groovy, however it will provide the same result as ==. Because both == and Object.equals() are used for value equality, Groovy introduced a new method Object.is() to test for reference equality.
Like Java, Groovy is a strongly typed language. However, there is some type coercion that occurs when comparing numeric values.
If you want to see how to alter Groovy's value equality mechanism, check out the full code on GitHub.
Haskell is a functional programming language that behaves much differently than the other languages I'm looking at today. It doesn't have reference equality by default, mostly for performance reasons [7]. However, you can implement value equality by making a type an instance of the Eq type class.
Once Eq is implemented, the == and /= operators can be used. Note that Haskell is strongly typed, so no type coercion occurs.
Now here is a custom class I created for testing value and reference equality with objects.
PowerShell is a loosely typed scripting language. It has a native operator -eq for testing value equality. One of the cool things about PowerShell is that it has full access to the .NET Framework, including the [System.Object]::Equals() and [System.Object]::ReferenceEquals() functions I explored earlier in C#.
When testing primitive values for value equality, the -eq operator (and opposite -ne operator) work as expected for a loosely typed language:
When working with objects in PowerShell, the -eq operator and [System.Object]::Equals() function are used to test value equality. The [System.Object]::ReferenceEquals() function is used to test reference equality.
Swift provides two operators to test equality. For value types and structs, the == operator tests for value equality. There is no way to test for reference equality on value types or structs. For objects, the == operator tests for value equality and the === operator tests for reference equality.
The following examples demonstrate how to test equality amongst value types.
The next piece of code creates a custom struct and a custom class. Both demonstrate how the == and === operators work.
If you try using the == operator on two different types, TypeScript will report a compile-time error:
You can, however, trick TypeScript into thinking the two values are of comparable types by using the any or Object types: