Recently I read The Go Programming Language in hopes of better understanding Go and discovering what makes it a unique language. One Go feature that piqued my interest was goroutines. Goroutines are a mechanism for achieving concurrency and parallelism in Go programs. Many programming languages provide libraries that use multiple threads or processes to achieve concurrency, but Go takes a more distinct approach.
Go's use of goroutines to achieve concurrent programming led me to ask many questions, all of which I'll attempt to answer in this article. The questions are as follows:
A goroutine is an activity within a Go program1. Using goroutines, engineers can write concurrent and parallel programs. When a Go program starts and calls its main() function, it runs in a main goroutine2. This is similar conceptually to a main thread in languages like Java and Python, although goroutines and threads are distinct entities. A new goroutine is created in a Go program using the go keyword. In a program with multiple goroutines, each goroutine runs concurrently, and if the computer running the program has multiple CPUs or cores, potentially in parallel.
As a basic example, the following code starts a separate goroutine from the main goroutine of an application. Both goroutines simply print text to standard output and exit.
One thing to note is that when the main goroutine exits (the main() function completes), all other goroutines are forced to exit as well3. Therefore, it is possible for the child goroutine to be forced to exit before it can print "Other Goroutine". In production code, there are ways to wait for other goroutines to run until completion.
A thread, also known as a lightweight process, is the most basic unit of scheduling on most computers. I wrote about threads, the difference between concurrency and parallelism, along with other multithreading concepts in a previous article on Python.
Threads come in two different forms: kernel threads and user threads.
Kernel threads, also known as OS threads, are threads that are managed by the operating system in kernel mode. Running in Kernel mode allows OS threads to have unrestricted access to the underlying hardware they run on4. Kernel threads contain a virtualized processor, stack space, and program state from the process they run within5. Although kernel threads require operating system support, all modern operating systems support them6. An example of a kernel thread library is Pthreads. You can find examples of pthreads in my system-programming-prototypes repository.
A user thread is similar to an OS thread, except it exists in user space and isn't managed by the operating system. Instead, user threads are written and managed in code, such as within the standard libraries for programming languages. While they require lots of user space code to implement, the benefits of user threads include fewer expenses from context switches and more application control7.
There are multiple threading models for mapping user threads to kernel threads. In reality, all threads used in application code are user threads. However, depending on the threading model, user threads can utilize kernel threads for their execution strategy (as is the case with the 1:1 threading model) or be dependent on user space code to handle threading (as is the case with n:1 and n:m threading models). Knowing the differences between threading models along with kernel threads and user threads is critical for understanding goroutines.
1:1 (Kernel-Level Threading)
The 1:1 threading model is where every user thread is mapped to a single kernel thread. Different ways to think of this threading model are user threads implementing kernel threading functionality, or user threads existing as a wrapper around kernel threads. 1:1 Threading is also known as kernel-level threading7. Pthreads are an example of the 1:1 threading model, which is why I previously described them as an example of a kernel thread. While their code isn't strictly in kernel space, every user thread created with pthreads maps directly to a single, unique kernel thread; user threads and kernel threads in pthreads form a one-to-one relationship.
The benefit of 1:1 threading is that kernel threads can be scheduled and run on separate CPUs or cores. This means that kernel threads can run in parallel on a multicore or multiprocessor machine. The downside of 1:1 threading is that context switches between threads are expensive, and operating systems set a (configurable) limit to the number of kernel threads that can be created. For example, on Linux, the maximum number of threads is viewable with the cat /proc/sys/kernel/threads-max command8.
N:1 (User-Level Threading)
The n:1 threading model is where multiple user threads are all mapped to a single kernel thread. N:1 threading is also known as user-level threading. Historically, the Java threading library used n:1 threading (known as green threads), although this system is no longer used9,10.
The benefits of n:1 threading are reduced performance costs related to context switching and more control in user space. However, on modern architectures, it is important for threads to take advantage of multiple cores and processors. Since the n:1 threading model maps user threads to a single kernel thread, all user-level threads are executed on a single core of a single processor. On multicore or multiprocessor machines, the performance loss of not taking advantage of the additional cores far outweighs any benefits gained from reducing context switches, making n:1 threading a seldom-used approach. The amount of complex user-level code needed to maintain user-level threads is another detriment to n:1 threading.
N:M (Hybrid Threading)
The n:m threading model is where m user threads are mapped to n kernel threads. N:M threading is also known as hybrid threading.
For example, take a program that implements hybrid threading and distributes 20 user threads to four kernel threads. If the computer this program runs on has a four core processor, these four kernel threads can be distributed evenly across the processing cores, allowing kernel threads to run in parallel. However, from the appearance of the application, 20 threads were created, not four.
Hybrid threading attempts to benefit from both kernel-level threading and user-level threading. By utilizing kernel threads, hybrid threads are able to achieve parallelism on multicore or multiprocessor machines. By utilizing user threads, hybrid threads reduce the cost of context switches and provide more power to user-space code. The downside of hybrid threads is they are complex to implement11.
Many programming languages utilize kernel-level threads (the 1:1 threading model). Modern implementations of Java threads and Python threads utilize kernel threading under the hood (In Java, this is referred to as native threading)12,13. One differentiating factor for goroutines compared to Java or Python threads is that Go maps m goroutines to n kernel threads, thus making it follow the hybrid threading (n:m) model14.
After learning that Go uses hybrid threading instead of the more common kernel threading approach, I began to wonder why. Go is a much newer language than Java and Python, which were first released in 1996 and 1991, respectively. Therefore, Go had the benefit of hindsight when it chose its threading model. By using hybrid threading, goroutines (which exist in user space) are able to have custom functionality that makes concurrent code easier to write and use. Hybrid threading also reduces the cost of context switching in a scenario where the number of goroutines is greater than the number of cores and processors in a machine. This results in more efficient concurrent code while still leveraging a machine's architecture for parallelism.
While reading Go documentation and The Go Programming Language, writers are quick to point out that goroutines are not threads. In many ways, I find it easier to think of goroutines as user threads with some unique attributes. One of the biggest differences between goroutines and typical threads is that threads have a fixed-sized stack space while goroutines have a dynamically-sized stack space15.
I believe dynamically-sized stack space is the best feature of goroutines. Kernel threads have a fixed-size stack space where the size is architecture dependent, but is often large enough to prevent a stack overflow. There is also a configurable limit to the number of kernel threads on a machine. This leads to scalability issues when a program attempts to execute many operations concurrently.
When goroutines are initialized, their stack space is small, often around 2KB according to the documentation15. The stack space of a thread is often 1,000 times as big, 2MB in the case of pthreads on a 64-bit x86 architecture16. In some cases, the stack size for a thread is even larger than 2MB. The small (and growable) stack space of goroutines allows orders of magnitude more of them to be created compared to threads17.
The scalable nature of goroutines has the same impact on concurrent code as scalable cloud resources have had on infrastructure; goroutines make concurrent code easier to write and manage and allow for architectural designs that are impossible in more rigid systems. Just as cloud platforms like AWS allow infrastructure to scale with consumer demand, goroutines are able to scale with growing concurrent workloads. Concurrent processes that are difficult to create with threads in libraries like pthreads or languages like Java and Python may be easier in Go with goroutines.
Goroutines also abstract away much of the complexity around concurrent code. In my opinion, prefixing any Go function with the go keyword is an elegant way to implement a concurrency library; in my experience, goroutines are easier to use and learn compared to other threading libraries.
In Go, channels are a way to communicate between goroutines. Let's take a basic example where integer values are passed to a goroutine via a channel and returned from the same goroutine via another channel.
The make(chan int) syntax creates a new integer channel. The double() function, which is run as a goroutine with the go double(out, in) statement, takes two channels as arguments. Within double(), the statement value := <-in reads a value from the in channel. After doubling the value, the statement out <- result writes the doubled value to the out channel.
In the main goroutine, the in <- 2 statement sends a value to the in channel and the result := <-out statement reads a value from the out channel and assigns it to result.
The double() function can be refactored to use unidirectional channels, which specify the directional flow of data through channels in their type definitions18. The directional flow of data in a unidirectional channel must be obeyed.
Type chan<- int is a send-only integer channel and type <-chan int is a receive only integer channel18.
Another way to refactor this example is to use buffered channels. Buffered channels are queues with a specified capacity; with a standard (unbuffered) channel, by contrast, a send blocks until a receiver is ready. Here is a final version of the code using buffered channels and unidirectional channels.
Channels in Go remind me of multiple different technologies, ranging from message brokers like RabbitMQ to generators and coroutines in Python. In Go, the syntax for channels is simple, making them easy to work with.
A coroutine is a function or subroutine that allows its execution to be suspended and resumed cooperatively19. In Python, coroutines are functions that yield their control flow; they are functions containing a yield keyword in their bodies20. Coroutines are similar to kernel threads because they enable concurrent execution, but unlike threads do not enable parallelism21. Coroutines also follow cooperative multitasking, unlike kernel threads which follow preemptive multitasking22. In other words, coroutines yield control of a program on their own terms whenever they wish, while kernel threads are forced to yield control by the operating system.
Coroutines are a programming concept I first encountered while learning Python. The double() goroutine used in the previous section is easily rewritten as a Python coroutine. This code is available in a coroutines.py file on GitHub.
In many ways, double() implemented as a goroutine in Go and as a coroutine in Python are functionally the same. Both yield execution while waiting for a value to arrive. In the Python coroutine, the yield keyword is used to wait for a value to be passed by its caller. In the goroutine, a channel is used to wait for a value to be passed by another goroutine. Both the coroutine and the goroutine emit values as well. The coroutine passes values back to its caller using the yield keyword, while the goroutine passes values to other goroutines through a channel.
While it's reasonable to guess that goroutines are simply coroutines, there are crucial differences. Coroutines do not enable concurrency or parallelism on their own23. Meanwhile, goroutines implement hybrid threading and preemptive multitasking. A library such as asyncio, which is built on top of coroutines in Python, has more in common with goroutines than coroutines. I wrote about asyncio along with other Python concurrent programming concepts in a prior article.
Goroutines are a very interesting part of the Go programming language, and I'll likely leverage them next time I have a codebase with large amounts of concurrency and parallelism. Goroutines make concurrent programming easy, hiding a complex hybrid threading model behind simple language syntax and semantics. Go code shown in this article along with other goroutine and channel examples are available in my go-programming repository on GitHub.