For most people, studying data structures and algorithms can feel like being thrown into the deep end of a pool — especially if you’re new to computer science. But don’t worry! Getting started doesn’t have to be intimidating.
Think of data structures and algorithms as the building blocks for programming. They’re the foundation on which all programs are built, no matter how complex they might be.
Note that this guide is completely language agnostic, meaning it doesn’t matter what language you use—these principles will still apply.
Let’s start simple: What Are Data Structures?
A data structure is exactly what it sounds like: a way of organizing and storing data (an array, linked list, tree, etc.). An algorithm, on the other hand, is simply a set of instructions that tells the computer how to manipulate that data.
By data, we mean any type of information. It could be numbers, characters, words, images, or audio files — the possibilities are endless!
A data structure determines how our data is stored in a computer’s memory, and it affects how quickly we can access and manipulate that data.
Why Are Data Structures and Algorithms So Important?
You’ve probably organized your house before. You have containers to put your clothes in, shelves to store your books on, and boxes to keep all your miscellaneous items.
It’s the same with data structures. Instead of storing all your data haphazardly, data structures allow us to store and organize our data so that it takes up less space and is more accessible.
Algorithms are important for the same reason—they help us be more efficient with our data. By having a set of instructions, we can automate tasks, like finding the shortest route from one city to another, or sorting a list of names in alphabetical order.
As a developer, understanding data structures and algorithms, and how to choose the right data structure for your program, will help you create more efficient, robust software solutions.
You’ll start writing programs more logically, in the way that makes the most sense for the problem you’re trying to solve.
So, to define it, a data structure is a way of organizing and storing data in a computer’s memory.
How Computer Memory Works
Let’s dive into the technical aspect of data structures and algorithms. Computers store data in their memory registers, which are essentially blocks of memory. Each of these blocks holds some kind of value, like a 2, 5, or 7.
Memory Registers:
| 2 | 5 | 7 |
When you write a program, you’re probably not just doing something with one piece of data. Instead, you’ll be working with a collection of related data, like a list of numbers, a list of transactions, or a list of people’s names.
So it makes sense to store that data in an organized, easy-to-access way. That’s where data structures come into play.
Instead of defining separate variables, like:
a = 2
b = 4
c = 7
You can just define one variable as a list and then store the values in it:
my_list = [2, 4, 7]
So, in the future, should you want to access one of the values, you don’t have to go hunting for a, b, or c. You can index straight into the list (note that most languages count positions from zero):
my_list[0]  # 2
my_list[1]  # 4
my_list[2]  # 7
That’s the primary purpose of data structures—to make it easier for us to access and manipulate our data.
That’s the computer equivalent of organizing your clothes in a closet, books on shelves, and tools in boxes.
So, should you want to access any of your clothes, books, or tools, you can go to the respective containers and pick out what you need.
That’s precisely how data structures work: they help us organize our data in such a way that we can find what we need in the least amount of time possible.
Now, let’s talk about some of the limitations of a simple data structure like a list.
Limitations of a Simple Data Structure Such as a List
A list is great for organizing data but can only do so much.
Now, let’s go back to our memory register. We had only allocated three memory registers to store our data.
But what if, in the future, we need to store more values than the three that we had allocated?
We would then have to allocate more memory registers or find a different way of organizing our data.
But what if other things already take up all the memory in the register, and we can’t allocate more registers?
Say we had a “hello” string in one of the memory registers, and now we wanted to store an integer value there.
| 2 | 5 | 7 | “Hello” |
Unfortunately, this isn’t possible without losing data, since a register can only hold one value at a time.
If you write the integer value over the “hello” string, the string gets overwritten, and you lose it.
Under the hood, most languages define lists such that when you add an element beyond the allocated space, the entire list is copied to a different, larger block of memory.
Even if you’re new to programming, it’s easy to see why this could be a problem. Every time we add data to our list and the entire list has to be copied to a new block of memory, we’re wasting time and resources.
Your program will become slow and inefficient, and that’s why lists are not always the best data structure when dealing with large amounts of data.
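Since this guide is language agnostic, here is a minimal sketch in Python of what that copying looks like under the hood. The helper name `append_with_copy` is purely illustrative; real languages do this for you automatically (and usually over-allocate to soften the cost):

```python
def append_with_copy(arr, value):
    # Simulate a fixed-size block of memory: to grow, we must
    # allocate a new, larger block and copy every element over.
    new_arr = [None] * (len(arr) + 1)
    for i, v in enumerate(arr):
        new_arr[i] = v          # copy each existing value
    new_arr[len(arr)] = value   # place the new value at the end
    return new_arr

nums = [2, 4, 7]
nums = append_with_copy(nums, 9)
print(nums)  # [2, 4, 7, 9]
```

Notice that every single append touches every existing element, which is exactly the waste described above.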
A Better Way to Organize Data than a List
A better alternative to organize data than a list is a Linked List.
Instead of storing data sequentially, as seen in a list, a linked list stores data in ‘nodes.’
With nodes, data isn’t stored sequentially on top of each other; instead, they’re all linked together with ‘pointers’ so that the next piece of data is always linked to the previous one, hence the name ‘linked list.’
That makes adding and deleting data much easier without moving the entire data structure around.
Each node in a linked list has just two parts: the data and the pointer.
The ‘data’ is simply the piece of information that’s being stored, like an integer value or a string.
And then there’s the ‘pointer’ which points to the next node in the list.
Each node lives at a particular address in the computer’s memory, so one node’s pointer simply stores the next node’s memory address.
So, if you need to find a particular piece of data, you can follow the pointers until you reach the node storing your data and then access it.
For example, in the diagram above, the pointer box reads 3600. That means the next node in the linked list is stored at memory address 3600.
The second node pointer reads 4000, which means that the next node is stored at memory address 4000.
A Linked List is the equivalent of a room full of containers.
In this room, instead of having all the containers in one spot, sequentially as they’re supposed to be in a list, you can store the containers in different places and use pointers to mark where the next container is.
This way, you can continue increasing the chain of values or nodes as long as there’s space without worrying about moving the entire data structure around.
All you have to do is find the last node and add the new node to its pointer.
Linked lists can be used to store anything from integers, strings, and even other data structures.
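To make this concrete, here is a minimal linked-list sketch in Python. The class names `Node` and `LinkedList` are just illustrative, and real implementations often track the tail node to avoid walking the whole list on every append:

```python
class Node:
    def __init__(self, data):
        self.data = data   # the piece of information being stored
        self.next = None   # the pointer to the next node

class LinkedList:
    def __init__(self):
        self.head = None   # the first node in the chain

    def append(self, data):
        # Find the last node and link the new node to its pointer;
        # no existing node has to move in memory.
        node = Node(data)
        if self.head is None:
            self.head = node
            return
        cur = self.head
        while cur.next:
            cur = cur.next
        cur.next = node

    def to_list(self):
        # Follow the pointers from the head to collect every value.
        out, cur = [], self.head
        while cur:
            out.append(cur.data)
            cur = cur.next
        return out

ll = LinkedList()
for value in (2, 4, 7):
    ll.append(value)
print(ll.to_list())  # [2, 4, 7]
```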
Lists (Arrays) and Linked Lists aren’t the only data structures out there. But they are a good starting point, and understanding how they work will help you better understand other data structures.
Why Do We Need to Have Different Data Structures?
Now that we’ve explored the list data structure and why it can be inefficient for large datasets, let’s talk about why it’s important to have different data structures.
Data structures are like tools in a toolbox — they all serve their own purpose, whether that’s sorting data or allowing us to access data quickly and efficiently.
Each data structure has its own advantages, so it’s important to know which one is best for the job you’re trying to accomplish.
For example, if you need a collection that grows and shrinks frequently, then a Linked List is probably your best bet.
But if you need fast access to elements by position, or you’re doing basic tasks like sorting or searching over a dataset of known size, then a List (Array) may be more suitable.
It all depends on the job, and understanding the different data structures is key to choosing the right one for your task.
A Real-World Example of Data Structures (Priority Queues)
Let’s say you’re designing a booking system for an airline, and you want to structure the data of the people who have booked tickets for some flights.
What data structure would you use to store this information?
A Priority Queue
Priority Queues are used to store and order elements based on their priority, with the highest priority element being served first.
Customers can have different priority levels in an airline booking system, so it makes sense to use a Priority Queue to store this data.
You can start by defining a Linked List to hold the ticket information and then use a Priority Queue to order them according to priority.
The Priority Queue will take the Linked List data and arrange it so that customers with higher priority levels are served first, while those with lower priority levels will be served last.
Each node block here still holds a piece of data, but it also has a priority value attached to it.
So, instead of relying on insertion order, the Priority Queue uses that priority value to decide which element is served next.
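As one concrete sketch, Python’s standard `heapq` module implements a priority queue as a min-heap: the item with the smallest priority number is always popped first. The passenger names and priority levels below are made up for illustration:

```python
import heapq

# Lower number = higher priority (e.g. 1 = first class,
# 2 = business, 3 = economy).
bookings = []
heapq.heappush(bookings, (2, "Alice"))  # business
heapq.heappush(bookings, (3, "Bob"))    # economy
heapq.heappush(bookings, (1, "Carol"))  # first class

# Passengers come out in priority order, not insertion order.
while bookings:
    priority, name = heapq.heappop(bookings)
    print(priority, name)
# 1 Carol
# 2 Alice
# 3 Bob
```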
Stack
Another type of data structure is the Stack.
A stack is a data structure that allows us to store and access elements in a ‘last in, first out’ (LIFO) manner.
Think of it like a stack of plates; the last one you put in is the first one that gets taken out.
If you want to take the last item off a stack, you can use the ‘pop’ command, and if you want to add an item to the top of the stack, you can use the ‘push’ command.
Note that you can only take the items from the top of the stack, not the bottom.
Stacks are useful when dealing with recursive algorithms, and they can help us keep track of our progress in a task by using a ‘call stack’ or ‘function stack’.
In a nutshell, stacks are great for keeping track of data that needs to be accessed in a particular order.
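In Python, a plain list works well as a stack, with `append` playing the role of ‘push’ and `pop` taking the top item off. The plate names are just illustrative:

```python
stack = []
stack.append("plate 1")  # push onto the top of the stack
stack.append("plate 2")
stack.append("plate 3")

# The last plate pushed is the first one popped (LIFO).
print(stack.pop())  # plate 3
print(stack.pop())  # plate 2
```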
The Limitations of Stack
One of the main drawbacks of a stack is that once an item is added to the stack, it cannot be removed unless it is on the top of the stack.
That means that if you have many elements in your stack and need to access one at the bottom, you may need to remove all of the other elements first.
That can lead to a lot of wasted time and resources if the stack is large, so it’s important to keep these limitations in mind when using stacks.
Another downside of stacks is that if you need a different retrieval order, let’s say you want to get the first item you added instead of the last one, a stack won’t be the right data structure for the job.
For tasks like that, you need to use a different data structure called Queue.
Queue
A Queue is similar to a Stack but operates in a ‘first in, first out’ (FIFO) manner.
That means the element that has been waiting in the Queue the longest is always the first one to be removed.
Queues are useful for handling customer requests or processing tasks according to order.
They also allow us to easily track how many elements are in the queue and what is currently at the front of it.
When you add an item to the queue, this is what we refer to as enqueue, which means adding a new element to the back of the queue.
Similarly, removing an item from the queue is known as dequeue, which means removing the item at the front of the queue.
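In Python, `collections.deque` gives an efficient queue: `append` enqueues at the back and `popleft` dequeues from the front. The request names are just illustrative:

```python
from collections import deque

queue = deque()
queue.append("request 1")  # enqueue at the back
queue.append("request 2")
queue.append("request 3")

# The oldest request is served first (FIFO).
print(queue.popleft())  # request 1
print(queue.popleft())  # request 2
```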
The Limitations of a Queue
As with Stacks, there are limitations to using Queues.
First, searching for a particular element in the queue can be difficult, as we need to look at each element and dequeue it one by one until we find what we are looking for.
Secondly, if the queue is large, it can take a long time to add or remove elements.
Finally, since queues follow a FIFO order, you can’t access elements that were added recently without first dequeuing all the older elements ahead of them.
That means that if you need random access to the elements in your queue, then a Queue won’t be the right data structure for the job.
Trees
Trees are a type of data structure that is used to store hierarchical data.
For example, let’s say we want to show employees in an organization sorted by their job titles.
We could use a tree data structure to organize the information in this way, with each node representing an employee and the edges showing how they are related.
As you can see in the company chart above, John is the company’s CEO, and he has two direct reports: CTO Steve and President Rama.
Both Steve and Rama have people reporting to them.
Steve has Lee, Bob, and Ella, while Rama has Tom and Raj.
Bob has two direct reports; lastly, we have Tom with one direct report, Bill.
This particular logical structure is called a Tree, and it’s the data structure we use to store this type of hierarchical data.
If we flip the structure upside down, we can see it resembles a real tree, with the CEO at the top and all other employees branching down from him.
So, to define it, a tree data structure is a collection of nodes organized in a hierarchical order from the root node down to the leaf nodes.
Trees have several benefits over other data structures, such as stacks and queues.
For example, searching for a particular element in a well-organized tree is typically much faster than scanning through a stack or queue.
Each node will have some associated data, and each node can have several children.
That means we can traverse a tree quickly and easily to find whatever data we want.
The nodes in the tree can either be parent nodes or child nodes.
From the diagram above, node 1 is the root, and it’s also the parent node of nodes 2 and 3.
Nodes 4, 5, and 6 are children of node 2, while nodes 7 and 8 are children of node 3, and so on.
Nodes of the same parent are referred to as siblings.
The topmost node is called the root, and it’s the only node without a parent.
Nodes without children are called leaves.
The nodes with both a parent and children are called intermediate nodes.
The main operations that can be performed on a tree are insertion, deletion, and traversal.
Inserting a new node into the tree is done by adding it as a child to an existing parent node.
Nodes are deleted by removing the desired node and its associated edges from the tree.
Traversing a tree involves recursively visiting each node in the tree.
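All three operations can be sketched in a few lines of Python, using a simplified subset of the org chart above. The class name `TreeNode` and the `traverse` helper are just illustrative:

```python
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.children = []  # a node can have several children

def traverse(node, depth=0):
    # Recursively visit each node, printing it indented by depth.
    print("  " * depth + node.value)
    for child in node.children:
        traverse(child, depth + 1)

# Build part of the company hierarchy from the example above.
ceo = TreeNode("John")
cto = TreeNode("Steve")
president = TreeNode("Rama")
ceo.children = [cto, president]            # insertion: add children
cto.children = [TreeNode("Lee"), TreeNode("Bob"), TreeNode("Ella")]
president.children = [TreeNode("Tom"), TreeNode("Raj")]

traverse(ceo)  # prints each employee, indented under their manager
```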
The Limitation of Tree Data Structure
Trees do have some limitations, however.
An unbalanced tree can become slow to search, since in the worst case it degenerates into a long chain of nodes.
Balancing a tree can also be tricky, as it requires careful manipulation to keep the structure balanced while making constant additions and deletions.
Finally, trees are most useful for data with a hierarchical structure, so if you don’t have data that fits this model, then a tree won’t be the most suitable data structure for the task.
The Big O Notation
As we said, there is nothing like a perfect data structure. If anything, each data structure has its own strengths and weaknesses.
We use the Big O Notation to measure the efficiency of a data structure. This notation is a way to measure how well an algorithm scales when faced with increasing amounts of data.
The Big O notation works by describing how the time it takes our program to process its data grows as the input size increases.
The mathematical notation represents the worst-case run time or space requirements as the input size increases.
As your input sizes grow or expand, how much longer will it take for your algorithm to complete the task?
And how much more space in memory does the algorithm require?
The Big O notation helps us answer these questions and allows us to make better decisions about which data structure is the most efficient for our task.
The most common Big O notations you’ll see are Constant Time (O(1)), Linear Time (O(n)), and Quadratic Time (O(n²)).
Constant time means that regardless of the size of our input, the algorithm will always take the same amount of time to complete.
Linear Time means that the time required grows in direct proportion to the input size: doubling the input doubles the time.
Quadratic time means that as our input size doubles, the time required for completion quadruples.
And lastly, Logarithmic Time (O(log n)) means that as our input size increases, the time required grows only logarithmically: doubling the input adds just a roughly constant amount of extra work.
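These growth rates are easiest to see side by side in toy functions. The function names below are purely illustrative; what matters is how many steps each one takes as the input list grows:

```python
def constant_time(items):
    # O(1): one step, no matter how long the list is.
    return items[0]

def linear_time(items):
    # O(n): touches every element exactly once.
    total = 0
    for x in items:
        total += x
    return total

def quadratic_time(items):
    # O(n^2): a nested loop visits every pair of elements,
    # so doubling the input quadruples the work.
    pairs = 0
    for a in items:
        for b in items:
            pairs += 1
    return pairs

data = [2, 4, 7]
print(constant_time(data))   # 2
print(linear_time(data))     # 13
print(quadratic_time(data))  # 9
```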
Data structures and algorithms can be difficult to understand. Still, by understanding their basic concepts and how they are used in real-world applications, you should be able to make better decisions about which one works best for your task.
In conclusion
Data structures are an important tool to have in your programming arsenal.
You can become a better, more efficient programmer by understanding how they work and when it’s most appropriate to use them.