Pathfinding Using Depth-First Search

Adventures in technical interviews and using pathfinding through depth-first search to solve the lowest common ancestor problem.

Background

Recently I went through the interview process for a “FAANG/MAANG/MANGA/whatever they call them these days” company. I made it through the numerous interviews to the very end, but didn’t end up receiving an offer.

As you don’t get feedback after the process, it’s hard to know for sure, but I did have trouble with one technical question in the final set of technical interviews, which I believe is what cost me the opportunity. In retrospect, I had devised the right algorithm to solve the problem; I just didn’t have enough time to finish the implementation.

I think this question is interesting and unique enough to write a blog post about, because I didn’t find much information about this specific problem on the internet when trying to check afterwards whether I had the right solution. I hope this post helps some people out.

The Question

Find the nearest common parent of any two nodes in a non-binary tree.

Example:

Given the following tree:


           Anthony
          /   /   \
        Ben Caleb Dan
            /   \    \
          Ethan Frank Gary
      
Example inputs and expected outputs
Input            Output
(Ethan, Frank)   Caleb
(Frank, Dan)     Anthony

I later found out that this problem already has a name: the lowest common ancestor (LCA) problem. Solutions on the internet are abundant for binary trees; however, there are not so many for the non-binary tree case!

High-Level Algorithm Design

The general algorithm I devised is pretty simple really, but as they say, the devil is in the details:


1  lowestCommonAncestor(tree, nodeA, nodeB)
2      pathToA = pathFromRootToNode(tree, nodeA)
3      pathToB = pathFromRootToNode(tree, nodeB)
4
5      /* Case: nodeB is the root node */
6      if length(pathToB) == 1:
7          return pathToB[0]
8
9      for i in {0 to min(length(pathToA), length(pathToB)) - 1}:
10         if pathToA[i] != pathToB[i]:
11             return pathToA[i-1]
12
13     /* Case: one path is a prefix of the other,
14        return the last node of the shorter path */
15     return last(shorterOf(pathToA, pathToB))
      

We can assume that the two nodes exist in the same tree. Therefore we can guarantee that the two paths will have at least one node in common: the root node.
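To make the comparison step concrete, here’s a minimal Python sketch of it, assuming the two root-to-node paths have already been computed as lists (the name lca_from_paths is my own, for illustration):

```python
# A minimal sketch of the path-comparison step. Assumes both paths
# start at the root, so index 0 always matches.
def lca_from_paths(path_a, path_b):
    """Return the deepest node shared by both root-to-node paths."""
    # Walk both paths in lockstep until they diverge.
    limit = min(len(path_a), len(path_b))
    for i in range(limit):
        if path_a[i] != path_b[i]:
            # The previous index holds the deepest shared ancestor.
            return path_a[i - 1]
    # One path is a prefix of the other: the LCA is the shorter path's end.
    return path_a[limit - 1]

print(lca_from_paths(["Anthony", "Caleb", "Ethan"],
                     ["Anthony", "Caleb", "Frank"]))  # Caleb
```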

Okay, so now you’re probably thinking: “But Jamie, what is that function pathFromRootToNode()? How do you find the path to the node?” Well, dear reader, allow me to introduce you to a powerful tool in any practical programmer’s arsenal: depth-first search (DFS).

Note: In a tree there is exactly one path between any two nodes, so the path DFS finds is trivially the shortest. This cannot be guaranteed for graphs! For finding shortest paths in graphs I recommend looking into breadth-first search (BFS).

I won’t be going into DFS in much depth (haha) as there already exist other resources that do a good job of explaining the algorithm.

However, it’s still important to know the general pattern of DFS. Informally: we get the child nodes of the current node and pick one. We then move to that child and repeat the process with its children. The result is that we traverse the tree by going as far down as possible, then backtracking to the previous node and repeating the process for the next child. This is how the algorithm gets its name: depth-first search.

I’ve illustrated the “path” that DFS takes exploring an example tree.

Notice how the path always chooses to take the deepest option first - the child nodes, as opposed to exploring sibling nodes first (as seen in BFS).

Okay, let’s use DFS to record the path from the root to an arbitrary node!

There are actually two ways to implement DFS: iteratively and recursively. Neither approach has a decisive advantage; it’s mostly a matter of preference, though the recursive approach is a little more succinct.

I will first use the more general case of the recursive DFS here that can apply to graphs also:


1   dfs(graph, goal, currentVertex, visited, path)
2       if currentVertex not in visited:
3           if currentVertex == goal:
4               return concat(path, currentVertex)
5
6           visited = concat(visited, currentVertex)
7
8           for vertex in adjacentVertices(graph, currentVertex):
9               p = dfs(graph, goal, vertex, visited, concat(path, currentVertex))
10
11              if p is not undefined:
12                  return p
13
14       /* Hit a dead end,
15          a vertex with no new adjacent vertices */
16       return undefined
      

An important detail to note is that we record the nodes we have already seen to stop the algorithm from indefinitely looping through a graph that has cycles in it.
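Here’s how the graph version of the pseudocode might look in Python — a sketch, assuming the graph is an adjacency-list dict mapping each vertex to a list of its neighbours:

```python
# A sketch of the graph version of recursive DFS. Assumes `graph` is an
# adjacency-list dict: vertex -> list of neighbouring vertices.
def dfs(graph, goal, current, visited=None, path=()):
    if visited is None:
        visited = set()
    if current in visited:
        return None  # already explored this vertex on another branch
    if current == goal:
        return list(path) + [current]
    visited.add(current)
    for neighbour in graph.get(current, []):
        result = dfs(graph, goal, neighbour, visited, (*path, current))
        if result is not None:
            return result
    return None  # dead end: no unvisited neighbour leads to the goal

# A small graph containing the cycle A -> B -> C -> A; the visited set
# prevents the recursion from looping forever.
graph = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}
print(dfs(graph, "D", "A"))  # ['A', 'B', 'C', 'D']
```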

Because trees by definition do not contain cycles, we can forgo recording the previously visited nodes for a tree-specific implementation:


1   dfs(tree, goal, currentNode, path)
2       if currentNode == goal:
3           return concat(path, currentNode)
4
5       for node in childNodes(tree, currentNode):
6           p = dfs(tree, goal, node, concat(path, currentNode))
7
8           if p is not undefined:
9               return p
10
11       /* Hit a leaf node which is not the goal node */
12       return undefined
      

The most important lines to examine are 6-9: if we don’t find the target node, we search deeper in the tree.

If we hit the bottom of the tree, i.e. a leaf node, we return undefined, which propagates back up the tree.

On the other hand, if we find the target node, we return the path we took down to it, passing it back up through the recursive call stack to the initial function call.
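As a sketch, the tree-specific version might look like this in Python, assuming the tree is a dict mapping each node to a list of its children (using the names from the example tree):

```python
# A sketch of the tree-specific DFS. Assumes `tree` is a dict:
# node -> list of child nodes; leaves may simply be absent from the dict.
def path_from_root_to_node(tree, goal, current, path=()):
    if current == goal:
        return list(path) + [current]
    for child in tree.get(current, []):
        result = path_from_root_to_node(tree, goal, child, (*path, current))
        if result is not None:
            return result
    return None  # hit a leaf node which is not the goal node

tree = {
    "Anthony": ["Ben", "Caleb", "Dan"],
    "Caleb": ["Ethan", "Frank"],
    "Dan": ["Gary"],
}
print(path_from_root_to_node(tree, "Frank", "Anthony"))
# ['Anthony', 'Caleb', 'Frank']
```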

Visualising DFS

This may be a bit confusing at first because of the recursion. What I find helps me is to visualise the actual runtime execution of the algorithm.

Although I came up with this notation myself, I’m sure it has been thought of before; if you’ve seen it previously, please let me know.

What I do is visualise the stack frames produced during the algorithm’s runtime. I usually go through this manually and sketch it out on a piece of paper.

Note: Although this picture shows the generalised, graph-version of the algorithm being run on the previously mentioned tree, the DFS tree-version is exactly the same as the graph algorithm - just without the visited variable present, as previously discussed above.

Putting it All Together

Now that we have realised the pathFromRootToNode() function from the beginning of the article using DFS, we can give a more concrete implementation of the LCA algorithm:


1  lca(tree, nodeA, nodeB)
2      root = getRoot(tree)
3      pathToA = dfs(tree, nodeA, root, [])
4      pathToB = dfs(tree, nodeB, root, [])
5
6      /* Case: nodeB is the root node */
7      if length(pathToB) == 1:
8          return pathToB[0]
9
10     for i in {0 to min(length(pathToA), length(pathToB)) - 1}:
11         if pathToA[i] != pathToB[i]:
12             return pathToA[i-1]
13
14     /* Case: one path is a prefix of the other,
15        return the last node of the shorter path */
16     return last(shorterOf(pathToA, pathToB))
      

I’ve renamed pathFromRootToNode() to dfs() to illustrate where our concrete implementation fits in, but I personally think pathFromRootToNode() is the more descriptive name, since it tells you what the function actually does.
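Putting the pieces together, a Python sketch of the whole thing might look as follows, assuming the tree is stored as a dict of children and the root is passed in directly (rather than obtained via a getRoot() helper):

```python
# A sketch of the full LCA algorithm. Assumes `tree` is a dict:
# node -> list of child nodes, and that the root node is known.
def dfs(tree, goal, current, path=()):
    """Return the root-to-goal path, or None if goal is absent."""
    if current == goal:
        return list(path) + [current]
    for child in tree.get(current, []):
        result = dfs(tree, goal, child, (*path, current))
        if result is not None:
            return result
    return None

def lca(tree, root, node_a, node_b):
    path_a = dfs(tree, node_a, root)
    path_b = dfs(tree, node_b, root)
    # Walk the two paths in lockstep until they diverge.
    limit = min(len(path_a), len(path_b))
    for i in range(limit):
        if path_a[i] != path_b[i]:
            return path_a[i - 1]
    # One path is a prefix of the other: the shorter path's end is the LCA.
    return path_a[limit - 1]

tree = {
    "Anthony": ["Ben", "Caleb", "Dan"],
    "Caleb": ["Ethan", "Frank"],
    "Dan": ["Gary"],
}
print(lca(tree, "Anthony", "Ethan", "Frank"))  # Caleb
print(lca(tree, "Anthony", "Frank", "Dan"))    # Anthony
```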

Time Complexity Analysis

For graphs, DFS has a time complexity of O(V + E), where V = the number of vertices in the graph, and E = the number of edges in the graph. There is a good derivation of this in Introduction to Algorithms.

However, for our case, trees, the time complexity is just O(n), where n = the number of nodes in the tree. This is because the number of edges in a tree is always n-1, so O(V + E) = O(n + (n-1)), which simplifies to O(n).

After we have the paths to each of the two nodes, in lines 10-12 we simply iterate through the two paths in lockstep, cross-checking to find the node at which they stop being equal. At that point, we return the previous node and we have finally found the lowest common ancestor! 🎉

It should be easy to see that the time complexity of this search is O(D), where D = the depth of the tree: a path ending at the deepest leaf node is the upper bound on the work done, and the runtime grows linearly with the length of the path.

To sum the two algorithms together, let’s imagine a worst-case scenario; a tree that is just a linear list of nodes:


        A
         \
          B
           \
            C
             \
              D
      

If we call pathFromRootToNode() on nodes C and D, we will get paths of length n-1 and n respectively, with respective runtimes of O(n-1) and O(n).
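Assuming the same dict-of-children representation and DFS sketch as before, we can sanity-check those path lengths on the chain tree:

```python
# A sketch checking the worst case: a chain tree A -> B -> C -> D (n = 4).
def dfs(tree, goal, current, path=()):
    if current == goal:
        return list(path) + [current]
    for child in tree.get(current, []):
        result = dfs(tree, goal, child, (*path, current))
        if result is not None:
            return result
    return None

chain = {"A": ["B"], "B": ["C"], "C": ["D"]}
print(dfs(chain, "C", "A"))  # ['A', 'B', 'C']      -> length n-1
print(dfs(chain, "D", "A"))  # ['A', 'B', 'C', 'D'] -> length n
```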

Note that O(n-1) = O(n).

For the second part (lines 6-16) of the lca() function, we will have to traverse all of pathC to discover that it is a prefix of pathD. The runtime of this comparison is therefore O(n-1) = O(n).

Adding together the runtimes of the two DFS calls and the search for the last common node, we get O(3n) = O(n)! 🎉