Mihai Anton
Blog Image

Back a few years ago, when I was I in highschool, I had the first contact with the computer science field and learned about algorithms. Since then, things have evolved and I got interested in many aspects of software engineers, like AI, building end to end products and the business aspect of them.

As I grew as a software engineer through university, the complexity of the software I was building increased over time, having multiple layers of abstraction between me and the data I was using. I no longer had to think about freeing up memory like I was doing in C, or implementing fast sorting algorithms or hash-tables from scratch. Nowadays you have all those out of the box. Memory management is handled by default by high level programming languages, arrays are sorted with a generic '.sort()' method, and maps are created in just a few keyboard touches: 'testMap = {}'.

However, even a few years later, I still enjoy doing small coding competitions and solving algorithmic problems. Also, before landing internships at Google and Bloomberg, I was used to spending a lot of time learning about various coding concepts, as well as you can leverage different data structures to build fast and efficient algorithms. I decided I would share some of the problems I found most interesting, breaking down the logic and solving them step by step, as if I was taking an interview.

The statement of the problem is simple. Given a list of words, we need to group the anagrams together, and return the groups as a list of lists afterwards. (word1 and word2 are anagrams if they use the exact same letters; i.e. one is the shuffle of the other). Here’s a couple examples:

1 2 groupAnagrams(["eat","tea","tan","ate","nat","bat"]) = [["bat"],["nat","tan"],["ate","eat","tea"]] groupAnagrams(["a"]) = [["a"]]

Before writing any code, let’s build a strategy. We need the following:

  • A method that decides whether 2 words are anagrams.
  • A way of storing the words in the buckets, so that every 2 words in a bucket are anagrams.
  • A way to convert each bucket to a list, and return a list with all of them.

Of course, there are multiple ways to solve this problem. You could, for example, identify a group of anagrams by the sorted representation of the string. In this case, both “cba” and “bca” would belong to the “abc” group. This, however, implies a O(n logn) time complexity on the length of each word. Let’s see if we can do better. Have you heard of prime numbers? Did you ever try to multiply them? Because a prime number does not have any prime factors other than itself and 1, a product of prime number always has the same factors, the prime numbers themselves. Given that the input consists of only lowercase english letters, we can assign each letter to a prime number:

1 2 3 4 5 6 LETTERS_TO_PRIMES = { 'a': 2, 'b': 3, 'c': 5, 'd': 7, 'e': 11, 'f': 13, 'g': 17, 'h': 19, 'i': 23, 'j': 29, 'k': 31, 'l': 37, 'm': 41, 'n': 43, 'o': 47, 'p': 53, 'q': 59, 'r': 61, 's': 67, 't': 71, 'u': 73, 'v': 79, 'w': 83, 'x': 89, 'y': 97, 'z': 101 }

Taking the example above, with “cba” and “bca”. If we multiple the prime numbers associated with each letter, we end up with 5*3*2 = 3*5*2 = 30. Only the anagrams of “abc” can have this product, and we can use this method as identifying a bucket of anagrams, in linear time!

1 2 3 4 from operator import mul def _getAnagramKey(self, word: str) -> int: return reduce(mul, [LETTERS_TO_PRIMES[c] for c in word], 1)

Having this method of creating a unique key for each anagram group, let’s split the list into buckets. We’ll use a map for this, where the key is product we discussed before, and the value is a list of all the words that belong to that bucket.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 def groupAnagrams(self, strs: List[str]) -> List[List[str]]: anagram_groups = {} for word in strs: anagram_key = self._getAnagramKey(word) anagram_groups[anagram_key] = anagram_groups.get(anagram_key, []) + [word] print(anagram_groups) return list(anagram_groups.values()) # strs = ["eat","tea","tan","ate","nat","bat"] # anagram_groups = { # 1562: ['eat', 'tea', 'ate'], # 6106: ['tan', 'nat'], # 426: ['bat'] # }

Doing this, we end up with a complexity of O(n log k), where n is the length of the word list, and k is the total size of the alphabet. Taking one step further, we can even consider k as a constant, since we only have lowercase english letters, and conclude that this implementation has a O(n) space and time complexity (we also need to store all words in the map). This is how a final solution would look like:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 from operator import mul class Solution: LETTERS_TO_PRIMES = { 'a': 2, 'b': 3, 'c': 5, 'd': 7, 'e': 11, 'f': 13, 'g': 17, 'h': 19, 'i': 23, 'j': 29, 'k': 31, 'l': 37, 'm': 41, 'n': 43, 'o': 47, 'p': 53, 'q': 59, 'r': 61, 's': 67, 't': 71, 'u': 73, 'v': 79, 'w': 83, 'x': 89, 'y': 97, 'z': 101 } def _getAnagramKey(self, word: str) -> int: return reduce(mul, [self.LETTERS_TO_PRIMES[c] for c in word], 1) def groupAnagrams(self, strs: List[str]) -> List[List[str]]: anagram_groups = {} for word in strs: anagram_key = self._getAnagramKey(word) anagram_groups[anagram_key] = anagram_groups.get(anagram_key, []) + [word] print(anagram_groups) return list(anagram_groups.values())

For mapping letters to prime numbers you can either get a hardcoded list, if the vocabulary is well defined, or you can use Eratosthene’s Sieve to get as many prime numbers as you need. A trade-off to consider though, some programming languages might give you a hard time with the value domain of integers. The key for a bucket can easily lead to a big number if, let’s say, you have a string like “zzzzzzzzz”.

This one is a LeetCode easy, and finding a solution to it is not hard. However, there are multiple flavors to the problem, that I’d like to discuss. The prompt is simple, we have a list of integers. The length of the list is odd, every number appears twice except for one. We need to find that single number.

Here are some examples:

1 2 3 4 singleNumber([1]) = 1 # only one number without a pair singleNumber([3,3,3]) = 3 # we can form a pair out of any two 3s, and one is still left out singleNumber([100, 2, 2]) = 100 singleNumber([4, 3, 4, 4, 4]) = 3

Let’s see how we can approach the problem. Given that only one number has a odd frequency, we can count how many times a number shows up, and return the one with an odd frequency at the end. We can use a map for this and such a solution would look like this:

1 2 3 4 5 6 7 def singleNumber(self, nums: List[int]) -> int: frequencies = {} for num in nums: frequencies[num] = frequencies.get(num, 0) + 1 return [x for x in list(frequencies.keys()) if frequencies[x] & 1][0]

We can even take this one step further and have a set instead of the dictionary. When the current element exists, we remove it, when it does not, we add it. At the end, the set will only have the element we care about. Both of the solutions offer a linear time complexity, as well as linear space complexity. Can we do better with the space.

Sure, we can sort the original array, iterate sequences of similar values, and check what sequence has a odd length. This is the code that does that.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 def singleNumber(self, nums: List[int]) -> int: nums.sort() current, frequency = nums[0], 1 for num in nums[1:]: if current != num: if frequency & 1: return current current, frequency = num, 1 else: frequency += 1 return current

We’ve removed the additional space requirement, but we increased the time complexity to O(n logn) because of the sort.

Can we do better? Hmm, do you know about bitwise operations, namely XOR? Here’s what XOR between 2 bits does:

  • 1 ^ 1 = 0 ^ 0 = 0
  • 1 ^ 0 = 0 ^ 1 = 1

Notice that XOR is 1 only when the bits are different. This means that if we have a number A, it will look something like 0001111001001… in the bitwise representation. XORing A with itself will equal 0, since we’re performing XOR on equal bits. See a pattern here?

Let’s say we have the following list of number: [A, A, B, A, B]. Xoring them all together will look like A ^ A ^ B ^ A ^ B = (A ^ A) ^ (B ^ B) ^ A = 0 ^ 0 ^ A = A. This is exactly how we’ll solve the problem. Iterating over all values, with the constraint that every number has a pair but one, the result would be exactly the number we are looking for. Doing so drops the time complexity to O(n) and we don’t even use additional data structures. We can even write it in one line:

1 2 def singleNumber(self, nums: List[int]) -> int: return reduce(lambda x, y: x ^ y, nums)

Those are just 2 interesting problems (linked below). LeetCode has much more, that I aim to write posts on consistently. Knowing the solution to those 2 problems only won’t get you into Google, but constantly doing problems and learning about new techniques will increase your chances a lot.