SQL Server Hash Join

Update yo statistics!

The way the hash join works (A JOIN B or B JOIN A) is a two step process:
1. hash build. Takes input from table A and builds a hash table for it
2. hash probe. Loops table B and compares each item with the hashes from step 1.

Step 1 is blocking.

Before starting with step 1, it allocates memory for the hash table. This allocation is done based on the table statistics, so if the statistics are off there's a chance it won't allocate enough memory, and then it'll spill to disk (via tempdb). Which is super bad, especially for large tables, because then both steps will need to write or read from disk to do the join.

Another important thing is: It will try to choose the smallest of the two sets for the hash build. So if possible, use a WHERE clause to restrict the size of one of the two sides.