Home Data Science Look Ma! I finally understood MySql Joins.

Look Ma! I finally understood MySql Joins.


Joins- such a cute and uniting name for something which can completely shatter your confidence if not understood the right way. Speak about oxymoron! 

If you have been struggling with MySql joins for about whole of your life, you are in right place-you are probably gonna spend your evening waking the park just the way I am gonna help you understand joins- a walk in the park. Joins concept sits in the heart of relational database idea. You wouldn’t create a relational database if you couldn’t join them and see the desired output. The functionality of join is an absolute god-send when it comes to understanding inter-table insights from a database. Okay. Enough said’ lets get our hands dirty now. 

The pre-work needed:

If you are friends with relational algebra (and spend your weekends in library), you might have bumped into the concept of cartesian product at some point of time. A cartesian product of two sets is basically every element of one set associated with every element of another set. For example, if we have a set A with 3 elements, and another set B with 2 elements, upon cartesian product-ing it, we shall have 3 * 2 elements in the resulting set. 

Joins happen in SQL exactly on this principle. Every row of one table is paired with every row of another table based on a common attribute- this is the crux of any join in RDBMS. And the sad part, you can’t change this behavior of the compiler, but of course we can control and filter the output we get from that join. Mastering this ‘controlling’ part is slight bit of trickery. Grab a coffee, we are off to doing that now. 

Show me how!

Lets create two tables — People and KYC. I am using MySql workbench here- but any database should give you similar outputs as the basic characteristics of most of them are same. Do it along side if possible. 

Let’s look at the tables:

It’s important to know which type of join to apply where. There are various sort of joins known by various names which changes slightly with the change in database you are working, but essentially there are 4 types of joins which are used most commonly. The types of joins are-

  1. Outer joins(simply join without an condition)
  2. Natural join
  3. Self join (kinda fooling the compiler)
  4. Inner joins/conditional joins

Please don’t get overwhelmed by these 4 similar sounding words- they are easy and I am gonna keep this real short.  

Outer joins

Ok, so outer joins are basically join without any condition. If you simple use the term ‘join’ the compiler does a cross product of elements of two tables. Join, cross join, full outer join, full join work exactly similar. Lets look at our example and understand better. 

12 rows- as promised. 3*4 from both the tables. 

In outer joins, we have Left outer and right outer joins too. These are basically taking the common rows from both the tables and all the rows from either the left table or right table based on your syntax. These joins require a common attribute. Look at this example- 

If we look at tables as venn diagram, we could make a following analogy — 

Exactly similar stuff happens with right joins. Both of them are special cases of outer / full joins

Natural Joins

No, it doesn’t have any organic certification. It is termed as ‘natural’ because the complier does the job based on its understanding of all the common columns between the two tables. In or example here, aadhar_no is a common attribute here, so the join happened here is inner — that is; only display the common rows in both tables. 

Self Joins

These joins are basically playing smart with the compiler. What we do here is giving two alias to the same table, thus asking the engine to treat the same table as two different entity and then providing one/more common attribute which has comparable and same data type. For example- if we are asked to find the employeeid of employees which are managers too, we can self join the employees table with itself on the condition —

 table1.employee_id = table2.manager_id

Inner Joins

These are outer joins + some filter. If you provide the common attribute on which you are trying to base your join on, your join then and there becomes an inner join. In some cases they are also called conditional joins. The ‘on’ is used to make aware the complier your columns, after which you can use ‘where’ or ‘and’ to specify further filters to get the desired output. Lets take an example. 

This just fetches you the intersection of both the table’s rows, hence the self-explanatory name — inner join. Let’s look at an example where we use a condition along with the common attributes. Let’s say we want to find out every detail of the person whose favorite show is How I met your mother and loves burgers. We can do something like- 

Just like that; it’s that simple. You have got the playground rulebook, now go and do fancy stuff with these! Oh, don’t forget to keep a close eye on the data types you are trying to join with- sometimes they play trickery. 

Pro tip: A little flavor of joining more than 2 tables using multiple join statements- I bet they are piece of cake for you now, aren’t they? 

Q. Suppose we have been asked to find the city from where the most managers belong to. (from the infamous HR schema) 

A. What we can do here is, join departments table with employees table based on manager id, and then join this derived output to location table to fetch the city names and use count aggregate function to count the number of managers. Something like this- 

Congratulations! you have ‘Joined’ the exclusive SQL community with this knowledge now.  

This site uses Akismet to reduce spam. Learn how your comment data is processed.