+ 1

Explain solution please

There is this sql question: An employee is a manager if any other employee has their managerId set to the first employees id. An employee who is a manager may or may not also have a manager. TABLE employees id INTEGER NOT NULL PRIMARY KEY managerId INTEGER REFERENCES employees(id) name VARCHAR(30) NOT NULL Write a query that selects the names of employees who are not managers. The solution I found was: select name from employees where id not in(select managerId from employees where managerid is not null); I cannot see why the 'where managerid is not null' part is necessary? Surely it would only make a difference if the id's were ever null, but they are not. Could someone please explain? Also, why does Select name from employees where id <> managerid; not work?

14th Apr 2018, 4:41 PM
Tom
7 Réponses
+ 8
I've understood your question now :) Here is an interesting issue about null: Correct: select * from employees where id is null Incorrect: select * from employees where id==null The second query is incorrect because comparing any value directly (using ==, != or <>) with null will produce "undefined". Null should be checked with "is null" or "is not null". Now, IN and NOT IN statement breaks down it's arguments in following way: id IN (1, 2, null) = (id==1) OR (id==2) OR (id==null) id NOT IN (1, 2, null) = (id!=1) AND (id!=2) AND (id!=null) Since IN statement compares the arguments using == or !=, keeping null inside IN statement produces undefined state. Reference: https://stackoverflow.com/questions/129077/not-in-clause-and-null-values
20th Apr 2018, 3:43 PM
Shamima Yasmin
Shamima Yasmin - avatar
+ 7
According to your requirement, employees may or may not have a manager. So if an employee doesn't have a manager, then managerid of that particular row is supposed to be null. (Note: id can't be null but managerid is nullable) While tracing a query involving sub-query, trace the inner parts first. Inner query: It'll return the managerid of all employees whose managerids are not null. Outer query: The inner query returns all employee ids of managers, and the outer query uses the ids as a set and returns the names of the employees whose ids are not in the previous set. For example, id 3, 12 and 15 are managers. So the outer query is equivalent to : SELECT name FROM employees WHERE id NOT IN (3, 12, 15) Therefore, it returns the names of non-managers only. The last stated query doesn't work because an employee can't be his own manager.
14th Apr 2018, 6:39 PM
Shamima Yasmin
Shamima Yasmin - avatar
+ 7
Please note that the "not null" condition is checked for the inner query, not for the outer one. Say, John and Maria are two managers. John is manager of Ricky and Sam. Maria is manager of Peter and Harry. The table is as follows: id | manager id | name --------------------------------------------------------- 1 null John 2 1 Ricky 3 null Maria 4 3 Peter 5 1 Sam 6 3 Harry Now, how will I get id of John and Maria? The only indicator of not being under a manager is: their manager id must be null. Thus this query will return 2,4,5,6 (ids of non-managers). The later part: you've already understood that. It will check ids in set (2,4,5,6), not (2,4,5,6, ... , null). Because null is not a part of outer query.
15th Apr 2018, 3:28 AM
Shamima Yasmin
Shamima Yasmin - avatar
+ 4
I've never seen <> before. id != managerid; does the same thing.
14th Apr 2018, 4:53 PM
Toni Isotalo
Toni Isotalo - avatar
+ 1
Yes, I do see that the inner query returns a set against which the outer query can compare the employee ids. As you state, the inner query set is the employee ids of all managers. lets say this list is 2233 3455 4567 What I don't get is the need to remove the null values from the inner query list. i.e. why can't the outer query compare employee ids against 2233 3455 4567 null null The result should surely be the same- 2233, 3455, 4567 will all be excluded from the final result by virtue of being in that list. 9987, 7788 (employees who are not managers) will be included as they are not in that list.
14th Apr 2018, 7:56 PM
Tom
+ 1
Hi Shamima, Thanks for the explanation. I understand your explanation and see how the solution works. The problem is I do not see why select name from employees where id not in(select managerId from employees); does not work? You state: "The later part: you've already understood that. It will check ids in set (2,4,5,6), not (2,4,5,6, ... , null). Because null is not a part of outer query." My question is, but why can't (2,4,5,6... null) work just as well as a set to compare against in your example? Thanks
20th Apr 2018, 11:44 AM
Tom
0
This is the original test case. https://www.testdome.com/d/sql-interview-questions/17 number 7 workers. This is taken as a correct answer select name from employees where id not in (select managerId from employees m where managerId is not null) This answer is false select name from employees where id not in (select managerId from employees) I also do not understand why.
14th Apr 2018, 7:57 PM
sneeze
sneeze - avatar