Skip to main content
HomeTutorialsSQL

SQL NOT IN Operator: A Comprehensive Guide for Beginners

Master SQL's NOT IN operator with this beginner's guide. Learn to filter data effectively, avoid common pitfalls, and explore efficient alternatives
Updated Feb 2024  · 5 min read

In the realm of data analysis and management, SQL (Structured Query Language) stands as a critical tool for accessing and manipulating database systems. Among its many functions, the NOT IN operator serves as a powerful means to filter data sets, allowing users to exclude rows that match a specified list of values. This tutorial is designed with the aim of demystifying the NOT IN operator for beginners and data practitioners, ensuring clarity in its application and understanding its potential pitfalls.

Why NOT IN?

Choosing to focus on the NOT IN operator stems from its utility in data exclusion scenarios. Whether you're cleaning data, preparing reports, or conducting complex analyses, understanding how to effectively exclude specific data points is invaluable. This operator offers a straightforward syntax but comes with nuances that can affect performance and accuracy.

Our decision to explore NOT IN in depth is to provide a comprehensive understanding that equips you with the knowledge to use it effectively in your SQL queries.

Understanding SQL NOT IN

The Basics of NOT IN

The NOT IN operator is used within a WHERE clause to exclude rows where a specified column's value matches any in a given list of values. Its basic syntax is as follows:

SELECT column_names
FROM table_name
WHERE column_name NOT IN (value1, value2, ...);

This syntax highlights the operator's role in filtering data—a critical step in data analysis and database management.

Practical Applications of NOT IN

Consider a database containing a table of customer information. If you wish to select customers who are not from specific cities, NOT IN becomes an invaluable tool, allowing you to exclude those cities from your result set easily.

Step-by-Step NOT IN Examples

Example 1: Excluding Specific Values

Let's say we have a table named Customers with columns CustomerID, CustomerName, and City. To find customers who are not located in 'New York' or 'Los Angeles', the query would be:

CustomerlD

CustomerName

City

1

John Doe

New York

2

Jane Smith

Los Angeles

3

Emily Jones

Chicago

4

Chris Brown

Miami

5

Alex Johnson

San Francisco

6

Jessica White

New York

Customers Table

SELECT CustomerName, City
FROM Customers
WHERE City NOT IN ('New York', 'Los Angeles');

CustomerName

City

Emily Jones

Chicago

Chris Brown

Miami

Alex Johnson

San Francisco

Example 2: Using NOT IN with Subqueries

NOT IN can also work with subqueries. For example, to find products that have not been ordered:

ProductID

ProductName

1

Apple

2

Banana

3

Orange

4

Pear

5

Grape

Table Products

OrderID

ProductID

101

2

102

4

103

2

104

3

Order Table

SELECT ProductName
FROM Products
WHERE ProductID NOT IN (
  SELECT ProductID
  FROM Orders
);

ProductName

Apple

Grape

Common Pitfalls and How to Avoid Them

Handling NULL Values

A common issue with NOT IN arises when the list of values contains NULL. Because NULL represents an unknown value, any comparison with NULL using NOT IN will not return any rows, even if there are rows that should logically be excluded from the list.

To circumvent this, ensure that the list of values does not contain NULL or use an alternative approach such as NOT EXISTS.

Example:

Suppose we have a table Orders with a column CustomerID, some of which are NULL to represent guest orders. If you run a query to find orders not made by certain customers, including NULL in the list would cause no results to be returned.

OrderID

CustomerID

101

3

102

4

103

2

104

 

106

1

Orders Table

-- Assume we want to exclude CustomerID 1, 2, and unknown (NULL) customers
SELECT OrderID
FROM Orders
WHERE CustomerID NOT IN (1, 2, NULL);

This query would return no rows, which is likely not the intended outcome.

Solution: To ensure accurate results, avoid including NULL in the list for NOT IN. Alternatively, use a combination of NOT IN for known values and IS NOT NULL for handling NULLs effectively.

SELECT OrderID
FROM Orders
WHERE CustomerID NOT IN (1, 2) AND CustomerID IS NOT NULL;

OrderID

101

102

Performance Considerations

For large datasets or subqueries, NOT IN can be less efficient than alternatives like NOT EXISTS or LEFT JOIN /IS NULL. The inefficiency stems from how NOT IN compares each row in the table to each value in the list, which can lead to slow performance on large datasets.

Alternatives to SQL NOT IN

Using NOT EXISTS

NOT EXISTS is often recommended over NOT IN when dealing with subqueries that might return NULL values. It is generally more efficient because it stops as soon as it finds a match.

Example:

To find products that have not been ordered, using NOT EXISTS:

SELECT ProductName
FROM Products p
WHERE NOT EXISTS (
  SELECT 1
  FROM Orders o
  WHERE o.ProductID = p.ProductID
);

ProductName

Apple

Grape

This query checks for the non-existence of a product in the orders table, which can be more efficient than comparing against a potentially large list of IDs with NOT IN.

LEFT JOIN / IS NULL

Another efficient alternative for excluding rows is to use a LEFT JOIN combined with a WHERE ... IS NULL clause. This method is particularly effective for large datasets.

Example:

To achieve the same goal of finding products that have not been ordered:

SELECT p.ProductName
FROM Products p
LEFT JOIN Orders o ON p.ProductID = o.ProductID
WHERE o.ProductID IS NULL;

ProductName

Apple

Grape

This method leverages the LEFT JOIN to include all products and any matching orders, then filters out those products that have orders using WHERE o.ProductID IS NULL, effectively excluding them.

Best Practices for Using NOT IN

  • Verify the list for NULL values: Ensure the list used in NOT IN does not contain NULL values to avoid unexpected results.
  • Consider dataset size: For large datasets or subqueries, evaluate whether alternatives like NOT EXISTS might offer better performance.
  • Indexing: Ensure that the columns used in the NOT IN clause are indexed, if possible, to improve query performance.

Using NOT IN in Real Life Data Project

In data analysis, the "Not IN" SQL command is commonly used to retrieve specific data. In this section, we will work with a bookstore database that keeps track of its inventory (books available in the store) and sales (books sold). Our goal is to identify which books have never been sold.

We have two tables:

  • Inventory: to store information about books in the bookstore.
  • Sales: to store records of books that have been sold.

image4.png

Inventory Table

image3.png

Sales Table

Now, we want to find out which books have never been sold.

SELECT Title, Author 
FROM Inventory 
WHERE BookID NOT IN (
    SELECT BookID 
    FROM Sales
);

image1.png

Next, we want to identify books in our inventory that have not been sold in the last month, assuming today is February 7, 2024. This involves checking the SaleDate in the Sales table against our current date.

SELECT Title, Author 
FROM Inventory 
WHERE BookID NOT IN (
    SELECT BookID 
    FROM Sales
    WHERE SaleDate >= date('now', '-1 month')
);

image2.png

Conclusion

The NOT IN operator is a versatile tool in SQL for excluding specific values from your query results. By understanding its syntax, practical applications, and potential pitfalls, you can effectively use this operator in your data manipulation tasks.

Remember to consider NULL values and dataset size to optimize your queries' performance and accuracy. As you become more familiar with NOT IN, you'll find it an invaluable addition to your SQL toolkit, enabling more precise and efficient data analysis and management.

You can learn more about NOT IN and NOT EXISTS in our Improving Query Performance in SQL Server course and explore SQL operators and much more in our SQL Basics Cheat Sheet and SQL Fundamentals skill track.


Photo of Abid Ali Awan
Author
Abid Ali Awan

I am a certified data scientist who enjoys building machine learning applications and writing blogs on data science. I am currently focusing on content creation, editing, and working with large language models.

Topics

Start Your SQL Learning Journey Today!

Certification available

Course

Intermediate SQL

4 hr
174.4K
Accompanied at every step with hands-on practice queries, this course teaches you everything you need to know to analyze data using your own SQL code today!
See DetailsRight Arrow
Start Course
See MoreRight Arrow
Related

A Comprehensive Guide to SQL Certification on DataCamp

Explore SQL certification with DataCamp's courses & tutorials. Gain practical skills & theoretical knowledge to excel in data analytics and advance your career.
Matt Crabtree's photo

Matt Crabtree

8 min

Top 5 SQL Server Certifications: A Complete Guide

Unlock SQL Server certification success with our guide on paths, preparation with DataCamp, and the top certifications to enhance your career.
Matt Crabtree's photo

Matt Crabtree

8 min

Becoming Remarkable with Guy Kawasaki, Author and Chief Evangelist at Canva

Richie and Guy explore the concept of being remarkable, growth, grit and grace, the importance of experiential learning, imposter syndrome, finding your passion, how to network and find remarkable people, measuring success through benevolent impact and much more. 
Richie Cotton's photo

Richie Cotton

55 min

Mastering SQL ROUND: Precision Handling in Data Analysis

Learn how to use the SQL ROUND function in this tutorial. It covers the implementation of the ROUND function in Oracle, MySQL, PostgreSQL, and SQL Server.
Laiba Siddiqui's photo

Laiba Siddiqui

10 min

Mastering SQL NOT EQUAL Operator: A Beginner's Guide

Unlock the power of SQL NOT EQUAL with our expert guide. Learn to refine data queries with practical examples and optimization tips for better analysis.
Abid Ali Awan's photo

Abid Ali Awan

5 min

SQL CONTAINS: A Comprehensive Tutorial

Unlock the power of SQL CONTAINS for advanced text searches. Dive into logical operators, proximity searches, and wildcard uses for precise data analysis.
Abid Ali Awan's photo

Abid Ali Awan

5 min

See MoreSee More