How SQL DISTINCT and ORDER BY are Related

Spread the love
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  

One of the things that confuse SQL users all the time is how DISTINCT and ORDER BY are related in a SQL query.
The Basics
Running some queries against the Sakila database, most people quickly understand:

SELECT DISTINCT length FROM film

This returns results in an arbitrary order, because the database can (and might apply hashing rather than ordering to remove duplicates):

length |
——-|
129 |
106 |
120 |
171 |
138 |
80 |

Most people also understand:

SELECT length FROM film ORDER BY length

This will give us duplicates, but in order:

length |
——-|
46 |
46 |
46 |
46 |
46 |
47 |
47 |
47 |
47 |
47 |
47 |
47 |
48 |

And, of course, we can combine the two:

SELECT DISTINCT length FROM film ORDER BY length

Resulting in…

length |
——-|
46 |
47 |
48 |
49 |
50 |
51 |
52 |
53 |
54 |
55 |
56 |

Then why doesn’t this work?
Maybe somewhat intuitively, we may want to order the lengths differently, e.g. by title:

SELECT DISTINCT length FROM film ORDER BY title

Most databases fail this query with an exception like Oracle’s:

ORA-01791: not a SELECTed expression

At first sight, this seems funny, because this works after all:

SELECT length FROM film ORDER BY title

Yielding:

length |
——-|
86 |
48 |
50 |
117 |
130 |

We could add the title to illustrate the ordering

length |title |
——-|—————————-|
86 |ACADEMY DINOSAUR |
48 |ACE GOLDFINGER |
50 |ADAPTATION HOLES |
117 |AFFAIR PREJUDICE |
130 |AFRICAN EGG |

So, how are these different?
We have to rewind and check out the logical order of SQL operations (as opposed to the syntactic order). And always remember, this is the logical order, not the actual order executed by the optimiser.
When we write something like this:

SELECT DISTINCT length FROM film ORDER BY length

The logical order of operations is:
FROM clause, loading the FILM table
SELECT clause, projecting the LENGTH column
DISTINCT clause, removing distinct tuples (with projected LENGTH columns)
ORDER BY clause, ordering by the LENGTH column
If we look at this step by step, we have:
Step 1: SELECT * FROM film
The intermediary data set is something like:

film_id |title |length | …
——–|—————————-|——-| …
1 |ACADEMY DINOSAUR |86 | …
2 |ACE GOLDFINGER |48 | …
3 |ADAPTATION HOLES |50 | …
4 |AFFAIR PREJUDICE |117 | …
5 |AFRICAN EGG |130 | …
… |… |… | …

Step 2: SELECT length …
The intermediary data set is something like:

length |
——-|
86 |
48 |
50 |
117 |
130 |

86 |

X ITM Cloud News

Emily

Next Post

Beware of Hidden PL/SQL to SQL Context Switches

Sun Nov 24 , 2019
Spread the love          I recently stumbled upon a curious query on a customer’s productive Oracle database: SELECT USER FROM SYS.DUAL Two things caught my attention: The query was executed many billions of times per month, accounting for about 0.3% of that system’s load. That’s 0.3% for something extremely silly! I don’t […]
X- ITM

Cloud Computing – Consultancy – Development – Hosting – APIs – Legacy Systems

X-ITM Technology helps our customers across the entire enterprise technology stack with differentiated industry solutions. We modernize IT, optimize data architectures, and make everything secure, scalable and orchestrated across public, private and hybrid clouds.

This image has an empty alt attribute; its file name is x-itmdc.jpg

The enterprise technology stack includes ITO; Cloud and Security Services; Applications and Industry IP; Data, Analytics and Engineering Services; and Advisory.

Watch an animation of  X-ITM‘s Enterprise Technology Stack

We combine years of experience running mission-critical systems with the latest digital innovations to deliver better business outcomes and new levels of performance, competitiveness and experiences for our customers and their stakeholders.

X-ITM invests in three key drivers of growth: People, Customers and Operational Execution.

The company’s global scale, talent and innovation platforms serve 6,000 private and public-sector clients in 70 countries.

X-ITM’s extensive partner network helps drive collaboration and leverage technology independence. The company has established more than 200 industry-leading global Partner Network relationships, including 15 strategic partners: Amazon Web Services, AT&T, Dell Technologies, Google Cloud, HCL, HP, HPE, IBM, Micro Focus, Microsoft, Oracle, PwC, SAP, ServiceNow and VMware

.

X ITM