Optimisation of where afield = coalesce(value1, value2 )

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Optimisation of where afield = coalesce(value1, value2 )

Paul Reeves

Given this simple SQL statement

  select e.emp_no, e.full_name
  from employee e
  where e.emp_no = coalesce(value1,value2 )

the plan varies depending on the following :

1.  where e.EMP_NO = COALESCE(?, ? ) -- PLAN (E INDEX (RDB$PRIMARY7))

2.  where e.EMP_NO = COALESCE(2, ? ) -- PLAN (E INDEX (RDB$PRIMARY7))

3.  where e.EMP_NO = COALESCE(?, NULL ) -- PLAN (E INDEX (RDB$PRIMARY7))

4.  where e.EMP_NO = COALESCE(2, NULL ) -- PLAN (E INDEX (RDB$PRIMARY7))

5.  where e.EMP_NO = COALESCE(NULL, 2 ) -- PLAN (E INDEX (RDB$PRIMARY7))

6.  where e.EMP_NO = COALESCE(NULL, e.emp_no ) -- PLAN (E NATURAL)

7.  where e.EMP_NO = COALESCE(2, e.emp_no ) -- PLAN (E NATURAL)

8.  where e.EMP_NO = COALESCE(?, e.emp_no ) -- PLAN (E NATURAL)


Examples 1 and 2 are correct. The index on EMP_NO exists, so use it.

Example 3 is more interesting. The coalesce may evaluate to a value or
to NULL. So even if NULL is evaluated there is no harm in choosing an
index.

Examples 4 and 5 will always evaluate to a value, so obviously an index
should be used.

Example 6 is interesting. As each row is selected emp_no will always
evaluate to itself so the result set is the entire table. In this case
a natural scan is correct.

Example 7 certainly seems to have chosen the wrong plan. We already
know the value of the expression when the statement is prepared. And
examples 2, and 4 seem to prove that the optimiser is capable of
evaluating the COALESCE during prepare and choosing an index.

Example 8 could be considered correct if and only if ? evaluates to
NULL. However, it seems to assume that the parameter will always be
NULL and thus a natural scan should be used. However we have seen in
examples 1 and 3 that it is equally capable of choosing an index when
evaluating the coalesce even if the final evaluation may be null.

And so here are the questions

- is the plan produced in example 8 wrong or right ?
- the above examples seem to lack coherence so how much is COALESCE
  evaluated during a prepare ?
- should the optimiser assume that the first parameter will
  always resolve to a value
- how does the optimiser consider COALESCE when this construct is used
  in a stored procedure?


Paul
--
Paul Reeves
http://www.ibphoenix.com
Supporting users of Firebird
 

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Optimisation of where afield = coalesce(value1, value2 )

Adriano dos Santos Fernandes-3
On 07/07/2017 11:19, Paul Reeves wrote:

> Given this simple SQL statement
>
>   select e.emp_no, e.full_name
>   from employee e
>   where e.emp_no = coalesce(value1,value2 )
>
> the plan varies depending on the following :
>
> 1.  where e.EMP_NO = COALESCE(?, ? ) -- PLAN (E INDEX (RDB$PRIMARY7))
>
> 2.  where e.EMP_NO = COALESCE(2, ? ) -- PLAN (E INDEX (RDB$PRIMARY7))
>
> 3.  where e.EMP_NO = COALESCE(?, NULL ) -- PLAN (E INDEX (RDB$PRIMARY7))
>
> 4.  where e.EMP_NO = COALESCE(2, NULL ) -- PLAN (E INDEX (RDB$PRIMARY7))
>
> 5.  where e.EMP_NO = COALESCE(NULL, 2 ) -- PLAN (E INDEX (RDB$PRIMARY7))
>
> 6.  where e.EMP_NO = COALESCE(NULL, e.emp_no ) -- PLAN (E NATURAL)
>
> 7.  where e.EMP_NO = COALESCE(2, e.emp_no ) -- PLAN (E NATURAL)
>
> 8.  where e.EMP_NO = COALESCE(?, e.emp_no ) -- PLAN (E NATURAL)
>
>

...

> And examples 2, and 4 seem to prove that the optimiser is capable of
> evaluating the COALESCE during prepare and choosing an index.
...

> However we have seen in
> examples 1 and 3 that it is equally capable of choosing an index when
> evaluating the coalesce even if the final evaluation may be null.
>
> And so here are the questions
>
> - is the plan produced in example 8 wrong or right ?
> - the above examples seem to lack coherence so how much is COALESCE
>   evaluated during a prepare ?
> - should the optimiser assume that the first parameter will
>   always resolve to a value
> - how does the optimiser consider COALESCE when this construct is used
>   in a stored procedure?
>
>
It does not evaluate COALESCE at prepare time.

It's just a expression which may or may not contain fields.

If it doesn't contain fields, it will be the same as a simple "?" for
the plan calculation purposes.


Adriano


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Optimisation of where afield = coalesce(value1, value2 )

Paul Reeves
On Fri, 7 Jul 2017 11:27:26 -0300 Adriano dos Santos Fernandes wrote

> >  
> It does not evaluate COALESCE at prepare time.
>
> It's just a expression which may or may not contain fields.
>
> If it doesn't contain fields, it will be the same as a simple "?" for
> the plan calculation purposes.
>
>

OK. That is not the answer I was hoping for :-)

But that doesn't answer all my questions...

Given

  where e.EMP_NO = COALESCE(?, e.emp_no )

and that there is an index on EMP_NO, why doesn't the optimiser default
to the index. After all, it is logically more likely that a value will
be passed in the where condition, rather than a NULL.

In fact, considering the index is a primary key

 where e.EMP_NO = NULL

would make no sense at all. Surely the logic ought to be

  if unique index on field then use index
 

Also, even though a field is specified in the coalesce why can't I
force the plan...

  PLAN (E INDEX (RDB$PRIMARY7))

This error is raised...

  'cannot be used in the specified plan'



Paul
--
Paul Reeves
http://www.ibphoenix.com
Supporting users of Firebird
 

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Optimisation of where afield = coalesce(value1, value2 )

Dmitry Yemanov-3
07.07.2017 17:51, Paul Reeves wrote:

> But that doesn't answer all my questions...
>
> Given
>
>    where e.EMP_NO = COALESCE(?, e.emp_no )
>
> and that there is an index on EMP_NO, why doesn't the optimiser default
> to the index. After all, it is logically more likely that a value will
> be passed in the where condition, rather than a NULL.

To evaluate COALESCE, e.emp_no must be known. How it can be known before
we start reading the table (via index scan)? Chicken and egg problem.


Dmitry

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Optimisation of where afield = coalesce(value1, value2 )

Dimitry Sibiryakov-3
07.07.2017 17:07, Dmitry Yemanov wrote:

>> Given
>>
>>    where e.EMP_NO = COALESCE(?, e.emp_no )
>>
>> and that there is an index on EMP_NO, why doesn't the optimiser default
>> to the index. After all, it is logically more likely that a value will
>> be passed in the where condition, rather than a NULL.
>
> To evaluate COALESCE, e.emp_no must be known. How it can be known before we start reading
> the table (via index scan)? Chicken and egg problem.

   In this particular case it is enough to know parameter value to choose plan. Parameters
are known before reading table.

--
   WBR, SD.

PS: Is it only my Thunderbird has started to send answers to authors instead of list?

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Optimisation of where afield = coalesce(value1, value2 )

Dmitry Yemanov-3
In reply to this post by Dmitry Yemanov-3
07.07.2017 18:12, Dimitry Sibiryakov wrote:
>
> In this particular case it is enough to know parameter value to
> choose plan. Parameters are known before reading table.

True, but our optimizer is developed for generic cases, not such
specific ones. It could be improved, but I'd say we have more important
issues to solve there.


Dmitry

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Optimisation of where afield = coalesce(value1, value2 )

Adriano dos Santos Fernandes-3
On 07/07/2017 12:19, Dmitry Yemanov wrote:
> 07.07.2017 18:12, Dimitry Sibiryakov wrote:
>>
>> In this particular case it is enough to know parameter value to
>> choose plan. Parameters are known before reading table.
>
> True, but our optimizer is developed for generic cases, not such
> specific ones. It could be improved, but I'd say we have more
> important issues to solve there.
>
BTW, isn't ConditionalStream used for something in this field?


Adriano


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Optimisation of where afield = coalesce(value1, value2 )

Leyne, Sean
In reply to this post by Dmitry Yemanov-3


> To evaluate COALESCE, e.emp_no must be known.

Why?

Is it not really the case that for all practical purposes the COALESCE will always return a value

How else could COALESCE(?, NULL ) [Example #3] use an index?
{Nothing says that the input parameter won't be NULL}


Sean


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Optimisation of where afield = coalesce(value1, value2 )

Dmitry Yemanov-3
In reply to this post by Adriano dos Santos Fernandes-3
07.07.2017 18:26, Adriano dos Santos Fernandes wrote:

> BTW, isn't ConditionalStream used for something in this field?

Yep, but the optimizer so far handles just one specific case (A = ? or ?
is null). It could be extended though.


Dmitry

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Optimisation of where afield = coalesce(value1, value2 )

Paul Reeves
In reply to this post by Dmitry Yemanov-3
On Fri, 7 Jul 2017 18:07:55 +0300 Dmitry Yemanov wrote

> 07.07.2017 17:51, Paul Reeves wrote:
>
> > But that doesn't answer all my questions...
> >
> > Given
> >
> >    where e.EMP_NO = COALESCE(?, e.emp_no )
> >
> > and that there is an index on EMP_NO, why doesn't the optimiser
> > default to the index. After all, it is logically more likely that a
> > value will be passed in the where condition, rather than a NULL.  
>
> To evaluate COALESCE, e.emp_no must be known. How it can be known
> before we start reading the table (via index scan)? Chicken and egg
> problem.
>

I understand that evaluating COALESCE(?, e.emp_no ) at prepare time
may require a circular logic and is thus impractical but surely
the more important piece of information is in the

  where e.emp_no =

A unique index exists on this column so surely the optimiser should
choose it instead of a natural scan ?


The other question I asked was...

    why can't I force the plan...
 
      PLAN (E INDEX (RDB$PRIMARY7))

    This error is raised...

      'cannot be used in the specified plan'


And a supplementary question (perhaps it merits a separate thread )

What are the rules for deciding when a PLAN statement will be rejected?
From one logical point of view if have this sort of query

  select * from mytable m where afield = whatever;

and I add

  plan (m index(myindex ))

the engine should accept that. Shouldn't it?



Paul
--
Paul Reeves
http://www.ibphoenix.com
Supporting users of Firebird
 

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Optimisation of where afield = coalesce(value1, value2 )

Dmitry Yemanov-3
10.07.2017 12:17, Paul Reeves пишет:
> On Fri, 7 Jul 2017 18:07:55 +0300 Dmitry Yemanov wrote
>
>> 07.07.2017 17:51, Paul Reeves wrote:
>>
> I understand that evaluating COALESCE(?, e.emp_no ) at prepare time
> may require a circular logic and is thus impractical

It cannot be done at runtime either (without deducting that e.emp_no is
the same in both paths of the condition).

> but surely
> the more important piece of information is in the
>
>    where e.emp_no =
>
> A unique index exists on this column so surely the optimiser should
> choose it instead of a natural scan ?

No, it cannot. Imagine "where T.A = T.B", it cannot use an index for
either A or B. Only full table scan is possible.

> The other question I asked was...
>
>      why can't I force the plan...
>    
>        PLAN (E INDEX (RDB$PRIMARY7))

Because it's invalid in this case.

> What are the rules for deciding when a PLAN statement will be rejected?
>>From one logical point of view if have this sort of query
>
>    select * from mytable m where afield = whatever;
>
> and I add
>
>    plan (m index(myindex ))
>
> the engine should accept that. Shouldn't it?

It depends on "whatever". If it's literal or independent expression or
priorly evaluated field, index scan can be used. If it includes a field
from the same stream, it cannot.


Dmitry

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Optimisation of where afield = coalesce(value1, value2 )

Paul Reeves
On Mon, 10 Jul 2017 12:31:43 +0300 Dmitry Yemanov wrote

>
> > The other question I asked was...
> >
> >      why can't I force the plan...
> >    
> >        PLAN (E INDEX (RDB$PRIMARY7))  
>
> Because it's invalid in this case.
>

I can understand that this plan might appear to be invalid from the
perspective of the optimiser. But surely the whole point of adding the
PLAN clause is because I think I know better than the optimiser what I
want. So to return to the original query:

  where e.EMP_NO = COALESCE( ?, e.emp_no )

surely I ought to be able to add

  PLAN (E INDEX (RDB$PRIMARY7))  

Then if the parameter resolves to a value at run-time the index will be
used. Obviously if the parameter is null performance would be much
worse than a natural scan because the engine must walk the index pages
and then walk the data pages. But the customer is always right,
surely :-)

( I also understand there is a downside to the customer always being
right argument - one bad customer could screw everything up for all the
others. But is that argument being used here or is it just because of
the way the optimiser works? )


> > What are the rules for deciding when a PLAN statement will be
> > rejected?  
> >>From one logical point of view if have this sort of query  
> >
> >    select * from mytable m where afield = whatever;
> >
> > and I add
> >
> >    plan (m index(myindex ))
> >
> > the engine should accept that. Shouldn't it?  
>
> It depends on "whatever". If it's literal or independent expression
> or priorly evaluated field, index scan can be used. If it includes a
> field from the same stream, it cannot.
>

Why not create a second stream in that case?

that seems to be what happens when I create this query:

   SELECT  *
   FROM employee
   WHERE
     EMP_NO = 2  
   OR
     EMP_NO is null

this plan is chosen...

  PLAN (EMPLOYEE INDEX (RDB$PRIMARY7, RDB$PRIMARY7))

Because there are two streams the fetches and the indexed reads more or
less double because of the OR clause.

(Again, a subject for another day is why does the optimiser even bother
with the second stream when EMP_NO can never be null ? )

Anyway, none of the above is any sort of criticism. I'm just trying to
get a better understanding of the rules and maybe find out where things
are broken that could be fixed easily (or hacked around).


Paul
--
Paul Reeves
http://www.ibphoenix.com
Supporting users of Firebird
 

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Optimisation of where afield = coalesce(value1, value2 )

Dmitry Yemanov-3
10.07.2017 14:20, Paul Reeves wrote:
>
> I can understand that this plan might appear to be invalid from the
> perspective of the optimiser. But surely the whole point of adding the
> PLAN clause is because I think I know better than the optimiser what I
> want.

Unless the engine physically cannot execute your plan. This was exactly
the case.

> So to return to the original query:
>
>    where e.EMP_NO = COALESCE( ?, e.emp_no )
>
> surely I ought to be able to add
>
>    PLAN (E INDEX (RDB$PRIMARY7))
>
> Then if the parameter resolves to a value at run-time the index will be
> used. Obviously if the parameter is null performance would be much
> worse than a natural scan because the engine must walk the index pages
> and then walk the data pages. But the customer is always right,
> surely :-)

You seem to believe that the engine can execute the INDEX plan without
lower/upper bounds (which require e.emp_no to be known in advance). In
theory, it could. In practice, it's pointless and thus prohibited by the
optimizer.

> Why not create a second stream in that case?
>
> that seems to be what happens when I create this query:
>
>     SELECT  *
>     FROM employee
>     WHERE
>       EMP_NO = 2
>     OR
>       EMP_NO is null
>
> this plan is chosen...
>
>    PLAN (EMPLOYEE INDEX (RDB$PRIMARY7, RDB$PRIMARY7))

This is still a single stream and bitmap based on two index scans.

Solution with two streams (chosen conditionally at runtime) is
implemented in FB3 but only for some specific syntax. Perhaps your
COALESCE trick could join this special category, but it doesn't yet.

> Because there are two streams the fetches and the indexed reads more or
> less double because of the OR clause.
>
> (Again, a subject for another day is why does the optimiser even bother
> with the second stream when EMP_NO can never be null ? )

The optimizer don't take NOT NULL constraints into account. They don't
have existence locks and thus can affect query results if dropped in the
meantime.


Dmitry

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Loading...