Opened 5 years ago

Closed 5 years ago

#2068 closed defect (fixed)

induced condenser treats null values different from regular condensers

Reported by: Dimitar Misev Owned by: apercov
Priority: major Milestone: 9.8
Component: qlparser Version: 9.7
Keywords: Cc: Peter Baumann
Complexity: Medium

Description (last modified by Dimitar Misev)

In regular condense and shorthand condensers like avg_cells, null values are ignored (cf. documentation).

In the induced condenser this is not the case: null values are treated the same way as in binary induced operation, i.e. the result is null if any cell in the operands (or slices in induced condenser) is null (cf. documentation).

With an example, these two expressions are equivalent and hence null value treatment is the same:

select condense +
       over i in [0:1]
       using coll3d[ i[0], 0:100, 0:100 ]
from coll3d

=

select coll3d[0, 0:100, 0:100] + 
       coll3d[1, 0:100, 0:100]
from coll3d

The correct equivalence of the first condense expression, however, should be:

select marray i in [0:100, 0:100]
       values add_cells( coll3d[ 0:1, i[0], i[1] ] )
from coll3d

Attachments (2)

example.png (296.6 KB ) - added by Dimitar Misev 5 years ago.
0001-ticket-2068-Extend-binary-induced-operations-with-op .patch (25.9 KB ) - added by Dimitar Misev 5 years ago.
current patch

Download all attachments as: .zip

Change History (15)

comment:1 by Dimitar Misev, 5 years ago

Description: modified (diff)

comment:2 by Dimitar Misev, 5 years ago

The attached example shows the problem.


Slice 1:

for $c1 in ( openEO_S2_32632_10m_L2A ) return encode ( (unsigned char)

((((((double)$c1.B08[E(642611.0051231487:689621.8539048014),N(5107122.827708254:5219211.141179815),DATE:"CRS:1"(2)] - 
             $c1.B04[E(642611.0051231487:689621.8539048014),N(5107122.827708254:5219211.141179815),DATE:"CRS:1"(2)]) /
    ((double)$c1.B08[E(642611.0051231487:689621.8539048014),N(5107122.827708254:5219211.141179815),DATE:"CRS:1"(2)] + 
             $c1.B04[E(642611.0051231487:689621.8539048014),N(5107122.827708254:5219211.141179815),DATE:"CRS:1"(2)]))
  ) + 1.0) * 127.5)
, "tiff" )

Slice 2:

for $c1 in ( openEO_S2_32632_10m_L2A ) return encode ( (unsigned char)

((((((double)$c1.B08[E(642611.0051231487:689621.8539048014),N(5107122.827708254:5219211.141179815),DATE:"CRS:1"(1)] - 
             $c1.B04[E(642611.0051231487:689621.8539048014),N(5107122.827708254:5219211.141179815),DATE:"CRS:1"(1)]) /
    ((double)$c1.B08[E(642611.0051231487:689621.8539048014),N(5107122.827708254:5219211.141179815),DATE:"CRS:1"(1)] + 
             $c1.B04[E(642611.0051231487:689621.8539048014),N(5107122.827708254:5219211.141179815),DATE:"CRS:1"(1)]))
  ) + 1.0) * 127.5)
, "tiff" )

Induced max:

for $c1 in ( openEO_S2_32632_10m_L2A ) return encode ( (unsigned char)

(((condense max 
   over $pm t (imageCrsDomain($c1[DATE("2018-06-16T00:00:00Z":"2018-06-23T00:00:00Z")],DATE)) 
   using (((double)$c1.B08[E(642611.0051231487:689621.8539048014),N(5107122.827708254:5219211.141179815),DATE($pm)] - 
                   $c1.B04[E(642611.0051231487:689621.8539048014),N(5107122.827708254:5219211.141179815),DATE($pm)]) /
          ((double)$c1.B08[E(642611.0051231487:689621.8539048014),N(5107122.827708254:5219211.141179815),DATE($pm)] + 
                   $c1.B04[E(642611.0051231487:689621.8539048014),N(5107122.827708254:5219211.141179815),DATE($pm)]))
  ) + 1.0) * 127.5)
, "tiff" )
Last edited 5 years ago by Dimitar Misev (previous) (diff)

by Dimitar Misev, 5 years ago

Attachment: example.png added

comment:3 by Dimitar Misev, 5 years ago

When the induced max condenser is replaced with an equivalent coverage constructor it works as expected, e.g.

for $c1 in ( openEO_S2_32632_10m_L2A ) return encode (
   coverage maxcov
   over $pe E(26265:26366), $pn N(-1920:3289)
   values max($c1.B08[E($pe),N($pn),DATE("2018-06-16T00:00:00Z":"2018-06-23T00:00:00Z")])
, "tiff" )

This is not a good alternative due to significantly worse performance, however.

comment:4 by Vlad Merticariu, 5 years ago

Makes sense, since the operation performed at every step is actually an induced one, so equivalent to max(slice1, slice2).

This would mean supporting different behavior of nulls in the same operation. Max is not the best example, but if we think of +, we would need to support val + null = null (current case, which would produce results only in the overlapping area) and val + null = val, where null is treated like an identity element.

comment:5 by Dimitar Misev, 5 years ago

So having some sort of a modifier for binary operations, e.g. NULL VALUES IDENTITY?

comment:6 by Vlad Merticariu, 5 years ago

Yes, that would be a solution. However, after discussion with Peter we shouldn't deviate from the standard SQL behavior (which is the one currently implemented, where val + null = null) in the general case. There remains the question of whether this should be specifically supported in induced condensers.

comment:7 by Dimitar Misev, 5 years ago

I think the induced condenser is understood as an aggregation operation, so the "val + null = val" rule is the appropriate one to apply.

comment:8 by Vlad Merticariu, 5 years ago

Owner: changed from Vlad Merticariu to ahambasan
Status: newassigned

Ok, makes sense to me. In order to support this, the following steps are necessary:

  • binary induced operations must be extended with an option which indicates that null values should be treated as identity (i.e. op(val, null) = op(null, val) = val instead of op(val, null) = op(null, val) = null which is the current behavior and should remain the default one). This option is only for internal use, it won't be exposed in the QL.
  • the induced condenser performs a binary induced operation at every step. The operation must be called with the option above enabled.

Test case: the last query in comment:2 should return data for the entire slice.

by Dimitar Misev, 5 years ago

current patch

comment:9 by Dimitar Misev, 5 years ago

Owner: changed from ahambasan to apercov

comment:10 by apercov, 5 years ago

Status: assignedaccepted

comment:11 by Dimitar Misev, 5 years ago

Here's a simpler rasql query that helps reproduce the problem easier:

rasql -q 'select encode(
condense + over i in [0:1] 
using (
<[0:1,0:2,0:2]

0, 0, 0;
1, 1, 1;
2, 2, 2;

3, 3, 3;
1, 1, 1;
0, 0, 0> [i[0], 0:2, 0:2]
),
"csv")' --out string | grep Result

this is essentially like doing + on the two slices

0, 0, 0          3, 3, 3
1, 1, 1    +     1, 1, 1
2, 2, 2          0, 0, 0

As a result we get as expected

{3,3,3},{2,2,2},{2,2,2}
=
3,3,3
2,2,2
2,2,2

Now if we set 0 as null value for the array constant:

rasql -q 'select encode(
condense + over i in [0:1] 
using (
<[0:1,0:2,0:2]

0, 0, 0;
1, 1, 1;
2, 2, 2;

3, 3, 3;
1, 1, 1;
0, 0, 0> [i[0], 0:2, 0:2] null values [0]
),
"csv")' --out string | grep Result

we get a bad result, because null value + value = null value

0,0,0
2,2,2
0,0,0

We want to get the same result as before though, i.e. the behavior should be like in add_cells:

null value + value = value

Relevant documentation: http://doc.rasdaman.org/04_ql-guide.html#nulls-in-aggregation-queries

comment:12 by Peter Baumann, 5 years ago

I see what has upset me: regular induced ops follow null op value = null, and the induced condense is defined as using op for iterative evaluation. However, taking SQL as the role model, there is an explicit "null elimination step", and that is what we need in our condensers, too. That leads us to the behaviour described above, null op value = value.

comment:13 by Dimitar Misev, 5 years ago

Resolution: fixed
Status: acceptedclosed
Note: See TracTickets for help on using tickets.