While sometime ago, I was looking for a database project requirement to process analytical workload. The organisation choice was Oracle and Developers choice was postgres or vertica. But to handle Vertica we do not have that much of data processing actually needed so the final contenders are Oracle and Postgres.
As per licensing guidelines I cannot publish the details of outcome , but I would say the results are not good for me as a Oracle DBA , atleast in processing aggregate functions Oracle for a 2 million record table with more than 200 columns doing a count (*). But indeed we want an enterprise database and stable one not the open source database. Well our data in terms of distinctivity does not change much so some sort of approximation is also fine for us.
That is what now in 12.2, with Approximate Query Processing which is as per documentation
Approximate Query Processing
The 12.2 release extends the area of approximate query processing by adding approximate percentile aggregation. With this feature, the processing of large volumes of data is significantly faster than the exact aggregation. This is especially true for data sets that have a large number of distinct values with a negligible deviation from the exact result.
Approximate query aggregation is a common requirement in today's data analysis. It optimizes the processing time and resource consumption by orders of magnitude while providing almost exact results. Approximate query aggregation can be used to speed up existing processing.
Oracle provides a set of SQL functions that enable you to obtain approximate results with negligible deviation from the exact result. There are additional approximate functions that support materialized view based summary aggregation strategies. The functions that provide approximate results are as follows:
APPROX_COUNT_DISTINCT
APPROX_COUNT_DISTINCT_DETAIL
APPROX_COUNT_DISTINCT_AGG
TO_APPROX_COUNT_DISTINCT
APPROX_MEDIAN
APPROX_PERCENTILE
APPROX_PERCENTILE_DETAIL
APPROX_PERCENTILE_AGG
TO_APPROX_PERCENTILE
Approximate query processing can be used without any changes to your existing code. When you set the appropriate initialization parameters, Oracle Database replaces exact functions in queries with the corresponding SQL functions that return approximate results.
Parameter that effects the Approximate Query Processing
SQL> show parameter approx NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ approx_for_aggregation boolean TRUE approx_for_count_distinct boolean TRUE approx_for_percentile string ALL SQL>
When we use the functions , the value returns in hexadecimal to retrieve exact values we need to use associate to_* functions along with materialized views
SQL> select unit_code,APPROX_COUNT_DISTINCT_AGG(id) from (select unit_code,APPROX_COUNT_DISTINCT_DETAIL(ID) id from INSIGHTS.TABLE group by unit_code) group by unit_code; UNIT_CODE APPROX_COUNT_DISTINCT_AGG(ID) --------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------- MHIX115 0D0C0925009E00000000000000A1000000000000000000000040000000000000000000000000000000001000000000000000000000000000000000000000000000040000000000000000000000000000 POIX108 0D0C0DD500740000000000000075000000000000000000000000000080000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000 PHIX365 0D0C0DD500270000000000000027000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000
So, let me create a MV first with my query
CREATE MATERIALIZED VIEW test_approx_agg_mv AS SELECT unit_code, APPROX_COUNT_DISTINCT_DETAIL(id) id FROM INSIGHTS.TABLE GROUP BY unit_code;
Then let's access the query as below, as you see the values provided are approximate , when you compare the values with original count (second query) SQL> SELECT unit_code, TO_APPROX_COUNT_DISTINCT(id) "ids" FROM test_approx_agg_mv ORDER BY unit_code; UNIT_CODE ids --------------- ---------- ABR13 2 ABT10 42 ABT13 63 SQL> SELECT unit_code, count(id) from INSIGHTS.TABLE where unit_code in ('ABR13','ABT10','ABT13') group by unit_code; UNIT_CODE COUNT(id) --------------- ------------- ABR13 11 ABT13 264 ABT10 202
If i directly try to access the column with regular count we get an error;
SQL>SELECT unit_code, count(id) from test_approx_agg_mv ORDER BY unit_code * ERROR at line 1: ORA-00932: inconsistent datatypes: expected - got BLOB SQL>
So use the approximation function with associate function to retrieve the data, this works well with MV's rather plain queries
-Thanks
Suresh
Follow Me!!!