Monday, November 3, 2008

OBIEE Huge dimensions? Let’s split them!

OBIEE is (like most other tools) based on the Kimball methodology of representing date in a star model, using dimension and fact tables. The problem with dimension tables is that if they have several levels they can easily become huge and therefor take a long time to load. And if there is one users don't want is waiting.....


An example: a dimension with 5 levels and 25 categories on each level can give you 25^5 = 9.765.625 possibilities. An entry on level 1 would generated a "select distinct level_1 from dim_table". No mater how good your indexing is, a select distinct on almost 10 million rows is always slower then a select distinct on 25 rows.


How can we make this faster?


Step 1 is to organize your ETL is such a way that you get a dimension table for each level. (In practices you will probably group 2 or 3 levels together.)
Each dimension level table should also have the predecessor columns in them:





dim_table_level_1; level_1
dim_table_level_2; level_1, level_2
dim_table_level_3; level_1, level_2, level_3
dim_table_level_4; level_1, level_2, level_3, level_4
dim_table_level_5; level_1, level_2, level_3, level_4, level_5


Now we could map this in OBIEE as dim_table_level_1 => dim_table_level_2 => dim_table_level_3 => dim_table_level_4 => dim_table_level_5 => fact_table. But then again we would you loose the benefit of "small" dimension table by introducing costly joins.


Example:

select T36.LEVEL_01 as c1, sum(T56.F_FACT_VAL) as c2from
DIM_LEVEL_01 T36, DIM_LEVEL_02
T38, DIM_LEVEL_03 T41,
DIM_LEVEL_04 T45, DIM_LEVEL_05
T50, F_FACTS T56where ( T36.LEVEL_01 = T38.LEVEL_01 and T36.LEVEL_01 = T56.LEVEL_1 and T38.LEVEL_02 = T41.LEVEL_02 and T38.LEVEL_02 = T56.LEVEL_2 and T41.LEVEL_03 = T45.LEVEL_03 and T41.LEVEL_03 = T56.LEVEL_3 and T45.LEVEL_04 = T50.LEVEL_04 and T45.LEVEL_04 = T56.LEVEL_4 and T50.LEVEL_05 = T56.LEVEL_5 ) group by T36.LEVEL_01order by c1



Let OBIEE make the decision!


If you really want to maximize the speed of OBIEE you can use the "federated query" functionality. Beside splitting the dimension table we also split the dimension key in the fact table. (In a later article I will show you how you make your query's "aggregate aware" and speed them up a lot!.)
Your fact table will look like: level_1, level_2, level_3, level_4, level_5, fact_1
If you map this in physical layer as: dim_table_level_1 => fact_table, dim_table_level_2 => fact_table, dim_table_level_3 => fact_table, dim_table_level_4 => fact_table and dim_table_level_5 => fact_table and put all the dimension_tables in the same datasource in the business model layer:

Next we make the dimension accordenly:


You will see that OBIEE automatically choices the correct dim_table while drilling down:

level 1:
select T36.LEVEL_01 as c1,
sum(T56.F_FACT_VAL) as c2from DIM_LEVEL_01
T36, F_FACTS T56where ( T36.LEVEL_01 = T56.LEVEL_1
) group by T36.LEVEL_01order by c1



level 2:
select T38.LEVEL_01 as c1, T38.LEVEL_02 as
c2, sum(T56.F_FACT_VAL) as c3from
DIM_LEVEL_02 T38, F_FACTS
T56where ( T38.LEVEL_01 = 'A' and T38.LEVEL_02 = T56.LEVEL_2 ) group by
T38.LEVEL_01, T38.LEVEL_02order by c1, c2



level 3:
select T41.LEVEL_01 as c1, T41.LEVEL_02 as
c2, T41.LEVEL_03 as c3,
sum(T56.F_FACT_VAL) as c4from DIM_LEVEL_03
T41, F_FACTS T56where ( T41.LEVEL_01 = 'A' and
T41.LEVEL_02 = 'AA' and T41.LEVEL_03 = T56.LEVEL_3 ) group by T41.LEVEL_01,
T41.LEVEL_02, T41.LEVEL_03order by c1, c2, c3



level 4:
select T45.LEVEL_01 as c1, T45.LEVEL_02 as
c2, T45.LEVEL_03 as c3,
T45.LEVEL_04 as c4, sum(T56.F_FACT_VAL) as c5from
DIM_LEVEL_04 T45, F_FACTS
T56where ( T45.LEVEL_01 = 'A' and T45.LEVEL_02 = 'AA' and T45.LEVEL_03 =
'AAF' and T45.LEVEL_04 = T56.LEVEL_4 ) group by T45.LEVEL_01, T45.LEVEL_02,
T45.LEVEL_03, T45.LEVEL_04order by c1, c2, c3, c4



level 5:
select T50.LEVEL_01 as c1, T50.LEVEL_02 as
c2, T50.LEVEL_03 as c3,
T50.LEVEL_04 as c4, T50.LEVEL_05 as
c5, sum(T56.F_FACT_VAL) as c6from
DIM_LEVEL_05 T50, F_FACTS
T56where ( T50.LEVEL_01 = 'A' and T50.LEVEL_02 = 'AA' and T50.LEVEL_03 =
'AAF' and T50.LEVEL_04 = 'AAFG' and T50.LEVEL_05 = T56.LEVEL_5 ) group by
T50.LEVEL_01, T50.LEVEL_02, T50.LEVEL_03, T50.LEVEL_04, T50.LEVEL_05order by c1,
c2, c3, c4, c5

Edit: OBIEE guru Stijn Gabriels pointed out to me that you also have to set the levels for each dimension column.

(When you start with a "virgin" repository OBIEE will do automaticly, but you should always check them!)

This article was orinally written for the ciber knowledge blog :
http://knowledge.ciber.nl/weblog/?p=147

Till Next Time

7 comments:

Stijn said...

Hi John,
I think you need to add two things to this blog item.
1) Describe the need of an hierarchy: The dimensional levels should respond to the hierarchy you created for this dimension.
2) Describe how to set the aggregation content: On the content tab of each logical table source you must add on what level the data is aggregated.
Regards,
Stijn

John Minkjan said...

Hi Stijn,

Thanks for the input. On point 1 I edtied the text accordently. Point 2 is valid for fact tables, you don't have to this for dimension tables. I will discuss splitting fact tables in "aggregate aware" tables in a future article.

regards

John

Pete Scott said...

Hi John
I know you have not really covered the fact table yet, but there can be difficulties with a 'wide' fact table that includes the level keys for dimensions.
Firstly, the rows are longer, so fewer rows per fact table data block which can impact performance at the database. If children move between parents (a far too common thing) - the fact table needs to be rebuilt to reflect the changes. If the database supports partitioning then there is the risk that partition elimination may not work if the queries generated use the level 1 key and the data is partitioned at (say) level 4, that we need to look in all the partitions to find the matches at level 1 even if they are only in one partition - true indexes may help, but again we need to scan the index for the whole fact table and not just a smaller part of it

John Minkjan said...

Hi Pete,

You are absolutly right! All I tried todo in this article is to show how to split a big dimension. In pratice you wouldn't split this in 5 seperate tables but maybe two. Connecting to "aggregate aware" fact tables I will cover in a future article.

regards

John

Andriy Yakushyn said...

Great post John! And congrats on your achievement in business intelligence forum on Oracle's site! Congratulations!

Stijn said...

Hi John,
According to Oracle OBIEE training material, you do need to set the aggregation content for each logical table source (lts), also for dimension tables.
Especially when you combine aggregated lts with fragmented lts. If you won't do it, you can get incorrect results. It is even stated that it is a good practice to do this, even on the lts with the lowest level of detail, for documentation purposes: for other developers it will be clear on what level the data in each lts is stored.
Regards,
Stijn

John Minkjan said...

Hi Stijn,

Your right, an oversight from me.... I didn't do any setting and it worked, seems that I was lucky with the automatic setting of the repository. I will edit the article.

regards John