Sunday, July 27, 2008

OBIEE cache management.

____________________________

2011/08/26: Update

-- This article is based on OBIEE 10g technology

-- In the original text from 2008/07/27 there was a typo in the example:

MAX_CACHE_ENTRIES = 80000 should read MAX_CACHE_ENTRIES = 800

Some of you copied the example without reading the text. This can lead to a server crash as described in tech note 1340621.1, especially in a clustered environment.

“It appears that the "purge process" was not able to purge excessive number of cache entries. OBI Server crashes when you attempt to do so.”

It will also generate a log entry like:

[47041] NQSConfig.INI warning: The value of MAX_CACHE_ENTRIES is currently 10000 and exceeds the recommended limit of 1200.

Please check any numbers against common sense and the rules of thumb described in this article.

Till Next Time

Thnx to Stijn Gabriels for noticing and reporting this error on the blog.

____________________________

Some remarks upfront:


  • OBIEE is NOT a database. It’s a tool to plough through multidimensional data out of a data warehouse or datamart.
  • OBIEE is NOT an extraction tool. It’s not designed to extract 100k+ rows to a CSV or XLS file.
  • OBIEE reports returning more than 1,000 rows are often used as a source for other tools (like Excel) and don’t contain any information for the user without having him/her plough through the data manually (consider using BI Publisher instead).
  • OBIEE caching is NOT in memory! It’s a file based system.

How does the OBIEE cache work?

The program first hashes the request string and checks the cache directory to see whether a valid cache file is already present; if not, it executes the request against the database.
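Conceptually the lookup behaves like the little Python sketch below. This is an illustration only, not the actual OBIEE internals: the real hash function, key layout and file format are undocumented, and the directory and names here are made-up examples.

import hashlib
from pathlib import Path

CACHE_DIR = Path("d:/OracleBI/cache")  # hypothetical cache directory

def lookup_cache(logical_sql, user=None):
    # The user only becomes part of the key when row-level security
    # makes the entry "personal" (see below).
    key = logical_sql if user is None else user + ":" + logical_sql
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    cache_file = CACHE_DIR / (digest + ".TBL")
    if cache_file.exists():  # cache hit: read the stored result from disk
        return cache_file.read_bytes()
    return None  # cache miss: the caller runs the request against the database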

If you have a close look at your cache directory you will see that it is a collection of “.TBL” files. If you open such a file in an editor you will recognize things like the user, the repository, the executed request and, of course, the resulting data.

Cache entries become "personal" if you have row-level security inside OBIEE. Another reason for a cache entry to become personal is when you have a VPD (Virtual Private Database) in place. A good posting on this subject can be found here: http://obieeblog.wordpress.com/2008/12/29/obiee-and-virtual-private-database-vpd/

Since all the cache is written to disk, the cache directory needs to be on the quickest drive available, preferably with its own disk controller.

Before we go into detail about the possible cache settings, we first have to ask ourselves what the valid reasons for using the OBIEE cache are.


  • The user executes the same request over and over within the “valid” period of the cache.
  • The result of the request is the rollup of a large amount of underlying data. But if the rollup level is greater than 1 to 100, you really should go back to your data warehouse design.

Caching a report because “it takes too long” is usually a sign that your data warehouse or data mart doesn’t meet the user requirements.

The parameters for cache management can be found in NQSConfig.INI.

ENABLE
To enable the cache set the ENABLE parameter to YES.

DATA_STORAGE_PATHS
This parameter specifies one or more directory paths where the cached query result data is stored and accessed when a cache hit occurs, together with a maximum capacity per path in bytes, kilobytes, megabytes or gigabytes. The maximum capacity for each path is 4 GB. For optimal performance, the directories specified should be on high performance storage systems.

Each directory listed needs to be an existing, fully-qualified, writable directory pathname, with double quotes ( " ) surrounding the pathname. Specify mapped directories only.
UNC path names ("\\server.name.edu\somefolder") and network mapped drives are allowed only if the service runs under a qualified user account. To change the account under which the service is running, see the corresponding topic in the Oracle Business Intelligence Enterprise Edition Deployment Guide.

Specify multiple directories with a comma separated list.

When you specify more than one directory, they should reside on different physical drives. (If you have multiple cache directory paths that all resolve to the same physical disk, both available and used space may be double-counted.)

Example: DATA_STORAGE_PATHS = "d:\OracleBI\cache" 256MB, "f:\OracleBI\cache" 256MB ;

OBIEE uses these directories in an alternating fashion: request 1 goes to the first directory, request 2 goes to the second directory, request 3 goes to the first directory again, etc.
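In pseudo-form the directory selection is a simple round-robin over the configured paths, something like this sketch (illustrative Python, not OBIEE source code):

from itertools import cycle

# The same two hypothetical paths as in the example above.
storage_paths = cycle(["d:/OracleBI/cache", "f:/OracleBI/cache"])

def next_storage_path():
    # Each new cache entry goes to the next directory in the rotation.
    return next(storage_paths)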
Some notes:


  1. Don’t make a directory larger than 4 GB. This is caused by the fact that some of the OBIEE core components are VC++ based, which use an unsigned 32-bit integer for memory allocation.
  2. Enable usage tracking to see which reports hit the cache or eat your resources.
  3. Working with an organization of 8,000 users who run an average of 10 personal and/or shared reports during the cache validity period (average cache entry size 10 KB), I have never seen the total allocated cache space become more than 1 GB during a 6-month monitoring period. If you really need a cache allocation of more than 2 GB, there is probably something seriously wrong with your DWH or DM.
  4. Invest in one or more small high-speed disks with low access times and large cache memories. Give the drive(s) its/their own controller. If you are on a SAN, ask your SAN system manager to give you a high-priority access drive, so you don’t have to “wait”. You might want to consider creating an "in memory" drive.
  5. An Oracle Business Intelligence Server defined as a clustered server does not share cached data. The DATA_STORAGE_PATHS entry needs to be unique for each server defined as a cluster participant.
  6. Specifying more than one directory per drive does not improve performance, because file input and output (I/O) takes place through the same I/O controller. In general, specify only one directory per disk drive. Specifying multiple directories on different drives may improve the overall I/O throughput of the OBIEE Server internally by distributing I/O across multiple devices.
  7. Seeding the OBIEE cache with large tables (more than 1 MB) as the source for relatively small requests (less than 50 KB) forces OBIEE to do a full file read for every request. Most modern databases have large shared pools and can serve such reads from memory much quicker.
  8. Most data centres have dedicated VLANs for the communication between databases and OBIEE. Most of these connections are much quicker than reading from disk.

MAX_ROWS_PER_CACHE_ENTRY
Specifies the maximum number of rows in a query result set to qualify for storage in the query cache. Limiting the number of rows is a useful way to avoid using up the cache space with runaway queries that return large numbers of rows. If the number of rows a query returns is greater than the value specified in the MAX_ROWS_PER_CACHE_ENTRY parameter, the query is not cached. When set to 0, there is no limit to the number of rows per cache entry.
Default: MAX_ROWS_PER_CACHE_ENTRY = 100000 ;
Setting this any higher makes OBIEE an ETL tool.

MAX_CACHE_ENTRY_SIZE
Specifies the maximum size for a cache entry. Potential entries that exceed this size are not cached. The default size is 1 MB.
Specify GB for gigabytes, KB for kilobytes, MB for megabytes, and no units for bytes.
Example: MAX_CACHE_ENTRY_SIZE = 1 MB ;
Altering this value should be done in conjunction with the setting of MAX_ROWS_PER_CACHE_ENTRY.
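A quick back-of-the-envelope check, with an assumed average row size, shows how the two parameters hang together:

# All numbers are assumptions for illustration only.
avg_row_bytes = 100  # assumed average size of one row in a cached result
max_rows = 10000     # MAX_ROWS_PER_CACHE_ENTRY from the example further down
entry_mb = avg_row_bytes * max_rows / (1024 * 1024)
print(round(entry_mb, 2))  # ~0.95 MB, which fits MAX_CACHE_ENTRY_SIZE = 1 MB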

MAX_CACHE_ENTRIES
Specifies the maximum number of cache entries allowed in the query cache. Limiting the total number of cache entries provides another parameter with which to manage your cache storage. The actual limit of cache entries might vary slightly depending on the number of concurrent queries. The default value is 1000; the maximum recommended by Oracle is 1200 (higher is allowed, but don’t go overboard). Example: MAX_CACHE_ENTRIES = 1000 ;

Rule of thumb: average number of ‘unique’ (personal and shared) reports run per “cache period” * average ‘drill’ depth * number of VPD users.
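A worked example with assumed numbers: 40 unique reports per cache period, an average drill depth of 3 and 10 VPD users already put you at the recommended ceiling of 1200 entries.

# All numbers are assumptions for illustration only.
unique_reports = 40  # unique personal + shared reports per cache period
drill_depth = 3      # average drill depth per report
vpd_users = 10       # VPD users (use 1 if you have no VPD / row-level security)
print(unique_reports * drill_depth * vpd_users)  # 1200 = recommended maximum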

POPULATE_AGGREGATE_ROLLUP_HITS
Specifies whether to aggregate data from an earlier cached query result set and create a new entry in the query cache for rollup cache hits. The default value is NO.
Typically, if a query gets a cache hit from a previously executed query, then the new query is not added to the cache. A user may have a cached result set containing information at a particular level of detail (for example, sales revenue by ZIP Code). A second query may ask for this same information, but at a higher level of aggregation (for example, sales revenue by state). The POPULATE_AGGREGATE_ROLLUP_HITS parameter overrides this default when the cache hit occurs by rolling up an aggregate from a previously executed query; in this example, by aggregating data from the first result set stored in the cache. That is, the sales revenue for all ZIP Codes in a particular state can be added up to obtain the sales revenue by state. This is referred to as a rollup cache hit.
Normally, a new cache entry is not created for queries that result in cache hits. You can override this behaviour specifically for cache rollup hits by setting POPULATE_AGGREGATE_ROLLUP_HITS to YES.
Non-rollup cache hits are not affected by this flag. If a query result is satisfied by the cache—that is, the query gets a cache hit—then this query is not added to the cache. When this parameter is set to YES, then when a query gets an aggregate rollup hit (for example, "sales by region" is answered from "sales by district, region") the result is put into the cache. Setting this parameter to YES may result in better performance, but results in more entries being added to the cache.
Example: POPULATE_AGGREGATE_ROLLUP_HITS = YES ;

USE_ADVANCED_HIT_DETECTION
When caching is enabled, each query is evaluated to determine whether it qualifies for a cache hit. A cache hit means that the server was able to use the cache to answer the query and did not go to the database at all. The Oracle BI Server can use the query cache to answer queries at the same or a higher level of aggregation (Data Mart Automation).
The parameter USE_ADVANCED_HIT_DETECTION enables an expanded search of the cache for hits. The expanded search has a performance impact, which is not easily quantified because of variable customer requirements. Customers that rely heavily on query caching and are experiencing misses might want to test the trade-off between better query matching and overall performance for high user loads. Example: USE_ADVANCED_HIT_DETECTION = YES;

Reasons Why a Query is Not Added to the Cache:


  • Non-cacheable SQL element. If a SQL request contains Current_Timestamp, Current_Time, Rand, Populate, or a parameter marker then it is not added to the cache.
  • Non-cacheable table. Physical tables in the Oracle BI Server repository can be marked 'non cacheable'. If a query references any non-cacheable table then the query results will not be added to the cache.
  • Cache hit. In general, if the query gets a cache hit on a previously cached query, then the results of the current query are not added to the cache. The exception is query hits that are aggregate roll-up hits.
  • Result set is too big.
  • Query is cancelled. This can happen by explicit cancellation from Oracle BI Presentation Services or the Administration Tool, or implicitly through timeout.
  • Oracle BI Server is clustered. Queries that fall into the ‘cache seeding’ family are propagated throughout the cluster. Other queries continue to be stored locally. Therefore, even though a query may be put into the cache on Oracle BI Server node 1, it may not be on Oracle BI Server node 2.

MAX_SUBEXPR_SEARCH_DEPTH
The parameter MAX_SUBEXPR_SEARCH_DEPTH allows you to configure how deep the hit detector looks for an inexact match in an expression of a query. The default is 5. At the default level (5), for example, a query on the expression sin(cos(tan(abs(round(trunc(profit)))))) misses on Profit, which is at level 7. Changing the search depth to 7 opens up Profit for a potential hit. Advice: set it to 99.
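To see where “level 7” comes from: every function wrapper adds one level and the column itself is the innermost one. A throwaway way to count it (illustrative Python, nothing like the hit detector’s real implementation):

expr = "sin(cos(tan(abs(round(trunc(profit))))))"
# Six opening parentheses before "profit", plus profit itself = level 7.
depth = expr[:expr.index("profit")].count("(") + 1
print(depth)  # 7 -> a search depth of 5 misses it; 7 (or 99) can hit it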

The [CACHE] section of your NQSConfig.INI file could now look like this:

#####################################################
#
# Query Result Cache Section
#
#####################################################

[ CACHE ]

ENABLE = YES;
DATA_STORAGE_PATHS = "d:\OracleBI\cache" 1GB, "f:\OracleBI\cache" 1GB ;
MAX_ROWS_PER_CACHE_ENTRY = 10000 ;
MAX_CACHE_ENTRY_SIZE = 1 MB ;
MAX_CACHE_ENTRIES = 1200 ; 

POPULATE_AGGREGATE_ROLLUP_HITS = YES;
USE_ADVANCED_HIT_DETECTION = YES;
MAX_SUBEXPR_SEARCH_DEPTH = 99;

If your OBIEE server is in a clustered environment, use the following parameters for the “shared” part:

GLOBAL_CACHE_STORAGE_PATH
The physical location for storing cache entries shared across the cluster. All clustering nodes share the same location.
Example: GLOBAL_CACHE_STORAGE_PATH = "" SIZE;

MAX_GLOBAL_CACHE_ENTRIES
The maximum number of cache entries stored in the location specified by
GLOBAL_CACHE_STORAGE_PATH.
Example: MAX_GLOBAL_CACHE_ENTRIES = 1000;

CACHE_POLL_SECONDS
The interval in seconds at which each node polls the shared location specified in
GLOBAL_CACHE_STORAGE_PATH.
Example: CACHE_POLL_SECONDS = 300;

CLUSTER_AWARE_CACHE_LOGGING
Turns on logging for the cluster caching feature. Used only for troubleshooting. The default is NO.
Example: CLUSTER_AWARE_CACHE_LOGGING = NO;
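Put together, the cluster part of [CACHE] in NQSConfig.INI could look like the block below. The UNC path and size here are placeholder values you have to adapt to your own environment; the shared location must be writable by every node in the cluster.

# Cluster cache settings (path and size are placeholders)
GLOBAL_CACHE_STORAGE_PATH = "\\sanserver\OracleBI\globalcache" 250 MB;
MAX_GLOBAL_CACHE_ENTRIES = 1000;
CACHE_POLL_SECONDS = 300;
CLUSTER_AWARE_CACHE_LOGGING = NO;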


Till Next Time

21 comments:

Anonymous said...

You said that cache entries are personal, so a different user executing the same request will lead to a new cache entry. Would that mean that if I seed my cache I have to run a cache seeding request once for each possible user of my data?

John Minkjan said...

Hi John,

That depends on the data security model you are using. With user/row-level security you will need a separate seed for every user. If your security is on "group level" you need a seed for each group. If all users are allowed to see all data, one entry is enough.

regards

John

Anonymous said...

We have row level security where a Logical Table Source restricts data selected based on a list of account numbers from a Session Variable. The USER variable is not used in the LTS "where" clause. Wouldn't any user's query selecting against an account number that is in the other user's cache get a cache hit?
(note that the init block which populates the account# session variable does refer to the USER variable in its selection of accounts from a security table).
Thanks in advance,
John

John Minkjan said...

Hi John,

Yeah, that should work. The thing is that OBIEE is kind of picky when a session variable is in the query. Why don't you seed the cache with a general query and then run a couple of queries for different users? After that, check the NQQuery log to see whether it picked up the cache or not. (Set the logging level for the users temporarily to 2 or higher.)

Anonymous said...

I've just opened an SR with Oracle because I'm trying to run requests through iBots to seed the cache but have found that our session variables from init blocks are not being populated. As we've employed these session variables in our LTS it is preventing the requests from running via iBots and/or Job Manager.
Great blog, very helpful to many of us.

Anonymous said...

Hi,

Can you explain what the qualified misses and unqualified misses in the cache entries are...

If not, let me try...

John Minkjan said...

Hi Raghev,

Please share your thoughts on the subject.

regards John

Anonymous said...

I'm baffled why I don't get a cache hit when I run a request from Answers using the exact same criteria as was used to seed the cache. The cache was created by a different user via an iBot Answers request. We are not using row level security in this case, although everyone has their own login, authenticated via our own home-grown portal security. Why is caching so often discussed as a nice feature if you can't hit against anyone else's cache? We have too many users to possibly run cache seeding requests for all of them. Am I missing something here?

John Minkjan said...

Hi John,

First of all, I'm not a fan of using the OBIEE cache. It's usually a sign of the DWH not complying with user needs. I see the OBIEE cache more as a speed-up if the same user/group is asking the (partly) same question within the same ETL window. From my experience there is often more gain in tuning the DWH to the users' needs.

regards

John

Turribeach.com.ar said...

I don't think this is correct: "The cache entries are personal, so a different user executing the same request will lead to a new cache entry"

I can see why the query will be different if you are using row level security. But if the query is the same then OBIEE uses the cache for a different user. I have tested it and I am quite sure about it.

John Minkjan said...

Hi Christian,

You are totally right, that sentence went wrong in translation. I have added the correct syntax.

regards John

Anonymous said...

John,
Thanks for explaining the workings of the OBIEE cache in such great detail. Could you please clarify the following -
I was running some tests on my local system and I observed that a user's request will only get a cache hit on that same user's cache.

We are not using VPD. As per the Admin guide -
“Initializing Cache Entries for User Ids
To initialize cache entries for user Ids, the Connection Pool needs to be set up for shared login ..If the shared login is disabled and a user specific database login is used, cache will be shared.”

Does the above line from the Admin guide explain the behavior I see, or am I missing something?
I ask this because, based on your article and some other postings, it seems that if no row level security (VPD) is implemented, the cache should be shared among users. But that is not happening and I would like to understand why.

Thanks

John Minkjan said...

Hi Msp,

I'll get back to you ASAP. I will set up a test environment for this behaviour first.

regards

John

Ritesh said...

Hi John,

Did you get a chance to test/validate MSP's comment?

Can you please share your finding?

John Minkjan said...

@Rits,

Sadly no... I have a great lack of time at the moment, and investigating this would take me a couple of hours to set up.

regards

Anonymous said...

Hi, I am new to OBIEE and looking for a detailed document about how a request gets processed in OBIEE.

Aniruddha said...

Good article. One quick question.

If a report is a pre-scheduled report with a number of drill-through capabilities, how does the caching work? Do the report and the drill-through paths get cached?

Gaetano Lisco said...

Hi John,
first of all thank you for your very useful guides and suggestions. If I set the size of the cache to 2 GB, BI Server does not cache anything because it allocates 0 MB to the size! But if I set the size to 2000 MB (instead of 2 GB) it works correctly and cache management "sees" 2000 MB of available space. This happens on Windows. Do you have feedback similar to this one? Thank you!

Chang said...

We are thinking of using an "in memory" drive to improve the cache performance. However, this is a new area for us. Could someone suggest good software/hardware for an "in memory" drive? Thanks.

David Andersen said...

Here's something I've not read anywhere. OBIEE will fail to give you a cache hit if your report has fewer columns than the number of columns in the caching report. In other words, if the report has columns A and B and you have a cache entry with columns A, B, and C, this will not result in a hit. This is contrary to the documentation and contrary to intuition. I've got nothing else special going on in my testing, running OBIEE 10.1.3.4.1 against an Oracle database. IMO this almost makes caching useless and I can't believe I haven't read this anywhere else (have not searched the Oracle SRs yet).

Nachiket said...

Hi John,
I have a question for you.
Before that, let me give some background.

In my previous project, each of us on the DEV team had a separate userid & pwd & access to all SAs.
And cache entries were personal.
If I got a hit & wanted to extract the physical SQL, I used to log in with another UserID (Act As) & fire the same request & could get the physical SQL.


Now on my personal laptop I have installed OBIEE 10g. I seeded the cache for 1 small report using this command-
NQCMD -d AnalyticsWeb -u Naresh -p Naresh -s C:\SeedCache.txt

Now when I log in as another user, say Suresh, I can see the cache hit in the NQQuery.log.
Why is the cache getting shared?
why the cache is getting shared ?