Archive for July, 2007

Jul 25 2007

Truncate and delete

Published by BI Explorer under Oracle

Truncate:
1. Moves the high water mark down to beginning.
2. Doesnt need commit. It is a DDL command.
3. No condition needed. Removes all rows.
4. Extents are deallocated. (If you specify the reuse storage clause, then the extents are not deallocated)
5. No rollback.

Delete:
1. Doesnt change the high water mark.
2. Needs commit. It’s a DML command.
3. Can specify a condition and remove specific rows.
4. Extents are not deallocated.
5. Changes can be rolled back.

No responses yet

Jul 21 2007

Tom Kyte on SQL tuning

Published by BI Explorer under Best Practises, Oracle

Hi,
I am a huge Tom Kyte fan!

Here is what he has to say about SQL tuning. Great thoughts.

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:24189818401006

No responses yet

Jul 17 2007

Why index doesnt access null values?

Published by BI Explorer under Oracle

Oracle indexes do not contain entries that point to rows with null values on all the indexed columns.

Therefore any query that has a column IS NULL or IS NOT NULL will not use the index.

No responses yet

Jul 11 2007

The maximum size of a fact table.

Published by BI Explorer under Dimensional modelling

Well, this is just a workout. It may or may not make much sense!

Here I have detailed how we can calculate the maximum size of a fact table.

Assume our datamart has 5 dimensions: Time_dim, customer_dim, product_dim, product_warehouse_dim, supplier_dim. And our fact table has 5 foreign keys, 15 measures, 2 degenerate columns

Time_dim has 20 years of rows on daily grain: so 365 X 20 = 7300 rows
There are 4000 customers in customer_dim.
200 products.
10 warehouses.
12 suppliers.

So the maximum possible fact table rows is
7300 x 4000 x 200 x 10 x 12 = 7008000000000 (7008000 million or 7008 billion)

Now this is the maximum no of rows possible in the fact table.(Assuming that every customer brought every product from every store supplied by every supplier on every day…..etc… it goes on … permutations and combinations)

Assuming that the 22 columns have an average size of 5 bytes, we need 110 bytes per row.

So now, 110 x 7008 billion = 71794 GB

The maximum possible size for this fact table (assuming that dimension count stays same) is 71794 GB.

The maximum possible increase per year could be 3590 GB.

No responses yet

Jul 06 2007

ETL tools

Published by BI Explorer under ETL

Hi,
Here is a very short list of few of the major ETL vendors.

Ab Initio
Informatica Powercenter
BusinessObjects DataIntegrator (BODI)
Ascential DataStage (now IBM DataStage)
Cognos DecisionStream (now Data manager)
Microsoft SQL Server Integration Services (SSIS)
Pervasive DataJunction
Hummingbird Genio
Oracle Warehouse builder

No responses yet

Jul 04 2007

Why surrogate key?

Hi,
Why should we use surrogate keys in a data warehouse?

Here is why.

1. To separate the DWH from the operational environment
Think of this. You have a dimension loading from transactional tables. All of a sudden, the OLTP people decide to re-use all closed / inactive account id’s. For them, it wont matter a bit. For the DWH team, it will.
The incoming record will have an account id that is already there in DWH. It will overwrite the old record. This is unacceptable.

Consider another case where you have a customer dimension loaded from OLTP tables. And your company purchases another company. And now, the acquired company has similar type of primary key logic for its customers. It clashes with your own primary keys! Oops, how do we integrate them?

Surrogate keys!! Use surrogate keys!! This will shield the DWH from operational changes.

2. Performance
The fact table will have a composite key. If surrogate keys are used, then in the fact table, we will have integers for keys.
In most cases, all we need is only 4 bytes for each surrogate key column. So, if you have 5 dimensions, all we need is 20 bytes!
If you have a 10 million rows, you only need 20 MB space for it.
If you had primary keys instead, you might have need several bytes for each key. Remember, in this case, each byte increases space by 1 MB. So, if you have a billion rows, it will increase size by 1 GB. For every single byte increase!!

Also, the queries will run faster when you join on integers rather than varchar. Using surrogate keys will enable faster querying.

And add to it, the option of being able to partition as per time_key or any other suitable key. Since it is a sequence, you can easily partition it.

3. History preservation
Consider this. Today, in July, I make a purchase. My current customer record will say

Prem 101 Bangalore Male

Next month, I move to Hyderabad. Make a purchase there.

Prem 101 Hyderabad Male

Now, you want to query for all purchases made by customers in Bangalore in July. How can I get it? My record says I am in Hyderabad! The way to handle this is by using SCD’s. So I need to use surrogate keys.

1 Prem 101 Bangalore Male record “Status_close”
2 Prem 101 Hyderabad Male record “Status_current”

This will enable you to preserve history. Note the additional column to indicate the current record.

4. To enable special situations that cant be done by OLTP primary keys
How can you have a OLTP primary key for a ‘not found’ record in a dimension? My fact table has a record that doesnt have a match yet in the dimension table. (Late arriving dimension.. assume) How do I handle it? I need to say it maps to a ‘not found’ record in the dimension table.

Surrogate key usage will enable such option!

No responses yet

Jul 04 2007

A new site for SQL tuning

Published by BI Explorer under Oracle

Hi,
Pls have a look at this site. Contains useful content.

http://people.aapt.net.au/roxsco/tuning/

Prem

One response so far