Archive for the 'Business Intelligence' Category

Sep 23 2008

Staging area: Necessary or overhead?

Published by biexplorer under ETL

In this article, let us see what a staging area is, its types and the reason to have one in your data warehouse.

Ok, what is a stage area?
It is that part of a data warehouse where data is stored physically on disks or files (or whatever)… but as an intermediate step before loading the data warehouse / data marts. It is where activities like cleansing, de-duplication, etc take place. It is like a pit stop for a racing car before reaching the destination.

Some characteristics of the staging area are

  1. accessible to and owned by ETL / DW team
  2. OLAP / reporting teams do not have access to it
  3. indexed very little
  4. ETL developers are usually free to create / drop tables, controlled though (by the architect or modeling team)

Types of staging areas:

  1. Persistent staging - stage data is not deleted, if you want to maintain history.
  2. Transient staging - stage data is deleted after each ETL load

Most data warehouses have one or more staging areas, the types being either persistent or transient or both.

But should you really have a stage area? Can’t you do without it? After all these days, ETL tools are more capable of handling more data in memory fully.

Is staging necessary or is it an overhead?

Continue Reading »

No responses yet

Sep 23 2008

ETL effort estimation: Points to factor-in

Published by biexplorer under ETL, Uncategorized

Estimation of ETL effort is not always fun. (as with any estimation).

There are several ways of estimating the effort needed to complete an ETL job. Work Breakdown Structure (WBS) is popular. And so is Function Point Analysis (FPA).

But the most widely used is the one that factors in complexity based on the understanding of things like source, target, resources on project, etc.

Though I haven’t really seen anyone use this method to perfection, it is a good place to start with. Some people argue against this method, but I see this as a complementary option to whatever method you have.

So, here is a list of points that I think would be useful when you do any ETL effort estimation. I have grouped it under 5 heads: Source, target, transformations, resources, other.

Source based:

  1. No of different sources & types
  2. Incremental extraction needs
  3. Profiling of data sources
  4. Cleansing / de-duplication dirty data sources
  5. Availability of documentation / transition of knowledge of source data
  6. Access control & management, if needed
  7. Data volumes for unit testing

Continue Reading »

No responses yet

Aug 19 2008

BOBJ customers angry after migration of support to SAP system

Published by biexplorer under BI News, BI vendors

Please read this article.

Recently, SAP - BOBJ migrated its BOBJ customer support platform to SAP. The result was a disaster. Several customers didn?t have access to the support site.

Read more on this at the link given above.

Also have a look at this link at BOBJ support site where users are screaming high and dry.

No responses yet

Jul 12 2008

BODI / BODS do not support decfloat datatype (in DB2)

Recently I was working with a few DB2 UDB 9.5 tables and discovered that decfloat datatype columns were not recognized even if I had checked the option to recognize unsupported datatypes as varchar.

I mailed someone who interacts closely with the product group in SAP - Business Objects.

Continue Reading »

No responses yet

Jun 10 2008

Get BODI job schedule info from repository

Published by biexplorer under BODI

Here is a query that you can run on the BODI repository to fetch the job schedule details.

Select upper (al_lang.NAME) as jobname,
upper (al_sched_info.sched_name) as schedule_name,
al_sched_info.start_time as start_time,
al_sched_info.host_name host_server
from di_edw.al_lang al_lang full outer join di_edw.al_sched_info al_sched_info on al_lang.guid = al_sched_info.job_guid
where active = 1
and al_lang.object_type = 0
and TYPE = 0
and al_lang.object_key =
(SELECT MAX (object_key)
FROM di_edw.al_lang l
WHERE l.NAME = al_lang.NAME AND l.object_type = 0 AND l.TYPE = 0)
ORDER BY 1, 2

No responses yet

Apr 30 2008

A nice article on DW worst practices

Published by biexplorer under Best Practises

A nice article on few DW worst practices.

Top Ways How Not To Scale Your Data Warehouse

I got this one from Tom Kyte’s blog. Thanks Tom.

No responses yet

Apr 30 2008

Handling many to many relationships (corrections made)

(There were issues with the design in my original post. Those have been corrected. I have let the errors stay and have highlighted where I have made corrections so that you can understand better)

Consider this situation.

2 new guys Ravi and Raj go to our bank and open new savings accounts 101 and 102. And Ravi deposits 100 and Raj deposits 144. This can be represented by the following.

CLICK ON THE IMAGES BELOW TO SEE A LARGER PICTURE

Now suddenly, Ramu and Ramya come to the bank and open a joint account 103. And they deposit 1000. How will you represent this? What will you enter for CUST_KEY ? Will you enter 3 or 4? It was the same transaction of Rs 1000. But which customer to indicate? What will you fill in the red ? position below?

Continue Reading »

No responses yet

Apr 26 2008

Migration of a DW solution from Oracle to DB2

Published by biexplorer under BODI, DB2, Migration

Recently we migrated an entire DW module from Oracle to DB2. And our ETL was done using Business Objects Data Integrator.

Below is a sequence of steps that we followed to migrate the entire solution (including data structures, ETL code and universe)

Assumption:
–>All estimates done for 4 DB2 databases, 3 federated systems, 62 tables, 33 indexes, 6 DI jobs, 110 data flows, 151 transformations, 16 SQL transforms, 34 SQL() scripts, 220 objects in 23 classes and 2 reports.
–>Add 20% time as buffer for unexpected issues.
–>Some tasks can be executed in parallel, while some have strong dependency on completion of previous tasks.

Work breakdown of migration and rough estimation:

1. Initiation & planning : 40 Hrs
Establish objectives and goals
Requirements gathering
Establish scope
Plan resources
Identify development environment

2. Assessment : 40 Hrs
Assess tool and technology needs
Assess and understand technical requirements
Establish technical guidelines
Finalize tools and development environment

Continue Reading »

No responses yet

Mar 18 2008

BO Data Services XI 3.0

BO Data Services is a new offering from Business Objects.

Well, it looks like a new bundling and not a new product by itself.

BO Data quality + BO Data Integrator = BO Data Services.

I tried to gather some information regarding this. Here is what I have found so far.

Continue Reading »

No responses yet

Mar 03 2008

Times.. they are changing.

Looks like traditional data warehousing will soon be taken over by a sea of changes.

(Click on underlined words to open the links)

Integration of OLTP with DW? Read about it here.

Here is a discussion on new approaches in DW.

If you want to know Werner’s opinion on the future of Business Objects Data Integrator (BODI), see here.

Appears like BO Data quality + BO Data Integrator = BO Data Services. A discussion on it here.

And is Informatica the next acquisition target? Some say maybe… some say no.

No responses yet

Next »