<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title></title>
	<atom:link href="http://biexplorer.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://biexplorer.com</link>
	<description></description>
	<pubDate>Wed, 24 Sep 2008 06:37:20 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
	<language>en</language>
			<item>
		<title>Contemplating a move to wordpress domain</title>
		<link>http://biexplorer.com/contemplating-a-move-to-wordpress-domain/</link>
		<comments>http://biexplorer.com/contemplating-a-move-to-wordpress-domain/#comments</comments>
		<pubDate>Wed, 24 Sep 2008 06:37:20 +0000</pubDate>
		<dc:creator>biexplorer</dc:creator>
		
		<category><![CDATA[Other]]></category>

		<guid isPermaLink="false">http://biexplorer.com/?p=104</guid>
		<description><![CDATA[I am thinking of moving back to biexplorer.wordpress.com and giving up biexplorer.com.
This is because my domain is up for renewal soon, and I am wondering whether it makes sense to have my own domain at all.
Will keep you updated.
But please bookmark http://biexplorer.wordpress.com too.
]]></description>
			<content:encoded><![CDATA[<p>I am thinking of moving back to biexplorer.wordpress.com and giving up biexplorer.com.</p>
<p>This is because my domain is up for renewal soon, and I am wondering whether it makes sense to have my own domain at all.</p>
<p>Will keep you updated.</p>
<p>But please bookmark <span style="text-decoration: underline;"><a href="http://biexplorer.wordpress.com" target="_blank">http://biexplorer.wordpress.com</a></span> too.</p>
]]></content:encoded>
			<wfw:commentRss>http://biexplorer.com/contemplating-a-move-to-wordpress-domain/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Staging area: Necessary or overhead?</title>
		<link>http://biexplorer.com/stage-area/</link>
		<comments>http://biexplorer.com/stage-area/#comments</comments>
		<pubDate>Tue, 23 Sep 2008 17:40:51 +0000</pubDate>
		<dc:creator>biexplorer</dc:creator>
		
		<category><![CDATA[ETL]]></category>

		<guid isPermaLink="false">http://biexplorer.com/?p=103</guid>
		<description><![CDATA[In this article, let us see what a staging area is, its types, and the reasons to have one in your data warehouse.
OK, what is a staging area?
It is that part of a data warehouse where data is stored physically on disks or files (or whatever)&#8230; but as an intermediate step before loading the data [...]]]></description>
			<content:encoded><![CDATA[<p>In this article, let us see what a staging area is, its types, and the reasons to have one in your data warehouse.</p>
<p><strong>OK, what is a staging area?<br />
</strong>It is the part of a data warehouse where data is physically stored (on disk, in tables, files or whatever), but only as an intermediate step before the data warehouse / data marts are loaded. It is where activities like cleansing and de-duplication take place. Think of it as a pit stop for a racing car on its way to the finish.</p>
<p>Some characteristics of the staging area:</p>
<ol>
<li>accessible to and owned by the ETL / DW team</li>
<li>OLAP / reporting teams do not have access to it</li>
<li>indexed very lightly</li>
<li>ETL developers are usually free to create / drop tables, though this is controlled by the architect or modeling team</li>
</ol>
<p><strong>Types of staging areas:</strong></p>
<ol>
<li><strong>Persistent</strong> staging - stage data is <strong>not</strong> deleted; useful if you want to maintain history.</li>
<li><strong>Transient</strong> staging - stage data is deleted after each ETL load.</li>
</ol>
<p>Most data warehouses have one or more staging areas, which may be persistent, transient or a mix of both.</p>
<p>But should you really have a stage area? Can&#8217;t you do without it? After all, ETL tools these days can handle far more data fully in memory.</p>
<p>Is staging necessary or is it an overhead?</p>
<p><span id="more-103"></span></p>
<p>I think it is critical to have a staging area, especially if you are moving a lot of data from multiple sources. Here are a few reasons why:</p>
<ol>
<li><strong>Intermediate processing:</strong> Often you need to perform transformations, cleansing and other processing on huge chunks of data from multiple sources. It is not feasible to do all of this purely in memory (extract from the source, process, and load directly into the target). You need intermediate steps where data can be stored, so the ETL tool can take a breather and start again. This also helps in several other ways, discussed below.</li>
<li><strong>Auditing:</strong> A stage area tells you the condition of the data at certain points during the ETL cycle. If your ETL job fails midway, you can look at the stage tables and see what the data looked like as it passed through the various steps. This makes auditing easy.</li>
<li><strong>Recovery:</strong> If your ETL job fails midway and you do not maintain a stage area, you need to fetch the data from the source once again and repeat the entire process. With stage areas, the job can be restarted from the point of failure.</li>
<li><strong>Backup:</strong> If your source data gets overwritten, the original data is lost, probably forever, and there is no way to reconcile the DW data against the source. With a persistent stage, however, your data is backed up in these intermediate databases.</li>
<li><strong>Cleansing, synchronizing, de-duping data</strong>, etc.: If you have multiple sources, stage areas are a neat way to synchronize your data and, at the same time, to cleanse and conform it. (For example, if your customer data comes from 3 different sources, it makes sense to pull it all into a stage area and synchronize it there. This is especially true if the different sources are not available at the same time.)</li>
<li><strong>Helps reduce contention on busy source (OLTP) systems:</strong> Once the data is pulled, you can work on it in the stage area and transform it to your needs. If you were to apply these transformations directly on the source data without a stage, you would hold connections to the source for much longer, which would adversely impact the OLTP systems.</li>
<li><strong>Increases availability of data:</strong> If one of your source systems is temporarily down, you will still have access to the data that has already been staged.</li>
<li><strong>Joins are easier and faster:</strong> Joins are much easier and faster when the tables are in the same database schema rather than spread across disparate databases.</li>
</ol>
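<p>The recovery and cleansing points can be illustrated with a small sketch. It assumes Python with an in-memory SQLite database standing in for the stage and the warehouse; the table names, the etl_checkpoint table and the fail_at switch are hypothetical. Each step commits its work together with a checkpoint row, so a rerun after a failure skips the steps that already finished instead of going back to the source.</p>
<pre>
```python
import sqlite3


def new_db():
    # Hypothetical stage, target and checkpoint tables.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE stg_raw (cust_id INTEGER, name TEXT);"
        "CREATE TABLE stg_clean (cust_id INTEGER, name TEXT);"
        "CREATE TABLE dw_customer (cust_id INTEGER, name TEXT);"
        "CREATE TABLE etl_checkpoint (step TEXT);"
    )
    return conn


def extract(conn, source_rows):
    # The only step that touches the source: copy it into the stage.
    conn.executemany("INSERT INTO stg_raw VALUES (?, ?)", source_rows)


def dedupe(conn, _source_rows):
    # Cleanse and de-duplicate in the stage: one row per customer id.
    conn.execute(
        "INSERT INTO stg_clean SELECT cust_id, MIN(TRIM(name)) "
        "FROM stg_raw GROUP BY cust_id"
    )


def load(conn, _source_rows):
    conn.execute("INSERT INTO dw_customer SELECT cust_id, name FROM stg_clean")


STEPS = [("extract", extract), ("dedupe", dedupe), ("load", load)]


def run_job(conn, source_rows, fail_at=None):
    """Run the steps in order, skipping any step already checkpointed."""
    done = {row[0] for row in conn.execute("SELECT step FROM etl_checkpoint")}
    ran = []
    for name, step in STEPS:
        if name in done:
            continue  # finished in an earlier run; staged data is reused
        if name == fail_at:
            raise RuntimeError(f"simulated failure in {name}")
        with conn:  # the step's work and its checkpoint commit together
            step(conn, source_rows)
            conn.execute("INSERT INTO etl_checkpoint VALUES (?)", (name,))
        ran.append(name)
    return ran


conn = new_db()
source = [(1, "Ann"), (1, "Ann "), (2, "Bob")]
try:
    run_job(conn, source, fail_at="load")
except RuntimeError as exc:
    print(exc)  # simulated failure in load
print(run_job(conn, source))  # ['load']: extract and dedupe are not repeated
print(conn.execute("SELECT COUNT(*) FROM dw_customer").fetchone()[0])  # 2
```
</pre>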
<p>Sources: several, but notably Ralph Kimball, <em>The Data Warehouse ETL Toolkit</em>.</p>
]]></content:encoded>
			<wfw:commentRss>http://biexplorer.com/stage-area/feed/</wfw:commentRss>
		</item>
		<item>
		<title>ETL effort estimation: Points to factor-in</title>
		<link>http://biexplorer.com/etl-effort-estimation-points-to-factor-in/</link>
		<comments>http://biexplorer.com/etl-effort-estimation-points-to-factor-in/#comments</comments>
		<pubDate>Tue, 23 Sep 2008 09:37:44 +0000</pubDate>
		<dc:creator>biexplorer</dc:creator>
		
		<category><![CDATA[ETL]]></category>

		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://biexplorer.com/?p=102</guid>
		<description><![CDATA[Estimating ETL effort is not always fun (as with any estimation).
There are several ways of estimating the effort needed to complete an ETL job. Work Breakdown Structure (WBS) is popular. And so is Function Point Analysis (FPA).
But the most widely used is the one that factors in complexity based on the understanding of things [...]]]></description>
			<content:encoded><![CDATA[<p>Estimating ETL effort is not always fun (as with any estimation).</p>
<p>There are several ways of estimating the effort needed to complete an ETL job. Work Breakdown Structure (WBS) is popular. And so is Function Point Analysis (FPA).</p>
<p>But the most widely used is the one that factors in complexity based on the understanding of things like source, target, resources on project, etc.</p>
<p>Though I haven&#8217;t really seen anyone use this method to perfection, it is a good place to start. Some people argue against this method, but I see it as complementary to whatever method you already use.</p>
<p>So, here is a list of points that I think would be useful for any ETL effort estimation. I have grouped them under five heads: source, target, transformations, resources and other.</p>
<p><strong>Source based:</strong></p>
<ol>
<li>Number of different sources &amp; types</li>
<li>Incremental extraction needs</li>
<li>Profiling of data sources</li>
<li>Cleansing / de-duplication of dirty data sources</li>
<li>Availability of documentation / transition of knowledge of source data</li>
<li>Access control &amp; management, if needed</li>
<li>Data volumes for unit testing</li>
</ol>
<p><span id="more-102"></span></p>
<hr />
<p><strong>Stage / Target based:</strong></p>
<ol>
<li>Proper database design available (primary keys defined, the right columns indexed, etc.)</li>
<li>Familiarity of the team with the various source, staging and target systems</li>
<li>Number of different target types</li>
<li>Truncate / append / merge options</li>
<li>Fact / dimension / stage load</li>
</ol>
<hr />
<p><strong>Transformation / features / mechanisms based:</strong></p>
<ol>
<li>Number of transformations and their complexity</li>
<li>Use of reusability features</li>
<li>Use of features like CDC (change data capture), SCD (slowly changing dimensions), etc.</li>
<li>Need for error / failure handling and recovery mechanisms</li>
<li>Auditing and validation needs of the ETL</li>
<li>Complex functions / calculations for measures</li>
<li>Parallelism / degree of parallelism (DOP), etc.</li>
</ol>
<hr />
<p><strong>Resource based:</strong></p>
<ol>
<li>Which tool is used? What is the skill level of the resources (beginner, intermediate, expert)?</li>
<li>Availability of the resources (part time, full time, etc)</li>
</ol>
<hr />
<p><strong>Other:</strong></p>
<ol>
<li>Are there any people, task or decision dependencies?</li>
<li>Unit / system &amp; integration testing needs</li>
<li>Migration / deployment needs</li>
<li>Performance and tuning needs</li>
<li>Project management / status reporting needs</li>
<li>How clear is the requirement? Is scope properly defined?</li>
<li>How well is the project being managed?</li>
<li>Have you factored in time for rework?</li>
</ol>
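<p>One way to turn this checklist into numbers is a simple complexity-weighted model: a base effort per ETL job by complexity, adjusted by multipliers for some of the points above, plus project-level allowances for the resource and "other" heads. The Python sketch below is purely illustrative: every number in it (base hours, multipliers, percentages) is hypothetical and would need to be calibrated against your own past projects.</p>
<pre>
```python
from dataclasses import dataclass

# Hypothetical base effort (hours) per ETL job; calibrate from your own history.
BASE_HOURS = {"simple": 8, "medium": 16, "complex": 32}


@dataclass
class EtlJob:
    name: str
    complexity: str                # "simple", "medium" or "complex"
    sources: int = 1               # number of different sources (source based)
    uses_scd_or_cdc: bool = False  # SCD / CDC features (transformation based)
    needs_recovery: bool = False   # error handling and restartability


def job_hours(job):
    hours = BASE_HOURS[job.complexity]
    hours += 4 * (job.sources - 1)  # hypothetical: +4 h per extra source
    if job.uses_scd_or_cdc:
        hours *= 1.25               # hypothetical multiplier
    if job.needs_recovery:
        hours *= 1.2                # hypothetical multiplier
    return hours


def estimate(jobs, team_factor=1.0, testing_pct=0.30, rework_pct=0.15):
    """Total effort in hours.

    team_factor covers the resource head (e.g. 1.2 for a less experienced
    or part-time team); testing_pct and rework_pct cover two items from
    the 'Other' head. All defaults are hypothetical.
    """
    development = sum(job_hours(job) for job in jobs) * team_factor
    return round(development * (1 + testing_pct + rework_pct), 1)


jobs = [
    EtlJob("load_customer_dim", "medium", sources=3, uses_scd_or_cdc=True),
    EtlJob("load_sales_fact", "complex", needs_recovery=True),
]
print(estimate(jobs, team_factor=1.2))
```
</pre>
<p>Keeping the per-job factors separate from the project-level ones makes it easy to see which assumption drives the total when the estimate is challenged.</p>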
<hr />
<p><strong>A few interesting links on this topic:</strong><br />
1. Vincent McBurney&#8217;s article on <span style="text-decoration: underline;"><a href="http://it.toolbox.com/blogs/infosphere/my-wiki-wiki-ways-estimating-etl-development-time-10940" target="_blank">estimating ETL development time</a></span><br />
2. <a href="http://geekswithblogs.net/Prabhats/archive/2007/03/01/107632.aspx" target="_blank"><span style="text-decoration: underline;">FPA</span></a><br />
3. <a href="http://www.ewsolutions.com/resource-center/rwds_folder/rwds-archives/issue.2008-03-01.6090544414/document.2008-03-01.9435972766/view?searchterm=Estimation%20Methodology%20for%20Data%20Warehousing%20Projects" target="_blank"><span style="text-decoration: underline;">FPA based estimation for a Data warehouse</span></a></p>
]]></content:encoded>
			<wfw:commentRss>http://biexplorer.com/etl-effort-estimation-points-to-factor-in/feed/</wfw:commentRss>
		</item>
		<item>
		<title>BOBJ customers angry after migration of support to SAP system</title>
		<link>http://biexplorer.com/bobj-customers-angry-after-migration-of-support-to-sap-system/</link>
		<comments>http://biexplorer.com/bobj-customers-angry-after-migration-of-support-to-sap-system/#comments</comments>
		<pubDate>Tue, 19 Aug 2008 09:03:12 +0000</pubDate>
		<dc:creator>biexplorer</dc:creator>
		
		<category><![CDATA[BI News]]></category>

		<category><![CDATA[BI vendors]]></category>

		<guid isPermaLink="false">http://biexplorer.com/?p=100</guid>
		<description><![CDATA[Please read this article.
Recently, SAP migrated the BOBJ customer support platform to its own SAP support system. The result was a disaster: several customers could not access the support site.
Read more on this at the link given above.
Also have a look at this thread on a BOBJ forum, where users are complaining loudly about being left high and dry.
]]></description>
			<content:encoded><![CDATA[<div class="entry">
<div class="snap_preview">
<p>Please read <span style="text-decoration: underline;"><a href="http://www.pcworld.com/businesscenter/article/148509/business_objects_users_angry_over_sap_support_transition.html" target="_blank">this article</a></span>.</p>
<p>Recently, SAP migrated the BOBJ customer support platform to its own SAP support system. The result was a disaster: several customers could not access the support site.</p>
<p>Read more on this at the link given above.</p>
<p>Also have a look at <span style="text-decoration: underline;"><a href="http://www.forumtopics.com/busobj/viewtopic.php?t=112015&amp;start=225&amp;postdays=0&amp;postorder=asc" target="_blank">this thread on a BOBJ forum</a></span>, where users are complaining loudly about being left high and dry.</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://biexplorer.com/bobj-customers-angry-after-migration-of-support-to-sap-system/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Is an open source database a viable solution?</title>
		<link>http://biexplorer.com/is-open-source-database-a-viable-solution/</link>
		<comments>http://biexplorer.com/is-open-source-database-a-viable-solution/#comments</comments>
		<pubDate>Fri, 08 Aug 2008 20:16:08 +0000</pubDate>
		<dc:creator>biexplorer</dc:creator>
		
		<category><![CDATA[Database]]></category>

		<category><![CDATA[Interesting stuff]]></category>

		<guid isPermaLink="false">http://biexplorer.com/?p=99</guid>
		<description><![CDATA[Please read the article at this link.
What do you think? Is an open source database a viable solution?
The article says (citing Forrester) that the market size for

open source databases is $850 million
commercial databases is $16 billion

Key takeaways from this article:

Many vendors are now open to the idea of an open source [...]]]></description>
			<content:encoded><![CDATA[<p>Please read the article at <a href="http://searchdatamanagement.techtarget.com/news/article/0,289142,sid91_gci1323310,00.html" target="_blank"><span style="text-decoration: underline;">this link</span></a>.</p>
<p>What do you think? Is an open source database a viable solution?</p>
<p>The article says (citing Forrester) that the market size for</p>
<ul>
<li>open source databases is $850 million</li>
<li>commercial databases is $16 billion</li>
</ul>
<p><span id="more-99"></span>Key takeaways from this article:</p>
<ol>
<li>Many vendors are now open to the idea of an open source database. I personally feel small and medium-sized enterprises are the ones that would be keenest on it.</li>
<li>Forrester says the open source databases available today can handle 80% of applications. To me, that is a surprisingly high number.</li>
</ol>
<p>In my opinion, this is an interesting area for the future. Today we have a few reliable open source databases, such as</p>
<ol>
<li>PostgreSQL</li>
<li>MySQL</li>
</ol>
<p>But open source has its own advantages and disadvantages. Below are a few that I can think of.</p>
<p><strong>Advantages:</strong></p>
<ol>
<li>Cheap / free to use and deploy.</li>
<li>Capable of handling almost all small enterprise applications.</li>
</ol>
<p><strong>Disadvantages:</strong></p>
<ol>
<li>Migrating from a commercial database to an open source one is very difficult as of today, though this may change soon.</li>
<li>Though the database itself is cheap, the cost of maintaining it can sometimes be high, and finding resources capable of working on an open source database is a challenge.</li>
<li>Post-implementation support can be a challenge, though it is improving.</li>
<li>The lack of proper case studies and published implementations makes enterprises skeptical of open source.</li>
<li>Awareness and confidence are low.</li>
</ol>
<p>As a parting thought, the article also rightly says that companies should try the open source way for small to medium sized applications that are NON-MISSION-CRITICAL and evaluate the database first. If found satisfactory, they should migrate to an open source database for their other applications.</p>
]]></content:encoded>
			<wfw:commentRss>http://biexplorer.com/is-open-source-database-a-viable-solution/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Displaying &#038; reading execution plans</title>
		<link>http://biexplorer.com/displaying-reading-execution-plans/</link>
		<comments>http://biexplorer.com/displaying-reading-execution-plans/#comments</comments>
		<pubDate>Fri, 18 Jul 2008 18:58:23 +0000</pubDate>
		<dc:creator>biexplorer</dc:creator>
		
		<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://biexplorer.com/?p=96</guid>
		<description><![CDATA[Here is a good link on displaying and reading execution (explain) plans.
http://optimizermagic.blogspot.com/2008/02/displaying-and-reading-execution-plans.html
]]></description>
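			<content:encoded><![CDATA[<p>Here is a good link on displaying and reading execution (explain) plans.</p>
<p><a href="http://optimizermagic.blogspot.com/2008/02/displaying-and-reading-execution-plans.html" target="_blank">http://optimizermagic.blogspot.com/2008/02/displaying-and-reading-execution-plans.html</a></p>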
]]></content:encoded>
			<wfw:commentRss>http://biexplorer.com/displaying-reading-execution-plans/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Min and Max in same query</title>
		<link>http://biexplorer.com/min-and-max-in-same-query/</link>
		<comments>http://biexplorer.com/min-and-max-in-same-query/#comments</comments>
		<pubDate>Sat, 12 Jul 2008 20:20:32 +0000</pubDate>
		<dc:creator>biexplorer</dc:creator>
		
		<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://biexplorer.com/?p=95</guid>
		<description><![CDATA[I was reading this today and thought it was interesting.
Karen explains why having MIN and MAX in the same query impacts performance, and how to tackle it.
]]></description>
			<content:encoded><![CDATA[<p>I was reading <a href="http://karenmorton.blogspot.com/2008/07/minmax-and-index-full-scans.html"><span style="text-decoration: underline;">this</span></a> today and thought it was interesting.</p>
<p>Karen explains why having MIN and MAX in the same query impacts performance, and how to tackle it.</p>
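<p>Karen&#8217;s post is about Oracle. As a rough, portable illustration of the same idea, SQLite documents a similar MIN/MAX optimization that only applies when a query contains a single MIN() or MAX() aggregate. The sketch below (Python with SQLite, chosen only because it runs anywhere; the orders table is hypothetical) compares the combined query with the common rewrite into two scalar subqueries. The plan text differs between SQLite versions, so compare the two plans rather than rely on specific wording.</p>
<pre>
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.execute("CREATE INDEX ix_orders_amount ON orders (amount)")
conn.executemany(
    "INSERT INTO orders (amount) VALUES (?)",
    [(n * 7 % 1000,) for n in range(1, 5001)],
)
conn.commit()

# Both aggregates in one query: the single-row MIN/MAX shortcut does not apply,
# so the optimizer falls back to scanning the index (or the table).
combined = "SELECT MIN(amount), MAX(amount) FROM orders"

# Rewrite: each scalar subquery has exactly one MIN or MAX, so each one can
# be answered from one end of the index.
split = (
    "SELECT (SELECT MIN(amount) FROM orders), "
    "(SELECT MAX(amount) FROM orders)"
)

for label, sql in (("combined", combined), ("split", split)):
    plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    print(label, conn.execute(sql).fetchone(), plan)
```
</pre>
<p>Both queries return the same values; only the access path differs, which is what matters on large tables.</p>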
]]></content:encoded>
			<wfw:commentRss>http://biexplorer.com/min-and-max-in-same-query/feed/</wfw:commentRss>
		</item>
		<item>
		<title>BODI / BODS do not support decfloat datatype (in DB2)</title>
		<link>http://biexplorer.com/bodi-bods-do-not-support-decfloat-datatype-in-db2/</link>
		<comments>http://biexplorer.com/bodi-bods-do-not-support-decfloat-datatype-in-db2/#comments</comments>
		<pubDate>Sat, 12 Jul 2008 06:40:53 +0000</pubDate>
		<dc:creator>biexplorer</dc:creator>
		
		<category><![CDATA[BODI]]></category>

		<category><![CDATA[Data Integration]]></category>

		<category><![CDATA[Data Quality]]></category>

		<category><![CDATA[Data Services]]></category>

		<guid isPermaLink="false">http://biexplorer.com/?p=94</guid>
		<description><![CDATA[Recently I was working with a few DB2 UDB 9.5 tables and discovered that decfloat datatype columns were not recognized, even though I had checked the option to recognize unsupported datatypes as varchar.
I mailed someone who interacts closely with the product group in SAP - Business Objects.

As of today, BOBJ PG says, DECFLOAT datatypes are [...]]]></description>
			<content:encoded><![CDATA[<p>Recently I was working with a few DB2 UDB 9.5 tables and discovered that decfloat datatype columns were not recognized, even though I had checked the option to recognize unsupported datatypes as varchar.</p>
<p>I mailed someone who interacts closely with the product group in SAP - Business Objects.</p>
<p><span id="more-94"></span></p>
<p>As of today, the BOBJ product group (PG) says DECFLOAT datatypes are supported by neither BO Data Integrator nor BO Data Services. They have taken it up as an enhancement request, but it will probably not be a high enough priority to make it into the next release.</p>
<p>Until then, avoid DECFLOAT datatypes in DB2.</p>
]]></content:encoded>
			<wfw:commentRss>http://biexplorer.com/bodi-bods-do-not-support-decfloat-datatype-in-db2/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Get BODI job schedule info from repository</title>
		<link>http://biexplorer.com/get-bodi-job-schedule-info-from-repository/</link>
		<comments>http://biexplorer.com/get-bodi-job-schedule-info-from-repository/#comments</comments>
		<pubDate>Tue, 10 Jun 2008 12:48:37 +0000</pubDate>
		<dc:creator>biexplorer</dc:creator>
		
		<category><![CDATA[BODI]]></category>

		<guid isPermaLink="false">http://biexplorer.com/?p=91</guid>
		<description><![CDATA[Here is a query that you can run on the BODI repository to fetch the job schedule details.
Select upper (al_lang.NAME) as jobname,
upper (al_sched_info.sched_name) as schedule_name,
al_sched_info.start_time as start_time,
al_sched_info.host_name host_server
from di_edw.al_lang al_lang full outer join di_edw.al_sched_info al_sched_info on al_lang.guid = [...]]]></description>
			<content:encoded><![CDATA[<p>Here is a query that you can run on the BODI repository to fetch the job schedule details.</p>
<p><span style="color: #800000;">SELECT UPPER(al_lang.NAME) AS jobname,<br />
UPPER(al_sched_info.sched_name) AS schedule_name,<br />
al_sched_info.start_time AS start_time,<br />
al_sched_info.host_name AS host_server<br />
FROM di_edw.al_lang al_lang FULL OUTER JOIN di_edw.al_sched_info al_sched_info ON al_lang.guid = al_sched_info.job_guid<br />
WHERE active = 1<br />
AND al_lang.object_type = 0<br />
AND TYPE = 0<br />
AND al_lang.object_key =<br />
(SELECT MAX(object_key)<br />
FROM di_edw.al_lang l<br />
WHERE l.NAME = al_lang.NAME AND l.object_type = 0 AND l.TYPE = 0)<br />
ORDER BY 1, 2</span></p>
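<p>The key part of the query is the correlated subquery on MAX(object_key): presumably the repository can hold more than one row per job name, and only the latest one is kept. A minimal sketch with Python and SQLite shows the idea. The two tables are hypothetical stand-ins with made-up rows, not the real BODI repository schema, and they assume that active lives in al_sched_info and TYPE in al_lang. Because the WHERE clause tests columns from both tables, which are NULL on unmatched rows, unmatched rows from either side are filtered out anyway, so the sketch uses a plain JOIN (SQLite only added FULL OUTER JOIN in version 3.39).</p>
<pre>
```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for the two repository tables, reduced to the
# columns the query references; the rows are made up.
conn.executescript("""
CREATE TABLE al_lang (guid TEXT, name TEXT, object_type INT, type INT, object_key INT);
CREATE TABLE al_sched_info (job_guid TEXT, sched_name TEXT, start_time TEXT,
                            host_name TEXT, active INT);
INSERT INTO al_lang VALUES
    ('g1', 'Job_Sales', 0, 0, 10),  -- older saved version of the job
    ('g2', 'Job_Sales', 0, 0, 25),  -- latest version
    ('g3', 'Job_Stock', 0, 0, 12);
INSERT INTO al_sched_info VALUES
    ('g1', 'nightly_old', '01:00', 'etlhost1', 1),
    ('g2', 'nightly', '02:00', 'etlhost1', 1),
    ('g3', 'hourly', '00:15', 'etlhost2', 0);  -- inactive schedule
""")

# Same logic as the repository query, minus the di_edw schema prefix and
# with a plain JOIN instead of FULL OUTER JOIN.
query = """
SELECT UPPER(al_lang.name) AS jobname,
       UPPER(al_sched_info.sched_name) AS schedule_name,
       al_sched_info.start_time AS start_time,
       al_sched_info.host_name AS host_server
FROM al_lang
JOIN al_sched_info ON al_lang.guid = al_sched_info.job_guid
WHERE active = 1
  AND al_lang.object_type = 0
  AND type = 0
  AND al_lang.object_key = (SELECT MAX(object_key)
                            FROM al_lang l
                            WHERE l.name = al_lang.name
                              AND l.object_type = 0 AND l.type = 0)
ORDER BY 1, 2
"""
print(conn.execute(query).fetchall())
# [('JOB_SALES', 'NIGHTLY', '02:00', 'etlhost1')]
```
</pre>
<p>The older version of Job_Sales (object_key 10) is dropped by the MAX(object_key) filter, and Job_Stock is dropped because its schedule is inactive.</p>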
]]></content:encoded>
			<wfw:commentRss>http://biexplorer.com/get-bodi-job-schedule-info-from-repository/feed/</wfw:commentRss>
		</item>
		<item>
		<title>DB2 performance tuning</title>
		<link>http://biexplorer.com/db2-performance-tuning/</link>
		<comments>http://biexplorer.com/db2-performance-tuning/#comments</comments>
		<pubDate>Tue, 03 Jun 2008 09:21:20 +0000</pubDate>
		<dc:creator>biexplorer</dc:creator>
		
		<category><![CDATA[DB2]]></category>

		<guid isPermaLink="false">http://biexplorer.com/?p=90</guid>
		<description><![CDATA[Here is a good article on tuning DB2 databases.
And here is another that might help you.
]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.ibm.com/developerworks/db2/library/techarticle/dm-0404mcarthur/" target="_blank"><span style="text-decoration: underline;"><strong>Here</strong></span></a> is a good article on tuning DB2 databases.</p>
<p>And <a href="http://www.ibmdatabasemag.com/story/showArticle.jhtml?articleID=15300098" target="_blank"><span style="text-decoration: underline;"><strong>here</strong></span></a> is another that might help you.</p>
]]></content:encoded>
			<wfw:commentRss>http://biexplorer.com/db2-performance-tuning/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
