Oracle MONTHS_BETWEEN functie in TSQL

UDF is slow

Today I discovered that a query that uses an udf to calculate the age of an employee ( see code below) is much slower than when the same TSQL is called directly (without using an UDF). Probably because the query optimiser sees the UDF as a black box and generates a very stupid plan.

Slow Example  (  took more than one hour in our environment when we decided to kill it ). 

Fast example ( < 3 minutes). 

 

troubleshoot locking / blocking in sql server

exec dbo.sp_WhoIsActive

–Filters–Both inclusive and exclusive
–Set either filter to ” to disable
–Valid filter types are: session, program, database, login, and host
–Session is a session ID, and either 0 or ” can be used to indicate “all” sessions
–All other filter types support % or _ as wildcards
@filter = ”,
@filter_type = ‘session’,
@not_filter = ”,
@not_filter_type = ‘session’,

–Retrieve data about the calling session?
@show_own_spid = 0,

–Retrieve data about system sessions?
@show_system_spids = 0,

–Controls how sleeping SPIDs are handled, based on the idea of levels of interest
–0 does not pull any sleeping SPIDs
–1 pulls only those sleeping SPIDs that also have an open transaction
–2 pulls all sleeping SPIDs
@show_sleeping_spids = 0,

–If 1, gets the full stored procedure or running batch, when available
–If 0, gets only the actual statement that is currently running in the batch or procedure
@get_full_inner_text = 0,

–Get associated query plans for running tasks, if available
–If @get_plans = 1, gets the plan based on the request’s statement offset
–If @get_plans = 2, gets the entire plan based on the request’s plan_handle
@get_plans = 1,

–Get the associated outer ad hoc query or stored procedure call, if available
@get_outer_command = 1,

–Enables pulling transaction log write info and transaction duration
@get_transaction_info = 1,

–Get information on active tasks, based on three interest levels
–Level 0 does not pull any task-related information
–Level 1 is a lightweight mode that pulls the top non-CXPACKET wait, giving preference to blockers
–Level 2 pulls all available task-based metrics, including:
–number of active tasks, current wait stats, physical I/O, context switches, and blocker information
@get_task_info = 1,

–Gets associated locks for each request, aggregated in an XML format
@get_locks = 1,

–Get average time for past runs of an active query
–(based on the combination of plan handle, sql handle, and offset)
@get_avg_time = 0,

–Get additional non-performance-related information about the session or request
–text_size, language, date_format, date_first, quoted_identifier, arithabort, ansi_null_dflt_on,
–ansi_defaults, ansi_warnings, ansi_padding, ansi_nulls, concat_null_yields_null,
–transaction_isolation_level, lock_timeout, deadlock_priority, row_count, command_type

–If a SQL Agent job is running, an subnode called agent_info will be populated with some or all of
–the following: job_id, job_name, step_id, step_name, msdb_query_error (in the event of an error)

–If @get_task_info is set to 2 and a lock wait is detected, a subnode called block_info will be
–populated with some or all of the following: lock_type, database_name, object_id, file_id, hobt_id,
–applock_hash, metadata_resource, metadata_class_id, object_name, schema_name
@get_additional_info = 1,

–Walk the blocking chain and count the number of
–total SPIDs blocked all the way down by a given session
–Also enables task_info Level 1, if @get_task_info is set to 0
@find_block_leaders = 1,

–Pull deltas on various metrics
–Interval in seconds to wait before doing the second data pull
@delta_interval = 0,

–List of desired output columns, in desired order
–Note that the final output will be the intersection of all enabled features and all
–columns in the list. Therefore, only columns associated with enabled features will
–actually appear in the output. Likewise, removing columns from this list may effectively
–disable features, even if they are turned on

–Each element in this list must be one of the valid output column names. Names must be
–delimited by square brackets. White space, formatting, and additional characters are
–allowed, as long as the list contains exact matches of delimited valid column names.
@output_column_list = ‘[dd%][session_id][sql_text][sql_command][login_name][wait_info][tasks][tran_log%][cpu%][temp%][block%][reads%][writes%][context%][physical%][query_plan][locks][%]’,

–Column(s) by which to sort output, optionally with sort directions.
–Valid column choices:
–session_id, physical_io, reads, physical_reads, writes, tempdb_allocations,
–tempdb_current, CPU, context_switches, used_memory, physical_io_delta,
–reads_delta, physical_reads_delta, writes_delta, tempdb_allocations_delta,
–tempdb_current_delta, CPU_delta, context_switches_delta, used_memory_delta,
–tasks, tran_start_time, open_tran_count, blocking_session_id, blocked_session_count,
–percent_complete, host_name, login_name, database_name, start_time, login_time

–Note that column names in the list must be bracket-delimited. Commas and/or white
–space are not required.
@sort_order = ‘[start_time] ASC’,

–Formats some of the output columns in a more “human readable” form
–0 disables outfput format
–1 formats the output for variable-width fonts
–2 formats the output for fixed-width fonts
@format_output = 1,

–If set to a non-blank value, the script will attempt to insert into the specified
–destination table. Please note that the script will not verify that the table exists,
–or that it has the correct schema, before doing the insert.
–Table can be specified in one, two, or three-part format
@destination_table = ”,

–If set to 1, no data collection will happen and no result set will be returned; instead,
–a CREATE TABLE statement will be returned via the @schema parameter, which will match
–the schema of the result set that would be returned by using the same collection of the
–rest of the parameters. The CREATE TABLE statement will have a placeholder token of
–<table_name> in place of an actual table name.
@return_schema = 0

monitor sql server

SELECT mg.session_id
, db_name(ss.database_id) as DBName
, ss.login_name
, ss.host_name
, ss.program_name
, convert(varchar(max),st.[text]) as QueryText
, rp.[name] as ResourceGroup –Welke resource governor memory pool betreft dit
, ss.login_time as [SPID_login_time]
, rs.waiter_count –Aantal wachtende processen op een grant van een query memory resource
, mg.ideal_memory_kb /1024 as [ideal_memory_MB] –Size, in kilobytes, of the memory grant to fit everything into physical memory. This is based on the cardinality estimate.
, mg.requested_memory_kb /1024 as [requested_memory_MB] –Total requested amount of memory in kilobytes.
, mg.granted_memory_kb /1024 as [granted_memory_MB] –Total amount of memory actually granted in kilobytes. Can be NULL if the memory is not granted yet.
, mg.required_memory_kb /1024 as [required_memory_MB] –Minimum memory required to run this query in kilobytes. requested_memory_kb is the same or larger than this amount.
, mg.used_memory_kb /1024 as [used_memory_MB] –Physical memory used at this moment in kilobytes.
, mg.max_used_memory_kb /1024 as [max_used_memory_MB]–Maximum physical memory used up to this moment in kilobytes.
, mg.query_cost –Estimated query cost.
, mg.timeout_sec –Time-out in seconds before this query gives up the memory grant request.
, ss.cpu_time /1000 as [SPID_CpuSec]
, ss.reads as [SPID_reads]
, ss.writes as [SPID_writes]
, ss.logical_reads as [SPID_logical_reads]
–, (SELECT CEILING(physical_memory_kb/1024.0) FROM sys.dm_os_sys_info WITH (NOLOCK)) as [Physical_Os_Memory_MB]
–, (SELECT CEILING(available_physical_memory_kb/1024.0) FROM sys.dm_os_sys_memory WITH (NOLOCK)) as [Available_Os_Memory_MB]
–, (SELECT cast([value_in_use] as int) FROM [master].[sys].[configurations] WHERE Name = ‘Max Server Memory (MB)’) as [Max_SQL_Server_Memory_MB]
–, (SELECT cntr_value /1024 FROM sys.dm_os_performance_counters WHERE counter_name = ‘Total Server Memory (KB)’) as [Total_SQL_Server_Memory_MB]
–, (SELECT cntr_value/ 1024 FROM sys.dm_os_performance_counters WHERE counter_name = ‘Target Server Memory (KB)’) as [Target_SQL_Server_Memory_MB]
FROM sys.dm_exec_query_memory_grants mg with(nolock)
CROSS APPLY sys.dm_exec_sql_text(mg.sql_handle) st
inner join sys.dm_exec_sessions ss with(nolock) on mg.session_id = ss.session_id
inner join sys.dm_resource_governor_resource_pools rp with(nolock) on mg.pool_id = rp.pool_id
inner join sys.dm_exec_query_resource_semaphores rs with(nolock) on mg.resource_semaphore_id = rs.resource_semaphore_id
and mg.pool_id = rs.pool_id
WHERE [mg].[session_id] <> @@SPID
and is_small = 0 –Hide trivial resource requests

calculate age

/*
select dbo.calc_age(‘1971-03-06’, ‘2016-02-04’ )
select dbo.calc_age(‘1971-03-06’, ‘2016-03-04’ )
select dbo.calc_age(convert(date, ‘1977-03-06’), convert(date, ‘2016-03-05’) )
select dbo.calc_age(‘1971-03-06’, ‘2016-03-06’ )
select dbo.calc_age(‘1971-03-06’, ‘2016-03-08’ )
select dbo.calc_age(‘1971-03-06’, ‘2016-11-01’ )

*/

alter function dbo.calc_age ( @birth_date date, @t1 date)
returns int as
begin
declare @age as int

select @age = DATEDIFF(yy, @birth_date, @t1) –
CASE — before birthday
WHEN month(@t1)<month(@birth_date)
or (month(@t1)=month(@birth_date) and
day(@t1) < day(@birth_date) )
THEN 1
ELSE 0
END
return @age
end

determine smallest datatype to hold values in table

set nocount on ;

— This script will look for smallest datatypes . e.g. smallint instead of nvarchar(255)

declare @sql as varchar(max) =”
,@data_type_id int
,@col_name varchar(max)
, @data_type as varchar(255)
, @col_str as varchar(max) =”

— Step 1 put column names and datatype into #cols table
IF OBJECT_ID(‘tempdb..#cols’) IS NOT NULL
DROP TABLE #cols

select column_id, c.name col_name , c.system_type_id, c.max_length, convert(varchar(255), ”) data_type, convert(varchar(max), ”) check_sql
into #cols
from sys.columns c
where objecT_id = object_id(‘[STG].[KCVS]’)

–select * from #cols
–select * from sys.types

— Step 2 for each col determin smallest type

declare c cursor for
select col_name, dbo.udf_determine_min_datatype(col_name, ‘stg.KCVS’) sql
from #cols
–where column_id >= 1 and column_id < 50

open c
FETCH NEXT FROM c
INTO @col_name , @sql

WHILE @@FETCH_STATUS = 0
BEGIN
IF OBJECT_ID(‘tempdb..#T1’) IS NOT NULL
DROP TABLE #T1

create TABLE #T1 ( value varchar(255) , data_type varchar(100), cnt int )

insert into #T1
execute (@sql)

— next determine datatype if there are > 1
;
with q as(
select
case when value = ‘#NULL!’ then 9999 — ignore
when data_type like ‘smallint%’ then 100 else 10 end seq_nr — order datatypes varchar over smallint
, *
from #T1
)
select @data_type = data_type from q where seq_nr = (select min(seq_nr) from q )
update #cols set data_type = @data_type , check_sql= @sql where col_name =@col_name

FETCH NEXT FROM c
INTO @col_name , @sql
END

CLOSE c
DEALLOCATE c

select * from #cols

select @col_str += col_name + ‘ ‘ + data_type + ‘ NULL,

from #cols

print @col_str
/*

alter function dbo.udf_determine_min_datatype(
@col_name varchar(255),
@tbl_name varchar(255)
)
RETURNS varchar(max)
as
begin
declare @res as int
, @sql as varchar(max) =”

set @sql = ‘
select ‘+@col_name+ ‘,
case when isnumeric(‘+ @col_name + ‘)=1 then ”smallint”
when ‘+@col_name + ‘ is null then ”smallint”
else ”varchar(” + convert(varchar(10),(select max(len(‘+@col_name+ ‘)) from ‘+@tbl_name + ‘) ) +”)” end dt
, count(*) cnt
from ‘+ @tbl_name + ‘ group by ‘+ @col_name

return @sql

end
*/
–select S001aQ1,
— case when isnumeric(S001aQ1)=1 then ‘smallint’
— when S001aQ1 is null then ‘smallint’
— else ‘varchar(‘ + convert(varchar(10),(select max(len(S001aQ1)) from stg.KCVS) ) +’)’ end dt
— , count(*) cnt
— from stg.KCVS group by S001aQ1

Datavault, Anchor modeling or Inmon 3NF

This article will give you the pros and cons of different datawarehouse architectures, so that you can choose which one is best for your customer. As a BI architect, my job is to be objective. Try not to defend the things you know, but try to keep an open mind.

We will limit our scope to 3 popular architectures: Datavault, Anchor modeling and Inmon 3NF.

Example domain: Nasa Facilities. This is just a simple listing of Nasa centers and facilities. Each Centers hosts one or more facilities.

Powered by Socrata

Modelled in Data Vault 2.0

 

MODELLED in Inmon style 3NF
nasa_f_3NF
MODELLED IN ANCHOR MODELING

Usability

 

Links
  1. Comparisons between modelling techniques – Hybrid (including Data Vault) Roelant V.

Multiple schemas (MS)

Suppose we have 2 source systems A and B. Both systems contain information regarding employees. System A contains login_name and email address, System B contains department.

In our datawarehouse we like to merge this information. My preferred way of doing is is the following.

System A has a schema called A in My_DWH.

System B has a schema called B in My_DWH.

So in this example My_DWH contains the table A.user and B.user.
We can also create a table or view called dbo.user in My_DWH that contains a merged view of the two.

It’s important to have A.user and B.user for lineage. You need to be able to go back to the source of records in dbo.user.

* It’s better to use this pattern also when you only have System A in your data warehouse (i.e. there is only 1 user entity). Otherwise you will have a lot to do when System B is added at a later time.

When using this pattern in Staging, we avoid issues when different sources have the same table names.

[MS_src_schema_id]

When in the above example records from schema A and B are merged in My_DWH.dbo.user, we might get uniqueness issues of natural keys. To solve these issues we always add a column named src_schema_id to the natural key.

e.g. dbo.user
sur_key , src_schema_id          , natural_key ,   synonym_id 
1              , A                                   , ‘bvdberg’        , null
2              , B                                   , ‘bvdberg’        , 1

The combination of src_schema_id and natural_key is unique and we can use the [Syn] pattern to de-duplicate these records in DWH. Note that we can choose to only show sur_key 1 in the datamart. (surrogate foreign keys will not point to entities that have synonyms).