All posts by dev129

Developer

Calculating year to day targets by month

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
DECLARE @TargetDate DATETIME = '01 mar 2018'

IF OBJECT_ID(N'tempdb..#temp') IS NOT NULL DROP TABLE #temp

;WITH cte AS (
	SELECT CAST('01 jan 2018' AS DATE) AS dateid
	UNION ALL
	SELECT dateadd(month,1,dateid) FROM cte 
	WHERE MONTH(dateid) <> 12 
)


SELECT dateid,'averageLoan' AS [TargetName], month(dateid) AS [Target] INTO #temp FROM cte
UNION ALL
SELECT dateid,'target_1', 2 AS [Target] FROM cte
UNION ALL
SELECT dateid,'target_2', 3 AS [Target] FROM cte
UNION ALL
SELECT dateid,'target_3', 4 AS [Target] FROM cte
UNION ALL
SELECT dateid,'target_4', 5 AS [Target] FROM cte


SELECT 
	t1.targetName					AS TargetName, 
	SUM(t1.[target] * t2.[target])	AS TargetValue

FROM		#temp t1 
INNER JOIN	#temp t2	ON t1.dateid = t2.dateid

WHERE 
	MONTH(t1.dateid) <= MONTH(@TargetDate)
AND 
	t1.targetName <> 'averageLoan'
AND 
	t2.targetName = 'averageLoan'
GROUP BY 
	t1.targetName

Average start time of an overnight job

Problem:

You have an overnight job which is triggered by the arrival of some external data.  You keep a record of the date and time that the overnight job is run.   Someone wants to know what the average start time of the job is.  You can’t use the average function on dates and times in T-SQL so you need to write some custom code to calculate the average time the job starts.

Solution:

First, you need to identify the point in time that you want the calculation to consider the start of a new day.  For a job which runs during the day could be 0 – indicating midnight.  For an overnight job you would probably want to consider the possibility that the job could be started the next morning.  In that situation you might choose 18 – indicating 6pm as the start of a new day.

Then, in the calculation itself:

  1. identify the number of seconds since the day division
  2. get the average of number of seconds
  3. add to a date time and extract the time portion

-- Defines the hour of the day when a new day starts
DECLARE @DayDivision INT = 12

IF OBJECT_ID(N'tempdb..#StartTimes') IS NOT NULL DROP TABLE #StartTimes

CREATE TABLE #StartTimes(
	start DATETIME NOT NULL
)

INSERT INTO #StartTimes
VALUES 
	 ('2018-01-09 00:01:38.000')
	,('2018-01-09 23:43:22.000')

SELECT 
	-- 3. Add the number of seconds to a day starting at the 
	--    day division hour, then extract the time portion
	CAST(DATEADD(SECOND,
		-- 2. Average number of seconds
		AVG(
			-- 1. Get the number of seconds from the day division point (@DayDivision)
			DATEDIFF(SECOND, 
				CASE WHEN DATEPART(HOUR,start) < @DayDivision THEN
					SMALLDATETIMEFROMPARTS(YEAR(DATEADD(DAY,-1,start)),MONTH(DATEADD(DAY,-1,start)),DAY(DATEADD(DAY,-1,start)),@DayDivision,0)
				ELSE
					SMALLDATETIMEFROMPARTS(YEAR(start),MONTH(start),DAY(start),@DayDivision,0)
				END
			,start)
		)
	,'01 jan 1900 ' + CAST(@DayDivision AS VARCHAR(2)) + ':00') AS TIME) AS AverageStartTime
FROM #StartTimes

Catch bad XML characters

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
/*
	Date : 17/01/2018
	Notes: Catches a bad xml character error and displays messages to help with debugging
*/

DECLARE @x VARCHAR(MAX)
SET @x = 'dave' + CHAR(160) + 'smith'

BEGIN TRY
	SELECT CAST(@x AS XML) AS xmlOutput
END TRY
BEGIN CATCH

	DECLARE  @errorMessage VARCHAR(MAX)
			,@errPos INT

	SELECT @errorMessage = ERROR_MESSAGE()
	
	IF PATINDEX('%illegal xml character%', @errorMessage) > 0
	BEGIN  
		SELECT @errPos = SUBSTRING(@errorMessage, 
									PATINDEX('%character %', @errorMessage) + 10,
									PATINDEX('%, illegal%', @errorMessage) - PATINDEX('%character %', @errorMessage) - 10 )

		PRINT 'Error caused by illegal character in XML:'
		PRINT 'Character:' + SUBSTRING(@x,@errPos,1)
		PRINT 'Ascii Code:'	+ CAST(ASCII(SUBSTRING(@x,@errPos,1)) AS VARCHAR(MAX))
		PRINT 'Surrounding text:' + SUBSTRING(@x,@errPos-100,200)
		PRINT ''
	END
    
END CATCH	

 

The other space character… ascii 160

I recently discovered that the space character in ascii has an evil twin.  Like all evil twins it looks exactly the same but it does evil…

Here they are in SSMS:

spacesSideBySide

They look the same – why is there another space character in ascii?

The second space is a non-breaking space character – so any program displaying the text should not separate a line at that point.

Why is ascii 160 evil?

The other day I was trying to extract some data into xml but I kept getting an illegal xml character error.

illegalCharacter

The source data had a single ascii 160 character in it, but all the other spaces were ascii 32.  Even when you find the row of data which has the non breaking space it still requires a lot of work to actually find where the character is!

How can you solve this error?

The best way is to make sure that the data going into your system is cleaned of the non breaking space – if you intend to extract it in XML.

Another possibility is to replace non breaking spaces with breaking spaces in the output.

replace160

Or, if the system that will receive the XML is happy with it, you could change the encoding to UTF-16.

illegalCharToUTF16

Adding many-to-many relationships to a data warehouse

Many to many relationships frequently occur in OLTP databases.  For instance, consider a retail system which stores the details of products in one table. Each product  could belong to  one or more product types; equally, each product type could apply to more than one product.  This is often represented in an OLTP database as:

oltp-many-to-many

However, in a data warehouse this structure is likely to cause performance problems.

Assuming Products has it’s own dimension in the DW.  Here are some alternatives for storing the associated Product Type data:

Proposed structure Issues to condsider
Add a bitmap column to the Product dimension
  • Not easily readable (for humans)
  • Good for binary indicators
  • Good for values which are quite static
  • Very good for performance
  • Needs to be documented well so that all using it clearly understand what each bit indicates
Add an xml column to the Product dimension
  • Quite readable – good for free text values like category names
  • Need to pay attention to the performance implications
  • May need to add an XML index to resolve performance issues
  • Extends easily – it is XML after all!
Add a bridging table to a dimension
  • Does have performance implications due to the joins that will need to be made to connect the tables together
  • How many connections is the dimension likely to have to the bridging table? If most don’t have connections then the performance hit may be acceptable
  • Consider the dimension type – the fewer rows of a type I dimension will have much less of a performance hit than a type II dimension
  • Requires more ETL work than previous options
Add a bridging table to a fact table
  • Consider which type of fact table it is – has the potential to add a lot of redundant information to the DW if not relevant to reporting requirements
  • Good for capturing frequently changing values that need to be reported on
  • Requires more ETL work than previous options