Multi-target Query Assistance

I ran into a scenario with a database and I'm hoping the smart people here could help me out with a "best" way to handle the lookup.

The question: What would be the best way to query across different "target" tables based on a field in the source table?

I've got a few tables that look something like this:

Content Table (the source)

ContentId                             ContentTypeId                         IsEnabled
E454820D-A695-47CC-9AFA-02FE581DCDD1  9262536B-49A3-4494-802F-04DFF10424ED  1
9E2D29B7-D8EA-4369-830C-0906A63A1B99  9262536B-49A3-4494-802F-04DFF10424ED  1
3FBE607E-31A8-4D4D-A309-0BED544F6199  F7D226AB-D59F-475C-9D22-4A79E3F0EC07  1
EBF10469-83F0-4666-9477-1234BE10A841  F586769B-0822-468A-B7F3-A94D480ED9B0  1
3274AE49-0B3B-45AF-99EF-1AF71412A4EF  46448885-D0E6-4133-BBFB-F0CD7B0FD6F7  1
5308ABB1-03E3-484A-943F-1BBEC44119C2  F586769B-0822-468A-B7F3-A94D480ED9B0  0
622B3BEC-8A74-4789-BC6E-213C4D14047F  F7D226AB-D59F-475C-9D22-4A79E3F0EC07  1
D1908CF8-1EF2-4F8E-B314-2A037557BEDE  9262536B-49A3-4494-802F-04DFF10424ED  0
6AF53B6C-1273-4154-9538-33B7E6C92BA8  A0753CFB-923B-4975-AD2A-42E5282A6D5D  0
0F814B78-7246-466B-9255-34F6BB2ADD48  46448885-D0E6-4133-BBFB-F0CD7B0FD6F7  1

Content1 (one of the targets)

ContentID                             ContentTypeID                         UserId  CreatedDateUtc
622B3BEC-8A74-4789-BC6E-213C4D14047F  F7D226AB-D59F-475C-9D22-4A79E3F0EC07  1234    2023-01-01 16:38:32.111
3FBE607E-31A8-4D4D-A309-0BED544F6199  F7D226AB-D59F-475C-9D22-4A79E3F0EC07  2345    2022-04-27 09:14:01.978

Content2 (one of the targets)

ContentID                             ContentTypeID                         UserId  CreatedDateUtc
0F814B78-7246-466B-9255-34F6BB2ADD48  46448885-D0E6-4133-BBFB-F0CD7B0FD6F7  91478   2012-09-27 22:45:51.521
3274AE49-0B3B-45AF-99EF-1AF71412A4EF  46448885-D0E6-4133-BBFB-F0CD7B0FD6F7  4567    2014-04-27 11:22:33.444

The first "Content" table has millions upon millions of records.  The Content1 and Content2 tables have a subset, but contain the details I need.

I'm trying to find the most efficient way to connect the Content table to both the Content1 AND Content2 tables so my output would look something like:

Desired Result Set

ContentId                             ContentTypeId                         IsEnabled  UserID  CreatedDateUtc
E454820D-A695-47CC-9AFA-02FE581DCDD1  9262536B-49A3-4494-802F-04DFF10424ED  1          1234    2012-01-06 12:00:00.00
9E2D29B7-D8EA-4369-830C-0906A63A1B99  9262536B-49A3-4494-802F-04DFF10424ED  1          5678    etc...
3FBE607E-31A8-4D4D-A309-0BED544F6199  F7D226AB-D59F-475C-9D22-4A79E3F0EC07  1          etc...
EBF10469-83F0-4666-9477-1234BE10A841  F586769B-0822-468A-B7F3-A94D480ED9B0  1
3274AE49-0B3B-45AF-99EF-1AF71412A4EF  46448885-D0E6-4133-BBFB-F0CD7B0FD6F7  1
5308ABB1-03E3-484A-943F-1BBEC44119C2  F586769B-0822-468A-B7F3-A94D480ED9B0  0
622B3BEC-8A74-4789-BC6E-213C4D14047F  F7D226AB-D59F-475C-9D22-4A79E3F0EC07  1
D1908CF8-1EF2-4F8E-B314-2A037557BEDE  9262536B-49A3-4494-802F-04DFF10424ED  0
6AF53B6C-1273-4154-9538-33B7E6C92BA8  A0753CFB-923B-4975-AD2A-42E5282A6D5D  0
0F814B78-7246-466B-9255-34F6BB2ADD48  46448885-D0E6-4133-BBFB-F0CD7B0FD6F7  1

There are 28 possible connections from the Content table to other tables.

I've tried a host of things, but the performance is abysmal.  I'm asking this clever bunch: What would be the ideal way to do this work?
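For reference, the general shape of my current attempt is a pile of LEFT JOINs with COALESCE to merge the target columns. A simplified sketch with only two of the targets (table names mirror the samples; the real query repeats this pattern for all 28 targets):

```sql
-- Simplified sketch of my current attempt: LEFT JOIN each target table
-- and COALESCE the detail columns together (only 2 of the 28 targets shown).
SELECT c.ContentId
     , c.ContentTypeId
     , c.IsEnabled
     , COALESCE(t1.UserId, t2.UserId)                 AS UserId
     , COALESCE(t1.CreatedDateUtc, t2.CreatedDateUtc) AS CreatedDateUtc
FROM [ContentsSourceTable] AS c
LEFT JOIN [custom_Content1] AS t1 ON t1.ContentId = c.ContentId
LEFT JOIN [custom_Content2] AS t2 ON t2.ContentId = c.ContentId;
```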

  • Is there a certain structure to the breakdown of data in Content1 and Content2, and what is the reasoning for them being two separate tables (they appear to have the same definition)?  For example, are newer records (by CreatedDateUtc) in Content2 and older records in Content1?  Also, what was the performance bottleneck in your attempts so far?  What do the queries look like?  Understanding this context should help with finding a solution.

    Perhaps there is a reasonable partitioning scheme that would sufficiently optimize the lookup?  There are some potential drawbacks, but an indexed view might be worth considering in this scenario as well.
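If the indexed view route is worth exploring, a sketch for one target might look like the following (names are hypothetical; note that SQL Server indexed views require SCHEMABINDING and two-part names, and cannot contain OUTER JOINs or UNION, so this would be one view per target table):

```sql
-- Hypothetical sketch of an indexed view materializing one source->target join.
CREATE VIEW dbo.vw_ContentWithContent1
WITH SCHEMABINDING
AS
SELECT c.ContentId, c.ContentTypeId, c.IsEnabled,
       t.UserId, t.CreatedDateUtc
FROM dbo.ContentsSourceTable AS c
INNER JOIN dbo.custom_Content1 AS t
        ON t.ContentId = c.ContentId;
GO

-- The unique clustered index is what actually materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_vw_ContentWithContent1
    ON dbo.vw_ContentWithContent1 (ContentId);
```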

  • I should have prefaced this with "This is a vendor database and we do not own the schema."  But that 100% would have been my first question if I saw this question in the wild.

  • Thanks for the clarification and the sample query.  The LEFT JOINs in your query retrieve ALL rows from the big table, and only the matches from the smaller tables.  Since you are only looking for the rows with matching keys (eliminating NULLs from non-matches with COALESCE), I think an INNER JOIN approach with UNIONs might be a lot more performant.  Not sure if this is what you're looking for, but hopefully it is a step in the right direction.  Let me know if this helps.

    SELECT [Contents].[ContentId]
         , [custom_Content1].UserId
         , [custom_Content1].CreatedDateUtc
    FROM [ContentsSourceTable] AS [Contents] WITH (NOLOCK)
    INNER JOIN [custom_Content1] WITH (NOLOCK)
         ON [Contents].ContentId = [custom_Content1].ContentId
         
    UNION
    
    SELECT [Contents].[ContentId]
         , [custom_Content2].UserId
         , [custom_Content2].CreatedDateUtc
    FROM [ContentsSourceTable] AS [Contents] WITH (NOLOCK)
    INNER JOIN [custom_Content2] WITH (NOLOCK)
         ON [Contents].ContentId = [custom_Content2].ContentId
    -- add UNIONs here for the INNER JOIN queries to other 26 tables

  • The trouble here is that, given a particularly inefficient table design, there isn't a lot you can do in your query to make it more efficient.
    Is there currently an index on each of the relevant tables covering the keys and WHERE conditions you need to use?  Without indexes in the right places, query performance is pretty much doomed.

    Is your key on all these tables effectively the combined ContentId + ContentTypeId?  I've seen people make a lot of progress with silly large composite primary keys by creating a composite index with a hash.  The downside of hashes is that they are no good if you need to ORDER BY or use range conditions in the filter, but looking at these GUID-style keys I'm not sure you are going to be doing that.  If a hash is not a good fit, then I think the indexed view built out of joins mentioned above makes good sense.

    If the data set you actually intend to produce really is millions of rows, and no combination of clever indexes and WHERE conditions can reduce that, you also just have to be prepared for the performance to be what it is.  At that point your main bottleneck is likely to be read speed, RAM, and caching.  How slow are we talking?  A minute or two, or walk-away-and-make-dinner slow?

  • Yeah - speeds are bad.

    • For 100 rows: < 1 sec
    • For 1,000 rows: < 8 sec
    • For 10,000 rows: < 72 seconds
    • For all rows, I'll let you know when it finishes. :-P

    Sounds like I just need to deal with this inefficiency at the moment.

    My understanding (which is complete conjecture) is that the application layer "above" this database knows about the contextual connections and pivots to the correct table as necessary.

  • Wanted to test this out when I had fresh eyes.  Today, I ran an experiment on the LEFT vs. INNER JOINs with my data set:

    • Returning 1,000,000 rows with the LEFT JOIN: 12 seconds
    • Returning 1,000,000 rows with the INNER JOIN: 0 seconds, but no rows returned. :-(

    Sadly, my initial hunch was correct, and I need the LEFT JOINs to keep the rows that have no match in the target tables.

    But thanks for the recommendation.

    And oddly, the speed seems way up this morning.  Might be the reindexing, might be the underlying IO, might be divine intervention, but I'll take whatever improvement I can get.

  • I've seen people make a lot of progress with silly large composite primary keys by creating a composite index with a hash.

    Talk to me about how this would look.  I know composite keys, but I don't think I've ever boiled them down to a hash before.

  • I've done this before - it doesn't work in every situation, but it can be a lifesaver when it does.  Essentially you concatenate all of the fields in the composite key and feed the result to the HASHBYTES function.  Picking a common datatype for all of the fields is usually the hardest part.  So for a table with a composite key of name, streetnumber, streetname, you'd end up with something like:

    select hashbytes('<algorithm>', name + convert(varchar, streetnumber) + streetname) as hashkey from 

    If the values for the columns of a given row were 'Big Bird', 123, 'Sesame Street' and the algorithm were SHA2_256, the statement would translate to

    hashbytes('SHA2_256', 'Big Bird123Sesame Street')

    which yields a 32-byte value along the lines of 0x737CFF2ABACBD32B5C2244561DB735AC29EFB623E053983E84F6D8DEA7FE589D.

    Comparing binary hash values should nearly always be faster than comparing the original strings outright; the big question is whether the cost of converting to the common datatype and computing the hashes offsets the gains.
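If it helps to see it end to end, the usual way to make the hash joinable is a persisted computed column with an index on it.  A sketch using the hypothetical name/streetnumber/streetname table from above (a vendor-owned schema may not allow the ALTER, in which case the hash could live in a separate lookup table instead):

```sql
-- Hypothetical sketch: persist the composite-key hash and index it.
ALTER TABLE dbo.Addresses
    ADD HashKey AS CAST(
            hashbytes('SHA2_256',
                      name + convert(varchar(30), streetnumber) + streetname)
            AS binary(32)) PERSISTED;  -- SHA2_256 output is 32 bytes

CREATE INDEX IX_Addresses_HashKey ON dbo.Addresses (HashKey);
```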

  • When you tested the INNER JOINs, did you update the query to use the UNIONs as well (a separate SELECT per target table)?  Simply substituting INNER for LEFT in the original multi-join query would have a very different effect: any row without a match in every joined target table would be dropped, which would explain the zero rows.

  • Does anyone know (simply for curiosity) how the comparisons of hash bytes would compare to unique identifiers?

    I'd assume hash vs. GUID equality comparisons would be similarly performant, because neither is technically a string (varchar/nvarchar/text/ntext).

    Or am I totally off base with this assumption about them not being a primitive string type?

  • Great question!  I honestly don't know the answer here.  Your assumption is intuitive and feels correct - both are technically binary values.  Assuming they are both treated as binary for the purposes of a join, a GUID may actually be slightly faster than hashbytes depending on the output length of the hashbytes function.  I'm hesitant to commit to that stance though because I know that GUIDs are considered strings when doing comparisons against character strings; usage in the where clause for instance always hits performance hard.  For a straight join between GUIDs?  Not sure.
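One small, concrete data point on size (a quick check, not a benchmark): a uniqueidentifier stores 16 bytes, while a SHA2_256 hash is 32, so a GUID equality comparison has half as many bytes to examine:

```sql
-- uniqueidentifier stores 16 bytes; hashbytes with SHA2_256 returns 32 bytes.
DECLARE @g uniqueidentifier = NEWID();
SELECT DATALENGTH(@g)                           AS GuidBytes,  -- 16
       DATALENGTH(hashbytes('SHA2_256', 'x'))   AS HashBytes;  -- 32
```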
