Multi-target Query Assistance

I ran into a scenario with a database and I'm hoping the smart people here could help me out with a "best" way to handle the lookup.

The question: What would be the best way to query across different "target" tables based on a field from the source table?

I've got a few tables that look something like this:

Content Table (the source)

ContentId	ContentTypeId	IsEnabled
E454820D-A695-47CC-9AFA-02FE581DCDD1	9262536B-49A3-4494-802F-04DFF10424ED	1
9E2D29B7-D8EA-4369-830C-0906A63A1B99	9262536B-49A3-4494-802F-04DFF10424ED	1
3FBE607E-31A8-4D4D-A309-0BED544F6199	F7D226AB-D59F-475C-9D22-4A79E3F0EC07	1
EBF10469-83F0-4666-9477-1234BE10A841	F586769B-0822-468A-B7F3-A94D480ED9B0	1
3274AE49-0B3B-45AF-99EF-1AF71412A4EF	46448885-D0E6-4133-BBFB-F0CD7B0FD6F7	1
5308ABB1-03E3-484A-943F-1BBEC44119C2	F586769B-0822-468A-B7F3-A94D480ED9B0	0
622B3BEC-8A74-4789-BC6E-213C4D14047F	F7D226AB-D59F-475C-9D22-4A79E3F0EC07	1
D1908CF8-1EF2-4F8E-B314-2A037557BEDE	9262536B-49A3-4494-802F-04DFF10424ED	0
6AF53B6C-1273-4154-9538-33B7E6C92BA8	A0753CFB-923B-4975-AD2A-42E5282A6D5D	0
0F814B78-7246-466B-9255-34F6BB2ADD48	46448885-D0E6-4133-BBFB-F0CD7B0FD6F7	1

Content1 (one of the targets)

ContentID	ContentTypeID	UserId	CreatedDateUtc
622B3BEC-8A74-4789-BC6E-213C4D14047F	F7D226AB-D59F-475C-9D22-4A79E3F0EC07	1234	2023-01-01 16:38:32.111
3FBE607E-31A8-4D4D-A309-0BED544F6199	F7D226AB-D59F-475C-9D22-4A79E3F0EC07	2345	2022-04-27 09:14:01.978

Content2 (one of the targets)

ContentID	ContentTypeID	UserId	CreatedDateUtc
0F814B78-7246-466B-9255-34F6BB2ADD48	46448885-D0E6-4133-BBFB-F0CD7B0FD6F7	91478	2012-09-27 22:45:51.521
3274AE49-0B3B-45AF-99EF-1AF71412A4EF	46448885-D0E6-4133-BBFB-F0CD7B0FD6F7	4567	2014-04-27 11:22:33.444

The first "Content" table has millions upon millions of records. The Content1 and Content2 tables have a subset, but contain the details I need.

I'm trying to find the most efficient way to connect the Content table to both the Content1 AND Content2 tables so my output would look something like:

Desired Result Set

ContentId	ContentTypeId	IsEnabled	UserID	CreatedDateUtc
E454820D-A695-47CC-9AFA-02FE581DCDD1	9262536B-49A3-4494-802F-04DFF10424ED	1	1234	2012-01-06 12:00:00.00
9E2D29B7-D8EA-4369-830C-0906A63A1B99	9262536B-49A3-4494-802F-04DFF10424ED	1	5678	etc...
3FBE607E-31A8-4D4D-A309-0BED544F6199	F7D226AB-D59F-475C-9D22-4A79E3F0EC07	1	etc...
EBF10469-83F0-4666-9477-1234BE10A841	F586769B-0822-468A-B7F3-A94D480ED9B0	1
3274AE49-0B3B-45AF-99EF-1AF71412A4EF	46448885-D0E6-4133-BBFB-F0CD7B0FD6F7	1
5308ABB1-03E3-484A-943F-1BBEC44119C2	F586769B-0822-468A-B7F3-A94D480ED9B0	0
622B3BEC-8A74-4789-BC6E-213C4D14047F	F7D226AB-D59F-475C-9D22-4A79E3F0EC07	1
D1908CF8-1EF2-4F8E-B314-2A037557BEDE	9262536B-49A3-4494-802F-04DFF10424ED	0
6AF53B6C-1273-4154-9538-33B7E6C92BA8	A0753CFB-923B-4975-AD2A-42E5282A6D5D	0
0F814B78-7246-466B-9255-34F6BB2ADD48	46448885-D0E6-4133-BBFB-F0CD7B0FD6F7	1

There are 28 possible connections from the Content table to other tables.

I've tried a host of things, but the performance is abysmal. I'm asking this clever bunch: What would be the ideal way to do this work?

Accepted answers

itCanOnlyBeJared

Thanks for the clarification and the sample query. The LEFT JOINs in your query would be retrieving ALL values from the big table, and only matches from the smaller tables. Since you are only looking for the ones with matching keys (eliminating NULLs from non-matches with COALESCE), I think the INNER JOIN approach with UNIONs might be a lot more performant. Not sure if this is what you're looking for, but hopefully it is a step in the right direction. Let me know if this helps.

SELECT [Contents].[ContentId]
     , [custom_Content1].UserId
     , [custom_Content1].CreatedDateUtc
FROM [ContentsSourceTable] AS [Contents] WITH (NOLOCK)
INNER JOIN [custom_Content1] WITH (NOLOCK)
     ON [Contents].ContentId = [custom_Content1].ContentId

UNION

SELECT [Contents].[ContentId]
     , [custom_Content2].UserId
     , [custom_Content2].CreatedDateUtc
FROM [ContentsSourceTable] AS [Contents] WITH (NOLOCK)
INNER JOIN [custom_Content2] WITH (NOLOCK)
     ON [Contents].ContentId = [custom_Content2].ContentId
-- add UNIONs here for the INNER JOIN queries to other 26 tables

All comments

itCanOnlyBeJared

Is there a certain structure to the breakdown of data in Content1 and Content2, and what is the reasoning of them being two separate tables (they appear to have the same definition)? For example, are newer records (by CreatedDateUtc) in Content2 and older records in Content1? Also, what was the bottleneck with performance in your attempts so far? What do the queries look like? Understanding this context should help with solutioning.

Perhaps there is a reasonable partitioning scheme that would sufficiently optimize the lookup? There are some potential drawbacks, but maybe an indexed view would be worth consideration in this scenario as well.

KMSigma.SWI

I should have prefaced this with "This is a vendor database and we do not own the schema." But that would have been my first question if I saw this question in the wild, @itCanOnlyBeJared.

itCanOnlyBeJared

> I should have prefaced this with "This is a vendor database and we do not own the schema."

Ah, fun! While you don't own the schema, are you free to add indexes or views as long as there aren't any drastic changes that would impact the application? Or are you limited to coming up with the most efficient lookup query possible with exactly what you've been handed? With lookups over millions and millions of rows, the limited flexibility definitely hampers optimization attempts.

KMSigma.SWI

We can add anything as long as we don't touch the original table content. So yes, indexes, views, and functions are all allowable.

I've tried a very ugly view (using the 28 connections with a LEFT JOIN), tried a multi-statement table function to be used as a CROSS APPLY, and a handful of other things. None of them seem particularly performant.

itCanOnlyBeJared

How often are these tables updated? Perhaps the indexed view option could still be worth a shot? Are the join column(s) indexed? Would it be possible to show the attempted view query and associated query plan for additional context?

KMSigma.SWI

That's a long story, but for the sake of argument let's say it's updated once a day. Like I said, we can add to the schema, but can't touch the data or fields in the data. But we can add indexes.

itCanOnlyBeJared

The indexed view option doesn't modify the data or fields of the underlying tables, but it does introduce schemabinding and prevents structural changes to the underlying tables that may impact the indexed view unless you drop the view first. Some additional info with sample queries in this article: https://www.sqlshack.com/sql-server-indexed-views/

The advantage is that the data in the view's index will be maintained for you, and if that is structured in the way you want to see your results then it should result in a simple and performant query plan.
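
As a rough illustration of the indexed view idea, here is a minimal sketch for a single target table. SQL Server does not allow outer joins or UNION inside an indexed view, so this covers one INNER JOIN per view. The `dbo` schema, the view name, and the index name are assumptions, and the unique clustered index assumes ContentId is unique within the target table.

```sql
-- Hypothetical names: dbo schema, vw_Content1Details, and the index name are assumptions.
-- SCHEMABINDING requires two-part object names and an explicit column list.
CREATE VIEW dbo.vw_Content1Details
WITH SCHEMABINDING
AS
SELECT c.ContentId
     , c.ContentTypeId
     , c.IsEnabled
     , t.UserId
     , t.CreatedDateUtc
FROM dbo.ContentsSourceTable AS c
INNER JOIN dbo.custom_Content1 AS t
     ON c.ContentId = t.ContentId;
GO

-- The first index on a view must be unique and clustered; it materializes the view.
-- Assumes ContentId appears at most once in custom_Content1.
CREATE UNIQUE CLUSTERED INDEX IX_vw_Content1Details_ContentId
    ON dbo.vw_Content1Details (ContentId);
GO
```

Depending on the SQL Server edition, queries against the view may need the `WITH (NOEXPAND)` hint before the optimizer will read from the view's index rather than expanding it back to the base tables.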

There may be ways to optimize with indexes alone, but I'd have an easier time assessing further with example queries/indexes/plans.

A few clarifying questions...

1. Can you elaborate on the 28 possible connections between source and targets? You mentioned an ugly view using these with a LEFT JOIN... maybe I'm missing some context regarding the complexity of these.
2. Are the necessary JOINs already indexed well?
3. If you just need to pull in the additional column values from the target tables where you have a matching key with the source table, have you tried two INNER JOIN queries (one for each target table) with a UNION between them (potentially inside a view definition)? With indexes on the join columns, I'm curious if you'd get the performance you need with this approach.

KMSigma.SWI

Yes, the ContentID fields in each of the "connected" tables are indexed.

This is the framework I've done with that crazy join list...

SELECT [Contents].[ContentId]
     , COALESCE(
         [Content1].UserId
       , [Content2].UserId
       -- Additional fields here for the coalesce
       ) AS [UserId]
     , COALESCE(
         [Content1].CreatedDateUtc
       , [Content2].CreatedDateUtc
       -- Additional fields here for the coalesce
     ) AS [CreatedDateUtc]
FROM [ContentsSourceTable] AS [Contents] WITH (NOLOCK)
LEFT JOIN [custom_Content1] AS [Content1] WITH (NOLOCK)
     ON [Contents].ContentId = [Content1].ContentId
LEFT JOIN [custom_Content2] AS [Content2] WITH (NOLOCK)
     ON [Contents].ContentId = [Content2].ContentId
-- add a whole bunch of other joins here for the 26 other types

I should note: I only play a DBA on television. I'm a novice in many ways.

KMSigma.SWI

Wanted to test this out when I had fresh eyes. Today, I ran an experiment on the LEFT vs. INNER JOINs with my data set:

Returning 1,000,000 rows with the LEFT JOIN: 12 seconds
Returning 1,000,000 rows with the INNER JOIN: 0 seconds, but no rows returned.

Sadly, my initial hunch was correct, and I need to use the LEFT JOIN to eliminate bad lookups.

But thanks @itCanOnlyBeJared for the recommendation.

And oddly, the speed seems way up this morning. Might be the reindexing, might be the underlying IO, might be divine intervention, but I'll take whatever improvement I can.

E-Roc

You say Content1, Content2, etc. all contain subsets of the rows found in Content. In the example rows you gave for Content1 and Content2, these appear to be partitioned by ContentTypeId. Is that always the case? Are there overlaps in the data in any way?

KMSigma.SWI

They are (mostly) partitioned this way, @E-Roc. Of the 28 target tables, a few don't fit this description, and I can safely exclude those in their entirety.

E-Roc

Cool. I'm wondering about unioning the partitioned tables together before doing a join. Have you tried that yet? I'm guessing a sizeable portion of your issue is due to the number of joins, as each one has to create a sub-table in memory before the next join can be processed. If you could union them all into one table (maybe using a temp table so you could index it?) getting down to the single join may help. It also should let you transition to an inner join and get the extra benefits there.
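
A minimal sketch of that union-then-join idea, using the table and column names from the queries earlier in the thread. The temp table name, the index name, and the column data types are assumptions. The index is deliberately non-unique, since the targets are only "mostly" partitioned and a ContentId could appear in more than one of them.

```sql
-- Column types are assumptions; adjust them to match the vendor schema.
CREATE TABLE #ContentDetails
(
    ContentId      uniqueidentifier NOT NULL
  , UserId         int              NULL
  , CreatedDateUtc datetime         NULL
);

-- UNION ALL skips the duplicate-removal step that plain UNION performs.
INSERT INTO #ContentDetails (ContentId, UserId, CreatedDateUtc)
SELECT ContentId, UserId, CreatedDateUtc FROM custom_Content1
UNION ALL
SELECT ContentId, UserId, CreatedDateUtc FROM custom_Content2
-- add UNION ALL branches here for the remaining target tables
;

-- Index the combined rows on the join column after loading.
CREATE CLUSTERED INDEX IX_ContentDetails_ContentId
    ON #ContentDetails (ContentId);

-- A single join replaces the 28 separate ones.
SELECT c.ContentId
     , c.ContentTypeId
     , c.IsEnabled
     , d.UserId
     , d.CreatedDateUtc
FROM ContentsSourceTable AS c
LEFT JOIN #ContentDetails AS d
     ON c.ContentId = d.ContentId;
```

The LEFT JOIN keeps every source row, as in the desired result set above; switching it to an INNER JOIN would return only the rows that have a match in one of the targets.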

KMSigma.SWI

The problem with the target Content tables is that their fields vary, so getting to an acceptable common list (and renaming things along the way) would be my challenge. That said, I'm happy to give it a go. What's the worst that could happen?

Might co-mingle this with the hashing of the composite key. I'll keep playing. Thanks to everyone for the roads to review.

Macknife

Is there a specific reason for why each sub-table exists? Can data from the Content table appear in more than one sub-table, and if so, why?
The reason I ask is that if there is a specific reason why data exists in a sub-table, and if that data can only exist in that particular sub-table, you could create a metadata reference table to determine which sub-table should be used, provided, of course, that existing data supports that approach.
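
One way such a metadata reference table might look, as a sketch only. The table name `dbo.ContentTypeTarget` and its columns are hypothetical. The two mappings are taken from the sample rows in the original post, where Content1 holds type F7D226AB... and Content2 holds type 46448885...; whether that mapping holds across the full data set is an assumption.

```sql
-- Hypothetical lookup table: one row per content type, naming the target table that holds it.
CREATE TABLE dbo.ContentTypeTarget
(
    ContentTypeId   uniqueidentifier NOT NULL PRIMARY KEY
  , TargetTableName sysname          NOT NULL
);

-- Mappings inferred from the sample data in the question.
INSERT INTO dbo.ContentTypeTarget (ContentTypeId, TargetTableName)
VALUES ('F7D226AB-D59F-475C-9D22-4A79E3F0EC07', N'custom_Content1')
     , ('46448885-D0E6-4133-BBFB-F0CD7B0FD6F7', N'custom_Content2');

-- Shows which target table, if any, each source row should be looked up in.
SELECT c.ContentId
     , c.ContentTypeId
     , m.TargetTableName
FROM ContentsSourceTable AS c
LEFT JOIN dbo.ContentTypeTarget AS m
     ON c.ContentTypeId = m.ContentTypeId;
```

Source rows whose content type has no mapping come back with a NULL TargetTableName, which makes it easy to spot types that no target table covers.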

KMSigma.SWI

@Macknife - Each sub-table contains different fields depending on the type of data it represents.

Think:

Chapters in Content1 with [Title], [Description], [Length], [SortOrder], etc.
Pages in Content2 with [Header], [Body], [Length], [SortOrder], etc.
Pictures in Content3 with [Title], [Caption], [PageLocation], etc.

There are some common fields between the two content types (or those that can be aliased to be the same), but none are identical.

Since this is a vendor database, I think some of these tables were from previous versions of the solution and adapted to work with new feature functionality.

ajith.securin