Multi-target Query Assistance

I ran into a scenario with a database and I'm hoping the smart people here could help me out with a "best" way to handle the lookup.

The question: what is the best way to query across different "target" tables based on a field in the source table?

I've got a few tables that look something like this:

Content Table (the source)

ContentId	ContentTypeId	IsEnabled
E454820D-A695-47CC-9AFA-02FE581DCDD1	9262536B-49A3-4494-802F-04DFF10424ED	1
9E2D29B7-D8EA-4369-830C-0906A63A1B99	9262536B-49A3-4494-802F-04DFF10424ED	1
3FBE607E-31A8-4D4D-A309-0BED544F6199	F7D226AB-D59F-475C-9D22-4A79E3F0EC07	1
EBF10469-83F0-4666-9477-1234BE10A841	F586769B-0822-468A-B7F3-A94D480ED9B0	1
3274AE49-0B3B-45AF-99EF-1AF71412A4EF	46448885-D0E6-4133-BBFB-F0CD7B0FD6F7	1
5308ABB1-03E3-484A-943F-1BBEC44119C2	F586769B-0822-468A-B7F3-A94D480ED9B0	0
622B3BEC-8A74-4789-BC6E-213C4D14047F	F7D226AB-D59F-475C-9D22-4A79E3F0EC07	1
D1908CF8-1EF2-4F8E-B314-2A037557BEDE	9262536B-49A3-4494-802F-04DFF10424ED	0
6AF53B6C-1273-4154-9538-33B7E6C92BA8	A0753CFB-923B-4975-AD2A-42E5282A6D5D	0
0F814B78-7246-466B-9255-34F6BB2ADD48	46448885-D0E6-4133-BBFB-F0CD7B0FD6F7	1

Content1 (one of the targets)

ContentID	ContentTypeID	UserId	CreatedDateUtc
622B3BEC-8A74-4789-BC6E-213C4D14047F	F7D226AB-D59F-475C-9D22-4A79E3F0EC07	1234	2023-01-01 16:38:32.111
3FBE607E-31A8-4D4D-A309-0BED544F6199	F7D226AB-D59F-475C-9D22-4A79E3F0EC07	2345	2022-04-27 09:14:01.978

Content2 (one of the targets)

ContentID	ContentTypeID	UserId	CreatedDateUtc
0F814B78-7246-466B-9255-34F6BB2ADD48	46448885-D0E6-4133-BBFB-F0CD7B0FD6F7	91478	2012-09-27 22:45:51.521
3274AE49-0B3B-45AF-99EF-1AF71412A4EF	46448885-D0E6-4133-BBFB-F0CD7B0FD6F7	4567	2014-04-27 11:22:33.444

The first "Content" table has millions upon millions of records. The Content1 and Content2 tables each hold a subset of those records, but they contain the details I need.

I'm trying to find the most efficient way to connect the Content table to both the Content1 AND Content2 tables so my output would look something like:

Desired Result Set

ContentId	ContentTypeId	IsEnabled	UserID	CreatedDateUtc
E454820D-A695-47CC-9AFA-02FE581DCDD1	9262536B-49A3-4494-802F-04DFF10424ED	1	1234	2012-01-06 12:00:00.00
9E2D29B7-D8EA-4369-830C-0906A63A1B99	9262536B-49A3-4494-802F-04DFF10424ED	1	5678	etc...
3FBE607E-31A8-4D4D-A309-0BED544F6199	F7D226AB-D59F-475C-9D22-4A79E3F0EC07	1	etc...
EBF10469-83F0-4666-9477-1234BE10A841	F586769B-0822-468A-B7F3-A94D480ED9B0	1
3274AE49-0B3B-45AF-99EF-1AF71412A4EF	46448885-D0E6-4133-BBFB-F0CD7B0FD6F7	1
5308ABB1-03E3-484A-943F-1BBEC44119C2	F586769B-0822-468A-B7F3-A94D480ED9B0	0
622B3BEC-8A74-4789-BC6E-213C4D14047F	F7D226AB-D59F-475C-9D22-4A79E3F0EC07	1
D1908CF8-1EF2-4F8E-B314-2A037557BEDE	9262536B-49A3-4494-802F-04DFF10424ED	0
6AF53B6C-1273-4154-9538-33B7E6C92BA8	A0753CFB-923B-4975-AD2A-42E5282A6D5D	0
0F814B78-7246-466B-9255-34F6BB2ADD48	46448885-D0E6-4133-BBFB-F0CD7B0FD6F7	1

There are 28 possible connections from the Content table to other tables.

I've tried a host of things, but the performance is abysmal. I'm asking this clever bunch: What would be the ideal way to do this work?

Accepted answers

itCanOnlyBeJared

Thanks for the clarification and the sample query. The LEFT JOINs in your query retrieve ALL rows from the big table and only the matches from the smaller tables. Since you only want the rows with matching keys (you use COALESCE to collapse the NULLs from the non-matching joins), I think the INNER JOIN approach with UNIONs might perform a lot better. Not sure if this is exactly what you're looking for, but hopefully it is a step in the right direction. Let me know if this helps.

SELECT [Contents].[ContentId]
     , [custom_Content1].UserId
     , [custom_Content1].CreatedDateUtc
FROM [ContentsSourceTable] AS [Contents] WITH (NOLOCK)
INNER JOIN [custom_Content1] WITH (NOLOCK)
     ON [Contents].ContentId = [custom_Content1].ContentId

UNION

SELECT [Contents].[ContentId]
     , [custom_Content2].UserId
     , [custom_Content2].CreatedDateUtc
FROM [ContentsSourceTable] AS [Contents] WITH (NOLOCK)
INNER JOIN [custom_Content2] WITH (NOLOCK)
     ON [Contents].ContentId = [custom_Content2].ContentId
-- add UNIONs here for the INNER JOIN queries to other 26 tables
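
A runnable sketch of the same pattern. It assumes SQLite as a local stand-in for SQL Server, so the NOLOCK hints are dropped. It uses the sample rows from the question, with the GUIDs shortened to their first block. Two changes from the query above: it carries ContentTypeId and IsEnabled through so the output matches the desired result set, and it uses UNION ALL instead of UNION. Plain UNION removes duplicate rows, which costs an extra distinct step. That step is unnecessary if each ContentId lives in at most one target table. This is an assumption; the thread does not confirm it.

```python
import sqlite3

# Sample rows from the question, GUIDs shortened to their first block.
CONTENT = [
    ("E454820D", "9262536B", 1), ("9E2D29B7", "9262536B", 1),
    ("3FBE607E", "F7D226AB", 1), ("EBF10469", "F586769B", 1),
    ("3274AE49", "46448885", 1), ("5308ABB1", "F586769B", 0),
    ("622B3BEC", "F7D226AB", 1), ("D1908CF8", "9262536B", 0),
    ("6AF53B6C", "A0753CFB", 0), ("0F814B78", "46448885", 1),
]
CONTENT1 = [
    ("622B3BEC", "F7D226AB", 1234, "2023-01-01 16:38:32.111"),
    ("3FBE607E", "F7D226AB", 2345, "2022-04-27 09:14:01.978"),
]
CONTENT2 = [
    ("0F814B78", "46448885", 91478, "2012-09-27 22:45:51.521"),
    ("3274AE49", "46448885", 4567, "2014-04-27 11:22:33.444"),
]


def build_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the source table and two of the target tables."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Content (ContentId TEXT PRIMARY KEY, "
               "ContentTypeId TEXT, IsEnabled INTEGER)")
    for name, rows in (("Content1", CONTENT1), ("Content2", CONTENT2)):
        db.execute(f"CREATE TABLE {name} (ContentId TEXT PRIMARY KEY, "
                   "ContentTypeId TEXT, UserId INTEGER, CreatedDateUtc TEXT)")
        db.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?)", rows)
    db.executemany("INSERT INTO Content VALUES (?, ?, ?)", CONTENT)
    return db


# One INNER JOIN per target table, stacked with UNION ALL (assumes each ContentId
# appears in at most one target table, so no duplicate removal is needed).
UNION_QUERY = """
SELECT c.ContentId, c.ContentTypeId, c.IsEnabled, t.UserId, t.CreatedDateUtc
FROM Content AS c INNER JOIN Content1 AS t ON c.ContentId = t.ContentId
UNION ALL
SELECT c.ContentId, c.ContentTypeId, c.IsEnabled, t.UserId, t.CreatedDateUtc
FROM Content AS c INNER JOIN Content2 AS t ON c.ContentId = t.ContentId
ORDER BY 1
"""

if __name__ == "__main__":
    for row in build_db().execute(UNION_QUERY):
        print(row)
```

Only the four sample rows that have a match in Content1 or Content2 come back. Rows whose type has no target table drop out, which is the behavior the INNER JOIN approach relies on.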

All comments

itCanOnlyBeJared

Is there a certain structure to how the data is split between Content1 and Content2, and why are they two separate tables (they appear to have the same definition)? For example, are newer records (by CreatedDateUtc) in Content2 and older records in Content1? Also, where was the performance bottleneck in your attempts so far? What do the queries look like? Understanding this context should help with finding a solution.

Perhaps there is a reasonable partitioning scheme that would sufficiently optimize the lookup? An indexed view has some potential drawbacks, but it may be worth considering in this scenario as well.

KMSigma.SWI

I should have prefaced this with "This is a vendor database and we do not own the schema." But that would have been my first question if I saw this question in the wild, @itCanOnlyBeJared.

itCanOnlyBeJared

> I should have prefaced this with "This is a vendor database and we do not own the schema."

Ah, fun! While you don't own the schema, are you free to add indexes or views as long as there aren't any drastic changes that would impact the application? Or are you limited to coming up with the most efficient lookup query possible with exactly what you've been handed? With lookups over millions and millions of rows, the limited flexibility definitely hampers optimization attempts.

KMSigma.SWI

We can add anything as long as we don't touch the original table content. So yes, indexes, views, and functions are all allowed.

I've tried a very ugly view (using the 28 connections with a LEFT JOIN), tried a multi-statement table function to be used as a CROSS APPLY, and a handful of other things. None of them seem particularly performant.
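
For comparison, here is what the "one LEFT JOIN per target plus COALESCE" shape might look like for two of the 28 tables. The actual view isn't posted in the thread, so this is a reconstruction under the same SQLite stand-in assumption. It returns every source row, with NULL details for the six sample rows that have no match. That is what the accepted answer means by retrieving ALL rows from the big table. With 28 targets, each COALESCE grows to 28 arguments.

```python
import sqlite3

# Sample rows from the question, GUIDs shortened to their first block.
CONTENT = [
    ("E454820D", "9262536B", 1), ("9E2D29B7", "9262536B", 1),
    ("3FBE607E", "F7D226AB", 1), ("EBF10469", "F586769B", 1),
    ("3274AE49", "46448885", 1), ("5308ABB1", "F586769B", 0),
    ("622B3BEC", "F7D226AB", 1), ("D1908CF8", "9262536B", 0),
    ("6AF53B6C", "A0753CFB", 0), ("0F814B78", "46448885", 1),
]
CONTENT1 = [
    ("622B3BEC", "F7D226AB", 1234, "2023-01-01 16:38:32.111"),
    ("3FBE607E", "F7D226AB", 2345, "2022-04-27 09:14:01.978"),
]
CONTENT2 = [
    ("0F814B78", "46448885", 91478, "2012-09-27 22:45:51.521"),
    ("3274AE49", "46448885", 4567, "2014-04-27 11:22:33.444"),
]


def build_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the source table and two of the target tables."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Content (ContentId TEXT PRIMARY KEY, "
               "ContentTypeId TEXT, IsEnabled INTEGER)")
    for name, rows in (("Content1", CONTENT1), ("Content2", CONTENT2)):
        db.execute(f"CREATE TABLE {name} (ContentId TEXT PRIMARY KEY, "
                   "ContentTypeId TEXT, UserId INTEGER, CreatedDateUtc TEXT)")
        db.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?)", rows)
    db.executemany("INSERT INTO Content VALUES (?, ?, ?)", CONTENT)
    return db


# One LEFT JOIN per target table; COALESCE picks the value from whichever target matched.
LEFT_JOIN_QUERY = """
SELECT c.ContentId, c.ContentTypeId, c.IsEnabled,
       COALESCE(t1.UserId, t2.UserId) AS UserId,
       COALESCE(t1.CreatedDateUtc, t2.CreatedDateUtc) AS CreatedDateUtc
FROM Content AS c
LEFT JOIN Content1 AS t1 ON c.ContentId = t1.ContentId
LEFT JOIN Content2 AS t2 ON c.ContentId = t2.ContentId
ORDER BY c.ContentId
"""

if __name__ == "__main__":
    for row in build_db().execute(LEFT_JOIN_QUERY):
        print(row)
```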

mesverrum

The trouble here is that, given a particularly inefficient table design, there isn't a lot you can do in the query itself to make it more efficient.
Is there currently an index on all the relevant tables for the keys and where conditions you need to be able to use? Without indexes in the right places query performance is pretty doomed.

Is your key on all these tables effectively the combined ContentId + ContentTypeId? I've seen people make a lot of progress with silly-large composite primary keys by indexing a hash of the composite key. The downside of hashes is that they don't help if you need ORDER BY or want to use range conditions in the filter, but looking at these GUID-style keys I'm not sure you will be doing that. If a hash doesn't work out, I think the indexed view built out of joins mentioned by @itCanOnlyBeJared makes good sense.
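
A minimal sketch of the hashed-key idea, assuming SQLite and a hypothetical `key_hash` helper. The helper folds ContentId and ContentTypeId into one signed 64-bit integer, taken from the first 8 bytes of a SHA-256 digest. The join gives the planner a narrow hash column to match on, then re-checks the real key columns, because different keys can in principle hash to the same value. In SQL Server, the comparable building block would likely be an indexed, persisted computed column over HASHBYTES. That only applies if adding a column to the vendor tables counts as "not touching" them. Treat that mapping as an assumption; it was not tested here.

```python
import hashlib
import sqlite3

# Sample rows from the question, GUIDs shortened to their first block.
CONTENT = [
    ("E454820D", "9262536B", 1), ("3FBE607E", "F7D226AB", 1),
    ("622B3BEC", "F7D226AB", 1), ("0F814B78", "46448885", 1),
]
CONTENT1 = [("622B3BEC", "F7D226AB", 1234), ("3FBE607E", "F7D226AB", 2345)]


def key_hash(content_id: str, content_type_id: str) -> int:
    """Hypothetical helper: fold the composite key into one signed 64-bit integer."""
    digest = hashlib.sha256(f"{content_id}|{content_type_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def build_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Content (KeyHash INTEGER, ContentId TEXT, "
               "ContentTypeId TEXT, IsEnabled INTEGER)")
    db.execute("CREATE TABLE Content1 (KeyHash INTEGER, ContentId TEXT, "
               "ContentTypeId TEXT, UserId INTEGER)")
    db.executemany("INSERT INTO Content VALUES (?, ?, ?, ?)",
                   [(key_hash(c, t), c, t, e) for c, t, e in CONTENT])
    db.executemany("INSERT INTO Content1 VALUES (?, ?, ?, ?)",
                   [(key_hash(c, t), c, t, u) for c, t, u in CONTENT1])
    # One narrow index per table on the hash instead of a wide two-GUID composite index.
    db.execute("CREATE INDEX ix_content_keyhash ON Content (KeyHash)")
    db.execute("CREATE INDEX ix_content1_keyhash ON Content1 (KeyHash)")
    return db


HASHED_KEY_QUERY = """
SELECT c.ContentId, c.IsEnabled, t.UserId
FROM Content AS c
INNER JOIN Content1 AS t
    ON c.KeyHash = t.KeyHash               -- equality match on the hash
   AND c.ContentId = t.ContentId           -- re-check real keys: hashes can collide
   AND c.ContentTypeId = t.ContentTypeId
ORDER BY c.ContentId
"""

if __name__ == "__main__":
    for row in build_db().execute(HASHED_KEY_QUERY):
        print(row)
```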

If the data set you actually intend to produce is really supposed to be in the millions of rows and no combination of clever indexes and where conditions is going to be able to reduce that you also just have to be prepared for the performance to be what it is. At that point your main bottleneck is likely to be read speeds and RAM and caching. How slow are you talking about? Like a minute or two or walk away and make dinner?

KMSigma.SWI

Yeah - speeds are bad.

For 100 rows < 1 sec
For 1000 rows < 8 sec
For 10000 rows < 72 seconds
For all rows, I'll let you know when it finishes.
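
A rough extrapolation from these timings. It assumes run time keeps growing linearly with row count; the three measurements work out to at most 10, 8, and 7.2 ms per row, so roughly linear. It uses the per-row cost from the largest run. Because the measurements are upper bounds, treat this as an order-of-magnitude figure:

```latex
\frac{72\ \text{s}}{10\,000\ \text{rows}} = 7.2\ \text{ms/row},
\qquad
10^{6}\ \text{rows} \times 7.2\ \text{ms/row} = 7\,200\ \text{s} = 2\ \text{h}
```

So, at the current per-row cost, each million rows would take on the order of two hours.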

Sounds like I just need to deal with this inefficiency at the moment.

My understanding (which is complete conjecture) is that the application layer "above" this database knows about the contextual connections and pivots to the correct table as necessary.

E-Roc

You say Content1, Content2, etc. all contain subsets of the rows found in Content. In the example rows you gave for Content1 and Content2, these appear to be partitioned by ContentTypeId. Is that always the case? Do they overlap in any way?

KMSigma.SWI

They are (mostly) partitioned this way, @E-Roc. Of the 28 target tables, a few don't fit this description, and I can safely exclude those entirely.

E-Roc

Cool. I'm wondering about unioning the partitioned tables together before doing a join. Have you tried that yet? I'm guessing a sizeable portion of your issue is due to the number of joins, as each one has to create a sub-table in memory before the next join can be processed. If you could union them all into one table (maybe using a temp table so you could index it?) getting down to the single join may help. It also should let you transition to an inner join and get the extra benefits there.
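
A sketch of this suggestion, again assuming SQLite as a stand-in. Copy just the needed columns from each target table into one temp table, index it on ContentId, then do a single INNER JOIN against the big table. `AllDetails` is a hypothetical name; in SQL Server this would be a #temp table. The staging cost is paid once per run, and the final query has one join instead of one per target table.

```python
import sqlite3

# Sample rows from the question, GUIDs shortened to their first block.
CONTENT = [
    ("E454820D", "9262536B", 1), ("3FBE607E", "F7D226AB", 1),
    ("3274AE49", "46448885", 1), ("622B3BEC", "F7D226AB", 1),
    ("6AF53B6C", "A0753CFB", 0), ("0F814B78", "46448885", 1),
]
CONTENT1 = [
    ("622B3BEC", "F7D226AB", 1234, "2023-01-01 16:38:32.111"),
    ("3FBE607E", "F7D226AB", 2345, "2022-04-27 09:14:01.978"),
]
CONTENT2 = [
    ("0F814B78", "46448885", 91478, "2012-09-27 22:45:51.521"),
    ("3274AE49", "46448885", 4567, "2014-04-27 11:22:33.444"),
]


def build_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the source table and two of the target tables."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Content (ContentId TEXT PRIMARY KEY, "
               "ContentTypeId TEXT, IsEnabled INTEGER)")
    for name, rows in (("Content1", CONTENT1), ("Content2", CONTENT2)):
        db.execute(f"CREATE TABLE {name} (ContentId TEXT PRIMARY KEY, "
                   "ContentTypeId TEXT, UserId INTEGER, CreatedDateUtc TEXT)")
        db.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?)", rows)
    db.executemany("INSERT INTO Content VALUES (?, ?, ?)", CONTENT)
    return db


def stage_details(db: sqlite3.Connection) -> None:
    """Union the target tables into one temp table and index it on the join key."""
    db.execute("""
        CREATE TEMP TABLE AllDetails AS
        SELECT ContentId, UserId, CreatedDateUtc FROM Content1
        UNION ALL
        SELECT ContentId, UserId, CreatedDateUtc FROM Content2
    """)
    db.execute("CREATE INDEX ix_alldetails_contentid ON AllDetails (ContentId)")


# The single join that replaces one join per target table.
SINGLE_JOIN_QUERY = """
SELECT c.ContentId, c.ContentTypeId, c.IsEnabled, d.UserId, d.CreatedDateUtc
FROM Content AS c
INNER JOIN AllDetails AS d ON c.ContentId = d.ContentId
ORDER BY c.ContentId
"""

if __name__ == "__main__":
    db = build_db()
    stage_details(db)
    for row in db.execute(SINGLE_JOIN_QUERY):
        print(row)
```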

KMSigma.SWI

The problem with the target Content tables is that they have varying fields, so getting to an acceptable common list of columns (and renaming things along the way) would be my challenge. That said, I'm happy to give it a go. What's the worst that could happen?

Might co-mingle this with the hashing of the composite key. I'll keep playing. Thanks to everyone for the roads to review.

Macknife

Is there a specific reason why each sub-table exists? Can data from the Content table appear in more than one sub-table, and if so, why?
The reason I ask is this: if data lands in a sub-table for a specific reason, and that data can only exist in that particular sub-table, you could create a metadata reference table that determines which sub-table to use. That assumes, of course, that the existing data supports that approach.
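
One way to sketch the metadata reference table, assuming SQLite and hypothetical names (`ContentTypeRoute`, `details_for_type`). The table maps each ContentTypeId to the target table that holds its details, and the code then queries only that table. Table names can't be bound as query parameters, so the sketch checks the looked-up name against a whitelist before building the SQL. This roughly mirrors the guess further up that the application layer "pivots to the correct table" on its own.

```python
import sqlite3

# Sample rows from the question, GUIDs shortened to their first block.
CONTENT = [
    ("E454820D", "9262536B", 1), ("3FBE607E", "F7D226AB", 1),
    ("3274AE49", "46448885", 1), ("622B3BEC", "F7D226AB", 1),
    ("0F814B78", "46448885", 1),
]
CONTENT1 = [("622B3BEC", "F7D226AB", 1234), ("3FBE607E", "F7D226AB", 2345)]
CONTENT2 = [("0F814B78", "46448885", 91478), ("3274AE49", "46448885", 4567)]
# Hypothetical routing rows: which target table holds details for each content type.
ROUTES = [("F7D226AB", "Content1"), ("46448885", "Content2")]
# Table names cannot be bound as parameters, so only known names are ever spliced in.
ALLOWED_TABLES = {"Content1", "Content2"}


def build_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Content (ContentId TEXT PRIMARY KEY, "
               "ContentTypeId TEXT, IsEnabled INTEGER)")
    db.execute("CREATE TABLE ContentTypeRoute (ContentTypeId TEXT PRIMARY KEY, "
               "TargetTable TEXT NOT NULL)")
    for name, rows in (("Content1", CONTENT1), ("Content2", CONTENT2)):
        db.execute(f"CREATE TABLE {name} (ContentId TEXT PRIMARY KEY, "
                   "ContentTypeId TEXT, UserId INTEGER)")
        db.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)
    db.executemany("INSERT INTO Content VALUES (?, ?, ?)", CONTENT)
    db.executemany("INSERT INTO ContentTypeRoute VALUES (?, ?)", ROUTES)
    return db


def details_for_type(db: sqlite3.Connection, content_type_id: str) -> list:
    """Route one content type to its target table and join only against that table."""
    route = db.execute(
        "SELECT TargetTable FROM ContentTypeRoute WHERE ContentTypeId = ?",
        (content_type_id,),
    ).fetchone()
    if route is None:
        return []  # no detail table for this type, e.g. 9262536B in the sample
    table = route[0]
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected target table: {table!r}")
    sql = (f"SELECT c.ContentId, c.IsEnabled, t.UserId "
           f"FROM Content AS c INNER JOIN {table} AS t ON c.ContentId = t.ContentId "
           "WHERE c.ContentTypeId = ? ORDER BY c.ContentId")
    return db.execute(sql, (content_type_id,)).fetchall()


if __name__ == "__main__":
    db = build_db()
    for type_id in ("F7D226AB", "46448885", "9262536B"):
        print(type_id, details_for_type(db, type_id))
```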

KMSigma.SWI

@Macknife - Each sub-table contains different fields depending on the type of data it represents.

Think:

Chapters in Content1 with [Title], [Description], [Length], [SortOrder], etc.
Pages in Content2 with [Header], [Body], [Length], [SortOrder], etc.
Pictures in Content3 with [Title], [Caption], [PageLocation], etc.

There are some common fields across the content types (or fields that can be aliased to be the same), but no two tables are identical.
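
A sketch of the aliasing step, assuming SQLite and the Chapters/Pages/Pictures layout above as Content1/Content2/Content3. The column names come from that example; the column types and row values are hypothetical. Each branch of a UNION ALL view renames its columns into one common shape and fills any column it lacks with NULL. Fields with no counterpart, such as SortOrder and PageLocation here, are simply left out.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Three target tables whose columns differ, following the example in the post."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Content1 (ContentId TEXT PRIMARY KEY, Title TEXT, "
               "Description TEXT, Length INTEGER, SortOrder INTEGER)")  # chapters
    db.execute("CREATE TABLE Content2 (ContentId TEXT PRIMARY KEY, Header TEXT, "
               "Body TEXT, Length INTEGER, SortOrder INTEGER)")         # pages
    db.execute("CREATE TABLE Content3 (ContentId TEXT PRIMARY KEY, Title TEXT, "
               "Caption TEXT, PageLocation INTEGER)")                   # pictures
    # Illustrative rows only; the real values are not in the thread.
    db.execute("INSERT INTO Content1 VALUES ('622B3BEC', 'Chapter 1', 'Opening chapter', 12, 1)")
    db.execute("INSERT INTO Content2 VALUES ('0F814B78', 'Page 1', 'Body text', 450, 1)")
    db.execute("INSERT INTO Content3 VALUES ('EBF10469', 'Cover', 'Front cover photo', 1)")
    # Each branch aliases its columns into one common shape; missing columns become NULL.
    db.execute("""
        CREATE VIEW AllDetails AS
        SELECT ContentId, Title AS DisplayTitle, Description AS DisplayText, Length
        FROM Content1
        UNION ALL
        SELECT ContentId, Header AS DisplayTitle, Body AS DisplayText, Length
        FROM Content2
        UNION ALL
        SELECT ContentId, Title AS DisplayTitle, Caption AS DisplayText, NULL AS Length
        FROM Content3
    """)
    return db


if __name__ == "__main__":
    for row in build_db().execute("SELECT * FROM AllDetails ORDER BY ContentId"):
        print(row)
```

A view like this could also serve as the source for the temp-table or single-join approaches discussed above.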

Since this is a vendor database, I think some of these tables were from previous versions of the solution and adapted to work with new feature functionality.

ajith.securin