Rewriting NOP_ProductLoadAllPaged with CTE to solve the performance problems in NOPCommerce 1.7

As a .NET developer, every time I was asked to create an online shop I turned to NopCommerce, since it was free and had every feature one could image. The code is very professional and the architecture a beauty.

Well the last few weeks I also saw the ugly side of NopCommerce. Apparently the devs only tested it on “Mom and pop’s” shops with few products. My project required integration with some distributors APIs and automatic importing of products. 27000 products later NopCommerce was shacking, and the VPS that was running it struggling with hard cpu usage and huge memory consumptions. Considering other e-commerce applications have no problem with this number of products my attention tured at profiling and to the SQL queries used.

The worst part was the admin interface. The entire product management is centered around the same stored procedure that searches and fetches products by a complicated set of criterias, such as categories, min and max prices, keywords, stock, etc. This proc, named NOP_ProductLoadAllPaged, is called on every single page and there is no caching on this. Not even per request (not saying that caching per request would help anything – I’d try sql dependency cache if I was NopCommerce dev team). So I open out sql profiler and to my horror the query took 21 seconds to run. Can you guess why I though the admin panel was unusable?

After some googleing it turns out I’m not the only one to notice this. Jared Nielsen wrote on his blog here http://www.fuzionagency.com/post/2010/09/18/NOP-Commerce-Product-Administration-Slow-with-Large-Number-of-Products-and-Variants.aspx

Jared didn’t have that many products, but he had a lot of variants. He noticed that the actual search query ( a very two page long query) joins too many tables for it’s own good, in an attempt to get all the required information right from the start, and then started to work on the paging with two temporary tables and some queries. The problem with joining every product with categories, variants, manufacturer mapping, product mapping, tags etc is that even with a few hundred products you quickly end up doing matches on 1 million rows. That’s a real performance killer. Jared’s solution was to do the search and only extract productId. He fetches the actual fields only after the paging and everything done, right before the rowset is returned.

Initially I was very happy with his idea, but after I’ve implemented it I realized how different our situations were. I didn’t have lots of variants (one for each product), nor tags, etc. Turns out that in my case it only shaved out 1 sec from the 21 seconds the stored procedure required 😦

After looking at the execution plan I immediately understood what was wrong. Let me explain how NOP_ProductLoadAllPaged works:

First there is a very large query that searches everything that matches the criteria. Going to product admin in the admin panel, calls this proc immediately with no criteria, returning all 27k products. All results from this query is entered in a temp table, with the purpose of executing a query on them and adding and extra column with a rownumber:

CREATE TABLE #PageIndex
	(
		[IndexID] int IDENTITY (1, 1) NOT NULL,
		[ProductID] int NOT NULL
	)

	INSERT INTO #PageIndex ([ProductID])
	SELECT ProductID
	FROM #DisplayOrderTmp with (NOLOCK)
	GROUP BY ProductID
	ORDER BY min([ID])

See the identity column? It’s their way of number the results, so that in the next query they can select the results from any page (for instance results from row nuber 20 to 30):

SELECT
		p.ProductId,
		p.Name,
		p.ShortDescription,
		p.FullDescription,
		p.AdminComment,
		p.ProductTypeId,
		p.TemplateId,
		p.ShowOnHomePage,
		p.MetaKeywords,
		p.MetaDescription,
		p.MetaTitle,
		p.SEName,
		p.AllowCustomerReviews,
		p.AllowCustomerRatings,
		p.RatingSum,
		p.TotalRatingVotes,
		p.Published,
		p.Deleted,
		p.CreatedOn,
		p.UpdatedOn
	FROM
		#PageIndex [pi]
		INNER JOIN Nop_Product p on p.ProductID = [pi].ProductID
	WHERE
		[pi].IndexID > @PageLowerBound AND
		[pi].IndexID < @PageUpperBound
	ORDER BY
		IndexID

So the stored procedure just created two temporary tables with 27000 products each, and all this just to create another column with the rownumber. The irony is that exactly this operation, of inserting huge amount of data on a table with an identity index is causing the performance problem. I’m not saying the other temp table isn’t a bad idea either but this not the way to number rows. That’s what the ROW_NUMBER() function is there for!!!

So if we don’t need a temp table to number rows we can rewrite the whole thing into one large query with either a derived tabled or CTE (Common table expressions). I rewrote everything without temp tables and i managed to get the time down to 3 seconds for a full not-pages result set, and a few milliseconds for a 15 item page of results. The problem was that now I didn’t have a way to return the total number of items before the paging, since you can mix data retrieval and variable assignments in the same query. This mean that I did have to split the query in two with a temp table. So I did, but the difference is that my temp table only contained the results from one page (by default only 15 results), and only the productId column with an extra ‘TotalResults’ column, with no indexes to make inserting slow.

The resulting stored proc executes on my server in 4 sec with a large page of 1 mil items and in 1 sec for 15 item page. This is a massive improvement from the original 21 sec.

So what I basically did is to put the large search query inside a CTE (Common Table Expression), added rownumbering with ROW_NUMBER() OVER (ORDER BY p.ProductID), counted the initial result set with COUNT(*) OVER(PARTITION BY NULL) and make a simple paging query on the CTE, that inserts only the content of the page in a small temp table. Then I retrieve the total results set from, which I now have as an extra column in the temp table and put it as the return value of the stored proc. At the end I simply return all the results from my temp table.

We started with two large 27000 items temp tables, and ended up with only one, 15 items table with no indexes to slow down inserts. The time dropped from 21 sec to 4 sec, so that’s a 500% reduction in time. Not bad. Of course it can still be cut in half (to 1-2 sec) if you implement proper paging in the C# code of the NopCommerce. Right now the gridview gets ALL results and does the paging himself, which is not optimal. In a future article I might show you how I implemented the custom paging in NopCommerce. In the mean time you can read this article about custom paging: http://www.codeproject.com/KB/aspnet/ASPNET_DataGrid_Paging.aspx

Here is the optimized stored procedure, that I wrote starting with from Jared Nielsen’s version. I’ve kept some of his ideas but decided that it’s better to join another table (Nop_ProductVariant) that to run 2 scalar functions on each row, each doing a select. I’m not sure why Jared though scalar functions get optimized by the server. As far as I know they don’t get included in the execution plan at all, so using them not only is an O(n*n) operation like the join he tried to avoid, but you loose the benefit of execution plan optimization and compiling.

-- =============================================
-- Author: www.bitstarsolutions.com - Tudor Carean
-- =============================================
ALTER PROCEDURE [dbo].[Nop_ProductLoadAllPaged]
(
	@CategoryID			int = 0,
	@ManufacturerID		int = 0,
	@ProductTagID		int = 0,
	@FeaturedProducts	bit = null,	--0 featured only , 1 not featured only, null - load all products
	@PriceMin			money = null,
	@PriceMax			money = null,
	@Keywords			nvarchar(MAX) = '',	
	@SearchDescriptions bit = 0,
	@ShowHidden			bit = 0,
	@PageIndex			int = 0, 
	@PageSize			int = 20, --2147483644,
	@FilteredSpecs		nvarchar(300) = null,	--filter by attributes (comma-separated list). e.g. 14,15,16
	@LanguageID			int = 7, --0,
	@OrderBy			int = 0, --0 position, 5 - Name, 10 - Price
	@TotalRecords		int = null OUTPUT
)
AS
BEGIN
	
	--init
	SET @Keywords = isnull(@Keywords, '')
	SET @Keywords = '%' + rtrim(ltrim(@Keywords)) + '%'

	SET @PriceMin = isnull(@PriceMin, 0)
	SET @PriceMax = isnull(@PriceMax, 2147483644)
	
	--DECLARE @PageIndexTable TABLE  
	CREATE TABLE #PageIndexTable
	(		
		[ProductID] int NOT NULL,
		[total] int
	)
	
	--filter by attributes
	SET @FilteredSpecs = isnull(@FilteredSpecs, '')
	CREATE TABLE #FilteredSpecs
	(
		SpecificationAttributeOptionID int not null
	)
	INSERT INTO #FilteredSpecs (SpecificationAttributeOptionID)
	SELECT CAST(data as int) FROM dbo.[NOP_splitstring_to_table](@FilteredSpecs, ',');
	
	DECLARE @SpecAttributesCount int	
	SELECT @SpecAttributesCount = COUNT(1) FROM #FilteredSpecs

	--paging
	DECLARE @PageLowerBound int
	DECLARE @PageUpperBound int
	DECLARE @RowsToReturn int
	
	SET @RowsToReturn = @PageSize * (@PageIndex + 1)	
	SET @PageLowerBound = @PageSize * @PageIndex
	SET @PageUpperBound = @PageLowerBound + @PageSize + 1;
	
		
	WITH DispOrderCTE (ProductIDCTE, /*RowNumber,*/total) AS
	(
		SELECT DISTINCT 
			p.ProductID, 
			--ROW_NUMBER() OVER (ORDER BY p.ProductID),
			COUNT(*) OVER(PARTITION BY NULL)				
			FROM Nop_Product p with (NOLOCK) 	
			INNER JOIN Nop_ProductVariant pv with (NOLOCK) ON p.ProductID = pv.ProductID		
			LEFT OUTER JOIN Nop_Product_Category_Mapping pcm with (NOLOCK) ON p.ProductID=pcm.ProductID 
			LEFT OUTER JOIN Nop_Product_Manufacturer_Mapping pmm with (NOLOCK) ON p.ProductID=pmm.ProductID 
			LEFT OUTER JOIN Nop_ProductTag_Product_Mapping ptpm with (NOLOCK) ON p.ProductID=ptpm.ProductID 
		WHERE	
		( 
			--P.ProductId = 9
			--AND 
			( 
				-- See if product is published
				@ShowHidden = 1 OR p.Published = 1 
			)
			AND
			( 
				-- See if at least one product variant is published
				@ShowHidden = 1 OR (SELECT Max(Convert(int,Published)) FROM nop_ProductVariant where ProductID=p.ProductID) = 1 
			) 
			AND 
			( 
				p.Deleted=0 
			) 
			AND
			(
				-- Make sure the product is listed between a certain price range
				(SELECT Min(Price) FROM nop_ProductVariant WHERE ProductID=p.ProductID GROUP BY ProductID)
				BETWEEN IsNull(@PriceMin,0) AND IsNull(@PriceMax,999999999999)
			)
			AND 
			( 
				@CategoryID IS NULL 
				OR @CategoryID=0 
				OR 
				(
					pcm.CategoryID=@CategoryID 
					AND 
					(
						@FeaturedProducts IS NULL 
						OR pcm.IsFeaturedProduct=@FeaturedProducts
					)
				) 
			) 
			AND 
			( 
				@ManufacturerID IS NULL 
				OR @ManufacturerID=0 
				OR 
				(
					pmm.ManufacturerID=@ManufacturerID 
					AND 
					(
						@FeaturedProducts IS NULL 
						OR pmm.IsFeaturedProduct=@FeaturedProducts
					)
				) 
			) 
			AND 
			( 
				@ProductTagID IS NULL 
				OR @ProductTagID=0 
				OR ptpm.ProductTagID=@ProductTagID 
			) 			
			AND 
		( 
	-- search standard content 
			patindex(@Keywords, isnull(p.name, '')) > 0 
			or patindex(@Keywords, isnull(pv.name, '')) > 0 
			or patindex(@Keywords, isnull(pv.sku , '')) > 0 
			or (@SearchDescriptions = 1 and patindex(@Keywords, isnull(p.ShortDescription, '')) > 0) 
			or (@SearchDescriptions = 1 and patindex(@Keywords, isnull(p.FullDescription, '')) > 0) 
			or (@SearchDescriptions = 1 and patindex(@Keywords, isnull(pv.Description, '')) > 0) 
			-- search language content 
			-- commented out localized product variants to save speed
			--or patindex(@Keywords, isnull(pl.name, '')) > 0 
			--or patindex(@Keywords, isnull(pvl.name, '')) > 0 
			--or (@SearchDescriptions = 1 and patindex(@Keywords, isnull(pl.ShortDescription, '')) > 0) 
			--or (@SearchDescriptions = 1 and patindex(@Keywords, isnull(pl.FullDescription, '')) > 0) 
			--or (@SearchDescriptions = 1 and patindex(@Keywords, isnull(pvl.Description, '')) > 0) 
		) 
		AND
			(
				@ShowHidden = 1
				OR
				(getutcdate() between isnull(pv.AvailableStartDateTime, '1/1/1900') and isnull(pv.AvailableEndDateTime, '1/1/2999'))
			)			 
			AND 
			( 
		--filter by specs 
				@SpecAttributesCount = 0 
				OR 
				( 
					NOT EXISTS
					( 
						SELECT 1 FROM #FilteredSpecs [fs] 
						WHERE [fs].SpecificationAttributeOptionID NOT IN 
						( 
							SELECT psam.SpecificationAttributeOptionID 
							FROM dbo.Nop_Product_SpecificationAttribute_Mapping psam 
							WHERE psam.AllowFiltering = 1 AND psam.ProductID = p.ProductID 
						) 
					) 
				) 
			) 
		) 
	)		
	INSERT INTO #PageIndexTable (ProductId,total)
	SELECT templist.ProductIDCTE,templist.total from 
	   (select  fulllist.ProductIDCTE,
				fulllist.total, 
				ROW_NUMBER() OVER (ORDER BY fulllist.ProductIDCTE) as RowNumber
		from DispOrderCTE fulllist		
		) templist
		WHERE
			RowNumber > @PageLowerBound AND 
			RowNumber < @PageUpperBound
		ORDER BY
		RowNumber
	
		
	
	SELECT  
		p.ProductId,
		p.Name,
		p.ShortDescription,
		p.FullDescription,
		p.AdminComment,
		p.ProductTypeId,
		p.TemplateId,
		p.ShowOnHomePage,
		p.MetaKeywords,
		p.MetaDescription,
		p.MetaTitle,
		p.SEName,
		p.AllowCustomerReviews,
		p.AllowCustomerRatings,
		p.RatingSum,
		p.TotalRatingVotes,
		p.Published,
		p.Deleted,
		p.CreatedOn,
		p.UpdatedOn		
	FROM
		#PageIndexTable [pi]
		INNER JOIN Nop_Product p on p.ProductID = [pi].ProductID	
	
	if ((select COUNT(*) from #PageIndexTable)>0)
		SELECT @TotalRecords=max(total) from #PageIndexTable
	else
		select @TotalRecords = 0
	
		
	SET ROWCOUNT 0

	
	DROP TABLE #FilteredSpecs
	DROP TABLE #PageIndexTable
/*
--- Test Harness
exec [dbo].[Nop_ProductLoadAllPaged] @PageSize=2147483644
*/
END

servers with multiple ips and HttpWebRequest

I’ve been working on a windows service that uses HttpWebRequest to scrape some data from a website. It was however important to use a certain IP from the 5 that the server had, and never touch the rest of them. The HttpWebRequest class doesn’t seem to provide with an obvious way to do this, or expose the socket it’s using for the connections.

Right when I was about to write my own HttpWebRequest class using nothing but sockets, I stumbled on the  ServicePoint.BindIPEndPointDelegate property. Aparently you can use that to set a callback at the time the socket receives it’s IP, and change it.

You can do something like this:

HttpWebRequest request = (HttpWebRequest) WebRequest.Create("http://MyServer");
request.ServicePoint.BindIPEndPointDelegate = new BindIPEndPoint(BindIPEndPointCallback);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();

Then just declare the callback, and return any local IP you want.

public static IPEndPoint BindIPEndPointCallback(ServicePoint servicePoint, IPEndPoint remoteEndPoint, int retryCount)
{
     return new IPEndPoint(IPAddress.Any,5000);
}

Also make sure you check the addressfamily of the remoteEndPoint. If your request goes to ipv6 don’t return an ipv4 ip :))