In this blog we will learn another sql server topics which is very important for every sql server developer, here the topics which we will cover in this: Summarizing Data, Grouping Data: Grouping, Filtering Groups:Filtering, Grouping and Sorting, Working with Subqueries:Queries, Filtering by Subquery, Subqueries as Calculated Fields:Queries, Inserting by Subquery, Update by Subquery, Delete by Subquery, Aggregation.
Summarizing Data
Data aggregation is the process of taking numerous records and collapsing them into a single summary record. When aggregating data, one must decide what records must be considered in the summary and how these records should be summarized. Data can be summarized based on certain fields (attributes) in a database or derived attributes. Once the data to summarize has been decided on, the next step is to decide how to aggregate the data.
The most commonly used aggregate functions are listed below.
Count(SomeField)
Avg(SomeField)
Sum(SomeField)
Max(SomeField)
Min(SomeField)
The following is the SELECT statement syntax used to get a count of the number of records in a table:
Basic Syntax
SELECT
Count(*) AS NewColumnName
FROM
tablename
Group By
columnName
The following is the SELECT statement syntax used to get a count the number of records in the ProductVendor table:
Write Exercise Code:
SELECT
Count(*) AS TotalNumberProducts
From
ProductVendor
The following is the SELECT statement syntax used to get a count the number of records in the ProductVendor table using the OnOrderQty field:
Write Exercise Code:
SELECT
Count(OnOrderQty) AS TotalNumberProducts
From
ProductVendor
This returns 155 records. That is because 305 records have NULL values.
The following is the SELECT statement syntax used to get the Average value of a specified column in a table:
Basic Syntax
SELECT
AVG(columnname) AS NewColumnName
FROM
tablename
The following is the SELECT statement syntax used to get the Average Standard Price in the ProductVendor table:
Write Exercise Code:
Select
AVG(StandardPrice) AS AvgPrice
From
ProductVendor
The following is the SELECT statement syntax used to get the total value of a column for a group of rows in a table:
Basic Syntax
SELECT
SUM(columnname) AS NewColumnName
FROM
tablename
WHERE
columnName = Value
The following is the SELECT statement syntax used to SUM of Order quantity for a specific Product in the SalesOrderDetail table:
Write Exercise Code:
Select
SUM(OrderQty) AS TotalOrder
From
SalesOrderDetail
WHERE
ProductID = 777
In the following SELECT statement returns the MIN item_price
Basic Syntax
SELECT
MIN(columnname) As AliasName
FROM
tablename
In the following SELECT statement returns the MIN unit price
Write Exercise Code:
SELECT
MIN(UnitPrice) As MinPrice
FROM
PurchaseOrderDetail
Result
MinPrice
---------------------
0.21
(1 row(s) affected)
In the following SELECT statement returns the Max unit price
Basic Syntax
SELECT
MAX(columnname) As AliasName
FROM
Tablename
In the following SELECT statement returns the Max unit price.
Write Exercise Code:
SELECT
Max(UnitPrice) As MaxPrice
FROM
PurchaseOrderDetail
Result
MaxPrice
---------------------
482.8345
In the following SELECT statement returns the MIN and Max unit price.
Write Exercise Code:
SELECT
MIN(UnitPrice) As MinPrice, Max(UnitPrice) As MAXPrice
FROM
PurchaseOrderDetail
In more common terms, this request could be similar to. “Can you help me get the highest price PO detail item, and the lowest priced one?”
Result
MinPrice MaxPrice
--------------------- ---------------------
0.21 82.8345
(1 row(s) affected)
Grouping Data:
Grouping Grouping allows you to divide data INTO group sets so you can perform some calculations on each group. The Group By clause can have many columns. The Group By is an example of an operation that allows you to return calculated data in groups based on the group By clause. The following is the SELECT statement syntax used for specific functions:
Basic Syntax
SELECT
columnname, Count(*) AS NewColumnName
FROM
tablename
Group By
columnName
The following is the SELECT statement syntax used to get a count the number of products each vendor has:
Write Exercise Code:
SELECT
VendorID, Count(*) AS TotalNumberProducts
From
ProductVendor
Group By
VendorID
In more common terms, this request could be similar to. “Can you help me get report for the number of products carried by each of our vendors?”
Result
VendorID TotalNumberProducts
---------- -------------------
1 1
2 3
3 1
4 2
5 1
(Not all rows displayed)
Filtering Groups:
Filtering Grouping allows you to divide data into group sets so you can perform some calculations on each group. Filtering allows you to include or exclude specific data. The WHERE clause does not work here because we are filtering by group and not rows. Instead of using the WHERE clause we use the Having Clause. The main difference between the two is WHERE clause filters rows and Having filters by groups. The Having clause is an example of an operation that allows you to return calculated data in groups but filtering for specific group totals.
Basic Syntax
SELECT
columnname, Count(*) AS NewColumnName
FROM
tablename
Group By
columnName
Having
Count(*) operator value
The following is the SELECT statement syntax that will not work because it uses the Having clause in the WHERE clause. Write this code and test it:
Write Exercise Code:
Select
EmployeeID, Sum(VacationHours)
FROM
Employee
WHERE
Sum(VacationHours) > 20
GROUP BY
EmployeeID
Result
N/A/
The following is the SELECT statement syntax that will work because it does not use the Having clause in the WHERE clause. Write this code and test it:
Write Exercise Code:
Select
EmployeeID, Sum(VacationHours)
FROM
Employee
GROUP BY
EmployeeID
HAVING
Sum(VacationHours) > 20
Result
EmployeeID (No column name)
1 21
2 42
4 48
6 40
(Not all rows displayed)
The following is the SELECT statement syntax used for excluding specific data using the Having clause:
Write Exercise Code
SELECT
VendorID, Count(*) AS TotalProducts
From
ProductVendor
Group By
VendorID
Having
Count(*) >=10
Result
VendorID TotalProducts
----------- -------------
7 30
23 23
25 16
(Not all rows displayed)
You could however, include a WHERE clause in the SELECT statement. You could specify that the result set only include those rows that standard price are at least 50 but having a group count of 10 or more:
Write Exercise Code
SELECT
VendorID, Count(*) AS TotalProducts
From
ProductVendor
WHERE
StandardPrice <=50
Group By
VendorID
Having
Count(*) >=10
In more common terms, this request could be similar to. “Can you help me get a Vendor report for the vendors that have 10 or more products. Oh and the products should be $50 or less in price.”
Result
VendorID TotalProducts
----------- -------------
7 30
23 23
25 16
32 16
(Not all rows displayed)
Grouping and Sorting
Grouping and Ordering do different but related functions. Grouping allows you to divide data into group sets so you can perform some calculations on each group.
Filtering allows you to include or exclude specific data. The WHERE clause does not work here because we are filtering by group and not rows. Instead of using the WHERE clause we use the Having Clause. The main difference between the two is WHERE clause filters rows and Having filters by group.
The Having clause is an example of an operation that allows you to return calculated data in groups but filtering for specific group totals.
Basic Syntax
SELECT
columnname, Count(*) AS NewColumnName
FROM
tablename
Group By
columnName
Having
Count(*) operator value
Order BY
columnName
The following SELECT statement is similar to previous example. Here you get a result set with TotalCostQuantity in default order where sum is >= 1000:
Write Exercise Code:
SELECT
SalesOrderID, Sum(OrderQty*UnitPrice) AS TotalCostQuantity
FROM
SalesOrderDetail
Group By
SalesOrderID
Having
Sum(OrderQty*UnitPrice)>=1000
In more common terms, this request could be similar to.
“Can you help me get a Sales order where the total cost of the order is $1000 or above? I’d like the report show show the Sales Order ID, and the total cost of the order.”
Result
SalesOrderID TotalCostQuantity
------------ ---------------------
43659 20565.6206
43660 1294.2529
43661 32726.4786
(Not all rows displayed)
Write Exercise Code:
The following SELECT statement is similar to previous example except we are changing the data type to decimal with two decimals to the right. Here we are using the Cast to change the data type. Here you get a result set with TotalCostQuantity in default order where sum is >= 1000:
SELECT
SalesOrderID, Cast(Sum(OrderQty*UnitPrice) As decimal(10,2)) AS TotalCostQuantity FROM
SalesOrderDetail
Group By
SalesOrderID
Having
Sum(OrderQty*UnitPrice)>=1000
Result
SalesOrderID TotalCostQuantity
------------ ---------------------------------------
43659 20565.62
43660 1294.25
43661 32726.48
(Not all rows displayed)
In this following SELECT statement is similar to previous example but here you add the Order By to order by Total order:
Write Exercise Code:
SELECT
SalesOrderID, Cast(Sum(OrderQty*UnitPrice) As decimal(10,2)) AS TotalCostQuantity FROM
SalesOrderDetail
Group By
SalesOrderID
Having
Sum(OrderQty*UnitPrice)>=2000
Order BY
SalesOrderID ,TotalCostQuantity
Result
SalesOrderID TotalCostQuantity
------------ ---------------------------------------
43659 20565.62
43660 1294.25
43661 32726.48
(Not all rows displayed)
Working with Subqueries:
Queries So far we have been using single statements to return data from tables. Queries refer to SELECT statements. Filtering by Subquery SQL allows you to embed sub queries INTO other queries. In order for you to assure you are returning correct data sets you tables should be in a Relational Database. If you wanted to retrieve customers who ordered a product you would have to do several steps.
In each of the steps you would use the result set to populate the WHERE clause. The following is a set of SELECT statements that retrieves the required data for the seceding SELECT statements.
The basic syntax is as follows:
Basic Syntax
SELECT
column_name
FROM
table
WHERE
column_name = 'VALUE'
The following SELECT statement syntax is the first step used to get the required data from the SalesOrderDetail table. First we want those order numbers from SalesOrderDetail table WHERE the product ID = 897
Write Exercise Code
SELECT
SalesOrderID
FROM
SalesOrderDetail
WHERE
ProductID = 897
Result
SalesOrderID
------------
51823
51875
(2 row(s) affected)
Next we use the results from previous SELECT statement to populate the WHERE clause in the following SELECT statement. Now, we use the IN keyword in the WHERE clause to return the customer IDs from the Sales Order Header table. This SELECT statement returns two customer IDs. We now know that these two customers are the only ones with Product ID 897 orders.
Write Exercise Code:
SELECT
CustomerID
FROM
SalesOrderHeader
WHERE
SalesOrderID
IN (51823,51875)
Result
CustomerID
-----------
520
393
(2 row(s) affected)
Now let’s combine the two previous SELECT statements using the first SELECT statement as a subquery.
This SELECT statement with a subquery returns the same two rows as did the previous two SELECT statements.
Write Exercise Code:
SELECT
CustomerID
FROM
SalesOrderHeader
WHERE
SalesOrderID
IN
( SELECT SalesOrderID FROM SalesOrderDetail WHERE ProductID = 897 )
Result
CustomerID
-----------
393
520
(2 row(s) affected)
Now we need to return the customer information from those customers who ordered product ID 897. The following SELECT statement retrieves the data we want using the result set from the previous SELECT statement.
These SELECT statements hard code the two customers who ordered product 897 as well as combines the two fields:
Write Exercise Code:
SELECT
(FirstName+' '+LastName) As CustomerName
FROM
Contact
WHERE
ContactID
IN (393,520)
Result
------------------------------------------------------------------------------------
Geneva Hill
Paulo Lisboa
(2 row(s) affected)
Let’s return the same data set but using a subquery. This statement performed three separate statements starting from the innermost subquery. This is what we call nested queries.
This SELECT statement uses a subquery to return the required customer data set:
Write Exercise Code:
SELECT
(FirstName+' '+LastName) AS CustomerName
FROM
Contact
WHERE
ContactID
IN
( SELECT ContactID FROM SalesOrderHeader WHERE SalesOrderID IN ( SELECT SalesOrderID FROM SalesOrderDetail WHERE ProductID = 897 ) )
In more common terms, this request could be similar to. “We need to do a product recall on Product ID 897. Can you pull a report for all customers (just First and Last name combined) who have an Sales order in our system?”
Result
CustomerName
-------------------------------------------------------------------
Mari Caldwell
Denis Taylor
(2 row(s) affected)
Subqueries as Calculated Fields:
Queries We can also use sub queries to create calculated fields. For example, how would we get the total number of orders placed by every customer in the customer table? Here, like in previous examples, we can use the SELECT Count(*) to count rows in a table. By adding a specific contact ID in the WHERE clause filter we can get the total orders for that customer.
EXAMPLE:
The following syntax is a SELECT statement that retrieves total orders for a specified customer:
Write Exercise Code:
SELECT
Count(*) AS Orders
FROM
SalesOrderHeader
WHERE
ContactID =395
Result
Orders
-----------
4
(1 row(s) affected)
Now, let’s provide the total orders for each customer in the contact table as a sub query
The following SELECT statement syntax uses a subquery to return total orders for each customer in the Contact table. Notice we also concatenate the First and Last name.
Write Exercise Code:
SELECT
FirstName+' '+LastName AS Customer, (SELECT Count(*)
FROM
SalesOrderHeader
WHERE
SalesOrderHeader.ContactID = Contact.ContactID)AS Orders
FROM
Contact
Order BY
Customer
Result
Cumstomer Orders
----------- -----------
A. Leonetti 3
A. Scott Wright 0
A. Wright 0
Aaron Adams 1
Aaron Alexander 1
Aaron Allen 1
Aaron Baker 1
Aaron Bryant 2
Aaron Butler 1
This subquery is different because we needed to use the fully qualified column names. We used fully qualified names because we referenced the outer query. This is called a Correlated Sub query – A subquery that refers to the outer query. This syntax is used to clear up any possible ambiguity about the column names.
The following syntax is a SELECT statement that retrieves customer total orders without using fully qualified names for columns:
Write Exercise Code
SELECT
FirstName+' '+LastName AS Customer, (SELECT Count(*)
FROM
SalesOrderHeader s
WHERE
s.ContactID = c.ContactID) AS Orders
FROM
Contact c
Order BY
Customer
In more common terms, this request could be similar to. “We’re doing a customer order audit. Can you provide a report of our customers order by name and the number of orders they’ve had with us to date?”
Result
Customer Orders
---------------------- -----------
A. Leonetti 31465
A. Scott Wright 31465
A. Wright 31465
Aaron Adams 31465
Aaron Alexander 31465
Aaron Allen 31465
Aaron Baker 31465
(Not all rows displayed)
Inserting by Subquery
Subqueries also can be used with INSERT statements. The INSERT statement uses the data returned from the subquery to insert into another table. The selected data in the subquery can be modified with any of the character, date or number functions.Basic
Syntax
INSERT INTO
table_name (column1 , column2 )
SELECT
column1, column2
FROM
table1
WHERE
VALUE OPERATOR
Example
Write Exercise Code
Insert Into Location
( Name, CostRate, Availability, ModifiedDate )
Values
( 'Test', 10, 10, GetDate())
Update by Subquery
The subquery can be used in conjunction with the UPDATE statement. Either single or multiple columns in a table can be updated when using a subquery with the UPDATE statement.
UPDATE
table
SET column_name = new_value
WHERE OPERATOR VALUE
(
SELECT COLUMN_NAME
FROM TABLE_NAME WHERE
)
Example
Write Exercise Code:
Update Location
Set Name = 'Test3'
Where LocationID =
(
Select
LocationID
FROM Location
Where LocationID = 2
)
Delete by Subquery
The subquery can be used in conjunction with the DELETE statement like with any other statements mentioned above.
Syntax:
DELETE
FROM
TABLE_NAME
WHERE
OPERATOR VALUE
(
SELECT COLUMN_NAME FROM TABLE_NAME WHERE
)
Example
Write Exercise Code:
Delete Location
Where LocationID =
(
Select LocationID FROM Location Where LocationID = 62
)
General Practice Example Aggregation
You are working with a team to develop a fun holiday pricing plan. You are asked to generate a product report that lists the average (called AVG), minimum(called MIN) and maximum(called MAX) list prices for products, by color... yes by color. The report should exclude all products that don't have prices(e.g.List Price = 0), and any products that don't have a color assigned (e.g. NULL). No subqueries in the answer
Write the SQL statement to accomplish this.
Answer
SELECT
Color, AVG(ListPrice) AS AVG, MIN(ListPrice) AS MIN, MAX(ListPrice) AS MAX
FROM
Product
WHERE
ListPrice >0 AND Color IS NOT NULL
GROUP BY
Color
More Practice with Aggregation
You are working with Sales to generate another sales report. This report needs to show the number of sales orders (called numOrders) for each product sold (Note: not the quantity of order items), and needs to exclude products that have generated fewer than 500 orders. Additionally, the report should be ordered so the products with the largest number of sales is at the top, and the lowest as at the bottom. No subqueries. Write the SQL statement to accomplish this.
Answer
SELECT
ProductID, COUNT(*) AS numOrders
FROM
SalesOrderDetail
GROUP BY
ProductID
HAVING
COUNT(*) >=500
ORDER BY
COUNT(*) DESC
Comments