“I know there's a proverb which that says 'To err is human,' but a human error is nothing to what a computer can do if it tries.”
- Agatha Christie, Hallowe'en Party

Top Ten Tags

Who's Online

I came across a question at dbforums.com regarding how to get the date and value of the previous month's record. X had a table that store a date and a value. The date field was entered as the last date of the month and there was only one date entered for every month. This date field would be used for the Month-Year. What X was looking for was a way to return all records for each 'month', as well the previous months record for each record.

This can be done using an SQL statement for a query field; i.e. a correlated subquery. A correlated subquery will look at the current record being return and execute the SQL statement for that record. There is a limitation that you need to understand about using a correlated subquery: A correlated subquery must return only one value, i.e. one field and one record.

Assuming we have a table, tblDate, that has the following two fields:

  • dtDate (Date/Time)
  • lngValue (Long Integer)

And let's also assume that only one date and number is entered for every month of the year.

The following SQL will return all the records from tblData and for each record returned it will also return the previous month's data. The one exception will be the very first month; since there is no previous month to return, there is value for our correlated subquery to return.

SELECT dtDate, lngValue, 
       (SELECT TOP 1 a.dtDate 
               FROM tblData As a 
               WHERE a.dtDate < tblData.dtDate 
               ORDER BY a.dtDate DESC) 
       AS PrevDate, 
       (SELECT TOP 1 a.lngValue 
               FROM tblData As a 
               WHERE a.dtDate < tblData.dtDate 
               ORDER BY a.dtDate DESC) 
       AS PrevValue, 
       (SELECT Sum(a.lngValue) 
               FROM tblData As a 
               WHERE (a.dtDate <= tblData.dtDate) 
                      And (Year(a.dtDate) = Year(tblData.dtDate))) 
       AS YTDValue
FROM tblData;

This query works by returning the first record from tblData where dtDate is less than the current record being returned by our SQL statement by use of the TOP 1 statement. A 'TOP N' statement returned only a specified number of records returned by an SQL statement. The WHERE clause returns only those records where dtDate is less then the current record's dtDate value. By ordering the returned records in a descending order, the first record will be the largest date. By using 'TOP 1' in our statement, we are returning the biggest date. There is a potential problem with using TOP 1 but I will return to that shortly.

Remember that we have made the assumption that every month of the year has data. Well, if there is a gap in data then the correlated subquery will return the most recent data where the date is less that the current record being returned by our query. This may or may not be what you want. Your situation may require additional criteria in the WHERE clause of your subquery.

If you want to only return date for the previous month for each returned record, we must change out WHERE clause to only return a record if the difference in months is 1.

SELECT Format(dtDate,"yyyy-mm") AS dtMonthYear,  Sum(lngValue) AS SumOflngValue, 
       (SELECT a.dtDate 
        FROM tblData As a 
        WHERE Format(a.dtDate,"yyyy-mm")=Format(tblData.[dtDate],"yyyy-mm")) 
       AS PrevMonth
FROM tblData
GROUP BY Format(dtDate,"yyyy-mm"), tblData.lngValue;

In this example, the correlated subquery will only return a value if there is a record in the previous month due to the use of DateDiff() in the WHERE clause.

A while back, I mentioned that there was a potential problem using a 'TOP 1' statement. Even though a 'TOP 1' statement should return only 1 record, if there are two records that have the same date as the largest date matching our WHERE clause, both of them would be returned. We can get around this by using a SELECT DISCTINT statement, this will cause the correlated subquery to only return unique dates.

However, this brings me to the next potential problem. Let's assume that two records with the same date are entered. Our correlated subquery (SELECT DISTINCT TOP 1) to return the previous date will work, by using SELECT DISTINCT to return only one date. BUT, if the two records have a different values of lngValue for each record, which one is correct one to return? You use the TOP 1 statement and order the correlated subquery by lngValue to return either the highest or lowest of lngValue. Or you can sum both of them using SUM(). But if it differs from circumstance to circumstance, you will need some other method to identify the correct one in the correlated subquery. If there should only be one value for each date, you can avoid this issue by creating an index of dtDate and maing it unique so that there can be no duplicates, or you could even use dtDate as your primary key.

In the situation where a date is stored and used for as a Month/Year field, it would be common for format the date field (which includes the day) as YYYY-MM using Format(MyDateField,"yyyy-mm") - this is commonly done in order to facilitate ordering of month-year. Now if we grouped our query by a the formatted date field and numeric field, you may return the same month twice but with different values for the numeric field, unless you sum the numeric field.

It is important that you are aware of how a correlated subquery works and what would cause it not to work. I recommend working with a small set of data and creating data that fits and does not fit what you want your data to look like. Remember, just because you want just one date to be entered in your data table for each month-year, it doesn't mean that it can't happen so you need to understand what happens when that is the case.

Another important consideration is the amount of data that you will be working with. If you have a lot of data, using a correlated subquery can be quite slow, so be sure to test it with a representative data set that has as many records as you anticipate working with. If you haven't already thought of it, you can use the DMax() function to accomplish almost the same thing. Keep in mind that DMax allows for only one criteria. Also, DMax() will be slower that using a subquery. Another way to accomplish this is to write code but that is another article!

Hope that helps! If you have another situation that I haven't touched on or if you have any questions on how to apply this to your own database, please feel free to post a question in the forum (in either the article or sample database sections)

Links

SQL subqueries (Microsoft Access SQL)