Saturday, November 18, 2017

How to Use Excel Pivot Tables to Summarize Continuous Data

Suppose you have a set of data points that represent some quantity in a volume.  It could be temperature in a solid, for example.  How can you use a Pivot Table find the average temperature in a region on the solid’s surface?

Well, if you’re lucky, the region is rectangular, bounded by some Xmin and Xmax and Ymin and Ymax.  Then you can use the built-in Label Filters.

I created a set of data for this article with -20<=X<=20, -20<=Y<=20, -20<=Z<=20.  The temperature is a function of X,Y,Z using the (arbitrary) formula =2.5*X+0.25*Y^2+2.5*SIN(Z).

First, set up the Pivot Table and chose the option to Allow multiple filters per field.


In the screenshots below, we define a Region with Label Filters in the PivotTable Rows within X=[-2,8] and Y=[5,7].  The function is not dependent on the Z coordinate – we’ll use a Page Filter to constrain Z to 0.  As can be seen in the status line of the fourth screenshot below, the average temperature in that region is 16.67 degrees C.






But suppose the region is more complicated than a rectangle, or you want to summarize multiple regions?

You can define a function in VBA to “discretize” the data, as shown in the code sample, below.
Option Explicit

Function Region(rngX As Range, rngY As Range, rngZ As Range) As String
Const regA_Xmin = -2#: Const regA_Xmax = 8#
Const regA_Ymin = 5#: Const regA_Ymax = 7#
Const regB_Xmin = -6#: Const regB_Xmax = -3#
Const regB_Ymin = 5#: Const regB_Ymax = 7#
Const regC_Xmin = -2#: Const regC_Xmax = 8#
Const regC_Ymin = -5#: Const regC_Ymax = -3#
Const regD_Xmin = 9#: Const regD_Xmax = 12#
Const regD_Ymin = 5#: Const regD_Ymax = 7#
Const regE_Xmin = 16#: Const regE_Xmax = 18#
Const regE_Ymin = -15#: Const regE_Ymax = -7#
Dim x As Double, y As Double, z As Double

x = rngX.Value: y = rngY.Value: z = rngZ.Value

If x >= regA_Xmin And x <= regA_Xmax And _
   y >= regA_Ymin And y <= regA_Ymax Then
        Region = "Region A"
ElseIf x >= regB_Xmin And x <= regB_Xmax And _
       y >= regB_Ymin And y <= regB_Ymax Then
        Region = "Region B"
ElseIf x >= regC_Xmin And x <= regC_Xmax And _
       y >= regC_Ymin And y <= regC_Ymax Then
        Region = "Region C"
ElseIf x >= regD_Xmin And x <= regD_Xmax And _
       y >= regD_Ymin And y <= regD_Ymax Then
        Region = "Region D"
Else
    Region = "NA"
End If

End Function

The function accepts three Range type parameters, each of which are intended to be the address of a single cell.  It returns one of five strings: Region A, Region B, Region C, Region D, NA.

The first four constants that are defined in the first two lines are minimum and maximum X and Y coordinates for the region that were defined in the previous example with Label Filters.  So the string “Region A” is returned if X and Y are within those X-Y intervals.

When we type in cell E2 =Region(A2, B2, C2) and then fill down, Excel calls the function for each of the cells that the formula occupies and then puts the result in that cell.  It can take several seconds for all the cells to be populated, even with an i5 processor.  And the calculation can increase the drain on the battery if you’re doing this unplugged / on the road.

But when it’s complete, you can set up your Pivot Table with the regions in the row.  The average of the temperature values for Region A is shown in cell B5 below as 16.67.



No comments: