Principle and Use of External Memory Grouping in esProc
After data are imported from a data table, they usually need to be grouped, and the grouping and summarizing result needs to be worked out. In esProc, the groups function computes the grouping and summarizing result directly; alternatively, the data are grouped first and then analyzed and computed further.
Things are different when processing huge data, because the records cannot all be loaded into memory and distributed among the groups. In other cases the number of groups is so large that the grouping and summarizing result cannot be returned all at once. Both situations call for external memory grouping.
1. Grouping with cursor by directly specifying group numbers
Let's create a big, simple data table of employee information with three fields: employee ID, state and birthday. The IDs are generated in sequence; the states are abbreviations picked at random from the STATES table of the demo database; the birthdays are dates picked at random from the 10,000 days before 1994-12-31. The data table is stored as a binary file for convenience.
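To make the shape of this test data concrete, here is a minimal Python sketch of the same idea. It is not esProc code: the list STATE_ABBRS is a hypothetical stand-in for the STATES table, the function names are made up for illustration, "within 10,000 days before" is taken here to mean 1 to 10,000 days earlier, and pickle stands in for esProc's binary file format.

```python
import pickle
import random
from datetime import date, timedelta

# Hypothetical stand-in for the abbreviations in the STATES table of the demo database.
STATE_ABBRS = ["CA", "NY", "TX", "FL", "OH"]
END_DATE = date(1994, 12, 31)

def make_employees(count, seed=0):
    """Yield (id, state, birthday) tuples with sequential IDs."""
    rng = random.Random(seed)
    for emp_id in range(1, count + 1):
        state = rng.choice(STATE_ABBRS)
        # Assumption: the birthday falls 1 to 10,000 days before 1994-12-31.
        birthday = END_DATE - timedelta(days=rng.randint(1, 10000))
        yield (emp_id, state, birthday)

def write_binary(path, records):
    """Write records one at a time so the whole table never sits in memory."""
    with open(path, "wb") as f:
        for rec in records:
            pickle.dump(rec, f)

def read_binary(path):
    """Stream records back from the file, roughly the way a cursor would."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return
```

Because both the writer and the reader are streaming, the table can be far larger than memory, which is the situation the rest of this article deals with.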









It is thus clear that grouping the records of a cursor by directly specified group numbers returns a sequence of cursors over temporary files. Each temporary file holds the records of one group, and the data in each cursor can be processed further.
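The principle behind this kind of grouping can be sketched in Python as follows. This is only an illustration under stated assumptions, not esProc's implementation: group_to_temp_files and read_group are hypothetical names, group numbers are assumed to run from 1 to group_count, and pickle stands in for the temporary file format.

```python
import os
import pickle
import tempfile

def group_to_temp_files(records, group_no, group_count):
    """Distribute streamed records into group_count temporary files.

    group_no(record) must return an integer in 1..group_count.
    Returns the temp file paths, one per group, in group-number order.
    """
    paths, files = [], []
    for _ in range(group_count):
        fd, path = tempfile.mkstemp(suffix=".grp")
        files.append(os.fdopen(fd, "wb"))
        paths.append(path)
    try:
        for rec in records:
            # Only one record is held in memory at a time.
            pickle.dump(rec, files[group_no(rec) - 1])
    finally:
        for f in files:
            f.close()
    return paths

def read_group(path):
    """Stream the records of one group back from its temporary file."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return
```

Each returned path plays the role of one cursor in the sequence: calling read_group on it streams that group's records for further processing. In this sketch the caller is responsible for deleting the temporary files afterwards.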
2. Grouping and summarizing result sets of huge data
When grouping the data of a cursor, we usually do not need the detailed records of each group; we only need the grouping and summarizing result. To get the number of employees of each state from BirthStateRecord, for example, we use the groups function to compute the grouping and summarizing result:








It can be seen that each temporary file holds the grouping and summarizing result, by employees' birthdays, of one part of the data. esProc merges all the temporary files and returns them as one larger cursor. Note that when generating the temporary files, esProc chooses a number of groups that suits the computation, so the number of rows in a temporary file will be slightly larger than the number of rows set for the buffer.
Continue executing the remaining cells of the previous cellset file. When the cursor is closed in A5, the temporary files are automatically deleted. A4 fetches the first 1,000 birthdays from the cursor generated in A3; the number of employees for each birth date is as follows:
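The general technique of summarizing part by part and then merging can be sketched in Python as below. It is a simplified illustration, not esProc's actual algorithm: count_by_key and its helpers are hypothetical names, the buffer limit is expressed here as a maximum number of distinct keys (max_groups) rather than rows, and the only aggregate shown is a count.

```python
import heapq
import os
import pickle
import tempfile
from itertools import groupby
from operator import itemgetter

def _spill(partial):
    """Write one partial result, sorted by key, to a temporary file."""
    fd, path = tempfile.mkstemp(suffix=".part")
    with os.fdopen(fd, "wb") as f:
        for item in sorted(partial.items()):
            pickle.dump(item, f)
    return path

def _read(path):
    """Stream (key, count) pairs back from a temporary file."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return

def count_by_key(records, key, max_groups):
    """Yield (key, count) in key order, keeping at most max_groups keys in memory."""
    paths, partial = [], {}
    for rec in records:
        k = key(rec)
        if k not in partial and len(partial) >= max_groups:
            # Buffer is full: spill this partial result and start a new one.
            paths.append(_spill(partial))
            partial = {}
        partial[k] = partial.get(k, 0) + 1
    if partial:
        paths.append(_spill(partial))
    try:
        # Every temp file is sorted by key, so a k-way merge brings equal keys together.
        merged = heapq.merge(*(_read(p) for p in paths), key=itemgetter(0))
        for k, items in groupby(merged, key=itemgetter(0)):
            yield k, sum(count for _, count in items)
    finally:
        # Delete the temporary files once the merged result is exhausted or closed.
        for p in paths:
            os.remove(p)
```

The result is a generator, so the merged summary is also consumed piece by piece, for example with itertools.islice to take only the first 1,000 groups. The same key can appear in several temporary files, which is why the merge step has to add the partial counts together.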
