MyPage is a personalized page based on your interests.The page is customized to help you to find content that matters you the most.


I'm not curious

Principle and Use of External Memory Grouping in esProc

Published on 19 September 14
192
0
0

After data are imported from a data table, they are usually grouped as required and grouping and summarizing result is needed to be worked out. In esProc, groups function is used to compute the result of data grouping and summarizing; or the function will first group the data, then further analysis and computation are to be performed later.

But the case will be different in processing huge data, for the records cannot be loaded to the memory all together and distributed into each group. Other times the number of groups is huge and the grouping and summarizing result cannot even be returned all at once. In these two occasions, the external memory grouping is required.


1. Grouping with cursor by directly specifying group numbers

Letâs create a big, simple data table containing employee information, which includes three fields: employee ID, state and birthday. The serial numbers are generated in order and the states are written in their abbreviated forms obtained arbitrarily from the STATES table of demo database; birthdays are the dates selected arbitrarily within 10,000 days before 1994-12-31.The data table will be stored as a binary file for convenience.

Principle and Use of External Memory Grouping in esProc - Image 1
Altogether 1,000,000 rows of data are generated. The result of reading the 50,001th ~51,000th rows of data with cursor can be seen in C10 as follows:
Principle and Use of External Memory Grouping in esProc - Image 2
In the following, weâll take the generated data file, BirthStateRecord, as an example to explore how to group in cursor computing by directly specifying group numbers. Because the data of the big data table cannot be loaded all together into the memory, we cannot perform grouping on it as we do on an ordinary table sequence. To solve this problem, esProc offers cs.groupx(x) function which can distribute the records in cursor cs according to the computed result of expression x into groups with specified serial numbers and return the sequence of cursor. For example:
Principle and Use of External Memory Grouping in esProc - Image 3
To explain the way in which cs.groupx(x) function performs grouping by using external memory to specify group numbers, the code will be executed step by step by clicking in the debugging area of the toolbar until A6. A2 creates a cursor with the binary data file BirthStateRecord. A4 creates a sequence using the statesâ abbreviations in STATES table.A5 uses groupx function to groupdata of the cursor; in this process, we need to find the corresponding serial numbers of the groups in A4 according to states and make them as the group numbers. During the execution of groupx function in A5,a temporary file will be generated for each group to record the grouping result and the sequence of temporary cursor files will be returned as follows:
Principle and Use of External Memory Grouping in esProc - Image 4
While the code in A5 is executed, external files are generated in the directory of temporary files:
Principle and Use of External Memory Grouping in esProc - Image 5
In the operation of groupx function, the number of temporary files equals that of the groups of records. We can import data from one of the temporary files:
Principle and Use of External Memory Grouping in esProc - Image 6
The data A2 imports are as follows:
Principle and Use of External Memory Grouping in esProc - Image 7
It can be seen that the data of a temporary file is, in fact, the employee information of a state. Here it is the data of the state of Missouri. Click on the toolbar in the previous cellset file and go on with the execution of this cellset. In A8, when all cursor files are closed, the temporary files will be deleted automatically. A6 reads from the first cursor file the employee information of the state of Alabama as follows:
Principle and Use of External Memory Grouping in esProc - Image 8
A7 works out the grouping and summarizing result of the 22nd group using groups function, that is, the number of employees from the state of Michigan:
Principle and Use of External Memory Grouping in esProc - Image 9

It is thus clear that a sequence consisting of temporary cursor files will be returned in grouping records of cursors using directly specified group numbers. Each cursor file contains the records of a group and the data in a cursor can be further processed.

2. Grouping and summarizing result sets of huge data

When grouping data of cursors, most of the time we neednât to know the detailed data of each group. What we only need is to get the grouping and summarizing result. To get the number of employees of each state from BirthStateRecord, for example, we use groups function to compute the grouping and summarizing result:

Principle and Use of External Memory Grouping in esProc - Image 10
Thus we can get the result in A3:
Principle and Use of External Memory Grouping in esProc - Image 11
Here we notice that the groups function for grouping and summarizing will return a table sequence of the result after the computation is completed. In the operation of processing massive data, sometimes it is required to produce a great many groups and the result set of grouping and summarizing itself is too big to be returned. Such as the telecom company makes statistics of each customerâs bill; online shopping malls make statistics by groups about the sales of each commodity, and the like. In these cases, the use of groups function may result in a memory overflow. We can use groupx(x:F,â¦;y:F,â¦;n) function instead to perform grouping and summarizing with the help of external memory. In the function, n represents the number of rows in buffer area. For example:
Principle and Use of External Memory Grouping in esProc - Image 12
Still, the code is executed step by step until A4. In A3, groupx function uses external memory to perform grouping and summarizing. In cursor computing, groupx function is used in the operations of both grouping and summarizing with external memory and grouping by directly specified group numbers. The difference of the two operations lies in the parameters. A3 performs grouping by employeesâ birthdays, then summate the number of employees born each day. In the operation, the number of rows in buffer area is set as 1,000. The result returned by A3 is a cursor as follows:
Principle and Use of External Memory Grouping in esProc - Image 13
After the code in A3 is executed, external files will be generated in the directory of temporary files:
Principle and Use of External Memory Grouping in esProc - Image 14
The data of one of the temporary files can be imported:
Principle and Use of External Memory Grouping in esProc - Image 15
The data A2 imports are as follows:
Principle and Use of External Memory Grouping in esProc - Image 16
The data of A3 is as follows:
Principle and Use of External Memory Grouping in esProc - Image 17

It can be seen that each temporary file is the grouping and summarizing result of a part of the data obtained according to employeesâ birthdays. A larger cursor composed of all temporary files will be merged and returned by esProc. When the temporary files are generated, esProc will select a group number suitable for computing, so the rows of data in the temporary files will be a little more than the number of rows we set in buffer area. Special attention is needed in this point.

Go on with the execution of cellsets in the previous cellset file. When cursors are closed in A5, the temporary files will be auotomatically deleted. A4 fetches the first 1,000 birthdays from the cursor generated in A3, and the numbers of employees of each birth date are as follows:
Principle and Use of External Memory Grouping in esProc - Image 18

After data are imported from a data table, they are usually grouped as required and grouping and summarizing result is needed to be worked out. In esProc, groups function is used to compute the result of data grouping and summarizing; or the function will first group the data, then further analysis and computation are to be performed later.

But the case will be different in processing huge data, for the records cannot be loaded to the memory all together and distributed into each group. Other times the number of groups is huge and the grouping and summarizing result cannot even be returned all at once. In these two occasions, the external memory grouping is required.

1. Grouping with cursor by directly specifying group numbers

Letâs create a big, simple data table containing employee information, which includes three fields: employee ID, state and birthday. The serial numbers are generated in order and the states are written in their abbreviated forms obtained arbitrarily from the STATES table of demo database; birthdays are the dates selected arbitrarily within 10,000 days before 1994-12-31.The data table will be stored as a binary file for convenience.

Principle and Use of External Memory Grouping in esProc - Image 1

Altogether 1,000,000 rows of data are generated. The result of reading the 50,001th ~51,000th rows of data with cursor can be seen in C10 as follows:

Principle and Use of External Memory Grouping in esProc - Image 2

In the following, weâll take the generated data file, BirthStateRecord, as an example to explore how to group in cursor computing by directly specifying group numbers. Because the data of the big data table cannot be loaded all together into the memory, we cannot perform grouping on it as we do on an ordinary table sequence. To solve this problem, esProc offers cs.groupx(x) function which can distribute the records in cursor cs according to the computed result of expression x into groups with specified serial numbers and return the sequence of cursor. For example:

Principle and Use of External Memory Grouping in esProc - Image 3

To explain the way in which cs.groupx(x) function performs grouping by using external memory to specify group numbers, the code will be executed step by step by clicking in the debugging area of the toolbar until A6. A2 creates a cursor with the binary data file BirthStateRecord. A4 creates a sequence using the statesâ abbreviations in STATES table.A5 uses groupx function to groupdata of the cursor; in this process, we need to find the corresponding serial numbers of the groups in A4 according to states and make them as the group numbers. During the execution of groupx function in A5,a temporary file will be generated for each group to record the grouping result and the sequence of temporary cursor files will be returned as follows:

Principle and Use of External Memory Grouping in esProc - Image 4

While the code in A5 is executed, external files are generated in the directory of temporary files:

Principle and Use of External Memory Grouping in esProc - Image 5

In the operation of groupx function, the number of temporary files equals that of the groups of records. We can import data from one of the temporary files:

Principle and Use of External Memory Grouping in esProc - Image 6

The data A2 imports are as follows:

Principle and Use of External Memory Grouping in esProc - Image 7

It can be seen that the data of a temporary file is, in fact, the employee information of a state. Here it is the data of the state of Missouri. Click on the toolbar in the previous cellset file and go on with the execution of this cellset. In A8, when all cursor files are closed, the temporary files will be deleted automatically. A6 reads from the first cursor file the employee information of the state of Alabama as follows:

Principle and Use of External Memory Grouping in esProc - Image 8

A7 works out the grouping and summarizing result of the 22nd group using groups function, that is, the number of employees from the state of Michigan:

Principle and Use of External Memory Grouping in esProc - Image 9

It is thus clear that a sequence consisting of temporary cursor files will be returned in grouping records of cursors using directly specified group numbers. Each cursor file contains the records of a group and the data in a cursor can be further processed.

2. Grouping and summarizing result sets of huge data

When grouping data of cursors, most of the time we neednât to know the detailed data of each group. What we only need is to get the grouping and summarizing result. To get the number of employees of each state from BirthStateRecord, for example, we use groups function to compute the grouping and summarizing result:

Principle and Use of External Memory Grouping in esProc - Image 10

Thus we can get the result in A3:

Principle and Use of External Memory Grouping in esProc - Image 11

Here we notice that the groups function for grouping and summarizing will return a table sequence of the result after the computation is completed. In the operation of processing massive data, sometimes it is required to produce a great many groups and the result set of grouping and summarizing itself is too big to be returned. Such as the telecom company makes statistics of each customerâs bill; online shopping malls make statistics by groups about the sales of each commodity, and the like. In these cases, the use of groups function may result in a memory overflow. We can use groupx(x:F,â¦;y:F,â¦;n) function instead to perform grouping and summarizing with the help of external memory. In the function, n represents the number of rows in buffer area. For example:

Principle and Use of External Memory Grouping in esProc - Image 12

Still, the code is executed step by step until A4. In A3, groupx function uses external memory to perform grouping and summarizing. In cursor computing, groupx function is used in the operations of both grouping and summarizing with external memory and grouping by directly specified group numbers. The difference of the two operations lies in the parameters. A3 performs grouping by employeesâ birthdays, then summate the number of employees born each day. In the operation, the number of rows in buffer area is set as 1,000. The result returned by A3 is a cursor as follows:

Principle and Use of External Memory Grouping in esProc - Image 13

After the code in A3 is executed, external files will be generated in the directory of temporary files:

Principle and Use of External Memory Grouping in esProc - Image 14

The data of one of the temporary files can be imported:

Principle and Use of External Memory Grouping in esProc - Image 15

The data A2 imports are as follows:

Principle and Use of External Memory Grouping in esProc - Image 16

The data of A3 is as follows:

Principle and Use of External Memory Grouping in esProc - Image 17

It can be seen that each temporary file is the grouping and summarizing result of a part of the data obtained according to employeesâ birthdays. A larger cursor composed of all temporary files will be merged and returned by esProc. When the temporary files are generated, esProc will select a group number suitable for computing, so the rows of data in the temporary files will be a little more than the number of rows we set in buffer area. Special attention is needed in this point.

Go on with the execution of cellsets in the previous cellset file. When cursors are closed in A5, the temporary files will be auotomatically deleted. A4 fetches the first 1,000 birthdays from the cursor generated in A3, and the numbers of employees of each birth date are as follows:

Principle and Use of External Memory Grouping in esProc - Image 18

Related Posts:
Post a Comment

Please notify me the replies via email.

Important:
  • We hope the conversations that take place on MyTechLogy.com will be constructive and thought-provoking.
  • To ensure the quality of the discussion, our moderators may review/edit the comments for clarity and relevance.
  • Comments that are promotional, mean-spirited, or off-topic may be deleted per the moderators' judgment.
You may also be interested in
Awards & Accolades for MyTechLogy
Winner of
REDHERRING
Top 100 Asia
Finalist at SiTF Awards 2014 under the category Best Social & Community Product
Finalist at HR Vendor of the Year 2015 Awards under the category Best Learning Management System
Finalist at HR Vendor of the Year 2015 Awards under the category Best Talent Management Software
Hidden Image Url

Back to Top