MyPage is a personalized page based on your interests.The page is customized to help you to find content that matters you the most.


I'm not curious

Principle and Use of External Memory Sorting in esProc

Published on 17 September 14
252
0
0

It is often required to sort records of tables during data analysis and computing. In esProc, sort function is used to sort data of sequences or table sequences. External memory sorting is required when data being sorted are massive and cannot be loaded into memory all together, for the ordinary sorting cannot handle this situation.

1. External memory sorting of massive data

In data statistics, cursors are usually used to fetch massive data. This applies in esProc, which also processes big data with the cursor. In esProc, the function of a cursor, which reads one or more records each time according to the position(s) marked by it and wonât return all data all at once, is similar to that in a database stored procedure.

A cursor can only fetch part of the data every time, thus operations like sorting and grouping all data in the cursor cannot be executed directly. esProc uses external memory to handle these operations on massive data. Each time it reads chunks of data and computes them and records the result temporarily in the external memory. Later it will merge all the sub-results into a cursor and works out the final result.

Letâs prepare a data table with huge data in which the dates and 8-digit phone numbers are generated arbitrarily. The data table will be stored in the format of a binary file for convenience.
Principle and Use of External Memory Sorting in esProc - Image 1
Altogether 100,000 rows of data are generated. Read the 50,001th ~51,000th rows of data using the cursor and the result can be seen in C10 as follows:
Principle and Use of External Memory Sorting in esProc - Image 2

Weâll take PhoneRecord, the generated data file, as an example to explore how to perform external sorting in esProc.

Using cs.sortx(xâ¦;n) function in external memory sorting, we can sort the data in cursor csin ascending order according to the computed result of expression xâ¦and set the number of rows in buffer area by defining n and determine the number of records fetched each time when generating a temporary file. For example:
Principle and Use of External Memory Sorting in esProc - Image 3
In order to know about how the external memory is used in esProc to sort data, we click in the debugging area of the toolbar to execute the code step by step until the code in A5 begins to be executed. A2 uses binary data file PhoneRecord to create a cursor. A3 uses sortx function to sort data of the cursor. The sorting result, in fact, will be a larger cursor merged orderly by many temporary cursor files. The result of A3 is as follows:
Principle and Use of External Memory Sorting in esProc - Image 4
A4 fetches the first 1,000 records from this cursor as follows after sorting:
Principle and Use of External Memory Sorting in esProc - Image 5
While the code in A3 is executed, external files, which are also called as temporary files, are generated in the directory of temporary files:
Principle and Use of External Memory Sorting in esProc - Image 6
Because the number of rows in buffer area was set as 20,000 by using sortx function, the 100,000 records in the cursor generated 5 temporary files. The data of one of the temporary files will be imported:
Principle and Use of External Memory Sorting in esProc - Image 7
A2 imports the data as follows:
Principle and Use of External Memory Sorting in esProc - Image 8
A3 works out the number of data in this temporary file as follows:
Principle and Use of External Memory Sorting in esProc - Image 9

By comparing the data in A2 with the final sorting result previously obtained, we can see that the result is, actually, one that obtained by sorting a part of the data. This indicates that each temporary file is the sorting result of some data fetched according to the number of rows in buffer area.

Then go on to execute the previous cellset file. We may find that the temporary files will be deleted automatically when the cursor is closed.

sortx function can also be used to sort multiple fields. For example:
Principle and Use of External Memory Sorting in esProc - Image 10
A3 uses Date and âPhoneNum to sort data, meaning sorting by date in ascending order, and then sorting the data of the same date by phone number in descending order. A4 reads the first 1,000 results as follows after sorting:
Principle and Use of External Memory Sorting in esProc - Image 11
2. Application of external memory sorting
In fact, from the operation of external memory sorting we can see one of the uses of the cursor-style sorting, that is, the sorted data can be used in orderly merging. The operation of orderly merging gets data from many cursors according to a rule that reading records from the cursor that currently contains the smallest ( or biggest) data. Apparently, this type of operation can only be used when the data of every cursor are properly ordered. In addition, joining records of a cursor in alignment with join@x() function also requires that data in every cursor should be sorted.
When the data of a cursor are properly ordered, it can be specified that we fetch data continuously from the cursor until the expression is changed. For example:
Principle and Use of External Memory Sorting in esProc - Image 12
A4 fetches data from the cursor in A3 until the Date is changed, meaning the data of the first day will be fetched; A5 skips data of consecutive three days; A6 fetches the data of the fifth day. The results of A4 and A6 are as follows:
Principle and Use of External Memory Sorting in esProc - Image 13
It is often required to sort records of tables during data analysis and computing. In esProc, sort function is used to sort data of sequences or table sequences. External memory sorting is required when data being sorted are massive and cannot be loaded into memory all together, for the ordinary sorting cannot handle this situation.

1. External memory sorting of massive data

In data statistics, cursors are usually used to fetch massive data. This applies in esProc, which also processes big data with the cursor. In esProc, the function of a cursor, which reads one or more records each time according to the position(s) marked by it and wonât return all data all at once, is similar to that in a database stored procedure.

A cursor can only fetch part of the data every time, thus operations like sorting and grouping all data in the cursor cannot be executed directly. esProc uses external memory to handle these operations on massive data. Each time it reads chunks of data and computes them and records the result temporarily in the external memory. Later it will merge all the sub-results into a cursor and works out the final result.

Letâs prepare a data table with huge data in which the dates and 8-digit phone numbers are generated arbitrarily. The data table will be stored in the format of a binary file for convenience.

Principle and Use of External Memory Sorting in esProc - Image 1

Altogether 100,000 rows of data are generated. Read the 50,001th ~51,000th rows of data using the cursor and the result can be seen in C10 as follows:

Principle and Use of External Memory Sorting in esProc - Image 2

Weâll take PhoneRecord, the generated data file, as an example to explore how to perform external sorting in esProc.

Using cs.sortx(xâ¦;n) function in external memory sorting, we can sort the data in cursor csin ascending order according to the computed result of expression xâ¦and set the number of rows in buffer area by defining n and determine the number of records fetched each time when generating a temporary file. For example:

Principle and Use of External Memory Sorting in esProc - Image 3

In order to know about how the external memory is used in esProc to sort data, we click in the debugging area of the toolbar to execute the code step by step until the code in A5 begins to be executed. A2 uses binary data file PhoneRecord to create a cursor. A3 uses sortx function to sort data of the cursor. The sorting result, in fact, will be a larger cursor merged orderly by many temporary cursor files. The result of A3 is as follows:

Principle and Use of External Memory Sorting in esProc - Image 4

A4 fetches the first 1,000 records from this cursor as follows after sorting:

Principle and Use of External Memory Sorting in esProc - Image 5

While the code in A3 is executed, external files, which are also called as temporary files, are generated in the directory of temporary files:

Principle and Use of External Memory Sorting in esProc - Image 6

Because the number of rows in buffer area was set as 20,000 by using sortx function, the 100,000 records in the cursor generated 5 temporary files. The data of one of the temporary files will be imported:

Principle and Use of External Memory Sorting in esProc - Image 7

A2 imports the data as follows:

Principle and Use of External Memory Sorting in esProc - Image 8

A3 works out the number of data in this temporary file as follows:

Principle and Use of External Memory Sorting in esProc - Image 9

By comparing the data in A2 with the final sorting result previously obtained, we can see that the result is, actually, one that obtained by sorting a part of the data. This indicates that each temporary file is the sorting result of some data fetched according to the number of rows in buffer area.

Then go on to execute the previous cellset file. We may find that the temporary files will be deleted automatically when the cursor is closed.

sortx function can also be used to sort multiple fields. For example:

Principle and Use of External Memory Sorting in esProc - Image 10

A3 uses Date and âPhoneNum to sort data, meaning sorting by date in ascending order, and then sorting the data of the same date by phone number in descending order. A4 reads the first 1,000 results as follows after sorting:

Principle and Use of External Memory Sorting in esProc - Image 11

2. Application of external memory sorting

In fact, from the operation of external memory sorting we can see one of the uses of the cursor-style sorting, that is, the sorted data can be used in orderly merging. The operation of orderly merging gets data from many cursors according to a rule that reading records from the cursor that currently contains the smallest ( or biggest) data. Apparently, this type of operation can only be used when the data of every cursor are properly ordered. In addition, joining records of a cursor in alignment with join@x() function also requires that data in every cursor should be sorted. When the data of a cursor are properly ordered, it can be specified that we fetch data continuously from the cursor until the expression is changed. For example:

Principle and Use of External Memory Sorting in esProc - Image 12

A4 fetches data from the cursor in A3 until the Date is changed, meaning the data of the first day will be fetched; A5 skips data of consecutive three days; A6 fetches the data of the fifth day. The results of A4 and A6 are as follows:

Principle and Use of External Memory Sorting in esProc - Image 13

Related Posts:
Post a Comment

Please notify me the replies via email.

Important:
  • We hope the conversations that take place on MyTechLogy.com will be constructive and thought-provoking.
  • To ensure the quality of the discussion, our moderators may review/edit the comments for clarity and relevance.
  • Comments that are promotional, mean-spirited, or off-topic may be deleted per the moderators' judgment.
You may also be interested in
Awards & Accolades for MyTechLogy
Winner of
REDHERRING
Top 100 Asia
Finalist at SiTF Awards 2014 under the category Best Social & Community Product
Finalist at HR Vendor of the Year 2015 Awards under the category Best Learning Management System
Finalist at HR Vendor of the Year 2015 Awards under the category Best Talent Management Software
Hidden Image Url

Back to Top