MyPage is a personalized page based on your interests.The page is customized to help you to find content that matters you the most.


I'm not curious

A Code Example of External Memory Sorting in esProc

Published on 18 September 14
203
0
0

It is common to sort records in tables in data analysis. In esProc, sort function is used to sort sequences or data in table sequences. External memory sorting is required when the data being sorted is too huge to be loaded into the memory all together and the ordinary sorting method cannot be used.

For example, the text file Order_Foods.txt contains 50,000 pieces of information of food orders:
A Code Example of External Memory Sorting in esProc - Image 1

Now sort the data of these orders according to the following conditions:

â Sort data by food name in ascending order;

â¡ Sort data by order date in ascending order, then sort data of the same date by food name in descending order;

⢠Sort data by order date in descending order, then sort data of the same date by order amount in descending order.

In fact 50,000 rows of data cannot be counted as huge data. We just take it as an example to illustrate how to sort huge data. In practical business, we may need to handle several hundred million records or the data files of dozens of or a hundred GB size.

In esProc, cursors are used in computing huge data. The records in data tables wonât be fetched all at once. Only one or multiple records will be fetched each time according to the setting. By doing so, memory overflow can be avoided during data computing. Since only some of the data are fetched each time, operations of sorting and grouping on all data in cursors cannot be completed in one step. Using external memory, cs.sortx(xâ¦;n) function in esProc can fetch data in cursors step by step according to n,the set number of rows in buffer area, process them separately and store the results in the external files. In this way, memory overflow wonât happen. The result of sorting with cursors is still a cursor, whose method of fetching data is completely the same as that with an ordinary cursor.

Letâs look at how to sort data in Order_Foods.txt:
A Code Example of External Memory Sorting in esProc - Image 2
A2 uses text file Order_Foods.txtto generate a cursor and make the first row as column name. A3 uses sortx function to sort data of the cursor in A2 by food name in ascending order. The number of rows in buffer area is set as 1,000, which indicates sorting by the unit of 1,000 records. The result will be saved temporarily as an external file which is also a cursor:
A Code Example of External Memory Sorting in esProc - Image 3
While this cursor is being used to fetch data, all generated external files will be merged automatically, and the required data will be fetched thus. As in A4, the first 1,000 records after sorting are fetched as follows:
A Code Example of External Memory Sorting in esProc - Image 4

A5 closes the cursor and the external file resulting from sorting is automatically deleted. Since the cursor in A3 is generated by the cursor in A2, the latter will close if the former closes. If we are required to sort by order date in ascending order then by food name in descending order, a cursor need to be regenerated in A6.

A7 sorts data in cursors using external memory sorting. sortx function can specify multiple sorting fields or sorting expressions. If sorting in descending order is needed in an expression, a negative sign should be added before the expression. The sorting result in A7 is still a cursor:
A Code Example of External Memory Sorting in esProc - Image 5
This time, A8 skips 20,000 records; A9 fetches data from the 20,001th row to the 21,000th row:
A Code Example of External Memory Sorting in esProc - Image 6

Having done their job, cursors are closed in A10 to remove the external files instantly. If all data in cursors are fetched, cursors will close automatically and it is not necessary to call cs.close()instruction.

The method of adding a negative sign before a sorting expression in performing sorting in descending order applies to not only the string-type data in A7, but the data of date type or numerical data. In the operation of sorting in descending order, A12 uses fields of date type and numerical fields respectively. A13 fetches the first 1,000 records after sorting as follows:
A Code Example of External Memory Sorting in esProc - Image 7
It is common to sort records in tables in data analysis. In esProc, sort function is used to sort sequences or data in table sequences. External memory sorting is required when the data being sorted is too huge to be loaded into the memory all together and the ordinary sorting method cannot be used.

For example, the text file Order_Foods.txt contains 50,000 pieces of information of food orders:

A Code Example of External Memory Sorting in esProc - Image 1

Now sort the data of these orders according to the following conditions:

â Sort data by food name in ascending order;

â¡ Sort data by order date in ascending order, then sort data of the same date by food name in descending order;

⢠Sort data by order date in descending order, then sort data of the same date by order amount in descending order.

In fact 50,000 rows of data cannot be counted as huge data. We just take it as an example to illustrate how to sort huge data. In practical business, we may need to handle several hundred million records or the data files of dozens of or a hundred GB size.

In esProc, cursors are used in computing huge data. The records in data tables wonât be fetched all at once. Only one or multiple records will be fetched each time according to the setting. By doing so, memory overflow can be avoided during data computing. Since only some of the data are fetched each time, operations of sorting and grouping on all data in cursors cannot be completed in one step. Using external memory, cs.sortx(xâ¦;n) function in esProc can fetch data in cursors step by step according to n,the set number of rows in buffer area, process them separately and store the results in the external files. In this way, memory overflow wonât happen. The result of sorting with cursors is still a cursor, whose method of fetching data is completely the same as that with an ordinary cursor.

Letâs look at how to sort data in Order_Foods.txt:

A Code Example of External Memory Sorting in esProc - Image 2

A2 uses text file Order_Foods.txtto generate a cursor and make the first row as column name. A3 uses sortx function to sort data of the cursor in A2 by food name in ascending order. The number of rows in buffer area is set as 1,000, which indicates sorting by the unit of 1,000 records. The result will be saved temporarily as an external file which is also a cursor:

A Code Example of External Memory Sorting in esProc - Image 3

While this cursor is being used to fetch data, all generated external files will be merged automatically, and the required data will be fetched thus. As in A4, the first 1,000 records after sorting are fetched as follows:

A Code Example of External Memory Sorting in esProc - Image 4

A5 closes the cursor and the external file resulting from sorting is automatically deleted. Since the cursor in A3 is generated by the cursor in A2, the latter will close if the former closes. If we are required to sort by order date in ascending order then by food name in descending order, a cursor need to be regenerated in A6.

A7 sorts data in cursors using external memory sorting. sortx function can specify multiple sorting fields or sorting expressions. If sorting in descending order is needed in an expression, a negative sign should be added before the expression. The sorting result in A7 is still a cursor:

A Code Example of External Memory Sorting in esProc - Image 5

This time, A8 skips 20,000 records; A9 fetches data from the 20,001th row to the 21,000th row:

A Code Example of External Memory Sorting in esProc - Image 6

Having done their job, cursors are closed in A10 to remove the external files instantly. If all data in cursors are fetched, cursors will close automatically and it is not necessary to call cs.close()instruction.

The method of adding a negative sign before a sorting expression in performing sorting in descending order applies to not only the string-type data in A7, but the data of date type or numerical data. In the operation of sorting in descending order, A12 uses fields of date type and numerical fields respectively. A13 fetches the first 1,000 records after sorting as follows:

A Code Example of External Memory Sorting in esProc - Image 7

This blog is listed under Development & Implementations Community

Related Posts:

Java

 
Post a Comment

Please notify me the replies via email.

Important:
  • We hope the conversations that take place on MyTechLogy.com will be constructive and thought-provoking.
  • To ensure the quality of the discussion, our moderators may review/edit the comments for clarity and relevance.
  • Comments that are promotional, mean-spirited, or off-topic may be deleted per the moderators' judgment.
You may also be interested in
Awards & Accolades for MyTechLogy
Winner of
REDHERRING
Top 100 Asia
Finalist at SiTF Awards 2014 under the category Best Social & Community Product
Finalist at HR Vendor of the Year 2015 Awards under the category Best Learning Management System
Finalist at HR Vendor of the Year 2015 Awards under the category Best Talent Management Software
Hidden Image Url

Back to Top