esProc - The Class Library of Structured File Computing for Java
In some cases, data must be stored in the file system rather than in a database, so any computation over that data has to be handled by the application itself. Because Java has no standard class library for structured file computing, you end up hardcoding it, and the resulting code is complicated and hard to read. Many developers have asked online for a Java class library for file computing. Some of their questions are:
http://www.coderanch.com/t/561180/java/java/read-text-file-perform-operation
http://stackoverflow.com/questions/26418282/get-text-file-through-sql-java
https://plus.google.com/112275790033275716955/posts/4p75NJYzMcv
http://stackoverflow.com/questions/28969897/file-based-datastructure-for-java
esProc fills this gap, and it is free. It encapsulates a large set of functions for processing structured files and provides a JDBC interface. A Java application treats an esProc script like a database stored procedure: it passes parameters, executes the script, and gets the result set via JDBC.
[Figure: structure of an esProc script integrated into a Java application]
Using a conditional query on a text file as an example, we’ll see how Java handles structured file computing by calling an esProc script. The source data is the tab-separated text file sOrder.txt, whose first row holds the column names. The examples below use its OrderID, Client, SellerId, Amount and OrderDate fields.
Conditional filtering: To find orders in a specified period.
esProc code:
A1 =file("D:\sOrder.txt").import@t()
A2 =A1.select(OrderDate>=startDate && OrderDate<=endDate)
Explanation:
A1: Import the file with tab being the default separator. @t means importing the first row as column headers.
A2: Execute the conditional filtering. startDate and endDate are input parameters, like a period from 2010-01-01 to 2010-12-31.
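For readers who think in Java, the following is a minimal sketch of what the two cells compute, written in plain Java without esProc. It is not how esProc implements them. The Order class, the load and select method names, the ISO date format, and the sample rows are all assumptions made for illustration.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OrderFilterSketch {
    /** Hypothetical row type; only the fields this filter needs are kept. */
    static final class Order {
        final int orderId;
        final LocalDate orderDate;

        Order(int orderId, LocalDate orderDate) {
            this.orderId = orderId;
            this.orderDate = orderDate;
        }
    }

    /** Like import@t: the first line holds tab-separated column names. */
    static List<Order> load(List<String> lines) {
        List<Order> orders = new ArrayList<>();
        if (lines.isEmpty()) {
            return orders;
        }
        List<String> header = Arrays.asList(lines.get(0).split("\t"));
        int idCol = header.indexOf("OrderID");
        int dateCol = header.indexOf("OrderDate");
        for (String line : lines.subList(1, lines.size())) {
            String[] f = line.split("\t");
            // Assumes ISO dates such as 2010-01-01.
            orders.add(new Order(Integer.parseInt(f[idCol]), LocalDate.parse(f[dateCol])));
        }
        return orders;
    }

    /** Keeps orders with start <= OrderDate <= end (both ends inclusive). */
    static List<Order> select(List<Order> orders, LocalDate start, LocalDate end) {
        List<Order> hits = new ArrayList<>();
        for (Order o : orders) {
            if (!o.orderDate.isBefore(start) && !o.orderDate.isAfter(end)) {
                hits.add(o);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample rows, not the article's data.
        List<String> lines = Arrays.asList(
                "OrderID\tClient\tSellerId\tAmount\tOrderDate",
                "1\tAAA\t5\t440.0\t2009-12-31",
                "2\tBBB\t13\t1863.4\t2010-01-01");
        LocalDate start = LocalDate.parse("2010-01-01");
        LocalDate end = LocalDate.parse("2010-12-31");
        for (Order o : select(load(lines), start, end)) {
            System.out.println(o.orderId + "\t" + o.orderDate);
        }
    }
}
```

Even this small filter needs a row type, header handling and a loop, which is the kind of hand-written code the two esProc cells replace.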
Using JDBC, the main Java program can call the esProc script with the following code:
// Load the esProc JDBC driver and open a local connection
Class.forName("com.esproc.jdbc.InternalDriver");
Connection con = DriverManager.getConnection("jdbc:esproc:local://");
// Call the esProc script (similar to a stored procedure); orderQuery is the name of the dfx file.
com.esproc.jdbc.InternalCStatement st = (com.esproc.jdbc.InternalCStatement) con.prepareCall("call orderQuery (?,?)");
// Pass the start and end dates as the two script parameters
st.setObject(1, "2010-01-01");
st.setObject(2, "2010-12-31");
// Execute the script
st.execute();
// Get the result set
ResultSet rs = st.getResultSet();
……
The returned result is a ResultSet object in accordance with JDBC standards. The method of calling an esProc script is the same as that of accessing a database. Programmers can master it fast as long as they are familiar with JDBC.
If the script is as simple as the above, you can write it right into the JDBC calling program, with lines of statements separated by \n. This is analogous to executing a complex SQL statement. It can save you the trouble of saving a script file.
st = (com.esproc.jdbc.InternalCStatement) con.createStatement();
ResultSet rs1 = st.executeQuery("=file(\"D:\\sOrder.txt\").import@t()\n"
    + "=A1.select(OrderDate>=date(\"2010-01-01\") && OrderDate<=date(\"2010-12-31\"))");
esProc will return the value of the last expression.
For more details about deploying esProc JDBC and calling scripts through it, see esProc Integration & Application: Java Invocation.
As a specialized class library for structured computing, esProc can handle more than this. The examples below show more of its capabilities.
Sorting: To sort records by client numbers in descending order and by year and month in ascending order.
esProc code: =A1.sort(-Client,year(OrderDate),month(OrderDate))
Explanation: Use "-" to sort in descending order. The year and the month are computed from OrderDate with the year and month functions.
Related information: To sort the result of a query, you can use =A2.sort(…), or =A1.select(…).sort(…)
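As a rough plain-Java equivalent of this sort expression, the sketch below chains comparators: Client in descending order, then year and month of OrderDate in ascending order. The Order class and the method name are hypothetical, and the sample rows in main are made up.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class OrderSortSketch {
    /** Hypothetical row type with only the sort keys. */
    static final class Order {
        final String client;
        final LocalDate orderDate;

        Order(String client, LocalDate orderDate) {
            this.client = client;
            this.orderDate = orderDate;
        }
    }

    /** Client descending, then year and month of OrderDate ascending. */
    static final Comparator<Order> BY_CLIENT_DESC_THEN_MONTH =
            Comparator.comparing((Order o) -> o.client).reversed()
                    .thenComparingInt(o -> o.orderDate.getYear())
                    .thenComparingInt(o -> o.orderDate.getMonthValue());

    /** Returns a sorted copy and leaves the input list untouched. */
    static List<Order> sort(List<Order> orders) {
        List<Order> copy = new ArrayList<>(orders);
        copy.sort(BY_CLIENT_DESC_THEN_MONTH);
        return copy;
    }

    public static void main(String[] args) {
        // Made-up sample rows.
        List<Order> orders = Arrays.asList(
                new Order("AAA", LocalDate.parse("2010-05-02")),
                new Order("BBB", LocalDate.parse("2010-03-10")),
                new Order("BBB", LocalDate.parse("2009-12-24")));
        for (Order o : sort(orders)) {
            System.out.println(o.client + "\t" + o.orderDate);
        }
    }
}
```

The explicit `(Order o)` parameter type on the first key is needed so that `reversed()` can be chained. The later keys infer their type from the comparator they extend.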
Grouping and aggregation: To calculate each seller’s sales amount and number of orders for each year.
esProc code: =A1.groups(SellerId,year(OrderDate);sum(Amount),count(~))
Explanation: The groups function aggregates while grouping the data. ~ represents the current group. count(~) is equivalent to count(OrderID).
Getting distinct values: To make a client list.
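The grouping above can be sketched in plain Java as a nested map keyed by seller and then by year, with one statistics object per group holding the sum and the count. This only illustrates the semantics. The Order class and the groups method name are assumptions, and the sample rows are made up.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class OrderGroupsSketch {
    /** Hypothetical row type with the grouping keys and the aggregated field. */
    static final class Order {
        final int sellerId;
        final double amount;
        final LocalDate orderDate;

        Order(int sellerId, double amount, LocalDate orderDate) {
            this.sellerId = sellerId;
            this.amount = amount;
            this.orderDate = orderDate;
        }
    }

    /** SellerId -> year -> statistics; getSum() and getCount() are the two aggregates. */
    static SortedMap<Integer, SortedMap<Integer, DoubleSummaryStatistics>> groups(List<Order> orders) {
        SortedMap<Integer, SortedMap<Integer, DoubleSummaryStatistics>> out = new TreeMap<>();
        for (Order o : orders) {
            out.computeIfAbsent(o.sellerId, k -> new TreeMap<>())
                    .computeIfAbsent(o.orderDate.getYear(), k -> new DoubleSummaryStatistics())
                    .accept(o.amount);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample rows.
        List<Order> orders = Arrays.asList(
                new Order(1, 100.0, LocalDate.parse("2010-02-01")),
                new Order(1, 50.0, LocalDate.parse("2010-07-15")),
                new Order(2, 20.0, LocalDate.parse("2010-03-03")));
        for (Map.Entry<Integer, SortedMap<Integer, DoubleSummaryStatistics>> seller : groups(orders).entrySet()) {
            for (Map.Entry<Integer, DoubleSummaryStatistics> year : seller.getValue().entrySet()) {
                System.out.println(seller.getKey() + "\t" + year.getKey() + "\t"
                        + year.getValue().getSum() + "\t" + year.getValue().getCount());
            }
        }
    }
}
```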
esProc code: =A1.id(Client)
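In plain Java, a distinct client list can be sketched with a set. The version below also sorts the values, which is a choice made here for deterministic output. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class DistinctClientsSketch {
    /** Returns each client value once, in ascending order (ordering is an assumption of this sketch). */
    static List<String> distinctClients(List<String> clientColumn) {
        return new ArrayList<>(new TreeSet<>(clientColumn));
    }

    public static void main(String[] args) {
        // Made-up client values.
        System.out.println(distinctClients(Arrays.asList("BBB", "AAA", "BBB", "CCC", "AAA")));
    }
}
```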
Removing duplicates: To get the first record for each combination of client and seller.
esProc code: =A1.group@1(Client,SellerId)
Explanation: The group function groups records without requiring aggregation. @1 means getting the first record from each group.
TopN: To get the 3 orders with the greatest sales amount for each seller.
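A plain-Java way to picture "first record from each group" is a map that only accepts the first row seen for each (Client, SellerId) key. In this sketch the result follows the order of first appearance, which may differ from the order esProc returns. The Order class and method name are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FirstPerGroupSketch {
    /** Hypothetical row type with the two grouping keys. */
    static final class Order {
        final int orderId;
        final String client;
        final int sellerId;

        Order(int orderId, String client, int sellerId) {
            this.orderId = orderId;
            this.client = client;
            this.sellerId = sellerId;
        }
    }

    /** Keeps the first order encountered for every (client, sellerId) pair. */
    static List<Order> firstPerGroup(List<Order> orders) {
        Map<List<Object>, Order> first = new LinkedHashMap<>();
        for (Order o : orders) {
            // A two-element list serves as a composite key with value-based equals/hashCode.
            first.putIfAbsent(Arrays.<Object>asList(o.client, o.sellerId), o);
        }
        return new ArrayList<>(first.values());
    }

    public static void main(String[] args) {
        // Made-up sample rows.
        List<Order> orders = Arrays.asList(
                new Order(1, "AAA", 1), new Order(2, "AAA", 1), new Order(3, "AAA", 2));
        for (Order o : firstPerGroup(orders)) {
            System.out.println(o.orderId + "\t" + o.client + "\t" + o.sellerId);
        }
    }
}
```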
esProc code: =A1.group(SellerId;~.top(3,-Amount):t).conj(t)
Explanation: The top function selects the top N records; "-" means descending order. The conj function concatenates the per-seller results into one list.
Related information: To get only the order with the greatest sales amount, you can use the maxp function.
Joining: Align the Name, Dept and Gender fields in emp.txt with sOrder.txt.
[Figure: esProc script for the join; cells A3 and A4 are explained below]
Explanation:
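The TopN-per-group steps (group by seller, take the largest amounts in each group, concatenate) can be sketched in plain Java as follows. The Order class and the topPerSeller method are hypothetical, and ordering the groups by SellerId is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TopPerSellerSketch {
    /** Hypothetical row type with the grouping key and the ranked field. */
    static final class Order {
        final int sellerId;
        final double amount;

        Order(int sellerId, double amount) {
            this.sellerId = sellerId;
            this.amount = amount;
        }
    }

    /** Returns up to n orders with the largest amounts for each seller, concatenated. */
    static List<Order> topPerSeller(List<Order> orders, int n) {
        Map<Integer, List<Order>> bySeller = new TreeMap<>();
        for (Order o : orders) {
            bySeller.computeIfAbsent(o.sellerId, k -> new ArrayList<>()).add(o);
        }
        List<Order> result = new ArrayList<>();
        for (List<Order> group : bySeller.values()) {
            // Largest amount first, then keep the first n of the group.
            group.sort(Comparator.comparingDouble((Order o) -> o.amount).reversed());
            result.addAll(group.subList(0, Math.min(n, group.size())));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample rows.
        List<Order> orders = Arrays.asList(
                new Order(1, 10.0), new Order(1, 40.0), new Order(1, 30.0),
                new Order(1, 20.0), new Order(2, 5.0));
        for (Order o : topPerSeller(orders, 3)) {
            System.out.println(o.sellerId + "\t" + o.amount);
        }
    }
}
```

Sorting every group in full is the simplest approach. For large groups a bounded priority queue would avoid the full sort.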
A3: The join function performs a join and renames the two tables to s and e respectively. @1 means left join and @f means full join; by default, the join function performs an inner join.
A4: Get the desired fields from the joined table to create a new two-dimensional structured table.
All the above examples assume that the file is small enough to be loaded entirely into memory. If the file is too big for that, you can use the esProc cursor to handle it. See related documents for detailed information.
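The left join and field selection described for A3 and A4 can be pictured in plain Java as a lookup table built from the employee rows, followed by one output row per order. The join key is not stated in the text, so matching SellerId against a hypothetical employee id field (eid) is an assumption. The Order, Emp and Row classes and the sample values are also made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LeftJoinSketch {
    static final class Order {
        final int orderId;
        final int sellerId;

        Order(int orderId, int sellerId) {
            this.orderId = orderId;
            this.sellerId = sellerId;
        }
    }

    static final class Emp {
        final int eid; // hypothetical key field
        final String name, dept, gender;

        Emp(int eid, String name, String dept, String gender) {
            this.eid = eid;
            this.name = name;
            this.dept = dept;
            this.gender = gender;
        }
    }

    /** Output row: order fields plus the aligned employee fields (null when unmatched). */
    static final class Row {
        final int orderId;
        final String name, dept, gender;

        Row(int orderId, String name, String dept, String gender) {
            this.orderId = orderId;
            this.name = name;
            this.dept = dept;
            this.gender = gender;
        }
    }

    /** Left join: every order is kept, with or without a matching employee. */
    static List<Row> leftJoin(List<Order> orders, List<Emp> emps) {
        Map<Integer, Emp> byId = new HashMap<>();
        for (Emp e : emps) {
            byId.put(e.eid, e);
        }
        List<Row> rows = new ArrayList<>();
        for (Order o : orders) {
            Emp e = byId.get(o.sellerId);
            rows.add(e == null
                    ? new Row(o.orderId, null, null, null)
                    : new Row(o.orderId, e.name, e.dept, e.gender));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Emp> emps = Arrays.asList(new Emp(5, "Ann", "Sales", "F"));
        List<Order> orders = Arrays.asList(new Order(1, 5), new Order(2, 9));
        for (Row r : leftJoin(orders, emps)) {
            System.out.println(r.orderId + "\t" + r.name + "\t" + r.dept + "\t" + r.gender);
        }
    }
}
```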
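The idea behind processing a file that does not fit in memory can be illustrated in plain Java by reading one line at a time and keeping only a running aggregate. This is not the esProc cursor API, only a sketch of the same streaming principle. The sumAmount method, the column names looked up in the header, and the ISO date format are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;

class StreamingScanSketch {
    /** Sums Amount for orders dated within [start, end], holding one line in memory at a time. */
    static double sumAmount(Reader source, LocalDate start, LocalDate end) {
        try (BufferedReader in = new BufferedReader(source)) {
            String headerLine = in.readLine();
            if (headerLine == null) {
                return 0.0; // empty input
            }
            List<String> header = Arrays.asList(headerLine.split("\t"));
            int amountCol = header.indexOf("Amount");
            int dateCol = header.indexOf("OrderDate");
            double total = 0.0;
            String line;
            while ((line = in.readLine()) != null) {
                String[] f = line.split("\t");
                LocalDate d = LocalDate.parse(f[dateCol]);
                if (!d.isBefore(start) && !d.isAfter(end)) {
                    total += Double.parseDouble(f[amountCol]);
                }
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample rows; a real caller would pass a FileReader for the large file.
        String data = "OrderID\tAmount\tOrderDate\n1\t100.5\t2009-06-01\n2\t200.0\t2010-03-15\n";
        System.out.println(sumAmount(new StringReader(data),
                LocalDate.parse("2010-01-01"), LocalDate.parse("2010-12-31")));
    }
}
```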
One point worth noting is that all these esProc functionalities are free of charge, so programmers can embed esProc engine into their applications at no cost.