Split a Huge CSV File into Multiple Smaller CSV Files #eg69

Judy - Oct 31 - Dev Community

Problem description & analysis

Below is the CSV file sample.csv:

v2aowqhugt,q640lwdtat,8cqw2gtm0g,ybdncfeue8,3tzwyiouft,…

f0ewv2v00z,x2ck96ngmd,9htr2874n5,fx430s8wqy,tw40yn3t0j,…

p2h6fphwco,kldbn6rbzt,8okyllngxz,a8k9slqfms,bqz5fb7cm9,…

st63tcbfv8,2n862vqzww,2equ0ydeet,0x5tidunc6,npis28avpj,…

bn1u58s39a,mg7064jlrb,edyj3t4s95,zvuf9n29ai,1m0yn8uh0n,…

The file is too large to be loaded into memory all at once: at most 100000 rows fit into the available memory at a time. So we need to split it into multiple smaller CSV files of 100000 rows each, with the last file holding whatever rows remain, as shown below:

sample1.csv  100000 rows

sample2.csv  100000 rows

sample[n].csv  at most 100000 rows

Solution

Write the script p1.dfx below in esProc:

Explanation

A1  Create a cursor for the original CSV file.

A2  Loop through A1’s cursor, reading in 100000 rows at a time.

B2  Export A2’s rows to sample[n].csv, where n is #A2, the loop number, which starts from 1.
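
For readers who do not use esProc, the same three steps can be sketched in Python. This is an analogue of the approach described above, not the SPL script itself: the function name split_csv and its parameters are made up for illustration, and the sketch assumes the file has no header row, as the sample suggests.

```python
import csv
from itertools import islice
from pathlib import Path


def split_csv(source, rows_per_file=100_000, out_dir=None):
    """Split `source` into numbered files of at most `rows_per_file` rows.

    Hypothetical helper; returns the list of files written, in order.
    """
    source = Path(source)
    out_dir = Path(out_dir) if out_dir is not None else source.parent
    written = []
    # newline="" lets the csv module handle line endings itself.
    with source.open(newline="") as src:
        # Like the cursor in A1: rows are streamed, the file is never fully loaded.
        reader = csv.reader(src)
        n = 0
        while True:
            # Like the loop in A2: pull the next batch of rows from the stream.
            chunk = list(islice(reader, rows_per_file))
            if not chunk:
                break
            n += 1  # file counter starts at 1, like #A2
            # Like B2: write the batch to sample[n].csv.
            target = out_dir / f"{source.stem}{n}{source.suffix}"
            with target.open("w", newline="") as dst:
                csv.writer(dst).writerows(chunk)
            written.append(target)
    return written
```

Calling split_csv("sample.csv") would write sample1.csv, sample2.csv and so on next to the original file. Only one batch of rows is held in memory at a time, which is the point of the cursor-based loop.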

Read How to Call an SPL Script in Java to learn how to integrate the script code into a Java program.

