Loading data from a pipe-delimited .dat file and transforming it into a DataFrame with columns in Scala
My .dat file contains a custom header of the format 'ABCYYYYMMDD' and a footer of the format 'A1234'. There is no column header row.
The records are delimited by "|" and have 12 fields. To remove the header and footer, I'm using the following code:
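val fileDF = sc.textFile(filedirectory) // RDD[String] of raw lines
val total = fileDF.count()
// keep every line except index 0 (header) and index total-1 (footer)
val fileRdd = fileDF.zipWithIndex().filter(x => x._2 != 0).filter(x => x._2 != total - 1).map(x => x._1)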
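A minimal local sketch of the same header/footer filter, using an ordinary Scala Seq in place of the RDD so the index logic can be checked without a cluster. The helper name dropHeaderAndFooter and the shortened sample lines are hypothetical. Note that Seq.zipWithIndex yields Int indices, whereas RDD.zipWithIndex yields Long.

```scala
object HeaderFooterSketch {
  // Hypothetical helper: drops the first line (header) and the last line (footer),
  // mirroring the zipWithIndex / filter / map chain used on the RDD.
  def dropHeaderAndFooter(lines: Seq[String]): Seq[String] = {
    val total = lines.size
    lines.zipWithIndex
      .filter { case (_, i) => i != 0 && i != total - 1 } // both conditions in one predicate
      .map { case (line, _) => line }                     // discard the index again
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample: header, two (shortened) records, footer
    val sample = Seq("ABC20240101", "k1|v1|x", "k2|v2|y", "A1234")
    dropHeaderAndFooter(sample).foreach(println)
  }
}
```

Merging the two filters into one predicate is only a readability choice; on an RDD, Spark pipelines consecutive narrow transformations like filter and map within the same stage either way.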
After this, if I try to split the data using
.map(x => x.split("|"))
each line is split into its individual characters instead of into its 12 fields.
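This behaviour is consistent with how String.split works: its argument is a Java regular expression, and a bare "|" is an alternation of two empty patterns, so it matches the empty string at every position. A small plain-Scala sketch (no Spark needed; the helper name splitRecord and the sample line are hypothetical) comparing the unescaped call with an escaped delimiter:

```scala
object SplitSketch {
  // Hypothetical helper: splits one record on a literal '|'.
  // "\\|" escapes the pipe so the regex matches the character itself;
  // limit -1 keeps trailing empty strings, so a record whose last
  // columns are blank still yields the full field count.
  def splitRecord(line: String): Array[String] = line.split("\\|", -1)

  def main(args: Array[String]): Unit = {
    val line = "ab|cd||"
    // Unescaped: the regex "|" matches the empty string everywhere,
    // so every character (including each '|') becomes its own element.
    println(line.split("|").mkString("[", ", ", "]"))
    // Escaped, with limit -1: four fields, including the two empty trailing ones.
    println(splitRecord(line).mkString("[", ", ", "]"))
  }
}
```

In the RDD pipeline this would correspond to something like fileRdd.map(line => line.split("\\|", -1)). Without the -1 limit, "ab|cd||".split("\\|") returns only two elements, which matters when the trailing columns of a 12-field record can be empty.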
Ultimately, I want to convert the RDD into a DataFrame and then check for duplicates on the combination of the first and second columns.
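One way to do both steps, sketched with the Spark SQL API under a few assumptions: every field is kept as a string, the columns are named col1 through col12 (hypothetical names), and each data line has exactly 12 fields. The length of each Row must match the schema; otherwise Spark raises an error when the rows are converted. The object and method names are likewise hypothetical.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object DuplicateCheckSketch {
  // Assumption: all 12 fields are strings; col1..col12 are hypothetical names.
  val schema: StructType = StructType(
    (1 to 12).map(i => StructField(s"col$i", StringType, nullable = true))
  )

  // Splits each data line on a literal '|' (keeping trailing empty fields)
  // and builds a DataFrame with the schema above.
  def toDataFrame(spark: SparkSession, records: RDD[String]): DataFrame = {
    val rows = records.map(line => Row.fromSeq(line.split("\\|", -1).toSeq))
    spark.createDataFrame(rows, schema)
  }

  // Returns each (col1, col2) combination that occurs more than once, with its count.
  def duplicateKeys(df: DataFrame): DataFrame =
    df.groupBy("col1", "col2").count().filter(col("count") > 1)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("dat-duplicates").master("local[*]").getOrCreate()
    // Hypothetical records standing in for fileRdd (header and footer already removed)
    val filler = Seq.fill(10)("x").mkString("|") // 10 filler fields, so 12 in total
    val records = spark.sparkContext.parallelize(
      Seq(s"k1|v1|$filler", s"k1|v1|$filler", s"k2|v2|$filler"))
    val df = toDataFrame(spark, records)
    duplicateKeys(df).show()
    spark.stop()
  }
}
```

If the goal is to keep one row per key rather than to report the duplicates, Dataset.dropDuplicates("col1", "col2") removes them directly instead of grouping and counting.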