
How to filter duplicates in an RDD in Scala?

From: Laos View: 1136 nancy 

Question

I have an RDD whose records have four fields: a, b, c and d. I would like to keep only the records whose value in one field (the first field, a, in the example below) also appears in at least one other record. For example:

inputRdd = [(1,2,3,4), (1,2,4,5), (2,3,4,5), (2,6,4,8), (2,0,3,7), (3,5,6,7), (9,1,5,6)]

resultRdd = [(1,2,3,4), (1,2,4,5), (2,3,4,5), (2,6,4,8), (2,0,3,7)]

Is there a function that I can apply to do this?

Something like:

resultRdd = inputRdd.filter(x => x.a.contains("identify duplicates"))
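
To make the intended result concrete, here is a minimal sketch of the logic using plain Scala collections in place of an RDD, so it runs without a Spark installation. The names `DuplicateFilter`, `Row` and `keepDuplicatedKeys` are hypothetical, and the sketch assumes the field to check is the first element of each tuple. It counts how often each value of a occurs, then keeps the rows whose value occurs more than once.

```scala
object DuplicateFilter {
  // Hypothetical alias for one record: fields a, b, c, d.
  type Row = (Int, Int, Int, Int)

  // Keep only the rows whose first field (a) occurs in more than one row.
  def keepDuplicatedKeys(rows: Seq[Row]): Seq[Row] = {
    // Count how many rows share each value of a.
    val counts: Map[Int, Int] =
      rows.groupBy(_._1).map { case (a, group) => (a, group.size) }

    // Values of a that appear at least twice.
    val duplicated: Set[Int] =
      counts.collect { case (a, n) if n > 1 => a }.toSet

    // Filter the original sequence, so the input order is preserved.
    rows.filter(row => duplicated.contains(row._1))
  }

  // The sample data from the question.
  val input: Seq[Row] = Seq(
    (1, 2, 3, 4), (1, 2, 4, 5), (2, 3, 4, 5), (2, 6, 4, 8),
    (2, 0, 3, 7), (3, 5, 6, 7), (9, 1, 5, 6)
  )

  def main(args: Array[String]): Unit =
    keepDuplicatedKeys(input).foreach(println)
}
```

On an actual RDD the same idea could probably be written as `inputRdd.groupBy(_._1).values.filter(_.size > 1).flatMap(identity)`, with the caveat that the order of the resulting records is not guaranteed. That one-liner is not exercised by the sketch above, so treat it as a starting point rather than a verified solution.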

Best answer