Sunday 22 August 2021

Deduplicating Collection Items

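Deduplication refers to eliminating redundant data from a dataset. In Kotlin we normally use the toSet() function to get the unique elements of a collection, and distinct() and distinctBy {} are two more options. distinct() drops every element that is equal to an earlier one, while distinctBy {} compares only the value returned by its lambda (the selector). Note that toSet() returns a Set, whereas distinct() and distinctBy {} return a List. Please refer to the code snippet below for more details. It uses both a list of plain String values and a list of data class objects. Be careful with the order of the data, because it affects the result.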

data class Tez(val id: String)

fun main() {
    // Plain String values: "AAA" and "aaa" are different (comparison is case-sensitive)
    val users = listOf("AAA", "aaa", "BBB", "bbb", "CCC", "ccc", "AAA", "aaa", "BBB", "bbb", "CCC", "ccc")
    // Data class objects: the generated equals() compares the id property
    val tezUsers = listOf(
        Tez("AAA"),
        Tez("aaa"),
        Tez("BBB"),
        Tez("bbb"),
        Tez("CCC"),
        Tez("ccc"),
        Tez("AAA"),
        Tez("aaa"),
        Tez("BBB"),
        Tez("bbb"),
        Tez("CCC"),
        Tez("ccc")
    )
    // toSet() and distinct() compare whole elements with equals()
    println("toSet: ${users.toSet()}")
    println("Data Object toSet: ${tezUsers.toSet()}")
    println("distinct : ${users.distinct()}")
    println("Data object distinct: ${tezUsers.distinct()}")
    // distinctBy {} compares only the selector result (case-insensitive here)
    println("distinctBy: ${users.distinctBy { it.lowercase() }}")
    println("Data object distinctBy: ${tezUsers.distinctBy { it.id.lowercase() }}")

    // The same calls on the reversed lists
    val revUsers = users.reversed()
    val revTezUsers = tezUsers.reversed()
    println("\n\nreversed toSet: ${revUsers.toSet()}")
    println("reversed Data object toSet: ${revTezUsers.toSet()}")
    println("reversed Distinct : ${revUsers.distinct()}")
    println("reversed Data object distinct: ${revTezUsers.distinct()}")
    println("reversed object distinctBy: ${revUsers.distinctBy { it.lowercase() }}")
    println("reversed Data object distinctBy: ${revTezUsers.distinctBy { it.id.lowercase() }}")
}

Output:

toSet: [AAA, aaa, BBB, bbb, CCC, ccc]
Data Object toSet: [Tez(id=AAA), Tez(id=aaa), Tez(id=BBB), Tez(id=bbb), Tez(id=CCC), Tez(id=ccc)]
distinct : [AAA, aaa, BBB, bbb, CCC, ccc]
Data object distinct: [Tez(id=AAA), Tez(id=aaa), Tez(id=BBB), Tez(id=bbb), Tez(id=CCC), Tez(id=ccc)]
distinctBy: [AAA, BBB, CCC]
Data object distinctBy: [Tez(id=AAA), Tez(id=BBB), Tez(id=CCC)]


reversed toSet: [ccc, CCC, bbb, BBB, aaa, AAA]
reversed Data object toSet: [Tez(id=ccc), Tez(id=CCC), Tez(id=bbb), Tez(id=BBB), Tez(id=aaa), Tez(id=AAA)]
reversed Distinct : [ccc, CCC, bbb, BBB, aaa, AAA]
reversed Data object distinct: [Tez(id=ccc), Tez(id=CCC), Tez(id=bbb), Tez(id=BBB), Tez(id=aaa), Tez(id=AAA)]
reversed object distinctBy: [ccc, bbb, aaa]
reversed Data object distinctBy: [Tez(id=ccc), Tez(id=bbb), Tez(id=aaa)]

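As the output above shows, the same calls produce different results depending on the order of the dataset. Each function keeps the first occurrence it meets, so distinctBy {} returns the uppercase values for the original list and the lowercase values for the reversed one. Also, it.lowercase() inside the lambda does not change the values in the dataset; its result is used only for the comparison.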
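To see why the order matters, here is a minimal sketch of the idea behind distinctBy {}. It is a hand-written equivalent, not the standard library's actual source, and the names myDistinctBy and distinctByKeepLast are hypothetical. The first function keeps a set of keys already seen and adds an element only when its key is new, so the first element for each key wins and the selector result is used purely for comparison. The second function shows one way to keep the last element for each key instead: deduplicate the reversed list, then reverse the result back.

```kotlin
// Sketch only: a hand-written equivalent of distinctBy {}, not the
// Kotlin standard library's actual source. myDistinctBy and
// distinctByKeepLast are hypothetical names used for illustration.

fun <T, K> Iterable<T>.myDistinctBy(selector: (T) -> K): List<T> {
    val seenKeys = HashSet<K>()   // comparison keys encountered so far
    val result = ArrayList<T>()
    for (element in this) {
        // selector(element) only produces a key; the element itself is stored unchanged
        if (seenKeys.add(selector(element))) {
            result.add(element)   // add() returned true: first element with this key
        }
    }
    return result
}

// Keeps the last element for each key instead of the first, by
// deduplicating the reversed list and reversing the result back.
fun <T, K> List<T>.distinctByKeepLast(selector: (T) -> K): List<T> =
    reversed().distinctBy(selector).reversed()

fun main() {
    val users = listOf("AAA", "aaa", "BBB", "bbb", "CCC", "ccc", "AAA", "aaa", "BBB", "bbb", "CCC", "ccc")

    println(users.myDistinctBy { it.lowercase() })       // [AAA, BBB, CCC]
    println(users.distinctBy { it.lowercase() })         // [AAA, BBB, CCC]
    println(users.distinctByKeepLast { it.lowercase() }) // [aaa, bbb, ccc]
}
```

The reverse-based approach allocates two extra lists, which is unlikely to matter for small collections like the ones in this post.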

Happy Coding :-)