Converting A Spark Dataframe To A Scala Map Collection


Answer:

I don't think your question makes sense as posed: the outermost Map in your example only holds values, but a Map needs key/value pairs. That being said, you can start by turning each row into a Map from column name to value:



val peopleArray = df.collect.map(r => Map(df.columns.zip(r.toSeq):_*))


Will give you:



Array(
Map("age" -> null, "name" -> "Michael"),
Map("age" -> 30, "name" -> "Andy"),
Map("age" -> 19, "name" -> "Justin")
)
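Note that the `null` under `age` is carried over from the DataFrame as-is. If nulls are awkward to handle downstream, one option is to wrap every value in an `Option`. The following is a minimal sketch in plain Scala (no Spark needed); the object and helper names `NullSafeRecords` and `optionalize` are hypothetical, not part of the original answer:

```scala
object NullSafeRecords {
  // Hypothetical helper: wraps each value so that null becomes None
  // and any other value v becomes Some(v).
  def optionalize(record: Map[String, Any]): Map[String, Option[Any]] =
    record.map { case (column, value) => column -> Option(value) }
}
```

It could be applied to the array above with something like `peopleArray.map(NullSafeRecords.optionalize)`.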


With peopleArray in hand, you can key each row's Map by its name:



val people = Map(peopleArray.map(p => (p.getOrElse("name", null), p)):_*)


Which would give you:



Map(
("Michael" -> Map("age" -> null, "name" -> "Michael")),
("Andy" -> Map("age" -> 30, "name" -> "Andy")),
("Justin" -> Map("age" -> 19, "name" -> "Justin"))
)


I'm guessing this is closer to what you want. If you would rather key the rows on their position (zipWithIndex produces an Int index, not a Long), you can do:



val indexedPeople = Map(peopleArray.zipWithIndex.map(r => (r._2, r._1)):_*)


Which gives you:



Map(
(0 -> Map("age" -> null, "name" -> "Michael")),
(1 -> Map("age" -> 30, "name" -> "Andy")),
(2 -> Map("age" -> 19, "name" -> "Justin"))
)
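One caveat for the name-keyed map: a Scala `Map` holds a single value per key, so if two rows share the same `name`, only the later row survives. The sketch below uses plain Scala with hypothetical sample data standing in for `peopleArray` (the duplicated "Andy" row is invented for illustration) and shows a `groupBy` alternative that keeps every row:

```scala
object KeyedRows {
  type Record = Map[String, Any]

  // Hypothetical stand-in for peopleArray; note the duplicated name "Andy".
  val rows: Array[Record] = Array(
    Map("age" -> null, "name" -> "Michael"),
    Map("age" -> 30, "name" -> "Andy"),
    Map("age" -> 41, "name" -> "Andy")
  )

  // Equivalent to Map(pairs: _*): the last pair for a given key wins.
  val byName: Map[Any, Record] =
    rows.map(p => p.getOrElse("name", null) -> p).toMap

  // Keeps all rows by collecting them under their shared key.
  val groupedByName: Map[Any, Array[Record]] =
    rows.groupBy(p => p.getOrElse("name", null))

  // Position-keyed variant; the keys are Int because zipWithIndex yields Int.
  val byIndex: Map[Int, Record] =
    rows.zipWithIndex.map { case (p, i) => i -> p }.toMap
}
```

Here `byName` ends up with two entries while `groupedByName("Andy")` holds both "Andy" rows, so grouping is the safer choice whenever the key column is not guaranteed to be unique.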


Another approach: first get the column names, paired with their positions, from the DataFrame's schema:



val schemaList = dataframe.schema.map(_.name).zipWithIndex // (column name, column index) pairs


Then map over the DataFrame's RDD, building one Map per row:



dataframe.rdd.map(row =>
  // rec._1 is the column name and rec._2 is its index in the row
  schemaList.map(rec => (rec._1, row(rec._2))).toMap
).collect.foreach(println)
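Spark's `Row` also offers `getValuesMap`, which builds the column-name-to-value map directly when the row carries a schema (as rows collected from a DataFrame do). The sketch below is one way to use it; it assumes a local `SparkSession` and a small hypothetical DataFrame mirroring the sample data, and the names `RowsToMaps`, `toMaps`, and `demo` are illustrative only:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RowsToMaps {
  // Collects the DataFrame to the driver and converts each Row to a Map.
  // Only suitable for data small enough to fit in driver memory.
  def toMaps(df: DataFrame): Array[Map[String, Any]] =
    df.collect().map(row => row.getValuesMap[Any](row.schema.fieldNames.toSeq))

  // Assumed setup: a local session and sample data invented for illustration.
  def demo(): Array[Map[String, Any]] = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("rows-to-maps")
      .getOrCreate()
    import spark.implicits._

    val df = Seq[(Option[Int], String)](
      (None, "Michael"),
      (Some(30), "Andy"),
      (Some(19), "Justin")
    ).toDF("age", "name")

    try toMaps(df) finally spark.stop()
  }
}
```

Calling `RowsToMaps.demo().foreach(println)` prints one Map per row. Compared with zipping column names against `row.toSeq`, this keeps the name lookup inside Spark's own API, at the cost of requiring rows that have a schema attached.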

