Create Substring Column In Spark Dataframe


Answer :

You can use a statement like the following:



import org.apache.spark.sql.functions._


dataFrame.select(col("a"), substring_index(col("a"), ",", 1).as("b"))
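Here, `substring_index(col("a"), ",", 1)` returns everything in column `a` before the first comma. A minimal sketch, assuming a spark-shell session (the sample data is illustrative):

```scala
import spark.implicits._
import org.apache.spark.sql.functions._

// substring_index keeps everything before the first occurrence of the delimiter
val parts = Seq("foo,bar,baz").toDF("a")
parts.select(col("a"), substring_index(col("a"), ",", 1).as("b")).show()
// the "b" column contains "foo"
```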



Suppose you have the following dataframe:



import spark.implicits._
import org.apache.spark.sql.functions._

var df = sc.parallelize(Seq(("foobar", "foo"))).toDF("a", "b")

+------+---+
| a| b|
+------+---+
|foobar|foo|
+------+---+


You can derive a new column from the first column with the built-in substring function as follows:



df = df.select(col("*"), substring(col("a"), 4, 6).as("c"))

+------+---+---+
| a| b| c|
+------+---+---+
|foobar|foo|bar|
+------+---+---+
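Note that `substring(str, pos, len)` in Spark SQL is 1-based, unlike Scala's own `String.substring`, which is 0-based. A quick sketch on the same dataframe:

```scala
import org.apache.spark.sql.functions._

// pos = 1 starts at the first character; len caps the result length
df.select(substring(col("a"), 1, 3).as("prefix")).show()
// for a = "foobar", prefix = "foo"
```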


Alternatively, you can use the withColumn function with a UDF:



import org.apache.spark.sql.functions.{udf, col}
def substringFn(str: String): String = ??? // your substring code goes here
val substring = udf(substringFn _)
dataframe.withColumn("b", substring(col("a")))
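As a concrete sketch (the function name and logic here are illustrative, not from the original answer), a UDF that keeps the first three characters:

```scala
import spark.implicits._
import org.apache.spark.sql.functions.{udf, col}

// Hypothetical helper: take the first three characters; take is safe on short strings
def firstThree(str: String): String = str.take(3)
val firstThreeUdf = udf(firstThree _)

val df = Seq(("foobar", "foo")).toDF("a", "b")
df.withColumn("b", firstThreeUdf(col("a"))).show()
// b becomes "foo" for a = "foobar"
```

Prefer the built-in substring and substring_index functions when they fit: UDFs are opaque to the Catalyst optimizer, so they generally perform worse than the equivalent built-ins.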

