I realize it's probably a long shot that anyone is monitoring these issues, but just in case...
I'm using the connector to retrieve data from OrientDB, and my column out comes back as an EmbeddedList:
val dfOrient = sqlContext.read
  .format("org.apache.spark.orientdb.graphs")
  .option("dburl", dbUrl)
  .option("user", user)
  .option("password", password)
  .option("vertextype", "character")
  .option("query", "select name, out('appears_in').title from character where outE('appears_in').size() > 0")
  .schema(struct)
  .load()
I know the data in out is an array of strings like:
["Iron Man"]
["Captain America: The First Avenger","Captain America: The Winter Soldier","Captain America: Civil War","Avengers: Infinity War"]
["Captain America: The Winter Soldier"]
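I'm not sure how to convert the out column to an Array<String> so that I can run something like:
import org.apache.spark.sql.functions.{array, concat, explode}
import sqlContext.implicits._ // enables the 'name symbol-to-Column syntax

val vertices = df
  .select(explode(concat(array('name), 'out)) as "x")
  .distinct.rdd.map(_.getAs[String](0))
  .zipWithIndex.map(_.swap)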
Any hints would be appreciated. I'm new to both Spark and OrientDB.
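One possible direction is to flatten each out value into plain strings on the driver-visible row objects before rebuilding the DataFrame. This is only a rough sketch: the helper toStringSeq and the stand-in case class Wrapped are hypothetical names, and the container shapes matched below are assumptions about what the connector might hand back, not its documented behaviour.

```scala
object EmbeddedListSketch {
  // Hypothetical stand-in for a connector type that wraps its elements.
  // The real connector class may look different.
  final case class Wrapped(elements: Array[Any])

  // Flattens common container shapes into a Seq[String].
  // Iterable is matched before Product because Scala's List is both.
  def toStringSeq(value: Any): Seq[String] = value match {
    case null        => Seq.empty
    case s: String   => Seq(s)
    case a: Array[_] => a.toSeq.flatMap(x => toStringSeq(x))
    case it: Iterable[_] => it.toSeq.flatMap(x => toStringSeq(x))
    case jl: java.lang.Iterable[_] =>
      // Walk Java collections by hand to avoid version-specific converters.
      val iter = jl.iterator()
      val buf = Seq.newBuilder[String]
      while (iter.hasNext) buf ++= toStringSeq(iter.next())
      buf.result()
    case p: Product  => p.productIterator.toSeq.flatMap(x => toStringSeq(x))
    case other       => Seq(other.toString)
  }
}

// Possible use with the DataFrame above (untested assumption; requires
// import sqlContext.implicits._ for toDF):
//   val df = dfOrient.rdd
//     .map(r => (r.getAs[String](0), EmbeddedListSketch.toStringSeq(r.get(1))))
//     .toDF("name", "out")
```

If the rows do carry one of these shapes, the rebuilt out column would be an ordinary array of strings, which is what the explode/concat step expects.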