PySpark glom()
I understand that it returns an RDD created by coalescing all elements within each partition into a list. What happens when we don't specify the number of partitions? Is there a default? And where would we actually use it?
1 Answer
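Have you tried looking at the documentation? The glom() method does not take any arguments. The number of partitions is fixed when the RDD is created, for example through the numSlices argument of sc.parallelize, which falls back to the context's default parallelism when omitted. glom() simply turns each existing partition into a list, which is handy for inspecting how the data is distributed.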
https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.RDD.glom.html
https://stackoverflow.com/questions/24996302/setting-sparkcontext-for-pyspark
https://stackoverflow.com/questions/65489387/whats-the-meaning-of-num-slices-parameter-in-sc-parallelize