Ankitp1342/spark-copy-job

spark-copy-job

This code helps you batch-copy Cassandra tables using Spark jobs. It includes rate limiters that keep the copy from running so fast that it takes down a cluster. No retry policy is implemented; that is left to the implementer. The advantage over a DataFrame copy is that you can iterate through particular partition ranges and copy parts of a table slowly, which is very useful for large tables.
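The idea of copying a table range by range under a rate limit can be sketched as follows. This is a minimal illustration, not the code in this repository: it assumes the ranges are Cassandra token ranges under the default Murmur3Partitioner, and the names `TokenRangeCopySketch`, `splitRange`, `SimpleRateLimiter`, and `copyInSlices` are hypothetical. The limiter is a simple stand-in for whatever rate limiter the job actually uses, and `copySlice` is a placeholder for the real read and write of one range.

```scala
object TokenRangeCopySketch {
  // Token bounds of Murmur3Partitioner (assumed here; it is Cassandra's default).
  val MinToken: BigInt = BigInt(Long.MinValue)
  val MaxToken: BigInt = BigInt(Long.MaxValue)

  /** Split the inclusive range [min, max] into at most `parts` contiguous,
    * non-overlapping inclusive slices that together cover the whole range. */
  def splitRange(min: BigInt, max: BigInt, parts: Int): Vector[(BigInt, BigInt)] = {
    require(parts > 0, "parts must be positive")
    require(min <= max, "min must not exceed max")
    val total = max - min + 1
    val n     = BigInt(parts).min(total) // never produce empty slices
    val base  = total / n
    val extra = (total % n).toInt        // the first `extra` slices get one more token
    Vector.tabulate(n.toInt) { i =>
      val start = min + base * BigInt(i) + BigInt(math.min(i, extra))
      val size  = base + BigInt(if (i < extra) 1 else 0)
      (start, start + size - 1)
    }
  }

  /** Hypothetical stand-in limiter: paces callers to `permitsPerSecond` by sleeping.
    * The clock and sleep functions are injectable so the pacing can be tested. */
  final class SimpleRateLimiter(
      permitsPerSecond: Double,
      now: () => Long = () => System.nanoTime(),
      sleepNanos: Long => Unit = ns => Thread.sleep(ns / 1000000L, (ns % 1000000L).toInt)
  ) {
    require(permitsPerSecond > 0, "permitsPerSecond must be positive")
    private val intervalNanos = (1e9 / permitsPerSecond).toLong
    private var nextFree      = now()

    def acquire(): Unit = synchronized {
      val t = now()
      if (t < nextFree) sleepNanos(nextFree - t)
      nextFree = math.max(t, nextFree) + intervalNanos
    }
  }

  /** Run `copySlice` once per slice, waiting on the limiter before each call.
    * In a real job, `copySlice` might read rows with a query such as
    * `... WHERE token(pk) >= ? AND token(pk) <= ?` and write them to the target. */
  def copyInSlices(slices: Seq[(BigInt, BigInt)], limiter: SimpleRateLimiter)(
      copySlice: (BigInt, BigInt) => Unit
  ): Unit =
    slices.foreach { case (lo, hi) =>
      limiter.acquire()
      copySlice(lo, hi)
    }

  def main(args: Array[String]): Unit = {
    val slices  = splitRange(MinToken, MaxToken, parts = 8)
    val limiter = new SimpleRateLimiter(permitsPerSecond = 2.0)
    copyInSlices(slices, limiter) { (lo, hi) =>
      println(s"would copy tokens $lo to $hi") // placeholder for the real copy
    }
  }
}
```

Limiting the copy to part of a table only requires passing narrower bounds to `splitRange`. Because no retry policy is provided, a failed slice would have to be retried by the caller, for example by wrapping `copySlice`.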

To compile the code, simply run:

sbt assembly

To run the code on DSE:

dse spark-submit --class com.spark.copyjob.SparkCopyJob /Spark-Copy-Job-assembly-1.0.jar

Please note: the code will not work as is. You are expected to fill in the table details in the com.spark.copyjob.CopyJobSession class.
