Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error running Serverless Spark jobs with Dataproc Batches Codelab #224

Open
natewh opened this issue Feb 16, 2023 · 1 comment
Open

Error running Serverless Spark jobs with Dataproc Batches Codelab #224

natewh opened this issue Feb 16, 2023 · 1 comment

Comments

@natewh
Copy link

natewh commented Feb 16, 2023

Codelab - Dataproc Serverless
3. Run Serverless Spark jobs with Dataproc Batches](https://codelabs.developers.google.com/dataproc-serverless#2)

When running getting error:

SPARK_EXTRA_CLASSPATH=
:: loading settings :: file = /etc/spark/conf/ivysettings.xml
Traceback (most recent call last):
File "/tmp/srvls-batch-1994fce7-ed4a-4ec9-bb67-a01b7c9bb72d/citibike.py", line 32, in
df = spark.read.format("bigquery").load(table)
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 177, in load
File "/usr/lib/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in call
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 190, in deco
File "/usr/lib/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o73.load.
: java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider com.google.cloud.spark.bigquery.BigQueryRelationProvider could not be instantiated
at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:586)
at java.base/java.util.ServiceLoader$ProviderImpl.newInstance(ServiceLoader.java:813)
at java.base/java.util.ServiceLoader$ProviderImpl.get(ServiceLoader.java:729)
at java.base/java.util.ServiceLoader$3.next(ServiceLoader.java:1403)
at scala.collection.convert.JavaCollectionWrappers$JIteratorWrapper.next(JavaCollectionWrappers.scala:38)
at scala.collection.StrictOptimizedIterableOps.filterImpl(StrictOptimizedIterableOps.scala:226)
at scala.collection.StrictOptimizedIterableOps.filterImpl$(StrictOptimizedIterableOps.scala:222)
at scala.collection.convert.JavaCollectionWrappers$JIterableWrapper.filterImpl(JavaCollectionWrappers.scala:58)
at scala.collection.StrictOptimizedIterableOps.filter(StrictOptimizedIterableOps.scala:218)
at scala.collection.StrictOptimizedIterableOps.filter$(StrictOptimizedIterableOps.scala:218)
at scala.collection.convert.JavaCollectionWrappers$JIterableWrapper.filter(JavaCollectionWrappers.scala:58)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:725)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:207)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:185)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.NoClassDefFoundError: scala/Serializable
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1012)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
at java.base/java.net.URLClassLoader.defineClass(URLClassLoader.java:524)
at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:427)
at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:421)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:420)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:587)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
at com.google.cloud.spark.bigquery.BigQueryRelationProvider.(BigQueryRelationProvider.scala:49)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
at java.base/java.util.ServiceLoader$ProviderImpl.newInstance(ServiceLoader.java:789)
... 25 more
Caused by: java.lang.ClassNotFoundException: scala.Serializable
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:587)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
... 42 more

ERROR: (gcloud.dataproc.batches.submit.pyspark) Batch job is FAILED.

@emix14
Copy link

emix14 commented Jun 16, 2023

This is because the version of spark or bigquery lib. Easy fix, add to command line --version=1.0
Ex: gcloud dataproc batches submit pyspark citibike.py --version=1.0 ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants