Spark Logs Collector is a simple utility for collecting YARN application logs and Spark event logs. The collected logs are used to troubleshoot Spark applications.
The following are the advantages of Spark Logs Collector:
- We can collect both Application logs and Event logs.
- Collected logs are in a compressed ([tar|zip]) format.
- No need to run any commands to collect the logs.
- We can collect the logs even in a Kerberized cluster.
Step 1: Download the spark_logs_collector.sh script to any location (for example, /tmp) and give it execute permission.
wget https://raw.githubusercontent.com/rangareddy/spark-logs-collector/main/spark_logs_collector.sh
chmod +x spark_logs_collector.sh
Step 2: When running the spark_logs_collector.sh script, you need to provide the application_id.
sh spark_logs_collector.sh <application_id>
Replace <application_id> with your Spark application ID.
Example:
sh spark_logs_collector.sh application_1658141526730_0004
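Before running the collector, it can help to confirm that the ID has the usual YARN shape, `application_<cluster timestamp>_<sequence number>`. The sketch below is a minimal check under that assumption; `is_valid_app_id` is a hypothetical helper name and is not part of spark_logs_collector.sh.

```shell
# Hypothetical helper (not part of spark_logs_collector.sh): succeeds when the
# argument looks like application_<digits>_<digits>.
is_valid_app_id() {
  printf '%s\n' "$1" | grep -Eq '^application_[0-9]+_[0-9]+$'
}

app_id="application_1658141526730_0004"
if is_valid_app_id "$app_id"; then
  echo "Looks like a YARN application ID: $app_id"
else
  echo "Unexpected application ID format: $app_id" >&2
fi
```

The check only looks at the format; it does not confirm that the application exists on the cluster.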
By default, this utility collects both application and event logs. If you don't want to collect one of them, you can disable it.
Disabling the Event logs:
export EVENT_LOGS_ENABLED=false
Disabling the Application logs:
export APPLICATION_LOGS_ENABLED=false
You can also specify a different user for collecting the Spark logs.
export APPLICATION_USER=rangareddy
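The settings above can also be applied to a single run instead of being exported for the whole shell session. The following is a sketch of that idea: `collect_app_logs_only` is a hypothetical wrapper name, and the default script path `./spark_logs_collector.sh` is an assumption about where the script was downloaded.

```shell
# Hypothetical wrapper (not part of spark_logs_collector.sh): collect only the
# application logs for one application ID.
collect_app_logs_only() {
  app_id="$1"
  # Assumed location of the downloaded script; pass a second argument to override.
  script="${2:-./spark_logs_collector.sh}"
  # A variable assignment placed before a command applies to that command only,
  # so event log collection is disabled just for this run.
  EVENT_LOGS_ENABLED=false sh "$script" "$app_id"
}
```

For example, `collect_app_logs_only application_1658141526730_0004` would run the collector with event logs disabled and leave the current shell environment unchanged.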
Issue: Permission denied: user=<USER_NAME>, access=READ, inode="<DIRECTORY_PATH>":exam:spark:-rwxrwx
Description: User <USER_NAME> does not have HDFS permission to access the <DIRECTORY_PATH> directory.
Solution:
- Provide the correct permission to the user to access the directory.
- Run the script as a user who has permission to access the directory.
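To decide which of the two solutions applies, it is useful to see who owns the directory and what its permissions and ACLs are. The sketch below pulls the user and path out of a "Permission denied" message and prints the standard `hdfs dfs -ls -d` and `hdfs dfs -getfacl` commands to inspect that path. `suggest_hdfs_checks` is a hypothetical helper name, and the parsing assumes the message has the same shape as the one shown above.

```shell
# Hypothetical helper (not part of spark_logs_collector.sh): parse a
# "Permission denied" message and print commands for inspecting the path.
suggest_hdfs_checks() {
  msg="$1"
  # Extract the value after user= and the quoted path after inode=.
  user=$(printf '%s\n' "$msg" | sed -n 's/.*user=\([^,]*\),.*/\1/p')
  path=$(printf '%s\n' "$msg" | sed -n 's/.*inode="\([^"]*\)".*/\1/p')
  if [ -z "$user" ] || [ -z "$path" ]; then
    echo "Could not parse the message" >&2
    return 1
  fi
  echo "user=$user path=$path"
  # Show owner, group, and mode of the directory itself.
  echo "hdfs dfs -ls -d '$path'"
  # Show any ACL entries set on the directory.
  echo "hdfs dfs -getfacl '$path'"
}
```

The helper only prints the commands; running them and changing permissions is left to someone with the appropriate HDFS privileges.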
Issue: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
Description: You don't have a valid Kerberos ticket, or your Kerberos ticket has expired.
Solution:
- Obtain a new Kerberos ticket by running:
kinit <principal name> [<password>]
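A quick pre-check can show whether a ticket is already in place before the collector is run. This is a minimal sketch that assumes the MIT Kerberos `klist`, whose `-s` option reports ticket validity through the exit status without printing anything; other Kerberos clients may not support that option. `has_valid_ticket` is a hypothetical helper name.

```shell
# Hypothetical pre-check (not part of spark_logs_collector.sh).
# Assumes MIT Kerberos klist: "klist -s" exits 0 only when the credentials
# cache is readable and not expired.
has_valid_ticket() {
  command -v klist >/dev/null 2>&1 || return 1
  klist -s >/dev/null 2>&1 || return 1
}

if has_valid_ticket; then
  echo "Kerberos ticket found."
else
  echo "No valid Kerberos ticket; run kinit before collecting logs." >&2
fi
```

If `klist` is not installed, the function treats that the same as having no ticket.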
If you have any suggestions or questions, please create an issue to share your feedback.
If you want to contribute to this project, please connect with me on LinkedIn.
Copyright ©2022 Ranga Reddy, https://github.com/rangareddy