Calculating Resource Needs:
The link below provides a 'recipe' for determining the resources your Spark job will need and for setting parameters to maximize 'bang for the buck'; a worked example of its math follows.
https://blog.cloudera.com/how-to-tune-your-apache-spark-jobs-part-2/
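As a quick illustration of the Cloudera recipe, here is a back-of-the-envelope sizing calculation in plain Python. The cluster specs are the article's example numbers (6 nodes, 16 cores, 64 GB each), and the constants come from its rules of thumb (~5 cores per executor, 1 core and 1 GB per node reserved for the OS and Hadoop daemons, ~7% of executor memory set aside for YARN overhead):

```python
# Back-of-the-envelope executor sizing, following the Cloudera tuning recipe.
nodes = 6                 # worker nodes (example value from the article)
cores_per_node = 16       # physical cores per node
mem_per_node_gb = 64      # RAM per node, in GB

# Reserve 1 core and 1 GB per node for the OS and Hadoop daemons.
usable_cores = cores_per_node - 1
usable_mem_gb = mem_per_node_gb - 1

# Rule of thumb: ~5 cores per executor keeps HDFS I/O throughput healthy.
executor_cores = 5
executors_per_node = usable_cores // executor_cores    # 3
num_executors = nodes * executors_per_node - 1         # 17; one slot left for the driver

# Split node memory across executors, then carve out ~7% for YARN memory overhead.
mem_per_executor_gb = usable_mem_gb // executors_per_node   # 21
executor_memory_gb = int(mem_per_executor_gb * 0.93)        # ~19

print(f"--num-executors {num_executors} "
      f"--executor-cores {executor_cores} "
      f"--executor-memory {executor_memory_gb}g")
```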
Calculating the Spread of a security at the time of a transaction: https://databricks.com/blog/2019/10/09/democratizing-financial-time-series-analysis-with-databricks.html
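The Databricks post builds the spread with an as-of join of each trade to the latest preceding quote. Below is a minimal PySpark sketch of the same idea using a window function; the tickers, timestamps, and column names are made-up illustration data, not the post's dataset:

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("spread").getOrCreate()

# Hypothetical quote and trade ticks (ts is a simplified integer timestamp).
quotes = spark.createDataFrame(
    [("AAPL", 1, 100.0, 100.2), ("AAPL", 3, 100.1, 100.4)],
    ["ticker", "ts", "bid", "ask"])
trades = spark.createDataFrame(
    [("AAPL", 2, 100.1), ("AAPL", 4, 100.3)],
    ["ticker", "ts", "price"])

# Union both event types, then forward-fill the last known quote onto each
# trade row (a window-function as-of join).
events = quotes.withColumn("price", F.lit(None).cast("double")) \
    .unionByName(trades.withColumn("bid", F.lit(None).cast("double"))
                       .withColumn("ask", F.lit(None).cast("double")))

w = (Window.partitionBy("ticker").orderBy("ts")
           .rowsBetween(Window.unboundedPreceding, Window.currentRow))

spreads = (events
    .withColumn("bid", F.last("bid", ignorenulls=True).over(w))
    .withColumn("ask", F.last("ask", ignorenulls=True).over(w))
    .where(F.col("price").isNotNull())           # keep only the trade rows
    .withColumn("spread", F.col("ask") - F.col("bid")))

spreads.show()
```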
Setting up a Stream of Tweets: https://www.linkedin.com/pulse/apache-spark-streaming-twitter-python-laurent-weichberger/
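The article feeds tweets from the Twitter API into a local TCP socket with tweepy, then reads that socket as a DStream. A stripped-down sketch of the Spark side is below; the host, port, and 10-second batch interval are assumptions, and the tweepy producer is omitted:

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# 10-second micro-batches; a tweepy producer (not shown) is assumed to write
# one tweet per line to localhost:5555.
sc = SparkContext(appName="TweetStream")
ssc = StreamingContext(sc, 10)

tweets = ssc.socketTextStream("localhost", 5555)
tweets.pprint()      # print a sample of each batch to the console

ssc.start()
ssc.awaitTermination()
```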
Analyzing hashtags on a Stream of Tweets: https://towardsdatascience.com/hands-on-big-data-streaming-apache-spark-at-scale-fd89c15fa6b0
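Continuing the same socket feed, a simplified sketch of hashtag counting on such a stream (per-batch counts only) looks like this:

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="HashtagCounts")
ssc = StreamingContext(sc, 10)
tweets = ssc.socketTextStream("localhost", 5555)   # same tweet feed as above

# Tokenize each tweet, keep only hashtag words, and count them per batch.
counts = (tweets.flatMap(lambda line: line.split())
                .filter(lambda word: word.startswith("#"))
                .map(lambda tag: (tag.lower(), 1))
                .reduceByKey(lambda a, b: a + b))

# Show the most frequent hashtags of each 10-second batch.
counts.transform(lambda rdd: rdd.sortBy(lambda kv: -kv[1])).pprint()

ssc.start()
ssc.awaitTermination()
```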
Real-Time ingestion and ETL of unstructured Logs into a Data Warehouse: https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/8599738367597028/2070341989008551/3601578643761083/latest.html
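The heart of that pattern is regex-parsing raw lines into columns and appending them to a table. A minimal Structured Streaming sketch is below; the paths, table name, and Apache common-log format are assumptions, and the `toTable` sink needs Spark 3.1+:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("log_etl").getOrCreate()

# Hypothetical landing directory; each arriving file holds raw log lines.
raw = spark.readStream.text("/mnt/raw/access_logs/")

# Apache common-log format -> structured columns via regex capture groups.
log_pattern = r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)'
parsed = raw.select(
    F.regexp_extract("value", log_pattern, 1).alias("host"),
    F.regexp_extract("value", log_pattern, 2).alias("timestamp"),
    F.regexp_extract("value", log_pattern, 3).alias("method"),
    F.regexp_extract("value", log_pattern, 4).alias("path"),
    F.regexp_extract("value", log_pattern, 5).cast("int").alias("status"),
    F.regexp_extract("value", log_pattern, 6).alias("bytes"),
)

# Continuously append the parsed rows to a warehouse table.
query = (parsed.writeStream
         .option("checkpointLocation", "/mnt/checkpoints/web_logs")
         .outputMode("append")
         .toTable("web_logs"))
query.awaitTermination()
```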
Hooking Spark up to an AWS Kinesis Stream: https://aws.amazon.com/blogs/big-data/querying-amazon-kinesis-streams-directly-with-sql-and-spark-streaming/
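A skeletal version of the consumer side, assuming Spark 2.x with the spark-streaming-kinesis-asl package on the classpath; the app, stream, and region names are placeholders, and credentials are expected to come from the default AWS provider chain:

```python
from pyspark import SparkContext, StorageLevel
from pyspark.streaming import StreamingContext
from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream

sc = SparkContext(appName="KinesisDemo")
ssc = StreamingContext(sc, 10)

# Placeholder stream/endpoint/region; checkpoint interval of 10 seconds.
records = KinesisUtils.createStream(
    ssc, "kinesis-demo-app", "my-stream",
    "https://kinesis.us-east-1.amazonaws.com", "us-east-1",
    InitialPositionInStream.LATEST, 10,
    StorageLevel.MEMORY_AND_DISK_2)

records.pprint()   # the AWS post goes on to query these records with SQL

ssc.start()
ssc.awaitTermination()
```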
**NLP on Spark to summarize Strategic Reports:** https://databricks.com/notebooks/esg_notebooks/01_esg_report.html
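The ESG notebook runs a full NLP pipeline; as a toy stand-in for its summarization step, here is a frequency-based extractive summarizer in plain Python. This is not the notebook's model, just the general idea: score sentences by how many high-frequency content words they contain and keep the top few:

```python
import re
from collections import Counter

def summarize(text: str, n_sentences: int = 2) -> str:
    """Return the n highest-scoring sentences, in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    stopwords = {"the", "a", "an", "and", "of", "to", "in", "is", "are",
                 "for", "on", "with", "that", "this", "it", "as", "its", "we"}
    freq = Counter(w for w in words if w not in stopwords)

    def score(sentence: str) -> int:
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    # Keep the top-scoring sentences, restored to document order.
    top = sorted(sorted(sentences, key=score, reverse=True)[:n_sentences],
                 key=sentences.index)
    return " ".join(top)

report = ("Our emissions fell 12% year over year. The board approved a new "
          "climate policy. Emissions targets for 2030 remain on track. "
          "We also opened two new offices.")
print(summarize(report))
```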
PySpark cheat sheet (DataCamp): https://s3.amazonaws.com/assets.datacamp.com/blog_assets/PySpark_Cheat_Sheet_Python.pdf
PySpark SQL cheat sheet (Intellipaat): https://intellipaat.com/mediaFiles/2019/03/PySpark-SQL-cheat-sheet.jpg