1) Make sure your terminal is in the main app folder directory
Pull data from API
PS: Use docker images to check the image ID
- docker run -v C:\Users\USER\Desktop\FypApp\DataStorage\data:/app/data (Python Script Image ID)
- The pulled data file should then be in FypApp\DataStorage\data
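The docker run line above can also be assembled from a script, which avoids retyping the long Windows path. This is only a sketch: build_ingest_command is a hypothetical helper, the image ID is a placeholder you would take from docker images, and the code only builds the argument list. It does not start a container unless you pass the list to subprocess.run yourself.

```python
from pathlib import PureWindowsPath

# Host folder taken from the docker run line above; change it for your machine.
HOST_DATA_DIR = PureWindowsPath(r"C:\Users\USER\Desktop\FypApp\DataStorage\data")
# Mount point inside the container, as used in the docker run line above.
CONTAINER_DATA_DIR = "/app/data"


def build_ingest_command(image_id, host_dir=HOST_DATA_DIR):
    """Return the docker run argument list that mounts host_dir at /app/data."""
    volume = f"{host_dir}:{CONTAINER_DATA_DIR}"
    return ["docker", "run", "-v", volume, image_id]


if __name__ == "__main__":
    # "<image id>" is a placeholder, not a real ID.
    print(" ".join(build_ingest_command("<image id>")))
```

To actually run it, something like subprocess.run(build_ingest_command(image_id), check=True) should work, after which the file can be checked in FypApp\DataStorage\data.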
Store Data in NameNode
PS: Use docker ps to check the container ID
- docker cp C:\Users\USER\Desktop\FypApp\DataStorage\data\data_australia.csv (namenode Container ID):/tmp
Make hdfs directory
- docker exec -it namenode /bin/bash
- hdfs dfs -mkdir -p /data
Store Data in hdfs
- hdfs dfs -put /tmp/data_australia.csv /data/data_australia.csv
Check if Data is stored in hdfs
- hdfs dfs -ls /data
PS: To leave the namenode root shell, type exit or press Ctrl-D
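The copy, mkdir, put and ls steps above can be collected into one script so they run in a fixed order. The sketch below makes two assumptions that are not confirmed by these notes: the container is reachable by the name namenode (otherwise pass the container ID), and hdfs is on the PATH for a non-interactive docker exec. hdfs_store_commands and run_all are hypothetical helper names.

```python
import subprocess

CSV_NAME = "data_australia.csv"


def hdfs_store_commands(host_csv, container="namenode", hdfs_dir="/data"):
    """Return the commands from the steps above, in the order they are run."""
    tmp_path = f"/tmp/{CSV_NAME}"
    # docker exec without -it runs one command and returns, so no shell to leave.
    in_container = ["docker", "exec", container]
    return [
        ["docker", "cp", host_csv, f"{container}:/tmp"],
        in_container + ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        in_container + ["hdfs", "dfs", "-put", tmp_path, f"{hdfs_dir}/{CSV_NAME}"],
        in_container + ["hdfs", "dfs", "-ls", hdfs_dir],
    ]


def run_all(commands):
    """Run each command in turn and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Note that hdfs dfs -put fails if the target file already exists in HDFS; adding -f after -put overwrites it.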
PS: Use docker ps to check the container ID
To go into spark
- docker exec -it (spark Container ID) bash
- Inside bash, run spark-shell
Load Data from HDFS
- val df = spark.read.csv("hdfs://namenode:9000/data/data_australia.csv")
PS: To leave the Scala prompt use Ctrl-D
PS: To leave the bash prompt, type exit or press Ctrl-D
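The load step above is written for the Scala prompt. If the Python prompt (pyspark) is used instead, the same read might look like the sketch below. hdfs_uri and load_csv are hypothetical helpers, and header=True and inferSchema=True are assumptions about data_australia.csv that the notes do not confirm.

```python
def hdfs_uri(path, host="namenode", port=9000):
    """Build an hdfs:// URI like the one used in the load step above."""
    return f"hdfs://{host}:{port}/{path.lstrip('/')}"


def load_csv(spark, path="/data/data_australia.csv"):
    """Read the CSV from HDFS with an existing SparkSession.

    header=True and inferSchema=True are assumptions about the file: drop
    them if the CSV has no header row or if all-string columns are fine.
    """
    return spark.read.csv(hdfs_uri(path), header=True, inferSchema=True)
```

Inside the pyspark prompt a session named spark already exists, so df = load_csv(spark) followed by df.show(5) would be enough to check that the file was read.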
There are 3 BAT files: startup.bat, startup2.bat and shutdown.bat
- startup.bat
  - Run this in your terminal
    - (your directory)/startup.bat
  - This bat file runs the following processes:
    - Running the compose file
    - Data ingestion
    - Transferring of files from storage container to hdfs to spark container through volume
    - Processing the data through the spark container, which outputs a cleaned csv
    - Transferring the cleaned csv to the volume
    - Starting and running the visualization app container
- startup2.bat
  - Run this in a separate terminal from startup.bat, as the previous terminal is running streamlit
    - (your directory)/startup2.bat
  - This bat file runs the following process:
    - Transferring the output file from the volume to the visualapp container
  - Open localhost:8501 for streamlit
- shutdown.bat
  - Run this in the first terminal
    - (your directory)/shutdown.bat
  - This bat file runs docker-compose down to clear the containers and clean up docker.
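Because streamlit only answers on localhost:8501 once the visualization container is up, it can help to wait for the port before opening the browser. The helper below is a minimal sketch using only the standard library; wait_for_port is a hypothetical name and is not part of any of the bat files.

```python
import socket
import time


def wait_for_port(host="localhost", port=8501, timeout=60.0, interval=1.0):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A possible use is to call wait_for_port() after startup.bat has been started and, if it returns True, open http://localhost:8501 with webbrowser.open. An open port only shows that something is listening, not that the page has finished loading.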
- main.py cannot run. mainV2.py can run, but its format is different from process.py, so process.py needs to change.
- Find the function "def getHist(self)" and change the directory in the command to your own spark command. The old csv is used for testing for now because of 1). Do the same for the docker-compose files in 'def startpage()', 'def closeEvent()' and 'def stopCommand()'. These have to be done for the prototype first.
- Make sure your host machine has python and pyqt5 installed.
  - pip install pyqt5
  - pip install pyqt5-tools
  - python testui.py (to run)
- 'Start' takes a while to load after it is clicked because it is starting the containers. A loading bar will be added in the future.
- 'Main Page' will bring you to the main page without starting containers.
- So far only historical data is done. The archive in historical data is not done yet.
- Run docker-compose up first to download the images.
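The note about getHist, startpage, closeEvent and stopCommand asks every user to edit hard-coded directories in several functions. One way to avoid that is to read the app folder from an environment variable and build the docker-compose commands from it. Everything below is a sketch under assumptions: FYP_APP_DIR, app_dir and compose_command are hypothetical names that do not exist in process.py, and the compose file is assumed to be called docker-compose.yml.

```python
import os
from pathlib import Path

# Hypothetical variable name; set it to your own main app folder.
APP_DIR_ENV = "FYP_APP_DIR"


def app_dir(environ=None):
    """Return the app folder from the environment, else the current folder."""
    environ = os.environ if environ is None else environ
    return Path(environ.get(APP_DIR_ENV, Path.cwd()))


def compose_command(action, environ=None):
    """Build a docker-compose call, e.g. for startpage() or stopCommand()."""
    if action not in ("up", "down"):
        raise ValueError(f"unsupported action: {action}")
    # Assumed file name; adjust if the compose file is named differently.
    compose_file = app_dir(environ) / "docker-compose.yml"
    cmd = ["docker-compose", "-f", str(compose_file), action]
    if action == "up":
        cmd.append("-d")  # detached, so the call returns instead of blocking
    return cmd
```

With this, a function such as startpage() could call subprocess.run(compose_command("up"), check=True), and only the environment variable would need changing per machine.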