1) Make sure your terminal is in the main app folder directory
Pull data from API
PS: Use docker images to check the image ID
- docker run -v C:\Users\USER\Desktop\FypApp\DataStorage\data:/app/data (Python Script Image ID)
- The output file should appear in FypApp\DataStorage\data
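The ingestion script baked into the Python image is not shown in this walkthrough; below is a minimal, hypothetical sketch of what it presumably does: fetch records from an API and write a CSV to /app/data, which the -v flag above maps to FypApp\DataStorage\data on the host. The URL and response shape are placeholders, not the real endpoint.

```python
# Hypothetical sketch of the ingestion script inside the Python image.
# It writes to /app/data, which `docker run -v` maps to the host folder
# FypApp\DataStorage\data. The URL and record layout are placeholders.
import csv
import json
import urllib.request

API_URL = "https://example.com/api/australia"  # placeholder endpoint

with urllib.request.urlopen(API_URL) as resp:
    rows = json.load(resp)  # assumes the API returns a JSON list of records

with open("/app/data/data_australia.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
```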
Store Data in NameNode
PS: Use docker ps to check the container ID
- docker cp C:\Users\USER\Desktop\FypApp\DataStorage\data\data_australia.csv (namenode Container ID):/tmp
Make hdfs directory
- docker exec -it namenode /bin/bash
- hdfs dfs -mkdir -p /data
Store Data in hdfs
- hdfs dfs -put /tmp/data_australia.csv /data/data_australia.csv
Check if Data is stored in hdfs
- hdfs dfs -ls /data
PS: to leave the namenode shell, type exit (or press Ctrl-D)
PS: Use docker ps to check the container ID
To go into spark
- docker exec -it (spark Container ID) bash
- inside bash, run spark-shell
Load Data from HDFS
- val df = spark.read.csv("hdfs://namenode:9000/data/data_australia.csv")
PS: to leave the Scala REPL (spark-shell), press Ctrl-D
PS: to leave the bash shell, type exit (or press Ctrl-D)
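If the Spark container also ships PySpark (an assumption about the image; the Scala line above is the documented route), the same load can be written in Python. The header and inferSchema options are assumptions about the CSV layout; drop them if the file has no header row.

```python
# PySpark equivalent of the Scala load above; assumes `pyspark` exists
# in the Spark container and that data_australia.csv has a header row.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("LoadAustraliaData").getOrCreate()

df = (spark.read
      .option("header", "true")       # assumption: first row is a header
      .option("inferSchema", "true")  # assumption: let Spark guess column types
      .csv("hdfs://namenode:9000/data/data_australia.csv"))

df.printSchema()  # quick sanity checks that the load worked
df.show(5)
```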
**UPDATED**
There are 3 BAT files: startup.bat, startup2.bat, and shutdown.bat
- startup.bat
- Run this command in your terminal
- (your directory)/startup.bat
- This bat file runs the following processes (see the Python sketch after this list):
- Running the compose file
- Data ingestion
- Transferring files from the storage container to HDFS and on to the Spark container through the shared volume
- Processing the data in the Spark container, which outputs a cleaned CSV
- Transferring the cleaned CSV to the volume
- Starting and running the visualization app container
- startup2.bat
- Run this in a separate terminal from startup.bat, as the previous terminal is busy running Streamlit
- (your directory)/startup2.bat
- This bat file runs the following processes:
- Transferring the output file from the volume to the visualapp container
- Opening localhost:8501 for Streamlit
- shutdown.bat
- Run this command in the first terminal
- (your directory)/shutdown.bat
- This bat file runs docker-compose down to remove the containers and clean up Docker
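For reference, here is a minimal Python sketch of the first steps startup.bat automates (compose up, moving the ingested CSV into HDFS), built only from the commands in the walkthrough above; the real BAT files may differ, and it assumes you run it from the main app folder with the HDFS container named namenode.

```python
# Minimal sketch of the pipeline the BAT files automate, assuming the
# HDFS container is named "namenode" and paths match the walkthrough above.
import subprocess

def run(cmd):
    print(">", " ".join(cmd))
    subprocess.run(cmd, check=True)  # stop early if any step fails

run(["docker-compose", "up", "-d"])  # start the containers
run(["docker", "cp",
     r"DataStorage\data\data_australia.csv", "namenode:/tmp"])
run(["docker", "exec", "namenode",
     "hdfs", "dfs", "-mkdir", "-p", "/data"])
run(["docker", "exec", "namenode",
     "hdfs", "dfs", "-put", "-f",
     "/tmp/data_australia.csv", "/data/data_australia.csv"])
```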
**GUI**
- main.py cannot run; mainV2.py can run, but its format is different from process.py, so process.py needs to change
- Find the function "def getHist(self)" and change the directory in the command to your own Spark command (test with the old CSV first, because of step 1). Do the same for the docker-compose paths in 'def startpage()', 'def closeEvent()', and 'def stopCommand()'. These have to be done for the prototype first.
- Make sure your host machine has Python and PyQt5 installed:
- pip install pyqt5
- pip install pyqt5-tools
- python testui.py (to run)
- 'Start' will take a while to load after it is clicked, as it is starting the containers; a loading bar will be added in the future (see the PyQt5 sketch after this list)
- 'Main Page' brings you to the main page without starting the containers
- So far only historical data is done; the archive within historical data is not done yet
- Run docker-compose up first to download the images
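Since 'Start' currently blocks while the containers come up, one way to keep the window responsive is to launch docker-compose through QProcess instead of a blocking call. This is a sketch only: it assumes the GUI stays on PyQt5, and the widget and method names here are hypothetical, not taken from testui.py.

```python
# Sketch: keep the PyQt5 window responsive while containers start by
# running docker-compose asynchronously via QProcess (names are hypothetical).
import sys
from PyQt5.QtCore import QProcess
from PyQt5.QtWidgets import (QApplication, QLabel, QPushButton,
                             QVBoxLayout, QWidget)

class StartPage(QWidget):
    def __init__(self):
        super().__init__()
        self.status = QLabel("Idle")
        self.start_btn = QPushButton("Start")
        self.start_btn.clicked.connect(self.start_containers)
        layout = QVBoxLayout(self)
        layout.addWidget(self.start_btn)
        layout.addWidget(self.status)
        self.proc = QProcess(self)
        self.proc.finished.connect(self.on_finished)

    def start_containers(self):
        self.start_btn.setEnabled(False)
        self.status.setText("Starting containers...")
        # Mirrors what startup.bat kicks off, without freezing the UI.
        self.proc.start("docker-compose", ["up", "-d"])

    def on_finished(self, exit_code, _exit_status):
        self.status.setText("Containers up" if exit_code == 0 else "Failed")
        self.start_btn.setEnabled(True)

if __name__ == "__main__":
    app = QApplication(sys.argv)
    w = StartPage()
    w.show()
    sys.exit(app.exec_())
```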