
Car Counter DataFlow

The car processing dataflow reads events from the cars and licenses topics and performs a series of operations:

  • Identify speeding vehicles (filter-map).
  • Divide the events based on location (split).
  • Count the cars based on color (tumbling windows).
  • Identify stolen license plates (ref state).

The following diagram is a visual representation generated by sdf:

Event Generation

Two event types are generated by two HTTP-source connectors that stream synthetic data from data.infinyon.com.

Car Events Connector

The first connector reads car events at a rate of 2 events per second. The events model two traffic lights (semaphores), one in Saratoga and one in Sunnyvale, and contain the following information:

{
  "maker": "Fiat",
  "model": "124 Spider",
  "category": "Convertible",
  "color": "beige",
  "license": "7WPK493",
  "fuel": "gasoline",
  "location": "Saratoga",
  "mph": 43,
  "timestamp": "2024-03-14T16:12:10.493Z"
}

This connector sends events to the cars topic.
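If you want to inspect car events outside SDF, a typed model of the record helps. The following is a minimal Python sketch, assuming you already have the raw JSON text of one event (for example, copied from fluvio consume cars). CarEvent and parse_car_event are hypothetical names, not part of SDF or Fluvio; the field names come from the sample above.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CarEvent:
    """Hypothetical typed view of one record from the cars topic."""
    maker: str
    model: str
    category: str
    color: str
    license: str
    fuel: str
    location: str
    mph: int
    timestamp: datetime


def parse_car_event(raw: str) -> CarEvent:
    data = json.loads(raw)
    # Timestamps end in "Z" (UTC); swap it for "+00:00" so that
    # datetime.fromisoformat also accepts it on Python versions before 3.11.
    data["timestamp"] = datetime.fromisoformat(data["timestamp"].replace("Z", "+00:00"))
    # Raises TypeError if the record has missing or unexpected fields.
    return CarEvent(**data)


sample = """{"maker": "Fiat", "model": "124 Spider", "category": "Convertible",
 "color": "beige", "license": "7WPK493", "fuel": "gasoline",
 "location": "Saratoga", "mph": 43, "timestamp": "2024-03-14T16:12:10.493Z"}"""

event = parse_car_event(sample)
print(event.license, event.mph, event.timestamp.isoformat())
```

Constructing the dataclass with CarEvent(**data) is deliberately strict: a schema change in the generated data shows up as an error instead of a silently missing field.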

License Plates Connector

The second connector reads car licenses every hour to emulate a database read of license plates from an official organization such as the Department of Motor Vehicles (DMV):

[
  {
    "maker": "Toyota",
    "model": "Camry",
    "license": "0VTI452"
  },
  {
    "maker": "Ford",
    "model": "Fusion",
    "license": "7YHM485"
  },
  ...
]

This connector sends events to the licenses topic.

Step-by-step

Take a look at the dataflow.yaml to get an idea of what we're doing.

Make sure to install SDF and start a cluster.

1. Run the Dataflow

Use the sdf command-line tool to run the dataflow:

sdf run --ui
  • Use --ui to generate the graphical representation and run the Studio.
  • Using sdf run as opposed to sdf deploy will run the dataflow with an ephemeral worker, which will be cleaned up on exit.

Note: The check-license-plates service uses Polars, a library that has not yet been optimized, so it may take a while to compile. We'll optimize this in a future release.

2. Start the Connectors

In a new terminal, change directory to ./connectors, download the connector binary, and start the two connectors:

cd ./connectors
cdk hub download infinyon/http-source@0.4.3
cdk deploy start --ipkg infinyon-http-source-0.4.3.ipkg -c license-connector.yaml
cdk deploy start --ipkg infinyon-http-source-0.4.3.ipkg -c car-connector.yaml

To see the events, run fluvio consume cars.

For additional context, check out connectors.

3. Check the results

In the sdf terminal, check out the state maintained by the dataflow:

show state

We are interested in the following:

 Namespace                                  Keys  Type
 save-license-plates/licence-plates/state   100   u32
 count-by-color/count-by-color/state        5     u32
 count-by-color/count-by-color/watermark    1     timestamp

License Plates State

License Plates state stores the license plates read from the licenses topic. This is used to check for stolen plates later in the dataflow.

show state save-license-plates/licence-plates/state
 Key      maker          model
 0FDR715  Buick          Encore
 0FJV738  Ford           Explorer
 0FQP572  Tesla          Model S
 0GFX265  Volvo          V90
 ...
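Conceptually, this state is a lookup table keyed by plate, with maker and model as values. The sketch below models it with a plain Python dict built from records shaped like the licenses topic payload; it is an illustration under that assumption, not how SDF stores state internally, and build_license_state is a hypothetical helper.

```python
from __future__ import annotations

import json

# Records in the shape published to the licenses topic (values from the sample above).
licenses_json = """[
  {"maker": "Toyota", "model": "Camry", "license": "0VTI452"},
  {"maker": "Ford", "model": "Fusion", "license": "7YHM485"}
]"""


def build_license_state(raw: str) -> dict[str, tuple[str, str]]:
    """Index license records by plate, like the Key / maker / model columns above."""
    state: dict[str, tuple[str, str]] = {}
    for record in json.loads(raw):
        # A later read of the same plate overwrites the earlier entry.
        state[record["license"]] = (record["maker"], record["model"])
    return state


state = build_license_state(licenses_json)
print(state.get("7YHM485"))  # ('Ford', 'Fusion')
print(state.get("0000000"))  # None: plate not in the state
```

Because the connector re-reads the licenses every hour, overwriting by key keeps the table current without growing duplicates.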

Count by Color State

The count-by-color state keeps track of the number of cars for each color.

show state count-by-color/count-by-color/state

The output shows the number of cars of each color in the active window. This window is still collecting data and has not yet been closed and flushed.

 Key     Window                                     Value
 beige   2024-04-16T01:38:00Z:2024-04-16T01:38:30Z  6
 black   2024-04-16T01:38:00Z:2024-04-16T01:38:30Z  4
 green   2024-04-16T01:38:00Z:2024-04-16T01:38:30Z  3
 grey    2024-04-16T01:38:00Z:2024-04-16T01:38:30Z  4
 orange  2024-04-16T01:38:00Z:2024-04-16T01:38:30Z  2
 perl    2024-04-16T01:38:00Z:2024-04-16T01:38:30Z  4
 red     2024-04-16T01:38:00Z:2024-04-16T01:38:30Z  2
 white   2024-04-16T01:38:00Z:2024-04-16T01:38:30Z  4
 yellow  2024-04-16T01:38:00Z:2024-04-16T01:38:30Z  3

The window is flushed to the car-colors topic every 30 seconds.
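A tumbling window assigns every event to exactly one fixed-size, non-overlapping interval. The Python sketch below computes that interval for a 30-second window, assuming windows are aligned to the Unix epoch; the alignment is an assumption, although it does reproduce the window boundaries in the table above. window_for is a hypothetical helper, not an SDF API.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

WINDOW = timedelta(seconds=30)  # window size, from the 30-second flush interval above
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # assumed window alignment


def window_for(ts: datetime) -> tuple[datetime, datetime]:
    """Return the [start, end) tumbling window that contains ts."""
    offset = (ts - EPOCH) % WINDOW  # timedelta % timedelta -> timedelta
    start = ts - offset
    return start, start + WINDOW


ts = datetime(2024, 4, 16, 1, 38, 12, tzinfo=timezone.utc)
start, end = window_for(ts)
print(f"{start:%Y-%m-%dT%H:%M:%SZ}:{end:%Y-%m-%dT%H:%M:%SZ}")
# 2024-04-16T01:38:00Z:2024-04-16T01:38:30Z
```

An event stamped exactly on a boundary (for example 01:38:30) starts the next window, because the interval is half-open.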

Watermark

Watermark tracks the cutoff timestamp for tumbling windows.

show state count-by-color/count-by-color/watermark
 Key      Value                     Window
 default  2024-03-15T13:44:30.043Z  *

The dataflow uses the watermark to decide when to flush the state of the tumbling window that counts cars by color; the flushed state is written to the car-colors topic.
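The interplay between windows and the watermark can be sketched in a few lines of Python. This is a simplified model, assuming the watermark is just the latest event timestamp seen so far and that a window is flushed once the watermark reaches its end; SDF's actual watermark policy may differ. TumblingColorCounter is a hypothetical class.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(seconds=30)  # tumbling window size, from the 30-second flush interval
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # assumed window alignment


class TumblingColorCounter:
    """Hypothetical per-color counter that flushes a window once the watermark passes its end."""

    def __init__(self) -> None:
        self.windows: dict[datetime, Counter] = {}  # window start -> color counts
        self.watermark: datetime | None = None

    def add(self, ts: datetime, color: str) -> None:
        start = ts - (ts - EPOCH) % WINDOW
        self.windows.setdefault(start, Counter())[color] += 1
        # Simplest possible watermark: the latest event timestamp seen so far.
        if self.watermark is None or ts > self.watermark:
            self.watermark = ts

    def flush(self) -> list[tuple[datetime, dict[str, int]]]:
        """Remove and return every window whose end is at or before the watermark."""
        if self.watermark is None:
            return []
        closed = sorted(s for s in self.windows if s + WINDOW <= self.watermark)
        return [(s, dict(self.windows.pop(s))) for s in closed]


counter = TumblingColorCounter()
t0 = datetime(2024, 4, 16, 1, 38, 5, tzinfo=timezone.utc)
counter.add(t0, "beige")
counter.add(t0 + timedelta(seconds=10), "black")
early = counter.flush()  # watermark 01:38:15 is still inside [01:38:00, 01:38:30)
counter.add(t0 + timedelta(seconds=26), "red")  # 01:38:31 moves the watermark past 01:38:30
flushed = counter.flush()
print(flushed)
```

With this policy a late event for an already flushed window would simply open a new entry for that window; a production system needs an explicit rule for late data.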

Car Makers

Each car event triggers an HTTP call-out to gather additional information about the car maker. Consume from the makers topic to see the results:

fluvio consume makers -Bd -O json

Each result includes the maker's country and continent:

{"continent":"Europe","country":"Germany","maker":"Audi"}
{"continent":"Europe","country":"United Kingdom","maker":"McLaren"}
{"continent":"Europe","country":"Italy","maker":"Fiat"}
...
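To reuse these enrichment records downstream, you might index them by maker. The Python sketch below assumes one JSON object per line, as in the output above, and uses those three lines as input; the makers dict is a hypothetical consumer-side structure, not something the dataflow exposes.

```python
import json

# One JSON object per line, as printed in the output above.
lines = [
    '{"continent":"Europe","country":"Germany","maker":"Audi"}',
    '{"continent":"Europe","country":"United Kingdom","maker":"McLaren"}',
    '{"continent":"Europe","country":"Italy","maker":"Fiat"}',
]

# Index the enrichment records by maker; a repeated maker overwrites its entry.
makers = {}
for line in lines:
    record = json.loads(line)
    makers[record["maker"]] = (record["country"], record["continent"])

print(makers["Fiat"])  # ('Italy', 'Europe')
```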

Speeding Cars

Consume from the speeding topic to see all cars driving faster than 45 mph:

fluvio consume speeding -O json
Consuming records from 'speeding'
{
  "color": "beige",
  "license": "6ZVM807",
  "location": "Sunnyvale",
  "maker": "Lincoln",
  "model": "Navigator",
  "mph": 63,
  "timestamp": "2024-04-20T17:07:09.851Z"
}
...

Press Ctrl-C to exit.
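The speeding step is a filter-map: one function both drops events (speed at or below the limit) and reshapes the rest (the output above no longer carries category or fuel). Here is a minimal Python sketch of that logic, assuming plain dicts as input; speeding_filter_map is a hypothetical name, and the dataflow's own implementation lives in dataflow.yaml.

```python
from __future__ import annotations

SPEED_LIMIT_MPH = 45  # "faster than 45 mph", from the text above
# Fields shown in the speeding output above; category and fuel are not kept.
KEPT_FIELDS = ("color", "license", "location", "maker", "model", "mph", "timestamp")


def speeding_filter_map(event: dict) -> dict | None:
    """Filter-map: None drops the event, otherwise return the trimmed record."""
    if event["mph"] <= SPEED_LIMIT_MPH:
        return None
    return {key: event[key] for key in KEPT_FIELDS}


# Illustrative input; the category and fuel values of the second event are made up.
events = [
    {"maker": "Fiat", "model": "124 Spider", "category": "Convertible", "color": "beige",
     "license": "7WPK493", "fuel": "gasoline", "location": "Saratoga", "mph": 43,
     "timestamp": "2024-03-14T16:12:10.493Z"},
    {"maker": "Lincoln", "model": "Navigator", "category": "SUV", "color": "beige",
     "license": "6ZVM807", "fuel": "gasoline", "location": "Sunnyvale", "mph": 63,
     "timestamp": "2024-04-20T17:07:09.851Z"},
]
speeding = [r for r in map(speeding_filter_map, events) if r is not None]
print([r["license"] for r in speeding])  # ['6ZVM807']
```

Combining the filter and the map in one function avoids a second pass over the events and keeps the threshold and the output shape in one place.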

Split Traffic

The dataflow splits the traffic by location into two topics, sunnyvale and saratoga. Consume from sunnyvale to see the cars:

fluvio consume sunnyvale
Consuming records from 'sunnyvale'
{"car":"Tesla Model Y","color":"yellow","location":"Sunnyvale"}
{"car":"Smart Fortwo","color":"yellow","location":"Sunnyvale"}
{"car":"Honda Accord","color":"orange","location":"Sunnyvale"}
...

Press Ctrl-C to exit.

Consume from saratoga to see the cars:

fluvio consume saratoga
Consuming records from 'saratoga'
{"car":"Subaru BRZ","color":"green","location":"Saratoga"}
{"car":"Mazda Mazda3","color":"beige","location":"Saratoga"}
{"car":"Chevrolet Blazer","color":"beige","location":"Saratoga"}
...

Press Ctrl-C to exit.
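The split routes each event to one output based on its location and emits a reduced record in which maker and model are joined into a single car field. The Python sketch below reproduces that shape, assuming the topic name is the lowercased location; split_by_location is a hypothetical helper, not the dataflow's actual code.

```python
from __future__ import annotations

TOPICS = ("sunnyvale", "saratoga")  # split targets named above


def split_by_location(event: dict) -> tuple[str, dict] | None:
    """Pick the output topic from the event's location and reshape the record."""
    topic = event["location"].lower()  # "Sunnyvale" -> "sunnyvale"
    if topic not in TOPICS:
        return None  # no matching branch: drop the event
    record = {
        "car": f'{event["maker"]} {event["model"]}',  # e.g. "Tesla Model Y"
        "color": event["color"],
        "location": event["location"],
    }
    return topic, record


routed = split_by_location(
    {"maker": "Fiat", "model": "124 Spider", "color": "beige",
     "location": "Saratoga", "mph": 43}
)
print(routed)  # ('saratoga', {'car': 'Fiat 124 Spider', 'color': 'beige', 'location': 'Saratoga'})
```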

Count Cars

Count all cars seen at the Sunnyvale location by color. As mentioned above, the cars are counted and flushed to the car-colors topic every 30 seconds.

fluvio consume car-colors -B -O json
Consuming records from 'car-colors'
[
  {
    "color": "perl",
    "count": 3
  },
  {
    "color": "green",
    "count": 2
  },
  ...

Press Ctrl-C to exit.
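Once a window closes, its per-color counts become a list of color/count objects like the one above. The Python sketch below aggregates the events of one closed window, keeping only Sunnyvale cars; sorting by descending count is an assumption made here for readability, since the original output order is not specified. sunnyvale_color_counts is a hypothetical helper.

```python
from __future__ import annotations

from collections import Counter


def sunnyvale_color_counts(window_events: list[dict]) -> list[dict]:
    """Count the colors of the Sunnyvale cars in one closed window."""
    counts = Counter(e["color"] for e in window_events if e["location"] == "Sunnyvale")
    # most_common() sorts by count; equal counts keep first-seen order.
    return [{"color": color, "count": n} for color, n in counts.most_common()]


window_events = [
    {"color": "perl", "location": "Sunnyvale"},
    {"color": "green", "location": "Sunnyvale"},
    {"color": "perl", "location": "Sunnyvale"},
    {"color": "beige", "location": "Saratoga"},  # ignored: different location
    {"color": "green", "location": "Sunnyvale"},
    {"color": "perl", "location": "Sunnyvale"},
]
result = sunnyvale_color_counts(window_events)
print(result)  # [{'color': 'perl', 'count': 3}, {'color': 'green', 'count': 2}]
```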

Detect Anomalies

The dataflow also detects anomalies by checking whether the same license plate is seen on two different cars. To demonstrate this functionality, the ingested data set sends an anomaly every 10 records (~5 seconds).

fluvio consume violations -B -O json
Consuming records from 'violations'
{
  "license": "1KFR357",
  "owner": "Maserati Quattroporte",
  "violator": "Chevrolet Equinox"
}
{
  "license": "0FQP572",
  "owner": "Land Rover Discovery",
  "violator": "Tesla Model S"
}
{
  "license": "2RFQ375",
  "owner": "Audi Q7",
  "violator": "Infiniti Q50"
}

Press Ctrl-C to exit.

For simplicity, the code assumes that the first car is always the owner and the second is the violator.
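The owner/violator rule described above can be sketched with a dictionary that remembers the first car seen for each plate. This is a simplified illustration using plain Python instead of the Polars-based check-license-plates service; PlateWatcher is a hypothetical class, and comparing cars by "maker model" string is an assumption.

```python
from __future__ import annotations


class PlateWatcher:
    """Hypothetical detector: the first car seen with a plate is its owner, any other car is a violator."""

    def __init__(self) -> None:
        self.owners: dict[str, str] = {}  # license -> "Maker Model" of the first car seen

    def check(self, event: dict) -> dict | None:
        car = f'{event["maker"]} {event["model"]}'
        owner = self.owners.setdefault(event["license"], car)
        if owner == car:
            return None  # first sighting, or the same car again
        return {"license": event["license"], "owner": owner, "violator": car}


watcher = PlateWatcher()
first = watcher.check({"license": "1KFR357", "maker": "Maserati", "model": "Quattroporte"})
second = watcher.check({"license": "1KFR357", "maker": "Chevrolet", "model": "Equinox"})
print(second)
# {'license': '1KFR357', 'owner': 'Maserati Quattroporte', 'violator': 'Chevrolet Equinox'}
```

Because the first sighting always wins, the result depends on arrival order, which is exactly the simplification the dataflow makes.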

Congratulations! You've successfully built and run a dataflow!

Clean-up

Note that your connectors will continue to run in the background until you shut them down:

cdk deploy shutdown --name car-connector
cdk deploy shutdown --name license-connector

Exit the sdf terminal and clean up. The --force flag removes the topics:

sdf clean --force

Dockerfile

A Dockerfile is included to run the above in a container. The container installs all the required files and packages, and the included shell script, run.sh, starts a Fluvio cluster as well as the connectors the car dataflow needs. You can change the consume command in run.sh to view different topics. Expect the dataflow to take a long time to run.

docker build -t carproc .
docker run carproc

Expected Output

📝 Running pre-flight checks
🎉 All checks passed!
🎯 Successfully installed Local Fluvio cluster
📝 Running cluster status checks with profile local
... checking package
Log file: /workspace/connectors/license-connector.log
Connector runs with process id: 54
Started connector `license-connector`
... checking package
Log file: /workspace/connectors/car-connector.log
Connector runs with process id: 68
Started connector `car-connector`
Waiting for SDF to Start...
Welcome to SDF
Running SDF
{"color":"orange","license":"6WLM817","location":"Saratoga","maker":"Mercedes-Benz","model":"GT AMG","mph":63,"timestamp":"2024-08-14T19:22:49.722Z"}
{"color":"beige","license":"6FKL683","location":"Sunnyvale","maker":"Mini","model":"Cooper","mph":52,"timestamp":"2024-08-14T19:22:51.222Z"}
{"color":"perl","license":"1KFR357","location":"Saratoga","maker":"Chevrolet","model":"Equinox","mph":50,"timestamp":"2024-08-14T19:22:52.222Z"}
(...)

Note: Docker tends to use a lot of disk space for volumes. If disk usage grows too large, run the following command to clean out past containers, unused images, and volumes:

docker system prune -a --volumes -f

Visualizer

The repo also includes a demo visualizer written in TypeScript, using Express and Fluvio's npm package. The demo displays all the topics associated with the dataflow, and the Express server updates it as the topics are populated.

To Run

To run the demo, the connectors and dataflow must be running.

  1. After starting the required cluster, connectors, and dataflow, navigate to the visualizer directory:
cd visualizer
  2. Install all the packages:
npm install
  3. Start the Express server with node:
node app.js
  4. Open the web application in a browser. It should be available at localhost:3000.