Activity Prediction
Problem Statement
Design a service that predicts user activities such as running, walking, and sitting from smartphone sensor data.
The client is an electronics company that wants to develop a smartphone app that predicts the user's activity.
Data
The dataset contains smartphone sensor recordings of users performing activities of daily living. Each record consists of a 561-feature vector of time- and frequency-domain variables and a ground-truth label for the activity being performed.
DATA PIPELINE
Requirements
Historical data is stored as CSV files in an AWS S3 bucket.
Sensor data arrives as a continuous stream, so an appropriate streaming pipeline must be built for it.
DATA PIPELINE ARCHITECTURE
Historical data from the CSV files is used to build the initial pipeline model.
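A minimal sketch of that initial batch load follows. It assumes the historical CSV rows are copied from S3 into the same MongoDB collection the ingestion component reads. It uses `boto3`'s `get_object`, `pandas.read_csv`, and PyMongo's `insert_many`. The function names and the bucket, key, URI, database, and collection parameters are hypothetical. The client libraries are imported inside the loader, so the CSV parsing helper can be used without them.

```python
import io
from typing import Any

import pandas as pd


def csv_bytes_to_records(raw: bytes) -> list[dict[str, Any]]:
    """Parse raw CSV bytes into one dictionary per row, ready for MongoDB."""
    df = pd.read_csv(io.BytesIO(raw))
    return df.to_dict(orient="records")


def load_csv_from_s3_into_mongo(
    bucket: str, key: str, mongo_uri: str, database: str, collection: str
) -> int:
    """Copy one CSV object from S3 into a MongoDB collection; returns rows inserted.

    Assumption: the historical CSV data is loaded into MongoDB so that the
    training pipeline reads batch and streamed data from one place.
    """
    # Imported here so csv_bytes_to_records works without AWS/Mongo clients installed.
    import boto3
    from pymongo import MongoClient

    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    records = csv_bytes_to_records(body)
    if not records:
        return 0
    with MongoClient(mongo_uri) as client:
        result = client[database][collection].insert_many(records)
    return len(result.inserted_ids)
```

Note that `insert_many` adds an `_id` key to each dictionary in place.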
Sensor data from the smartphone is sent to an Apache Kafka stream, which takes the data and stores it in a MongoDB collection.
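The streaming path can be sketched with the `kafka-python` client and PyMongo. This sketch assumes each Kafka message value is a UTF-8 JSON object holding one sensor reading. The topic and connection parameters, the function names, and the `ingested_at` field are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Any


def message_to_document(value: bytes) -> dict[str, Any]:
    """Decode one sensor message and stamp it with an ingestion time.

    Assumption: every Kafka message value is a UTF-8 encoded JSON object.
    """
    doc = json.loads(value.decode("utf-8"))
    if not isinstance(doc, dict):
        raise ValueError("expected a JSON object per Kafka message")
    doc["ingested_at"] = datetime.now(timezone.utc)  # hypothetical bookkeeping field
    return doc


def stream_kafka_to_mongo(
    topic: str, bootstrap_servers: str, mongo_uri: str, database: str, collection: str
) -> None:
    """Consume sensor messages from Kafka and insert each one into MongoDB (runs forever)."""
    # Imported here so message_to_document can be tested without Kafka/Mongo clients.
    from kafka import KafkaConsumer  # kafka-python package
    from pymongo import MongoClient

    consumer = KafkaConsumer(
        topic, bootstrap_servers=bootstrap_servers, auto_offset_reset="earliest"
    )
    target = MongoClient(mongo_uri)[database][collection]
    for message in consumer:  # message.value is raw bytes (no deserializer configured)
        target.insert_one(message_to_document(message.value))
```

Writing one document per message keeps the sketch simple. Batching with `insert_many`, or using the MongoDB Kafka sink connector instead of a hand-written consumer, would reduce round trips.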
MODEL TRAINING PIPELINE
Requirements
Create a model to predict activities
MODEL TRAINING ARCHITECTURE
Data Ingestion Component: Responsible for ingesting the data from MongoDB into the application and splitting it into train, validation, and test sets.
Data Validation Component: Responsible for checking the validity of the ingested data (missing data, numerical column checks, and drift calculation).
Data Transformation Component: Responsible for creating and saving the transformation pipeline (imputation, scaling, dimensionality reduction); it also performs target encoding.
Model Trainer Component: Trains a linear SVM classifier, computes training metrics, and saves the model and reports.
Model Evaluation Component: Evaluates the model on the test data and saves reports.
Model Pusher Component: Stores the best model for use in production.
Model Prediction Component: Generates predictions for input data.
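The ingestion split and the validation checks described above can be sketched with pandas, scikit-learn's `train_test_split`, and SciPy's two-sample Kolmogorov-Smirnov test (`ks_2samp`) as the drift measure. Several details are assumptions, not details from this project: the KS test itself, the 0.05 p-value threshold, the 15%/15% validation/test sizes, and the target column name `Activity`.

```python
import pandas as pd
from scipy.stats import ks_2samp
from sklearn.model_selection import train_test_split

TARGET = "Activity"  # Assumption: name of the ground-truth label column


def split_data(df, val_size=0.15, test_size=0.15, seed=42):
    """Stratified split into train, validation, and test sets."""
    train_val, test = train_test_split(
        df, test_size=test_size, stratify=df[TARGET], random_state=seed
    )
    # Rescale so val_size is a fraction of the whole dataset, not of train_val.
    relative_val = val_size / (1.0 - test_size)
    train, val = train_test_split(
        train_val, test_size=relative_val, stratify=train_val[TARGET], random_state=seed
    )
    return train, val, test


def validate(reference, current, p_threshold=0.05):
    """Compare a new batch against reference data: schema, missing values, and drift."""
    features = [c for c in reference.columns if c != TARGET]
    shared = [c for c in features if c in current.columns]
    numeric = [
        c for c in shared
        if pd.api.types.is_numeric_dtype(reference[c])
        and pd.api.types.is_numeric_dtype(current[c])
    ]
    drifted = [
        c for c in numeric
        # Two-sample KS test; a small p-value suggests the distributions differ.
        if ks_2samp(reference[c].dropna(), current[c].dropna()).pvalue < p_threshold
    ]
    report = {
        "missing_columns": sorted(set(features) - set(shared)),
        "non_numeric_columns": sorted(set(shared) - set(numeric)),
        "null_counts": {c: int(n) for c, n in current[shared].isna().sum().items() if n > 0},
        "drifted_columns": drifted,
    }
    # Missing values are only reported (the imputer handles them later). A missing
    # or non-numeric feature column makes the batch invalid. Drift is informational.
    report["is_valid"] = not (report["missing_columns"] or report["non_numeric_columns"])
    return report
```

Treating drift as a warning rather than a hard failure is a design choice. A stricter pipeline could block training when too many columns drift.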
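This minimal sketch covers the transformation, training, and evaluation steps with scikit-learn. It makes several assumed concrete choices: median imputation, standard scaling, and PCA keeping 95% of the variance. It also reads "target encoding" as mapping the activity labels to integers with `LabelEncoder`. Accuracy and macro F1 as the reported metrics, the `joblib` artifact format, and all function names are likewise assumptions.

```python
import joblib
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.svm import LinearSVC


def build_preprocessor(variance_to_keep=0.95):
    """Imputer -> scaler -> PCA (assumed concrete choices for the transformation step)."""
    return Pipeline([
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
        # A float in (0, 1) keeps enough components to explain that share of variance.
        ("pca", PCA(n_components=variance_to_keep, svd_solver="full")),
    ])


def train_and_evaluate(X_train, y_train, X_test, y_test, C=1.0):
    """Fit preprocessing + linear SVM on training data; report train and test metrics."""
    encoder = LabelEncoder().fit(y_train)  # activity names -> integers 0..k-1
    model = Pipeline([
        ("preprocess", build_preprocessor()),
        # dual=False is the usual recommendation when samples outnumber features.
        ("svm", LinearSVC(C=C, dual=False)),
    ])
    model.fit(X_train, encoder.transform(y_train))

    def metrics(X, y):
        y_true = encoder.transform(y)
        y_pred = model.predict(X)
        return {
            "accuracy": accuracy_score(y_true, y_pred),
            "macro_f1": f1_score(y_true, y_pred, average="macro"),
        }

    report = {"train": metrics(X_train, y_train), "test": metrics(X_test, y_test)}
    return model, encoder, report


def save_artifacts(model, encoder, path):
    """Persist the fitted pipeline and label encoder together (joblib format)."""
    joblib.dump({"model": model, "encoder": encoder}, path)
```

The validation split is not used in this sketch. It could serve to choose `C` before the final evaluation on the test set. At prediction time, `encoder.inverse_transform(model.predict(X))` maps the integer predictions back to activity names.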
For model deployment, a Dockerfile is created and the image is deployed to an AWS EC2 instance using AWS ECR, with GitHub Actions providing the CI/CD pipeline.