Activity Prediction

The aim of this project is to predict human activities such as walking, running, standing, and sitting from smartphone sensor data.

Problem Statement

Design a service that predicts user activities such as running, walking, and sitting from smartphone sensor data.

The client is an electronics company that wants to develop a smartphone app to predict a user's activity.

Data

The data consists of smartphone sensor recordings of users performing activities of daily living. Each record is a 561-feature vector of time- and frequency-domain variables, together with a ground-truth label for the activity performed.

DATA PIPELINE

Requirements

Data is stored in CSV files in an AWS S3 bucket.

Sensor data arrives as a stream, so the pipeline must be designed to handle streaming input.

DATA PIPELINE ARCHITECTURE

Previously collected data stored in CSV files is used to build the initial model for the pipeline.

Sensor data from the smartphone is sent to an Apache Kafka stream, which takes the data and writes it to a collection in MongoDB.
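The consumer side of this flow could look roughly like the sketch below. It is not the project's actual code: the function name store_sensor_messages, the JSON message encoding, and the received_at field are assumptions. The collection argument is any object with an insert_one method (a pymongo collection provides one), and the messages would in practice be the values read from a Kafka topic.

```python
import json
import time
from typing import Any, Iterable


def store_sensor_messages(messages: Iterable[bytes], collection: Any) -> int:
    """Decode JSON sensor messages and insert each one into a collection.

    `messages` is assumed to yield UTF-8 encoded JSON objects, for example
    the message values read from a Kafka topic. `collection` only needs an
    `insert_one(document)` method. Returns the number of documents stored.
    """
    stored = 0
    for raw in messages:
        try:
            record = json.loads(raw.decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            # Skip malformed payloads instead of stopping the stream.
            continue
        if not isinstance(record, dict):
            # Only JSON objects map onto MongoDB documents.
            continue
        # Hypothetical bookkeeping field, not part of the original data.
        record["received_at"] = time.time()
        collection.insert_one(record)
        stored += 1
    return stored
```

With kafka-python and pymongo, this might be called as store_sensor_messages((m.value for m in consumer), collection), where consumer is a KafkaConsumer subscribed to the sensor topic and collection is the target MongoDB collection. The choice of client libraries is an assumption here.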

MODEL TRAINING PIPELINE

Requirements

Create a model to predict activities

MODEL TRAINING ARCHITECTURE

Data Ingestion Component: - Responsible for ingesting the data from MongoDB into the application and splitting it into train, validation, and test sets.
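The splitting step of the ingestion component can be sketched as follows. The 70/15/15 proportions, the seed, and the function name split_records are illustrative assumptions; the project may use different ratios or a library helper.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_records(
    records: Sequence[T],
    val_fraction: float = 0.15,
    test_fraction: float = 0.15,
    seed: int = 42,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle records and split them into train, validation and test lists."""
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("val_fraction + test_fraction must be in [0, 1)")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test
```

Fixing the seed keeps the three sets stable between runs, which makes later evaluation reports comparable.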

Data Validation Component: - Responsible for taking the ingested data and checking its validity (missing data, numerical column checks, drift calculation).
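The three checks named above could be sketched as below, assuming rows are held as dictionaries. The source does not say how drift is calculated; the two-sample Kolmogorov-Smirnov statistic is used here purely as one common choice, and all function names are hypothetical.

```python
import math
from typing import Sequence


def missing_fraction(rows: Sequence[dict], column: str) -> float:
    """Fraction of rows where `column` is absent, None or NaN."""
    if not rows:
        return 0.0
    missing = 0
    for row in rows:
        value = row.get(column)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            missing += 1
    return missing / len(rows)


def is_numeric_column(rows: Sequence[dict], column: str) -> bool:
    """True if every non-missing value in `column` is an int or float."""
    for row in rows:
        value = row.get(column)
        if value is None:
            continue
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
    return True


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref, cur = sorted(reference), sorted(current)
    i = j = 0
    max_gap = 0.0
    while i < len(ref) and j < len(cur):
        x = min(ref[i], cur[j])
        # Advance past every value equal to x in both samples.
        while i < len(ref) and ref[i] == x:
            i += 1
        while j < len(cur) and cur[j] == x:
            j += 1
        max_gap = max(max_gap, abs(i / len(ref) - j / len(cur)))
    return max_gap
```

A feature could be flagged as drifted when the statistic between the training data and newly ingested data exceeds a chosen threshold; the threshold itself is a project decision that is not given here.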

Data Transformation Component: - Responsible for creating and saving a transformation pipeline (imputer, scaling, dimensionality reduction); it also performs target encoding.
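A transformation pipeline with these three steps might be built with scikit-learn as in the sketch below. The median imputation strategy, the use of PCA for dimensionality reduction, and the number of components are assumptions, and "target encoding" is read here as encoding the activity labels as integers with LabelEncoder. The data is synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler


def build_transformer(n_components: int = 50) -> Pipeline:
    """Imputer -> scaler -> PCA, one step per stage named in the text.

    The median strategy and the number of components are assumptions.
    """
    return Pipeline(
        steps=[
            ("imputer", SimpleImputer(strategy="median")),
            ("scaler", StandardScaler()),
            ("pca", PCA(n_components=n_components)),
        ]
    )


# Tiny synthetic example; the real input would be the 561-feature vectors.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 10))
X_train[0, 0] = np.nan  # a missing value for the imputer to fill
y_train = np.array(["WALKING", "SITTING"] * 20)

transformer = build_transformer(n_components=3)
X_train_t = transformer.fit_transform(X_train)  # fit on training data only

label_encoder = LabelEncoder()
y_train_enc = label_encoder.fit_transform(y_train)  # strings -> integer codes
```

The fitted transformer and label encoder could then be persisted, for example with joblib.dump, so that the same transformation is applied to validation, test, and prediction inputs.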

Model Trainer Component: - Trains a linear SVM classifier, evaluates training metrics, and saves the model and reports.
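Training a linear SVM and computing training metrics might look like the sketch below, assuming scikit-learn's LinearSVC. The hyperparameters, the choice of accuracy and macro F1 as metrics, and the synthetic two-class data are illustrative and not taken from the project.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
from sklearn.svm import LinearSVC

# Synthetic, well-separated two-class data standing in for transformed features.
rng = np.random.default_rng(0)
X_train = np.vstack(
    [
        rng.normal(-2.0, 1.0, size=(50, 5)),
        rng.normal(2.0, 1.0, size=(50, 5)),
    ]
)
y_train = np.array([0] * 50 + [1] * 50)

# C and max_iter are illustrative values, not the project's settings.
model = LinearSVC(C=1.0, max_iter=5000, random_state=0)
model.fit(X_train, y_train)

# Metrics on the training data, collected in a report dictionary.
train_pred = model.predict(X_train)
train_report = {
    "accuracy": float(accuracy_score(y_train, train_pred)),
    "f1_macro": float(f1_score(y_train, train_pred, average="macro")),
}
```

The report dictionary can be written to disk next to the saved model so that later components can compare runs.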

Model Evaluation Component: - Evaluates the model on the test data and saves reports.

Model Pusher Component: - Stores the best model to be used in production.
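One way to read "stores the best model" is to promote a newly trained model only when its evaluation score beats the one currently in production. The sketch below assumes that reading. The file names model.pkl and report.json, the single score field, and the function names are all hypothetical.

```python
import json
import shutil
from pathlib import Path
from typing import Optional


def read_score(report_path: Path) -> Optional[float]:
    """Return the stored score, or None if no report exists yet."""
    if not report_path.exists():
        return None
    return float(json.loads(report_path.read_text())["score"])


def push_if_better(
    candidate_model: Path, candidate_score: float, production_dir: Path
) -> bool:
    """Copy the candidate into `production_dir` when it beats the current model.

    The file names `model.pkl` and `report.json` are hypothetical; the
    source does not specify a storage layout.
    """
    production_dir.mkdir(parents=True, exist_ok=True)
    report_path = production_dir / "report.json"
    current_score = read_score(report_path)
    if current_score is not None and candidate_score <= current_score:
        return False  # keep the existing production model
    shutil.copyfile(candidate_model, production_dir / "model.pkl")
    report_path.write_text(json.dumps({"score": candidate_score}))
    return True
```

The first call always promotes the candidate because no production report exists yet; later calls replace the production model only on a strictly higher score.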

Model Prediction Component: - Used to make predictions on input data.

For model deployment, a Dockerfile is created and the application is deployed on an AWS EC2 machine, using AWS ECR and GitHub Actions for the CI/CD pipeline.

All code was written in Python with the technologies and frameworks mentioned above, following OOP principles and guidelines.

Check out the code here:

https://github.com/bsb4018/activity_pred_main_proj