Predicting Bus Delays

Enhancing Commuter Reliability through Data and Machine Learning

Inconsistent bus arrival times are a common source of frustration for commuters, leading to missed connections and prolonged travel times. Our project addresses this issue by developing a predictive model that forecasts bus delays based on factors such as the time of day, the weather conditions, and the type of day. By leveraging historical data and machine learning, our solution aims to provide delay predictions that help users plan their journeys more efficiently. This not only enhances the commuting experience but also promotes the use of public transportation, which is crucial for reducing traffic congestion and lowering the environmental impact of urban mobility.

Introduction

Public transportation is a cornerstone of sustainable urban mobility, yet its reliability is often challenged by unpredictable delays.

In Münster, bus line 2 serves as a critical route for many commuters, but frequent delays have been a persistent issue. Our project set out to tackle this problem by developing a predictive model that can forecast potential delays. The goal was to create a minimum viable product (MVP) that could be tested and iterated upon, ultimately helping commuters plan their journeys better and reducing the uncertainty in their daily travel.

Methodology

We began our project by analyzing and processing historical data on the buses in Münster. The data, provided by our mentor, Tom, recorded when each bus arrived at each station during December 2023 and January 2024. These months were especially relevant to us because this time of year typically sees many delays due to weather conditions and holidays. We compared the recorded arrivals with the scheduled bus times collected from the Stadt Münster website. We also used weather data for the corresponding times from Open-Meteo.
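Comparing recorded and scheduled arrivals comes down to joining the two tables and subtracting timestamps. The following is a minimal sketch with pandas, not necessarily the code from our notebook: the column names (`trip_id`, `stop`, `actual_arrival`, `scheduled_arrival`), the helper `compute_delays`, and all values are assumptions made up for illustration.

```python
import pandas as pd

# Toy data with hypothetical column names; the real files may be structured differently.
actual = pd.DataFrame({
    "trip_id": [1, 1, 2],
    "stop": ["A", "B", "A"],
    "actual_arrival": pd.to_datetime(
        ["2023-12-04 07:32:30", "2023-12-04 07:41:00", "2023-12-04 08:01:00"]
    ),
})
scheduled = pd.DataFrame({
    "trip_id": [1, 1, 2],
    "stop": ["A", "B", "A"],
    "scheduled_arrival": pd.to_datetime(
        ["2023-12-04 07:30:00", "2023-12-04 07:38:00", "2023-12-04 08:02:00"]
    ),
})


def compute_delays(actual: pd.DataFrame, scheduled: pd.DataFrame) -> pd.DataFrame:
    """Join recorded and scheduled arrivals and add the delay in minutes."""
    merged = actual.merge(scheduled, on=["trip_id", "stop"], how="inner")
    difference = merged["actual_arrival"] - merged["scheduled_arrival"]
    # Positive values mean the bus was late, negative values mean it was early.
    merged["delay_min"] = difference.dt.total_seconds() / 60
    return merged


delays = compute_delays(actual, scheduled)
print(delays[["trip_id", "stop", "delay_min"]])
```

An inner join silently drops arrivals that have no matching schedule entry, so in practice it is worth checking how many rows survive the merge.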


Cleaning and preprocessing the data to ensure its usability was a big part of our project, and we invested a significant amount of time in it. The knowledge we gained from the learning track enabled us to code this in Google Colab, using pandas to create a single dataframe with our essential data: the type of day, the time, the type of weather, and the delay. Because we had little coding experience, we encountered some challenges, but with the help of our mentors we managed to overcome them.
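To illustrate what such a combined dataframe could look like, here is a small sketch. It assumes hourly weather records and derives the type of day from the arrival timestamp; the column names, the helpers `day_type` and `build_dataset`, and the values are hypothetical and only serve as an example.

```python
import pandas as pd

# Toy values and hypothetical column names, for illustration only.
delays = pd.DataFrame({
    "actual_arrival": pd.to_datetime(
        ["2023-12-04 07:32:30", "2023-12-09 13:05:00", "2023-12-10 17:45:00"]
    ),
    "delay_min": [2.5, 0.0, 4.0],
})
weather = pd.DataFrame({
    "time": pd.to_datetime(
        ["2023-12-04 07:00", "2023-12-09 13:00", "2023-12-10 17:00"]
    ),
    "weather": ["rain", "dry", "snow"],
})


def day_type(timestamp: pd.Timestamp) -> str:
    """Map a timestamp to Weekday, Saturday, or Sunday (Monday is 0)."""
    if timestamp.dayofweek == 5:
        return "Saturday"
    if timestamp.dayofweek == 6:
        return "Sunday"
    return "Weekday"


def build_dataset(delays: pd.DataFrame, weather: pd.DataFrame) -> pd.DataFrame:
    """Combine delays and hourly weather into one modelling table."""
    df = delays.copy()
    df["hour"] = df["actual_arrival"].dt.hour
    df["day_type"] = df["actual_arrival"].apply(day_type)
    # Round each arrival down to the full hour so it matches the weather rows.
    df["time"] = df["actual_arrival"].dt.floor("h")
    df = df.merge(weather, on="time", how="left")
    return df[["day_type", "hour", "weather", "delay_min"]]


dataset = build_dataset(delays, weather)
print(dataset)
```

The left join keeps every delay record even if no weather row matches, which makes missing weather data visible as empty cells instead of dropping rows.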

We then explored various regression models and machine learning techniques to identify the most effective method for predicting bus delays. After carefully analyzing our data and considering the unique challenges of this problem, we decided to use the Random Forest model.

To gain deeper insights into our model's performance and understand which factors most significantly impact bus delays, we created several datasets. Initially, we used a dataset that included only the time of day and the corresponding delay, but this model showed poor performance, as indicated by a high Mean Squared Error (MSE).
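Comparing datasets by their MSE can be sketched with scikit-learn as shown here. The data is synthetic and generated only so that the example runs; it does not reproduce our measurements, and the helper `evaluate_features`, the chosen hyperparameters, and the second feature are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def evaluate_features(X, y, seed=0):
    """Train a Random Forest on one feature set and return its test MSE."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    return mean_squared_error(y_test, model.predict(X_test))


# Synthetic data (NOT project data): delays depend on the hour and on rain.
rng = np.random.default_rng(0)
n = 400
hour = rng.integers(5, 24, size=n)
rain = rng.integers(0, 2, size=n)
delay = (
    1.0
    + 2.0 * np.isin(hour, [7, 8, 16, 17])
    + 3.0 * rain
    + rng.normal(0, 0.5, size=n)
)

mse_time_only = evaluate_features(hour.reshape(-1, 1), delay)
mse_more_features = evaluate_features(np.column_stack([hour, rain]), delay)
print(f"time of day only: MSE = {mse_time_only:.2f}")
print(f"with extra feature: MSE = {mse_more_features:.2f}")
```

Because the synthetic delays depend on a factor that the time of day cannot explain, the model that sees only the hour ends up with the higher error, which mirrors the kind of comparison described above.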

Next, we examined how the type of day (Weekday, Saturday, Sunday) affected delay predictions. We then categorized the day into segments such as Rush Hour 1, Lunch Break, and Rush Hour 2, and tested the model again. This approach already showed improved performance, which was in line with our expectations. Finally, we included weather conditions (rain, snow, or dry) in the dataset, leading to the best performance and the lowest MSE of all our models.
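Turning these categories into model inputs could look roughly like the sketch below. The hour boundaries of the segments and the fallback label `Other` are assumptions, since exact boundaries are not stated here; `day_segment` is a hypothetical helper, and the rows are toy values.

```python
import pandas as pd

# Hypothetical hour boundaries for the day segments.
SEGMENTS = {
    "Rush Hour 1": range(6, 9),    # 06:00 to 08:59
    "Lunch Break": range(11, 14),  # 11:00 to 13:59
    "Rush Hour 2": range(16, 19),  # 16:00 to 18:59
}


def day_segment(hour) -> str:
    """Return the segment an hour falls into, or "Other" outside all segments."""
    for name, hours in SEGMENTS.items():
        if int(hour) in hours:
            return name
    return "Other"


# Toy rows, for illustration only.
df = pd.DataFrame({
    "day_type": ["Weekday", "Weekday", "Saturday", "Sunday"],
    "hour": [7, 12, 17, 22],
    "weather": ["rain", "dry", "snow", "dry"],
})
df["segment"] = df["hour"].map(day_segment)

# One-hot encode the categorical columns so a regression model can use them.
features = pd.get_dummies(df[["day_type", "segment", "weather"]])
print(features.columns.tolist())
```

One-hot encoding avoids implying an order between categories such as rain, snow, and dry, which a single numeric code would do.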

As a last step, we developed an initial user interface with Gradio that showcases the functionality of our predictive tool, allowing users to input their location, destination, and current conditions to receive delay predictions in real time.
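A Gradio interface of this kind can be put together in a few lines. The sketch below is not our actual app: `estimate_delay` is a placeholder with made-up numbers that stands in for the trained model, and the input fields, labels, and function names are assumptions.

```python
# Placeholder numbers standing in for a trained model (made up, not results).
BASE_DELAY_MIN = 1.0
WEATHER_EXTRA_MIN = {"dry": 0.0, "rain": 1.0, "snow": 3.0}
RUSH_HOURS = (7, 8, 16, 17)


def estimate_delay(weather: str, hour: int) -> float:
    """Stand-in for model.predict(); replace with the real model call."""
    rush_extra = 1.5 if hour in RUSH_HOURS else 0.0
    return BASE_DELAY_MIN + WEATHER_EXTRA_MIN[weather] + rush_extra


def predict(start: str, destination: str, weather: str, hour: float) -> str:
    """Format the estimate for display in the interface."""
    minutes = estimate_delay(weather, int(hour))
    return f"{start} to {destination}: expected delay about {minutes:.1f} min"


def build_interface():
    """Create the Gradio app; gradio is imported here so the logic above stays testable without it."""
    import gradio as gr

    return gr.Interface(
        fn=predict,
        inputs=[
            gr.Textbox(label="Start stop"),
            gr.Textbox(label="Destination stop"),
            gr.Dropdown(choices=["dry", "rain", "snow"], label="Weather"),
            gr.Slider(minimum=0, maximum=23, step=1, label="Hour of day"),
        ],
        outputs=gr.Textbox(label="Prediction"),
        title="Bus line 2 delay prediction",
    )


# To start the app locally:
# build_interface().launch()
```

Keeping the prediction logic in plain functions makes it easy to swap the placeholder for the trained model later without touching the interface code.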

Project Results

The MVP we developed is a functional predictive tool that enables users to anticipate delays on bus line 2 based on real-time inputs. While the model is still in its early stages, initial tests have shown promising results, demonstrating the potential of our approach to improve the reliability of public transportation in Münster. The next steps involve further refining the model with more data, expanding its applicability to other routes, and enhancing the user interface to make it more accessible and user-friendly.

Relevance to TechLabs Mission

This project aligns perfectly with TechLabs e.V.'s mission to foster digital innovation with a positive social and environmental impact. By addressing a common public transportation issue through data-driven solutions, we hope to contribute to a more reliable and efficient urban mobility system. Our work can help enhance the daily lives of commuters and support broader sustainability goals by encouraging the use of public transit, thereby reducing traffic congestion and environmental impact.

We hope that through the power of data and the skills we've gained here, we can continue to build a future where technology drives positive change in our communities.

We are deeply grateful to TechLabs for the opportunity to learn and for the support we have received throughout this journey!

Team & Roles

Henning Ebner

Data scraping and processing, coordination, regression model, interface, blog post

Luis Burges

Data scraping and processing, coordination, regression model, interface, blog post

Alina Kallenbach

Data scraping and processing, coordination, regression model, interface, blog post

Mentor

Thomas Viehmann

Our Partners
