Picturing Carbon Footprints - Project @ TechLabs Münster

Picturing Carbon Footprints

Inspiring Environmental Awareness with Innovative Use of AI Analysis

Climate change is humanity's biggest issue. The goal of our project was to make information about a person's impact on the environment easily accessible and to provide immediate suggestions for improvement. To this end, we developed an app that estimates and visualizes the user's carbon footprint and then lets the user interactively explore how changes in their behavior could improve their score. To determine an individual’s carbon footprint, we estimated models based on a dataset of 10,000 simulated subjects. These models take various factors into account, including spending behavior, demographics, and travel habits. The app offers the user several methods to estimate their carbon emissions. The first applies a multiple linear regression and includes all variables. The second provides a quick estimate of the carbon footprint via a Light Gradient Boosting Machine (LGBM) model that requires only seven questions to be answered. Furthermore, we included an image classifier that determines the mode of transportation from an uploaded picture for use in the seven-question model. Lastly, we explored predicting the carbon footprint with a decision tree to create an offline version of the questionnaire.

“When you don’t know what to do, do something. I know you don’t know what to do and I know you can’t do everything right now, but you can do something.”
– Andy Andrews (New York Times best-selling author)

Climate change is one of the most pressing issues humanity has to face, if not the most pressing. Solving this problem can only be achieved through cooperation (Kutner et al., 2021). While there are many attempts at reducing carbon emissions to fight climate change (Fawzy et al., 2020), there are also large differences between countries (Ritchie & Roser, 2020) and even between individuals (Barros & Wilk, 2021; Lee et al., 2021). As a single individual, you might ask yourself: what can I even do?

The individual carbon footprint is a useful tool for visualizing a person’s environmental impact, which depends not only on direct carbon emissions but also on spending behavior and waste produced (Lee et al., 2021). Calculating the carbon footprint can show people how they contribute to global carbon emissions and lets them compare themselves to others as well as to the global average. Additionally, it can raise awareness not only of climate change in general but also of more indirect impact factors, such as internet usage (Jack et al., 2024). Thus, we suggest that making the calculation of the carbon footprint conveniently accessible and the results engaging and easy to understand is a step towards the societal rethinking necessary to tackle climate change. Raising awareness could be especially important in rapidly growing economies such as India, which may have comparatively low per-capita carbon emissions now (Ritchie et al., 2023) but whose economic growth is expected to drive individual emissions up sharply in the next few years (Lee et al., 2021). The carbon footprint calculation is, however, not without its flaws. It has been criticized for shifting the focus from the far more serious impact of industry to individuals (McManus, 2022) and was shown to leave people, especially those with high carbon footprints, resigned and frustrated (Jack et al., 2024). We seek to change that by not simply visualizing the carbon footprint but also providing alternatives that can be intuitively explored.

Consequently, the objective of this project was twofold. First, we wanted to create a simple tool so that everybody can find out what their personal carbon footprint looks like and how their current behavior could influence the environment in the long term. Second, so as not to leave users feeling hopeless, we sought to directly demonstrate how changing one’s behavior affects their emissions.

In practice, our project had four major aspects:

Predicting the carbon footprint from personal data that can quickly be entered into a survey
Visualizing the carbon footprint in an easily understandable but very striking way
Identifying areas of improvement and directly visualizing the impact of implementing these alternatives
Combining all these aspects in one easy-to-use application

The following sections detail how we tackled and, in the end, successfully accomplished all of these tasks.

The data

To predict an individual carbon footprint from daily behavior, we needed to estimate the effect that different behaviors have on the carbon footprint, and therefore required data that captures this relationship. The data we used for this project came from a Kaggle dataset on calculating the individual carbon footprint (https://www.kaggle.com/datasets/dumanmesut/individual-carbon-footprint-calculation).

Please note, however, that the data was simulated and did not stem from an investigation of an actual sample. The dataset covers 10,000 individuals and contains their demographics (body type, sex, diet), daily life and activities (shower habits, social activities, spending behavior, produced waste and recycling, internet and TV usage), the energy efficiency of their household (heating source, household appliances and their efficiency), their means of transportation (mode of transportation, distance, air travel), and the dependent variable, carbon emission in kg CO2 per month.

Exploratory Data Analysis

We found a strong correlation (r = .78) between car usage and kilometers traveled by vehicle in the last month. Both variables also show strong correlations with the target variable, carbon emission (r = .59 and r = .49, respectively). Since all correlations between predictors were below the threshold signaling multicollinearity (r > .80; Field, 2024), we continued with the analysis.
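The multicollinearity check described above can be sketched as follows. This is a minimal illustration on made-up numbers, not the project's data: the column names and the helpers `pearson_r` and `flag_collinear_pairs` are hypothetical, and only the .80 threshold is taken from the text.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def flag_collinear_pairs(columns, threshold=0.80):
    """Return predictor pairs whose absolute correlation exceeds the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson_r(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 2)))
    return flagged


# Toy predictors (illustrative values only, not the Kaggle data).
predictors = {
    "vehicle_km": [10, 200, 150, 900, 40, 600],
    "uses_car": [0, 1, 0, 1, 0, 1],
    "clothes_bought": [3, 3, 20, 7, 12, 1],
}

print(flag_collinear_pairs(predictors))
```

An empty list means no pair of predictors crosses the threshold, which corresponds to the situation in which we proceeded with all variables.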

Data processing

Handling Missing Data: We created a new variable by combining two existing ones to handle the NaN values.
Encoding: To handle the different types of variables, we utilized the ‘ColumnTransformer’ from scikit-learn. For the linear regression, we dummy encoded the categorical variables (OneHotEncoder) and scaled the discrete variables to a range between 0 and 1 (MinMaxScaler). For the Decision Tree model, we mapped ordinal variables to numerical values based on their logical order. Notably, LightGBM efficiently handles categorical encoding internally, eliminating the need for one-hot encoding. The encoders were exported for use in Streamlit.
Data Validation: While carefully examining the dataset, we identified that "cooking with a grill" always coincided with "cooking with an airfryer", suggesting a possible unintended linkage in the simulation. Consequently, we excluded airfryer, as this redundancy did not contribute additional information.

Modelling

Linear Regression: When testing the model’s assumptions, we identified heteroscedasticity, probably caused by a bias toward underestimating extreme values. All other assumptions were satisfied. We achieved an R² of 93.3% using all 19 variables.
Decision Tree Regression: We used GridSearchCV with RepeatedKFold cross-validation to fine-tune hyperparameters and select the best model configuration. We achieved an R² of 85% using all 19 variables.
LightGBM Regression Model: Answering 20 questions is time-consuming for the user, so we aimed to reduce the number of variables and select a better-performing model to maintain prediction accuracy.
Model Selection: We used the Lazy Predict library to quickly evaluate several machine-learning models. The LGBM regressor emerged as the top performer with an R² of 98% when using all variables.
Feature Evaluation: We identified the 10 variables with the largest impact on carbon emissions using LGBM feature importance scores. We calculated the R² for all subsets of these 10 variables and selected the model that showed a good R² (~ 90%) with the fewest possible features. We achieved an R² of 88.6% using only 6 variables (equivalent to 7 questions).
Decision Tree Classification: As an alternative to the questionnaire, we made a printable A4-sized flowchart of a decision tree. To create a binary classification model, we median split the carbon emission data. The model achieved an accuracy of 85% using 18 variables.
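The feature evaluation step described above (score every subset of the candidate variables, then keep the smallest one that reaches the target R²) can be sketched as follows. This is not the project's code: ordinary least squares stands in for the LGBM regressor to keep the example dependency-light, the data is synthetic, the single 80/20 holdout split is an assumption, and the helper names are hypothetical.

```python
from itertools import combinations

import numpy as np


def holdout_r2(X_train, y_train, X_test, y_test):
    """Fit ordinary least squares on the training split and return the test R²."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    residual = y_test - A_test @ coef
    total = y_test - y_test.mean()
    return 1.0 - (residual @ residual) / (total @ total)


def smallest_good_subset(X, y, names, target_r2=0.90, seed=0):
    """Return (features, r2) for the smallest subset reaching target_r2, else None."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    split = int(0.8 * len(y))
    train, test = order[:split], order[split:]
    for size in range(1, len(names) + 1):  # try the smallest subsets first
        best = None
        for cols in combinations(range(len(names)), size):
            idx = list(cols)
            r2 = holdout_r2(X[train][:, idx], y[train], X[test][:, idx], y[test])
            if best is None or r2 > best[1]:
                best = ([names[i] for i in idx], r2)
        if best[1] >= target_r2:
            return best
    return None


# Synthetic data: only the first two of five features drive the target.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=500)
names = ["f0", "f1", "f2", "f3", "f4"]

print(smallest_good_subset(X, y, names))
```

With 10 candidate variables there are 1,023 non-empty subsets, so an exhaustive search of this kind remains feasible; with many more candidates it would not.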

Image Classification

Our initial vision was to enable users to take pictures of their homes and use image classification to extract relevant variables for calculating their carbon emissions. However, many variables in our dataset could not be accurately captured through images. For example, classifying body type as 'overweight' from a photo could be uncomfortable, and variables like kilometers driven cannot be determined from images alone. Therefore, while image classification remains a key component, users must also complete a questionnaire to ensure accurate predictions. We used the `duckduckgo_search` library to search for and download images, which were then resized for efficiency. We subsequently used fast.ai for model training.

Questionnaire/Streamlit Application

We used Streamlit in Python to create an application that collects all the necessary information from the user in a questionnaire and then applies the estimated models to predict the individual carbon footprint. For demonstration purposes only, we included an option to choose between our different models at the start; this option would not be part of a finished product. The questionnaire uses Streamlit’s interactive input widgets (e.g., streamlit.radio) to gather the user’s answers and stores them in the application’s “session state” (streamlit.session_state), which works similarly to a Python dictionary. The stored data is then put into a pandas DataFrame with the same structure as the cleaned data used for modeling. To encode and model the individual data, we import the fitted encoders and models from pickle (.pkl) files into the app. With their help, the app estimates the user’s carbon footprint, which is displayed in kg of CO2 emissions per month.
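The path from stored answers to a prediction can be sketched without Streamlit itself. In the sketch below, a plain dictionary stands in for streamlit.session_state, a tiny linear regression on made-up numbers stands in for the exported models, and the feature names and the helper `answers_to_frame` are hypothetical.

```python
import pickle

import pandas as pd
from sklearn.linear_model import LinearRegression

# Column order of the (hypothetical) cleaned training data.
FEATURES = ["vehicle_km", "new_clothes", "flights_per_year"]

# Stand-in for the modelling step: fit a tiny model and serialize it.
train = pd.DataFrame(
    {
        "vehicle_km": [0, 100, 500, 1000, 250],
        "new_clothes": [1, 5, 10, 20, 3],
        "flights_per_year": [0, 1, 2, 6, 0],
    }
)
target = [900, 1200, 2000, 3500, 1400]  # made-up kg CO2 per month
model_bytes = pickle.dumps(LinearRegression().fit(train[FEATURES], target))


def answers_to_frame(answers, columns=FEATURES):
    """Turn a dict-like answer store into a one-row frame in training column order."""
    row = {name: answers[name] for name in columns}
    return pd.DataFrame([row], columns=columns)


# In the app, this mapping would be streamlit.session_state, filled by the widgets.
answers = {"new_clothes": 4, "flights_per_year": 1, "vehicle_km": 300}

model = pickle.loads(model_bytes)  # the app would read a .pkl file instead
prediction = float(model.predict(answers_to_frame(answers))[0])
print(f"Estimated footprint: {prediction:.0f} kg CO2 per month")
```

Enforcing the training column order before predicting matters because the answers are collected in questionnaire order, which need not match the order the model was fitted on. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.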

The app then visualizes the estimated carbon footprint. First, we depict the estimated carbon emission in comparison to the sample. We used Plotly (via streamlit.plotly_chart) to create highly customizable and interactive plots. We considered kernel density estimate (KDE) plots to visualize the distributions of carbon footprints within different categories (e.g., different modes of transportation, see Image). Ultimately, we chose a histogram that can be dynamically explored and contains a vertical line marking the user’s current carbon footprint. Second, we depict how many earths would be needed to absorb the produced carbon emissions if everybody had the same carbon footprint as the user. This number is calculated by first dividing the annual carbon footprint by the carbon sequestration rate (2100 kg CO2/gha/year) to get the ecological footprint in gha (global hectares of biologically productive area) per person, and then dividing that by the biocapacity of earth (1.63 gha/person; Global Footprint Network, 2024).
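The "number of earths" calculation can be written out in a few lines. The two constants are the values quoted above; the function name and the example input are hypothetical, and the conversion from the monthly model output to an annual footprint (times 12) is our reading of the description.

```python
SEQUESTRATION_KG_PER_GHA_YEAR = 2100.0  # carbon sequestration rate quoted in the text
BIOCAPACITY_GHA_PER_PERSON = 1.63       # biocapacity of earth per person quoted in the text


def earths_needed(monthly_kg_co2):
    """Number of earths required if everyone emitted as much as this user."""
    annual_kg_co2 = monthly_kg_co2 * 12
    footprint_gha = annual_kg_co2 / SEQUESTRATION_KG_PER_GHA_YEAR
    return footprint_gha / BIOCAPACITY_GHA_PER_PERSON


monthly = 2000.0  # hypothetical user estimate in kg CO2 per month
print(f"{earths_needed(monthly):.2f} earths")
```

For this hypothetical estimate of 2,000 kg CO2 per month, the annual footprint is 24,000 kg, which corresponds to about 11.4 gha and therefore roughly 7 earths.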

As a more striking display, we also visualized the resulting number as a row of pictures of earth. Lastly, the app gives the option to learn how to improve the carbon footprint. To this end, we let users interactively explore how changing any of their seven most influential answers would affect their carbon footprint. We visualize the change again in the form of a histogram, now marking both the old and the improved carbon footprint in comparison to the sample.

Limitations/Outlook

The biggest downside of the project is that the underlying dataset is only simulated and may not capture the true effect of the variables on the carbon footprint. In part, the simulation also produced unrealistic data; for example, the number of clothes a person buys in a month was simply drawn from a uniform distribution between 0 and 50 pieces of clothing. However, the directions of the estimated effects are still plausible. A person who travels a lot, does not recycle, and buys a lot of clothes will still have a higher carbon footprint in our models than somebody who uses a bicycle and tries to minimize their waste. So, even though the absolute value of the carbon footprint might be inaccurate, it is valuable as a relative measure. Also, the suggestions for improvement will indeed improve a person's actual carbon footprint despite numeric inaccuracies. Nevertheless, to improve upon our project, we would collect data from actual individuals, calculate their impact on the environment, and re-estimate the models using real data. Additionally, we would consider modifying some questions. The present variables depict a rather westernized image of carbon emissions and do not include certain types of behavior that are relevant for the carbon footprint in other cultures. Taking India as an example, the traditional “chulha” stoves fueled with cow dung produce very high emissions (Fleming et al., 2018) that could not be captured in this questionnaire/model. Consequently, if we want to make carbon footprint estimation accessible to everyone, the estimation needs to be adjusted to capture all possible living situations.

GitHub Repositories

Code: https://github.com/e-prossinger/CarbonFootprint_TechLab
App-Code: https://github.com/FalkMeck/CarbonFootprint_Group4

References

Barros, B., & Wilk, R. (2021). The outsized carbon footprints of the super-rich. Sustainability: Science, Practice and Policy, 17(1), 316-322. https://doi.org/10.1080/15487733.2021.1949847

Fawzy, S., Osman, A. I., Doran, J., & Rooney, D. W. (2020). Strategies for mitigation of climate change: a review. Environmental Chemistry Letters, 18, 2069-2094. https://doi.org/10.1007/s10311-020-01059-w

Field, A. (2024). Discovering statistics using IBM SPSS statistics. Sage Publications Limited.

Fleming, L. T., Weltman, R., Yadav, A., Edwards, R. D., Arora, N. K., Pillarisetti, A., ... & Nizkorodov, S. A. (2018). Emissions from village cookstoves in Haryana, India, and their potential impacts on air quality. Atmospheric Chemistry and Physics, 18(20), 15169-15182. https://doi.org/10.5194/acp-18-15169-2018

Global Footprint Network (2024). Retrieved from: https://data.footprintnetwork.org/_ga=2.173857696.289089879.1723995078-1940628939.1723995078#/

Jack, T., Bååth, J., Heinonen, J. T., & Gram-Hanssen, K. (2024). How individuals make sense of their climate impacts in the capitalocene: mixed methods insights from calculating carbon footprints. Sustainability Science, 19(3), 777-791. https://doi.org/10.1007/s11625-023-01435-9

Kutner, M., Mackay, B., Bills, A., Clandillon, S., Bravery, M., Bush, G., & Roberts, G. (2021). Working together to beat the climate crisis. CDP. https://cdn.cdp.net/cdp-production/cms/reports/documents/000/005/885/original/CDP_Collaborative_City_State_Regions_Report.pdf?1633362006

Lee, J., Taherzadeh, O., & Kanemoto, K. (2021). The scale and drivers of carbon footprints in households, cities and regions across India. Global Environmental Change, 66, 102205. https://doi.org/10.1016/j.gloenvcha.2020.102205

McManus, M. (2022, May 27). The ‘carbon footprint’ was co-opted by fossil fuel companies to shift climate blame – here’s how it can serve us again. The Conversation. https://theconversation.com/the-carbon-footprint-was-co-opted-by-fossil-fuel-companies-to-shift-climate-blame-heres-how-it-can-serve-us-again-183566

Ritchie, H. & Roser, M. (2020). CO₂ emissions. Our World in Data. https://ourworldindata.org/co2-emissions

Ritchie, H., Rosado, P., & Roser, M. (2023). CO₂ and Greenhouse Gas Emissions. Our World in Data. https://ourworldindata.org/grapher/co-emissions-per-capita