Last updated on June 14, 2026 by the editorial team
Author: Sai Bhargav Rallapalli
Originally published on Towards AI.
Understanding Car Pollution: A Data Science Perspective
To decide how to tackle vehicle emissions, the Global Automotive Council faces a hard question: should regulation focus on fuel types, engine displacements, or vehicle classes? The answer is not obvious. Careful data analysis can surface results that run against common intuition and point to more effective strategies.
Building a Predictive Model to Analyze CO₂ Emissions
The article details the creation of a CO₂ emissions prediction model with a reported 98.8% accuracy. Starting from a dataset of more than 7,000 vehicles, the process involved removing duplicate records and examining the distribution of the target variable. To address multicollinearity, redundant fuel consumption columns were identified with the variance inflation factor (VIF) and Ridge regression and then removed, which keeps the model's coefficients stable.
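The article does not show its multicollinearity code, so the following is only a sketch of the VIF step under stated assumptions. It uses a NumPy feature matrix, computes each column's VIF from its definition, and applies the common rule-of-thumb cutoff of 10, which is our choice and not necessarily the article's. The column names and data are hypothetical, and the Ridge regression step is not shown here.

```python
import numpy as np


def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """VIF for each column of X (rows = vehicles, columns = numeric features).

    VIF_j = 1 / (1 - R^2_j), where R^2_j comes from an OLS regression of
    column j on all other columns plus an intercept.
    """
    n_samples, n_features = X.shape
    vifs = np.empty(n_features)
    for j in range(n_features):
        y = X[:, j]
        design = np.column_stack([np.ones(n_samples), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        ss_res = np.sum((y - design @ coef) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


def drop_high_vif(X: np.ndarray, names: list[str], threshold: float = 10.0):
    """Repeatedly drop the column with the largest VIF until all are <= threshold."""
    X, names = X.copy(), list(names)
    while X.shape[1] > 1:
        vifs = variance_inflation_factors(X)
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names


# Hypothetical example: a "combined" consumption column that is an
# almost exact blend of the city and highway columns.
rng = np.random.default_rng(0)
n = 500
city = rng.normal(12.0, 3.0, n)                                  # L/100 km, city
hwy = 0.7 * city + rng.normal(0.0, 1.5, n)                       # correlated, not redundant
combined = 0.55 * city + 0.45 * hwy + rng.normal(0.0, 0.05, n)   # near-duplicate
engine = rng.normal(3.0, 1.0, n)                                 # engine size, litres

X = np.column_stack([city, hwy, combined, engine])
names = ["fuel_city_l_100km", "fuel_hwy_l_100km", "fuel_comb_l_100km", "engine_size_l"]
X_kept, kept = drop_high_vif(X, names)
print(kept)  # ['fuel_city_l_100km', 'fuel_hwy_l_100km', 'engine_size_l']
```

Dropping one column at a time and then recomputing is deliberate. Removing the worst offender often brings the remaining VIFs down, so a single-pass cutoff could discard more columns than necessary.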
Notably, high-emission outliers were kept in the dataset rather than removed. These “top 1 percent” vehicles matter most to policymakers who want to regulate emissions effectively. The analysis also uncovers a reversal in how fuel types compare, an instance of Simpson’s Paradox. Ethanol (E85) appears to have higher emissions on average, but once engine size and fuel consumption are controlled for, it turns out to be the cleanest fuel. The raw averages hide this because E85 is used predominantly in larger engines that consume more fuel.
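The following small simulation shows how such a reversal can arise. It uses invented data, not the article's dataset. Two assumptions drive it: E85 cars are assigned mostly to larger engines, and E85 emits 25 g/km less than gasoline at equal engine size. The code compares raw group means, fits an OLS model that controls for engine size, and compares means within an engine-size band where both fuels occur. The article also controls for fuel consumption, which this sketch omits for brevity. The last step flags the top 1 percent of emitters with `np.quantile`, in the spirit of keeping them rather than discarding them.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic fleet (illustrative numbers only): E85 vehicles skew toward larger engines.
n_gas, n_e85 = 800, 200
engine = np.concatenate([rng.uniform(1.0, 4.0, n_gas), rng.uniform(3.0, 6.0, n_e85)])
is_e85 = np.concatenate([np.zeros(n_gas), np.ones(n_e85)])
# Assumed data-generating process: CO2 rises with engine size, and E85
# emits 25 g/km less than gasoline at the same engine size.
co2 = 60 + 45 * engine - 25 * is_e85 + rng.normal(0, 8, n_gas + n_e85)

# 1) Naive comparison: difference in raw group means.
raw_gap = co2[is_e85 == 1].mean() - co2[is_e85 == 0].mean()

# 2) Controlled comparison: OLS of CO2 on intercept, engine size and an E85 dummy.
design = np.column_stack([np.ones_like(engine), engine, is_e85])
coef, *_ = np.linalg.lstsq(design, co2, rcond=None)
adjusted_gap = coef[2]

# 3) Stratified check inside the engine-size band where both fuels occur.
band = (engine >= 3.0) & (engine <= 4.0)
band_gap = co2[band & (is_e85 == 1)].mean() - co2[band & (is_e85 == 0)].mean()

# Flag (do not drop) the top 1% of emitters.
threshold = np.quantile(co2, 0.99)
super_emitters = np.flatnonzero(co2 >= threshold)

print(f"raw E85 - gasoline gap:    {raw_gap:+.1f} g/km")
print(f"engine-adjusted gap:       {adjusted_gap:+.1f} g/km")
print(f"gap within 3.0-4.0 L band: {band_gap:+.1f} g/km")
print(f"super-emitters flagged:    {super_emitters.size}")
```

On this synthetic data the raw gap is positive, so E85 looks dirtier. Both adjusted gaps are negative. The point is the sign flip, not the exact numbers.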
The Role of Data Science in Policy Recommendations
The model is built as a scikit-learn pipeline with one-hot encoding for the categorical features. It achieves a high R² with low error, and its results point to areas for policy improvement. Its weakest predictions, particularly for rare alternative-fuel categories, suggest targeted action against super-emitters. The recommendations also include linking fuel mandates to vehicle and engine constraints, a more nuanced approach to reducing emissions.
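The following is a minimal sketch of such a pipeline, not the article's code. It assumes pandas input, a `Ridge` regressor, synthetic data, and hypothetical column names and fuel labels, since the article's exact features and estimator are not given here. It scales the numeric columns, one-hot encodes the categorical ones, and reports R², MAE and RMSE on a held-out split. It also reports mean absolute error per fuel type, which is where weaknesses in rare categories would show up on real data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the vehicle table; column names are hypothetical.
rng = np.random.default_rng(7)
n = 1000
df = pd.DataFrame({
    "engine_size_l": rng.uniform(1.0, 6.0, n),
    "fuel_comb_l_100km": rng.uniform(5.0, 20.0, n),
    "fuel_type": rng.choice(["gasoline", "premium", "diesel", "E85"], size=n,
                            p=[0.5, 0.3, 0.15, 0.05]),
    "vehicle_class": rng.choice(["compact", "suv", "pickup"], size=n),
})
# Illustrative target: CO2 driven mainly by fuel consumption, plus a fuel-type offset.
fuel_offset = {"gasoline": 0.0, "premium": 5.0, "diesel": 15.0, "E85": -40.0}
df["co2_g_km"] = (22.0 * df["fuel_comb_l_100km"]
                  + df["fuel_type"].map(fuel_offset)
                  + 3.0 * df["engine_size_l"]
                  + rng.normal(0.0, 5.0, n))

X = df.drop(columns="co2_g_km")
y = df["co2_g_km"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["engine_size_l", "fuel_comb_l_100km"]),
    # handle_unknown="ignore" encodes a category unseen in training as all zeros
    # instead of raising an error, which matters for rare fuel types.
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["fuel_type", "vehicle_class"]),
])
model = Pipeline([("preprocess", preprocess), ("regressor", Ridge(alpha=1.0))])
model.fit(X_train, y_train)
pred = model.predict(X_test)

r2 = r2_score(y_test, pred)
mae = mean_absolute_error(y_test, pred)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
# Mean absolute error broken down by fuel type.
per_fuel_mae = (y_test - pred).abs().groupby(X_test["fuel_type"]).mean()

print(f"R^2 = {r2:.3f}, MAE = {mae:.1f} g/km, RMSE = {rmse:.1f} g/km")
print(per_fuel_mae.round(1))
```

Wrapping the encoder and the model in a single `Pipeline` means the same preprocessing is fitted only on the training split and reused at prediction time. Because the synthetic target here is purely additive, the per-fuel errors are not designed to single out E85 the way the article's real data reportedly does.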
For a comprehensive understanding, read the full blog for free on Medium.
Note: The article reflects the views of the contributing authors and not necessarily those of Towards AI.