Don’t Struggle with Kaggle: Build your First Data Science Project!

Are you a beginner wanting to start your very first data science or machine learning project, but don’t have the right hardware or enough storage capacity? Well, Kaggle is the perfect platform to start your journey! 

What is Kaggle?

Kaggle is a powerful web-based platform that provides opportunities for data scientists/analysts and machine learning enthusiasts to collaborate with the community, find and publish datasets, and grow their skills through competitions. 

Why Kaggle?

Just like Google Colab, Kaggle provides cloud-based notebooks, so you can run your code directly in the browser without installing Python, Jupyter, or other heavy dependencies and libraries. Kaggle also offers free GPU and TPU acceleration, subject to usage quotas.

In short, Kaggle removes the hardware barrier that keeps people from learning, practicing, and showcasing their data science or machine learning skills.

Features of Kaggle

  • Notebooks

Kaggle Notebooks are complete interactive environments where you can write and execute code in Python or R. They come with pre-installed libraries such as NumPy, Pandas, TensorFlow, PyTorch, and scikit-learn.

  • Datasets

Kaggle hosts over 100,000 public datasets across industries from healthcare to finance to sports. You can explore, visualize, and use these datasets in your personal projects or competitions. 

  • Models

Under the “Models” section, Kaggle provides a hub where users can build, train, and publish machine learning models. You can browse models built by others, reuse them, or share your own for collaboration.

  • Competitions and Leaderboards

Kaggle competitions are what made the platform famous. Participants from all over the world compete to solve data-driven problems, often for cash prizes or recognition from top companies. Leaderboards let you see how your model ranks globally. This is a great way to learn from others and improve your skills.

  • Benchmarks

Kaggle Benchmarks allow you to evaluate the performance of your models against public baselines. They help you gauge how well your approach performs compared to others in the community.

  • Code 

The Code section lets you explore and fork other users’ code notebooks. This is an important resource for beginners. You can learn directly from real-world examples and adapt them for your own projects.

  • Discussion

The Discussion forums are the heart of Kaggle’s collaborative community. You can ask questions, share ideas, or seek feedback from experienced data scientists and peers.

  • Learn

Kaggle Learn provides short, hands-on courses in Python, machine learning, data visualization, and AI fundamentals. It’s perfect for beginners who want structured, bite-sized lessons.

Walkthrough: Build Your First Project

Difficulty: Beginner

In this walkthrough, we’ll run a simple data analysis and train a linear regression model to make predictions, covering the basic workflow of a data science project.

Don’t worry if you’re a beginner in data science, machine learning, or even Python, because everything will be explained step by step.


Step 1: Create or Sign in to Your Kaggle Account

Visit Kaggle.com and either create an account or sign in to your existing one.


Step 2: Create a New Notebook

Once logged in, click “New Notebook.” This opens a cloud workspace where you can write and execute Python code directly in your browser.


Step 3: Download the Dataset

We’ll use a sample dataset of restaurant sales:

🔗 Restaurant Sales Sample Dataset

You can download it in either of two ways:

  • Through the Kaggle API, or
  • Directly as a .zip file.
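If you go the Kaggle API route, the download is usually done with the `kaggle` command-line tool. Below is a minimal sketch that only builds and prints the command. The helper name `build_download_command` and the slug `some-owner/restaurant-sales-sample` are made up for illustration, so copy the real `owner/dataset-name` slug from the dataset page.

```python
import shlex


def build_download_command(dataset_slug: str, dest_dir: str = ".") -> list[str]:
    """Return the argument list for the Kaggle CLI dataset download command."""
    if dataset_slug.count("/") != 1:
        raise ValueError("dataset_slug must look like 'owner/dataset-name'")
    # -d selects the dataset, -p sets the download folder, --unzip extracts the archive
    return ["kaggle", "datasets", "download", "-d", dataset_slug, "-p", dest_dir, "--unzip"]


# Hypothetical slug, used only to show the shape of the command.
command = build_download_command("some-owner/restaurant-sales-sample", "data")
print(shlex.join(command))
```

To actually run it, you would pass `command` to `subprocess.run(command, check=True)` on a machine where the `kaggle` package is installed and your API token is configured.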

Step 4: Upload the Dataset

If you downloaded the dataset as a zip file, extract it and upload the .csv file to the Input section of your Kaggle Notebook. You’ll find this option in the right panel of the notebook interface.


Step 5: Import Libraries and Load the Dataset

In this step, we’ll import the required libraries and load the dataset into a DataFrame. Replace the default code in the first cell with the following:

import pandas as pd
#imports the Pandas library

import matplotlib.pyplot as plt
#imports the Matplotlib plotting library's Pyplot module for creating visualizations

import seaborn as sns
#imports the Seaborn library, which provides a high-level interface for drawing attractive statistical graphics

df = pd.read_csv('/kaggle/input/restaurant-sales-sample/Restaurant_Sales - Sample.csv')
#this loads your dataset into a DataFrame using pandas' read_csv function. if you get a "file not found" error, check that the path matches the one shown in the Input section
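If you are not sure what path to pass to `read_csv`, it can help to list every file under the notebook’s input folder first. The sketch below assumes the standard `/kaggle/input` location used in the path above; `list_input_files` is just an illustrative helper name.

```python
import os


def list_input_files(root: str = "/kaggle/input") -> list[str]:
    """Return the sorted paths of every file found under root."""
    paths = []
    for dirname, _, filenames in os.walk(root):
        for filename in filenames:
            paths.append(os.path.join(dirname, filename))
    return sorted(paths)


# Prints nothing if the folder does not exist (for example, outside Kaggle).
for path in list_input_files():
    print(path)
```

Copy the printed path of your .csv file into `pd.read_csv(...)` exactly as shown, including spaces in the file name.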

Step 5.1: (Optional) Check the Dataset

You can inspect your dataset to confirm that it loaded properly. Click the play button on the top-left corner of the code cell to run it and view results.

print(df.head())
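#this prints the first five rows of the dataframe, allowing you to quickly check if the data loaded correctly and see its structure.

df.info()
#this prints a concise summary of the dataframe (column names, non-null counts, and data types). df.info() prints on its own, so wrapping it in print() would only add an extra "None" line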
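Beyond `head()` and `info()`, a few extra checks are often worth running before any analysis. This is a small sketch on a made-up DataFrame: the rows and the "Branch A"/"Branch B" labels are hypothetical and not taken from the real dataset. Only the column names mirror the ones used in this walkthrough.

```python
import pandas as pd

# Hypothetical rows for illustration; one order count is deliberately missing.
sample = pd.DataFrame({
    "Location": ["Branch A", "Branch B", "Branch A", "Branch B"],
    "Number_of_Orders": [120, 95, None, 110],
    "Revenue (P)": [18000.0, 14250.0, 15000.0, 16500.0],
})

missing_per_column = sample.isna().sum()          # missing values in each column
duplicate_rows = int(sample.duplicated().sum())   # rows that repeat an earlier row exactly
summary = sample.describe()                       # count, mean, std, min, quartiles, max

print(missing_per_column)
print("Duplicate rows:", duplicate_rows)
print(summary)
```

On your own notebook, swap `sample` for `df` to run the same checks on the restaurant data.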


Step 6: Visualize the Revenue by Restaurant

Now that your data is ready, let’s visualize revenue by location using Seaborn. Add another code cell.

location_revenue = df.groupby('Location')['Revenue (P)'].sum().reset_index()
#this just groups the data by the 'Location' column, calculates the sum of the 'Revenue (P)' for each restaurant, and converts the result back into a DataFrame.

sns.barplot(x='Location', y='Revenue (P)', data=location_revenue)
#creates a seaborn bar graph

plt.title("Total Revenue: Jollibee vs McDonald's")
#gives your graph a title

plt.show()
#displays the graph
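To see what the `groupby(...).sum().reset_index()` chain does before plotting, here is a tiny sketch on made-up numbers. The "Branch A"/"Branch B" labels and revenue values are hypothetical.

```python
import pandas as pd

# Hypothetical rows for illustration only.
toy = pd.DataFrame({
    "Location": ["Branch A", "Branch B", "Branch A", "Branch B"],
    "Revenue (P)": [100.0, 250.0, 300.0, 50.0],
})

grouped = toy.groupby("Location")["Revenue (P)"].sum()  # Series indexed by Location
flat = grouped.reset_index()                             # back to a two-column DataFrame

print(grouped)
print(flat)
```

The `reset_index()` call matters because `sns.barplot` is given column names for `x` and `y`; after grouping, `Location` is the index rather than a column until it is reset.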

Step 7: Train a Simple Linear Regression Model

Next, we’ll train a simple linear regression model to predict a restaurant’s revenue. Add another code cell.

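A simple linear regression model fits a straight line that describes the relationship between two continuous variables: an input (here, the number of orders) and a target we want to predict (here, the revenue).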
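As a side illustration (not something you need to paste into the notebook), the straight line can be computed by hand with the ordinary least squares formulas, which is the method scikit-learn’s `LinearRegression` uses. The numbers below are made up so that the answer is easy to verify, and `fit_line` is an illustrative helper name.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so no unique line exists")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data lying exactly on revenue = 100 * orders + 500.
orders = [10, 20, 30, 40]
revenue = [1500.0, 2500.0, 3500.0, 4500.0]

slope, intercept = fit_line(orders, revenue)
print(f"revenue = {slope:.2f} * orders + {intercept:.2f}")
```

Real data will not fall exactly on a line, which is why the walkthrough measures the remaining error after training.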

from sklearn.model_selection import train_test_split
#imports the train_test_split function from scikit-learn to split data

from sklearn.linear_model import LinearRegression
#imports the Linear Regression model class from scikit-learn

from sklearn.metrics import mean_absolute_error
#Imports the Mean Absolute Error (MAE) metric for model evaluation

x = df[['Number_of_Orders']]
#we store the Number_of_Orders column into the x variable

y = df['Revenue (P)']
#we set the revenue as the target, storing it in the y variable, because it is the value we want to predict

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
#we split the data into training (80%) and testing (20%) sets. random_state=42 ensures the split is the same every time you run the code. this is commonly used but you are free to change the integer to test.

model = LinearRegression()
#this is just creating a linear regression model by calling it

model.fit(x_train, y_train)
#here, we train the model on the training set (the 80% of the data set aside above). x_train holds the feature and y_train holds the correct answers

y_pred = model.predict(x_test)
#now that the model has been trained, we make predictions on the test set (the 20% of the data held out above)

predictions_df = pd.DataFrame({'Actual_Revenue': y_test, 'Predicted_Revenue': y_pred})
print(predictions_df)
print()
#this just displays the actual and predicted revenue side by side in a table

mean_abs_err = mean_absolute_error(y_test, y_pred)
#this computes the mean absolute error by comparing the actual test revenue (y_test) with the model’s prediction (y_pred)

print(f'Mean Absolute Error: P{mean_abs_err:.2f}')
#this just displays mean_abs_err with 2 decimal places

Mean Absolute Error (MAE) is the average size of the gap between each prediction and the actual value, so a lower MAE means more accurate predictions. In this example, an MAE of ₱90.92 means the model’s predictions differ from the actual revenue by about ₱90.92 on average.
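To make the metric concrete, here is a rough sketch that computes MAE by hand and compares it with a naive baseline that always predicts the same average value. All numbers are hypothetical, including `train_mean`, and `mean_absolute_error_manual` is an illustrative name rather than a scikit-learn function.

```python
def mean_absolute_error_manual(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical test-set values and model predictions.
actual = [1000.0, 1500.0, 2000.0]
predicted = [1100.0, 1400.0, 2050.0]
model_mae = mean_absolute_error_manual(actual, predicted)

# Naive baseline: always predict a (hypothetical) average training revenue.
train_mean = 1500.0
baseline_mae = mean_absolute_error_manual(actual, [train_mean] * len(actual))

print(f"Model MAE:    P{model_mae:.2f}")
print(f"Baseline MAE: P{baseline_mae:.2f}")
```

An MAE only says much in context: if the model’s error is clearly smaller than the baseline’s, the feature is adding real predictive value.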


Step 8: Congratulations!

You’ve successfully created your first data science project on Kaggle!

You can now:

  • Experiment more with your Kaggle notebook environment
  • Explore other datasets
  • Publish your work and share it on your social media accounts
  • Try beginner competitions such as Titanic - Machine Learning from Disaster


Quick Recap

Here’s what you accomplished in this walkthrough:

  1. Set up your Kaggle Notebook
  2. Loaded and explored your dataset
  3. Visualized your data
  4. Trained a simple regression model
  5. Evaluated your model using Mean Absolute Error

Tip: Continue practicing by testing new datasets, adding more features, or trying different algorithms to strengthen your machine learning foundation.

Conclusion

Kaggle is more than just a data science platform. It’s an entire ecosystem for learning, experimenting, and competing. Whether you’re a beginner exploring your first dataset or an experienced developer refining your models, Kaggle provides the resources, tools, and community to help you grow, all without worrying about hardware requirements!

Written by: Dearah Mae Barsolasco

Dearah Mae Barsolasco is an AWS Certified Cloud Practitioner and a Tutorials Dojo Intern. She's also a UI/UX Design and Frontend Development enthusiast, currently pursuing her Bachelor of Science in Computer Science at Cavite State University-Main Campus. She is driven by a commitment to share knowledge and empower women in tech.
