Data Concepts in Azure Machine Learning

Last updated on August 14, 2023

Data Concepts in Azure Machine Learning Cheat sheet

URI

  • A Uniform Resource Identifier (URI) represents a storage location on a local computer, Azure storage, or a publicly available http(s) location.

  • URIs can be used as inputs or outputs to an Azure Machine Learning job and can be mapped to the compute target filesystem in different modes: read-only mount, read-write mount, download, or upload.

  • URIs use identity-based authentication to connect to storage services, with either your Azure Active Directory identity or a managed identity.
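
As a rough illustration of the storage locations listed above, the sketch below sorts URI strings by scheme using only the Python standard library. The helper name `classify_uri`, the labels in `SCHEME_LOCATIONS`, and the account, container, and datastore names in the sample URIs are all hypothetical; this is not part of the Azure Machine Learning SDK.

```python
from urllib.parse import urlparse

# Scheme-to-location mapping. The labels are illustrative descriptions,
# not official Azure Machine Learning terminology.
SCHEME_LOCATIONS = {
    "": "local path",
    "file": "local path",
    "http": "public http(s) location",
    "https": "public http(s) location",
    "wasbs": "Azure Blob Storage",
    "abfss": "Azure Data Lake Storage Gen2",
    "azureml": "Azure Machine Learning datastore",
}


def classify_uri(uri: str) -> str:
    """Return a coarse description of where a URI points (hypothetical helper)."""
    scheme = urlparse(uri).scheme.lower()
    return SCHEME_LOCATIONS.get(scheme, "unknown")


# Sample URIs with made-up account, container, and datastore names.
samples = [
    "./data/train.csv",
    "https://example.com/data/train.csv",
    "wasbs://mycontainer@myaccount.blob.core.windows.net/data/train.csv",
    "abfss://myfilesystem@myaccount.dfs.core.windows.net/data/train.csv",
    "azureml://datastores/mystore/paths/data/train.csv",
]
for uri in samples:
    print(f"{classify_uri(uri):35s} {uri}")
```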

Data types

  • Azure Machine Learning supports three data types: File, Folder, and Table.

  • File: References a single file and can have any format.

  • Folder: References a single folder and is useful for deep-learning tasks with various file types such as images, text, audio, and video.

  • Table: References a data table and is suitable when the schema is complex and changes frequently, or when you need only a subset of a large tabular dataset.

Data runtime capability

  • Azure Machine Learning uses its own data runtime for mounts, uploads, downloads, and materialization of tabular data into pandas/spark.

  • The data runtime is written in Rust for high speed and efficiency.

  • It has no dependencies on other technologies, allowing for quick installation on compute targets.

  • It supports multi-process data loading and pre-fetching to enhance GPU utilization in deep-learning operations.

  • It provides seamless authentication to cloud storage.

Datastore

  • An Azure Machine Learning datastore is a reference to an existing Azure storage account.

  • It provides a common API for interacting with different storage types (Blob/Files/ADLS) and facilitates team operations.

  • Creating and using a datastore makes useful storage easier for the team to discover and, for credential-based access, keeps the connection information secure so that it does not have to appear in scripts.

  • Authentication methods include credential-based (service principal/SAS/key) and identity-based (Azure Active Directory or managed identity).

Data asset

  • An Azure Machine Learning data asset allows users to create a reference to frequently used data sources with a friendly name.

  • Data asset creation includes metadata and a reference to the data source location without incurring extra storage costs or risking data source integrity.

  • Data assets can be created from Azure Machine Learning datastores, Azure Storage, public URLs, or local files.
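
The pieces of a data asset described above (a friendly name, metadata, a data type, and a pointer to the source location) can be pictured as a small record. The sketch below is plain Python, not the Azure Machine Learning SDK: `DataAssetSpec` and `make_data_asset` are hypothetical names, and the type strings `uri_file`, `uri_folder`, and `mltable` are assumed to correspond to the File, Folder, and Table data types in SDK/CLI v2. Check the official documentation before relying on them.

```python
from dataclasses import dataclass

# Assumed SDK/CLI v2 type strings for the File, Folder, and Table data types.
ASSET_TYPES = {"file": "uri_file", "folder": "uri_folder", "table": "mltable"}


@dataclass(frozen=True)
class DataAssetSpec:
    """Metadata plus a reference to the data; the data itself is not copied."""
    name: str
    version: str
    type: str
    path: str
    description: str = ""


def make_data_asset(name, version, kind, path, description=""):
    """Build a spec for a 'file', 'folder', or 'table' asset (hypothetical helper)."""
    if kind not in ASSET_TYPES:
        raise ValueError(f"kind must be one of {sorted(ASSET_TYPES)}, got {kind!r}")
    return DataAssetSpec(name, version, ASSET_TYPES[kind], path, description)


# The datastore name and path below are placeholders.
asset = make_data_asset(
    name="titanic-train",
    version="1",
    kind="file",
    path="azureml://datastores/mystore/paths/data/titanic.csv",
    description="Training split referenced by a friendly name",
)
print(asset)
```

Because the record holds only a path, several assets can point at the same underlying data without duplicating it, which is the "no extra storage cost" property noted above.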

 

Data splits & cross-validation (Python)

 

Data Splits

  • In Azure Automated Machine Learning, the recommended approach is to randomly split the data into training and evaluation sets based on rows.

  • The AutoMLConfig object represents the configuration for submitting an automated ML experiment in Azure Machine Learning, containing parameters and training data for the experiment run.

Provide validation data

  • Provide a separate validation set (the `validation_data` parameter of `AutoMLConfig`) to assess the model’s performance on data it has not seen during training.

Provide validation set size

  • Control the size of the validation set by specifying the fraction of the training data to hold out for validation (the `validation_size` parameter, a value between 0.0 and 1.0, exclusive). The held-out rows are used to tune the model and evaluate how well it generalizes.
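
To make the idea of a sized hold-out set concrete, here is a minimal standard-library sketch of a random row-based split. It is not how automated ML implements the split internally; `holdout_split` is a hypothetical helper, and its `validation_size` argument is simply modeled on the fraction-style setting described above.

```python
import random


def holdout_split(n_rows, validation_size, seed=0):
    """Randomly split row indices into (train, validation) lists.

    validation_size is the fraction of rows held out, strictly between 0 and 1.
    """
    if n_rows < 2:
        raise ValueError("need at least two rows to split")
    if not 0.0 < validation_size < 1.0:
        raise ValueError("validation_size must be between 0.0 and 1.0, exclusive")

    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible

    # Keep at least one row on each side of the split.
    n_val = min(n_rows - 1, max(1, round(n_rows * validation_size)))
    return sorted(indices[n_val:]), sorted(indices[:n_val])


train_idx, val_idx = holdout_split(n_rows=10, validation_size=0.2, seed=42)
print("train rows:     ", train_idx)
print("validation rows:", val_idx)
```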

K-fold cross-validation

  • K-fold cross-validation divides the data into K subsets, or “folds,” and uses each fold once as the validation set while training on the remaining folds. Averaging the results across the K iterations gives a more robust evaluation.
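
The fold rotation can be sketched in a few lines of plain Python. This is an illustration of the general technique rather than automated ML's internal implementation; `k_fold_indices` is a hypothetical helper, and it assigns rows to folds by stride instead of shuffling them first, which a real pipeline would normally do.

```python
def k_fold_indices(n_rows, k):
    """Return k (train, validation) index pairs; each row is validated exactly once."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")

    indices = list(range(n_rows))
    folds = [indices[i::k] for i in range(k)]  # row r lands in fold r % k

    splits = []
    for i, validation in enumerate(folds):
        # Train on every fold except the one currently held out.
        train = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        splits.append((train, validation))
    return splits


for fold_number, (train, validation) in enumerate(k_fold_indices(10, 5), start=1):
    print(f"fold {fold_number}: train={train} validation={validation}")
```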

Monte Carlo cross-validation

  • A technique where multiple random training and validation splits are generated to mitigate bias in the model evaluation caused by a particular split.
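
A minimal sketch of repeated random splitting follows, using only the standard library. Unlike K-fold, each split here is drawn independently, so a row may appear in several validation sets or in none. `monte_carlo_splits` and its parameter names are hypothetical and chosen for illustration.

```python
import random


def monte_carlo_splits(n_rows, n_splits, validation_size, seed=0):
    """Return n_splits independent random (train, validation) index pairs."""
    if not 0.0 < validation_size < 1.0:
        raise ValueError("validation_size must be between 0.0 and 1.0, exclusive")
    n_val = max(1, round(n_rows * validation_size))
    if n_val >= n_rows:
        raise ValueError("validation set would leave no rows for training")

    rng = random.Random(seed)  # one generator, so successive splits differ
    splits = []
    for _ in range(n_splits):
        validation = set(rng.sample(range(n_rows), n_val))
        train = [i for i in range(n_rows) if i not in validation]
        splits.append((train, sorted(validation)))
    return splits


for train, validation in monte_carlo_splits(n_rows=10, n_splits=3, validation_size=0.2):
    print(f"train={train} validation={validation}")
```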

Specify custom cross-validation data folds

  • Specify custom cross-validation data folds by adding CV split columns to the training data and naming them in the configuration (the `cv_split_column_names` parameter), which gives you control over how the data is divided for validation.
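
The sketch below shows one way to read such split columns, assuming the convention that each CV split column defines one split and marks a row with 1 for training or 0 for validation. The helper `splits_from_columns`, the column names `cv1` and `cv2`, and the sample rows are hypothetical.

```python
def splits_from_columns(rows, cv_columns):
    """Build one (train, validation) index pair per CV split column.

    Assumed convention: 1 marks a training row, 0 marks a validation row.
    """
    splits = []
    for column in cv_columns:
        train = [i for i, row in enumerate(rows) if row[column] == 1]
        validation = [i for i, row in enumerate(rows) if row[column] == 0]
        splits.append((train, validation))
    return splits


# Made-up rows: one feature plus two CV split columns.
rows = [
    {"feature": 0.1, "cv1": 1, "cv2": 0},
    {"feature": 0.4, "cv1": 1, "cv2": 1},
    {"feature": 0.7, "cv1": 0, "cv2": 1},
    {"feature": 0.9, "cv1": 0, "cv2": 1},
]

for column, (train, validation) in zip(["cv1", "cv2"], splits_from_columns(rows, ["cv1", "cv2"])):
    print(f"{column}: train={train} validation={validation}")
```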

Metric calculation for cross-validation in machine learning

  • Automated ML calculates metrics on each validation fold and then aggregates them, so the reported performance reflects every fold rather than a single split.
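
As a small illustration of the aggregation step, the sketch below averages scalar metrics across folds. Averaging is assumed here as the aggregation rule; the helper `aggregate_fold_metrics` and the metric values are made up for the example.

```python
from statistics import mean


def aggregate_fold_metrics(fold_metrics):
    """Average each scalar metric across folds (assumed aggregation rule)."""
    if not fold_metrics:
        raise ValueError("need metrics from at least one fold")
    names = fold_metrics[0].keys()
    return {name: mean([fold[name] for fold in fold_metrics]) for name in names}


# Made-up per-fold results from a three-fold run.
fold_metrics = [
    {"accuracy": 0.80, "auc": 0.85},
    {"accuracy": 0.90, "auc": 0.95},
    {"accuracy": 1.00, "auc": 0.90},
]

summary = aggregate_fold_metrics(fold_metrics)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```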

 


Written by: Maine Cruz

Charmaine is a DevOps engineer and a Cloud instructor at Tutorials Dojo. She is also an AWS BuildHers+ Mentor in AWS User Group Philippines. Certified on both the AWS and Azure cloud platforms, she specializes in automating solutions and CI/CD.
