Theory, Hands-On Labs, and 200 Practice Exam Q&As: All Hands-On Labs in 1-Click Copy-Paste Style, All Material in Downloadable PDF
What you will learn
Designing data processing systems
Building and operationalizing data processing systems
Operationalizing machine learning models
Ensuring solution quality
Designing data pipelines
Designing a data processing solution
Migrating data warehousing and data processing
Building and operationalizing storage systems
Building and operationalizing pipelines
Building and operationalizing processing infrastructure
Leveraging pre-built ML models as a service
Deploying an ML pipeline
Measuring, monitoring, and troubleshooting machine learning models
Designing for security and compliance
Ensuring scalability and efficiency
Ensuring reliability and fidelity
Ensuring flexibility and portability
Description
Designing data processing systems
Selecting the appropriate storage technologies. Considerations include:
- Mapping storage systems to business requirements
- Data modeling
- Trade-offs involving latency, throughput, transactions
- Distributed systems
- Schema design
Designing data pipelines. Considerations include:
- Data publishing and visualization (e.g., BigQuery)
- Batch and streaming data (e.g., Dataflow, Dataproc, Apache Beam, Apache Spark and Hadoop ecosystem, Pub/Sub, Apache Kafka)
- Online (interactive) vs. batch predictions
- Job automation and orchestration (e.g., Cloud Composer)
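A core idea behind the streaming side of this list (Dataflow, Apache Beam) is windowing: assigning timestamped events to bounded groups so an unbounded stream can be aggregated. A minimal stdlib-only sketch of fixed (non-overlapping) windows, with the event format and 60-second window size chosen purely for illustration:

```python
# Illustrative fixed-window grouping, the concept behind Beam/Dataflow
# fixed windows. Pure stdlib; event shape and window size are assumptions.
from collections import defaultdict

def assign_fixed_windows(events, window_seconds=60):
    """Group (timestamp_seconds, value) events into fixed windows,
    keyed by each window's start time."""
    windows = defaultdict(list)
    for ts, value in events:
        window_start = (ts // window_seconds) * window_seconds
        windows[window_start].append(value)
    return dict(windows)

events = [(5, "a"), (61, "b"), (59, "c"), (130, "d")]
print(assign_fixed_windows(events))
# {0: ['a', 'c'], 60: ['b'], 120: ['d']}
```

Real streaming engines add triggers and watermarks on top of this to decide when a window's results are emitted despite late data.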
Designing a data processing solution. Considerations include:
- Choice of infrastructure
- System availability and fault tolerance
- Use of distributed systems
- Capacity planning
- Hybrid cloud and edge computing
- Architecture options (e.g., message brokers, message queues, middleware, service-oriented architecture, serverless functions)
- At-least-once, in-order, and exactly-once event processing
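The delivery-semantics consideration above is often handled by making the consumer idempotent: under at-least-once delivery (as in Pub/Sub), deduplicating on a message ID yields effectively-exactly-once processing. A sketch with an assumed message shape and an in-memory seen-set (production systems persist this state):

```python
# Idempotent consumer sketch: at-least-once delivery plus dedup on
# message ID gives effectively-exactly-once processing.
# Message shape and in-memory state are assumptions for illustration.
processed_ids = set()
results = []

def handle(message):
    """Process a message at most once, even if it is redelivered."""
    if message["id"] in processed_ids:
        return  # duplicate delivery: safe to drop
    processed_ids.add(message["id"])
    results.append(message["payload"])

for msg in [{"id": 1, "payload": "x"},
            {"id": 2, "payload": "y"},
            {"id": 1, "payload": "x"}]:  # id 1 redelivered
    handle(msg)

print(results)  # ['x', 'y']
```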
Migrating data warehousing and data processing. Considerations include:
- Awareness of current state and how to migrate a design to a future state
- Migrating from on-premises to cloud (Data Transfer Service, Transfer Appliance, Cloud Networking)
- Validating a migration
Building and operationalizing data processing systems
Building and operationalizing storage systems. Considerations include:
- Effective use of managed services (Cloud Bigtable, Cloud Spanner, Cloud SQL, BigQuery, Cloud Storage, Datastore, Memorystore)
- Storage costs and performance
- Life cycle management of data
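Life cycle management on Cloud Storage is driven by a declarative rules document. A minimal example configuration (the 90- and 365-day thresholds are illustrative, not recommendations) that transitions aging objects to a colder storage class and later deletes them:

```json
{
  "rule": [
    {
      "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
      "condition": {"age": 90}
    },
    {
      "action": {"type": "Delete"},
      "condition": {"age": 365}
    }
  ]
}
```

A configuration like this can be applied to a bucket (for example via `gsutil lifecycle set`), after which the service evaluates the conditions automatically.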
Building and operationalizing pipelines. Considerations include:
- Data cleansing
- Batch and streaming
- Transformation
- Data acquisition and import
- Integrating with new data sources
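Data cleansing and transformation steps like those above usually reduce to small, composable record-level functions. A minimal stdlib sketch (the field names and coercion rules are assumptions for the example): trim whitespace, normalize empty strings to None, and coerce a numeric field:

```python
# Minimal data-cleansing sketch: trim strings, map blanks to None,
# coerce a numeric field. Field names/rules are illustrative assumptions.
def clean_record(record):
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip() or None  # blank string -> None
        cleaned[key] = value
    if cleaned.get("age") is not None:
        cleaned["age"] = int(cleaned["age"])
    return cleaned

print(clean_record({"name": "  Ada  ", "city": "   ", "age": "36"}))
# {'name': 'Ada', 'city': None, 'age': 36}
```

In a real pipeline the same function would sit inside a Beam `Map`/`ParDo` or a Dataprep recipe step rather than being called directly.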
Building and operationalizing processing infrastructure. Considerations include:
- Provisioning resources
- Monitoring pipelines
- Adjusting pipelines
- Testing and quality control
Operationalizing machine learning models
Leveraging pre-built ML models as a service. Considerations include:
- ML APIs (e.g., Vision API, Speech API)
- Customizing ML APIs (e.g., AutoML Vision, AutoML Text)
- Conversational experiences (e.g., Dialogflow)
Deploying an ML pipeline. Considerations include:
- Ingesting appropriate data
- Retraining of machine learning models (AI Platform Prediction and Training, BigQuery ML, Kubeflow, Spark ML)
- Continuous evaluation
Choosing the appropriate training and serving infrastructure. Considerations include:
- Distributed vs. single machine
- Use of edge compute
- Hardware accelerators (e.g., GPU, TPU)
Measuring, monitoring, and troubleshooting machine learning models. Considerations include:
- Machine learning terminology (e.g., features, labels, models, regression, classification, recommendation, supervised and unsupervised learning, evaluation metrics)
- Impact of dependencies of machine learning models
- Common sources of error (e.g., assumptions about data)
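Among the evaluation metrics named above, precision and recall are the two most commonly tested for binary classification. A stdlib-only sketch computing both from label/prediction pairs (the sample data is made up for illustration):

```python
# Precision = TP / (TP + FP): of the positives we predicted, how many
# were right. Recall = TP / (TP + FN): of the true positives, how many
# we found. Sample labels/predictions are illustrative.
def precision_recall(labels, predictions):
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

labels      = [1, 0, 1, 1, 0]
predictions = [1, 1, 1, 0, 0]
print(precision_recall(labels, predictions))
```

The trade-off between the two (e.g., a spam filter tuned for precision vs. a fraud detector tuned for recall) is a recurring exam theme.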
Ensuring solution quality
Designing for security and compliance. Considerations include:
- Identity and access management (e.g., Cloud IAM)
- Data security (encryption, key management)
- Ensuring privacy (e.g., Data Loss Prevention API)
- Legal compliance (e.g., Health Insurance Portability and Accountability Act (HIPAA), Children's Online Privacy Protection Act (COPPA), FedRAMP, General Data Protection Regulation (GDPR))
Ensuring scalability and efficiency. Considerations include:
- Building and running test suites
- Pipeline monitoring (e.g., Cloud Monitoring)
- Assessing, troubleshooting, and improving data representations and data processing infrastructure
- Resizing and autoscaling resources
Ensuring reliability and fidelity. Considerations include:
- Performing data preparation and quality control (e.g., Dataprep)
- Verification and monitoring
- Planning, executing, and stress testing data recovery (fault tolerance, rerunning failed jobs, performing retrospective re-analysis)
- Choosing between ACID, idempotent, eventually consistent requirements
Ensuring flexibility and portability. Considerations include:
- Mapping to current and future business requirements
- Designing for data and application portability (e.g., multicloud, data residency requirements)
- Data staging, cataloging, and discovery