Scheduling a data pipeline

Batch data pipelines can be set to run on a specified schedule and frequency, such as every 4 hours or weekly on Monday at 1:30 AM. 

After you create and deploy a pipeline, you can create a schedule.

Note: The timezone for schedules is UTC.
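Because schedules use UTC, a start time planned in local time has to be converted before you enter it. The following is a minimal sketch of that conversion, not part of the product. The function name to_utc_schedule_time and the fixed UTC offsets are assumptions for illustration, and a fixed offset ignores daylight saving changes.

```python
from datetime import datetime, timedelta, timezone


def to_utc_schedule_time(local_dt: datetime, utc_offset_hours: float) -> datetime:
    """Convert a naive local wall-clock time to UTC.

    utc_offset_hours is the local zone's fixed offset from UTC
    (for example -5 or +9). Hypothetical helper for illustration only.
    """
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    return local_dt.replace(tzinfo=local_tz).astimezone(timezone.utc)


# Monday 1:30 AM local time, in a zone five hours behind UTC.
behind_utc = to_utc_schedule_time(datetime(2024, 1, 8, 1, 30), -5)

# The same local time in a zone nine hours ahead of UTC.
ahead_of_utc = to_utc_schedule_time(datetime(2024, 1, 8, 1, 30), 9)

print(behind_utc.strftime("%A %H:%M"))
print(ahead_of_utc.strftime("%A %H:%M"))
```

Note that the conversion can move the run to a different day of the week, which matters for weekly schedules.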

Creating a schedule

To schedule a pipeline, complete the following steps:

  1. From the Deploy Pipeline page, click Schedule.

  2. Configure the schedule using either the Basic or the Advanced interface. The Basic interface lets you set the following:

  • Frequency

  • Start time (and date, if needed)

  • Max concurrent runs: Up to 10. If the maximum number of runs is already in progress, the scheduled run is skipped.

  • Compute profile (optional): If no profile is set for the schedule, the default Dataproc profile is used.

  3. After you select all options, click Save and Start Schedule to save and start the schedule, or click Save Schedule to save the schedule without starting it. To start a saved schedule, click Schedule, and then click Start Schedule.
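The Max concurrent runs rule described in step 2 can be pictured as a simple check made when a scheduled run comes due. This is only an illustrative sketch of the documented behavior, not the product's implementation. The function name should_start_run and the lower bound of 1 are assumptions.

```python
MAX_ALLOWED_CONCURRENT_RUNS = 10  # upper limit stated for the Basic interface


def should_start_run(active_runs: int, max_concurrent_runs: int) -> bool:
    """Return True if a scheduled run may start, False if it is skipped.

    Hypothetical helper: a run is skipped when the number of runs already
    in progress has reached the configured maximum.
    """
    # The lower bound of 1 is an assumption; the source only states "up to 10".
    if not 1 <= max_concurrent_runs <= MAX_ALLOWED_CONCURRENT_RUNS:
        raise ValueError("max_concurrent_runs must be between 1 and 10")
    return active_runs < max_concurrent_runs


print(should_start_run(active_runs=2, max_concurrent_runs=3))  # room for one more run
print(should_start_run(active_runs=3, max_concurrent_runs=3))  # limit reached, run is skipped
```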

Alternatively, use the Advanced interface to create a schedule with cron syntax.
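As an illustration of cron syntax, the two example schedules from the introduction (every 4 hours, and weekly on Monday at 1:30 AM) could be written as shown below. This sketch assumes the common five-field layout (minute, hour, day of month, month, day of week, with 0 meaning Sunday); check the Advanced interface for the exact fields it expects. The matcher is a simplified, hypothetical helper that supports only *, */n, single values, ranges, stepped ranges, and comma lists, and requires every field to match.

```python
from datetime import datetime

# Assumed five-field cron expressions for the two example schedules.
EVERY_4_HOURS = "0 */4 * * *"       # minute 0 of every fourth hour
WEEKLY_MONDAY_0130 = "30 1 * * 1"   # 01:30 on Mondays (1 = Monday)


def field_matches(field: str, value: int, minimum: int) -> bool:
    """Return True if one cron field matches the given value."""
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            # "*" or "*/n": count steps from the field's minimum value.
            if (value - minimum) % step == 0:
                return True
            continue
        # "a", "a-b", or "a-b/n": count steps from the start of the range.
        low_text, _, high_text = base.partition("-")
        low = int(low_text)
        high = int(high_text) if high_text else low
        if low <= value <= high and (value - low) % step == 0:
            return True
    return False


def cron_matches(expression: str, when: datetime) -> bool:
    """Return True if the datetime (interpreted as UTC) matches the expression."""
    minute, hour, day, month, weekday = expression.split()
    # Python uses Monday=0; cron conventionally uses Sunday=0.
    cron_weekday = (when.weekday() + 1) % 7
    return (
        field_matches(minute, when.minute, 0)
        and field_matches(hour, when.hour, 0)
        and field_matches(day, when.day, 1)
        and field_matches(month, when.month, 1)
        and field_matches(weekday, cron_weekday, 0)
    )


# 2024-01-08 is a Monday.
print(cron_matches(EVERY_4_HOURS, datetime(2024, 1, 8, 8, 0)))
print(cron_matches(WEEKLY_MONDAY_0130, datetime(2024, 1, 8, 1, 30)))
```

Remember that the times in a cron expression are interpreted in UTC, like all schedules.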

After you create a schedule, you can modify, start, or suspend it. To suspend a schedule that you have started, click the Unschedule button on the Deploy Pipeline page.

Likewise, you can edit a schedule by clicking the Schedule button and changing the schedule properties.

Created in 2020 by Google Inc.