An overview of targets (2024)

Source: vignettes/overview.Rmd

This vignette is a high-level overview of targets and its educational materials. The goal is to summarize the major features of targets and direct users to the appropriate resources. It explains how to get started, and then it briefly describes each chapter of the user manual.

What is targets?

The targets R package is a Make-like pipeline toolkit for statistics and data science in R. targets accelerates analysis with easy-to-configure parallel computing, enhances reproducibility, and reduces the burdens of repeated computation and manual data micromanagement. A fully up-to-date targets pipeline is tangible evidence that the output aligns with the code and data, which substantiates trust in the results.

How to get started

The top of the reference website links to a number of materials to help new users start learning targets. It lists online talks, tutorials, books, and workshops in the order that a new user should consume them. The rest of the main page outlines a more comprehensive list of resources.

The walkthrough

The user manual starts with a walkthrough chapter, a short tutorial to quickly get started with targets using a simple example project. That project also has a repository with the source code and an RStudio Cloud workspace that lets you try out the workflow in a web browser. Sign up for a free RStudio Cloud account, click on the link, and try out the functions tar_make() and tar_read() in the R console.
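To give a feel for what such a project contains, here is a minimal sketch of a pipeline definition file. It is not the walkthrough's own example: the target names raw, fit, and slope are illustrative assumptions, and the built-in mtcars dataset stands in for real data.

```r
# _targets.R: a minimal sketch of a pipeline definition file.
# Target names and commands are illustrative, not from the walkthrough.
library(targets)

list(
  # Each target has a name and an R command that produces its value.
  tar_target(raw, mtcars),
  # This target depends on `raw` because its command mentions `raw`.
  tar_target(fit, lm(mpg ~ wt, data = raw)),
  # This target depends on `fit`.
  tar_target(slope, unname(coef(fit)["wt"]))
)
```

With this file in the project root, calling tar_make() in the R console runs the three targets in dependency order, and tar_read(slope) retrieves the stored result. A second call to tar_make() should skip targets whose code and upstream data have not changed.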

Help

The help guide explains how best to get help with targets, including how to write reproducible examples and where to post questions.

Debugging

The debugging chapter describes two alternative built-in systems for troubleshooting errors. The first system uses workspaces, which let you load a target’s dependencies into your R session. This way is usually preferred, especially with large pipelines on computing clusters, but it still may require some manual work. The second system launches an interactive debugger while the pipeline is actually running, which may not be feasible in some situations, but can often help you reach the problem more quickly.
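The two debugging systems described above can be sketched roughly as follows. The target names values and summary_stat are hypothetical, and the failing command is contrived for illustration; the debugging chapter remains the authoritative description of both workflows.

```r
# _targets.R: sketch of the two debugging setups (use one at a time).
library(targets)

# System 1: save a workspace file whenever a target throws an error.
tar_option_set(workspace_on_error = TRUE)

# System 2 (alternative): pause in an interactive debugger when the
# target named "summary_stat" starts running. Uncomment to use.
# tar_option_set(debug = "summary_stat")

list(
  tar_target(values, c(1, 2, NA)),
  # This command fails on purpose because `values` contains NA.
  tar_target(summary_stat, stopifnot(!anyNA(values)))
)
```

With the first setup, tar_make() stops at the error, and tar_workspace(summary_stat) then loads the dependency values into the current session so the failing command can be rerun by hand. With the second setup, tar_make(callr_function = NULL) runs the pipeline in the current R session so the interactive debugger can take over.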

Functions

targets expects users to adopt a function-oriented style of programming. User-defined R functions are essential to express the complexities of data generation, analysis, and reporting. The user manual has a whole chapter dedicated to user-defined functions for data science, and it explains why they are important and how to use them in targets-powered pipelines.
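As a small illustration of the function-oriented style, the sketch below defines two helper functions in plain R. The function names clean_data() and fit_model() and the use of the built-in airquality dataset are assumptions made for this example, not code from the user manual.

```r
# R/functions.R: hypothetical helper functions for a pipeline.

# Keep the two columns the analysis needs and drop incomplete rows.
clean_data <- function(data) {
  cols <- c("Ozone", "Temp")
  data[stats::complete.cases(data[, cols]), cols]
}

# Fit a simple linear model and return its coefficients.
fit_model <- function(data) {
  stats::coef(stats::lm(Ozone ~ Temp, data = data))
}

# In _targets.R, each target then reduces to a short function call:
#   tar_target(data, clean_data(airquality))
#   tar_target(coefs, fit_model(data))
# Outside a pipeline, the same functions can be tested directly:
coefs <- fit_model(clean_data(airquality))
```

Keeping the logic in functions means the pipeline file stays short, and each function can be checked on its own before it is wired into a target.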

Target construction

The target construction chapter explains best practices for creating targets: what a good target should do, how much work a target should do, and guidelines for thinking about side effects and upstream dependencies (i.e., other targets and global objects).

Packages

The packages chapter explains best practices for working with packages in targets: how to load them, how to work with packages as projects, target factories inside packages, and automatically invalidating targets based on changes inside one or more packages.
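Package loading and package-based invalidation are typically configured through tar_option_set(). The fragment below is a sketch under assumptions: yourAnalysisPackage is a hypothetical package name, and the fragment only works once a package with that name is installed.

```r
# _targets.R fragment: sketch of package-related options.
library(targets)

tar_option_set(
  # Packages to load before each target runs.
  # "yourAnalysisPackage" is a hypothetical name.
  packages = c("stats", "yourAnalysisPackage"),
  # Track the functions and objects inside this package so that changes
  # to them invalidate the targets that depend on them.
  imports = "yourAnalysisPackage"
)
```

The packages chapter covers the trade-offs of this setup in more detail.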

Projects

The projects chapter explains best practices for working with targets-powered projects: the recommended file structure, recommended third-party tools, multi-project repositories, and interdependent projects.

Data and files

The chapter at https://books.ropensci.org/targets/data.html describes how the targets package stores data, manages memory, and allows you to customize the data processing model. When a target finishes running during tar_make(), it returns an R object. Those return values, along with descriptive metadata, are saved to persistent storage so your pipeline stays up to date even after you exit R. By default, this persistent storage is a special _targets/ folder created in your working directory by tar_make(). However, you can also interact with files outside the data store and send target data to the cloud.
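Working with files outside the data store is usually done with format = "file" in tar_target(). In the sketch below, the paths data/raw.csv and head.csv and all target names are hypothetical.

```r
# _targets.R: sketch of tracking files outside the _targets/ data store.
library(targets)

list(
  # Input file: the command returns the path, and format = "file" tells
  # targets to watch the file's contents for changes.
  tar_target(input_file, "data/raw.csv", format = "file"),
  # An ordinary target: its return value is saved in the data store.
  tar_target(raw, read.csv(input_file)),
  # Output file: write the file, then return its path.
  tar_target(
    output_file,
    {
      write.csv(head(raw), "head.csv", row.names = FALSE)
      "head.csv"
    },
    format = "file"
  )
)
```

After tar_make(), tar_read(raw) loads the stored data frame from the data store, while tar_read(output_file) returns the file path rather than the file contents.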

Literate programming

The chapter at https://books.ropensci.org/targets/literate-programming.html covers literate programming: how to render an R Markdown or Quarto report as part of a targets pipeline. A report can depend on other targets and take advantage of long computation already completed upstream.
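One common way to include a report is through the companion package tarchetypes, which this overview does not name and is an assumption of this sketch. The file report.Rmd and the target names are also hypothetical.

```r
# _targets.R: sketch of rendering a report as a target.
# Assumes the tarchetypes package is installed.
library(targets)
library(tarchetypes)

list(
  tar_target(data, airquality),
  tar_target(fit, lm(Ozone ~ Temp, data = data)),
  # report.Rmd is a hypothetical R Markdown file whose code chunks call
  # tar_read(fit) or tar_load(fit). Those calls are how the report
  # declares its dependency on the upstream target.
  tar_render(report, "report.Rmd")
)
```

For a Quarto document, tarchetypes offers tar_quarto() with a similar interface. Because the report only reads stored results, rerendering it does not repeat the upstream computation.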

Distributed computing

targets is capable of distributing the computation in a pipeline across multiple cores of a laptop or multiple jobs on a computing cluster. The orchestration and scaling mechanisms are automatic, and only high-level configuration is required. Visit https://books.ropensci.org/targets/crew.html to learn more. Configuration happens through the crew package: https://wlandau.github.io/crew/. The appendix at https://books.ropensci.org/targets/hpc.html describes how to use targets with the legacy backends clustermq and future.
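The high-level configuration mentioned above can be as short as one option. The following is a sketch, assuming the crew package is installed; the worker count and the target names are arbitrary choices for illustration.

```r
# _targets.R: sketch of local parallel execution with crew.
library(targets)

tar_option_set(
  # Two local worker processes. The number 2 is an arbitrary choice.
  controller = crew::crew_controller_local(workers = 2)
)

list(
  # `a` and `b` do not depend on each other, so they can run at the
  # same time on separate workers.
  tar_target(a, {
    Sys.sleep(2)
    "a"
  }),
  tar_target(b, {
    Sys.sleep(2)
    "b"
  }),
  # `both` waits for `a` and `b` to finish.
  tar_target(both, paste(a, b))
)
```

No other change to the targets themselves is needed: tar_make() picks up the controller and schedules independent targets onto the available workers.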

Performance

The chapter at https://books.ropensci.org/targets/performance.html explains how to monitor the progress of a running pipeline and optimize your pipeline for performance. targets has easy-to-configure efficiency settings at the level of tar_target() and tar_option_set().
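As a sketch of what those two levels look like, the example below sets pipeline-wide defaults with tar_option_set() and overrides one of them for a single target in tar_target(). The particular settings chosen here are illustrative, not recommendations from the performance chapter.

```r
# _targets.R: sketch of efficiency settings at two levels.
library(targets)

# Pipeline-wide defaults.
tar_option_set(
  memory = "transient",  # drop target values from memory when possible
  storage = "worker",    # parallel workers save their own results
  retrieval = "worker"   # parallel workers load their own dependencies
)

list(
  tar_target(big, rnorm(1e6)),
  # Per-target override of the pipeline-wide memory setting.
  tar_target(small, mean(big), memory = "persistent")
)
```

While the pipeline runs, or after it finishes, tar_progress() reports the status of each target recorded so far.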

Dynamic branching

Sometimes, a pipeline contains more targets than a user can comfortably type by hand. For projects with hundreds of targets, branching can make the _targets.R file more concise and easier to read and maintain. Dynamic branching is a way to create new targets while the pipeline is running, and it is best suited to iterating over a large number of very similar tasks. The dynamic branching chapter outlines this functionality, including how to create branching patterns, different ways to iterate over data, and recommendations for batching large numbers of small tasks into a comfortably small number of dynamic branches.
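A branching pattern is declared with the pattern argument of tar_target(). The sketch below uses hypothetical target names sizes and means and creates one branch per element of an upstream target.

```r
# _targets.R: sketch of dynamic branching with map().
library(targets)

list(
  # An ordinary target holding three values.
  tar_target(sizes, c(10, 100, 1000)),
  # One branch per element of `sizes`. Inside the command, `sizes`
  # refers to a single element, not the whole vector.
  tar_target(means, mean(seq_len(sizes)), pattern = map(sizes))
)
```

The number of branches is decided while the pipeline runs, based on the value of sizes at that time. Afterwards, tar_read(means) returns the branch results combined into one vector, and tar_read(means, branches = 2) returns only the second branch.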

Static branching

Static branching is the act of defining a group of targets in bulk before the pipeline starts. Whereas dynamic branching uses last-minute dependency data to define the branches, static branching uses metaprogramming to modify the code of the pipeline up front. Whereas dynamic branching excels at creating a large number of very similar targets, static branching is most useful for a smaller number of heterogeneous targets. Some users find it more convenient because they can use tar_manifest() and tar_visnetwork() to check the correctness of static branching before launching the pipeline. Read more about it in the static branching chapter.
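Static branching is commonly written with tar_map() from the companion package tarchetypes, which this overview does not name and is an assumption of this sketch. The target name corr and the choice of correlation methods are illustrative.

```r
# _targets.R: sketch of static branching with tarchetypes::tar_map().
# Assumes the tarchetypes package is installed.
library(targets)
library(tarchetypes)

list(
  tar_map(
    # One copy of the target below is generated per value of `method`.
    values = list(method = c("pearson", "spearman")),
    # `method` is replaced by each string before the pipeline runs.
    tar_target(
      corr,
      cor(airquality$Wind, airquality$Temp, method = method)
    )
  )
)
```

Because the targets are generated up front, tar_manifest() lists the resulting targets (named corr_pearson and corr_spearman under tar_map()'s default naming) before anything has run, which makes it possible to inspect the generated commands first.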
