[R Notes] Thorough Analysis of rlang tidyeval with Examples

Published Jul 17, 2022
Updated Nov 3, 2025
13 minutes read
Note

This old post is translated by AI.

###Introduction

Last time I wrote an introduction to tidyeval for people who aren't hardcore R programmers.

This time, I'll try to understand tidyeval through concrete examples! 🧑‍🎓 I'll group them into patterns, on the theory that "maybe it can be categorized like this."

As before, my premise is that most people don't need to understand everything about tidyeval: if you only use the parts you need, it isn't that hard.

The previous article explains the mental picture of what tidyeval does, so if you haven't read it yet, reading the two together will make this one easier to follow 😃

This article uses the starwars dataset included with dplyr and the tidyeval API of the rlang package.

ℹ️ Don't forget to load the packages ℹ️

library(dplyr)
library(rlang)

##📗 Patterns Using tidyeval (Single Variable Case)

Tidyeval usage broadly splits into two cases: passing a single variable for non-standard evaluation (NSE), or passing multiple variables.

This chapter further divides the single-variable case into the following patterns.

If you just want to know how to use it quickly, learn the {{}} (curly-curly) method.

For those who want to know the principles, I've also written supplementary information, so please use it as a reference!

  • Receive symbol, unquote (!!) then pass to function
    • {{}}
    • enquo() & !!
  • Receive symbol, unquote (!!) then make it function argument name
    • {{}} :=
  • Receive expression, unquote (!!) then pass to function
    • {{}}
    • enexpr() & !!
    • enquo() & !!
  • Receive string, convert to symbol, then unquote (!!) and pass to function
    • ensym() & !!

###📗 Receive symbol🌟, unquote (!!) then pass to function

This is probably the pattern you'll use most often 😐

For explanation purposes, I'll show the method using enquo() before the more commonly used {{}}.

myfunction <- function(df, var) {
    var_quo <- enquo(var)
    # print(var_quo)
    # <quosure>
    # expr: ^name
    # env:  global
    df |>
        select(!!var_quo)
}
myfunction(starwars, name) # Provide `name` as symbol
 
# # A tibble: 87 × 1
#    name
#
#  1 Luke Skywalker
#  2 C-3PO
#  3 R2-D2
#  4 Darth Vader
#  5 Leia Organa
#  6 Owen Lars
#  7 Beru Whitesun lars
#  8 R5-D4
#  9 Biggs Darklighter
# 10 Obi-Wan Kenobi
# # … with 77 more rows

Explaining the operations happening here in detail:

  • Give name as a symbol to the var argument
  • Convert to quosure with enquo()
  • Unquoting with !! turns var into name
  • starwars |> select(name) is evaluated

Functions like dplyr::select() and dplyr::group_by() evaluate variables in the tibble context, enabling NSE.

In other words, you first capture the argument as a quosure, then unquote it at the point where select() will evaluate it.
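The capture step can be inspected on its own. The following is a minimal sketch, where capture() and select_one() are hypothetical helper names introduced only for illustration: it shows that enquo() returns a quosure holding the symbol the caller typed, and that unquoting it hands that symbol to select().

```r
library(dplyr)
library(rlang)

# Hypothetical helper: capture the argument and return the quosure itself.
capture <- function(var) enquo(var)

q <- capture(name)
is_quosure(q)    # a quosure: the expression plus the environment it came from
quo_get_expr(q)  # the symbol `name`, not the string "name"

# Unquoting the quosure hands `name` to select(), which looks it up in df.
select_one <- function(df, var) {
    var_quo <- enquo(var)
    select(df, !!var_quo)
}
names(select_one(starwars, name))
```

Note that capture(name) never evaluates name, which is why it works even though no object called name exists outside the data frame.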

Using {{}} works exactly the same

{{}} is shorthand for the flow "enquo(), then !!" (that is, !!enquo(var)). It actually supports other operations too, so it has become an almost magical operator 🪄.

When writing tidyeval code in practice, I think using {{}} is the standard.

So I'll make this tidyeval important pattern #1 👊

myfunction <- function(df, var) {
    df |>
        select({{ var }})
}
myfunction(starwars, name)
 
# # A tibble: 87 × 1
#    name
#
#  1 Luke Skywalker
#  2 C-3PO
#  3 R2-D2
#  4 Darth Vader
#  5 Leia Organa
#  6 Owen Lars
#  7 Beru Whitesun lars
#  8 R5-D4
#  9 Biggs Darklighter
# 10 Obi-Wan Kenobi
# # … with 77 more rows

The tidyverse style guide recommends putting spaces inside {{}}. This is to explicitly show it's a "special behavior."

{{}} supports various operations and frees you from the hassle of the confusing !!. That said, it is less one consistent feature than several features packed into a single {{}} 🛠️ If you only want the minimum, knowing {{}} is enough, but if you want to understand the principles, do learn the enquo() and !! approach as well.
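To check that the two spellings really are interchangeable for this pattern, here is a small sketch comparing them side by side. The function names pick_long() and pick_short() are made up for this illustration.

```r
library(dplyr)
library(rlang)

# Long form: capture with enquo(), then unquote with !!
pick_long <- function(df, var) {
    var_quo <- enquo(var)
    select(df, !!var_quo)
}

# Short form: embrace the argument with {{ }}
pick_short <- function(df, var) {
    select(df, {{ var }})
}

# Both calls should select the same single column.
identical(pick_long(starwars, name), pick_short(starwars, name))
```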

###📗 Receive symbol🌟, unquote (!!) then make it function argument name

What I call the function argument name here is the left-hand side of = when you write new_column_name = ... in dplyr::mutate() or dplyr::transmute(). R's syntax doesn't allow unquoting (!!) on the left-hand side of =, so you use the := operator provided by rlang instead.

myfunction <- function(df, var) {
    quo_var <- enquo(var)
    df |>
        mutate(!!quo_var := name, .before = name)
}
myfunction(starwars, newname)
 
 
# # A tibble: 87 × 15
#    newname      name  height  mass hair_color skin_color
#
#  1 Luke Skywal… Luke…    172    77 blond      fair
#  2 C-3PO        C-3PO    167    75 NA         gold
#  3 R2-D2        R2-D2     96    32 NA         white, bl…
#  4 Darth Vader  Dart…    202   136 none       white
#  5 Leia Organa  Leia…    150    49 brown      light
#  6 Owen Lars    Owen…    178   120 brown, gr… light
#  7 Beru Whites… Beru…    165    75 brown      light
#  8 R5-D4        R5-D4     97    32 NA         white, red
#  9 Biggs Darkl… Bigg…    183    84 black      light
# 10 Obi-Wan Ken… Obi-…    182    77 auburn, w… fair
# # … with 77 more rows, and 9 more variables:
# #   eye_color , birth_year , sex ,
# #   gender , homeworld , species ,
# #   films , vehicles , starships

When I didn't understand tidyeval well, I had no idea when to use :=. Think of := as a stand-in for = that solves the problem of not being able to unquote in argument names. As an extreme example, := works even when you aren't unquoting anything:

starwars |>
    mutate(newcol := name, .before = name)
 
starwars |>
    mutate(newcol := height * mass, .before = name)

To be precise, := can only be used in functions whose ... argument is collected as dynamic dots, as in dplyr functions. Dynamic dots are a feature provided by rlang; note that they are not exactly the same as base R's variable-length arguments.
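Dynamic dots can be tried outside dplyr with rlang::list2(), which collects ... with the same unquoting and splicing support. This is only a sketch: collect() is a hypothetical wrapper, and nm and extra are made-up inputs.

```r
library(rlang)

# Hypothetical helper: list2() collects `...` as dynamic dots.
collect <- function(...) list2(...)

nm <- "total"
extra <- list(b = 2, c = 3)

# `!!nm :=` sets the argument name from a variable; `!!!extra` splices a list.
out <- collect(!!nm := 1, !!!extra)
names(out)
```

With base R's list(...) in place of list2(...), the same call would fail because base R does not know the := operator.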

Using {{}} for argument name tidyeval too

Just like tidyeval for regular variables earlier, you can use {{}} to skip the enquo() hassle.

Using {{}} for argument names together with the := operator is the standard approach, so I'll make this tidyeval important pattern #2.

myfunction <- function(df, var) {
    df |>
        mutate({{ var }} := name, .before = name)
}
myfunction(starwars, newname)
 
# # A tibble: 87 × 15
#    newname      name  height  mass hair_color skin_color
#
#  1 Luke Skywal… Luke…    172    77 blond      fair
#  2 C-3PO        C-3PO    167    75 NA         gold
#  3 R2-D2        R2-D2     96    32 NA         white, bl…
#  4 Darth Vader  Dart…    202   136 none       white
#  5 Leia Organa  Leia…    150    49 brown      light
#  6 Owen Lars    Owen…    178   120 brown, gr… light
#  7 Beru Whites… Beru…    165    75 brown      light
#  8 R5-D4        R5-D4     97    32 NA         white, red
#  9 Biggs Darkl… Bigg…    183    84 black      light
# 10 Obi-Wan Ken… Obi-…    182    77 auburn, w… fair
# # … with 77 more rows, and 9 more variables:
# #   eye_color , birth_year , sex ,
# #   gender , homeworld , species ,
# #   films , vehicles , starships

There's a somewhat stylish way to use {{}} in this case 📿 You can define argument names using notation similar to the glue package.

Post-submission note: When using glue notation, you need to surround it with " to make it a string. (@yutannihilation thank you for the quick reply 🙏)

myfunction <- function(df, var) {
    df |>
        mutate( "meter_{{ var }}" := {{ var }} / 10 ,
        .before = height)
}
myfunction(starwars, height)
 
# # A tibble: 87 × 15
#    name  meter_height height  mass hair_color skin_color
#
#  1 Luke…         17.2    172    77 blond      fair
#  2 C-3PO         16.7    167    75 NA         gold
#  3 R2-D2          9.6     96    32 NA         white, bl…
#  4 Dart…         20.2    202   136 none       white
#  5 Leia…         15      150    49 brown      light
#  6 Owen…         17.8    178   120 brown, gr… light
#  7 Beru…         16.5    165    75 brown      light
#  8 R5-D4          9.7     97    32 NA         white, red
#  9 Bigg…         18.3    183    84 black      light
# 10 Obi-…         18.2    182    77 auburn, w… fair
# # … with 77 more rows, and 9 more variables:
# #   eye_color , birth_year , sex ,
# #   gender , homeworld , species ,
# #   films , vehicles , starships
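The glue-style name can also mix {{ var }} with an ordinary {variable} holding a string. The sketch below assumes a hypothetical add_scaled() helper and a tiny made-up tibble, so the result is easy to check by eye.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: the new column name is built from the embraced
# argument ({{ var }}) and a plain string argument ({suffix}).
add_scaled <- function(df, var, suffix = "scaled") {
    mutate(df, "{{ var }}_{suffix}" := {{ var }} / 10)
}

toy <- tibble(height = c(172, 167))

names(add_scaled(toy, height))
names(add_scaled(toy, height, suffix = "dm"))
```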

Note that the glue-style notation only works inside a string with {{}}; a bare !! inside the name is a syntax error:

myfunction2 <- function(df, var) {
    quo_var <- enquo(var)
    df |>
        mutate( meter_!!var  := {{ var }} / 10 ,
        .before = height)
}
# Error:   unexpected '!' at:
#  "            df |>
#                      mutate( meter_!"

###📗 Receive expression➗, unquote (!!) then pass to function

In filter functions and such, you pass expressions like species == "Human" rather than single variables[ref]More specifically, call class objects. See expression in the previous article.[/ref].

To work with an expression, capture it with enexpr(), then unquote it just before evaluation.

my_func <- function(df, var) {
    expr_var <- enexpr(var)  # captured as a bare expression, not a quosure
    df |>
        filter(!!expr_var)
}
my_func(starwars, species == "Human" & height > 150)
 
# # A tibble: 29 × 14
#    name  height  mass hair_color skin_color eye_color birth_year
#
#  1 Luke…    172    77 blond      fair       blue            19
#  2 Dart…    202   136 none       white      yellow          41.9
#  3 Owen…    178   120 brown, gr… light      blue            52
#  4 Beru…    165    75 brown      light      blue            47
#  5 Bigg…    183    84 black      light      brown           24
#  6 Obi-…    182    77 auburn, w… fair       blue-gray       57
#  7 Anak…    188    84 blond      fair       blue            41.9
#  8 Wilh…    180    NA auburn, g… fair       blue            64
#  9 Han …    180    80 brown      fair       brown           29
# 10 Wedg…    170    77 brown      fair       hazel           21
# # … with 19 more rows, and 7 more variables: sex ,
# #   gender , homeworld , species , films ,
# #   vehicles , starships

{{}} supports expressions too. Since it is shorthand for !!enquo(var), you can pass an expression like species == "Human" straight through to the filter function.

myfunc <- function(df, var) {
    df |>
        dplyr::filter({{ var }})
}
myfunc(starwars, species == "Human" & height > 150)
# Returns the same 29 rows as the enexpr() version above

I couldn't find documentation explaining {{}}'s (curly-curly) behavior in detail 🤔. However, according to Lionel's comment on Stackoverflow, it has the property that it "only works inside functions," and according to Yutani's blog, it "can only be used in cases where you pass arguments as-is."

If anyone knows a more precise explanation, please let me know!

Because {{}} passes an argument through as-is, another possible design is to "pass only the variables of the inequality" and keep the comparison itself inside the function:

my_func <- function(df, var1, var2) {
    df |>
        filter({{ var1 }} == "Human" & {{ var2 }} > 150)
}
my_func(starwars, species, height)
 
# # A tibble: 29 × 14
#    name  height  mass hair_color skin_color eye_color birth_year
#
#  1 Luke…    172    77 blond      fair       blue            19
#  2 Dart…    202   136 none       white      yellow          41.9
#  3 Owen…    178   120 brown, gr… light      blue            52
#  4 Beru…    165    75 brown      light      blue            47
#  5 Bigg…    183    84 black      light      brown           24
#  6 Obi-…    182    77 auburn, w… fair       blue-gray       57
#  7 Anak…    188    84 blond      fair       blue            41.9
#  8 Wilh…    180    NA auburn, g… fair       blue            64
#  9 Han …    180    80 brown      fair       brown           29
# 10 Wedg…    170    77 brown      fair       hazel           21
# # … with 19 more rows, and 7 more variables: sex ,
# #   gender , homeworld , species , films ,
# #   vehicles , starships

filter() works with quosure too

This is also a confusing story 🙃 but expressions like species == "Human" & height > 150 can be captured as quosures too, so enquo() works here just like enexpr().

my_func <- function(df, var) {
    quo_var <- enquo(var)
    df |>
        filter(!!quo_var)
}
my_func(starwars, species == "Human" & height > 150)
 
# # A tibble: 29 × 14
#    name  height  mass hair_color skin_color eye_color birth_year
#
#  1 Luke…    172    77 blond      fair       blue            19
#  2 Dart…    202   136 none       white      yellow          41.9
#  3 Owen…    178   120 brown, gr… light      blue            52
#  4 Beru…    165    75 brown      light      blue            47
#  5 Bigg…    183    84 black      light      brown           24
#  6 Obi-…    182    77 auburn, w… fair       blue-gray       57
#  7 Anak…    188    84 blond      fair       blue            41.9
#  8 Wilh…    180    NA auburn, g… fair       blue            64
#  9 Han …    180    80 brown      fair       brown           29
# 10 Wedg…    170    77 brown      fair       hazel           21
# # … with 19 more rows, and 7 more variables: sex ,
# #   gender , homeworld , species , films ,
# #   vehicles , starships

In every example, the data mask is applied only at the moment of evaluation, so what tidyeval in dplyr functions really amounts to is probably that filter() handles whatever it is given (a symbol, an expression, or a quosure) appropriately 🤔

I still have some questions about this division of roles, but I've decided not to think too deeply about it.
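One case where enexpr() and enquo() do behave differently is when the expression refers to a variable outside the data frame. A quosure remembers the caller's environment, while a bare expression is looked up from inside the function. The sketch below uses hypothetical helpers filter_expr() and filter_quo() and a made-up toy tibble to show the difference.

```r
library(dplyr)
library(rlang)

filter_expr <- function(df, cond) {
    threshold <- 0             # local variable that shadows the caller's
    cond_expr <- enexpr(cond)  # expression only, no environment attached
    filter(df, !!cond_expr)
}

filter_quo <- function(df, cond) {
    threshold <- 0
    cond_quo <- enquo(cond)    # expression plus the caller's environment
    filter(df, !!cond_quo)
}

toy <- tibble(height = c(150, 172, 202))
threshold <- 180

nrow(filter_quo(toy, height > threshold))   # uses the caller's threshold (180)
nrow(filter_expr(toy, height > threshold))  # uses the function's threshold (0)
```

This is one reason to prefer enquo() (or {{}}) over enexpr() when the captured code is written by the caller.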

###📗 Receive string🔤, convert to symbol then unquote and pass to function

So far we've been putting values into custom function arguments without quotation marks[ref]Also called bare. From "bare" without quotation marks.[/ref], but there are also ways to pass strings and process them well.

When you pass variables between functions, it is sometimes easier to handle them as strings than to keep worrying about when a bare name will be evaluated.

In this case, use ensym() instead of enquo(). ensym() does something like removing the " and wrapping the name in ` instead. Remember it as "ensym() because the string becomes a symbol."

my_func <- function(df, string_var) {
    var <- ensym(string_var)
    df |>
        select(!!var)
}
my_func(starwars, "name")
 
# # A tibble: 87 × 1
#    name
#
#  1 Luke Skywalker
#  2 C-3PO
#  3 R2-D2
#  4 Darth Vader
#  5 Leia Organa
#  6 Owen Lars
#  7 Beru Whitesun lars
#  8 R5-D4
#  9 Biggs Darklighter
# 10 Obi-Wan Kenobi
# # … with 77 more rows

The same approach works with group_by():

my_func <- function(df, string_var) {
    var <- ensym(string_var)
    df |>
        group_by(!!var) |>
        summarise(mean(height))
}
my_func(starwars, "species")
 
# # A tibble: 38 × 2
#    species   `mean(height)`
#
#  1 Aleena               79
#  2 Besalisk            198
#  3 Cerean              198
#  4 Chagrian            196
#  5 Clawdite            168
#  6 Droid                NA
#  7 Dug                 112
#  8 Ewok                 88
#  9 Geonosian           183
# 10 Gungan              209.
# # … with 28 more rows
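ensym() accepts a bare name as well as a string, but not a full expression. The following sketch, with to_symbol() as a hypothetical helper, checks both behaviors.

```r
library(rlang)

# Hypothetical helper: return whatever ensym() captures.
to_symbol <- function(x) ensym(x)

# A string and a bare name give the same symbol.
identical(to_symbol("species"), to_symbol(species))
is_symbol(to_symbol("species"))

# An expression is neither a string nor a symbol, so ensym() raises an error.
result <- tryCatch(
    to_symbol(species == "Human"),
    error = function(e) "error"
)
result
```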

Different method when processing strings inside the function

When using strings generated inside the function rather than as custom function arguments, use sym() instead of ensym().

myfunc <- function(df) {
    var <- sym("species")
    df |>
        group_by(!!var) |>
        summarise(mean(height))
}
myfunc(starwars)
 
# # A tibble: 38 × 2
#    species   `mean(height)`
#
#  1 Aleena               79
#  2 Besalisk            198
#  3 Cerean              198
#  4 Chagrian            196
#  5 Clawdite            168
#  6 Droid                NA
#  7 Dug                 112
#  8 Ewok                 88
#  9 Geonosian           183
# 10 Gungan              209.
# # … with 28 more rows

I introduced a new function sym() here, so let me think about the two similar functions ensym() and sym().

Listing the evaluation flow:

  • [When using function arguments] string_var → "species" → species
  • [When using strings inside the function] "species" → species

So there is one extra step 💡 I think of it roughly as: use ensym() or enquo() (the ones starting with "en") when the value comes in through a custom function's argument, and use sym() or quo() when it doesn't.

For this reason, if you experiment with enquo() and friends outside a function, they won't behave as you expect.

Put another way, enquo() corresponds to base R's substitute(), and quo() corresponds to base R's quote(). The former looks through the argument and captures the expression the caller supplied, while the latter captures the expression exactly as written.
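The difference between the two families is easy to see when both are called inside a function. In this sketch the function names are hypothetical; the point is only which expression ends up captured.

```r
library(rlang)

literal  <- function(x) quo(x)    # like quote(): keeps what is written here
captured <- function(x) enquo(x)  # like substitute(): keeps what the caller wrote

quo_get_expr(literal(height + 1))   # the symbol `x`
quo_get_expr(captured(height + 1))  # the call `height + 1`

# The base R counterparts behave the same way, minus the environment.
base_literal  <- function(x) quote(x)
base_captured <- function(x) substitute(x)

base_literal(height + 1)
base_captured(height + 1)
```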

##📚 Patterns Using tidyeval (Two or More Variables)

So far I've introduced methods for passing single variables/expressions, but when there are multiple, the processing method changes completely.

If the ... (variable-length arguments, dynamic-dots) pattern works, it's easy, but for more complex cases, the effort increases a bit and code readability might decrease.

This time I'll introduce just the following 3 patterns.

  • Receive multiple symbols and pass to a single dplyr function
  • Receive multiple expressions and pass to a single function
  • Receive multiple expressions and pass to multiple functions

###📚 Receive multiple symbols🌟 and pass to a single dplyr function

Simply receiving multiple variables and passing them to a single function is easy. After how difficult the previous operations were, it's almost worrying: "wait, is this really OK?"

It's simple. Just pass the variable-length argument ... along as-is. No {{}} or !! needed.

Using ... to pass arguments as-is is tidyeval important pattern #3.

myfunc <- function(df, ...) {
    df |>
        select(...)
}
 
myfunc(starwars, name, height, mass)
 
# # A tibble: 87 × 3
#    name               height  mass
#
#  1 Luke Skywalker        172    77
#  2 C-3PO                 167    75
#  3 R2-D2                  96    32
#  4 Darth Vader           202   136
#  5 Leia Organa           150    49
#  6 Owen Lars             178   120
#  7 Beru Whitesun lars    165    75
#  8 R5-D4                  97    32
#  9 Biggs Darklighter     183    84
# 10 Obi-Wan Kenobi        182    77
# # … with 77 more rows

Also, for simple use cases like just one argument, using it as an alternative to {{}} might be fine.

myfunc <- function(df, ...) {
    df |>
        group_by(...)
}
myfunc(starwars, species) |>
    summarise(mean(height))
 
## A tibble: 38 × 2
#   species   `mean(height)`
#
# 1 Aleena               79
# 2 Besalisk            198
# 3 Cerean              198
# 4 Chagrian            196
# 5 Clawdite            168
# 6 Droid                NA
# 7 Dug                 112
# 8 Ewok                 88
# 9 Geonosian           183
#10 Gungan              209.
## … with 28 more rows

Note that the arguments captured by ... have no names of their own in the function signature, so callers inevitably have to rely on argument position (the nth argument) ⚠️

From a function design perspective, such a function may end up hard to use unless it is documented.
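One possible way to soften this design problem is to give the single-purpose argument a real name and embrace it, and reserve ... for the open-ended part. The sketch below assumes a hypothetical mean_by() helper and a small made-up tibble.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: `value` is a named, embraced argument;
# `...` carries any number of grouping variables.
mean_by <- function(df, value, ...) {
    df |>
        group_by(...) |>
        summarise(mean = mean({{ value }}, na.rm = TRUE), .groups = "drop")
}

toy <- tibble(
    species = c("Human", "Human", "Droid"),
    sex     = c("male", "female", "none"),
    height  = c(172, 150, 96)
)

mean_by(toy, height, species)
mean_by(toy, value = height, species, sex)
```

Naming value in the call makes the intent readable at the call site, while the grouping variables still flow through ... untouched.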

###📚 Receive multiple expressions➗ and pass to a single function

The earlier ... is quite convenient and can pass argument name=value sets.

myfunc <- function(df, ...) {
    df |>
        group_by(species, sex) |>
        summarise(...)
}
myfunc(
    starwars,
    mean_height = mean(height),
    max_height = max(height),
    min_height = min(height)
    )
 
# `summarise()` has grouped output by 'species'. You can override
# using the `.groups` argument.
# # A tibble: 41 × 5
# # Groups:   species [38]
#    species   sex    mean_height max_height min_height
#
#  1 Aleena    male           79          79         79
#  2 Besalisk  male          198         198        198
#  3 Cerean    male          198         198        198
#  4 Chagrian  male          196         196        196
#  5 Clawdite  female        168         168        168
#  6 Droid     none           NA          NA         NA
#  7 Dug       male          112         112        112
#  8 Ewok      male           88          88         88
#  9 Geonosian male          183         183        183
# 10 Gungan    male          209.        224        196
# # … with 31 more rows

After all, there might only be opportunities to use it with mutate() or summarise() or so 😇

argument name=value can be expanded from exprs

This is one of the rare opportunities for !!! to shine. If you need to prepare many argument name=value for custom function arguments, you can prepare them as a list of expressions.

However, simply making a list would start value evaluation, so you need to quote with exprs().

myfunc <- function(df, attrs) {
    df |>
        group_by(species, sex) |>
        summarise(!!!attrs)
}
 
attrs_exprs <- exprs(
    mean_height = mean(height),
    max_height = max(height),
    min_height = min(height)
)
myfunc(starwars, attrs_exprs)
 
# `summarise()` has grouped output by 'species'. You can override
# using the `.groups` argument.
# # A tibble: 41 × 5
# # Groups:   species [38]
#    species   sex    mean_height max_height min_height
#
#  1 Aleena    male           79          79         79
#  2 Besalisk  male          198         198        198
#  3 Cerean    male          198         198        198
#  4 Chagrian  male          196         196        196
#  5 Clawdite  female        168         168        168
#  6 Droid     none           NA          NA         NA
#  7 Dug       male          112         112        112
#  8 Ewok      male           88          88         88
#  9 Geonosian male          183         183        183
# 10 Gungan    male          209.        224        196
# # … with 31 more rows

The list created by exprs() is easier to understand if you think of it as equivalent to the following: a list that pairs each argument name with a value wrapped in expr().

attrs <- list(
    mean_height = expr(mean(height)),
    max_height = expr(max(height)),
    min_height = expr(min(height))
)
 
myfunc(starwars, attrs)

This helps understanding, but making lists with exprs() is easier 😅
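Because the argument to !!! is an ordinary named list, it can also be built programmatically. The following is only a sketch with made-up column names and a toy tibble: it creates one mean() expression per column name and splices them all into summarise().

```r
library(dplyr)
library(rlang)

cols <- c("height", "mass")

# Build one expression per column: mean(height, na.rm = TRUE), ...
summaries <- lapply(cols, function(col) expr(mean(!!sym(col), na.rm = TRUE)))
names(summaries) <- paste0("mean_", cols)

toy <- tibble(height = c(1, 3), mass = c(10, NA))

# Splicing the named list is like typing each name = expression pair by hand.
result <- summarise(toy, !!!summaries)
result
```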

###📚 Receive multiple argument=value sets➗ and pass to different functions

Just use the earlier exprs() approach more than once.

It's not super important, but it is highly versatile, so I'll make it tidyeval important pattern #4.

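myfunc <- function(df, vars1, vars2) {
    df |>
        group_by(!!!vars1) |>   # splice the grouping variables
        summarise(!!!vars2)     # splice the name = expression pairs
}
myfunc(starwars,
    exprs(species, sex),
    exprs(
        newcol = mean(height),
        # summarise() expects one value per group, so aggregate the ratio
        secondcol = mean(height / mass)))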

##📊 (Aside) ggplot2 also largely follows this logic

So, that was my collection of tidyeval patterns, but every explanation used dplyr functions.

I'll show ggplot2 cases too (load it with library(ggplot2) first), but I think most things are solved with {{}} in ggplot2, so I'll just show a few examples.

###Passing to aes()

For aesthetic parameters with variables, {{}} is OK.

myfunc <- function(df, xvar, yvar) {
    df |>
        ggplot(aes(x = {{ xvar }}, y = {{ yvar }}))+
        geom_point()
}
myfunc(starwars, height, mass) # OK
myfunc(starwars, height/mass, mass) # OK
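To see what aes() receives in these calls, the mapping can be built and inspected without drawing anything. This is a rough sketch; make_aes() is a hypothetical helper, and it only looks at the captured expression.

```r
library(ggplot2)
library(rlang)

# Hypothetical helper: build the mapping only, no plot.
make_aes <- function(xvar, yvar) {
    aes(x = {{ xvar }}, y = {{ yvar }})
}

m <- make_aes(height / mass, mass)

names(m)            # the aesthetics that were set
is_quosure(m$x)     # the expression is stored as a quosure
quo_get_expr(m$x)   # the call the caller typed: height / mass
as_label(m$x)       # text form of the expression, as a string
```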

Calculation operations like reorder() are also supported.

myfunc <- function(df, xvar, yvar) {
    df |>
        ggplot(aes(x = {{ xvar }}, y = {{ yvar }}))+
        geom_point()
}
myfunc(starwars, reorder(name, height), height) # OK

By the way 👆 this isn't tidyeval, but you can also pass the mapping parameter as-is. (Just learned this)

myfunc <- function(df, mapping) {
    df |>
        ggplot(mapping)+
        geom_point()
}
myfunc(starwars, aes(x = reorder(name, height), y = height))

###Passing to facet_wrap()

It also supports facet_wrap() and facet_grid() etc.

Note that the formula notation starting with ~ (as in facet_wrap(~ species)) doesn't tidyeval properly with {{}}, so wrap the variable in vars() instead ⚠️

myfunc <- function(df, var) {
    df |>
        ggplot(aes(x = name, y = height)) +
        geom_point() +
        facet_wrap(vars({{ var }}))
}
myfunc(starwars, species) # OK

###Creating a complex ggplot

library(ggplot2)
library(ggforce)
library(gghighlight)
 
ggstarwars <- function(data, x, y, zoom_var, highlight) {
    data |>
        ggplot(aes({{ x }}, {{ y }})) +
        geom_point() +
        gghighlight({{ highlight }}) +
        facet_zoom(x = {{ zoom_var }})
}
 
ggstarwars(starwars, mass, height, species == "Human", height > 150 & height < 200)

##Grand Summary

So 🤚 this became a very long entry, but I've compiled a collection of likely-useful tidyeval patterns!

Looking at various patterns again, I found that after all, most cases are solved with {{}} 💡

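The 4 patterns I arbitrarily labeled "important patterns" are worth being able to use 👍

  • #1: {{ var }} to pass a variable on to a function
  • #2: {{ var }} := to set an argument name
  • #3: ... to pass multiple arguments through as-is
  • #4: exprs() and !!! to pass sets of expressions to multiple functions

However, {{}} also silently accepts quosures and expressions, and what it actually does is hard to grasp, so I strongly felt that if you want to keep using tidyeval without tripping up, you should understand the principles too 💪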

I wrote an article for image understanding of tidyeval principles, so please reference that previous article too.

https://excel2rlang.com/rlang-tidyeval-abstract/

For completeness I also introduced ways to use tidyeval with multiple variables, but consistent function design is really difficult there, and creating highly reusable functions seems hard. I wish someone would come up with something like tidyeval design patterns...

Well, I'll end it here for now!

These days there are many brilliant people creating beginner content, so maybe I'll cover more niche topics from next time 😇

##References

###Official Materials

The most detailed is Advanced R 2nd edition. It's painful there's no Japanese translation...

https://amzn.to/3PijWM0

Advanced R is also published online. The Metaprogramming chapter covers this content.

https://adv-r.hadley.nz/metaprogramming.html

The official rlang documentation is somewhat helpful, but it doesn't cover everything, so it's best read alongside Advanced R.

https://rlang.r-lib.org/index.html

Old tidy evaluation explanation materials have some outdated information, but some explanations are easier to understand than now.

https://tidyeval.tidyverse.org/

###ggplot2 and tidyeval

https://notchained.hatenablog.com/entry/2021/01/20/222608

https://www.tidyverse.org/blog/2018/07/ggplot2-tidy-evaluation/