[R Notes] Understanding tidyeval from rlang - Only What Users Need to Know

Published Jul 3, 2022
Updated Nov 15, 2025
11 minutes read
Note

This old post is translated by AI.

###Introduction: tidyeval Is Difficult Because of "This"

If you've studied R to some degree, you'll know the feeling: tidyeval from the rlang package is something you know exists but find extremely difficult to understand.

If you seriously want to learn tidyeval, you should read Hadley's Advanced R, but already difficult content written in English is extra hard 🤯 Even if you can read English, there's a parade of unfamiliar words. It's like a certain web service's top page.

Looking for Japanese articles, there's yutannihilation's 2017 slides, yutannihilation's 2018 blog post, teramonagi's 2018 blog post, and uri's 2019 Qiita article - I was helped by them at the time.

However, no matter how many Japanese articles I read, I still couldn't use !! without mistakes 😰

Not being able to write code without looking things up meant I still didn't understand the essence. Recently I was tasked with creating an in-house R library, and not understanding tidyeval had become a real problem for work, so I studied it with Advanced R as my main text.

After serious study, I somewhat saw the big picture, and from a bird's eye view I noticed there are too many beginner traps. I think this is what makes tidyeval difficult.

tidyeval Beginner Traps

  • rlang 1.0.0 major release was only this year (2022/1/26). Some past information is outdated. ⇒ Reading the latest info is surprisingly easier to understand 👍
  • When trying to study systematically, there's lots of information, but more than half is info general users don't need ⇒ This article explains only what's necessary 👍
  • Too many unfamiliar words kills motivation ⇒ You really only need to remember about 5 things 👐
  • Hard to distinguish between "baseR specifications" and "tidyeval specifications" ⇒ Unless you're a tidyeval expert, you don't need to distinguish 👉

This article is aimed at people who don't plan to do serious package development and will stay in general-user territory. The goal is to grasp the principles with the minimum necessary knowledge so that you can write custom functions using tidyeval ⛳

I'll summarize specific "use this function in this case" patterns in the next article! This time is the "vaguely understand tidyeval principles" edition.

###Notes

This article mixes baseR functions and rlang functions. This is to avoid complicating the discussion by bringing up functions with low probability of use.

I wrote this article while organizing my own understanding. If there are mistakes, please let me know on Twitter.

The information in this article mainly comes from the latest version of Advanced R published on the web and the rlang reference site.

Once you organize the information to some degree, you should be able to understand the above content. When wanting even more detailed knowledge, I strongly recommend using Advanced R as your textbook since it has no information gaps.

https://adv-r.hadley.nz/metaprogramming.html

The rlang package reference might be better to read after Advanced R.

https://rlang.r-lib.org/index.html

##Basic Information About tidyeval

###tidyeval Is Technology for Metaprogramming

If surrounded by quotation marks ("), it's a string; otherwise (if declared bare), it's treated as a variable - this concept exists in any programming language.

However, in R there are cases where things can be evaluated the same whether quoted or not. The most obvious example is the library() function.

library("MASS")
 
library(MASS)

The fact that it works the same with or without quotation marks suggests there's a mechanism in baseR to convert between them.
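As a rough illustration of that mechanism, here is a minimal sketch using base R's substitute(). pkg_name() is a hypothetical helper written for this example; it is a simplified picture of the idea, not library()'s actual source.

```r
# `pkg_name()` is a hypothetical helper, not part of base R or rlang
pkg_name <- function(package) {
    # substitute() returns the code the caller typed, without evaluating it
    as.character(substitute(package))
}

pkg_name(MASS)
## [1] "MASS"

pkg_name("MASS")
## [1] "MASS"
```

Either way of writing the argument ends up as the same string, which is the kind of conversion that lets library() accept both forms.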

Making maximum use of this mechanism can eliminate the hassle of repeatedly quoting with ", allowing creation of user-friendly APIs.

The rlang package fills holes in these features that originally exist in baseR and further expands functionality, provided as a feature called tidyeval. Tidyverse packages like dplyr and ggplot2 achieve intuitive, easy-to-use function design by fully utilizing tidyeval provided by the rlang package.

From a user's perspective, making maximum use of tidyeval enables programming that manipulates and changes the code itself, which allows more flexible programming. This is called metaprogramming.

What tidyeval wants to achieve is "enabling flexible programming through metaprogramming." This isn't explicitly stated in the current Advanced R, but there was such a description in previous tidyeval materials. What benefit does metaprogramming provide? I think the biggest benefit is being able to implement user-friendly functions like those in dplyr and ggplot2.

So, if you're an R programming expert including package developers, I'd like you to know all of tidyeval, but I think general R users don't need to know everything about tidyeval (my opinion).

However, even general users, if you incorporate dplyr functions that fully utilize tidyeval into custom functions, you're stepping into metaprogramming. This article is written for people in this area.

###rlang Finally Reached version 1.0.0 in 2022 After Much Trial and Error

The rlang package embodies the concept of tidyeval and is one of the r-lib packages.

Development began in 2016 as a "lazy" package by Hadley Wickham, and after several years it reached the major version 1.0.0 release in 2022 🎉

There were apparently many twists and turns until the major release, with terminology and functions being renamed and functionality changed.

There were previously names and functions like lazyeval and UQ(), but they're no longer used. Be careful.


##Step 1: Understanding expression and evaluation

###Non-standard evaluation

Before understanding tidyeval principles, let's know about "Non-standard evaluation (NSE)", a somewhat unusual value evaluation method 💡

Let's look at the library() function again.

There's no variable called MASS, but this function interprets MASS as a package name and converts it to "MASS".

# Example of NSE existing in baseR
 
library(MASS)
##  Works
MASS
##  Error: object 'MASS' not found

Using rlang package functions, you can create custom functions with similar functionality.

For example, let's create an NSE version of the paste() function.

paste() lets you concatenate strings to make sentences. But when it gets tedious to put double quotation marks on every element, I thought I'd change it to a function that doesn't need double quotation marks like the library() function.

paste("Good", "morning", "Snitch")
## [1] "Good morning Snitch"

Here's what was created using rlang functions:

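paste_nse <- function(...) {
    # Capture each argument as a symbol instead of evaluating it
    args <- rlang::ensyms(...)
    # Turn the symbols into strings and join them with spaces
    paste(purrr::map_chr(args, rlang::as_string), collapse = " ")
}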
 
paste_nse(Good, morning, Snitch)
## [1] "Good morning Snitch"

The same functionality was achieved without using "!

Here we're writing Good and morning as if they were variable names, but they aren't actually looked up as variables. Code captured in this unevaluated state is said to be quoted, and handling it with a dedicated method instead of the usual evaluation is called non-standard evaluation (NSE).

I'll introduce what functions like ensyms() are in the next article. This time it's just "vibes understanding" 😙!

###Stopping Just Before Evaluation ~ expression and evaluation

From here I want to explain NSE while including detailed code explanations.

As I explained earlier, in NSE a "quoted" situation is created before special evaluation. Among these methods of "quote" then "evaluation", the simplest is creating an expression then evaluating it.

As an example, let's look at the simplest expr() and eval().

library(rlang)
 
expr_pi <- expr(pi * 100)
expr_pi
## pi * 100
 
eval(expr_pi)
## [1] 314.1593

expr() is a function that puts given R code in a waiting-to-be-evaluated state (quoted). You can give complex R code too.

library(tidyverse)
library(rlang)
 
rlang::expr(
    starwars %>%
        group_by(species) %>%
        summarise(mean_height = mean(height))
)
 
## starwars %>% group_by(species) %>% summarise(mean_height = mean(height))

In practice, the combination of expr() and eval() isn't everything, but the policy of making things waiting-to-be-evaluated (quoted) then evaluating continues to be common going forward.

Info

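As an aside, expr() removes syntactic sugar such as the native pipe |> (available since R 4.1). This is because an expression is stored in the state it has after the parser has built the abstract syntax tree mentioned later. The magrittr pipe %>% is an ordinary function call rather than syntax, so it is kept as written, as in the output above.

rlang::expr(
    starwars |>
        group_by(species) |>
        summarise(mean_height = mean(height))
)
 
## summarise(group_by(starwars, species), mean_height = mean(height))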

Objects created with expr() are in a waiting-to-be-evaluated state, and values aren't calculated until evaluated by the eval() function.

First, grasp the point that "it's possible to save R code itself as an object and evaluate it later" 🖐️
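To make "evaluate it later" concrete, here is a small sketch. The names radius and area_code are made up for this example. The point it tries to show is that the value of a variable is looked up at the time of eval(), not at the time of expr().

```r
library(rlang)

radius <- 1
# Only the code is stored here; nothing is calculated yet
area_code <- expr(pi * radius^2)

# Change the variable after the expression was created
radius <- 2

# The current value of radius (2) is used at evaluation time
area <- eval(area_code)
area
## [1] 12.56637
```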

###What expr() Creates Is an expression

Earlier, expr() function created some mysterious object just before evaluation. What was that 🤔?

To conclude, what expr() creates is called an "expression".

As we've seen, an expression is an object stopped just before evaluation, but it's actually a data structure that holds R code in a list-like form ➰😲⁉️

Amazingly, it can be referenced and replaced exactly like a list.

my_function_call <- expr(my_function(x = a, y = b, na.rm = TRUE))
 
my_function_call
## my_function(x = a, y = b, na.rm = TRUE)
 
my_function_call[[1]]
## my_function
 
my_function_call[["x"]]
## a
 
my_function_call[["na.rm"]]
## [1] TRUE
 
my_function_call$y = "new_strings"
my_function_call
## my_function(x = a, y = "new_strings", na.rm = TRUE)
 
my_function_call$z = expr(df)
my_function_call
## my_function(x = a, y = "new_strings", na.rm = TRUE, z = df)

As you can see, expressions can do various operations just like lists, and you can really see that R programming can program programming (philosophy)

That said, I don't think there are many occasions to modify and evaluate expressions through list operations like this 🤔. The point I wanted to convey here was that "expression is a very special object".
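For reference, here is a minimal sketch of what "modify, then evaluate" could look like with a function that really exists. mean_call, before and after are names chosen for this example.

```r
library(rlang)

mean_call <- expr(mean(x = c(1, 2, NA)))

# Evaluated as written, the NA propagates
before <- eval(mean_call)
before
## [1] NA

# Add an argument to the stored code, just like adding a list element
mean_call$na.rm <- TRUE
mean_call
## mean(x = c(1, 2, NA), na.rm = TRUE)

after <- eval(mean_call)
after
## [1] 1.5
```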

"Abstract Syntax Tree (AST)" is representing language in tree form, used in programming languages for understanding processing syntax. As far as I know, tree-sitter achieves syntax highlighting by analyzing AST.

AST can be easily checked in R too. Use the ast() function from the lobstr package.

lobstr::ast(my_function(x = a, y = b, na.rm = TRUE))
## █─my_function
## ├─x = a
## ├─y = b
## └─na.rm = TRUE
# Results are actually output colorfully

With the ast() function, you can check the operation order of expressions, and it's also convenient for checking data about unquoting mentioned later 👀

This time I won't dig deep into expressions so I'll keep ast() as a column, but please use it as supplementary information for understanding when you want to verify if created expressions match expectations.

Note that there are three types of parts appearing in ast(). These are the units that make up expressions: ①constants (numbers and strings), ②symbols (names of objects), and ③calls (function calls that combine the other parts) - for details see Advanced R.
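If you want to check which of the three types a piece of code is, rlang's predicate functions can be used. This is only a sketch; the variable names are made up for this example.

```r
library(rlang)

# ① constants: numbers and strings
num_is_constant <- is_syntactic_literal(expr(10))
str_is_constant <- is_syntactic_literal(expr("text"))

# ② symbols: names of objects
a_is_symbol <- is_symbol(expr(a))

# ③ calls: function calls, including operators such as +
f_is_call <- is_call(expr(f(a, 10)))
plus_is_call <- is_call(expr(a + b))

c(num_is_constant, str_is_constant, a_is_symbol, f_is_call, plus_is_call)
## [1] TRUE TRUE TRUE TRUE TRUE
```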

###Using expression Inside Functions Requires Extra Steps

We verified expr() function earlier outside functions, but how do you implement it inside custom functions?

For example, let's try making a function capture_it() that simply receives an argument and returns an expression. We're trying to capture a + b, but unfortunately it doesn't work 😕

capture_it <- function(x) {
    expr(x)
}
capture_it(a + b)
## x

Because expr() faithfully returns the code exactly as written inside it, it returns the symbol x itself, not what the caller passed as x.

When a function builds an expression from its arguments, "replacing the argument name with what the caller supplied" is needed as an extra step compared with before.

For the earlier example, rlang::enexpr() does this: it looks up the code the caller supplied for the argument x and returns it as an expression without evaluating it, so using it works.

capture_this <- function(x) {
    enexpr(x)
}
capture_this(a + b + c)
## a + b + c

You might be confused here with expr()?? enexpr()???? but let's leave the differences between these functions aside. I'll explain next time. What I want to emphasize here is that incorporating NSE into custom functions takes more steps than doing it outside functions.
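As one possible use of the captured code, here is a sketch of a function that shows both the code the caller typed and its value. describe_value() is a hypothetical helper made up for this example; as_label() is an rlang function that turns an expression into a short string.

```r
library(rlang)

# `describe_value()` is a hypothetical helper for illustration
describe_value <- function(x) {
    # Capture the code supplied for x without evaluating it
    code <- enexpr(x)
    # x itself is still usable as a normal argument afterwards
    paste0(as_label(code), " = ", x)
}

describe_value(1 + 2)
## [1] "1 + 2 = 3"
```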


##Step 2: Know unquote in quasiquotation

###Unquote with !! bang-bang

So far I've introduced "creating an expression which is a waiting-to-be-evaluated state, then evaluating it."

As briefly mentioned at the beginning, things in this situation are called "quotation" (=quoted).

So how do you make expr() function freely take arguments? As I wrote in the earlier example, expr() turns the given code into an expression as-is.

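Actually, base R's quote() has no mechanism to "partially evaluate variables" inside the quoted code (base R's closest counterpart is the separate bquote() function, which this article doesn't cover). tidyeval builds this ability into expr() and its relatives: a mechanism to evaluate variables once, peeling off one thin layer. That's !! (bang-bang).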

capture_it <- function(x) {
    expr(x)
}
capture_it(x = 100)
## x
 
capture_this <- function(x) {
    expr(!!x)
}
capture_this(x = 100)
## [1] 100

The first function capture_it() is the same as before. expr(x) quotes the symbol x exactly as written, so x isn't recognized as the argument x and the value 100 is never used.

Meanwhile capture_this() gives !!x inside the function. !! is pronounced "bang-bang" and handles unquoting.

The image is partially evaluating variables by peeling off one thin layer.

With !!, expression -> evaluation changes to variable evaluation -> expression -> evaluation. With room for variable evaluation, function design gains flexibility.

A quotation is an expression waiting for evaluation, and unquoting via !! is a concept that plain quotation doesn't have. Therefore, quoting that allows partial evaluation with !! and the like is distinguished from quotation and named quasiquotation (pseudo-quotation).

The terms quotation and quasiquotation are confusing and a common beginner stumbling point. In the end you'll probably use quasiquotation without being conscious of it, so you don't need to be particular about terminology here.
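To see "variable evaluation -> expression -> evaluation" in one place, here is a small sketch. The names col, digits, rounded_mean and height are made up for this example; sym() is an rlang function that creates a symbol from a string.

```r
library(rlang)

col <- sym("height")   # a symbol built from a string
digits <- 1            # an ordinary value

# !! evaluates col and digits first and inserts the results into the quoted code
rounded_mean <- expr(round(mean(!!col), !!digits))
rounded_mean
## round(mean(height), 1)

# The finished expression is evaluated later, like any other expression
height <- c(150, 165, 181)
result <- eval(rounded_mean)
result
## [1] 165.3
```

Without !!, the expression would contain the names col and digits themselves rather than what they hold.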

##Step 3: Understanding Quosure

So far I've introduced expression and evaluation, and unquoting via !! which peels off one thin layer to evaluate.

These are indeed the core concepts of tidyeval, but they're only the first of tidyeval's three pillars 💈.

The other two relate to "quosure". Only when quosure information is complete can you understand what principles dplyr and ggplot2 functions use to evaluate variables.

###First, Vaguely Understand environment

Environment has various meanings depending on context and programming language, but in R it has a clear meaning.

The concept of Environment is the "environment" where objects exist, with various environments like global, package, function... forming layers.

When calling an object, it searches for the object's location in order according to a certain priority. You can see a list of environments with the search() function.

search()
 
##  [1] ".GlobalEnv"        "package:dplyr"     "package:rlang"
##  [4] "package:stats"     "package:graphics"  "package:grDevices"
##  [7] "package:utils"     "package:datasets"  "package:methods"
## [10] "Autoloads"         "org:r-lib"         "package:base"

Using rlang::env_get(), you can also specify the environment where a variable exists.

myvar <- "variable"
 
rlang::env_get(.GlobalEnv, "myvar")
## [1] "variable"

Variables (or functions) defined in a package exist within that package's environment.

dplyr::filter
 
## function (.data, ..., .preserve = FALSE)
## {
##     UseMethod("filter")
## }
## <bytecode: 0x6848d10>
## <environment: namespace:dplyr>

Thus, variables (or functions) exist in their respective environments, and you can't access variables without calling that environment. Therefore, things like "inside a function is the function's environment so you can't see it from global environment" happen.
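The "can't see inside a function" behavior can be confirmed with a few lines of base R. This is only a sketch; greeting, local_only and show_lookup() are names made up for this example.

```r
greeting <- "defined outside"

show_lookup <- function() {
    # This variable lives in the function's own environment
    local_only <- "defined inside"
    # greeting isn't found inside the function, so the search moves outward
    c(greeting, local_only)
}

found <- show_lookup()
found
## [1] "defined outside" "defined inside"

# From outside the function, local_only can't be found
visible_outside <- exists("local_only")
visible_outside
## [1] FALSE
```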

###Bundling environment with expression Using Quosure

The definition of "Quosure" is the second of tidyeval's three pillars 💈💈. This is a step toward understanding dplyr and ggplot2 functions.

"Quosure" (Quote + Closure) is the "expression" we've used many times with an "environment" added.

Environments can be created using rlang::env() or rlang::new_environment().

myenv <- env(a = 1, b = 2)
myenv
## <environment: 0x514ab88>
 
myenv$a
## [1] 1
 
my_quosure <- new_quosure(expr(a + b), myenv)
my_quosure
 
## <quosure>
## expr: ^a + b
## env:  0x55b0417dc200

The great thing about quosure is it can evaluate expressions together with environment information. At this time using eval() would just be regular expression evaluation, so use rlang::eval_tidy(). (Finally a function name that sounds essential...!?)

eval_tidy(my_quosure)
## [1] 3

The above, being able to "evaluate expressions together with environment", is "quosure", and that was the second of tidyeval's three pillars.
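Here is a sketch of why carrying the environment along matters. make_quosure() is a hypothetical function made up for this example; quo() creates a quosure from the code written inside it, and quo_get_expr() extracts only the expression part.

```r
library(rlang)

# `make_quosure()` is a hypothetical helper for illustration
make_quosure <- function() {
    a <- 10
    # quo() records both the code and the environment it was written in
    quo(a * 2)
}

a <- 1
q <- make_quosure()

# The quosure remembers the a = 10 from inside the function
with_env <- eval_tidy(q)
with_env
## [1] 20

# The bare expression has no environment attached, so the a = 1 here is used
without_env <- eval(quo_get_expr(q))
without_env
## [1] 2
```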

It does seem convenient, but what does it mean...? 🤔 you might think, but extending this quosure further achieves that thing in ggplot2.

###Evaluate with data-mask

data-mask is the third of tidyeval's three pillars 💈💈💈.

In one word, data-mask is an attempt to use data like data frames as quosure's environment.

Below is simple processing using dplyr::filter() function. This processing can be interpreted as height variable being evaluated in the context of starwars data.

filter(starwars, height > 180)
 
# # A tibble: 38 × 14
#    name         height  mass hair_color skin_color eye_color
#    <chr>         <int> <dbl> <chr>      <chr>      <chr>
#  1 Darth Vader     202 136   none       white      yellow
#  2 Biggs Darkl…    183  84   black      light      brown
#  3 Obi-Wan Ken…    182  77   auburn, w… fair       blue-gray
#  4 Anakin Skyw…    188  84   blond      fair       blue
#  5 Chewbacca       228 112   brown      unknown    blue
#  6 Boba Fett       183  78.2 black      fair       brown
#  7 IG-88           200 140   none       metal      red
#  8 Bossk           190 113   none       green      red
#  9 Qui-Gon Jinn    193  89   brown      fair       blue
# 10 Nute Gunray     191  90   none       mottled g… red
# # … with 28 more rows, and 8 more variables:
# #   birth_year <dbl>, sex <chr>, gender <chr>,
# #   homeworld <chr>, species <chr>, films <list>,
# #   vehicles <list>, starships <list>

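Actually, peeking at the filter function definition, you can see that it gets the environment it was called from with caller_env() and passes it on to filter_rows(). caller_env() itself only returns the calling environment and doesn't convert anything; the data mask is built from the starwars data passed as .data, further down in dplyr's internal functions.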

getS3method("filter", "data.frame")
 
## function (.data, ..., .preserve = FALSE)
## {
##     loc <- filter_rows(.data, ..., caller_env = caller_env())
##     dplyr_row_slice(.data, loc, preserve = .preserve)
## }
## <bytecode: 0x55b04217f1e8>
## <environment: namespace:dplyr>

When you want to see a function's definition, just type the function name. However, note that when calling as an S3 method, you must use getS3method().
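What caller_env() returns can be checked with a small sketch. report_caller() and wrapper() are hypothetical functions made up for this example; current_env() is an rlang function that returns the environment it is called in.

```r
library(rlang)

# `report_caller()` returns the environment it was called from
report_caller <- function() {
    caller_env()
}

wrapper <- function() {
    mine <- current_env()     # wrapper's own environment
    seen <- report_caller()   # what report_caller() sees as its caller
    identical(mine, seen)
}

same_env <- wrapper()
same_env
## [1] TRUE
```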

eval_tidy() looks up names in the given data first and only then in the surrounding environment. Because the data hides (masks) same-named variables in the environment, it's named data-mask.

# Define "var" as a variable in global environment
var <- "long long strings"
 
myenv <- env(var = "short word")
eval_tidy(expr(var), new_data_mask(myenv))
 
## [1] "short word"
### Environment data is prioritized

Info

Actually, passing an environment made with env() directly as the second argument of eval_tidy() has become deprecated. Now using new_data_mask() is recommended, so the sample code uses the recommended way.

...That said, light users probably won't use eval_tidy() much, and actual evaluation will be done by package functions like dplyr, so don't think too deeply and you're OK.
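For completeness, here is a sketch of the same idea with a data frame, which is closer to what happens with dplyr. The data frame heights and the variable height are made up for this example; .data is the pronoun rlang provides for referring to the data explicitly.

```r
library(rlang)

heights <- data.frame(height = c(150, 190, 200))
height <- 0   # a same-named variable outside the data frame

# The column in the data frame masks the outside variable
masked <- eval_tidy(expr(mean(height)), data = heights)
masked
## [1] 180

# .data$ makes it explicit that the column is meant
explicit <- eval_tidy(expr(mean(.data$height)), data = heights)
explicit
## [1] 180

# Plain eval() knows nothing about the data frame and uses the outside variable
unmasked <- eval(expr(mean(height)))
unmasked
## [1] 0
```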

##Summary

This became my longest article ever, so let me properly summarize what we've covered.

What It Means to Evaluate expression

First, I explained about quote - stopping variables without immediately evaluating them using expression.

expression is a special object, in a waiting-for-evaluation state. Also called quoted. The basic flow is to partially evaluate variables, create an expression, then evaluate later.

quasiquotation Where You Can unquote with !!

expressions can be created with expr(), but basically you can only create expressions as written. The technique of evaluating variables once with !! (bang-bang) then inserting into expressions brought flexibility to expression definition methods.

"Evaluating variables once" is called unquote, and it's part of "quasiquotation" - a mechanism tidyeval builds into its quoting functions (base R's quote() has nothing equivalent; bquote() is the closest counterpart).

Evaluate expression Together with environment Using data-mask

Finally, I introduced data-mask as a framework for evaluating expressions in specific contexts.

data-mask can treat even data frames as environments, so by fully utilizing data-mask, very easy-to-use functions like dplyr and ggplot2 are created.

###Next Time Preview

So this time I presented the article "Let's vaguely understand tidyeval provided by rlang package!" 👋

Next time I'll try to summarize what tidyeval to use in what cases, as case studies!! See you~ 😷