Developing a dbplyr backend (7 days ago)
First steps | Write your first method | Define a dialect | Copying, computing, collecting and collapsing | SQL translation: verbs | Building SQL strings | Behind the scenes | SQL translation: vectors | Scalar function helpers | Aggregation function helpers | Window function helpers | Custom translation functions
Structured data (15 days ago)
Structured data basics | Data types | Basics | Missing values | Data frames | Examples | Example 1: Article summarisation | Example 2: Named entity recognition | Example 3: Sentiment analysis | Example 4: Text classification | Example 5: Working with unknown keys | Example 6: Extracting data from an image | Tools and structured data | Token usage
Cell and Column Types (1 month ago)
Type guessing | Excel types, R types, col_types | When column guessing goes wrong | Peek at column names | Square pegs in round holes | Logical column | Numeric column | Date column | Text or character column
Sheet Geometry (2 months ago)
Little known Excel facts | Implications for readxl | skip and n_max | skip | n_max | range
Getting started with ellmer (2 months ago)
Vocabulary | What is a token? | What is a conversation? | What is a prompt? | Example uses | Chatbots | Structured data extraction | Programming | Miscellaneous
Transformers (2 months ago)
collapse transformer | Shell quoting transformer | emoji transformer | sprintf transformer | signif transformer | safely transformer | "Variables and Values" transformer
How to write a function that wraps glue (2 months ago)
Working example | Where does glue() evaluate code?
Introduction to glue (2 months ago)
Gluing and interpolating | Control of line breaks | Delimiters | Where glue looks for values | SQL
Custom knitr language engines (2 months ago)
glue engine | glue_sql engine
Column types (3 months ago)
Overview | Example values | Implementation
Column-wise operations (4 months ago)
Basic usage | Multiple functions | Current column | Gotchas | Other verbs | filter() and filter_out() | _if, _at, _all | Why do we like across()? | Why did it take so long to discover across()? | How do you convert existing code?
dplyr <-> base R (4 months ago)
Overview | One table verbs | arrange(): Arrange rows by variables | distinct(): Select distinct/unique rows | filter(): Return rows with matching conditions | mutate(): Create or transform variables | pull(): Pull out a single variable | relocate(): Change column order | rename(): Rename variables by name | rename_with(): Rename variables with a function | select(): Select variables by name | summarise(): Reduce multiple values down to a single value | slice(): Choose rows by position | Two-table verbs | Mutating joins | Filtering joins
Grouped data (4 months ago)
group_by() | Group metadata | Changing and adding to grouping variables | Removing grouping variables | Verbs | summarise() | select(), rename(), and relocate() | arrange() | mutate() | filter() | slice() and friends
Introduction to dplyr (4 months ago)
Data: starwars | Single table verbs | The pipe | Filter rows with filter() | Arrange rows with arrange() | Choose rows using their position with slice() | Select columns with select() | Add new columns with mutate() | Change column order with relocate() | Summarise values with summarise() | Commonalities | Combining functions with |> | Patterns of operations | Selecting operations | Mutating operations
Programming with dplyr (4 months ago)
Introduction | Data masking | Data- and env-variables | Indirection | Name injection | Tidy selection | The tidyselect DSL | How-tos | User-supplied data | One or more user-supplied expressions | Any number of user-supplied expressions | Creating multiple columns | Transforming user-supplied variables | Loop over multiple variables | Use a variable from a Shiny input
Row-wise operations (4 months ago)
Creating | Per row summary statistics | Row-wise summary functions | List-columns | Motivation | Subsetting | Modelling | Repeated function calls | Simulations | Multiple combinations | Varying functions | Previously | rowwise() | do()
Window functions (4 months ago)
Types of window functions | Ranking functions | Lead and lag | Cumulative aggregates | Recycled aggregates
Translation (4 months ago)
Introduction | The basics | Simple verbs | filter() and arrange() | select(), summarise(), transmute() | Other calls | rename() | distinct() | Joins | Set operations | Grouping | Combinations | Copies | Performance
Conversion semantics (5 months ago)
Value labels | labelled() | Missing values | Tagged missing values | User defined missing values
ragnar (5 months ago)
Getting Started with ragnar | Why RAG? The Hallucination Problem | Use Case: Quarto Docs Chat vs. Standard Search | Setting up RAG | Creating the Store | Identify Documents for Processing | Convert Documents to Markdown | Chunk and Augment | Insert in the Store | Tying it Together | Retrieval | Customizing Retrieval | Troubleshooting and Debugging | Cost Management | Summary
Recoding columns and replacing values (5 months ago)
Introduction | case_when() | replace_when() | Type stability | recode_values() | replace_values() | Comparisons | if_else() | coalesce() | na_if() | SQL
Invariants: Comparing behavior with data frames (5 months ago)
Conventions | Column extraction | Definition of x[[j]] | Definition of x$name | Column subsetting | Definition of x[j] | Definition of x[, j] | Definition of x[, j, drop = TRUE] | Row subsetting | Definition of x[i, ] | Definition of x[i, , drop = TRUE] | Row and column subsetting | Definition of x[] and x[,] | Definition of x[i, j] | Definition of x[[i, j]] | Column update | Definition of x[[j]] <- a | Definition of x$name <- a | Column subassignment: x[j] <- a | a is a list or data frame | a is a matrix or array | a is another type of vector | a is NULL | a is not a vector | Row subassignment: x[i, ] <- list(...) | Row and column subassignment | Definition of x[i, j] <- a | Definition of x[[i, j]] <- a
Get started with vroom (5 months ago)
Reading files | Reading multiple files | Reading compressed files | Reading individual files from a multi-file zip archive | Reading remote files | Column selection | Reading fixed width files | Column types | Name repair | Writing delimited files | Writing CSV delimited files | Writing compressed files | Reading and writing from standard input and output | Further reading
Vroom Benchmarks (6 months ago)
How it works | Reading delimited files | Taxi Trip Dataset | Taxi Benchmarks | All numeric data | Long | Wide | All character data | Reading multiple delimited files | Reading fixed width files | United States Census 5-Percent Public Use Microdata Sample files | Census data benchmarks | Writing delimited files | Session and package information
Function translation (6 months ago)
Getting started with translations | Basic differences | Known functions | Mathematics | Modulo arithmetic | Logical comparisons and boolean operations | Bitwise operations | Type coercion | NULL/NA handling | Aggregation | Conditional evaluation | String functions | Date/time functions | Other functions | Unknown functions | Prefix functions | Infix functions | Special forms | Error for unknown translations | Window functions
Verb translation (6 months ago)
Single table verbs | Subqueries | Dual table verbs
Reprexes for dbplyr (6 months ago)
Using memdb_frame() | Translating verbs | Translating individual expressions
Introduction to dbplyr (6 months ago)
Getting started | Connecting to the database | Generating queries | Why use dbplyr? | What happens when dbplyr fails? | Creating your own database | MySQL/MariaDB | PostgreSQL | BigQuery
In packages (7 months ago)
Introduction | Using tidyr in packages | Fixed column names | Continuous integration | tidyr v0.8.3 -> v1.0.0 | Conditional code | New syntax for nest() | New syntax for unnest() | nest() preserves groups | nest_() and unnest_() are defunct
Rectangling (7 months ago)
Introduction | GitHub users | GitHub repos | Game of Thrones characters | Geocoding with Google | Sharla Gelfand's discography
Translations (7 months ago)
Data types | Verbs | Functions within verbs | Parentheses | Comparison operators | Basic arithmetics | Math functions | Logical operators | Branching and conversion | String manipulation | Date manipulation | Aggregation | Shifting | Ranking | Special cases | Contributing | Known incompatibilities | Output order stability | sum() | Empty vectors in aggregate functions | min() and max() for logical input | n_distinct() and multiple arguments | is.na() and NaN values | Row names | Other differences
Column type (7 months ago)
Automatic guessing | Legacy behavior
Introduction to readr (7 months ago)
Vector parsers | Atomic vectors | Flexible numeric parser | Date/times | Factors | Column specification | Rectangular parsers | Overriding the defaults | Available column specifications | Output
Locales (7 months ago)
Dates and times | Names of months and days | Timezones | Default formats | Character | Numbers
Using ggplot2 in packages (7 months ago)
Prompt design (7 months ago)
Best practices | Code generation | Basic flavour | Be explicit | Teach it about new features | Structured data | Getting started | Provide examples | Capturing raw input | Token usage
Tool/function calling (7 months ago)
Introduction | Motivating example | Defining a tool function | Registering and using tools | Tool inputs and outputs | Image and PDF tool output
Aesthetic specifications (8 months ago)
Extending ggplot2 (8 months ago)
Introduction to ggplot2 (8 months ago)
Profiling Performance (8 months ago)
Get started with purrr (9 months ago)
Introduction | Map: A better way to loop | Progress bars | Parallel computing | Output variants | Input variants | Combinatorial explosion | Filtering and finding with predicates
Introduction to forcats (9 months ago)
Ordering by frequency | NAs in levels and values | Combining levels | Ordering by another variable | Manually reordering
Functional programming in other languages (9 months ago)
purrr <-> base R (9 months ago)
Introduction | Key differences | Direct translations | Map functions | Extractor shorthands | Predicates | Other vector transforms | Examples | Varying inputs | One input | Two inputs | Any number of inputs | Outputs | Pipes
Locale sensitive functions (9 months ago)
Case conversion | Sorting and ordering
Regular expressions (9 months ago)
Basic matches | Escaping | Special characters | Matching multiple characters | Alternation | Grouping | Anchors | Repetition | Look arounds | Comments
Two-table verbs (9 months ago)
Mutating joins | Controlling how the tables are matched | Types of join | Observations | Filtering joins | Set operations | Multiple-table verbs
Using dplyr in packages (9 months ago)
Join helpers | Data masking and tidy selection NOTEs | Deprecation | Multiple dplyr versions | Deprecation of mutate_*() and summarise_*() | Data frame subclasses
Memory protection: controlling automatic materialization (9 months ago)
Introduction | Eager and lazy computation | Example | Comparison | Prudence | Concept | Enforcing DuckDB operation | From stingy to lavish | Thrift | File ingestion and custom limits | Conclusion
Nested data (10 months ago)
Basics | Nested data and models
Pivoting (10 months ago)
Introduction | Longer | String data in column names | Numeric data in column names | Many variables in column names | Multiple observations per row | Wider | Capture-recapture data | Aggregation | Generate column name from multiple variables | Tidy census | Implicit missing values | Unused columns | Contact list | Longer, then wider | World bank | Multi-choice | Manual specs | By hand | Theory
Programming with tidyr (10 months ago)
Introduction | Tidy selection | Indirection
Tidy data (10 months ago)
Data tidying | Defining tidy data | Data structure | Data semantics | Tidying messy datasets | Column headers are values, not variable names | Multiple variables stored in one column | Variables are stored in both rows and columns | Multiple types in one table | One type in multiple tables
From base R (10 months ago)
Overall differences | Detect matches | str_detect(): Detect the presence or absence of a pattern in a string | str_which(): Find positions matching a pattern | str_count(): Count the number of matches in a string | str_locate(): Locate the position of patterns in a string | Subset strings | str_sub(): Extract and replace substrings from a character vector | str_sub() <- : Subset assignment | str_subset(): Keep strings matching a pattern, or find positions | str_extract(): Extract matching patterns from a string | str_match(): Extract matched groups from a string | Manage lengths | str_length(): The length of a string | str_pad(): Pad a string | str_trunc(): Truncate a character string | str_trim(): Trim whitespace from a string | str_wrap(): Wrap strings into nicely formatted paragraphs | Mutate strings | str_replace(): Replace matched patterns in a string | case: Convert case of a string | Join and split | str_flatten(): Flatten a string | str_dup(): duplicate strings within a character vector | str_split(): Split up a string into pieces | str_glue(): Interpolate strings | Order strings | str_order(): Order or sort a character vector
Star Wars films (dynamic HTML) (10 months ago)
Star Wars films (static HTML) (10 months ago)
Web scraping 101 (10 months ago)
HTML basics | Elements | Contents | Attributes | Reading HTML with rvest | CSS selectors | Extracting data | Text | Tables | Element vs elements
Programming with ellmer (12 months ago)
Cloning chats | Resetting an object | Streaming vs batch results | Turns and content
Column formats (1 year ago)
Overview
Comparing display with data frames (1 year ago)
Digits | Basic differences | Terminal zeros | Trailing dot | Showing more digits | Fixed number of digits | Scientific notation | When is it used? | Enforce notation
Controlling display of numbers (1 year ago)
Options | Per-column number formatting | Rule-based number formatting | Computing on num | Arithmetics | Mathematics | Override | Recovery
Tibbles (1 year ago)
Creating | Coercion | Tibbles vs data frames | Printing | Subsetting | Recycling | Arithmetic operations
Streaming and async APIs (1 year ago)
Streaming results | Async usage | Asynchronous chat | Shiny example | Asynchronous streaming
Interoperability with DuckDB and dbplyr (1 year ago)
Introduction | From duckplyr to dbplyr | Call arbitrary functions in duckplyr | Conclusion
Large data (1 year ago)
Introduction | To duckplyr | From files | From DuckDB | Materialization | To files | Memory usage | The big picture
Fallback to dplyr (1 year ago)
Introduction | DuckDB mode | Relation objects | Help from dplyr | Enforce DuckDB operation | Configure fallbacks | Conclusion
Selective use of duckplyr (1 year ago)
Introduction | External data with explicit qualification | Restoring dplyr methods | Own data | In other packages
Telemetry (1 year ago)
Implementer's interface (1 year ago)
Do more with dates and times in R (2 years ago)
Parsing dates and times | Setting and Extracting information | Time Zones | Time Intervals | Arithmetic with date times | If anyone drove a time machine, they would crash | Vectorization | Further Resources
Welcome to the Tidyverse (2 years ago)
Summary | Tidyverse package | Components | Design principles | Acknowledgments | References
The tidy tools manifesto (2 years ago)
Reuse existing data structures | Compose simple functions with the pipe | Embrace functional programming | Design for humans
Introduction to stringr (2 years ago)
Getting and setting individual characters | Whitespace | Locale sensitive | Pattern matching | Tasks | Engines | Fixed matches | Collation search | Boundary
Dates and times (3 years ago)
Formats | Offsets | References
Extending tibble (3 years ago)
Topics documented elsewhere | Data frame subclasses | Tibble example | Data frame example
Reprex do's and don'ts (4 years ago)
Main requirements | This seems like a lot of work! | Further reading: | Package philosophy
An introduction to multidplyr (5 years ago)
Creating a cluster | Add data | partition() | Direct loading | dplyr verbs
Introducing magrittr (6 years ago)
Abstract | Introduction and basics | Additional pipe operators | Aliases | Development
Design tradeoffs (6 years ago)
Code transformation | Desired properties | Implications of design decisions | Placeholder binding | Masking environment | Laziness | Numbered placeholders | Three implementations | Nested pipe | Multiple placeholders: fail | Lazy evaluation: pass | Persistence and eager unbinding: pass | Progressive stack: fail | Lexical effects: pass | Continuous stack: pass | Eager lexical pipe | Multiple placeholders: pass | Lazy evaluation: fail | Persistence: fail | Eager unbinding: pass | Progressive stack: pass | Lexical effects and continuous stack: pass | Lazy masking pipe | Persistence: pass | Progressive stack: fail | Lexical effects: fail | Continuous stack: fail
googledrive (8 years ago)