Overview
The trashpanda package includes a set of data
import utilities designed to:
- List files in complex directory trees with filtering.
- Safely read multiple file types (CSV, Excel) while handling errors.
- Extract tables from arbitrary locations within Excel sheets.
These tools are useful for projects where data may be scattered across folders or formatted inconsistently.
Listing Files
list_data_files() recursively lists files in a
directory, with options to filter by:
- File extension (
ext) - Folder or file patterns to include or exclude
- Returning a
tibbleor alistof paths
library(trashpanda)
library(tibble)
# List all Excel files in nested directories containing "results"
files <- list_data_files(
path = "path/to/project",
ext = c("xls", "xlsx", "xlsm"),
include = c("results")
)
filesOutput: A tibble with file,
folder, and filename columns, providing a
clean inventory of data sources.
Reading Files Safely
safe_read_data() reads a single file while capturing any
errors. Returns a list:
data: tibble of the imported file (or NULL on
failure)
error: error message if reading failed
file: file path
This is especially useful when reading multiple files, some of which may be malformed.
results <- safe_read_data(
file = "data/example.xlsx",
reader = readxl::read_excel,
sheet = "Sheet1",
cols = c("id", "value", "date")
)
# Access imported data safely
results$dataReading a Directory Tree
read_data_tree() combines the previous functions:
Recursively lists all relevant files
Reads each file safely with
safe_read_data()Optionally enforces consistent column names
Combines multiple Excel sheets automatically
all_data <- read_data_tree(
path = "path/to/project",
reader = readxl::read_excel,
ext = c("xlsx", "xls"),
sheet_pattern = "2026", # optional regex
cols = c("id", "value", "date")
)
all_dataWhy this is useful:
You can process hundreds of files scattered in different files with a single call
Ensures that column names and types remain consistent
Automatically handles errors and skips unreadable files
Extracting Tables Anywhere in Excel
Some Excel sheets are messy, with tables not starting in the first
row. extract_table():
Searches for a table starting at a known header (
start_column)Optionally identifies multiple tables per sheet (
table_mode = "all")Adds metadata columns: file name, sheet, and table ID
tbl <- extract_table(
path = "data/example.xlsx",
sheet = "Sheet1",
start_column = "Sample ID"
)
tblThis function is particularly useful when:
You don’t know the exact row where the table starts and the tables are in different locations in each sheet or file
The table is somewhat consistent (e.g., you know what the table starts with)
Multiple tables exist in the same sheet
You want to keep file and sheet metadata
Putting It All Together
These functions can be combined for batch processing of Excel data:
# Extract tables from all files, safely
read_data_tree(
path = here(),
ext = "xlsx",
recursive = TRUE,
reader = extract_table,
start_column = "Sample ID",
table_mode = "all",
safely = TRUE,
id_cols = FALSE
)In the above example, the following occurs:
- All files in the base (
here::here()) and sub directories (recursive = TRUE) with the extension “.xlsx” are listed - Tables from excel files are automatically located and read via the
extract_tablefunction passed toreader - If there are multiple tables matching criteria on the same sheet,
both are read (
table_mode = "all")
An example output looks like the following:
# A tibble: 40 × 6
sheet_name `Sample ID` Value Notes Extra source_file
<chr> <chr> <chr> <chr> <chr> <chr>
1 clean_simple A1 10 ok NA C:/Projects/trashpanda/test/extract_table_test_sheets - Copy.xlsx
2 clean_simple A2 12 ok NA C:/Projects/trashpanda/test/extract_table_test_sheets - Copy.xlsx
3 clean_simple A3 15 ok NA C:/Projects/trashpanda/test/extract_table_test_sheets - Copy.xlsx
4 offset_table B1 5 low NA C:/Projects/trashpanda/test/extract_table_test_sheets - Copy.xlsx
5 offset_table B2 8 mid NA C:/Projects/trashpanda/test/extract_table_test_sheets - Copy.xlsx
6 dynamic_width C1 1 x NA C:/Projects/trashpanda/test/extract_table_test_sheets - Copy.xlsx
7 dynamic_width C2 2 y NA C:/Projects/trashpanda/test/extract_table_test_sheets - Copy.xlsx
8 explicit_last_col D1 100 a ignore C:/Projects/trashpanda/test/extract_table_test_sheets - Copy.xlsx
9 explicit_last_col D2 200 b ignore C:/Projects/trashpanda/test/extract_table_test_sheets - Copy.xlsx
10 internal_blanks E1 1 NA NA C:/Projects/trashpanda/test/extract_table_test_sheets - Copy.xlsx
# ℹ 30 more rows