pipespy
A real-time pipeline debugger for the terminal, built in Rust. Drop it into any shell pipeline to watch throughput, sample live records, and see the detected format — without touching your data.
How It Started
I spend a lot of time piping data between commands — tailing logs, transforming JSON, filtering CSV exports. The workflow is always the same: build up a pipeline one stage at a time, run it, and hope the output looks right. When it doesn’t, the debugging options are basically | head, | wc -l, or staring at a wall of text scrolling past. pv tells you how many bytes moved, but not what they looked like.
I wanted something I could drop into any pipeline that would show me the actual data in context — throughput, record samples, format — without interfering with the pipeline itself. And I wanted an excuse to write something non-trivial in Rust.
What It Does
Drop pipespy anywhere in a shell pipeline and it shows a real-time TUI on stderr while data passes through stdout untouched:
cat events.jsonl | pipespy | jq '.users[]' | grep "active" > out.txt
It auto-detects JSON, CSV, and plain text, then syntax-highlights the live record samples. There are two display modes — a compact view that fits in a split terminal pane (throughput, sparkline, last N records), and a fullscreen mode with a line length histogram, min/max/avg stats, and a larger record viewer. Press f to toggle between them, q to detach.
There’s also a --quiet mode that skips the TUI entirely and prints a one-line summary when the pipeline completes — useful for scripts where you just want to know throughput.
The Three-Thread Model
The core design is three threads sharing a ring buffer:
- Reader — reads stdin line-by-line into a shared `VecDeque` behind `Arc<Mutex<>>` with a `Condvar`. Records per-line byte lengths for stats.
- Writer — drains the buffer to stdout. Blocks when the buffer is empty, wakes when data arrives.
- TUI — samples the stats collector every 500ms and renders to stderr via ratatui. Never touches the data path.
The `Condvar` does the heavy lifting for flow control. The reader blocks when the buffer hits capacity (default 8MB), the writer blocks when it’s empty, and `mark_done()` wakes both sides on EOF. It’s simple — deliberately so. A lock-free ring buffer would be faster, but `Mutex` + `Condvar` is correct and the bottleneck is I/O, not the buffer.
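A minimal sketch of that scheme — struct and method names here are my own, not necessarily pipespy's, and the "allow one oversized line" detail is an assumption to keep the sketch deadlock-free:

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};

const CAPACITY: usize = 8 * 1024 * 1024; // 8MB default, per the design above

struct State {
    buf: VecDeque<String>,
    bytes: usize, // total bytes currently buffered
    done: bool,   // set once the reader hits EOF
}

struct Shared {
    state: Mutex<State>,
    not_empty: Condvar, // writer waits on this
    not_full: Condvar,  // reader waits on this
}

impl Shared {
    fn new() -> Self {
        Shared {
            state: Mutex::new(State { buf: VecDeque::new(), bytes: 0, done: false }),
            not_empty: Condvar::new(),
            not_full: Condvar::new(),
        }
    }

    /// Reader side: block while the buffer is at capacity.
    /// (Always admits at least one line so an oversized line can't wedge the pipeline.)
    fn push(&self, line: String) {
        let mut s = self.state.lock().unwrap();
        while !s.buf.is_empty() && s.bytes + line.len() > CAPACITY {
            s = self.not_full.wait(s).unwrap();
        }
        s.bytes += line.len();
        s.buf.push_back(line);
        self.not_empty.notify_one();
    }

    /// Writer side: block while empty; None means EOF and the buffer is drained.
    fn pop(&self) -> Option<String> {
        let mut s = self.state.lock().unwrap();
        loop {
            if let Some(line) = s.buf.pop_front() {
                s.bytes -= line.len();
                self.not_full.notify_one();
                return Some(line);
            }
            if s.done {
                return None;
            }
            s = self.not_empty.wait(s).unwrap();
        }
    }

    /// EOF: wake both sides so neither blocks forever.
    fn mark_done(&self) {
        self.state.lock().unwrap().done = true;
        self.not_empty.notify_all();
        self.not_full.notify_all();
    }
}
```

The important invariant is that every `wait` sits inside a loop that re-checks its condition, so spurious wakeups and the EOF broadcast are both handled by the same code path.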
The stderr Problem
This was the gnarliest bug to track down. The TUI renders to stderr via CrosstermBackend::new(stderr()), which works great — except crossterm’s enable_raw_mode() calls tcsetattr on stdin, which is a pipe when you’re in a pipeline. It fails silently or panics depending on the platform.
The fix was bypassing crossterm’s raw mode entirely and calling libc::tcsetattr directly on stderr’s file descriptor:
let stderr_fd = stderr().as_raw_fd();
let orig_termios = enable_raw_mode_on_fd(stderr_fd)?;
Similarly, keyboard input (event::poll / event::read) defaults to reading from stdin — which is where the data comes from. Enabling crossterm’s use-dev-tty feature redirects event reading to /dev/tty instead, so f and q work while data flows through stdin.
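In Cargo.toml that’s a one-line change — `use-dev-tty` is crossterm’s real feature name, though the version numbers here are just illustrative:

```toml
[dependencies]
# use-dev-tty makes event::poll / event::read open /dev/tty
# instead of consuming keyboard events from stdin (the data path)
crossterm = { version = "0.27", features = ["use-dev-tty"] }
ratatui = "0.26"
```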
These are the kinds of bugs that don’t exist in any tutorial or example project, because those examples always assume stdin is interactive. It took a while to figure out, but it’s the most interesting part of the codebase.
Format Detection and Highlighting
Format detection runs once, after 4+ sample lines have accumulated. The logic is simple: if every non-empty line parses as valid JSON, it’s JSON. If every line has the same number of commas (≥1), it’s CSV. Otherwise, plain text. You can override with --json, --csv, or --no-detect.
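The heuristic fits in one small function. This is a sketch, not pipespy's actual code: the names are mine, and the JSON check is passed in as a closure because the real tool presumably uses a full JSON parser (e.g. serde_json), which I don't want to misrepresent here:

```rust
#[derive(Debug, PartialEq)]
enum Format {
    Json,
    Csv,
    Plain,
}

/// Hypothetical sketch of the detection heuristic described above.
/// `looks_like_json` stands in for a real JSON parse of one line.
fn detect(samples: &[&str], looks_like_json: impl Fn(&str) -> bool) -> Format {
    // Only consider non-empty lines
    let lines: Vec<&str> = samples
        .iter()
        .copied()
        .filter(|l| !l.trim().is_empty())
        .collect();
    if lines.is_empty() {
        return Format::Plain;
    }
    // JSON: every non-empty line parses as valid JSON
    if lines.iter().all(|&l| looks_like_json(l)) {
        return Format::Json;
    }
    // CSV: every line has the same comma count, and at least one comma
    let commas = lines[0].matches(',').count();
    if commas >= 1 && lines.iter().all(|&l| l.matches(',').count() == commas) {
        return Format::Csv;
    }
    Format::Plain
}
```

The "4+ sample lines" gating happens upstream of a function like this; by the time detection runs, `samples` already holds enough lines to make the vote meaningful.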
The JSON highlighter is a character-by-character state machine that tracks whether a string is a key or value based on the preceding structural character ({ or , means key, : means value). Keys get green, values get cyan, numbers get yellow. CSV just cycles through six colours per column with grey commas.
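The key/value tracking is the only subtle part of that state machine, so here is a stripped-down sketch of just that logic — names are mine, colours omitted, and it inherits the stated rule's limitation (a `,` inside a top-level array also flips the state to "key"):

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Role {
    Key,
    Value,
}

/// Classify each string literal in a line of JSON as key or value,
/// using the preceding structural character: `{` or `,` => key, `:` => value.
fn string_roles(line: &str) -> Vec<Role> {
    let mut roles = Vec::new();
    let mut in_string = false;
    let mut escaped = false;
    let mut next_role = Role::Value; // a bare top-level string is a value
    for c in line.chars() {
        if in_string {
            // Skip string contents, honouring backslash escapes so an
            // escaped quote doesn't terminate the string early
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' => {
                in_string = true;
                roles.push(next_role);
            }
            '{' | ',' => next_role = Role::Key,
            ':' => next_role = Role::Value,
            _ => {}
        }
    }
    roles
}
```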
Automated Distribution
I wanted the installation experience to be frictionless, so I set up four channels — all triggered by a single git tag:
- A GitHub Actions workflow cross-compiles for macOS and Linux (arm64 + amd64)
- Tarballs get attached to a GitHub Release
- `cargo publish` pushes to crates.io
- A final job computes SHA256 hashes, clones my homebrew-tap repo, writes an updated Ruby formula with the new version and hashes, and pushes
The result: brew install jasonm4130/tap/pipespy works within minutes of tagging a release. No manual steps, no copy-pasting hashes.
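The rough shape of the workflow, in case it's useful — this is a sketch, not the actual file; job names, action versions, and the target list are my assumptions based on the channels described above:

```yaml
# Sketch of a tag-triggered cross-compile matrix (details illustrative)
on:
  push:
    tags: ["v*"]

jobs:
  build:
    strategy:
      matrix:
        include:
          - { os: ubuntu-latest, target: x86_64-unknown-linux-gnu }
          - { os: ubuntu-latest, target: aarch64-unknown-linux-gnu }
          - { os: macos-latest, target: x86_64-apple-darwin }
          - { os: macos-latest, target: aarch64-apple-darwin }
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - run: cargo build --release --target ${{ matrix.target }}
      # ...then tar the binary and upload it as a release asset
```

The release, crates.io publish, and formula-update jobs hang off the same tag trigger, gated on the build matrix succeeding.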
What I Learned
Rust’s ownership model is genuinely useful for concurrent code — not in the “it prevents data races” textbook sense, but practically: the compiler caught two places where I was holding a lock across a blocking call. In Go or Python those would have been deadlocks discovered at 2am. In Rust they were compile errors.
The TUI ecosystem (ratatui + crossterm) is mature and well-documented, but the moment you step outside the “normal” use case — like rendering to stderr with piped stdin — you’re on your own. That’s fine, and honestly that’s where the fun is, but it’s worth knowing going in.
Building the distribution pipeline took almost as long as the tool itself, and it was worth every minute. The difference between “you can clone and build it” and “you can brew install it” is the difference between a side project and something that feels real.