Reddit Discovery & Analysis with Claude
Build a working information pipeline: discover content, score it, cluster it, and extract what matters, all running locally on your GPU.
Getting Started
Course orientation and environment setup
Pipeline & First Result
Reproducible pipelines, scoring, feedback loops
Bootstrap & Pipeline
Build a multi-stage data pipeline where each stage declares its inputs and outputs. Run ML inference locally on consumer hardware.
25 min
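As a rough illustration of what "each stage declares its inputs and outputs" can mean in practice, here is a minimal sketch. The `Stage` class, the `run_pipeline` helper, and the `discover` and `score` functions are hypothetical names chosen for this example, not the course's actual code, and the word-count "score" is only a stand-in for real ML inference.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Stage:
    """A pipeline step that names the keys it reads and the keys it writes."""
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]
    run: Callable[[dict], dict]


def run_pipeline(stages, data):
    """Run stages in order, checking declared inputs and outputs at each step."""
    data = dict(data)  # work on a copy so the caller's dict is untouched
    for stage in stages:
        missing = [key for key in stage.inputs if key not in data]
        if missing:
            raise KeyError(f"stage {stage.name!r} is missing inputs: {missing}")
        # Each stage only sees the keys it declared as inputs.
        result = stage.run({key: data[key] for key in stage.inputs})
        undeclared = sorted(set(result) - set(stage.outputs))
        if undeclared:
            raise ValueError(f"stage {stage.name!r} wrote undeclared outputs: {undeclared}")
        data.update(result)
    return data


def discover(inp):
    # Placeholder discovery: keep non-empty texts, stripped of whitespace.
    return {"posts": [text.strip() for text in inp["raw"] if text.strip()]}


def score(inp):
    # Placeholder score: word count stands in for a model-based score.
    return {"scores": [len(post.split()) for post in inp["posts"]]}


STAGES = [
    Stage("discover", inputs=("raw",), outputs=("posts",), run=discover),
    Stage("score", inputs=("posts",), outputs=("scores",), run=score),
]

result = run_pipeline(STAGES, {"raw": ["  hello world ", "", "one two three"]})
```

Because every stage states what it needs, a missing or misnamed key fails at the stage boundary instead of surfacing several steps later.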
Improving the Score
Score content along independent quality dimensions, where each axis tells you something different. Use graduated classification instead of a binary keep/discard decision.
20 min
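One way to picture graduated classification over independent axes is the sketch below. The axis names (`relevance`, `depth`, `novelty`), the tier labels, and the thresholds are all assumptions made for illustration; the lesson's real dimensions and cut-offs may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    """Hypothetical per-item scores, each in the range 0.0 to 1.0."""
    relevance: float
    depth: float
    novelty: float


def tier(scores, strong=0.7, weak=0.4):
    """Map independent axis scores to a graduated tier instead of keep/discard."""
    axes = (scores.relevance, scores.depth, scores.novelty)
    strong_axes = sum(value >= strong for value in axes)
    if strong_axes == len(axes):
        return "keep"      # strong on every axis
    if strong_axes >= 1 and min(axes) >= weak:
        return "review"    # strong somewhere, acceptable everywhere
    if strong_axes >= 1:
        return "maybe"     # strong on one axis but weak on another
    return "discard"       # not strong on any axis
```

Keeping the axes separate until the final step means an item that is highly novel but shallow lands in a different tier than one that is uniformly mediocre, which a single averaged score would hide.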
Growing the Pipeline
Build feedback loops that measure accuracy, identify blind spots, and improve a system through operation.
25 min
Discovery & Interface
Discovery mechanisms, interactive review, scale, extraction
Beyond Keywords
Build discovery mechanisms that find what you didn't know to search for, and measure the unexpected systematically.
30 min
Dashboard & Review
Build interactive review tools where human judgments feed back into the pipeline. Browsing becomes quality improvement.
25 min
Scale & Evidence
Scale pipelines to larger data with selective execution control. Freeze what's stable, iterate on what's changing.
25 min
Quality & Resources
Extract structured resources from scored content. Catch data integrity problems before they propagate downstream.
20 min
Clustering & Hierarchy
Clustering, architectural evaluation, visualization, domain transfer
Why Cluster?
Choose dimensionality reduction and clustering algorithms that fit your data. Covers UMAP, PCA, HDBSCAN, and k-means, and why parameters matter as much as algorithm choice.
30 min
The Doubt Phase
Evaluate whether architectural decisions produce meaningful results or structural artifacts. Strip complexity while preserving what works.
30 min
Hierarchy & Sunburst
Build interactive hierarchical visualizations for navigating large classified datasets, from product taxonomies to knowledge bases.
30 min
What It's For
Apply the full pipeline to any domain by swapping the source adapter. The architecture is the reusable part.
20 min
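The idea of swapping the source adapter while the rest of the pipeline stays fixed can be sketched as follows. The `SourceAdapter` protocol, the two in-memory adapters, and the toy `run_pipeline` function are hypothetical and exist only to show the shape of the design; a real adapter would fetch from an actual source.

```python
from typing import Iterable, Protocol


class SourceAdapter(Protocol):
    """Anything that yields items as dicts with 'id' and 'text' keys."""
    def fetch(self) -> Iterable[dict]: ...


class InMemoryForumAdapter:
    """Hypothetical adapter for forum-style threads held in memory."""
    def __init__(self, threads):
        self._threads = threads

    def fetch(self):
        for thread in self._threads:
            yield {"id": thread["thread_id"],
                   "text": thread["title"] + "\n" + thread["body"]}


class InMemoryNotesAdapter:
    """Hypothetical adapter for a plain list of note strings."""
    def __init__(self, notes):
        self._notes = notes

    def fetch(self):
        for index, note in enumerate(self._notes):
            yield {"id": f"note-{index}", "text": note}


def run_pipeline(source: SourceAdapter):
    # Stand-in for the real pipeline: it only relies on the shared item shape.
    return [{"id": item["id"], "words": len(item["text"].split())}
            for item in source.fetch()]


forum = InMemoryForumAdapter(
    [{"thread_id": "t1", "title": "GPU tips", "body": "use less memory"}]
)
notes = InMemoryNotesAdapter(["a b c"])
forum_result = run_pipeline(forum)
notes_result = run_pipeline(notes)
```

The downstream code never inspects where items came from, so moving to a new domain only requires writing another class with a matching `fetch` method.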