Build a working information pipeline: discover content, score it, cluster it, extract what matters. All on your GPU.
Course orientation and environment setup
Reproducible pipelines, scoring, feedback loops
Build a multi-stage data pipeline where each stage declares its inputs and outputs. Run ML inference locally on consumer hardware.
25 min
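A stage that declares its inputs and outputs can be scheduled automatically: run whatever is ready, repeat until done. A minimal sketch of the idea — all names here are illustrative, not the course's actual API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    """One pipeline step that declares what it reads and writes."""
    name: str
    inputs: list
    outputs: list
    run: Callable[[dict], dict]

def execute(stages, store):
    """Run any stage whose declared inputs are available, until all finish."""
    pending = list(stages)
    while pending:
        ready = [s for s in pending if all(k in store for k in s.inputs)]
        if not ready:
            raise RuntimeError("unsatisfied inputs: " + ", ".join(s.name for s in pending))
        for s in ready:
            store.update(s.run({k: store[k] for k in s.inputs}))
            pending.remove(s)
    return store

# Hypothetical two-stage pipeline: fetch produces "raw", score consumes it.
fetch = Stage("fetch", [], ["raw"], lambda _: {"raw": ["doc a", "doc b"]})
score = Stage("score", ["raw"], ["scored"],
              lambda d: {"scored": [(t, len(t)) for t in d["raw"]]})

# Declaration order doesn't matter; the dependency declarations do.
result = execute([score, fetch], {})
```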
Score content across independent quality dimensions where each axis tells you something different. Graduated classification over binary keep/discard.
20 min
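Graduated classification can be as simple as mapping averaged axis scores onto tiers instead of a single yes/no gate. A toy sketch — the axes and cutoffs here are made up for illustration:

```python
def tier(scores, cutoffs=(0.75, 0.5, 0.25)):
    """Map the mean of per-axis scores onto graded tiers
    rather than a binary keep/discard decision."""
    avg = sum(scores.values()) / len(scores)
    if avg >= cutoffs[0]:
        return "keep"
    if avg >= cutoffs[1]:
        return "review"
    if avg >= cutoffs[2]:
        return "archive"
    return "discard"

# Hypothetical quality axes — each scored independently upstream.
axes = {"novelty": 0.9, "depth": 0.6, "credibility": 0.8}
label = tier(axes)
```

The point of the middle tiers: content that fails one axis but excels on another lands in "review" rather than being silently dropped.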
Build feedback loops that measure accuracy, identify blind spots, and improve the system through its own operation.
25 min
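One concrete form that measurement can take: per-class accuracy against human labels, which surfaces blind spots a single overall number hides. A minimal sketch with hypothetical labels:

```python
from collections import defaultdict

def accuracy_by_class(pairs):
    """pairs: (predicted, actual) label tuples from human review.
    Per-class accuracy exposes where the classifier is blind,
    which an aggregate accuracy figure would average away."""
    hits, totals = defaultdict(int), defaultdict(int)
    for pred, actual in pairs:
        totals[actual] += 1
        hits[actual] += int(pred == actual)
    return {c: hits[c] / totals[c] for c in totals}

# Overall accuracy is 2/3, but the breakdown shows the real problem:
# the system misses half of what humans marked "discard".
labels = [("keep", "keep"), ("keep", "discard"), ("discard", "discard")]
report = accuracy_by_class(labels)
```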
Discovery mechanisms, interactive review, scale, extraction
Discovery mechanisms that find what you didn't know to search for. Measure the unexpected systematically.
30 min
Build interactive review tools where human judgments feed back into the pipeline. Browsing becomes quality improvement.
25 min
Scale pipelines to larger data with selective execution control. Freeze what's stable, iterate on what's changing.
25 min
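Selective execution usually comes down to caching stage outputs keyed on their inputs, plus a freeze switch for stages you've declared stable. A rough sketch of that pattern — assuming JSON-serializable inputs and pickleable outputs, not the course's actual mechanism:

```python
import hashlib
import json
import os
import pickle

def run_stage(name, fn, inputs, cache_dir=".cache", frozen=False):
    """Rerun fn only when its inputs change; frozen stages
    reuse their latest cached result and never rerun."""
    os.makedirs(cache_dir, exist_ok=True)
    key = hashlib.sha256(json.dumps(inputs, sort_keys=True).encode()).hexdigest()
    path = os.path.join(cache_dir, f"{name}-{key}.pkl")
    if frozen:
        hits = sorted(f for f in os.listdir(cache_dir) if f.startswith(name + "-"))
        if hits:
            with open(os.path.join(cache_dir, hits[-1]), "rb") as fh:
                return pickle.load(fh)
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    result = fn(inputs)
    with open(path, "wb") as fh:
        pickle.dump(result, fh)
    return result
```

Freeze the embedding stage while you iterate on clustering, and only the cheap stages rerun.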
Extract structured resources from scored content. Catch data integrity problems before they propagate downstream.
20 min
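An integrity gate can be a small validator that quarantines bad records instead of passing them downstream. A sketch with hypothetical field names:

```python
def validate(records, required=("title", "url", "score")):
    """Split extracted records into clean rows and quarantined ones,
    so integrity problems stop here instead of propagating."""
    clean, quarantined = [], []
    for r in records:
        missing = [k for k in required if not r.get(k)]
        bad_score = not isinstance(r.get("score"), (int, float))
        (quarantined if missing or bad_score else clean).append(r)
    return clean, quarantined

# One good record, one with an empty field, one with a malformed score.
records = [
    {"title": "a", "url": "u", "score": 0.5},
    {"title": "b", "url": "", "score": 0.3},
    {"title": "c", "url": "v", "score": "high"},
]
clean, quarantined = validate(records)
```

Quarantined records stay inspectable, which is what makes the failure mode debuggable rather than silent.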
Clustering, architectural evaluation, visualization, domain transfer
Choose the right dimensionality reduction and clustering algorithms for your data. UMAP, PCA, HDBSCAN, k-means, and why parameters matter as much as algorithm choice.
30 min
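In practice this lesson's territory belongs to libraries like umap-learn, scikit-learn, and hdbscan. As a dependency-free stand-in, here is a minimal k-means that makes the parameter point concrete: k, the iteration budget, and the seed shape the output as much as the algorithm does.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means over tuples of floats. The choice of k is
    a claim about the data's structure, not a tuning detail."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest center (squared distance)
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[i].append(p)
        # recompute each center as its group's mean; keep empty centers as-is
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers, groups

# Two well-separated blobs: k=2 recovers them, k=3 would invent a split.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, groups = kmeans(pts, 2)
```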
Evaluate whether architectural decisions produce meaningful results or structural artifacts. Strip complexity while preserving what works.
30 min
Build interactive hierarchical visualizations for navigating large classified datasets, from product taxonomies to knowledge bases.
30 min
Apply the full pipeline to any domain by swapping the source adapter. The architecture is the reusable part.
20 min
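Swapping the source adapter works because downstream stages only ever see a fixed record shape, never the source itself. A minimal sketch of that contract — the record fields and names here are illustrative:

```python
class ListAdapter:
    """Hypothetical adapter: any object with a fetch() method that
    yields records of a fixed shape can feed the pipeline."""
    def __init__(self, docs):
        self.docs = docs

    def fetch(self):
        for i, text in enumerate(self.docs):
            yield {"id": i, "text": text}

def run_pipeline(adapter):
    # Downstream stages depend on the record shape, not the source:
    # swap ListAdapter for an RSS, arXiv, or filesystem adapter and
    # nothing below this line changes.
    return [r["text"].lower() for r in adapter.fetch()]

out = run_pipeline(ListAdapter(["Hello", "World"]))
```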