Python for data science — starter kit
The pipeline I walk every new Python data student through: environment setup, NumPy + pandas idioms, the four plots that answer 80% of questions, and a project skeleton you can clone for your next exploration.
A curated catalog of 42 of the very best public tutorials, books, video courses, and tools for Python, machine learning, data science, and the math underneath. Every link goes to a real, free resource — no signup walls, no fabricated rankings.
42 curated resources · all free · all public
The canonical, version-tracked walk-through maintained by the Python core team. Start here.
Long-form, well-edited tutorials covering everything from f-strings to async to packaging.
Al Sweigart's practical book — full text free online. Best onboarding to Python for non-programmers.
Harvard's free CS50 spin-off taught by David J. Malan. Lecture videos, problem sets, and certificate.
The community style guide. Most professional Python codebases lint against this.
Type hints turn Python into a much safer language. Mypy is the reference type checker.
Iconic PyCon talks ("Beyond PEP 8", "Transforming Code into Beautiful Idiomatic Python") that change how you read Python.
The standard library async runtime. Cooperative concurrency for IO-bound code.
Async, typed, automatic OpenAPI. The default for new Python APIs.
Minimal boilerplate testing with fixtures and powerful assertions. Industry standard.
A modern, lockfile-driven alternative to pip + virtualenv + setup.py.
A more opinionated style guide than PEP 8 — used inside Google for production Python.
Jeremy Howard's top-down course. Train a state-of-the-art model in lesson one, then learn why it works.
The modern Coursera rewrite of the legendary Stanford ML course. Audit free; certificate paid.
The reference manual for classical ML in Python — every estimator with maths, code, and worked examples.
Peer-reviewed visual explanations of deep-learning concepts. Archived (no new posts) but evergreen.
Free, interactive deep-learning textbook with runnable PyTorch / MXNet / JAX notebooks for every chapter.
Karpathy builds backprop, an MLP, and a small GPT from scratch on YouTube. Required viewing for anyone serious about LLMs.
Andriy Burkov releases the entire book under a "read first, buy if you like it" policy. Concise overview.
Free official course covering Transformers, fine-tuning, RLHF, and the Hugging Face ecosystem.
Amazon's ML University publishes Distill-style interactive explainers of core concepts (bias-variance, ROC, etc.).
A short structured tour of ML fundamentals with TensorFlow exercises. Free and well-edited.
Bite-sized, hands-on courses: Pandas, intro ML, feature engineering, SQL, viz. All run in-browser.
The canonical reference for the most-used dataframe library in Python.
Full text free on GitHub. The most-recommended single book for the Python data-science stack.
Practical, opinionated advice on building charts that actually communicate. Hundreds of free posts.
OLS, GLM, time series, mixed effects. The companion to scikit-learn for inferential statistics.
Even if you live in Python, this book's framing of the tidy data workflow is worth a read.
The notebook environment most data work happens in. Free, open, and runs locally.
Run SQL over Parquet / CSV / pandas DataFrames at speed, with zero server setup.
Vega-Lite for Python. A more principled grammar of graphics than matplotlib for exploratory work.
Free official PDF from the authors. The most readable intro to the statistical foundations of ML.
The visual intuition for vectors, matrices, determinants, and eigen-decomposition. Watch before any ML course.
Companion series. Builds derivatives and integrals from first principles with stunning visualisations.
Free full curriculum with exercises and mastery checks. Best place to actually practise calculus, not just watch it.
Mechanics-first linear algebra with worked examples. Pair with 3Blue1Brown for intuition.
Paul Dawkins' famously clear lecture notes — Algebra, Calculus I/II/III, Diff Eqs. Hundreds of worked examples.
The full Strang lecture series. Watch this and you genuinely understand the subject.
David Jerison's lectures, full problem sets, and exams. The gold-standard introductory calc course.
Kalid Azad rebuilds e, i, the Fourier transform, calculus, and more from analogy-first explanations.
Free PDF from the authors. The linear algebra, calculus, probability, and optimisation that ML actually needs.
Free book teaching probability and statistics through Python. Practical, not symbol-heavy.
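Two of the Python entries in the catalog above are easier to see in code than in prose: type hints, and asyncio's cooperative concurrency for IO-bound work. Here is a minimal sketch using only the standard library. The function names, job list, and delays are made up for illustration, and `asyncio.sleep` stands in for a real network or disk wait.

```python
import asyncio


async def fetch(name: str, delay: float) -> str:
    # asyncio.sleep is a placeholder for a real IO wait (assumption).
    # While this coroutine is waiting, the event loop runs the others.
    await asyncio.sleep(delay)
    return f"{name} done"


async def fetch_all(jobs: list[tuple[str, float]]) -> list[str]:
    # gather() runs the coroutines concurrently and returns results in
    # the order they were passed in, not the order they finished.
    return await asyncio.gather(*(fetch(name, delay) for name, delay in jobs))


# Hypothetical jobs: "b" finishes first, but the result order follows the input.
results = asyncio.run(fetch_all([("a", 0.03), ("b", 0.01), ("c", 0.02)]))
print(results)  # ['a done', 'b done', 'c done']
```

The annotations change nothing at runtime. A type checker such as mypy reads them and would flag a call like `fetch(1, "soon")` before the code ever runs.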
11 PDFs I built for my own students. The first six are email-gated; the cheat-sheet bundle below is a free direct download.
The pipeline I walk every new Python data student through: environment setup, NumPy + pandas idioms, the four plots that answer 80% of questions, and a project skeleton you can clone for your next exploration.
Copy-and-rename starter layouts for CLI tools, FastAPI services, ML notebooks, and small data jobs. Each template ships with sensible defaults — pyproject.toml, ruff/black config, a real test, and a CI workflow that just works.
Single-page reference for the algorithms students ask about most — linear / logistic regression, SVMs, trees + ensembles, k-NN, k-means, PCA — with the failure mode for each, the hyperparameter to tune first, and a one-line scikit-learn snippet.
A short read that turns "I made a chart" into "I made a clear chart." Covers the encoding hierarchy (position > length > angle > area > colour), when to break the rule, and a gallery of small-multiples redesigns.
The exact environment I help new students stand up in their first session: VS Code with the four extensions that earn their slot, terminal + git basics, virtualenv vs. conda, and the keyboard shortcuts that compound over months.
How I prep students for SWE / ML / data-science interviews: the 10 patterns that cover most coding questions, system-design framing for entry-to-mid roles, and the behavioural-question grid that gets you out of "I don’t know what to say."
Modern SQL for data analysts in 2026 — window functions, CTEs, the JOIN rules nobody told you, and the small set of queries that handle 90% of business questions.
Modern Python OOP for 2026: dataclasses with slots, Protocols over abstract base classes, composition over inheritance, and the four design patterns that pay rent in real codebases.
Pandas 2.x / 3.0 workflow for 2026: the PyArrow backend, copy-on-write semantics, and the small set of operations that handle 90% of real exploratory analysis.
NumPy patterns that show up in every ML codebase: array creation, broadcasting, vectorisation, and the small set of ops that produce 95% of the speed wins.
The slice of linear algebra that actually shows up in modern ML: vectors, matrices, projections, decompositions, and a working intuition for what matrix multiplication is really doing.
We use your email only to send the PDFs and the occasional study tip. Unsubscribe anytime. Questions? Email Ali.
Resources get you started. A tutor who remembers you gets you unstuck. Free 30-min session, no credit card.