From the blog · by Ali Jabbary

The data/ML portfolio that actually gets you hired in 2026 (projects beat certificates)

Ali Jabbary
M.Sc., P.Eng.
9 min read
#learning #career #data-science #machine-learning #portfolio #job-search

Article Summary

What hiring managers really look at in 90 seconds, why Titanic and Iris hurt you, and how to pick and present a project that gets read.

A student showed me his portfolio last month and asked why he wasn't getting callbacks. It was four projects: Titanic survival prediction, Iris flower classification, the MNIST digits, and a "house prices" notebook. Clean code. Tidy charts. Genuinely fine work.

I had to tell him the uncomfortable thing: a hiring manager who opens that GitHub doesn't see four projects. They see the four projects every bootcamp assigns, and they close the tab. Not because the work is bad — because it tells them nothing about you that it doesn't also tell them about the other 300 applicants who submitted the identical four.

This isn't a "you need more certificates" post. It's the opposite. Let me walk through what actually moves the needle in 2026 hiring, why the famous beginner datasets can quietly hurt you, and how to build and present one project that does more than ten of those ever could.

What hiring managers actually look at (it's less than you think)

Here's the reality that took me years to internalise: the person screening you is busy, slightly skeptical, and giving your application maybe ninety seconds before deciding to dig in or move on. Many of them open your GitHub before they finish reading your resume.

In those ninety seconds they are not evaluating your model's F1 score. They're answering one question: "Can this person take a messy, ambiguous problem and turn it into something useful, the way they'd have to on my team?"

Everything that helps answer "yes" is signal. Everything else is noise. Current hiring guidance is remarkably consistent on what counts as signal:

  • End-to-end, business-flavoured projects that show you understood the problem, the data, the trade-offs, and the impact — not just that you can call .fit().
  • Engineering quality, not just modelling. A repo with a real README, a requirements.txt, sane folder structure, and ideally some tests reads as "this person can ship." A lone .ipynb of matplotlib charts reads as "this person did a course."
  • Your reasoning, made visible. Why this dataset, why this model, why these trade-offs. Hiring managers want to see decisions, because the job is decisions.
  • Two or three polished things beat ten unfinished ones. Depth over breadth, every time.

Notice what's not on that list: the number of algorithms you've touched, your Kaggle rank, or how many courses you finished. Those feel like progress to the learner and register as nothing to the hirer.

Why Titanic and Iris actively hurt you

Let me be precise, because "don't use Titanic" is advice that gets parroted without the reason.

The famous beginner datasets — Titanic, Iris, MNIST, the Boston/California housing sets — are excellent for learning. I have students use them all the time to practice a technique in isolation. That's their job. The problem is purely about what they signal when they show up in a portfolio, where the audience and the purpose are completely different.

Three concrete reasons they backfire:

  1. They're pre-cleaned. The hard, valuable, real part of this work is everything before the model: finding data, wrangling it, deciding what to do about the nulls and the duplicates and the column that's secretly a date stored as text. A pre-cleaned tutorial dataset skips exactly the skills the job is made of. You're showcasing the easy 20% and hiding the hard 80%.
  2. The question is already answered for you. "Predict survival" comes pre-packaged. At a company, you'll almost never be handed a tidy CSV and a clearly stated target; you'll be handed a vague business worry and a swamp of data and be expected to find the question. A canned-question project can't demonstrate the one skill that's hardest to teach.
  3. They scream "indistinguishable beginner." Recruiters have seen the Titanic project thousands of times. Including it doesn't just fail to differentiate you — it actively places you in the "did the standard tutorials, hasn't done real work yet" bucket. You blend into the crowd at the precise moment you're trying to stand out.

A portfolio's job is to differentiate, and a dataset everyone uses can't differentiate anyone. Keep those projects on your laptop as practice. Keep them off your GitHub front page.
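To make that "hard 80%" from reason 1 concrete, here's a minimal sketch of the cleaning a pre-cleaned dataset never asks of you: dropping exact duplicates, normalising the many spellings of "missing", and rescuing a date column stored as text in mixed formats. It uses only the Python standard library; the column names (`signup_date`, `city`), the sample rows, the null spellings and the list of date formats are all hypothetical. The part worth copying is the log: each cleaning decision gets counted and written down, ready for a README.

```python
from datetime import datetime

# Hypothetical raw rows, as they might arrive from a CSV export (all strings).
RAW_ROWS = [
    {"id": "1", "signup_date": "2024-03-05", "city": "Toronto"},
    {"id": "2", "signup_date": "05/03/2024", "city": ""},
    {"id": "2", "signup_date": "05/03/2024", "city": ""},  # exact duplicate
    {"id": "3", "signup_date": "March 7, 2024", "city": "Ottawa"},
    {"id": "4", "signup_date": "n/a", "city": "Montreal"},
]

# Assumed formats, tried in order; "05/03/2024" is therefore read day-first.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")
# Assumed spellings of "missing" found in the raw export.
NULL_TOKENS = {"", "n/a", "na", "null", "none"}


def is_null(text):
    """True if the raw string is one of the known spellings of 'missing'."""
    return text.strip().lower() in NULL_TOKENS


def parse_date(text):
    """Return a date for the first matching format, or None if missing/unparseable."""
    if is_null(text):
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean(rows):
    """Deduplicate, normalise nulls and parse dates; return (clean_rows, log)."""
    log, seen, unique = [], set(), []
    for row in rows:
        key = tuple(sorted(row.items()))  # whole-row identity = exact duplicate
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))  # copy so the raw rows stay untouched
    log.append(f"dropped {len(rows) - len(unique)} exact duplicate row(s)")

    unparseable = 0
    for row in unique:
        raw = row["signup_date"]
        row["signup_date"] = parse_date(raw)
        if row["signup_date"] is None and not is_null(raw):
            unparseable += 1  # present but in no known format: worth a look
        if is_null(row["city"]):
            row["city"] = None
    missing = sum(r["signup_date"] is None for r in unique) - unparseable
    log.append(f"signup_date: {missing} missing, {unparseable} unparseable")
    log.append(f"city: {sum(r['city'] is None for r in unique)} missing")
    return unique, log


if __name__ == "__main__":
    cleaned, log = clean(RAW_ROWS)
    print("\n".join(log))
```

Note the quiet assumption in `DATE_FORMATS`: "05/03/2024" is read day-first. Whether that's 5 March or 3 May is precisely the kind of decision worth stating out loud in the write-up.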

How to pick a project that's worth your time

The good news: a great portfolio project doesn't require more skill than you have. It requires better choices up front. I look for three ingredients, and I'd rather a student nail these in one project than churn out five without them.

1. A real question someone might actually care about. Not "predict X on dataset Y," but something with a stakeholder, even an imaginary one. "Which of my city's bus routes are chronically late, and does weather explain it?" "Can I predict which of my own newsletter subscribers are about to unsubscribe?" If you can finish the sentence "this matters because someone could do something with the answer," you have a real question. Bonus: pick something you genuinely care about. It shows, and it'll carry you through the boring middle.

2. Messy, ideally self-sourced data. This is the differentiator. Pull from a public API, scrape (politely, legally) a site, stitch together two government datasets that weren't meant to be joined, use your own exported data from an app you use. The moment your project includes "here's how I got the data and the three days I spent cleaning it," you've leapt past 90% of portfolios. The mess is the portfolio. Document the mess. Show the column that was a disaster and what you did about it.

3. A result that ships or gets written up — not a notebook that ends. A notebook that stops at a confusion matrix leaves the reader to guess "so what?" Close the loop. That can mean deployed — a small Streamlit or Gradio app, a tiny API, a dashboard someone can click — which is the strongest possible signal because it proves the thing actually runs outside your machine. Or it can mean a genuine write-up: a clear post that states the question, the approach, what you found, what surprised you, and what you'd do with more time. A written conclusion beats a deployed app that no human can interpret. Ideally do both.
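Ingredient 2's "stitch together two datasets that weren't meant to be joined" tends to break on the join keys first. Here's a minimal standard-library sketch of a left join that normalises inconsistent keys and reports what didn't match instead of silently dropping it. The bus-delay and route tables, their field names, and the rules in `normalise_route` are hypothetical, borrowed from the bus-route question in ingredient 1. In pandas, `merge(..., how="left", indicator=True)` gives you a similar per-row match report.

```python
import re

# Hypothetical inputs: delay records from one source, a route table from another.
DELAYS = [
    {"route": "Route 7 ", "minutes_late": 4},
    {"route": "007", "minutes_late": 9},
    {"route": "12", "minutes_late": 2},
    {"route": "99X", "minutes_late": 15},
]
ROUTES = [
    {"route_id": "7", "name": "Main St"},
    {"route_id": "12", "name": "Harbour Loop"},
    {"route_id": "31", "name": "Airport Express"},
]


def normalise_route(raw):
    """Map variants like 'Route 7 ', '007' and '7' onto one key, '7'."""
    key = re.sub(r"^route\s*", "", raw.strip().lower())
    return key.lstrip("0") or "0"


def left_join_with_report(left, right, left_key, right_key, normalise=normalise_route):
    """Left-join two lists of dicts on normalised keys; report what didn't match."""
    # Assumes right-hand keys are unique after normalisation (later rows win).
    lookup = {normalise(r[right_key]): r for r in right}
    joined, unmatched = [], []
    for row in left:
        match = lookup.get(normalise(row[left_key]))
        if match is None:
            unmatched.append(row[left_key])  # keep the raw value for the report
        joined.append({**row, **(match or {})})
    used = {normalise(row[left_key]) for row in left}
    report = {
        "left_rows": len(left),
        "matched": len(left) - len(unmatched),
        "unmatched_left_keys": unmatched,
        "right_keys_never_matched": sorted(k for k in lookup if k not in used),
    }
    return joined, report


if __name__ == "__main__":
    _, report = left_join_with_report(DELAYS, ROUTES, "route", "route_id")
    print(report)
```

With the sample data, three of the four delay records match, route "99X" has no entry in the route table, and route 31 never appears in the delays. That report, more than the merged table, is what belongs in the README.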

If a topic is genuinely hot right now and you want to ride it: a small RAG system (one that answers questions over a set of PDFs or docs you actually care about) is about as current as it gets in 2026, because retrieval-augmented generation is a pattern many data and ML teams are building on right now. But a well-executed, original analysis of a problem you understand will always beat a half-baked project chasing a buzzword. Substance first.
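If you do go the RAG route, the retrieval step is the part worth understanding first: given a question, find the few chunks of your documents most relevant to it, and hand only those to the language model. Below is a deliberately tiny sketch of that step using bag-of-words cosine similarity and the standard library. Real RAG systems typically use learned embeddings and a vector index instead, and the actual LLM call is left out; the chunk texts and the `retrieve` and `build_prompt` helpers are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical document chunks, e.g. paragraphs split out of a few PDFs.
CHUNKS = [
    "Route 7 runs every 10 minutes on weekdays and every 20 minutes on weekends.",
    "Fares can be paid by card, by phone, or with exact change.",
    "During snowstorms, express routes may be suspended without notice.",
]


def tokenize(text):
    """Lowercase and split into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return up to k chunks with nonzero similarity to the question, best first."""
    q = Counter(tokenize(question))
    scored = sorted(((cosine(q, Counter(tokenize(c))), c) for c in chunks),
                    key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:k] if score > 0]


def build_prompt(question, chunks, k=2):
    """Assemble the grounded prompt that the (omitted) LLM call would receive."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("How often does route 7 run on weekends?", CHUNKS))
```

Swapping the `Counter` vectors for embeddings changes how similarity is computed, not the shape of the pipeline: chunk, score, keep the top k, build the prompt.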

How to present it so it gets read

You've done the work. Now make ninety seconds enough for a stranger to get it. This is the step most people skip, and it's nearly free.

Each project gets its own repository. Not one mega-repo with twelve folders. A clean, separate repo per project reads as "finished thing," which is what you're selling.

The README is the product. The code is the evidence. Assume the reader will judge the entire project on the README alone — because many will, and they'll only open the code if the README earns it. A README that works:

  • One or two sentences up top: what question this answers and why anyone should care. Lead with the result, not the tech stack.
  • A screenshot or a chart of the punchline, immediately. Visual proof beats paragraphs.
  • The story: where the data came from, what was messy about it, the key decisions and trade-offs, what you found.
  • How to run it. requirements.txt present, steps that actually work if someone clones it.
  • An honest "limitations and what I'd do next." This single section, done with real self-awareness, signals maturity louder than any model metric. It says you know the difference between a portfolio piece and production — which is exactly the judgment they're hiring for.

Show the engineering, lightly. A requirements.txt, a sensible folder layout (data/, src/, notebooks/), and maybe a test or two will set you apart from the wall of bare notebooks. You don't need full MLOps for a portfolio piece. You do need it to look like code a colleague could pick up.
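If setting up that structure feels like friction, script it once and reuse it. Here's a minimal sketch using `pathlib` that creates the `data/`, `src/` and `notebooks/` folders, plus a `tests/` folder, an empty `requirements.txt`, and a README skeleton whose headings follow the README checklist above. The `scaffold` function, the extra `tests/` folder and the heading wording are illustrative assumptions, not a standard; it won't overwrite files that already exist.

```python
import tempfile
from pathlib import Path

# README skeleton following the checklist: result first, limitations last.
README_TEMPLATE = """# {title}

{one_line_result}

![Key result chart](path/to/your-chart.png)

## The question and why it matters

## Data: where it came from and what was messy

## Key decisions and trade-offs

## Results

## How to run it

    pip install -r requirements.txt

## Limitations and what I'd do next
"""

FOLDERS = ("data", "src", "notebooks", "tests")


def scaffold(root, title, one_line_result):
    """Create a project skeleton under root; return the files it created."""
    root = Path(root)
    for folder in FOLDERS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    files = {
        root / "README.md": README_TEMPLATE.format(title=title, one_line_result=one_line_result),
        root / "requirements.txt": "",  # pin real versions later, e.g. from pip freeze
        root / "src" / "__init__.py": "",
    }
    created = []
    for path, text in files.items():
        if not path.exists():  # never clobber existing work
            path.write_text(text, encoding="utf-8")
            created.append(path.relative_to(root).as_posix())
    return created


if __name__ == "__main__":
    # Demo in a throwaway directory; point it at a real folder to use it.
    with tempfile.TemporaryDirectory() as tmp:
        print(scaffold(tmp, "Which bus routes run late?", "One-sentence answer goes here."))
```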

Write down your decisions. Whether in the README or a short separate note, explain why you chose this model, these features, these parameters. Showing your reasoning is showing the exact skill the job is made of. "I tried a gradient-boosted model but a plain logistic regression was within 1% and far easier to explain, so I shipped that" tells a hiring manager more than any leaderboard score.
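That "within 1%, so I shipped the simpler one" reasoning is easy to make explicit. Here's a minimal sketch: among candidate models, choose the simplest one whose validation score is within a tolerance of the best, and produce a sentence you can paste into the README. The candidate names, scores, complexity ranks and the 0.01 tolerance are hypothetical, and the sketch reads "within 1%" as an absolute gap of 0.01 on a 0-to-1 metric (a relative gap is an equally reasonable reading). The scores themselves would come from your own cross-validation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    score: float     # validation metric, higher is better (e.g. mean CV accuracy)
    complexity: int  # your own ranking: lower = easier to explain and maintain


def choose_model(candidates, tolerance=0.01):
    """Pick the simplest candidate scoring within `tolerance` of the best one."""
    best = max(candidates, key=lambda c: c.score)
    eligible = [c for c in candidates if best.score - c.score <= tolerance]
    # Simplest first; among equally simple models, prefer the higher score.
    chosen = min(eligible, key=lambda c: (c.complexity, -c.score))
    if chosen is best:
        note = f"Shipped {chosen.name}: best score ({chosen.score:.3f})."
    else:
        note = (f"Shipped {chosen.name} ({chosen.score:.3f}) over {best.name} "
                f"({best.score:.3f}): within {tolerance} of the best and simpler to explain.")
    return chosen, note


if __name__ == "__main__":
    candidates = [
        Candidate("gradient boosting", 0.842, 3),
        Candidate("logistic regression", 0.835, 1),
        Candidate("random forest", 0.820, 2),
    ]
    print(choose_model(candidates)[1])
```

Ranking `complexity` is a judgment call, and that's the point: writing it down turns an invisible decision into one a reviewer can see and disagree with.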

The honest part

I won't pretend a great portfolio guarantees a job. The 2026 market for early-career data and ML roles is genuinely competitive, and plenty of factors — timing, location, the brutal arithmetic of applications-per-opening — are outside your control. A portfolio is not a magic key.

What it is is the one lever you fully own. You can't control the market. You can absolutely control whether the person who opens your GitHub thinks "huh, this one actually did something real" or "Titanic again, next." That difference is small to build and large in effect, and it's entirely in your hands.

The recap

  • Hiring managers spend ~90 seconds and ask one question: can you turn a messy, ambiguous problem into something useful?
  • Titanic, Iris, MNIST are great for learning and poison for portfolios — pre-cleaned, pre-questioned, and seen a thousand times. Keep them off your front page.
  • Pick a project with a real question, messy (ideally self-sourced) data, and a result that deploys or gets written up. One of those beats ten canned notebooks.
  • The README is the product. Lead with the result, show the mess, document your decisions, be honest about limitations. Each project its own clean repo.
  • Projects beat certificates because projects show judgment, and judgment is what's actually being hired.

If you've got a half-finished notebook and a nagging sense that it's not "portfolio-ready," that hunch is usually right — and it's usually fixable in a focused afternoon of choosing a sharper question and writing a README that respects the reader's ninety seconds. Turning a fine-but-forgettable project into one that gets a callback is one of my favourite things to work through with someone, because it's where a little outside perspective pays off fast.

Written by Ali Jabbary

M.Sc., P.Eng. • Expert Data Scientist & ML Engineer with 10+ years of experience. 500+ students helped worldwide. Specializing in Python, AI/ML, and turning complex problems into simple solutions.
