New to AI? This is the map before the buzzwords do cartwheels.
Launchpad is mission control. We chart the whole neighborhood of AI: machine
learning, deep learning, neural networks, CNNs, RNNs, transformers, attention, and
robotics, then climb a gentle ladder from Beginner to
Expert, meeting real math and real code along the way.
By the end you will be able to explain:
AI vs ML vs deep learning, in plain words
Which tool fits images, sequences, or context
What a single neuron and softmax actually compute
How robots sense, think, act, and learn
Mission map: the concepts orbiting you. Visit each in its own room later.
The big picture
How it all connects, and why.
These words aren't a random pile. They nest and feed into each other. This living map
shows the family tree: AI contains Machine Learning,
which contains Deep Learning, which grows the architectures behind
today's models. Tap any node to see what it is and why it's wired to its neighbours, or
trace a whole story end to end.
Mission guide
The full map: what each room teaches, why it matters, and a path through them.
Twelve rooms, grouped into tracks. The two Foundations rooms
(Machine Learning and Deep Learning) are the deep ones: broad and detailed, the backbone
everything else is built on. Here's a path that builds up nicely, but every room stands alone, so
jump in wherever you're curious.
Prefer to ease in first? Keep scrolling. The rest of this Launchpad teaches the core ideas
on a Beginner → Intermediate → Advanced → Expert ladder before you head into the rooms.
Level 1 · Beginner
What is AI? The friendly map.
Big ideas first, in plain English. No math yet, just the lay of the land.
Explain
AI is not one thing. It is a family.
Some members recognize photos, some read sentences, some generate text, some drive robots.
The whole trick of getting started is knowing which cousin does what. Tap a card to
meet each one with everyday examples.
Worked example
One photo, the whole family helps
You snap a picture of a dog. A CNN recognizes "dog". A model writes a
caption ("a brown dog on grass") using a transformer that uses
attention to focus on the right words. If this ran on a robot pet, it
would then act: wag, follow, or fetch. Every one of those is a different
branch of the same AI family.
Visualize
The family tree, drawn as a mind map.
AI is the whole school. ML is the student who learns from examples. Deep learning is the
student with many notebooks (layers). Transformers are the student who keeps asking,
"who should I pay attention to?"
Lines link the central idea (AI) to the relatives. Resize the window; it stays crisp.
Level 2 · Intermediate
How the pieces nest and which tool fits the job.
Now we look at the mechanics: who contains whom, and how to pick a model.
Explain
AI ⊃ ML ⊃ Deep Learning.
These words are not synonyms; they are nested boxes. Deep learning is a kind of
machine learning, which is a kind of AI. Click a ring to see what lives there.
Worked example
Spam filter, three ways
A hand-written rule ("block any email saying FREE MONEY") is plain AI, no learning.
A model that learns spam words from thousands of labelled emails is ML. A model that
reads the raw text with stacked neural layers and figures out the features itself is
deep learning. Same task, three depths of the same box.
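The first two depths can be sketched in a few lines of Python. This is a toy illustration, not a real spam filter: the four emails are invented here, and the names `rule_filter`, `train_word_scores`, and `learned_filter` are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: (text, is_spam). Invented for illustration only.
EMAILS = [
    ("free money waiting for you", True),
    ("claim your free prize now", True),
    ("meeting notes for monday", False),
    ("lunch on monday?", False),
]

def rule_filter(text):
    """Plain AI: a hand-written rule, nothing is learned."""
    return "free money" in text.lower()

def train_word_scores(emails):
    """ML: count how often each word appears in spam vs. normal mail."""
    spam, ham = Counter(), Counter()
    for text, is_spam in emails:
        (spam if is_spam else ham).update(text.lower().split())
    # Positive score = seen more often in spam than in normal mail.
    return {w: spam[w] - ham[w] for w in set(spam) | set(ham)}

def learned_filter(text, scores):
    """Flag the email if its words lean towards spam overall."""
    return sum(scores.get(w, 0) for w in text.lower().split()) > 0

scores = train_word_scores(EMAILS)
print(rule_filter("FREE prize inside"))             # the fixed rule misses this wording
print(learned_filter("FREE prize inside", scores))  # the learned scores catch it
```

The rule only knows the exact phrase it was given; the learned scores generalize to new wording because they came from examples. A deep learning version would replace the hand-chosen "count words" step with stacked layers that work out their own features.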
Explain
CNN vs RNN vs Transformer: which tool for which job?
Different shapes of data want different models. Pick what your data looks like and see the
recommended tool with a reason.
Worked example
Same word, different tool
To find a cat in a photo → CNN (local patches). To predict tomorrow's
temperature from the last 30 days → RNN or a small transformer (order matters).
To answer a question about a paragraph → transformer (every word can look at
every other word at once).
Explain
Robots: sense → think → act → learn.
AI usually lives in software. Robotics gives it a body. The loop never stops: sensors gather
signals, models interpret them, planners choose an action, motors move, and the robot learns from the result.
Tap a stage.
Worked example
A robot vacuum, one lap
Sense: bump and cliff sensors plus a camera. Think: "wall ahead, carpet below".
Act: turn 90° and keep cleaning. Learn: remember the room map so next time
it is faster. That is the whole loop, once per second.
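The loop can be sketched as four small functions. This is a deliberately simplified, hypothetical model: the "room" is a one-dimensional strip of five cells, the only sensor is a wall detector, and "turn" means turning around rather than 90°. None of these names come from a real robotics library.

```python
# Hypothetical one-dimensional "room": cells 0..4, with walls beyond both ends.
ROOM_SIZE = 5

def sense(position, direction):
    """Sense: report whether the next cell in this direction is a wall."""
    nxt = position + direction
    return {"wall_ahead": nxt < 0 or nxt >= ROOM_SIZE}

def think(reading):
    """Think: choose an action from the sensor reading."""
    return "turn" if reading["wall_ahead"] else "forward"

def act(position, direction, action):
    """Act: turn around at a wall, otherwise move one cell."""
    if action == "turn":
        return position, -direction
    return position + direction, direction

def run(steps):
    position, direction = 0, 1
    room_map = set()  # Learn: remember every cell visited so far
    for _ in range(steps):
        reading = sense(position, direction)
        action = think(reading)
        position, direction = act(position, direction, action)
        room_map.add(position)
    return position, room_map

position, room_map = run(steps=10)
print(position, sorted(room_map))
```

A real vacuum swaps each function for something richer (camera perception, a path planner, motor control), but the shape of the loop stays the same.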
Level 3 · Advanced
Meet the real math, gently.
Two tiny equations power almost everything. Here they are, in words, with symbols, and with a knob to turn.
Explain
One artificial neuron.
A neuron multiplies each input by a weight (how much it matters), adds them up with a
bias (a base level), and squashes the result through an activation so the
output stays in a friendly range. That is the whole atom of a neural network.
\[ y = \sigma\!\left(\sum_i w_i x_i + b\right) \]
In words: weigh each input, sum them, add a bias, then squeeze through the sigmoid \(\sigma\) so the answer lands between 0 and 1.
\(x_i\): the inputs (the evidence)
\(w_i\): weights (how much each input matters; can be negative)
\(b\): bias (shifts the decision left or right)
\(\sigma\): the activation (here sigmoid), turning a raw score into a 0 to 1 signal
\(y\): the neuron's output
Interact
The curve is the sigmoid. The dot is where your current weighted sum lands.
Worked example
Should I bring an umbrella?
Let \(x_1\) = cloudiness, \(x_2\) = humidity. Give clouds a weight \(w_1=0.8\) and
humidity a larger one, \(w_2=1.5\). With a slightly negative bias the neuron stays calm on clear days but
fires (output near 1 = "yes, umbrella") once the weighted score climbs. Drag the sliders and
watch the dot cross 0.5.
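The umbrella neuron above, as a minimal Python sketch. The weights are the ones from the example; the bias of \(-1.0\) and the 0-to-1 input scale are assumptions made here, since the text only says the bias is "slightly negative".

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

weights = [0.8, 1.5]  # w1 (cloudiness), w2 (humidity), from the example
bias = -1.0           # assumed value; the text only says "slightly negative"

clear_day = neuron([0.1, 0.2], weights, bias)   # inputs on an assumed 0-to-1 scale
stormy_day = neuron([0.9, 0.9], weights, bias)
print(round(clear_day, 3), round(stormy_day, 3))
```

The clear day lands below 0.5 ("no umbrella") and the stormy day above it, which is the dot crossing the midpoint of the sigmoid curve.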
Explain
Why stack layers? One neuron draws a line; two layers draw a curve.
A single neuron's "yes" region is always cut by a straight line
(the place where \(w_1x_1+w_2x_2+b=0\)). Some patterns, like the four dots below
where diagonal corners share a class, simply cannot be split by any one line. Add a
hidden layer and the boundary is free to bend, wrapping the dots correctly. That
bending power is the whole reason deep networks exist.
Visualize
Pick the model and watch the boundary change shape:
Greener = the model says "class A"; clay = "class B". The bright contour is the 0.5 boundary, the actual decision line. The single neuron can only split the dots with a straight cut, so it always misses at least one; two layers bend around them.
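Here is a small sketch of the same contrast in code. The two-layer weights are hand-picked for illustration (not learned), a hard threshold stands in for "sigmoid above 0.5", and the grid search over single-neuron weights is a demonstration on a coarse grid, not a proof.

```python
def step(z):
    """Hard threshold: 1 if the score is positive, else 0."""
    return 1 if z > 0 else 0

def relu(z):
    return max(0.0, z)

# The four dots: diagonal corners share a class (the XOR pattern).
DOTS = {(0, 0): 0, (1, 1): 0, (0, 1): 1, (1, 0): 1}

def single_neuron(x1, x2, w1, w2, b):
    return step(w1 * x1 + w2 * x2 + b)

def two_layer(x1, x2):
    # Hidden layer: two ReLU units with hand-picked (not learned) weights.
    h1 = relu(x1 + x2)
    h2 = relu(x1 + x2 - 1)
    # Output neuron: h1 - 2*h2 equals 0, 1, 1, 0 on the four dots.
    return step(h1 - 2 * h2 - 0.5)

# Try a coarse grid of single-neuron weights and keep the best score.
grid = [i / 2 for i in range(-6, 7)]
best = max(
    sum(single_neuron(x1, x2, w1, w2, b) == label for (x1, x2), label in DOTS.items())
    for w1 in grid for w2 in grid for b in grid
)
print("best single neuron:", best, "of 4")
print("two layers:", sum(two_layer(*p) == label for p, label in DOTS.items()), "of 4")
```

No setting on the grid classifies all four dots with one neuron, while the two-layer version gets every one: the hidden units bend the boundary so the two diagonal corners end up on the same side.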
Explain
Softmax: turning scores into probabilities.
Models output raw scores (logits) that can be any number. Softmax exponentiates them (so
bigger scores get much bigger) then divides by the total, giving positive numbers that add
up to 1. It is how a model says "I'm 70% sure it's a cat, 25% a dog, 5% a fox."
\[ \text{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}} \]
In words: raise \(e\) to each score, then normalize by the sum so the outputs are probabilities that total 1.
\(z_i\): the raw score (logit) for option \(i\)
\(e^{z_i}\): exponentiation, which sharpens differences and keeps everything positive
\(\sum_j e^{z_j}\): the total over all options, the normalizer
output: a probability in \([0,1]\); all of them sum to \(1\)
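The definition above fits in a few lines of plain Python. Subtracting the largest score before exponentiating is a standard numerical-stability trick added here; it does not change the result.

```python
import math

def softmax(scores):
    """Exponentiate each score, then divide by the total."""
    # Subtracting the max first leaves the result unchanged but avoids overflow.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.0])     # three options, e.g. Cat, Dog, Fox
print([round(p, 3) for p in probs])  # [0.665, 0.245, 0.09]
print(sum(probs))
```

Every output is positive and the outputs sum to 1 (up to floating-point rounding), so they can be read as probabilities.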
Worked example
Why exponentials, not just "share of the total"?
With scores 2, 1, 0 a plain share would give 67/33/0 (and breaks on negatives). Softmax gives
roughly 67/24/9, close, but it always stays positive and reacts sharply when one score
pulls ahead. Slide Cat up to 6 and watch it dominate almost completely.
Explain
One knob on softmax: temperature.
Divide every score by a temperature \(T\) before softmax and you control how decisive the
model is. Low \(T\) (\(<1\)) sharpens: the top option dominates. High \(T\) (\(>1\)) flattens:
everything moves toward an even split. This is the exact knob a chat model turns when it samples the
next word: low for focused, factual answers; high for varied, creative ones.
\[ p_i = \frac{e^{z_i/T}}{\sum_j e^{z_j/T}} \]
In words: shrink or stretch the gaps between scores first, then run the same softmax. The bigger the gaps, the more confident the distribution.
\(T\to 0^{+}\): gaps explode; the highest score wins almost all the probability (greedy)
\(T=1\): ordinary softmax, unchanged
\(T\to\infty\): gaps vanish; the distribution approaches uniform \(1/n\)
Worked example
Same scores, three temperatures.
Take the scores \(2,\,1,\,0\) (Cat, Dog, Fox). At \(T=1\) softmax gives about
\(66.5\,/\,24.5\,/\,9.0\). Cool it to \(T=0.5\) and the scores become \(4,2,0\), so Cat jumps to
about \(86.7\%\). Heat it to \(T=2\) and the scores become \(1,\,0.5,\,0\), flattening to roughly
\(50.6\,/\,30.7\,/\,18.6\). Same evidence, very different confidence: that is temperature.
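The three cases above can be checked with a short Python sketch. The function name `softmax_with_temperature` is chosen here for illustration.

```python
import math

def softmax_with_temperature(scores, T=1.0):
    """Divide every score by T, then apply ordinary softmax."""
    if T <= 0:
        raise ValueError("T must be positive")
    scaled = [s / T for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

for T in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature([2.0, 1.0, 0.0], T)
    print(T, [round(100 * p, 1) for p in probs])
```

Running it reproduces the percentages in the example: the scores never change, only how sharply softmax separates them.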
Latest (2024 to 2026)
Where you'll meet this knob.
Every current chat model (GPT-class, Claude, Gemini, Llama) ends each step with logits,
then a softmax over the whole vocabulary, then samples the next token. The temperature
setting in an API call is exactly the \(T\) above; top-p / top-k just trim which
options stay in the running first. Reasoning-tuned models often sample at low temperature for math
and code, and higher for brainstorming. The tiny equation here is the production knob.
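A rough sketch of that sampling step, using only the Python standard library. The three-word "vocabulary" and its logits are made up for illustration, `sample_next_token` is a hypothetical helper, and real inference libraries differ in details such as the order in which they apply temperature and top-k.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, rng=random):
    """Pick one token from a {token: logit} dict using temperature and optional top-k."""
    items = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        items = items[:top_k]  # keep only the k highest-scoring tokens
    scaled = [score / temperature for _, score in items]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]  # unnormalized softmax
    tokens = [tok for tok, _ in items]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical logits for the next word; a real vocabulary has many thousands of entries.
logits = {"cat": 2.0, "dog": 1.0, "fox": 0.0}
rng = random.Random(0)
cold = [sample_next_token(logits, temperature=0.1, rng=rng) for _ in range(1000)]
hot = [sample_next_token(logits, temperature=5.0, rng=rng) for _ in range(1000)]
print(cold.count("cat"), hot.count("cat"))
```

At low temperature nearly every draw is the top word; at high temperature the three words are picked at closer to even rates.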
Explain
Scoring the guess: cross-entropy.
Softmax says what the model predicts; a loss says how wrong it was, giving a
single number to push down during training. For classification that number is cross-entropy: it looks
only at the probability the model gave the true answer and penalizes being unsure about it.
\[ L = -\sum_i y_i \log p_i = -\log p_{\text{true}} \]
In words: with a one-hot true label, every term is zero except the true class, so the loss is just the negative log of the probability you gave the right answer.
\(y_i\): the true label, \(1\) for the correct class and \(0\) elsewhere (one-hot)
\(p_i\): the softmax probability the model assigned to class \(i\)
\(-\log p_{\text{true}}\): small when the model is confident and right, large when it is unsure or wrong
Worked example
Confident and right vs. unsure.
Suppose the true class is Cat. If the model gave Cat \(p=0.665\), the loss is
\(-\log 0.665 \approx 0.41\). Nudge the scores until Cat reaches \(0.90\) and the loss drops to
\(-\log 0.90 \approx 0.11\). A perfect \(p=1\) gives loss \(0\); a badly wrong \(p=0.01\) gives
\(\approx 4.6\). Training repeatedly nudges weights to shrink this number; that is learning, in
one equation.
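The numbers in this example can be reproduced with a one-line loss function. The log is the natural log, which matches the values above; the probabilities given to the other two classes are invented here just to make each list sum to 1.

```python
import math

def cross_entropy(probs, true_index):
    """Negative log of the probability assigned to the true class."""
    return -math.log(probs[true_index])

# Cat is class 0. Only the first entry of each list comes from the example.
print(round(cross_entropy([0.665, 0.245, 0.090], 0), 2))  # 0.41
print(round(cross_entropy([0.90, 0.07, 0.03], 0), 2))     # 0.11
print(round(cross_entropy([0.01, 0.60, 0.39], 0), 2))     # 4.61
```

Notice that the loss ignores how the remaining probability is split among the wrong classes; only the share given to the true class matters.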
This is the link to the next rooms: softmax (here) feeds cross-entropy, and gradient descent shrinks it, the loop behind Deep Learning and Transformer training alike.
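To see that loop in miniature, here is a sketch that runs gradient descent on a single example. It is heavily simplified: real training adjusts a network's weights across many examples, whereas this toy version nudges the three logits directly. It relies on the standard result that, for softmax followed by cross-entropy, the gradient with respect to logit \(i\) is \(p_i - y_i\). The learning rate and step count are arbitrary choices.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train_logits(logits, true_index, lr=0.5, steps=20):
    """Gradient descent directly on the logits of one example (toy setup)."""
    losses = []
    for _ in range(steps):
        probs = softmax(logits)
        losses.append(-math.log(probs[true_index]))
        # For softmax + cross-entropy, d(loss)/d(logit_i) = p_i - y_i.
        grads = [p - (1.0 if i == true_index else 0.0) for i, p in enumerate(probs)]
        logits = [z - lr * g for z, g in zip(logits, grads)]
    return logits, losses

logits, losses = train_logits([2.0, 1.0, 0.0], true_index=0)
print(round(losses[0], 2), round(losses[-1], 2))
```

The loss starts at about 0.41 (the Cat example) and shrinks on every step as the true class's logit is pushed up and the others are pushed down.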
Level 4 · Expert
Real vocabulary, real code, and your next mission.
The formal names you'll meet in courses, papers, and code, then a challenge and the launch board.
Reference
The formal vocabulary table.
Launchpad gives the friendly story; this gives the names you'll be quizzed and interviewed on.
AI (Artificial Intelligence): systems that perform tasks normally requiring human intelligence.
ML (Machine Learning): learning patterns from data instead of hand-written rules.
Deep Learning: neural networks with many layers that learn their own features (representations).
Neuron / Perceptron: the atom, a weighted sum plus bias passed through an activation function.
CNN (Convolutional Neural Network): shares small filters over local image/grid patches.
RNN (Recurrent Neural Network): processes sequences step by step, carrying a hidden state.
Transformer: architecture built on self-attention + feed-forward blocks; processes a whole sequence in parallel.
Attention: a weighted blend in which each token decides how much to focus on every other token.
Softmax: turns a vector of scores into a probability distribution that sums to 1.
Logits: the raw, un-normalized scores a model outputs before softmax.
Robotics: embodied AI, with sensors, perception, planning, control, and actuators in a sense-think-act loop.
Embedding: a learned vector that represents a word, image patch, or item as numbers.
Code
PyTorch mini examples.
The same ideas in real code. Toggle it open and read the per-line notes underneath.
From neuron to transformer
import torch
import torch.nn as nn
# 1) A single neuron: weighted sum + bias, then an activation
neuron = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())
# 2) A tiny multilayer network (an MLP): input -> hidden -> output
mlp = nn.Sequential(
    nn.Linear(4, 8),   # 4 inputs to 8 hidden units
    nn.ReLU(),         # nonlinearity
    nn.Linear(8, 3),   # 8 hidden to 3 class scores (logits)
)
logits = mlp(torch.randn(5, 4)) # 5 examples
probs = torch.softmax(logits, dim=-1) # scores -> probabilities
# 3) A CNN layer: scans image-like grids with shared filters
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
# 4) A recurrent layer (GRU): walks a sequence carrying hidden state
rnn = nn.GRU(input_size=10, hidden_size=20, batch_first=True)
# 5) A transformer encoder layer: attention + feed-forward
layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
tokens = torch.randn(2, 5, 32) # 2 sentences, 5 tokens, 32 features
out = layer(tokens)
nn.Linear(2, 1) is literally \(\sum_i w_i x_i + b\): a weighted sum with a bias.
nn.Sigmoid / nn.ReLU are activation functions, the nonlinearities that let networks learn curves, not just straight lines.
torch.softmax converts the 3 output logits into 3 probabilities that sum to 1 (the formula from Level 3).
nn.Conv2d slides a small 3×3 filter across the image, sharing the same weights everywhere.
nn.GRU is a gated recurrent layer (a more capable variant of the plain RNN) that reads a sequence one step at a time, updating a hidden memory.
nn.TransformerEncoderLayer bundles multi-head attention, a feed-forward network, layer norm, and residual connections into one block.
Challenge
Checkpoint quiz.
No grades, no doom, just proof the map is sticking. Each answer gives instant feedback.
Launchpad quiz
Reflect
These are just for you; nothing is sent or stored anywhere.
Launch
You are ready. Pick your first room.
You've got the map and the core math. The other rooms (with what each one teaches)
are laid out in the mission guide at the top.
A good first step is Machine Learning (the big idea behind all of it).