A structured visual guide to the core components of neural networks, from individual neurons and weights to activation functions, loss functions, and optimization — grounded in hands-on exploration with TensorFlow Playground.
The diagram below shows how data flows through a neural network — entering at the input layer, being transformed through hidden layers, and producing a result at the output layer. Each circle is a neuron. Each connection carries a weight.
Raw data (e.g. coordinates, pixel values) enters the network as numeric features.
Each connection multiplies the input by a learned weight, scaling its influence.
An activation function (e.g. ReLU, Tanh) decides how strongly the neuron fires.
Signals propagate layer by layer toward the output, building richer representations.
The loss function measures how far the output is from the correct answer.
The optimizer updates weights to reduce loss, repeating until performance converges.
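To make the loop concrete, here is a minimal NumPy sketch of one training cycle, assuming a single hidden layer, Tanh activations, a mean-squared-error loss, and plain gradient descent; the layer sizes, targets, and learning rate are illustrative rather than the Playground's exact internals.

```python
import numpy as np

# Toy batch: 4 samples with 2 input features (e.g. x/y coordinates).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])            # illustrative targets

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)         # input -> hidden weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)         # hidden -> output weights
lr = 0.03                                             # learning rate (illustrative)

for step in range(1000):
    # Forward pass: weighted sums, then tanh activations, layer by layer.
    h = np.tanh(X @ W1 + b1)                          # hidden layer
    out = np.tanh(h @ W2 + b2)                        # output layer

    # Loss: mean squared error between prediction and target.
    loss = np.mean((out - y) ** 2)

    # Backward pass: gradients of the loss w.r.t. each weight (chain rule).
    d_out = 2 * (out - y) / len(X) * (1 - out ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
    d_h = d_out @ W2.T * (1 - h ** 2)
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)

    # Optimizer: plain gradient descent nudges each weight to reduce the loss.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final loss: {loss:.4f}")
```

Each iteration walks the six steps above: features enter, connections apply their weights, activations fire, signals reach the output, the loss measures the error, and the optimizer updates the weights.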
Hands-on experimentation with the Neural Network Playground revealed how design choices interact. The following are key findings from training across multiple dataset types and configurations.
The circle dataset required at least one hidden layer with 4+ neurons and Tanh activation to achieve a clean circular boundary. Linear activation completely failed to separate the classes.
The spiral dataset was the hardest, requiring 2–3 hidden layers with 6+ neurons each. The spiral pattern is highly non-linear and exposed how shallow networks plateau in performance.
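The Playground runs entirely in the browser, but these two configurations can be approximated in code. The sketch below assumes a TensorFlow/Keras environment and sigmoid binary-classification outputs; the data generation and training calls are omitted, and the exact layer widths are illustrative.

```python
import tensorflow as tf

# Circle dataset: one hidden layer of 4 tanh neurons was enough for a clean boundary.
circle_model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),                       # x and y coordinates
    tf.keras.layers.Dense(4, activation="tanh"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # binary class probability
])

# Spiral dataset: 2-3 hidden layers with 6+ neurons each were needed for the non-linearity.
spiral_model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(8, activation="tanh"),
    tf.keras.layers.Dense(8, activation="tanh"),
    tf.keras.layers.Dense(8, activation="tanh"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

for model in (circle_model, spiral_model):
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.03),
                  loss="binary_crossentropy", metrics=["accuracy"])
```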
At noise level 0, models achieved near-perfect boundaries. At noise 50, even complex networks showed high test loss, demonstrating the overfitting problem inherent in real-world data.
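The same train/test gap can be reproduced outside the Playground by generating a circle-like dataset at different noise levels. The sketch below assumes scikit-learn, using make_circles as a stand-in for the Playground's circle data and an MLPClassifier as the network; the noise values and layer sizes are illustrative.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Compare a clean circle dataset against a heavily noised one.
for noise in (0.0, 0.5):
    X, y = make_circles(n_samples=1000, noise=noise, factor=0.5, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    clf = MLPClassifier(hidden_layer_sizes=(8, 8), activation="tanh",
                        learning_rate_init=0.03, max_iter=2000, random_state=0)
    clf.fit(X_tr, y_tr)

    # With no noise both scores approach 1.0; with heavy noise the test score
    # drops and a train/test gap tends to open up: the model fits the noise.
    print(f"noise={noise}: train={clf.score(X_tr, y_tr):.2f}, "
          f"test={clf.score(X_te, y_te):.2f}")
```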
A learning rate of 0.03 provided stable, reliable convergence across datasets. Too low (0.001) made training painfully slow; too high (1.0) destabilized the loss entirely.
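The effect shows up even on a one-dimensional toy problem. The sketch below minimizes f(w) = (w - 3)² with the three learning rates mentioned above; the function itself is illustrative, not taken from the Playground.

```python
# Minimize f(w) = (w - 3)^2 with gradient descent at three learning rates.
for lr in (0.001, 0.03, 1.0):
    w = 0.0
    for _ in range(200):
        grad = 2 * (w - 3)          # derivative of (w - 3)^2
        w -= lr * grad
    print(f"lr={lr}: w={w:.3f}, loss={(w - 3) ** 2:.3f}")

# lr=0.001 is still far from the minimum after 200 steps, lr=0.03 converges
# close to w=3, and lr=1.0 overshoots on every step and never settles.
```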
Adding X², Y², and X·Y as input features dramatically helped the XOR and circle datasets — the network could express polynomial relationships without needing more layers.
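A small NumPy sketch of the same idea: augmenting raw (x, y) inputs with squared and cross terms before they reach the network. The helper name and column order are just for illustration.

```python
import numpy as np

def add_polynomial_features(X):
    """Append x^2, y^2, and x*y to an (n, 2) array of raw (x, y) inputs."""
    x, y = X[:, 0], X[:, 1]
    return np.column_stack([x, y, x ** 2, y ** 2, x * y])

X_raw = np.array([[0.5, -1.0], [1.0, 1.0]])
print(add_polynomial_features(X_raw))
# A single linear layer over these 5 features can already express circular and
# XOR-like boundaries (e.g. x*y > 0) without additional hidden layers.
```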
Deeper networks with more neurons took longer to converge but achieved better test loss on complex data. Simpler networks were faster but bottlenecked on non-linear patterns.
Neural networks are often described as "black boxes": systems that produce outputs without clear explanations. Visualizing their internal components breaks open that box. When you can see how weights shift, how activation functions shape decision boundaries, and how loss drops over time, the network stops being mysterious and starts being understandable.
This exercise reinforced that the six core components (layers, neurons, weights, activation functions, loss functions, and optimization algorithms) do not operate in isolation. They form an interdependent system where every design choice ripples through the whole. Choosing the wrong activation function can make layers useless. Setting the wrong learning rate can destabilize optimization entirely. Architecture decisions cascade.
Depth and width of the network must match the complexity of the problem; more is not always better.
Without activation functions, stacked layers collapse into a single linear transformation, losing all their power (see the sketch after this list).
The loss function is the compass. Without an accurate signal, the optimizer has no direction to improve.
Learning rate, batch size, and regularization must be balanced; no single setting is universally optimal.
Seeing decision boundaries change in real time builds the intuition that equations alone cannot provide.
High noise showed that even well-architected networks cannot overcome fundamentally noisy or insufficient data.
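The "layers collapse without activations" point above is just matrix associativity, and a few lines of NumPy make it visible; the shapes below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))             # a batch of 5 two-feature inputs
W1 = rng.normal(size=(2, 4))            # "layer 1" weights, no activation
W2 = rng.normal(size=(4, 3))            # "layer 2" weights, no activation

two_linear_layers = (x @ W1) @ W2       # stacking two purely linear layers
one_linear_layer = x @ (W1 @ W2)        # equals one layer with weights W1 @ W2

print(np.allclose(two_linear_layers, one_linear_layer))      # True: layers collapsed
print(np.allclose(np.tanh(x @ W1) @ W2, one_linear_layer))   # False: tanh breaks the collapse
```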