Dayo Portfolio
Portfolio Artifact 02  ·  Applied AI

Neural Network
Visual Framework

A structured visual guide to the core components of neural networks, from individual neurons and weights to activation functions, loss functions, and optimization — grounded in hands-on exploration with TensorFlow Playground.

Deep Learning  ·  TensorFlow Playground  ·  Visual Communication  ·  AI Leadership
01 Neural Network Architecture

The diagram below shows how data flows through a neural network — entering at the input layer, being transformed through hidden layers, and producing a result at the output layer. Each circle is a neuron. Each connection carries a weight.

Input Layer (features X, Y): raw features fed into the network
Hidden Layer 1: detects basic patterns
Hidden Layer 2: combines patterns into concepts
Output Layer: final classification or prediction
STEP 01
Input

Raw data (e.g. coordinates, pixel values) enters the network as numeric features.

STEP 02
Weight × Input

Each connection multiplies the input by a learned weight, scaling its influence.

STEP 03
Activation

An activation function (e.g. ReLU, Tanh) decides how strongly the neuron fires.

STEP 04
Forward Pass

Signals propagate layer by layer toward the output, building richer representations.

STEP 05
Loss

The loss function measures how far the output is from the correct answer.

STEP 06
Backprop

The optimizer updates weights to reduce loss, repeating until performance converges.
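
The six steps above can be condensed into a short script. The following is a minimal, illustrative NumPy sketch (not the Playground's actual code): a tiny two-input network with one hidden layer, trained by plain gradient descent on a made-up toy dataset. All names and values are hypothetical.

import numpy as np

# STEP 01: raw numeric features and targets (toy XOR-style data).
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([[0.], [1.], [1.], [0.]])

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 4))    # input -> hidden weights
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))    # hidden -> output weights
b2 = np.zeros(1)
lr = 0.5                        # learning rate

for step in range(2000):
    # STEPS 02-04: weighted sums, activations, forward pass.
    h = np.tanh(X @ W1 + b1)
    y = 1 / (1 + np.exp(-(h @ W2 + b2)))      # sigmoid output

    # STEP 05: loss (mean squared error, for simplicity).
    loss = np.mean((y - t) ** 2)

    # STEP 06: backpropagate the error, then update the weights downhill.
    grad_out = 2 * (y - t) / len(X) * y * (1 - y)
    grad_W2 = h.T @ grad_out
    grad_b2 = grad_out.sum(axis=0)
    grad_hid = grad_out @ W2.T * (1 - h ** 2)  # tanh derivative
    grad_W1 = X.T @ grad_hid
    grad_b1 = grad_hid.sum(axis=0)

    W2 -= lr * grad_W2
    b2 -= lr * grad_b2
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1

print(loss, y.round(2))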

02 Core Components Defined
🧱
COMPONENT 01
Layers
A layer is a group of neurons that process information together at the same stage of the network. Every neural network has at least three types: an input layer (receives raw data), one or more hidden layers (extract and transform patterns), and an output layer (produces the final prediction or classification). Layers are the fundamental organizational unit of neural network architecture.
Playground observation: Adding more hidden layers allowed the network to solve the spiral dataset, a problem a single-layer network could not separate. Each additional layer enabled more complex decision boundaries.
Structural
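
As a concrete illustration (hypothetical code, not part of the Playground itself), the same three-part structure can be written in a few lines of Keras; the layer sizes and activations below simply mirror a typical Playground configuration.

import tensorflow as tf

# Input layer: two raw features (the X, Y coordinates).
# Hidden layers: extract and combine patterns.
# Output layer: a single probability for binary classification.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(2,)),
    tf.keras.layers.Dense(6, activation="tanh"),    # hidden layer 1
    tf.keras.layers.Dense(6, activation="tanh"),    # hidden layer 2
    tf.keras.layers.Dense(1, activation="sigmoid"), # output layer
])
model.summary()   # prints each layer and its parameter count
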
COMPONENT 02
Neurons
A neuron (also called a node or unit) is the basic computational unit of a neural network. It receives one or more numeric inputs, multiplies each by its corresponding weight, sums the results together, and passes the total through an activation function to produce an output. Neurons are loosely inspired by biological neurons in the brain, which fire electrical signals when sufficiently stimulated.
Playground observation: Increasing the number of neurons per hidden layer from 2 to 6 dramatically improved the model's ability to fit the circle dataset, reducing training loss and producing a smoother decision boundary.
Computational Unit
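
A single neuron reduces to one line of arithmetic. The sketch below (with made-up numbers, plus the standard bias term) performs exactly the operations described: weight each input, sum, then apply an activation.

import numpy as np

inputs  = np.array([0.5, -1.2, 3.0])    # signals arriving at the neuron
weights = np.array([0.8,  0.1, -0.4])   # learned connection strengths
bias    = 0.2

z = np.dot(weights, inputs) + bias      # weighted sum
output = np.tanh(z)                     # activation decides how strongly it "fires"
print(z, output)
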
⚖️
COMPONENT 03
Weights
Weights are numeric parameters assigned to each connection between neurons. They control how strongly one neuron influences another: a high weight amplifies a signal, while a low or negative weight suppresses it. Weights are what the network actually "learns": during training, the optimization algorithm continuously adjusts them to minimize prediction errors. A neural network with millions of parameters is essentially storing millions of tuned weight values.
Playground observation: The colored lines between neurons in the Playground represent weight values (blue for positive, orange for negative). As training progressed, the weights visibly shifted, and the decision boundary reshaped in response.
Learnable Parameter
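
To make "millions of tuned weight values" concrete, here is a small illustrative calculation of how many weights and biases a fully connected network stores; the layer sizes are hypothetical, chosen to match a Playground-scale setup.

# Each fully connected layer stores (inputs x neurons) weights
# plus one bias per neuron.
def dense_params(n_in, n_out):
    return n_in * n_out + n_out

layer_sizes = [2, 6, 6, 1]   # input, two hidden layers, output
total = sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))
print(total)   # 67 learnable parameters for this tiny network;
               # the same formula at modern-model scale yields millions.
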
🔀
COMPONENT 04
Activation Functions
An activation function is a mathematical formula applied to the output of each neuron. Without it, every layer would just be a linear transformation, and stacking them would have no benefit; the whole network would behave like a single layer. Activation functions introduce non-linearity, allowing the network to learn complex, curved decision boundaries. Common choices include ReLU (Rectified Linear Unit), Tanh (hyperbolic tangent), Sigmoid, and Linear.
Playground observation: Switching from Linear to Tanh activation on the circle dataset caused the model to converge much faster and achieve a cleaner circular boundary; linear activation failed to separate the classes at all.
Non-linearity
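
The common activation functions named above are one-liners. The illustrative sketch below defines them and also shows why a purely linear "activation" adds nothing: two stacked linear layers collapse into a single linear map.

import numpy as np

def relu(x):    return np.maximum(0, x)          # clips negatives to zero
def sigmoid(x): return 1 / (1 + np.exp(-x))      # squashes into (0, 1)
tanh = np.tanh                                   # squashes into (-1, 1)

x = np.linspace(-3, 3, 7)
print(relu(x), tanh(x), sigmoid(x))

# Why non-linearity matters: composing linear layers is still linear.
W1, W2 = np.array([[2.0]]), np.array([[0.5]])
v = np.array([[3.0]])
print(v @ W1 @ W2)       # two "linear" layers...
print(v @ (W1 @ W2))     # ...behave exactly like one layer with weights W1 @ W2
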
📉
COMPONENT 05
Loss Functions
A loss function (also called a cost function) measures the difference between the network's predictions and the correct answers. It produces a single number, the "loss," that summarizes how wrong the model is. The goal of training is to minimize this number. For classification tasks, Cross-Entropy Loss is standard. For regression tasks, Mean Squared Error (MSE) is most common. The loss function is the signal that guides the entire learning process.
Playground observation: The test loss score displayed in the Playground dropped steadily as training progressed. When noise was increased to 50, the gap between training loss and test loss widened, a classic sign of overfitting.
Error Measurement
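
Both loss functions named above can be written directly from their definitions; the sketch below uses made-up predictions and targets purely for illustration.

import numpy as np

y_true = np.array([1.0, 0.0, 1.0, 1.0])     # correct answers
y_pred = np.array([0.9, 0.2, 0.6, 0.8])     # network outputs (probabilities)

# Mean Squared Error: standard for regression.
mse = np.mean((y_true - y_pred) ** 2)

# Binary Cross-Entropy: standard for classification.
eps = 1e-12   # avoids log(0)
bce = -np.mean(y_true * np.log(y_pred + eps)
               + (1 - y_true) * np.log(1 - y_pred + eps))

print(mse, bce)   # each is a single number summarizing how wrong the model is
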
🎯
COMPONENT 06
Optimization Algorithms
An optimization algorithm uses the loss function's output to update the network's weights in the direction that reduces error. The most fundamental optimizer is Gradient Descent, which calculates the slope of the loss and nudges weights downhill. Modern variants like Stochastic Gradient Descent (SGD) and Adam (Adaptive Moment Estimation) improve on this by adjusting learning rates automatically and processing data in mini-batches for efficiency. The learning rate controls how big each update step is.
Playground observation: Adjusting the learning rate from 0.001 to 0.1 caused the model to train much faster, but setting it too high (1.0) caused the loss to spike and the model to become unstable, overshooting the optimal weights.
Weight Updater
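
The core update rule is a single line: move each weight a small step downhill along the loss gradient. The sketch below (a hypothetical one-dimensional example) shows plain gradient descent minimizing a simple quadratic loss.

# Minimize loss(w) = (w - 3)^2 with plain gradient descent.
def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)   # slope of the loss at w

w = 0.0
learning_rate = 0.1          # controls how big each update step is
for step in range(50):
    w -= learning_rate * grad(w)   # nudge the weight downhill

print(w, loss(w))   # w converges toward the optimum at 3.0
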
03 TensorFlow Playground Observations

Hands-on experimentation with the Neural Network Playground revealed how design choices interact. The following are key findings from training across multiple dataset types and configurations.

📊 Dataset: Circle

Required at least 1 hidden layer with 4+ neurons and Tanh activation to achieve a clean circular boundary. Linear activation completely failed to separate the classes.

🌀 Dataset: Spiral

The hardest dataset. Required 2–3 hidden layers with 6+ neurons each. The spiral pattern is highly non-linear and exposed how shallow networks plateau in performance.

📡 Effect of Noise

At noise level 0, models achieved near-perfect boundaries. At noise 50, even complex networks showed high test loss, demonstrating the overfitting problem in real data.

Learning Rate Impact

A learning rate of 0.03 provided stable, reliable convergence across datasets. Too low (0.001) made training painfully slow; too high (1.0) destabilized the loss entirely.
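
The instability at high learning rates can be reproduced even on a toy one-dimensional loss. In this illustrative sketch (hypothetical values, not the Playground's), a small step size converges while an oversized one overshoots the minimum and diverges.

def grad(w):                      # gradient of loss(w) = w^2
    return 2.0 * w

for lr in (0.03, 1.1):            # stable step size vs. one that is too large
    w = 5.0
    for _ in range(20):
        w -= lr * grad(w)
    print(lr, w)                  # 0.03 creeps toward 0; 1.1 blows up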

🔢 Feature Engineering

Adding X², Y², and X·Y as input features dramatically helped the XOR and circle datasets — the network could express polynomial relationships without needing more layers.
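
This is straightforward to reproduce outside the Playground: engineered features are just extra columns computed from the raw inputs. A minimal sketch with hypothetical data follows.

import numpy as np

# Raw inputs: each row is a point (x, y).
points = np.array([[ 0.5,  1.0],
                   [-1.0,  2.0],
                   [ 2.0, -0.5]])

x, y = points[:, 0], points[:, 1]

# Engineered features: the network receives x, y, x^2, y^2 and x*y,
# so it can express polynomial decision boundaries without extra layers.
features = np.column_stack([x, y, x**2, y**2, x * y])
print(features)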

⏱️ Training Time vs. Complexity

Deeper networks with more neurons took longer to converge but achieved better test loss on complex data. Simpler networks were faster but bottlenecked on non-linear patterns.

04 Summary
Key Insights

Why Visualizing Neural Networks Matters

Neural networks are often described as "black boxes": systems that produce outputs without clear explanations. Visualizing their internal components breaks open that box. When you can see how weights shift, how activation functions shape decision boundaries, and how loss drops over time, the network stops being mysterious and starts being understandable.

This exercise reinforced that the six core components (layers, neurons, weights, activation functions, loss functions, and optimization algorithms) do not operate in isolation. They form an interdependent system where every design choice ripples through the whole. Choosing the wrong activation function can make layers useless. Setting the wrong learning rate can destabilize optimization entirely. Architecture decisions cascade.

🏗️
Architecture matters

Depth and width of the network must match the complexity of the problem; more is not always better.

🔀
Non-linearity is essential

Without activation functions, stacked layers collapse into a single linear transformation, losing all their power.

📉
Loss guides learning

The loss function is the compass. Without an accurate signal, the optimizer has no direction to improve.

⚖️
Balance is key

Learning rate, batch size, and regularization must be balanced; no single setting is universally optimal.

🎯
Visualization builds intuition

Seeing decision boundaries change in real time builds the intuition that equations alone cannot provide.

🌊
Data quality limits models

High noise showed that even well-architected networks cannot overcome fundamentally noisy or insufficient data.