TD3 — Perceptron & Multi-Layer Perceptron

Objective: understand neural networks by building them from scratch in pure Python (functions only, no classes), then use scikit-learn for real applications.

Part I — The formal neuron: Perceptron (Rosenblatt, 1957)

In 1957, Frank Rosenblatt proposed the Perceptron: a supervised learning algorithm for binary classification. It is the first trainable "artificial neuron", an extremely simplified imitation of a biological neuron.

Mathematical model

A Perceptron computes a linear combination of its inputs, then applies a threshold activation function:

z = b + Σ wᵢ·xᵢ
ŷ = 1 if z ≥ 0, 0 otherwise

Exercise 1 — Implement the Perceptron functions

Write the following functions (no classes, functions only):

activation_seuil(z) — returns 1 if z ≥ 0, 0 otherwise
predire(w, b, x) — computes z = b + Σ wᵢ·xᵢ and returns activation_seuil(z)
entrainer(X, y, lr=0.1, n_iter=100) — initializes w = [0]*n and b = 0, applies the Perceptron rule to every sample at each epoch, and returns (w, b)
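
The Perceptron rule that entrainer has to apply is not written out in this sheet. In its usual textbook form, and in notation chosen here for illustration (η stands for lr, ŷ for the output of predire), it can be stated as:

```latex
\begin{aligned}
\hat{y} &= \operatorname{activation\_seuil}\Big(b + \sum_{j} w_j x_j\Big) \\
w_j &\leftarrow w_j + \eta\,(y - \hat{y})\,x_j \qquad \text{for each feature } j \\
b &\leftarrow b + \eta\,(y - \hat{y})
\end{aligned}
```

When the prediction is correct, y − ŷ = 0 and nothing changes. Otherwise y − ŷ is +1 or −1, and the weights move toward or away from the misclassified input.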

Solution

def activation_seuil(z):
    return 1 if z >= 0 else 0

def predire(w, b, x):
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return activation_seuil(z)

def entrainer(X, y, lr=0.1, n_iter=100):
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(n_iter):
        for xi, yi in zip(X, y):
            y_pred = predire(w, b, xi)
            erreur = yi - y_pred
            if erreur != 0:
                w = [wi + lr * erreur * xij for wi, xij in zip(w, xi)]
                b += lr * erreur
    return w, b

Exercise 2 — AND and OR logic gates

Test your functions on the following datasets:

# AND
X_and = [(0,0), (0,1), (1,0), (1,1)]
y_and = [0, 0, 0, 1]

# OR
X_or  = [(0,0), (0,1), (1,0), (1,1)]
y_or  = [0, 1, 1, 1]

Print the predictions and check that the Perceptron learns correctly.

Question: What are the weights and the bias after training on AND? On OR?

Solution

w, b = entrainer(X_and, y_and, lr=0.1, n_iter=10)
for xi, yi in zip(X_and, y_and):
    print(xi, yi, predire(w, b, xi))
# → (0,0) 0 0 | (0,1) 0 0 | (1,0) 0 0 | (1,1) 1 1 ✓

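print(w, b)
# One valid solution for AND is w = [0.1, 0.1], b = -0.2; the values you obtain
# depend on lr, on the order of the samples and on floating-point rounding.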
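
The solution above only covers AND. A minimal sketch that runs both gates with the same settings could look like the following; the three functions are repeated so the block runs on its own, and the names portes and resultats are helpers introduced here, not part of the exercise.

```python
# The three functions from Exercise 1, repeated so this block is self-contained
def activation_seuil(z):
    return 1 if z >= 0 else 0

def predire(w, b, x):
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return activation_seuil(z)

def entrainer(X, y, lr=0.1, n_iter=100):
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(n_iter):
        for xi, yi in zip(X, y):
            erreur = yi - predire(w, b, xi)
            if erreur != 0:
                w = [wi + lr * erreur * xij for wi, xij in zip(w, xi)]
                b += lr * erreur
    return w, b

X_gate = [(0, 0), (0, 1), (1, 0), (1, 1)]
portes = {"AND": [0, 0, 0, 1], "OR": [0, 1, 1, 1]}  # hypothetical helper dict

resultats = {}  # gate name -> (w, b, predictions)
for nom, y_gate in portes.items():
    w, b = entrainer(X_gate, y_gate, lr=0.1, n_iter=100)
    preds = [predire(w, b, xi) for xi in X_gate]
    resultats[nom] = (w, b, preds)
    print(nom, "w =", w, "b =", b, "predictions =", preds)
```

Both gates are linearly separable, so the predictions should match the labels. The weights themselves are only one of infinitely many valid separating lines.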

Exercise 3 — XOR: where the Perceptron fails

XOR (exclusive or) is defined by:

X_xor = [(0,0), (0,1), (1,0), (1,1)]
y_xor = [0, 1, 1, 0]

Train the Perceptron on XOR. What do you observe?

"XOR: the logic gate that froze artificial intelligence for 20 years."

In 1969, Marvin Minsky and Seymour Papert published Perceptrons, a book that proves mathematically the limitations of the single-layer Perceptron (it cannot solve XOR). The book contributed to the first AI winter (1970s-1980s), a period of disillusionment and massive budget cuts in AI research.

Question: Why does the Perceptron fail on XOR? (Hint: plot the points in the plane together with the decision boundary of a Perceptron.)

Solution

w, b = entrainer(X_xor, y_xor, lr=0.1, n_iter=100)
for xi, yi in zip(X_xor, y_xor):
    print(xi, yi, "→", predire(w, b, xi))
# Result: at least one of the four points is always misclassified.
# The Perceptron cannot separate XOR because the classes are not
# linearly separable. A Perceptron's decision boundary is a single straight line,
# whereas XOR needs at least two lines (a nonlinear solution).
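
To see that more training does not help, one option is to count the misclassified points for several epoch budgets. The sketch below repeats the Exercise 1 functions so that it runs on its own; compter_erreurs and erreurs_par_budget are helper names introduced here for illustration.

```python
def activation_seuil(z):
    return 1 if z >= 0 else 0

def predire(w, b, x):
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return activation_seuil(z)

def entrainer(X, y, lr=0.1, n_iter=100):
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(n_iter):
        for xi, yi in zip(X, y):
            erreur = yi - predire(w, b, xi)
            if erreur != 0:
                w = [wi + lr * erreur * xij for wi, xij in zip(w, xi)]
                b += lr * erreur
    return w, b

def compter_erreurs(w, b, X, y):
    # Hypothetical helper: number of samples whose prediction differs from the label
    return sum(1 for xi, yi in zip(X, y) if predire(w, b, xi) != yi)

X_xor = [(0, 0), (0, 1), (1, 0), (1, 1)]
y_xor = [0, 1, 1, 0]

erreurs_par_budget = {}
for n_iter in [1, 10, 100, 1000]:
    w, b = entrainer(X_xor, y_xor, lr=0.1, n_iter=n_iter)
    erreurs_par_budget[n_iter] = compter_erreurs(w, b, X_xor, y_xor)
    print(f"n_iter={n_iter:4d} -> {erreurs_par_budget[n_iter]} misclassified out of 4")
```

Whatever the budget, at least one point stays misclassified, because no single line separates the two XOR classes.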

Part II — Continuous activation functions & Gradient descent

The problem with the threshold Perceptron: its activation is not differentiable at 0 and its derivative is zero everywhere else, so the gradient cannot be used to reduce an error step by step.

Solution: replace the threshold function with a continuous, differentiable function.

The sigmoid function

σ(x) = 1 / (1 + e^(−x))

The sigmoid "squashes" any real value into the open interval (0, 1), which can be interpreted as a probability.

Gradient descent

We minimize a cost function (loss) L by moving each parameter θ in the direction opposite to the gradient, with a step size set by the learning rate lr:

θ ← θ − lr · ∂L/∂θ
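
As a minimal illustration of this update, the sketch below minimizes a toy loss L(θ) = (θ − 3)², chosen here only because its minimizer θ = 3 is known in advance. The starting point and the learning rate are arbitrary assumptions.

```python
def gradient(theta):
    # Derivative of the toy loss L(theta) = (theta - 3)**2
    return 2 * (theta - 3)

theta = 0.0   # arbitrary starting point
lr = 0.1      # learning rate
for step in range(100):
    theta = theta - lr * gradient(theta)  # step against the gradient

print(theta)  # very close to the minimizer theta = 3
```

Each step multiplies the distance to the minimum by 0.8 here. A learning rate above 1.0 would make this particular iteration diverge, which is why lr has to be tuned.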

Exercise 4 — Sigmoid and its derivative

Implement the functions sigmoid(x) and sigmoid_derivative(x).

import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

Test with x = −2, 0, 2. Check that σ'(x) = σ(x)(1−σ(x)).
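
One way to carry out this check is to compare the analytic derivative with a central finite difference. The sketch below is only one possible approach; derivee_numerique and the step h = 1e-5 are choices made here, not part of the exercise.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

def derivee_numerique(f, x, h=1e-5):
    # Hypothetical helper: central finite difference (f(x + h) - f(x - h)) / (2h)
    return (f(x + h) - f(x - h)) / (2 * h)

ecarts = []
for x in [-2, 0, 2]:
    analytique = sigmoid_derivative(x)
    numerique = derivee_numerique(sigmoid, x)
    ecarts.append(abs(analytique - numerique))
    print(f"x={x:+d}  sigmoid={sigmoid(x):.4f}  "
          f"analytic={analytique:.6f}  numeric={numerique:.6f}")
```

The two columns should agree to many decimal places, and the derivative reaches its maximum, 0.25, at x = 0.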

Exercise 5 — Sigmoid Perceptron with gradient descent

Write the following functions (still without classes):

predire_sigmoid(w, b, x) — returns σ(b + Σ wᵢ·xᵢ)
entrainer_sigmoid(X, y, lr=0.1, n_iter=1000) — batch gradient descent minimizing the MSE, returns (w, b, loss_historique)

Solution

def predire_sigmoid(w, b, x):
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return sigmoid(z)

def classer(w, b, x):
    return 1 if predire_sigmoid(w, b, x) >= 0.5 else 0

def entrainer_sigmoid(X, y, lr=0.1, n_iter=1000):
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    loss_historique = []

    for epoch in range(n_iter):
        y_pred = []
        z_vals = []
        for xi in X:
            z = b + sum(wi * xij for wi, xij in zip(w, xi))
            z_vals.append(z)
            y_pred.append(sigmoid(z))

        # Half mean squared error: the 1/2 cancels the factor 2 in the gradient
        loss = 0.5 * sum((yi - ypi)**2
                      for yi, ypi in zip(y, y_pred)) / n_samples
        loss_historique.append(loss)

        # Gradients
        dw = [0.0] * n_features
        db = 0.0
        for xi, yi, ypi, zi in zip(X, y, y_pred, z_vals):
            sig = ypi
            grad = -(yi - ypi) * sig * (1 - sig)  # dL/dz
            for j in range(n_features):
                dw[j] += grad * xi[j]
            db += grad

        w = [wj - lr * dj / n_samples for wj, dj in zip(w, dw)]
        b -= lr * db / n_samples

    return w, b, loss_historique

Test on AND, OR and XOR. The MSE should decrease from one epoch to the next (print loss_historique); if it oscillates, lower lr. Does the sigmoid Perceptron also fail on XOR?

Yes, the sigmoid Perceptron fails on XOR too! Changing the activation does not change the linear nature of the combination z = w·x + b: the decision boundary σ(z) = 0.5 is still the straight line z = 0. XOR requires an additional hidden layer.
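
The argument can be checked numerically: thresholding the sigmoid output at 0.5 gives the same class as thresholding z at 0. In the sketch below, w and b are arbitrary illustrative values (not learned), chosen so that z is never exactly 0 on the grid.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Arbitrary illustrative parameters: z = 0.5 + 2*x1 - x2 is never 0 on integer points
w, b = [2.0, -1.0], 0.5

grille = [(x1, x2) for x1 in range(-2, 3) for x2 in range(-2, 3)]
accords = []
for x in grille:
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    classe_sigmoid = 1 if sigmoid(z) >= 0.5 else 0  # rule used by classer
    classe_seuil = 1 if z >= 0 else 0               # rule used by activation_seuil
    accords.append(classe_sigmoid == classe_seuil)

print(all(accords))  # True: both rules draw the same straight boundary z = 0
```

The sigmoid changes how the model is trained and how confident its outputs are, not the shape of the region it can separate.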

Exercise 6 — 2D classification (linear boundary)

Generate a 2D toy dataset and test entrainer_sigmoid:

# Points on a grid over [0, 2] × [0, 2]; label = 1 if x + y > 2
X_lin = []
y_lin = []
for i in range(21):
    for j in range(21):
        X_lin.append((i / 10, j / 10))
        # Compare integers (i + j > 20) to avoid floating-point rounding at the boundary
        y_lin.append(1 if i + j > 20 else 0)

Train the model and print its accuracy.

Solution

w, b, loss_hist = entrainer_sigmoid(X_lin, y_lin, lr=0.1, n_iter=1000)

correct = sum(1 for xi, yi in zip(X_lin, y_lin)
              if classer(w, b, xi) == yi)
print(f"Accuracy : {correct}/{len(X_lin)} = {correct/len(X_lin):.2%}")

# In theory w[0] ≈ w[1] > 0 and b ≈ -2 * w[0]: the boundary x + y - 2 = 0
# fixes the parameters only up to a positive scale factor.
print(f"w = {w}, b = {b}")

Part III — Multi-Layer Perceptron & Backpropagation

An MLP (Multi-Layer Perceptron) stacks several layers of neurons: an input layer, one or more hidden layers, and an output layer. It is the foundational architecture of deep learning.

Architecture 2-4-1

2 inputs, one hidden layer of 4 sigmoid neurons, 1 sigmoid output neuron.

Forward pass

z₁ = W₁·x + b₁, a₁ = σ(z₁)
z₂ = W₂·a₁ + b₂, ŷ = a₂ = σ(z₂)

Backpropagation

The gradient of the error propagates from the output back to the input via the chain rule:

δ₂ = ∂L/∂z₂, ∂L/∂W₂ = δ₂·a₁ᵀ, ∂L/∂b₂ = δ₂
δ₁ = (W₂ᵀ·δ₂) ⊙ σ'(z₁), ∂L/∂W₁ = δ₁·xᵀ, ∂L/∂b₁ = δ₁

Exercise 7 — MLP: forward pass with functions

The MLP parameters are represented as Python lists:

W1: matrix of size (n_cache × n_entree), a list of lists
b1: list of size n_cache
W2: matrix of size (n_sortie × n_cache), a list of lists
b2: list of size n_sortie

# Example for a 2-4-1 architecture
n_entree, n_cache, n_sortie = 2, 4, 1

# Random initialization
import random
W1 = [[random.uniform(-1, 1) for _ in range(n_entree)]
      for _ in range(n_cache)]
b1 = [0.0] * n_cache
W2 = [[random.uniform(-1, 1) for _ in range(n_cache)]
      for _ in range(n_sortie)]
b2 = [0.0] * n_sortie

Implement the function forward(W1, b1, W2, b2, x), which returns (z1, a1, z2, a2).

Solution

def forward(W1, b1, W2, b2, x):
    # Hidden layer
    z1 = [b1[i] + sum(W1[i][j] * x[j] for j in range(len(x)))
          for i in range(len(b1))]
    a1 = [sigmoid(z) for z in z1]

    # Output layer (1 neuron)
    z2 = b2[0] + sum(W2[0][i] * a1[i] for i in range(len(a1)))
    a2 = sigmoid(z2)

    return z1, a1, z2, a2
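
Before training anything, it can help to see that a hidden layer is enough in principle. The sketch below feeds hand-picked weights (an assumption made here for illustration, not learned values) to the same forward function, using a smaller 2-2-1 network: one hidden neuron behaves like OR, the other like AND, and the output fires for "OR and not AND".

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def forward(W1, b1, W2, b2, x):
    z1 = [b1[i] + sum(W1[i][j] * x[j] for j in range(len(x)))
          for i in range(len(b1))]
    a1 = [sigmoid(z) for z in z1]
    z2 = b2[0] + sum(W2[0][i] * a1[i] for i in range(len(a1)))
    a2 = sigmoid(z2)
    return z1, a1, z2, a2

# Hand-picked parameters (illustrative assumption)
W1 = [[20.0, 20.0],   # hidden neuron 0: close to OR(x1, x2)
      [20.0, 20.0]]   # hidden neuron 1: close to AND(x1, x2)
b1 = [-10.0, -30.0]
W2 = [[20.0, -20.0]]  # output: OR and not AND
b2 = [-10.0]

sorties = []
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    _, a1, _, a2 = forward(W1, b1, W2, b2, x)
    sorties.append(round(a2))
    print(x, [round(a, 3) for a in a1], round(a2, 4))
```

The rounded outputs reproduce the XOR truth table. Backpropagation, introduced next, is a way to find such weights automatically.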

Exercise 8 — Backpropagation & training on XOR

Implement the function:

entrainer_mlp(X, y, W1, b1, W2, b2, lr=0.5, n_iter=5000)

which applies backpropagation and returns the updated parameters plus the loss history.

The magic formula for the output layer with BCE (Binary Cross-Entropy) + sigmoid: δ₂ = ŷ − y. The derivative of the sigmoid and that of the BCE combine elegantly!
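
The simplification can be derived in three lines. In the sketch below, L is the BCE for one sample and ŷ = σ(z₂); the notation follows the symbols already used in this section.

```latex
\begin{aligned}
L &= -\big[\,y \ln \hat{y} + (1 - y)\ln(1 - \hat{y})\,\big], \qquad \hat{y} = \sigma(z_2) \\
\frac{\partial L}{\partial \hat{y}} &= -\frac{y}{\hat{y}} + \frac{1 - y}{1 - \hat{y}}
  = \frac{\hat{y} - y}{\hat{y}\,(1 - \hat{y})} \\
\delta_2 = \frac{\partial L}{\partial z_2}
  &= \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z_2}
  = \frac{\hat{y} - y}{\hat{y}\,(1 - \hat{y})} \cdot \hat{y}\,(1 - \hat{y})
  = \hat{y} - y
\end{aligned}
```

The factor ŷ(1 − ŷ) coming from the sigmoid cancels the denominator of the BCE derivative, which is why no σ' term appears in the output-layer gradient.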

Solution

def entrainer_mlp(X, y, W1, b1, W2, b2, lr=0.5, n_iter=5000):
    n_samples = len(X)
    n_input = len(X[0])
    n_hidden = len(b1)
    loss_historique = []

    for epoch in range(n_iter):
        total_loss = 0.0

        for xi, yi in zip(X, y):
            z1, a1, z2, a2 = forward(W1, b1, W2, b2, xi)

            # BCE loss
            eps = 1e-15
            loss = -(yi * math.log(a2 + eps) +
                     (1 - yi) * math.log(1 - a2 + eps))
            total_loss += loss

            # --- Backward ---

            # Output layer gradients
            d_z2 = a2 - yi  # BCE + sigmoid combined
            d_W2 = [d_z2 * a1[i] for i in range(n_hidden)]
            d_b2 = d_z2

            # Hidden layer gradients
            d_a1 = [W2[0][i] * d_z2 for i in range(n_hidden)]
            d_z1 = [d_a1[i] * a1[i] * (1 - a1[i])
                    for i in range(n_hidden)]
            d_W1 = [[d_z1[i] * xi[j] for j in range(n_input)]
                     for i in range(n_hidden)]
            d_b1 = d_z1

            # Weight update
            for i in range(n_hidden):
                for j in range(n_input):
                    W1[i][j] -= lr * d_W1[i][j]
                b1[i] -= lr * d_b1[i]
            for i in range(n_hidden):
                W2[0][i] -= lr * d_W2[i]
            b2[0] -= lr * d_b2

        loss_historique.append(total_loss / n_samples)

    return W1, b1, W2, b2, loss_historique
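
A common way to gain confidence in a hand-written backward pass is a gradient check: compare the backpropagated gradient with a finite-difference estimate. The sketch below does this for W1 on a single sample. The helpers bce and gradients, the seed and the step h are assumptions introduced here; gradients reuses the same formulas as the solution above.

```python
import math
import random

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def forward(W1, b1, W2, b2, x):
    z1 = [b1[i] + sum(W1[i][j] * x[j] for j in range(len(x)))
          for i in range(len(b1))]
    a1 = [sigmoid(z) for z in z1]
    z2 = b2[0] + sum(W2[0][i] * a1[i] for i in range(len(a1)))
    return z1, a1, z2, sigmoid(z2)

def bce(a2, y, eps=1e-15):
    # Hypothetical helper: binary cross-entropy for one sample
    return -(y * math.log(a2 + eps) + (1 - y) * math.log(1 - a2 + eps))

def gradients(W1, b1, W2, b2, x, y):
    # Hypothetical helper: backprop gradients for one sample
    _, a1, _, a2 = forward(W1, b1, W2, b2, x)
    d_z2 = a2 - y
    d_W2 = [d_z2 * a for a in a1]
    d_z1 = [W2[0][i] * d_z2 * a1[i] * (1 - a1[i]) for i in range(len(a1))]
    d_W1 = [[dz * xj for xj in x] for dz in d_z1]
    return d_W1, d_W2

random.seed(0)
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(4)]
b1 = [0.0] * 4
W2 = [[random.uniform(-1, 1) for _ in range(4)]]
b2 = [0.0]
x, y = (1, 1), 0  # one XOR sample

d_W1, _ = gradients(W1, b1, W2, b2, x, y)

h = 1e-6
ecart_max = 0.0
for i in range(4):
    for j in range(2):
        original = W1[i][j]
        W1[i][j] = original + h
        loss_plus = bce(forward(W1, b1, W2, b2, x)[3], y)
        W1[i][j] = original - h
        loss_moins = bce(forward(W1, b1, W2, b2, x)[3], y)
        W1[i][j] = original  # restore the weight
        numerique = (loss_plus - loss_moins) / (2 * h)
        ecart_max = max(ecart_max, abs(numerique - d_W1[i][j]))

print(f"largest gap between backprop and finite differences: {ecart_max:.2e}")
```

A gap many orders of magnitude below the gradient values suggests the formulas are consistent. A gap of the same order as the gradients usually points to a sign or indexing error.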

Train the MLP on XOR and check that it finally succeeds where the Perceptron failed!

# Initialization
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(4)]
b1 = [0.0] * 4
W2 = [[random.uniform(-1, 1) for _ in range(4)]]
b2 = [0.0]

W1, b1, W2, b2, hist = entrainer_mlp(
    X_xor, y_xor, W1, b1, W2, b2, lr=0.5, n_iter=5000
)

print("XOR predictions:")
for xi, yi in zip(X_xor, y_xor):
    _, _, _, a2 = forward(W1, b1, W2, b2, xi)
    print(xi, yi, "→", round(a2, 4), "(" + str(1 if a2 >= 0.5 else 0) + ")")
print(f"Final loss: {hist[-1]:.6f}")
# The loss should fall to about 0.01 or less, but the result depends on the
# random initialization: rerun with other initial weights if training stalls.

"It only took 30 years and one hidden layer to defeat XOR."
(And a bit of backpropagation.)

Exercise 9 — Variation: number of hidden neurons

Test the impact of the number of neurons in the hidden layer (2, 4, 8, 16). How do the following evolve:

convergence speed?
final loss?
ability to generalize?

Hint

for n_cache in [2, 4, 8, 16]:
    W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(n_cache)]
    b1 = [0.0] * n_cache
    W2 = [[random.uniform(-1, 1) for _ in range(n_cache)]]
    b2 = [0.0]
    W1, b1, W2, b2, hist = entrainer_mlp(
        X_xor, y_xor, W1, b1, W2, b2, lr=0.5, n_iter=5000
    )
    acc = sum(1 for xi, yi in zip(X_xor, y_xor)
              if (forward(W1, b1, W2, b2, xi)[3] >= 0.5) == yi) / 4
    print(f"n_cache={n_cache:2d}, loss={hist[-1]:.6f}, acc={acc:.0%}")

Part IV — Scikit-learn: MLPClassifier

Dot notation. You have already used math.exp() and random.uniform(). Libraries expose objects and functions:

MLPClassifier() creates a classifier object
.fit(X, y) is a method of that object, which trains it
.predict(X) uses the trained object to make predictions

Exercise 10 — MLPClassifier on XOR

Use sklearn.neural_network.MLPClassifier to solve XOR.

Solution

from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(
    hidden_layer_sizes=(4,),    # 1 hidden layer, 4 neurons
    activation='logistic',      # sigmoid
    learning_rate_init=0.5,
    max_iter=5000,
    random_state=42
)
clf.fit(X_xor, y_xor)

print("sklearn predictions:")
for xi, yi in zip(X_xor, y_xor):
    print(xi, yi, "→", clf.predict([xi])[0])
print(f"Score: {clf.score(X_xor, y_xor):.0%}")
# If the score is below 100%, try another random_state or solver='lbfgs',
# which the scikit-learn documentation suggests for small datasets.

Exercise 11 — Decision boundary

Visualize the decision boundary of the MLP on a 2D mesh grid:

Solution (NumPy/Matplotlib)

import numpy as np
import matplotlib.pyplot as plt

# Mesh grid covering the four XOR points with a margin
xx, yy = np.meshgrid(np.linspace(-0.5, 1.5, 200),
                     np.linspace(-0.5, 1.5, 200))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.figure(figsize=(6, 5))
plt.contourf(xx, yy, Z, alpha=0.3, cmap='bwr')
# Class 0 in blue, class 1 in red, matching the 'bwr' colormap
plt.scatter([0,0,1,1], [0,1,0,1],
           c=['b','r','r','b'], s=200, edgecolors='k')
plt.title("MLP decision boundary on XOR")
plt.show()

Part V — Application: Handwritten digit classification

We will compare different models on the digits dataset (8×8 images of handwritten digits, 1797 samples). This kind of task is the "Hello World" of deep learning.

Exercise 12 — Perceptron vs MLP vs k-NN

Compare three classifiers:

sklearn.linear_model.Perceptron — linear (seen in TD2)
sklearn.neural_network.MLPClassifier — nonlinear
sklearn.neighbors.KNeighborsClassifier — nonlinear

Full code

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
import matplotlib.pyplot as plt
import numpy as np

# Load the data
digits = load_digits()
X, y = digits.data, digits.target        # X: 1797 × 64, y: 1797

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Models
modeles = {
    "Perceptron": Perceptron(max_iter=1000, random_state=42),
    "MLP (4,)":   MLPClassifier(hidden_layer_sizes=(4,), max_iter=1000, random_state=42),
    "MLP (16,)":  MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=42),
    "MLP (64,)":  MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=42),
    "k-NN (k=3)": KNeighborsClassifier(n_neighbors=3),
}

for nom, modele in modeles.items():
    modele.fit(X_train, y_train)
    y_pred = modele.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    print(f"{nom:12s} → accuracy = {acc:.2%}")

# Example: confusion matrix for MLP (64,)
mlp = modeles["MLP (64,)"]
y_pred_mlp = mlp.predict(X_test)
cm = confusion_matrix(y_test, y_pred_mlp)

plt.figure(figsize=(8, 6))
plt.imshow(cm, cmap='Blues', interpolation='nearest')
plt.title("Confusion matrix: MLP (64,)")
plt.colorbar()
plt.xlabel("Predicted")
plt.ylabel("True")
# Show the counts inside the cells
for i in range(10):
    for j in range(10):
        plt.text(j, i, str(cm[i, j]),
                 ha='center', va='center',
                 color='white' if cm[i, j] > cm.max()/2 else 'black')
plt.show()

Exercise 13 — Visualize the errors

Display a few images misclassified by the MLP.

Solution

# Indices of the misclassified test images (single vectorized predict call)
y_pred_test = mlp.predict(X_test)
errors = [i for i in range(len(X_test)) if y_pred_test[i] != y_test[i]]

plt.figure(figsize=(10, 4))
for idx, i in enumerate(errors[:8]):
    plt.subplot(2, 4, idx + 1)
    plt.imshow(X_test[i].reshape(8, 8), cmap='gray')
    vrai = y_test[i]
    pred = y_pred_test[i]
    plt.title(f"True={vrai}, Pred={pred}", color='red')
    plt.axis('off')
plt.tight_layout()
plt.show()

"If your MLP fails to recognize a 3, tell it that it is an 8 in disguise."

Exercise 14 (★) — Variations

Normalization: compare with StandardScaler.
Cross-validation: use cross_val_score to estimate performance.
More layers: try hidden_layer_sizes=(32, 16) (2 hidden layers).
Activation: the MLPClassifier models of Exercise 12 already use the default activation='relu'; compare with activation='logistic' (the sigmoid of Exercise 10).
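
As a starting point, several of these variations can be combined in one short script. The sketch below is one possible setup, not a reference solution: the choice of 5 folds and of combining the scaler with the (32, 16) network is an assumption made here.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)

# Putting the scaler inside the pipeline re-fits it on the training part of each fold,
# so no information from the validation fold leaks into the normalization.
modele = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16), activation='relu',
                  max_iter=1000, random_state=42),
)

scores = cross_val_score(modele, X, y, cv=5)  # accuracy on each of the 5 folds
print(f"accuracy: {scores.mean():.2%} ± {scores.std():.2%}")
```

To compare variations fairly, change one element at a time (remove the scaler, switch the activation, change the layer sizes) and keep cv and random_state fixed.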

Conclusion

Model	Strengths	Limitations
Perceptron	Simple, fast, interpretable	Linear only, fails on XOR
MLP (1 hidden layer)	Universal approximation, nonlinear	Vanishing gradient, overfitting, hard to tune
Deep Learning	Several layers = hierarchy of features	Needs a lot of data and compute
