attack CNN? - CTF Challenge Writeup
Challenge Information
- Name: attack CNN?
- Points: 10
- Category: Misc
- Objective: Perform an adversarial attack against two provided YOLO (You Only Look Once) models. Craft an image such that the two models produce significantly different predictions, demonstrating an understanding of adversarial machine learning techniques.
Solution
1. Understanding the Goal
The challenge provides two object detection models:
- yolo_v8.pt
- yolo_v10.pt
The objective is to generate a single adversarial image for which the two models' outputs satisfy both of the following conditions:
```python
different_prediction = result_v8["class_name"] != result_v10["class_name"]
confidence_gap = abs(result_v8["confidence"] - result_v10["confidence"]) >= 0.4
```
Both conditions must be satisfied to consider the attack successful.
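For clarity, the same check can be wrapped in a small helper. This is a minimal sketch; the key names `class_name` and `confidence` mirror the condition above rather than the official checker's exact implementation:

```python
def meets_ctf_condition(result_v8: dict, result_v10: dict) -> bool:
    """True when the models disagree on the class AND their confidences differ by >= 0.4."""
    different_prediction = result_v8["class_name"] != result_v10["class_name"]
    confidence_gap = abs(result_v8["confidence"] - result_v10["confidence"]) >= 0.4
    return different_prediction and confidence_gap
```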
2. Choosing the Attack Method
This is a classic adversarial machine learning problem: crafting input perturbations that change the outputs of deep neural networks.
Two common attacks were considered:
- FGSM (Fast Gradient Sign Method) – a single gradient-sign step; fast and simple
- PGD (Projected Gradient Descent) – an iterative, projected variant of FGSM; slower but generally more effective
For this challenge, PGD was used for its finer control over the perturbation and its higher success rate; a generic PGD loop is sketched below for reference.
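The sketch below shows a textbook untargeted L-infinity PGD loop against an ordinary differentiable model. It is a generic illustration only: `model` and `loss_fn` are assumed to be a standard PyTorch module and loss, and it is not the exact code used for this challenge (the final script below sidesteps the model gradient entirely).

```python
import torch

def pgd_attack(model, loss_fn, x, y, epsilon=0.03, alpha=0.005, iters=40):
    """Untargeted L-infinity PGD: signed-gradient ascent on the loss, projected
    back into the epsilon-ball around the original input after every step."""
    x_orig = x.clone().detach()
    # Random start inside the epsilon-ball.
    x_adv = x_orig + torch.empty_like(x_orig).uniform_(-epsilon, epsilon)
    x_adv = torch.clamp(x_adv, 0, 1).detach()

    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the epsilon-ball and [0, 1].
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.clamp(x_adv, x_orig - epsilon, x_orig + epsilon)
        x_adv = torch.clamp(x_adv, 0, 1)
    return x_adv.detach()

# Example use (hypothetical classifier and batch):
# x_adv = pgd_attack(classifier, torch.nn.functional.cross_entropy, images, labels)
```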
3. Executing the Attack
The attack was executed as follows:
- Both YOLO models were loaded with PyTorch and the Ultralytics API.
- Starting from a base image, PGD-style perturbations were applied iteratively until the two models reported different class names and their top confidences differed by at least 0.4.
The full script is shown below.
```python
import torch
import cv2
import numpy as np
import os
from ultralytics import YOLO

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_v8 = YOLO("yolo_v8.pt").to(device)
model_v10 = YOLO("yolo_v10.pt").to(device)


def load_image(image_path, size=640):
    """Load an image as a normalized (1, 3, size, size) float tensor in RGB order."""
    img = cv2.imread(image_path)
    img = cv2.resize(img, (size, size))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = img / 255.0
    img = torch.tensor(img, dtype=torch.float32).permute(2, 0, 1).unsqueeze(0)
    return img.to(device)


@torch.no_grad()
def get_top_prediction(model, image_tensor):
    """Return the class id and confidence of the highest-confidence detection."""
    results = model.predict(image_tensor, verbose=False)
    boxes = results[0].boxes
    if boxes is None or boxes.cls.shape[0] == 0:
        return {"class": "None", "confidence": 0.0}
    confs = boxes.conf.cpu().numpy()
    classes = boxes.cls.cpu().numpy()
    top = confs.argmax()
    return {"class": str(int(classes[top])), "confidence": float(confs[top])}


def add_random_start(image, epsilon):
    """Random restart: jump to a random point inside the epsilon-ball around the image."""
    noise = torch.empty_like(image).uniform_(-epsilon, epsilon)
    return torch.clamp(image + noise, 0, 1)


def pgd_with_dummy_grad(image, epsilon, alpha, iters):
    """PGD-style loop driven by a dummy loss (image.mean()).

    The gradient of the mean is a positive constant, so grad.sign() is all ones
    and each step uniformly brightens the image before projecting back into the
    epsilon-ball; the random restarts in run_pgd_ctf provide the actual variation.
    """
    ori = image.clone().detach()
    for _ in range(iters):
        image = image.clone().detach().requires_grad_(True)
        dummy_loss = image.mean()
        dummy_loss.backward()
        grad = image.grad
        image = image + alpha * grad.sign()
        # Project back into the L-infinity epsilon-ball and the valid pixel range.
        delta = torch.clamp(image - ori, min=-epsilon, max=epsilon)
        image = torch.clamp(ori + delta, 0, 1).detach()
    return image


def run_pgd_ctf(image_path, epsilon=0.03, alpha=0.005, iters=40, max_restarts=20, out_dir="pgd_ctf_out"):
    """Run the attack with random restarts until the CTF condition is met."""
    os.makedirs(out_dir, exist_ok=True)
    base = load_image(image_path)
    for i in range(max_restarts):
        start = add_random_start(base.clone(), epsilon)
        adv = pgd_with_dummy_grad(start, epsilon, alpha, iters)
        result_v10 = get_top_prediction(model_v10, adv)
        result_v8 = get_top_prediction(model_v8, adv)
        print(f"[{i}] v10: {result_v10}, v8: {result_v8}")
        if result_v8["class"] != result_v10["class"] and abs(result_v8["confidence"] - result_v10["confidence"]) >= 0.4:
            # Save the successful adversarial image back to disk as uint8 BGR.
            img = adv.squeeze().permute(1, 2, 0).cpu().numpy() * 255
            img = img.astype(np.uint8)
            img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
            cv2.imwrite(f"{out_dir}/pgd_success_{i}.png", img)
            print("✅ CTF condition met.")
            return
    print("❌ CTF condition not met after max_restarts.")


if __name__ == "__main__":
    run_pgd_ctf("car.png", epsilon=0.03, alpha=0.005, iters=40)
```
The attack succeeded once the perturbed image caused the two models to disagree on the predicted class while their confidences differed by at least 0.4, satisfying both checker conditions.
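To double-check a candidate image, the saved PNG can be fed back through both models. This is a minimal sketch that assumes the definitions from the script above (`load_image`, `get_top_prediction`, `model_v8`, `model_v10`) are in scope, and uses `pgd_success_0.png` purely as an example output filename:

```python
# Re-run both models on a saved adversarial image and re-evaluate the CTF conditions.
adv = load_image("pgd_ctf_out/pgd_success_0.png")  # example output path
r8 = get_top_prediction(model_v8, adv)
r10 = get_top_prediction(model_v10, adv)
print("v8:", r8, "| v10:", r10)
print("different class:", r8["class"] != r10["class"])
print("confidence gap >= 0.4:", abs(r8["confidence"] - r10["confidence"]) >= 0.4)
```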
Flag
The flag for this challenge is: NHNC{you_kn0w_h0w_t0_d0_adv3rs3ria1_attack}
Summary
The “attack CNN?” challenge introduces players to the field of adversarial AI, focusing on image-based attacks against neural networks. By crafting subtle perturbations, players learn how seemingly minor changes can cause significant shifts in deep learning model outputs — an essential skill in both AI security and red-teaming contexts.