attack CNN? - CTF Challenge Writeup

Challenge Information

  • Name: attack CNN?
  • Points: 10
  • Category: Misc
  • Objective: Perform an adversarial attack against two provided YOLO (You Only Look Once) models. Craft an image such that the two models produce significantly different predictions, demonstrating an understanding of adversarial machine learning techniques.

Solution

1. Understanding the Goal

  • Given two object detection models:

    • yolo_v8.pt
    • yolo_v10.pt
  • The objective is to generate a single adversarial image that causes the models to output different results under the following conditions:

    different_prediction = result_v8["class_name"] != result_v10["class_name"]
    confidence_gap = abs(result_v8["confidence"] - result_v10["confidence"]) >= 0.4
    
  • Both conditions must be satisfied to consider the attack successful.
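
  • As a sketch, the check can also be wrapped in a small helper (the helper name is illustrative). Note that the formula above uses the key class_name, while the solution script further below stores the class under the key class.

    def meets_ctf_condition(result_v8, result_v10, min_gap=0.4):
        # True when the two models disagree on the top class AND their
        # top confidences differ by at least min_gap (0.4 in this challenge).
        different_prediction = result_v8["class_name"] != result_v10["class_name"]
        confidence_gap = abs(result_v8["confidence"] - result_v10["confidence"]) >= min_gap
        return different_prediction and confidence_gap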


2. Choosing the Attack Method

  • This is a classic adversarial machine learning task: craft a small perturbation of the input image that changes a deep neural network's output.

  • Two common attacks were considered:

    • FGSM (Fast Gradient Sign Method) – a single step along the sign of the gradient; fast and simple
    • PGD (Projected Gradient Descent) – iterated FGSM steps projected back into an epsilon-ball around the original image; stronger but slower
  • For this challenge, a PGD-style approach was used for finer control over the perturbation budget and a higher success rate; a generic PGD update loop is sketched below for reference.
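
  • For reference, a minimal sketch of the textbook PGD loop against a differentiable loss is shown below; model and loss_fn are illustrative placeholders, not the challenge's YOLO weights (the solution script further below does not backpropagate through the detectors at all).

    import torch

    def pgd_attack(model, loss_fn, image, target, epsilon=0.03, alpha=0.005, iters=40):
        # Iterated FGSM: step along the sign of the gradient, then project the
        # perturbation back into the L-infinity epsilon-ball around the original.
        orig = image.clone().detach()
        adv = image.clone().detach()
        for _ in range(iters):
            adv = adv.clone().detach().requires_grad_(True)
            loss = loss_fn(model(adv), target)  # gradient ascent on the loss (untargeted attack)
            loss.backward()
            adv = adv + alpha * adv.grad.sign()
            delta = torch.clamp(adv - orig, min=-epsilon, max=epsilon)
            adv = torch.clamp(orig + delta, 0, 1).detach()
        return adv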


3. Executing the Attack

  • Loaded both YOLO models using PyTorch and the Ultralytics YOLO API.

  • Started from a base image and applied PGD-style perturbations (random restarts plus sign-gradient steps on a dummy loss, as implemented in the script below), modifying the image iteratively to:

    • Change the prediction class name

    • Create a ≥ 0.4 confidence gap

      import torch
      import cv2
      import numpy as np
      import os
      from ultralytics import YOLO
      
      device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
      
      model_v8 = YOLO("yolo_v8.pt").to(device)
      model_v10 = YOLO("yolo_v10.pt").to(device)
      
      def load_image(image_path, size=640):
          # Read with OpenCV, resize to the model input size, convert BGR -> RGB,
          # scale to [0, 1], and return a 1x3xHxW float tensor on the target device.
          img = cv2.imread(image_path)
          img = cv2.resize(img, (size, size))
          img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
          img = img / 255.0
          img = torch.tensor(img, dtype=torch.float32).permute(2, 0, 1).unsqueeze(0)
          return img.to(device)
      
      @torch.no_grad()
      def get_top_prediction(model, image_tensor):
          # Return the highest-confidence detection as {"class": <id string>, "confidence": <float>}.
          results = model.predict(image_tensor, verbose=False)
          boxes = results[0].boxes
          if boxes is None or boxes.cls.shape[0] == 0:
              return {"class": "None", "confidence": 0.0}
          confs = boxes.conf.cpu().numpy()
          classes = boxes.cls.cpu().numpy()
          top = confs.argmax()
          return {"class": str(int(classes[top])), "confidence": float(confs[top])}
      
      def add_random_start(image, epsilon):
          # Random restart: jitter the starting point inside the epsilon-ball.
          noise = torch.empty_like(image).uniform_(-epsilon, epsilon)
          return torch.clamp(image + noise, 0, 1)

      def pgd_with_dummy_grad(image, epsilon, alpha, iters):
          # PGD-style update loop, but the gradient comes from a dummy loss (the
          # image mean) rather than from either model, so its sign is constant and
          # each step pushes the pixels uniformly before projecting back into the
          # L-infinity epsilon-ball around the original image.
          ori = image.clone().detach()
          for _ in range(iters):
              image = image.clone().detach().requires_grad_(True)
              dummy_loss = image.mean()
              dummy_loss.backward()
              grad = image.grad
              image = image + alpha * grad.sign()
              delta = torch.clamp(image - ori, min=-epsilon, max=epsilon)
              image = torch.clamp(ori + delta, 0, 1).detach()
          return image
      
      def run_pgd_ctf(image_path, epsilon=0.03, alpha=0.005, iters=40, max_restarts=20, out_dir="pgd_ctf_out"):
          # Retry from fresh random starts until the two models disagree on the
          # class with a confidence gap of at least 0.4, then save that image.
          os.makedirs(out_dir, exist_ok=True)
          base = load_image(image_path)
          for i in range(max_restarts):
              start = add_random_start(base.clone(), epsilon)
              adv = pgd_with_dummy_grad(start, epsilon, alpha, iters)
              result_v10 = get_top_prediction(model_v10, adv)
              result_v8 = get_top_prediction(model_v8, adv)
              print(f"[{i}] v10: {result_v10}, v8: {result_v8}")
              if result_v8["class"] != result_v10["class"] and abs(result_v8["confidence"] - result_v10["confidence"]) >= 0.4:
                  img = adv.squeeze().permute(1, 2, 0).cpu().numpy() * 255
                  img = img.astype(np.uint8)
                  img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
                  cv2.imwrite(f"{out_dir}/pgd_success_{i}.png", img)
                  print("✅ CTF condition met.")
                  return
          print("❌ CTF condition not met after max_restarts.")
      
      if __name__ == "__main__":
          run_pgd_ctf("car.png", epsilon=0.03, alpha=0.005, iters=40)
      
  • The attack succeeded once a perturbed image made the two models predict different classes with a confidence gap of at least 0.4, satisfying the provided formula. The saved image can be re-verified against both models, as in the snippet below.
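
  • As a quick sanity check, a sketch along these lines re-loads the saved image and re-evaluates the condition using the models and helpers defined in the script above (the filename pattern pgd_success_<i>.png comes from that script; saving to an 8-bit PNG quantizes the perturbation, so it is worth re-checking the file on disk rather than only the in-memory tensor):

      # Re-run both models on the saved adversarial image and re-check the condition.
      adv = load_image("pgd_ctf_out/pgd_success_0.png")  # index = whichever restart succeeded
      result_v8 = get_top_prediction(model_v8, adv)
      result_v10 = get_top_prediction(model_v10, adv)
      print(result_v8, result_v10)
      print("condition met:",
            result_v8["class"] != result_v10["class"]
            and abs(result_v8["confidence"] - result_v10["confidence"]) >= 0.4)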

Flag

The flag for this challenge is: NHNC{you_kn0w_h0w_t0_d0_adv3rs3ria1_attack}


Summary

The “attack CNN?” challenge introduces players to the field of adversarial AI, focusing on image-based attacks against neural networks. By crafting subtle perturbations, players learn how seemingly minor changes can cause significant shifts in deep learning model outputs — an essential skill in both AI security and red-teaming contexts.

