At least our ETA is better than M$.
http://xkcd.com/612/
Summary: optimizing an algorithm using Treap data structure and CRC32 properties.
After reverse-engineering the binary, we can write the following pseudocode in python:
from binascii import crc32 def lcg_step(): global lcg lcg = (0x5851F42D4C957F2D * lcg + 0x14057B7EF767814F) % 2**64 return lcg def extract(val): res = 32 + val - 95 * (( ((val - (0x58ED2308158ED231 * val >> 64)) >> 1) + (0x58ED2308158ED231 * val >> 64)) >> 6) return chr(res & 0xff) buf = [] lcg = 8323979853562951413 crc = 0 for i in xrange(31415926): # append a symbol c = extract( lcg_step() ) buf.append(c) # reverse interval x = lcg_step() % len(buf) y = lcg_step() % len(buf) l, r = min(x, y), max(x, y) buf[l:r+1] = buf[l:r+1][::-1] # update crc crc = crc32("".join(buf), crc) % 2**32 array = [...] # from binary flag = "" for i in range(len(array)): if buf[array[i]] == "}": flag += "%08x" % crc flag += buf[array[i]] |
The binary generates a large array using an LCG PRNG, reverses subarrays defined by PRNG and updates the CRC of the whole state after each iteration. There are 31 million iterations total, and straightforward reversing subarrays and computing CRC32 will take quadratic time so this is going to be infeasible. We have to come up with better algorithm.
One of the data structures which can quickly reverse intervals is the Treap which is also known as randomized binary search tree. Since it is basically a binary search tree, it can easily be modified to maintain and update various sums on intervals. Since CRC32 is not a simple sum, it requires some special care. I took the basic implementation of Treap from here (the code in the end of the article).
A good thing about Treap is that it allows to quickly “extract” a node which corresponds to any interval of the array. In conjunction with lazy propagation, it allows to do cool things. For example, to reverse an interval we “extract” the node corresponding to that interval and set a “rev” flag. If then later we visit this node for some reason, the reversal is “pushed down”: two children of the node are swapped and each children’s “rev” flag is flipped. In such lazy way we will do logarithmic number of operations per each reversal on average.
The main problem here is to update the CRC32 state with the whole array after each reversal. We need to teach our Treap to compute CRC32 even after performing reversals.
Note that the CRC32 value is added to the flag only in the end, thus, using this basic Treap, we can already compute the final string and extract the first part of the flag in a few minutes:
hitcon{super fast reversing and CRC32 – [FINAL CRC HERE]}
Unfortunately, we HAVE to compute the CRC to get the points. Let’s do it!
About CRC32
The CRC32 without the initial and the final xors with 0xffffffff (let’s call it $rawCRC32$) is simply multiplication modulo an irreducible polynomial over $GF(2)$:
$$rawCRC32(m) = m \times x^{32} \mod P(x),$$
where
$$\begin{split}
P(X) & = & X^{32}+X^{26}+X^{23}+X^{22}+X^{16}+X^{12}+X^{11}+ \\
& + & X^{10}+X^8+X^7+X^5+X^4+X^2+X+1.
\end{split}$$
Such polynomials are nicely stored in 32-bit words. For example, $P(x) = \mathtt{0xEDB88320}$ (MSB are lowest degree terms). Multiplications are done in a way similar to fast exponentiation (see Finite field arithmetic).
A good thing is that $rawCRC32(m)$ is linear:
$$rawCRC32(a \oplus b) = rawCRC32(a) \oplus rawCRC32(b).$$
Shifting the message left by one bit is equivalent to multiplying it by $x$. Therefore, for concatenation we get:
$$rawCRC32(a || b) = rawCRC32(a) \times x^{|b|} \oplus rawCRC32(b).$$
Using this formula allows us to combine CRC values of two large strings quite quickly. Computing $x^{|y|}$ can be done using fast exponentiation or simply precomputed.
Adding CRC32 into the Treap
Let’s store in each tree node the $rawCRC32$ of the corresponding segment and, additionally, $rawCRC32$ of the reversed segment. Then, depending on the “rev” flag we may retrieve one or the other value. When we “push” the lazy reversal down, we simply swap the two values. The main part is then in computing the two $rawCRC$ values of a node using values of its child nodes. This is quite easy to code using the concatenation formula given before. The formula is also useful when merging the CRCs of the consequtive states.
Here is the CRC-related code:
uint32_t POLY = 0xedb88320L; uint32_t HI = 1u << 31; uint32_t LO = 1; uint32_t ONEBYTE = (1u << 31) >> 8; uint32_t BYTE_CRC[256]; uint32_t SHIFT_BYTES[100 * 1000 * 1000]; inline uint32_t poly_mul(uint32_t a, uint32_t b) { uint32_t p = 0; while (b) { if (b & HI) p ^= a; b <<= 1; if (a & LO) a = (a >> 1) ^ POLY; else a >>= 1; } return p; } void precompute() { SHIFT_BYTES[0] = HI; // 1 FOR(i, 1, 100 * 1000 * 1000) { SHIFT_BYTES[i] = poly_mul(SHIFT_BYTES[i-1], ONEBYTE); } FORN(c, 256) { BYTE_CRC[c] = poly_mul(c, ONEBYTE); } } inline uint32_t lift(uint32_t crc, LL num) { return poly_mul(crc, SHIFT_BYTES[num]); } |
And here is modification of the Treap related to the CRC:
inline uint32_t crc1(pitem it) { if (!it) return 0; if (it->rev) return it->crc_backward; return it->crc_forward; } inline uint32_t crc2(pitem it) { if (!it) return 0; if (it->rev) return it->crc_forward; return it->crc_backward; } inline void update_up (pitem it) { if (it) { it->cnt = cnt(it->l) + cnt(it->r) + 1; int left_size = cnt(it->l); int right_size = cnt(it->r); uint32_t cl, cr, cmid; cmid = BYTE_CRC[it->value]; cl = crc1(it->l); cr = crc1(it->r); it->crc_forward = lift(cl, right_size + 1) ^ lift(cmid, right_size) ^ cr; cl = crc2(it->l); cr = crc2(it->r); it->crc_backward = cl ^ lift(cmid, left_size) ^ lift(cr, left_size + 1); } } inline void push (pitem it) { if (it && it->rev) { swap(it->crc_forward, it->crc_backward); it->rev = false; swap (it->l, it->r); if (it->l) it->l->rev ^= true; if (it->r) it->r->rev ^= true; } } |
The full solution is available here.
This code works for ~40 minutes on my laptop and produces the final CRC: d72a4529.
The flag then is hitcon{super fast reversing and CRC32 – d72a4529}.