Formats

Various data formats have their own quirks.

A couple of things to look out for:

  1. Is the data validated before or after unicode normalization?

  2. Is any of the data part of a turing complete configuration language?

CSV

Even something this simple has multiple dialects and varieties. Mess around with whitespace and separators and quotes.

IP address

Twitter thread Another twitter thread Blog post Another Blog post Tool

$ ping 0177.1
$ ping 134744072
$ ping 0x8080808
$ ping 010.0x0000008.00000010.8
$ ping 8.0x0000000000000080808
$ ping 192.168.36095
$ ping 192.11046143
$ ping 0000000001.0000000002.0000000003.000000004

There are many less-common IP address formats. Try them.

JSON

Tell me more

Different parsers deal with special cases differently. Perhaps if an app uses one library for parsing and a different one for parsing+validation, you can bypass validation by duplicating keys?

Magic Bytes

List ordered by magic

PDF

Insecure PDF features

Polyglots

Tell me more Mitra

They can use polyglots to hide their code. We can use polyglots to bypass type checks.

Yes, officer, that's a GIF I'm uploading. Oh, but it's also PHP code.

Regular Expressions

Tool

Ranges

[A-z] includes more than just letters.

Terminals

If, for some reason, your payload is inspected in an actual real-life terminal (-emulator), you may want to try Terminal Escape Injection. Include escapes in your payload such that a terminal will overwrite the sensitive text with benign-looking data.

URL filtering

Example If a regex is used to split the host, perhaps URL parsing can be fooled. Real parsers take everything before an @ as a username.

Unicode

Normalization

Tell me more List Tool Tool Tool

① ② ③ ④ ⑤ ⑥ ⑦ ⑧ ⑨ ⑩ ⑪ ⑫ ⑬ ⑭ ⑮ ⑯ ⑰ ⑱ ⑲ ⑳ 
⑴ ⑵ ⑶ ⑷ ⑸ ⑹ ⑺ ⑻ ⑼ ⑽ ⑾ ⑿ ⒀ ⒁ ⒂ ⒃ ⒄ ⒅ ⒆ ⒇ 
⒈ ⒉ ⒊ ⒋ ⒌ ⒍ ⒎ ⒏ ⒐ ⒑ ⒒ ⒓ ⒔ ⒕ ⒖ ⒗ ⒘ ⒙ ⒚ ⒛ 
⒜ ⒝ ⒞ ⒟ ⒠ ⒡ ⒢ ⒣ ⒤ ⒥ ⒦ ⒧ ⒨ ⒩ ⒪ ⒫ ⒬ ⒭ ⒮ ⒯ ⒰ ⒱ ⒲ ⒳ ⒴ ⒵ 
Ⓐ Ⓑ Ⓒ Ⓓ Ⓔ Ⓕ Ⓖ Ⓗ Ⓘ Ⓙ Ⓚ Ⓛ Ⓜ Ⓝ Ⓞ Ⓟ Ⓠ Ⓡ Ⓢ Ⓣ Ⓤ Ⓥ Ⓦ Ⓧ Ⓨ Ⓩ 
ⓐ ⓑ ⓒ ⓓ ⓔ ⓕ ⓖ ⓗ ⓘ ⓙ ⓚ ⓛ ⓜ ⓝ ⓞ ⓟ ⓠ ⓡ ⓢ ⓣ ⓤ ⓥ ⓦ ⓧ ⓨ ⓩ 
⓪ ⓫ ⓬ ⓭ ⓮ ⓯ ⓰ ⓱ ⓲ ⓳ ⓴ ⓵ ⓶ ⓷ ⓸ ⓹ ⓺ ⓻ ⓼ ⓽ ⓾ ⓿

(Source)

Unknown encodings

Tool to auto-detect layered text transforms Cyberchef is good for messing around with known chains Haiti identifies hash types

URL

Domain names can't contain underscores, except that subdomains certainly can. Neat.

Domain name filters can sometimes be bypassed by unicode normalization exploits (see above).

Not all languages parse URLs identically.

Some platforms (NodeJS) are more permissive about URL formats:

http:\domain.com

Also try the tricks listed under IP address

XML

Different XML-based formats have their own cavities. Can the schema be bypassed to include dangerous elements?

XXE

Tell me more Tell me even more You guessed it Tool Work on it What about dirty files?

Can the XML "import" external resources?

Last updated