> For the complete documentation index, see [llms.txt](https://luftenshjaltar.gitbook.io/ctf/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://luftenshjaltar.gitbook.io/ctf/general/formats.md).

# Formats

Various data formats have their own quirks.

A couple of things to look out for:

1. Is any data parsed [multiple times, using different methods](https://about.gitlab.com/blog/2020/03/30/how-to-exploit-parser-differentials/)?  &#x20;
2. Is the data validated before or after unicode normalization? &#x20;
3. Is any of the data part of a *turing complete configuration language*?

## CSV

Even something this simple has multiple dialects and varieties. Mess around with whitespace and separators and quotes.

## IP address

[Twitter thread](https://twitter.com/dave_universetf/status/1342685822286360576)\
[Another twitter thread](https://twitter.com/x0rz/status/928584447292858368)\
[Blog post](https://blog.dave.tf/post/ip-addr-parsing/)\
[Another Blog post](https://ma.ttias.be/theres-more-than-one-way-to-write-an-ip-address/)\
[Tool](https://github.com/D4Vinci/Cuteit)

```
$ ping 0177.1
$ ping 134744072
$ ping 0x8080808
$ ping 010.0x0000008.00000010.8
$ ping 8.0x0000000000000080808
$ ping 192.168.36095
$ ping 192.11046143
$ ping 0000000001.0000000002.0000000003.000000004
```

There are many less-common IP address formats. Try them.

## JSON

[Tell me more](https://labs.bishopfox.com/tech-blog/an-exploration-of-json-interoperability-vulnerabilities)

Different parsers deal with special cases differently. Perhaps if an app uses one library for parsing and a different one for parsing+validation, you can bypass validation by duplicating keys?

## Magic Bytes

[List ordered by magic](https://www.garykessler.net/library/file_sigs.html)

## PDF

[Insecure PDF features](https://web-in-security.blogspot.com/2021/01/insecure-features-in-pdfs.html)

## Polyglots

[Tell me more](/ctf/general/reversing.md#polyglots)\
[Mitra](https://github.com/corkami/mitra)

They can use polyglots to hide their code. We can use polyglots to bypass type checks.

*Yes, officer*, that's a GIF I'm uploading. Oh, but it's also PHP code.

## Regular Expressions

[Tool](https://regex101.com/)

#### Ranges

`[A-z]` includes more than just letters.

## Terminals

If, for some reason, your payload is inspected in an actual real-life terminal (-emulator), you may want to try [Terminal Escape Injection](https://www.infosecmatter.com/terminal-escape-injection/). Include escapes in your payload such that a terminal will overwrite the sensitive text with benign-looking data.

#### URL filtering

[Example](https://twitter.com/YShahinzadeh/status/1250889458641141760) If a regex is used to split the host, perhaps URL parsing can be fooled. Real parsers take everything before an `@` as a username.

## Unicode

### Normalization

[Tell me more](https://jlajara.gitlab.io/web/2020/02/19/Bypass_WAF_Unicode.html) [List](https://appcheck-ng.com/wp-content/uploads/unicode_normalization.html) [Tool](https://github.com/eldstal/strinvader) [Tool](https://github.com/JesseClarkND/abnormalizer) [Tool](https://spaceraccoon.github.io/unicollider/)

```
① ② ③ ④ ⑤ ⑥ ⑦ ⑧ ⑨ ⑩ ⑪ ⑫ ⑬ ⑭ ⑮ ⑯ ⑰ ⑱ ⑲ ⑳ 
⑴ ⑵ ⑶ ⑷ ⑸ ⑹ ⑺ ⑻ ⑼ ⑽ ⑾ ⑿ ⒀ ⒁ ⒂ ⒃ ⒄ ⒅ ⒆ ⒇ 
⒈ ⒉ ⒊ ⒋ ⒌ ⒍ ⒎ ⒏ ⒐ ⒑ ⒒ ⒓ ⒔ ⒕ ⒖ ⒗ ⒘ ⒙ ⒚ ⒛ 
⒜ ⒝ ⒞ ⒟ ⒠ ⒡ ⒢ ⒣ ⒤ ⒥ ⒦ ⒧ ⒨ ⒩ ⒪ ⒫ ⒬ ⒭ ⒮ ⒯ ⒰ ⒱ ⒲ ⒳ ⒴ ⒵ 
Ⓐ Ⓑ Ⓒ Ⓓ Ⓔ Ⓕ Ⓖ Ⓗ Ⓘ Ⓙ Ⓚ Ⓛ Ⓜ Ⓝ Ⓞ Ⓟ Ⓠ Ⓡ Ⓢ Ⓣ Ⓤ Ⓥ Ⓦ Ⓧ Ⓨ Ⓩ 
ⓐ ⓑ ⓒ ⓓ ⓔ ⓕ ⓖ ⓗ ⓘ ⓙ ⓚ ⓛ ⓜ ⓝ ⓞ ⓟ ⓠ ⓡ ⓢ ⓣ ⓤ ⓥ ⓦ ⓧ ⓨ ⓩ 
⓪ ⓫ ⓬ ⓭ ⓮ ⓯ ⓰ ⓱ ⓲ ⓳ ⓴ ⓵ ⓶ ⓷ ⓸ ⓹ ⓺ ⓻ ⓼ ⓽ ⓾ ⓿
```

([Source](https://www.hahwul.com/phoenix/ssrf-open-redirect))

## Unknown encodings

[Tool](https://transformations.jobertabma.nl/) to auto-detect layered text transforms [Cyberchef](https://gchq.github.io/CyberChef/) is good for messing around with known chains [Haiti](https://github.com/noraj/haiti) identifies hash types

## URL

[Domain names](https://twitter.com/s0md3v/status/1354733673069694978?s=19) can't contain underscores, except that subdomains certainly can. Neat.

Domain name filters can sometimes be bypassed by unicode normalization exploits (see above).

[Not all languages](https://github.com/jimen0/differer) parse URLs identically.

Some platforms (NodeJS) are more permissive about URL formats:

```
http:\domain.com
```

Also try the tricks listed under IP address

## XML

Different XML-based formats have their own cavities. Can the schema be bypassed to include dangerous elements?

### XXE

[Tell me more](https://blog.cobalt.io/how-to-execute-an-xml-external-entity-injection-xxe-5d5c262d5b16) [Tell me even *more*](https://owasp.org/www-community/vulnerabilities/XML_External_Entity_%28XXE%29_Processing) [*You guessed it*](https://www.netsparker.com/blog/web-security/xxe-xml-external-entity-attacks/)\
[Tool](https://github.com/luisfontes19/xxexploiter)\
[Work on it](https://gosecure.github.io/xxe-workshop/)\
[What about dirty files?](https://dzone.com/articles/xml-external-entity-xxe-limitations)

Can the XML "import" external resources?


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://luftenshjaltar.gitbook.io/ctf/general/formats.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
