Is your Python code vulnerable to log injection?

Following the news on log4j lately, you may wonder if Python’s logging library is safe. After all, there is a potential for injection attacks where string formatting meets user input. Thankfully, Python’s logging isn’t vulnerable to remote code execution. Nonetheless it is still important to be careful with untrusted data. This article will describe some common pitfalls, and how the popular practice of logging f-strings could — in certain situations — leave you vulnerable to other types of attacks.

The basics of logging

Where does formatting meet user input exactly? Let’s start with a basic logging setup.

import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

Logging a message:

logger.info("hello world")

Which prints:

INFO:__main__:hello world

Let’s see the string formatting¹ in action:

context = {'user': 'bob', 'msg': 'hello everybody'}
logger.info("user '%(user)s' commented: '%(msg)s'.", context)

This outputs the following:

INFO:__main__:user 'bob' commented: 'hello everybody'.

Simple injection

If you don’t sanitize your inputs, you may be vulnerable to log injection. Consider the following message:

"hello'.\nINFO:__main__:user 'alice' commented: 'I like pineapple pizza"

If logged with the previous template, this results in:

INFO:__main__:user 'bob' commented: 'hello'.
INFO:__main__:user 'alice' commented: 'I like pineapple pizza'.

As you can see, an attacker can not only corrupt logs, but also incriminate others.

Mitigation

We can mitigate this particular attack by escaping newline characters. But, beware that there are plenty of other evil unicode control characters which can mess up your logs. The safest solution is to simply not log untrusted text. If you need to store it for an audit trail, use a database. Alternatively, structured logging can prevent newline-based attacks.

Double formatting trouble

There’s another interesting vulnerability which is particular to Python. Because the old %-style formatting used by logging is often considered ugly, many people prefer to use f-strings:

logger.info(f"user '{user}' commented: '{msg}'.")

Admittedly this looks nicer, but it won’t stop logging from trying to format the resulting string itself. So if the msg is…

"%(foo)s"

…we are left with this after the f-string evaluates:

logger.info("user 'bob' commented: '%(foo)s'.")

So what does logging do? Does it try to look up foo and crash? Thankfully not. In the depths of the logging source code we find:

if self.args:
    msg = msg % self.args

No arguments, no formatting. Makes sense. But once there is an argument, things get interesting. Consider this:

logger.info(f"user '%(user)s' commented: '{msg}'.", context)

Of course, nobody is likely to mix formatting styles at first. But it is plausible that either:

Someone would add the msg parameter to an existing log statement in this way;
When refactoring to an f-string, someone forgot to remove the context argument;
That log messages are passed through a user-defined function or logging filter which adds a context argument.

In this case we get an error like this:

--- Logging error ---
[...snip...]
KeyError: 'foo'
Call stack:
  File "example.py", line 29, in <module>
    logger.info(f"user '%(user)s' commented: '{msg}'.", context)
Message: "user '%(user)s' commented: '%(foo)s'."
Arguments: {'user': 'bob'}

Annoying to have in the logs? Yes. Dangerous? Not…yet. But by formatting an external string into our log message (which in turn gets formatted again by logging) we open the door to string formatting attacks. Thankfully, Python is a lot less vulnerable than C, but there are still ways to abuse it.

Padding a ton

One such case is abusing padding syntax. Consider this message:

"%(user)999999999s"

This will pad the user with almost a gigabyte of whitespace. Not only will this slow your log statement down to a crawl, it could also clog up your logging infrastructure.

Why is this a problem? Attackers being able to crash your server is bad enough. If they cripple your logging, you wouldn’t even know what hit you.

Leaky logs

Another potential risk is leaking sensitive information. In our example, if the context contains a "secret" key, an attacker could leak them into the logs with the following message:

"%(secret)s"

This is particularly dangerous when combined with the padding vulnerability, as the attacker could use timing to sniff out which keys are present.

On the flip side, we can be thankful that Python’s %-style formatting syntax is so limited. If logging used the new braces-style, an attacker wouldn’t even need sensitive data to be present in context, by using a message like:

"{0.__init__.__globals__['SECRET']}"

You might wonder what the big deal is if secrets land in the logs — it’s not in the open, right? The problem is that logs are usually a lot easier for an attacker to access than a credential store. Because of this, CWE ranks “Insertion of Sensitive Information into Log File” as number 39 on their list of most dangerous software weaknesses.

Mitigation

To eliminate these risks, you should always let logging handle string formatting. Don’t format log messages yourself with f-strings or otherwise ². Thankfully, there is a flake8 plugin that can check this for you. Also, once PEP675 is implemented, you could perhaps use a typechecker to check only literal strings are passed to the logger.

Recommendations

Don’t log untrusted text. Python’s logging library doesn’t protect you from newlines or other unicode characters which allow attackers to mess up — or even forge — logs.
Don’t format logs yourself (with f-strings or otherwise). In certain situations this could leave you vulnerable to denial-of-service attacks or even sensitive data exposure.

A full sample of the code used can be found here, so you can experiment for yourself.

You can discuss this post on reddit or Hacker News.

Update 2022-01-04

I’ve since created an issue on the Python bug tracker to document security risks in the logging docs, and perhaps even to create a more secure logger API.

Update 2022-02-19

The log formatting DoS vulnerability has been included in PEP675 as a potential use of for the string literal type. I’ve opened a discussion on discuss.python.org for improvements to the logging API.

Thanks

To Daan Debie for reviewing this post!

This may be less well-known, but logging supports named formatting when a dictionary is passed as an argument. This is different from the extra= parameter! You can see for yourself how this works with the % operator: "hello %(name)" % {"name": "bob"} and "hello %s" % "bob". In this post I’ll be focussing on vulnerabilities when using the dictionary approach. ↩
Using logging’s built-in formatting is also often better for performance, among other reasons. ↩