Fixing leaky logs: how to find a bug and ensure it never returns

by Nathan Brahms on October 28, 2020

TL;DR I lay out a case for moving security enforcement into the hands of developers. I show how another developer at r2c and I identified data leakage in our logs, fixed the issue, and prevented it from happening in the future. We did this in a matter of hours, without assistance from our AppSec team.

Introduction

As a developer and engineering manager, I’ve become obsessed with finding ways to rapidly solve security issues across the engineering organization without ever needing to fully involve our security team.

Why is this important? I see multiple benefits:

  • Fixing security issues is fast. So fast, in fact, that we can solve them in minutes after identifying them, without security issues languishing for days or weeks. In previous roles, I've seen internally known security issues lie open with only obscurity protecting my organization from fallout.
  • When developers can solve security issues easily themselves, it frees the security team to focus on “big picture” security. I want security engineers to be thinking how to choose frameworks, set up tools, help with secure architecture, and build defense-in-depth—not finding my last XSS mistake.

I call this concept “self-service DevSec”.

In the rest of this blog post, I'll walk through a security bug we encountered during the day-to-day course of regular development work. I'll discuss how we discovered the issue and how, within just a few hours, we fixed it and used Semgrep to prevent the bug from recurring.

Here's the story:

Story

Last month, I was debugging a Flask web-app authentication workflow with Clara McCreery, another engineer at r2c. Like many engineers faced with a confusing debugging problem, one of our first steps was to throw the web-app into debug logging.

Specifically, we wanted to know what was going on with our database operations, so we set our ORM (in this case, we use SQLAlchemy) into INFO-level logging with:

import logging

logging.getLogger("sqlalchemy.engine.base.Engine").setLevel(logging.INFO)

This configures SQLAlchemy to log all SQL statements, together with passed parameters. Let's look at some of the output we saw:

INFO:werkzeug:127.0.0.1 - - [25/Sep/2020 11:50:01] "POST /api/auth/authenticate HTTP/1.1" 200 -
INFO:sqlalchemy.engine.base.Engine:BEGIN (implicit)
INFO:sqlalchemy.engine.base.Engine:SELECT token.id AS token_id, token.token AS token_token, token.name AS token_name
FROM token
WHERE token.token = %(token_1)s
 LIMIT %(param_1)s
INFO:sqlalchemy.engine.base.Engine:{'token_1': $2a$10$KVsyW1jjKn.pvkVi3w9Rn.1mwnZFd7F2SFveGDG8flIhbe.MoJH4G, 'param_1': 1}

...Uh-oh.

We definitely shouldn’t be logging tokens, even if they're securely hashed. (In this example, the actual token value has been changed to protect the innocent.)

Let’s make a plan

At this point we’ve identified a security issue, and we want to stomp it out while preserving our ability to inspect logs. Our plan:

  1. Mitigate the immediate security issue.
  2. Find a permanent, future-proof solution to the problem: a baked-in change to our systems. Ideally this solution is automated and seamless across our entire organization.
  3. Add a mechanism to enforce our solution’s use organization-wide.

In the rest of this post, I’ll walk you through how we addressed each step. Notably, we were able to accomplish this entire flow in a couple hours, without engaging the security team at all.

1. Mitigation

Mitigation here was fairly straightforward, as we already knew the root cause of our problem. We quickly reverted our logging change, then did a quick audit of our logs to ensure that only development test tokens had been leaked.
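In code, the revert amounted to raising the engine logger's level back above INFO. Here's a sketch (setting the level back to NOTSET, so the logger falls back to the root configuration, would also work):

```python
import logging

# Raise the SQLAlchemy engine logger back to WARNING so SQL statements and
# their bound parameters are no longer emitted at INFO level.
logging.getLogger("sqlalchemy.engine.base.Engine").setLevel(logging.WARNING)
```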

2. The permanent solution

How do we prevent SQLAlchemy from logging sensitive data?

A valiant attempt

Step 1 was to read the docs. A quick web search of “sqlalchemy hide parameters in engine logging” linked us to the SQLAlchemy Engine documentation. A detailed read later, we found the hide_parameters flag, which prevents the logging framework from emitting any parameters in logs or exceptions.

While this certainly would prevent our security issue, it was too blunt of a hammer for us: we wanted to know (for example) database IDs, and the like, for debugging.

The real solution

We then inspected the SQLAlchemy source code. The relevant code is in sqlalchemy/engine/base.py:

    if self._echo:
        self.engine.logger.info(statement)
        if not self.engine.hide_parameters:
            self.engine.logger.info(
                "%r",
                sql_util._repr_params(
                    parameters, batches=10, ismulti=context.executemany
                ),
            )

sql_util._repr_params, in turn, runs:

def _repr_params(self, params, typ):
    trunc = self.trunc
    if typ is self._DICT:
        return "{%s}" % (
            ", ".join(
                "%r: %s" % (key, trunc(value))
                for key, value in params.items()
            )
        )
    ...

Investigating trunc, we found that it converts the parameter value by truncating the parameter’s repr to a maximum number of characters.
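In spirit, trunc behaves something like this simplified sketch (names and truncation details are illustrative, not SQLAlchemy's actual code):

```python
def trunc(value, max_chars=300):
    # Simplified sketch: render the parameter via repr(), then cap the
    # rendered string at max_chars characters.
    rendered = repr(value)
    if len(rendered) > max_chars:
        rendered = rendered[:max_chars] + " ... (truncated)"
    return rendered
```

The important point is the repr() call: whatever the parameter object's __repr__ returns is what ends up in the log.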

This meant that we could override the __repr__ method of the parameter object to prevent sensitive values from being logged.

At this point, like good engineers, we took the lazy route: stand on your peers’ shoulders. I found this GitHub issue, where Mike Bayer had already posted a nice solution.

Some shameless copying later (and adding some types to make mypy happy), we had this Gist. The key code is:

from typing import Any, Optional

from sqlalchemy import types


class ObfuscatedString(types.TypeDecorator):
    """
    String column type for use with SQLAlchemy models whose
    content should not appear in logs or exceptions
    """

    impl = types.String

    class Repr(str):
        def __repr__(self) -> str:
            return "********"

    def process_bind_param(self, value: Optional[str], dialect: Any) -> Optional[Repr]:
        return self.Repr(value) if value else None

    def process_result_value(
        self, value: Optional[Repr], dialect: Any
    ) -> Optional[str]:
        return str(value) if value else None


setattr(db, "ObfuscatedString", ObfuscatedString)

What does this code accomplish? It replaces our original str parameters with a new ObfuscatedString.Repr parameter. When logged (or when emitted into an exception message), the string is replaced by our ******** obfuscation sentinel. Since the parameter is still bound as a raw string (via impl = types.String), the correct value is still inserted and selected from the database.
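You can see the Repr wrapper's behavior in isolation with a standalone sketch (duplicating the inner class above, with an illustrative secret value):

```python
class Repr(str):
    # A str subclass whose repr is a fixed sentinel. The underlying character
    # data is untouched, so equality checks and database drivers still see
    # the real value; only repr-based rendering (logs, tracebacks) is masked.
    def __repr__(self) -> str:
        return "********"

secret = Repr("supersecret")
print(repr(secret))  # what loggers and tracebacks render: ********
print(str(secret))   # what the database driver binds: supersecret
```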

To use this new column type, we set our token’s column type:

class Token(db.Model):
    ...
    token = db.Column(db.ObfuscatedString, ...)
    ...

We then re-enabled INFO logging, and checked that we were properly obfuscating text:

INFO:werkzeug:127.0.0.1 - - [25/Sep/2020 13:48:55] "GET /api/agent/deployments/1/policies HTTP/1.1" 200 -
INFO:sqlalchemy.engine.base.Engine:BEGIN (implicit)
INFO:sqlalchemy.engine.base.Engine:SELECT token.id AS token_id, token.token AS token_token, token.name AS token_name
FROM token
WHERE token.token = %(token_1)s
 LIMIT %(param_1)s
INFO:sqlalchemy.engine.base.Engine:{'token_1': ********, 'param_1': 1}

For completeness, we also validated in our development database console that the correct values were stored and retrieved.

Great success! 🚢 Ship it.

3. Enforcement

It was tempting to rest on our laurels here. We had solved our security issue for the time being, and we could get back to debugging our original auth issue.

But we wanted to guarantee that we would never see this issue again. How would we do this?

Here are some ideas that I’m sure we’ve all encountered before:

  1. Block all commits to SQLAlchemy models on security review!
  2. Host a yearly security training for all devs, including the pitfalls of logging sensitive data!
  3. Audit logs weekly!
  4. File an issue with your SAST provider, demanding they add checks to catch sensitive data in logs!

If there is one central takeaway from this blog post, it is that these are not ideal solutions:

  1. Blocking commits introduces needless friction into the development process, slows development velocity, and needlessly distracts the security team.
  2. Security trainings are an important component of a security program, and necessary to keep developers aware of evolving security threats, but human memory is fallible, and we forget things we heard months or even days ago.
  3. Regular audits, like blocking commits, introduce a heavy workload on an almost certainly overloaded security team.
  4. Your SAST provider will certainly welcome your suggestion, but you will be beholden to their software release cycle and may not see checks become available for months; furthermore, if your issue is domain-specific, it may not make sense for a generalist product to implement the check at all.

Fortunately, Semgrep gave us a simple solution here: Define an invariant in your code, and enforce it using a Semgrep scan on every CI run.

At r2c, we use GitHub Actions to run Semgrep on every merge request. We define what checks Semgrep should run using a managed policy, a list of rules and notification settings managed by semgrep.dev.
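For teams wiring this up from scratch, the workflow is only a few lines. Here's a minimal sketch (the action name and publishToken input are as published by r2c at the time of writing; treat the exact wiring as illustrative):

```yaml
# .github/workflows/semgrep.yml — minimal sketch; details are illustrative
name: Semgrep
on: [pull_request]
jobs:
  semgrep:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: returntocorp/semgrep-action@v1
        with:
          publishToken: ${{ secrets.SEMGREP_APP_TOKEN }}
```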

To guard our code against future issues, I went to semgrep.dev/editor and wrote a quick rule to detect potentially insecurely logged SQLAlchemy columns.

Here's the rule definition in Semgrep's YAML definition language:

rules:
- id: obfuscate-sensitive-string-columns
  patterns:
    - pattern: |
        $COLUMN = db.Column(db.String, ...)
    - metavariable-regex:
        metavariable: $COLUMN
        regex: '.*(?<![A-Za-z])(token|key|email|secret)(?![A-RT-Za-rt-z]).*'
  message: |
    '$COLUMN' may expose sensitive information in logs and exceptions. Use
    'db.ObfuscatedString' instead of 'db.String'.
  severity: WARNING

What does this rule do? Let’s break it down:

  • id: We give our rule a concise descriptive ID for easy reference by any developer who sees it pop up in their editor or CI output.
  • patterns: This is composed of two parts:

    • pattern: This expression tells Semgrep to look for any SQLAlchemy ORM column definition in our codebase (our SQLAlchemy instance is called db in this example) that has a String column type. It also binds the column name to a metavariable named COLUMN.
    • metavariable-regex: This expression tells Semgrep to only report a match if the COLUMN metavariable contains a word fragment like token, email, key, or secret. The regular expression contains lookaround declarations to prevent us from matching unrelated words like keyboard.
  • message: When Semgrep matches our pattern, we want to make sure we explain what the detected problem is, why it is a vulnerability, and how to fix it. Following these best practices for your messages will help developers fix problems independently without confusion (or unnecessary pushback).
  • severity: You can assign any severity that makes sense in your domain.

A quick “Deploy to Policy” button-press later, and we could now guarantee that all our web apps are protected over time.

Developers who have integrated Semgrep into their programming workflow via our VS Code extension would also begin to see results in their IDE.

Note that this solution is deliberately iterative: we may find more column names that we identify as sensitive, or want to include db.Text types as well. Fortunately that’s a quick revision and re-deploy as needed.
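For instance, covering db.Text as well only requires a pattern-either in the rule (a sketch of the revised patterns clause; untested):

```yaml
  patterns:
    - pattern-either:
        - pattern: $COLUMN = db.Column(db.String, ...)
        - pattern: $COLUMN = db.Column(db.Text, ...)
    - metavariable-regex:
        metavariable: $COLUMN
        regex: '.*(?<![A-Za-z])(token|key|email|secret)(?![A-RT-Za-rt-z]).*'
```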

Conclusion

In this post I’ve demonstrated how you—as a developer or as a manager—can use lightweight static analysis, like Semgrep, to help enforce invariants in your code.

At r2c, we habitually turn to Semgrep to prevent ourselves from repeating mistakes:

Accidentally leave the debugger on in a commit? There’s a rule to prevent that.

When we found that importing a certain library slows down program initialization, we wrote a rule to ensure it’s loaded lazily.

And when we found insecure logging—well, you’ve just read that story.