
Ally gives back to LangChain AI community with PII Masking module

Happy New Year! Only two weeks in, and 2024 is already off to a fast start, especially in the world of AI. Now that developers are getting more comfortable exploring the technology, we’re all starting to see potential challenges that will have to be overcome as use cases grow.

As a financial services company, Ally operates in a highly regulated industry where security and customer data privacy are at the center of everything we do. In 2023, our engineering team was challenged to build the platform with code designed to prevent personally identifiable information (PII) from being shared with any AI application or large language model (LLM). In other words, before any Ally data could be shared, PII had to be removed.

Challenge accepted.

We created a unique coding module — called PII Masking — designed to identify and scrub PII prior to data being shared with any AI application and then store it in a safe place. We’re excited to share that the PII Masking module is now available for download in LangChain’s open source community. Click on the link below to view the module via LangChain:

Masking | 🦜️🔗 Langchain

The code, which is customizable, can be downloaded by developers looking to protect sensitive data or customer PII from being processed through an AI model. This module is the first contribution available via LangChain that addresses a significant challenge for AI developers working with PII in highly regulated, consumer-focused industries.

The module went live on December 7, 2023, and to date, the engagement tells us that developers needed a catalyst like the PII Masking module to make their data safer. It provides a starting point for organizations that frequently handle customer PII — including those in financial services, healthcare and retail — to build generative AI applications that also protect data.

The Great Thing About Open Source Communities

LangChain, a young but influential company dedicated to advancing AI, has an active open source community that has accelerated the adoption of AI. We have seen value in the LangChain community, including the availability of vast open source code and the helpful interactions among developers. That’s the great thing about open source: there are many ways to contribute and get involved, which is a big reason our team at Ally wanted to share our module.

Ally used code from LangChain to help develop components of the platform, which launched in Summer 2023. Submitting the PII Masking module is our way of paying it forward, being problem solvers, and uplifting our developer peers as we all explore emerging technologies. We like to think that the responsible use of AI is a goal that we can all get behind, so giving back to others in the community to help in that pursuit is something we’re proud of. Our leader, Sathish Muthukrishnan, put it this way:

“The module contribution to the LangChain community will serve as an example for other organizations who want to harness the power of AI, but in a way that positively and responsibly serves consumers.” — Sathish M, Chief Information, Data and Digital Officer at Ally

This is the point — achieving responsible use of AI will take effort from all of us. From the developers on the ground writing code, to the risk management team, to the data analysts, to leaders challenging us to be even better. Everyone has an actionable role in the pursuit of responsible and ethical AI.

The team at LangChain agrees. In speaking with CEO Harrison Chase, it was clear from the start that they believe strongly in empowering developers and in the democratization of AI development, all with high ethical standards. He shared this comment with us, as our relationship with LangChain has continued to grow:

“We built LangChain for the developer. We wanted to make it as easy as possible to develop and deploy LLM-powered apps that could drive real impact. PII filtering & masking is a necessity for companies handling sensitive user data, and with Ally’s collaboration and contributions, there is a module option for users in financial services and other regulated industries to consider. This is a significant step forward in how companies can practice responsible AI.” — Harrison Chase, Co-Founder & CEO of LangChain

More about the PII Masking Module

Starting with open source code from LangChain as a springboard, Ally developers built the PII Masking module to solve a challenge that was apparent in Ally’s first generative AI use case with Customer Care and Experience — almost every customer service call includes some form of PII.

Before a transcript of a customer service call, which usually includes several different types of PII, could be shared with an LLM via the platform, the engineering team had to create a workflow that included four key steps in identifying and masking PII:

  • Data Cleanse: Normalize data set, clean incoming data

  • Tokenize: Define sensitive data, attribute tokens to each PII value and store data securely

  • Redact: Remove sensitive data, send redacted information to the LLM

  • Rehydrate: Replace tokens coming back into the private ecosystem with the original PII information, which is then stored in a secure manner in compliance with privacy regulations.
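The four steps above can be sketched in a few lines of Python. This is a minimal illustration of the workflow, not the actual Ally/LangChain module (linked earlier in this post); the class name, method names, and regex detectors are all hypothetical.

```python
import re
import uuid

class PIIMasker:
    """Illustrative sketch of the cleanse/tokenize/redact/rehydrate flow."""

    # Example detectors; a real deployment would configure these
    # per organization and per PII type.
    PATTERNS = {
        "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    }

    def __init__(self):
        # Token-to-PII map, kept inside the private ecosystem and
        # never sent to the LLM.
        self.vault = {}

    def cleanse(self, text):
        # Step 1: normalize the incoming data.
        return " ".join(text.split())

    def redact(self, text):
        # Steps 2-3: tokenize each detected PII value, store the
        # original securely, and replace it before the LLM sees it.
        text = self.cleanse(text)
        for label, pattern in self.PATTERNS.items():
            for value in set(pattern.findall(text)):
                token = f"[{label}_{uuid.uuid4().hex[:8]}]"
                self.vault[token] = value
                text = text.replace(value, token)
        return text

    def rehydrate(self, text):
        # Step 4: restore the original PII once the LLM response
        # returns to the secured environment.
        for token, value in self.vault.items():
            text = text.replace(token, value)
        return text


masker = PIIMasker()
safe = masker.redact("Caller's SSN is 123-45-6789, email a@b.com")
# `safe` now carries tokens like "[SSN_3f9c...]" in place of real values;
# rehydrate(safe) restores the original text after the LLM round trip.
```

The real module handles far more PII categories and storage concerns; this sketch only shows how tokenization lets the LLM operate on redacted text while the originals stay behind.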

Handling multiple types of PII creates some complication around tokenization. Because PII varies across sectors, the module was designed to assist in identifying the type of PII that should be masked according to an organization’s needs. There’s two-way communication with the LLM: removing the PII before the info goes to the LLM, and then reinserting or “rehydrating” the PII when it comes back into the organization’s secured ecosystem.
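Selecting which PII types to mask might look like the following. The category names and patterns here are purely illustrative, not the actual module's configuration format.

```python
import re

# Hypothetical per-organization configuration: each sector enables
# only the PII categories relevant to it.
PII_PATTERNS = {
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def detect_pii(text, enabled=("US_SSN", "EMAIL")):
    """Return (category, value) pairs for the enabled PII types only."""
    hits = []
    for category in enabled:
        for match in PII_PATTERNS[category].finditer(text):
            hits.append((category, match.group()))
    return hits
```

A healthcare organization might enable medical record numbers while a bank enables account numbers; the detection step stays the same.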

In the Ally use case, what returns from the LLM is a summary of the customer service call, with the PII included, for documentation and compliance purposes. The LLM was not exposed to PII data during the process. Engineers from any organization can download the PII Masking module from LangChain to see if it can be adapted for use within their AI applications.

We’re delighted to say that the AI engineering work done at Ally in 2023 might make an even bigger impact on the broad AI community this year. By focusing on ways to protect consumer privacy and to keep data secure, we’ll all be in a better place as we set our goals for AI in 2024.

Interested in joining Ally's team of talented technologists to make a difference for our customers and communities? Check out Ally Careers to learn more.