16 August, 2024

Securing Your LLM Applications: Detecting PII in Prompts with LLM-Guard ⚔️

In the era of Generative AI, where Large Language Models (LLMs) are increasingly woven into the fabric of modern applications, the need for safeguarding sensitive information has never been more critical. As organizations integrate LLMs into their workflows, the challenge of detecting and anonymizing Personally Identifiable Information (PII) in prompts — especially those processed in-memory — becomes paramount. Enter LLM Guard, your AI-powered sentinel in the battle against data exposure.

Why PII Matters: The Stakes Are High

PII is the linchpin of an individual’s digital identity. It’s not just a collection of data points; it’s a digital fingerprint that, when mishandled, can lead to dire consequences. Whether it’s GDPR or HIPAA, global regulations demand stringent measures for PII protection. But what happens when this sensitive information makes its way into LLM prompts? If left unchecked, it could inadvertently proliferate across storage points, model training datasets, and third-party services, amplifying the risk of data breaches and privacy violations.

Anatomy of a Privacy Breach: The Attack Surface

Consider this scenario: A company uses an LLM to automate customer service. The prompts sent to the LLM contain user queries, some of which include PII like names, addresses, or credit card numbers. If the model provider stores these prompts, either for improving the model or for other purposes, the PII is now exposed to risks outside of the company’s control. This scenario underscores the importance of ensuring that any PII in prompts is detected and anonymized before it ever reaches the LLM.

The Role of Anonymize Scanner in LLM Guard

The Anonymize Scanner within LLM Guard acts as your digital guardian, scrutinizing prompts in real-time and ensuring they remain free from PII. Here’s how it works:

  • PII Detection: The scanner identifies PII entities across various categories, including credit card numbers, personal names, phone numbers, URLs, email addresses, IP addresses, UUIDs, social security numbers, crypto wallet addresses, and IBAN codes.
  • Anonymization: Once detected, the scanner can anonymize or redact this information, ensuring that the LLM only processes sanitized data.
  • In-Memory Operation: Unlike traditional methods that might focus on data at rest or in transit, LLM Guard operates directly on prompts in RAM, offering real-time protection without the latency of I/O operations.

Understanding PII Entities: A Closer Look

Let’s break down the types of PII the Anonymize Scanner targets:

  • Credit Cards: The scanner is adept at identifying credit card formats, including specific patterns like those for Visa or American Express.
  • Names: It recognizes full names, including first, middle, and last names, ensuring that any personally identifiable name data is flagged.
  • Phone Numbers: From a simple 10-digit number to complex international formats, the scanner is equipped to detect various phone number patterns.
  • URLs & Email Addresses: Whether it’s a URL pointing to a personal site or an email address in a customer query, the scanner can identify and anonymize these elements.
  • IP Addresses: Both IPv4 and IPv6 addresses are within the scanner’s purview, crucial for applications dealing with network data.
  • UUIDs & Social Security Numbers: Unique identifiers and social security numbers are some of the most sensitive data types, and the scanner is finely tuned to detect these.
  • Crypto Wallets & IBANs: As financial transactions become more digital, detecting and anonymizing crypto wallet addresses and IBAN codes is vital to prevent fraud and ensure compliance.
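
If your application only handles a subset of these categories, the scanner can usually be narrowed to just those entities. Here's a minimal sketch, assuming the entity_types parameter exposed by recent llm-guard releases (verify the parameter name against your installed version):

from llm_guard.input_scanners import Anonymize
from llm_guard.vault import Vault

# Assumption: entity_types restricts detection to the listed categories;
# omit it to detect every supported entity type.
scanner = Anonymize(
    Vault(),
    entity_types=["CREDIT_CARD", "EMAIL_ADDRESS", "PHONE_NUMBER"],
)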

Detecting PII in Prompts Using LLM Guard

To illustrate the power of LLM Guard, let's walk through a simple yet practical example. Below is a Python snippet that demonstrates how to use the Anonymize scanner to detect and redact PII in a prompt before it is processed by an LLM:

# Install the necessary package
pip install llm_guard

# Importing required modules
from llm_guard import scan_prompt
from llm_guard.input_scanners import Anonymize
from llm_guard.vault import Vault

# The Vault keeps the original values so anonymized entities can be restored later
vault = Vault()
input_scanners = [Anonymize(vault)]

# Define a prompt that contains PII
prompt = """Make an SQL insert statement to add a new user to our database.
Name is John Doe.
Email is test@test.com.
Phone number is 555-123-4567."""

# Scan the prompt for PII and sanitize it
sanitized_prompt, results_valid, results_score = scan_prompt(input_scanners, prompt)

# Check whether any scanner flagged the prompt as invalid
if any(not valid for valid in results_valid.values()):
    print(f"Prompt {prompt} is not valid, scores: {results_score}")
    exit(1)

# Output the sanitized prompt
print(f"Prompt: {sanitized_prompt}")

Output (representative):

Prompt: Make an SQL insert statement to add a new user to our database.
Name is [REDACTED_PERSON_1].
Email is [REDACTED_EMAIL_ADDRESS_1].
Phone number is [REDACTED_PHONE_NUMBER_1].

Explanation:

  • The scan_prompt function utilizes the Anonymize scanner to inspect the prompt for any PII entities.
  • The scanner detects sensitive information such as names, email addresses, and phone numbers, replacing them with placeholders like [REDACTED_PERSON_1], [REDACTED_EMAIL_ADDRESS_1], and [REDACTED_PHONE_NUMBER_1].
  • The sanitized prompt is then printed, showcasing how the PII has been effectively anonymized.

This real-time detection and redaction process is crucial for maintaining the privacy and security of your data when interfacing with LLMs.
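
In practice, that means the sanitization step sits directly in front of the model call. Below is a minimal sketch of the wiring, assuming the official OpenAI Python client and a gpt-4o-mini model as stand-ins for whichever LLM you actually call:

from openai import OpenAI
from llm_guard import scan_prompt
from llm_guard.input_scanners import Anonymize
from llm_guard.vault import Vault

client = OpenAI()  # reads OPENAI_API_KEY from the environment
input_scanners = [Anonymize(Vault())]

def safe_completion(prompt: str) -> str:
    # Redact PII in-memory; only the sanitized text ever leaves the process
    sanitized_prompt, _, _ = scan_prompt(input_scanners, prompt)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name; substitute your own
        messages=[{"role": "user", "content": sanitized_prompt}],
    )
    return response.choices[0].message.content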

Fortifying Your LLM Applications

Integrating LLM Guard into your application stack is more than just a compliance checkbox; it’s a proactive stance in the ongoing battle for data privacy. By ensuring that PII is detected and anonymized in-memory, before it even has a chance to be processed by an LLM, you’re not only protecting your users but also fortifying your applications against potential breaches.

In a world where AI is becoming omnipresent, LLM Guard offers a robust solution to a critical problem. Don’t wait for a breach to prioritize PII protection — make it a cornerstone of your LLM deployment strategy.

Stay tuned for more on LLM-Guard guardrail features. 🙂

03 July, 2024

✨Building a Smart Image Parser with .NET, Semantic Kernel, and GPT-4o 🧿

Hey there, tech enthusiasts! Today, we’re diving into the magical world of AI, .NET, and the Semantic Kernel. Imagine a world where you provide an image URL, and in just a few lines of code, your program analyzes every tiny detail in that image. Sounds cool, right? Well, buckle up, because we’re about to embark on a fun journey to create exactly that!


The Magic Ingredients

Here’s what we’ll be using:

  • .NET (because we love a sturdy framework)
  • Semantic Kernel (because semantics matter, folks)
  • GPT-4o (the brain behind the operation)

Without further ado, let’s jump into the code snippets and see how we can make this happen.

The Controller — Where the Magic Begins

First, we need to set up our controller. This is where all the action happens. Let’s take a look at the code:


using Microsoft.AspNetCore.Mvc;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

namespace AzureSemanticKernel.AI.Controllers
{
    public class OpenAiController : Controller
    {
        private string open_ai_key = Environment.GetEnvironmentVariable("OPEN_AI_KEY");
        private string open_ai_org = Environment.GetEnvironmentVariable("OPEN_AI_ORG_ID");

        [HttpPost]
        public async Task<string> GetGpt4oImageResponse(string imgUrl)
        {
            try
            {
                var kernelBuilder = Kernel.CreateBuilder();

                kernelBuilder.AddOpenAIChatCompletion("gpt-4o", open_ai_key, open_ai_org);

                var kernel = kernelBuilder.Build();

                var chat = kernel.GetRequiredService<IChatCompletionService>();

                var history = new ChatHistory();
                history.AddSystemMessage("You are a friendly and helpful assistant that responds to questions directly");

                var message = new ChatMessageContentItemCollection
                {
                    new TextContent("Can you do a detailed analysis and tell me all the minute details present in this image?"),
                    new ImageContent(new Uri(imgUrl))
                };

                history.AddUserMessage(message);

                var result = await chat.GetChatMessageContentAsync(history);

                return result.Content;
            }
            catch (Exception ex)
            {
                return ex.Message;
            }
        }
    }
}

Breaking Down the Magic

Let’s dissect this code and understand what each part does.

1. Namespace and Using Statements:

  • Just the usual suspects. Nothing to see here. Add those packages and move on!

2. Controller Setup:

  • We create an OpenAiController and define our API key and organization ID from environment variables. Remember, hard-coding keys is like leaving your front door open with a "Please Rob Me" sign.

3. GetGpt4oImageResponse Method:

  • This is where the fun begins! We define an asynchronous method to process the image URL.
  • Kernel Creation: We create a kernel builder and add our GPT-4o chat completion service. Think of the kernel as the magical cauldron where all the ingredients mix together.
  • Chat Service: We get the chat service from the kernel. This service is like our friendly neighborhood barista who knows exactly how we like our coffee.
  • Chat History: We initialize the chat history and add a system message to set the tone for our AI assistant. We want our assistant to be friendly and helpful, just like your favorite support agent.
  • User Message: We create a message collection with a text prompt and the image URL. This is like giving the barista your order — “I want a detailed analysis with extra foam, please!”
  • Response Handling: We send the message to the chat service and wait for the response. If all goes well, we return the content. If something goes wrong, we catch the exception and return the error message (because nobody likes an unhandled exception ruining the party).
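
To see the controller in action end to end, you can post an image URL to the endpoint. Here's a quick test script — the host, port, and route are assumptions based on default MVC routing, so adjust them to match your app:

import requests

# Hypothetical local endpoint; default MVC routing maps the action to /OpenAi/GetGpt4oImageResponse
url = "https://localhost:5001/OpenAi/GetGpt4oImageResponse"
payload = {"imgUrl": "https://example.com/sample.jpg"}

# verify=False only because local dev certificates are usually self-signed
response = requests.post(url, data=payload, verify=False)
print(response.text)  # the model's detailed description of the image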

Configuration — The Secret Sauce

Finally, let’s not forget the configuration setup in Program.cs:

using DotNetEnv;

Env.Load();

This simple line ensures our environment variables are loaded, so our API keys and organization IDs are safely tucked away. It’s like having a secret recipe locked in a vault.
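
For reference, the matching .env file would look something like this — the key names come from the controller above, and the values are of course your own:

OPEN_AI_KEY=<your OpenAI API key>
OPEN_AI_ORG_ID=<your OpenAI organization id>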

Wrapping Up

And there you have it! In just a few lines of code, we've created a powerful image parser that uses .NET, Semantic Kernel, and GPT-4o to analyze images and return detailed descriptions. Now you can impress your friends, family, and maybe even your boss with your newfound AI prowess.

Remember, the key to keeping your code exciting is to blend functionality with a dash of humor and a sprinkle of creativity. Happy coding, and may your bugs be few and your features be many! 🎉

28 June, 2024

✨Building an Engaging Chat Program with Azure OpenAI and .NET Semantic Kernel 🧿

In the ever-evolving landscape of artificial intelligence and cloud computing, creating interactive and intelligent applications has become more accessible and powerful. Today, we’ll dive into the fascinating world of integrating Azure OpenAI with .NET Semantic Kernel to build a simple yet engaging chat program. This blog post will walk you through the key components of the application, highlighting the elegance and efficiency of the integration.



Setting the Stage: Loading Configuration

The first step in our journey is setting up the environment configuration. Using the DotNetEnv library, we can effortlessly load environment variables from a .env file. This approach keeps our sensitive information secure and makes the configuration process seamless.

using DotNetEnv;

Env.Load();

This line of code ensures that our application can access the necessary Azure OpenAI credentials and other configuration details stored in the .env file.
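
For reference, a matching .env file might look like this (the variable names come from the Common class shown below; fill in your own values):

AZURE_OPENAI_KEY=<your Azure OpenAI key>
AZURE_OPENAI_MODEL=<your chat model deployment name>
AZURE_OPENAI_ENDPOINT=https://<your-resource>.openai.azure.com/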

Initializing the Kernel: The Common Class

The heart of our chat program lies in the initialization of the Semantic Kernel. The Common class is responsible for setting up the kernel using the credentials and model details from the environment variables.

using Microsoft.SemanticKernel;

namespace AzureSemanticKernel.AI
{
    public static class Common
    {
        private static string key = Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY");
        private static string model = Environment.GetEnvironmentVariable("AZURE_OPENAI_MODEL");
        private static string endpoint = Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT");

        public static Kernel initializeKernel()
        {
            var kernelBuilder = Kernel.CreateBuilder();
            kernelBuilder.Services.AddAzureOpenAIChatCompletion(model, endpoint, key);
            var kernel = kernelBuilder.Build();

            return kernel;
        }
    }
}

In this class, we retrieve the API key, model, and endpoint from the environment variables and use them to configure the kernel. The initializeKernel method creates and returns a fully initialized kernel ready to handle chat requests.

Bringing It All Together: The Main Controller

Now that we have our configuration and kernel setup, let’s look at how we can create a simple yet powerful chat controller. The SimpleController class is where the magic happens. It receives user messages, processes them using the Semantic Kernel, and returns the responses.

using Microsoft.AspNetCore.Mvc;
using Microsoft.SemanticKernel;

namespace AzureSemanticKernel.AI.Controllers
{
    public class SimpleController : Controller
    {
        [HttpPost]
        public async Task<string> GetChatKernelResponse(string userMessage)
        {
            try
            {
                var kernel = Common.initializeKernel();

                var result = await kernel.InvokePromptAsync(userMessage);

                return result.ToString();
            }
            catch (Exception ex)
            {
                return ex.Message;
            }
        }
    }
}

The GetChatKernelResponse action handles POST requests with user messages. It initializes the kernel, invokes the chat completion prompt, and returns the response. Error handling ensures that any exceptions are caught and their messages are returned to the user.


Conclusion

By leveraging Azure OpenAI and .NET Semantic Kernel, we've created a simple yet robust chat program with just a few lines of code. This integration demonstrates the power and flexibility of modern AI and cloud technologies. Whether you're a seasoned developer or just starting out, this project showcases how easily you can build engaging and intelligent applications.

As AI continues to evolve, the possibilities for creating interactive and responsive applications are endless. Dive into the code, experiment with different models and prompts, and unleash the full potential of Azure OpenAI and .NET Semantic Kernel in your projects. Happy coding!


24 June, 2024

🎉 Automating Disaster Recovery for Azure Service Bus: A Seamless Solution ✨

Disaster recovery is a critical component of any robust IT infrastructure. For Azure Service Bus, a highly reliable cloud messaging service, automating the failover process is crucial for minimizing downtime and ensuring business continuity. While the Azure portal provides an easy way to initiate a failover with a click of a button, relying on manual intervention is not ideal for enterprise-grade solutions. Automation is the key.


In this blog post, we’ll explore an automated approach to handling disaster recovery for Azure Service Bus using a PowerShell script. This script seamlessly initiates the failover process and manages the underlying tasks, making it easier for infrastructure architects to ensure continuity without manual intervention.

Why Automate Service Bus Failover?

Azure Service Bus offers geo-disaster recovery capabilities that ensure high availability and data protection. However, manually initiating a failover is risky and not recommended for production scenarios. Automation provides several benefits:

  1. Consistency: Automated scripts ensure that the failover process is executed the same way every time, reducing the risk of human error.
  2. Speed: Automation can trigger failover processes instantly, minimizing downtime.
  3. Efficiency: Automated processes can handle multiple tasks simultaneously, such as reconfiguring namespaces and cleaning up resources.

Understanding the Burn Down of the Primary Namespace

One critical aspect of Azure Service Bus failover is that after the failover, the primary namespace is essentially “burned down.” This means that the primary namespace becomes inactive, and all its entities, such as queues and topics, need to be cleaned up. This cleanup process ensures that the primary namespace is ready to be reconfigured or decommissioned.

The PowerShell Script

Let’s dive into the PowerShell script that automates the Azure Service Bus failover process. This script ensures that the primary and secondary namespaces are provisioned and manages the failover with minimal manual intervention.

Step 1: Parameter Initialization

First, we set the necessary parameters, such as subscription ID, resource group name, primary and secondary namespaces, and the alias name.

$connection = Connect-AzAccount -ErrorAction Stop
Write-Host "Connected to Azure successfully." -ForegroundColor Yellow

#************** Parameters ********************************************************************************************************************
$subscriptionId = ""
$resourceGroupName = ""
$sbusPrimaryNamespace = ""
$sbusSecondaryNamespace = ""
$sbusAliasName = ""
$partnerId = "/subscriptions/$subscriptionId/resourceGroups/$resourceGroupName/providers/Microsoft.ServiceBus/namespaces/$sbusPrimaryNamespace"
#***********************************************************************************************************************************************

Step 2: Provisioning Functions

We define functions to ensure that both namespaces are fully provisioned before and after the failover. These functions poll the provisioning state and wait until it reaches the 'Succeeded' state, both for the namespace itself and for its geo-disaster-recovery configuration.

function Wait-ForNamespaceProvisioning {
    param (
        [string]$resourceGroupName,
        [string]$namespaceName
    )

    $maxRetries = 30
    $retryCount = 0
    $delay = 60 # Delay in seconds between retries

    do {
        $namespace = Get-AzServiceBusNamespace -ResourceGroupName $resourceGroupName -NamespaceName $namespaceName
        if ($namespace.ProvisioningState -eq "Succeeded") {
            Write-Output "Namespace $namespaceName is provisioned."
            return
        }

        Write-Output "Namespace $namespaceName is still in provisioning state: $($namespace.ProvisioningState)."
        Start-Sleep -Seconds $delay
        $retryCount++

    } while ($namespace.ProvisioningState -ne "Succeeded" -and $retryCount -lt $maxRetries)

    if ($namespace.ProvisioningState -ne "Succeeded") {
        throw "Namespace $namespaceName did not reach 'Succeeded' state within the allotted time."
    }
}

function Wait-ForNamespaceGeoProvisioning {
    param (
        [string]$resourceGroupName,
        [string]$namespaceName
    )

    $maxRetries = 30
    $retryCount = 0
    $delay = 60 # Delay in seconds between retries

    do {
        $namespace = Get-AzServiceBusGeoDRConfiguration -ResourceGroupName $resourceGroupName -NamespaceName $namespaceName

        if ($null -eq $namespace) {
            # No geo-DR configuration on this namespace; nothing to wait for
            Write-Output "Namespace $namespaceName is geo provisioned."
            return
        }

        if ($namespace.ProvisioningState -eq "Succeeded") {
            Write-Output "Namespace $namespaceName is geo provisioned."
            return
        }

        Write-Output "Namespace $namespaceName is still in geo provisioning state: $($namespace.ProvisioningState)."
        Start-Sleep -Seconds $delay
        $retryCount++

    } while ($namespace.ProvisioningState -ne "Succeeded" -and $retryCount -lt $maxRetries)

    if ($namespace.ProvisioningState -ne "Succeeded") {
        throw "Namespace $namespaceName did not reach 'Succeeded' state within the allotted time."
    }
}

Step 3: Combined Provisioning Function

This function ensures that both the primary and secondary namespaces are provisioned.

function Wait-ForNamespaceAndGeoProvisioning {
    param (
        [string]$resourceGroupName,
        [string]$PrimaryNamespace,
        [string]$SecondaryNamespace
    )

    Wait-ForNamespaceProvisioning -resourceGroupName $resourceGroupName -namespaceName $PrimaryNamespace
    Wait-ForNamespaceProvisioning -resourceGroupName $resourceGroupName -namespaceName $SecondaryNamespace

    Wait-ForNamespaceGeoProvisioning -resourceGroupName $resourceGroupName -namespaceName $PrimaryNamespace
    Wait-ForNamespaceGeoProvisioning -resourceGroupName $resourceGroupName -namespaceName $SecondaryNamespace

    return
}

Step 4: Initiate Failover

This section of the script handles the failover process itself, ensuring both namespaces are ready, initiating the failover, and then performing post-failover cleanup and reconfiguration.

#************** Initiate failover **************************************

Wait-ForNamespaceAndGeoProvisioning -resourceGroupName $resourceGroupName -PrimaryNamespace $sbusPrimaryNamespace -SecondaryNamespace $sbusSecondaryNamespace
Write-Output "`nFailing Over : Azure Service Bus $sbusPrimaryNamespace ...`n"

# Failover is always invoked against the secondary namespace
Set-AzServiceBusGeoDRConfigurationFailOver `
    -Name $sbusAliasName `
    -ResourceGroupName $resourceGroupName `
    -NamespaceName $sbusSecondaryNamespace

Wait-ForNamespaceAndGeoProvisioning -resourceGroupName $resourceGroupName -PrimaryNamespace $sbusPrimaryNamespace -SecondaryNamespace $sbusSecondaryNamespace

Step 5: Cleanup After Failover

Post-failover, we delete all queues in the primary namespace to clean up resources. This is crucial since the primary namespace will be “burned down” after the failover.

Write-Output "`nDeleting all queues in the primary $sbusPrimaryNamespace ..."

$queues = Get-AzServiceBusQueue -ResourceGroupName $resourceGroupName -NamespaceName $sbusPrimaryNamespace
foreach ($queue in $queues) {
    Remove-AzServiceBusQueue `
        -ResourceGroupName $resourceGroupName `
        -NamespaceName $sbusPrimaryNamespace `
        -QueueName $queue.Name

    Write-Host "Deleted queue: $($queue.Name)"
}

Wait-ForNamespaceAndGeoProvisioning -resourceGroupName $resourceGroupName -PrimaryNamespace $sbusPrimaryNamespace -SecondaryNamespace $sbusSecondaryNamespace

Step 6: Reconfiguration

After cleaning up the primary namespace, the script sets the alias back to the secondary namespace.

Write-Output "`nSetting the alias back after failover ..."

New-AzServiceBusGeoDRConfiguration `
    -Name $sbusAliasName `
    -NamespaceName $sbusSecondaryNamespace `
    -ResourceGroupName $resourceGroupName `
    -PartnerNamespace $partnerId

Wait-ForNamespaceAndGeoProvisioning -resourceGroupName $resourceGroupName -PrimaryNamespace $sbusPrimaryNamespace -SecondaryNamespace $sbusSecondaryNamespace
Write-Output "`nService Bus failover process completed successfully !!"

#************** DONE ************************************************************

Disconnect-AzAccount

Full codebase:

https://github.com/AtanuGpt/AzureServiceBusDR/blob/main/failover.ps1

Explanation of the Failover Process

  1. Initial Provisioning Check: The script first ensures that both the primary and secondary namespaces are fully provisioned before initiating the failover.
  2. Initiate Failover: The failover process is initiated using Set-AzServiceBusGeoDRConfigurationFailOver, which switches the alias to the secondary namespace.
  3. Cleanup Primary Namespace: After the failover, the script deletes all queues in the primary namespace. This step is critical because the primary namespace is effectively “burned down” or rendered inactive and cleaned up. This involves removing all entities (queues, topics, etc.) in the primary namespace.
  4. Reconfiguration: The alias is reconfigured to point to the secondary namespace, ensuring continued operation.
  5. Final Check: The script performs a final check to ensure both namespaces are in the correct state post-failover.
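
If PowerShell isn't your preferred automation surface, the same failover can be triggered from Python with the azure-mgmt-servicebus SDK. Here's a rough sketch — the operation names come from the SDK's disaster_recovery_configs operations group, so verify them against your installed version:

from azure.identity import DefaultAzureCredential
from azure.mgmt.servicebus import ServiceBusManagementClient

# Placeholders; fill in your own values
subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
secondary_namespace = "<secondary-namespace>"
alias_name = "<alias-name>"

client = ServiceBusManagementClient(DefaultAzureCredential(), subscription_id)

# As in the script above, failover is invoked against the secondary namespace
client.disaster_recovery_configs.fail_over(resource_group, secondary_namespace, alias_name)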

Conclusion

Automating the disaster recovery process for Azure Service Bus is not just a convenience — it’s a necessity for maintaining high availability and ensuring business continuity. This PowerShell script provides a comprehensive solution, making the failover process seamless and efficient. By adopting automation, you can ensure that your Azure Service Bus environment is always prepared for any eventuality, keeping your services running smoothly even in the face of disruptions.


18 June, 2024

🚀 Revolutionizing Document Interaction: An AI-Powered PDF Voice-2-Voice Chatbot Using LlamaIndex 🐑, Langchain 🔗, Azure AI Speech 🎤 and Google Audio 🔊

Welcome to the third and last installment of this series on innovative AI-driven chatbots. In our journey from basic text-based interactions to text-2-voice-enabled capabilities, we now introduce a voice-2-voice-enabled PDF chatbot. This advanced system allows users to interact verbally with their PDF documents, significantly enhancing accessibility and usability. Let’s explore how this chatbot works and its implications for users.

Evolution of Chatbots: From Text to Voice

In our initial blog post, we introduced a text-based chatbot capable of processing queries from uploaded documents. This laid the groundwork for seamless interaction with textual information.

http://techiemate.blogspot.com/2024/06/revolutionizing-document-interaction-ai.html

Building on this, our second post showcased text-to-voice integration: an interactive voice assistant capable of understanding your questions and reading out the answers from the content of your PDFs. This enhancement marked a significant leap towards intuitive user engagement, catering to diverse user preferences and accessibility needs.

http://techiemate.blogspot.com/2024/06/revolutionizing-document-interaction-ai_13.html 

Introducing Voice-to-Voice Interaction with PDFs

Today, we introduce our latest innovation: a voice-enabled PDF chatbot capable of both transcribing spoken queries and delivering spoken responses directly from PDF documents. This breakthrough technology bridges traditional document interaction with modern voice-driven interfaces, offering a transformative user experience.

The Technical Backbone: Exploring the Codebase

Let’s delve into the technical components that power our voice-enabled PDF chatbot:

Setting Up Dependencies and Environment

import os
import faiss
import streamlit as st
from dotenv import load_dotenv
from langchain_core.messages import AIMessage, HumanMessage
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext, load_index_from_storage
from llama_index.vector_stores.faiss import FaissVectorStore
import azure.cognitiveservices.speech as speechsdk
import speech_recognition as sr

# Initialize Faiss index for vectorization
d = 1536  # embedding dimension (matches OpenAI's 1536-dimensional embeddings)
faiss_index = faiss.IndexFlatL2(d)
PERSIST_DIR = "./storage"

# Load environment variables
load_dotenv()

The code snippet above sets up necessary dependencies:

  • Faiss: Utilized for efficient document vectorization, enabling similarity search based on content.
  • Streamlit: Facilitates the user interface for seamless interaction with the chatbot and document upload functionality.
  • LangChain: Powers the message handling and communication within the chatbot interface.
  • LlamaIndex: Manages the storage and retrieval of vectorized document data, optimizing query performance.
  • Azure Cognitive Services (Speech SDK): Provides capabilities for speech recognition and synthesis, enabling the chatbot to transcribe and respond to spoken queries.
  • Google Speech Recognition: Converts spoken audio into text via Google's speech recognition API, accessed through the speech_recognition library.
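
Before running the app, install the dependencies. A representative command (package names inferred from the imports above; PyAudio is what SpeechRecognition typically needs for microphone access):

pip install faiss-cpu streamlit python-dotenv langchain-core llama-index llama-index-vector-stores-faiss azure-cognitiveservices-speech SpeechRecognition pyaudio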

Document Handling and Vectorization

def saveUploadedFiles(pdf_docs):
    UPLOAD_DIR = 'uploaded_files'
    try:
        os.makedirs(UPLOAD_DIR, exist_ok=True)  # make sure the target directory exists
        for pdf in pdf_docs:
            file_path = os.path.join(UPLOAD_DIR, pdf.name)
            with open(file_path, "wb") as f:
                f.write(pdf.getbuffer())
        return "Done"
    except Exception:
        return "Error"

def doVectorization():
    try:
        vector_store = FaissVectorStore(faiss_index=faiss_index)
        storage_context = StorageContext.from_defaults(vector_store=vector_store)
        documents = SimpleDirectoryReader("./uploaded_files").load_data()
        index = VectorStoreIndex.from_documents(
            documents,
            storage_context=storage_context
        )
        index.storage_context.persist()
        return "Done"
    except Exception:
        return "Error"

The saveUploadedFiles function saves PDF documents uploaded by users to a designated directory (uploaded_files). The doVectorization function utilizes Faiss to vectorize these documents, making them searchable based on content similarities.

Speech Recognition and Transcription

def transcribe_audio():
    recognizer = sr.Recognizer()
    microphone = sr.Microphone()

    with microphone as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source, timeout=20)

    st.write("🔄 Transcribing...")

    try:
        text = recognizer.recognize_google(audio)
        return text
    except sr.RequestError:
        return "API unavailable or unresponsive"
    except sr.UnknownValueError:
        return "Unable to recognize speech"

The transcribe_audio function uses the speech_recognition library to capture spoken queries from the user's microphone. It adjusts for ambient noise, waits up to 20 seconds for speech to begin, and then transcribes the audio into text using Google's speech recognition API.

Querying and Fetching Data

def fetchData(user_question):
    try:
        vector_store = FaissVectorStore.from_persist_dir("./storage")
        storage_context = StorageContext.from_defaults(
            vector_store=vector_store, persist_dir=PERSIST_DIR
        )
        index = load_index_from_storage(storage_context=storage_context)
        query_engine = index.as_query_engine()
        response = query_engine.query(user_question)
        return str(response)
    except Exception:
        return "Error"

The fetchData function retrieves relevant information from vectorized documents based on user queries. It loads the persisted Faiss index from storage and queries it to find and return the most relevant information matching the user's question.

Defining the Welcome Message

The WelcomeMessage variable contains a multi-line string that introduces users to the voice-enabled PDF chatbot. It encourages them to upload PDF documents and start asking questions:

WelcomeMessage = """
Hello, I am your PDF voice chatbot. Please upload your PDF documents and start asking questions to me.
I would try my best to answer your questions from the documents.
"""

This message serves as the initial greeting when users interact with the chatbot, providing clear instructions on how to proceed.

Initializing Chat History and Azure Speech SDK Configuration

The code block initializes the chat history and sets up configurations for Azure Speech SDK:

if "chat_history" not in st.session_state:
    st.session_state.chat_history = [
        AIMessage(content=WelcomeMessage)
    ]

AZURE_SPEECH_KEY = os.getenv("AZURE_SPEECH_KEY")
AZURE_SPEECH_REGION = os.getenv("AZURE_SPEECH_REGION")
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")

speech_config = speechsdk.SpeechConfig(subscription=AZURE_SPEECH_KEY, region=AZURE_SPEECH_REGION)
speech_config.speech_synthesis_voice_name = "en-US-AriaNeural"
speech_config.speech_synthesis_language = "en-US"

speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

  • Chat History Initialization: Checks if chat_history exists in the Streamlit session state. If not, it initializes it with the WelcomeMessage wrapped in an AIMessage object. This ensures that the chat starts with the welcome message displayed to the user.
  • Azure Speech SDK Configuration: Retrieves Azure Speech API key and region from environment variables (AZURE_SPEECH_KEY and AZURE_SPEECH_REGION). It sets up a SpeechConfig object for speech synthesis (speech_synthesizer). The voice name and language are configured to use the "en-US-AriaNeural" voice for English (US).

Streamlit Integration: User Interface Design

def main():
    load_dotenv()

    st.set_page_config(
        page_title="Chat with multiple PDFs",
        page_icon=":sparkles:"
    )

    st.header("Chat with single or multiple PDFs :sparkles:")

    for message in st.session_state.chat_history:
        if isinstance(message, AIMessage):
            with st.chat_message("AI"):
                st.markdown(message.content)
        elif isinstance(message, HumanMessage):
            with st.chat_message("Human"):
                st.markdown(message.content)

    with st.sidebar:
        st.subheader("Your documents")
        pdf_docs = st.file_uploader(
            "Upload your PDFs here and click on 'Process'",
            accept_multiple_files=True
        )

        if st.button("Process"):
            with st.spinner("Processing"):
                IsFilesSaved = saveUploadedFiles(pdf_docs)
                if IsFilesSaved == "Done":
                    IsVectorized = doVectorization()
                    if IsVectorized == "Done":
                        st.session_state.isPdfProcessed = "done"
                        st.success("Done!")
                    else:
                        st.error("Error! in vectorization")
                else:
                    st.error("Error! in saving the files")

    if st.button("Start Asking Question"):
        st.write("🎤 Recording started...Ask your question")
        transcription = transcribe_audio()
        st.write("✅ Recording ended")

        st.session_state.chat_history.append(HumanMessage(content=transcription))

        with st.chat_message("Human"):
            st.markdown(transcription)

        with st.chat_message("AI"):
            with st.spinner("Fetching data ..."):
                response = fetchData(transcription)
                st.markdown(response)

        result = speech_synthesizer.speak_text_async(response).get()
        st.session_state.chat_history.append(AIMessage(content=response))

    if "WelcomeMessage" not in st.session_state:
        st.session_state.WelcomeMessage = WelcomeMessage
        result = speech_synthesizer.speak_text_async(WelcomeMessage).get()

#============================================================================================================
if __name__ == '__main__':
    main()

User Experience: Seamless Interaction

Imagine uploading a collection of PDF documents (research papers, technical manuals, or reports) and simply speaking your questions aloud. The chatbot not only transcribes your speech but also responds audibly, providing immediate access to relevant information. This seamless interaction is particularly valuable for users with visual impairments, and for multitaskers who prefer to consume information by ear.

Demo 💬🎤🤖



Enhancing Accessibility and Efficiency

Our voice-enabled PDF chatbot represents a significant advancement in accessibility technology. By integrating speech recognition, document vectorization, and AI-driven query processing, we empower users to effortlessly interact with complex information. This technology not only enhances accessibility but also boosts efficiency by streamlining the process of retrieving information from documents.

Conclusion: Paving the Way Forward

As we continue to explore the capabilities of AI in enhancing user experiences, the voice-enabled PDF chatbot stands as a testament to innovation in accessibility and usability. Whether you’re a researcher seeking insights from academic papers or a professional referencing technical documents, this technology promises to revolutionize how we interact with information.

Stay tuned as we push the boundaries further, exploring new applications and advancements in AI-driven technology. Stay connected and yes Happy Coding ! 😊

13 June, 2024

🚀 Revolutionizing Document Interaction: An AI-Powered PDF Text-2-Voice Chatbot Using LlamaIndex 🐑, Langchain 🔗 and Azure AI Speech 🔊

In this blog post, we're diving into the creation of an intelligent PDF voice chatbot using cutting-edge technologies like LangChain, LlamaIndex, and Azure AI Speech. This isn't just another chatbot; it's an interactive voice assistant capable of understanding and reading out the answers from the content in your PDFs. This project is a step up from my previous blog post, where we explored building a text-based PDF chatbot without the voice functionality. If you missed that, be sure to check it out here.

Technologies Used

  1. LangChain: For chaining language models and building complex applications.
  2. LlamaIndex: To index and query documents efficiently.
  3. Azure AI Speech: For speech synthesis, giving our bot a human-like voice.
  4. Streamlit: To create a user-friendly web interface.

Let’s jump into the code and see how these technologies come together to create our voice assistant chatbot.

Setting Up the Environment

First, ensure you have all the necessary libraries installed. You can do this by running:

pip install faiss-cpu streamlit python-dotenv azure-cognitiveservices-speech langchain-core llama-index llama-index-vector-stores-faiss

Also, make sure you have your Azure Cognitive Services API keys ready.

The Code

Here’s the complete code for our voice chatbot:

import os
import faiss
import streamlit as st
from dotenv import load_dotenv
from langchain_core.messages import AIMessage, HumanMessage
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext, load_index_from_storage
from llama_index.vector_stores.faiss import FaissVectorStore
import azure.cognitiveservices.speech as speechsdk

load_dotenv()  # load the .env file before reading any environment variables below

d = 1536  # embedding dimension (matches OpenAI's 1536-dimensional embeddings)
faiss_index = faiss.IndexFlatL2(d)
PERSIST_DIR = "./storage"

AZURE_SPEECH_KEY = os.getenv("AZURE_SPEECH_KEY")
AZURE_SPEECH_REGION = os.getenv("AZURE_SPEECH_REGION")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

def saveUploadedFiles(pdf_docs):
    UPLOAD_DIR = 'uploaded_files'
    try:
        os.makedirs(UPLOAD_DIR, exist_ok=True)  # make sure the target directory exists
        for pdf in pdf_docs:
            file_path = os.path.join(UPLOAD_DIR, pdf.name)
            with open(file_path, "wb") as f:
                f.write(pdf.getbuffer())
        return "Done"
    except Exception:
        return "Error"

def doVectorization():
    try:
        vector_store = FaissVectorStore(faiss_index=faiss_index)
        storage_context = StorageContext.from_defaults(vector_store=vector_store)
        documents = SimpleDirectoryReader("./uploaded_files").load_data()
        index = VectorStoreIndex.from_documents(
            documents,
            storage_context=storage_context
        )
        index.storage_context.persist()
        return "Done"
    except Exception:
        return "Error"

def fetchData(user_question):
    try:
        vector_store = FaissVectorStore.from_persist_dir("./storage")
        storage_context = StorageContext.from_defaults(
            vector_store=vector_store, persist_dir=PERSIST_DIR
        )
        index = load_index_from_storage(storage_context=storage_context)
        query_engine = index.as_query_engine()
        response = query_engine.query(user_question)
        return str(response)
    except Exception:
        return "Error"

WelcomeMessage = """
Hello, I am your PDF voice chatbot. Please upload your PDF documents and start asking questions.
I will try my best to answer your questions based on the documents.
"""


if "chat_history" not in st.session_state:
    st.session_state.chat_history = [
        AIMessage(content=WelcomeMessage)
    ]

AZURE_SPEECH_KEY = os.getenv("AZURE_SPEECH_KEY")
AZURE_SPEECH_REGION = os.getenv("AZURE_SPEECH_REGION")
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")

speech_config = speechsdk.SpeechConfig(subscription=AZURE_SPEECH_KEY, region=AZURE_SPEECH_REGION)
speech_config.speech_synthesis_voice_name = "en-US-AriaNeural"
speech_config.speech_synthesis_language = "en-US"

speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

def main():
    load_dotenv()

    st.set_page_config(
        page_title="Chat with multiple PDFs",
        page_icon=":sparkles:"
    )

    st.header("Chat with single or multiple PDFs :sparkles:")

    for message in st.session_state.chat_history:
        if isinstance(message, AIMessage):
            with st.chat_message("AI"):
                st.markdown(message.content)
        elif isinstance(message, HumanMessage):
            with st.chat_message("Human"):
                st.markdown(message.content)

    with st.sidebar:
        st.subheader("Your documents")
        pdf_docs = st.file_uploader(
            "Upload your PDFs here and click on 'Process'",
            accept_multiple_files=True
        )

        if st.button("Process"):
            with st.spinner("Processing"):
                IsFilesSaved = saveUploadedFiles(pdf_docs)
                if IsFilesSaved == "Done":
                    IsVectorized = doVectorization()
                    if IsVectorized == "Done":
                        st.session_state.isPdfProcessed = "done"
                        st.success("Done!")
                    else:
                        st.error("Error! in vectorization")
                else:
                    st.error("Error! in saving the files")

    user_question = st.chat_input("Ask a question about your document(s):")

    if user_question is not None and user_question != "":
        st.session_state.chat_history.append(HumanMessage(content=user_question))

        with st.chat_message("Human"):
            st.markdown(user_question)

        with st.chat_message("AI"):
            with st.spinner("Fetching data ..."):
                response = fetchData(user_question)
                st.markdown(response)

        result = speech_synthesizer.speak_text_async(response).get()
        st.session_state.chat_history.append(AIMessage(content=response))

    if "WelcomeMessage" not in st.session_state:
        st.session_state.WelcomeMessage = WelcomeMessage
        result = speech_synthesizer.speak_text_async(WelcomeMessage).get()

if __name__ == '__main__':
    main()

Breaking Down the Code

Environment Setup

We start by importing the necessary libraries and loading environment variables using dotenv. The .env file:

OPENAI_API_KEY = ""
AZURE_OPENAI_API_KEY = ""
AZURE_SPEECH_KEY = ""
AZURE_SPEECH_REGION = ""

File Upload and Vectorization

The saveUploadedFiles function handles file uploads, saving PDFs to a directory. The doVectorization function uses LlamaIndex and FAISS to create a searchable index from the uploaded documents.

Fetching Data

The fetchData function retrieves answers to user questions by querying the vector index.

Voice Synthesis

We use Azure AI Speech service to convert text responses into speech. This involves configuring the speechsdk.SpeechConfig and synthesizing speech with speechsdk.SpeechSynthesizer.

Streamlit Interface

Streamlit makes it easy to build a web interface. The sidebar allows users to upload and process PDFs. The main chat interface displays the conversation and handles user inputs.

Running the App

Run the Streamlit app using:

streamlit run your_script_name.py

Upload your PDF documents, ask questions, and listen as the bot responds with both text and voice.



Conclusion

By combining LangChain, LlamaIndex, and Azure AI, we’ve created an intelligent PDF voice chatbot that can make interacting with documents more engaging and accessible. Whether you’re using it for research, studying, or just exploring, this project showcases the potential of modern AI and NLP technologies.

Don’t forget to check out the previous blog post for a text-only version of this chatbot. Happy coding!