13 June 2024

🚀 Revolutionizing Document Interaction: An AI-Powered PDF Text-2-Voice Chatbot Using LlamaIndex 🐑, Langchain 🔗 and Azure AI Speech 🔊

In this blog post, we're diving into the creation of an intelligent PDF voice chatbot using cutting-edge technologies like LangChain, LlamaIndex, and Azure AI Speech. This isn't just another chatbot; it's an interactive voice assistant that understands the content of your PDFs and reads its answers aloud. This project is a step up from my previous blog post, where we built a text-based PDF chatbot without voice functionality. If you missed that, be sure to check it out here.

Technologies Used

  1. LangChain: For chaining language models and building complex applications.
  2. LlamaIndex: To index and query documents efficiently.
  3. Azure AI Speech: For speech synthesis, giving our bot a human-like voice.
  4. Streamlit: To create a user-friendly web interface.

Let’s jump into the code and see how these technologies come together to create our voice assistant chatbot.

Setting Up the Environment

First, ensure you have all the necessary libraries installed. You can do this by running:

pip install faiss-cpu streamlit python-dotenv azure-cognitiveservices-speech langchain llama-index llama-index-vector-stores-faiss

Also, make sure you have your Azure Cognitive Services API keys ready.

The Code

Here’s the complete code for our voice chatbot:

import os
import faiss
import streamlit as st
from dotenv import load_dotenv
from langchain_core.messages import AIMessage, HumanMessage
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext, load_index_from_storage
from llama_index.vector_stores.faiss import FaissVectorStore
import azure.cognitiveservices.speech as speechsdk

# Load the .env file before any of the keys below are read
load_dotenv()

# 1536 is the output dimension of OpenAI's default embedding model
d = 1536
faiss_index = faiss.IndexFlatL2(d)
PERSIST_DIR = "./storage"

AZURE_SPEECH_KEY = os.getenv("AZURE_SPEECH_KEY")
AZURE_SPEECH_REGION = os.getenv("AZURE_SPEECH_REGION")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")  # LlamaIndex reads this from the environment

def saveUploadedFiles(pdf_docs):
    UPLOAD_DIR = 'uploaded_files'
    os.makedirs(UPLOAD_DIR, exist_ok=True)  # create the folder on first run
    try:
        for pdf in pdf_docs:
            file_path = os.path.join(UPLOAD_DIR, pdf.name)
            with open(file_path, "wb") as f:
                f.write(pdf.getbuffer())
        return "Done"
    except Exception:
        return "Error"

def doVectorization():
    try:
        vector_store = FaissVectorStore(faiss_index=faiss_index)
        storage_context = StorageContext.from_defaults(vector_store=vector_store)
        documents = SimpleDirectoryReader("./uploaded_files").load_data()
        index = VectorStoreIndex.from_documents(
            documents,
            storage_context=storage_context
        )
        index.storage_context.persist(persist_dir=PERSIST_DIR)
        return "Done"
    except Exception:
        return "Error"

def fetchData(user_question):
    try:
        vector_store = FaissVectorStore.from_persist_dir(PERSIST_DIR)
        storage_context = StorageContext.from_defaults(
            vector_store=vector_store, persist_dir=PERSIST_DIR
        )
        index = load_index_from_storage(storage_context=storage_context)
        query_engine = index.as_query_engine()
        response = query_engine.query(user_question)
        return str(response)
    except Exception:
        return "Error"

WelcomeMessage = """
Hello, I am your PDF voice chatbot. Please upload your PDF documents and start asking questions.
I will try my best to answer your questions based on the documents.
"""

if "chat_history" not in st.session_state:
    st.session_state.chat_history = [
        AIMessage(content=WelcomeMessage)
    ]

speech_config = speechsdk.SpeechConfig(subscription=AZURE_SPEECH_KEY, region=AZURE_SPEECH_REGION)
speech_config.speech_synthesis_voice_name = "en-US-AriaNeural"
speech_config.speech_synthesis_language = "en-US"

speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

def main():
    st.set_page_config(
        page_title="Chat with multiple PDFs",
        page_icon=":sparkles:"
    )

    st.header("Chat with single or multiple PDFs :sparkles:")

    # Replay the conversation so far
    for message in st.session_state.chat_history:
        if isinstance(message, AIMessage):
            with st.chat_message("AI"):
                st.markdown(message.content)
        elif isinstance(message, HumanMessage):
            with st.chat_message("Human"):
                st.markdown(message.content)

    with st.sidebar:
        st.subheader("Your documents")
        pdf_docs = st.file_uploader(
            "Upload your PDFs here and click on 'Process'",
            accept_multiple_files=True
        )

        if st.button("Process"):
            with st.spinner("Processing"):
                IsFilesSaved = saveUploadedFiles(pdf_docs)
                if IsFilesSaved == "Done":
                    IsVectorized = doVectorization()
                    if IsVectorized == "Done":
                        st.session_state.isPdfProcessed = "done"
                        st.success("Done!")
                    else:
                        st.error("Error during vectorization")
                else:
                    st.error("Error while saving the files")

    user_question = st.chat_input("Ask a question about your document(s):")

    if user_question is not None and user_question != "":
        st.session_state.chat_history.append(HumanMessage(content=user_question))

        with st.chat_message("Human"):
            st.markdown(user_question)

        with st.chat_message("AI"):
            with st.spinner("Fetching data ..."):
                response = fetchData(user_question)
                st.markdown(response)

        speech_synthesizer.speak_text_async(response).get()
        st.session_state.chat_history.append(AIMessage(content=response))

    # Speak the welcome message once per session
    if "WelcomeMessage" not in st.session_state:
        st.session_state.WelcomeMessage = WelcomeMessage
        speech_synthesizer.speak_text_async(WelcomeMessage).get()

if __name__ == '__main__':
    main()

Breaking Down the Code

Environment Setup

We start by importing the necessary libraries and loading environment variables with dotenv. The load_dotenv() call sits at the top of the script so the keys are available before the speech and index configuration read them. The .env file looks like this:

OPENAI_API_KEY = ""
AZURE_SPEECH_KEY = ""
AZURE_SPEECH_REGION = ""

File Upload and Vectorization

The saveUploadedFiles function handles file uploads, saving PDFs to a directory. The doVectorization function uses LlamaIndex and FAISS to create a searchable index from the uploaded documents.
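One detail worth noting: the FAISS index dimension (d = 1536) must match the output dimension of the embedding model. The code above relies on LlamaIndex's default OpenAI embeddings; if you prefer to pin the model explicitly, a sketch like the following should work (it assumes the optional llama-index-embeddings-openai package is installed):

import faiss
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding

# Pin the embedding model explicitly; text-embedding-ada-002 produces
# 1536-dimensional vectors, matching faiss.IndexFlatL2(1536)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
faiss_index = faiss.IndexFlatL2(1536)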

Fetching Data

The fetchData function retrieves answers to user questions by querying the vector index.
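By default, as_query_engine() uses LlamaIndex's standard retrieval settings. If you want more control over how answers are assembled, the engine accepts tuning parameters; the values below are illustrative, not part of the original code:

# Retrieve more context per question and compact the prompt
query_engine = index.as_query_engine(
    similarity_top_k=3,        # use the 3 most similar chunks
    response_mode="compact"    # pack retrieved chunks into fewer LLM calls
)
response = query_engine.query(user_question)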

Voice Synthesis

We use Azure AI Speech service to convert text responses into speech. This involves configuring the speechsdk.SpeechConfig and synthesizing speech with speechsdk.SpeechSynthesizer.
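The call speak_text_async(response).get() blocks until synthesis finishes and returns a result object. The code above ignores that result; in practice it is worth checking it so failures (for example, a wrong key or region) surface instead of passing silently. A minimal sketch using the SDK's result reasons:

result = speech_synthesizer.speak_text_async(response).get()
if result.reason == speechsdk.ResultReason.Canceled:
    # Typically caused by an invalid key/region or a network issue
    details = result.cancellation_details
    st.warning(f"Speech synthesis canceled: {details.reason}")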

Streamlit Interface

Streamlit makes it easy to build a web interface. The sidebar allows users to upload and process PDFs. The main chat interface displays the conversation and handles user inputs.

Running the App

Run the Streamlit app using:

streamlit run your_script_name.py

Upload your PDF documents, ask questions, and listen as the bot responds with both text and voice.

Conclusion

By combining LangChain, LlamaIndex, and Azure AI Speech, we've created an intelligent PDF voice chatbot that makes interacting with documents more engaging and accessible. Whether you're using it for research, studying, or just exploring, this project showcases the potential of modern AI and NLP technologies.

Don’t forget to check out the previous blog post for a text-only version of this chatbot. Happy coding!
