03 July, 2024

✨Building a Smart Image Parser with .NET, Semantic Kernel, and GPT-4o 🧿

Hey there, tech enthusiasts! Today, we’re diving into the magical world of AI, .NET, and the Semantic Kernel. Imagine a world where you provide an image URL, and in just a few lines of code, your program analyzes every tiny detail in that image. Sounds cool, right? Well, buckle up, because we’re about to embark on a fun journey to create exactly that!


The Magic Ingredients

Here’s what we’ll be using:

  • .NET (because we love a sturdy framework)
  • Semantic Kernel (because semantics matter, folks)
  • GPT-4o (the brain behind the operation)

Without further ado, let’s jump into the code snippets and see how we can make this happen.

The Controller — Where the Magic Begins

First, we need to set up our controller. This is where all the action happens. Let’s take a look at the code:


using Microsoft.AspNetCore.Mvc;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

namespace AzureSemanticKernel.AI.Controllers
{
    public class OpenAiController : Controller
    {
        // Read credentials from the environment; either may be null if the variable is unset.
        private readonly string? open_ai_key = Environment.GetEnvironmentVariable("OPEN_AI_KEY");
        private readonly string? open_ai_org = Environment.GetEnvironmentVariable("OPEN_AI_ORG_ID");

        [HttpPost]
        public async Task<string> GetGpt4oImageResponse(string imgUrl)
        {
            try
            {
                // Fail early with a readable message when the API key is missing.
                if (string.IsNullOrWhiteSpace(open_ai_key))
                {
                    return "OPEN_AI_KEY is not set.";
                }

                // Build a kernel with GPT-4o registered as the chat completion service.
                var kernelBuilder = Kernel.CreateBuilder();
                kernelBuilder.AddOpenAIChatCompletion("gpt-4o", open_ai_key, open_ai_org);
                var kernel = kernelBuilder.Build();

                var chat = kernel.GetRequiredService<IChatCompletionService>();

                // The system message sets the assistant's tone.
                var history = new ChatHistory();
                history.AddSystemMessage("You are a friendly and helpful assistant that responds to questions directly");

                // A single user message carries both the text prompt and the image.
                var message = new ChatMessageContentItemCollection
                {
                    new TextContent("Can you do a detailed analysis and tell me all the minute details that are present in this image?"),
                    new ImageContent(new Uri(imgUrl))
                };

                history.AddUserMessage(message);

                var result = await chat.GetChatMessageContentAsync(history);

                // Content can be null, so fall back to an empty string.
                return result.Content ?? string.Empty;
            }
            catch (Exception ex)
            {
                // Covers malformed URLs as well as API failures.
                return ex.Message;
            }
        }
    }
}
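
Want to poke the endpoint from another program? Here is a minimal client sketch. It assumes the app uses the default {controller}/{action} routing (the post doesn't show the routing setup, so the path below is a guess), and the ImageParserClient class and its method names are made up for illustration. The image URL goes in as a form field so it binds to the imgUrl parameter.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class ImageParserClient
{
    // Assumed route: default {controller}/{action} mapping for OpenAiController.
    private const string ActionPath = "OpenAi/GetGpt4oImageResponse";

    // Builds the POST request; imgUrl is sent as a form field so it can bind
    // to the action's string imgUrl parameter.
    public static HttpRequestMessage BuildRequest(Uri baseAddress, string imgUrl)
    {
        return new HttpRequestMessage(HttpMethod.Post, new Uri(baseAddress, ActionPath))
        {
            Content = new FormUrlEncodedContent(new Dictionary<string, string>
            {
                ["imgUrl"] = imgUrl
            })
        };
    }

    // Sends the request and returns the response body as plain text.
    public static async Task<string> AnalyzeAsync(HttpClient http, Uri baseAddress, string imgUrl)
    {
        using var request = BuildRequest(baseAddress, imgUrl);
        using var response = await http.SendAsync(request);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

A call would look like await ImageParserClient.AnalyzeAsync(new HttpClient(), new Uri("http://localhost:5000/"), "https://example.com/cat.png"), where the host, port, and image URL are placeholders for your own values.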

Breaking Down the Magic

Let’s dissect this code and understand what each part does.

1. Namespace and Using Statements:

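  • Just the usual suspects: Microsoft.AspNetCore.Mvc ships with ASP.NET Core, while the two Microsoft.SemanticKernel namespaces come from the Microsoft.SemanticKernel NuGet package. Add the package and move on!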

2. Controller Setup:

  • We create an OpenAiController and read our API key and organization ID from environment variables. Remember, hard-coding keys is like leaving your front door open with a "Please Rob Me" sign.

3. GetGpt4oImageResponse Method:

  • This is where the fun begins! We define an asynchronous method to process the image URL.
  • Kernel Creation: We create a kernel builder and add our GPT-4o chat completion service. Think of the kernel as the magical cauldron where all the ingredients mix together.
  • Chat Service: We get the chat service from the kernel. This service is like our friendly neighborhood barista who knows exactly how we like our coffee.
  • Chat History: We initialize the chat history and add a system message to set the tone for our AI assistant. We want our assistant to be friendly and helpful, just like your favorite support agent.
  • User Message: We create a message collection with a text prompt and the image URL. This is like giving the barista your order — “I want a detailed analysis with extra foam, please!”
  • Response Handling: We send the message to the chat service and wait for the response. If all goes well, we return the content. If something goes wrong, we catch the exception and return the error message (because nobody likes an unhandled exception ruining the party).
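
One thing the walkthrough leans on is the catch block: a malformed imgUrl makes new Uri(...) throw, and the caller gets a raw exception message back. A small guard can reject bad input before any request is made. This is only a sketch of that idea, not part of the controller above; the ImageUrlGuard class and its method name are hypothetical, and it assumes you only want to accept absolute http or https URLs.

```csharp
using System;

public static class ImageUrlGuard
{
    // Hypothetical helper: succeeds only for absolute http/https URLs.
    public static bool TryParseImageUrl(string? imgUrl, out Uri? uri)
    {
        if (Uri.TryCreate(imgUrl, UriKind.Absolute, out uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps))
        {
            return true;
        }

        // Reset the out value so callers never see a rejected URI.
        uri = null;
        return false;
    }
}
```

Inside the action, you could call ImageUrlGuard.TryParseImageUrl(imgUrl, out var uri) first, return a friendly message when it fails, and pass uri to ImageContent when it succeeds.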

Configuration — The Secret Sauce

Finally, let’s not forget the configuration setup in Program.cs:

using DotNetEnv;

Env.Load();

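Env.Load() reads a .env file from the current directory and copies its entries into the process environment, so the controller can find OPEN_AI_KEY and OPEN_AI_ORG_ID without either value ever appearing in the code. Keep the .env file out of source control. It’s like having a secret recipe locked in a vault.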
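
If a variable is missing, you usually want to find out at startup rather than on the first request. Here is a small sketch of a fail-fast check you could run right after Env.Load(). The EnvGuard class and its Require method are hypothetical names, not part of DotNetEnv or the code above.

```csharp
using System;

public static class EnvGuard
{
    // Hypothetical helper: returns the variable's value, or throws if it is unset or blank.
    public static string Require(string name)
    {
        var value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException($"Environment variable '{name}' is not set.");
        }

        return value;
    }
}
```

For example, calling EnvGuard.Require("OPEN_AI_KEY") in Program.cs would stop the app with a clear message when the key is absent.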

Wrapping Up

And there you have it! In just a few lines of code, we’ve created a powerful image parser that uses .NET, Semantic Kernel, and GPT-4o to analyze images and return detailed descriptions. Now you can impress your friends, family, and maybe even your boss with your newfound AI prowess.

Remember, the key to keeping your code exciting is to blend functionality with a dash of humor and a sprinkle of creativity. Happy coding, and may your bugs be few and your features be many! 🎉