I finally got it working with my local llama.cpp server. The hard part was finding instructions for this use case. I had to choose "Local model", not "OpenAI compatible (custom)". After that things worked better, and the chat results were shown with proper formatting.
So please add some setup instructions for us llama.cpp server users.
Also, when VS is launched, the extension takes a bit too long (~40 sec) to connect to the llama.cpp server and show the model selection dialog. It shows my model but says it's not loaded. Not sure if that's a bug.
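
In case it helps narrow down the delay, here is a rough sketch for timing the server directly, outside the extension. It assumes llama-server is listening on its default `http://127.0.0.1:8080` and exposes the `/health` and `/v1/models` endpoints; `BASE_URL`, `endpoint_url`, and `time_request` are just names made up for this example, so adjust them to your setup. If both requests come back quickly, the ~40 sec wait is probably on the extension side rather than in llama.cpp.

```python
import time
import urllib.error
import urllib.request

# Assumption: llama-server running locally on its default host and port.
BASE_URL = "http://127.0.0.1:8080"


def endpoint_url(base_url: str, path: str) -> str:
    """Join a base URL and an endpoint path with exactly one slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def time_request(url: str, timeout: float = 60.0):
    """Return (seconds elapsed, HTTP status or None, body text or error message)."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. still loading).
        status, body = exc.code, str(exc)
    except (urllib.error.URLError, OSError) as exc:
        # No answer at all: server not running, wrong port, or timeout.
        status, body = None, str(exc)
    return time.perf_counter() - start, status, body


if __name__ == "__main__":
    for path in ("/health", "/v1/models"):
        elapsed, status, body = time_request(endpoint_url(BASE_URL, path))
        print(f"{path}: status={status} in {elapsed:.2f}s")
        print(body[:300])  # first part of the response, enough to spot the model entry
```

Running this right after starting the server, and again once the model has finished loading, should show whether the "not loaded" state reflects what the server itself reports at that moment.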