
[BUG]: Different continuation after restoring state #888

Open

TomoJu opened this issue Aug 2, 2024 · 1 comment

Comments


TomoJu commented Aug 2, 2024

Description

Hi,
I'm developing a client application with LLamaSharp and .NET 8 and found behaviour that does not meet my expectations.
When I ask two questions in one session, I always get the same answers for those two questions across runs. That is what I expect when I use the same parameters and, in particular, the same seed.
But when I first ask question 1, save the session state, create a new session, load the saved state, and then ask question 2, I do not get the same answer as in the first step. I would expect the same result after saving/restoring the session, given the same questions and the same parameters.
I modified the LoadAndSaveState sample to show this behaviour.

I used llama2 7b q5km chat and llama2 13b q5km chat for my tests.

Regards
Tomo

Reproduction Steps

using LLama.Common;

namespace LLama.Examples.Examples
{
    // This example shows how to save/load the state of the executor.
    public class LoadAndSaveState
    {
        public static async Task Run()
        {
            string modelPath = @"...llama-2-7b-chat.Q5_K_M.gguf";

            var promptSea = "What is the color of the sea?";
            var promptSky = "What is the color of the sky?";

            var parameters = new ModelParams(modelPath)
            {
                Seed = 1337,
                GpuLayerCount = 20
            };

            var inferenceParams = new InferenceParams() { Temperature = 0.6f, AntiPrompts = new List<string> { "User:" } };

            // Both questions in one session
            using (var model1 = await LLamaWeights.LoadFromFileAsync(parameters))
            {
                using (var context1 = model1.CreateContext(parameters))
                {
                    var ex1 = new InteractiveExecutor(context1);

                    Console.WriteLine(promptSea);

                    // Answer for question 1.
                    await foreach (var text in ex1.InferAsync(promptSea, inferenceParams))
                    {
                        Console.Write(text);
                    }

                    Console.WriteLine();

                    Console.WriteLine(promptSky);

                    // Answer for question 2.
                    await foreach (var text in ex1.InferAsync(promptSky, inferenceParams))
                    {
                        Console.Write(text);
                    }

                    Console.WriteLine();
                }
            }

            var modelStatePath = @"...modelState.bin";
            var executorStatePath = @"...executorState.bin";

            // Only question 1, then save state
            using (var model2 = await LLamaWeights.LoadFromFileAsync(parameters))
            {
                using (var context2 = model2.CreateContext(parameters))
                {
                    var ex2 = new InteractiveExecutor(context2);

                    Console.WriteLine(promptSea);

                    await foreach (var text in ex2.InferAsync(promptSea, inferenceParams))
                    {
                        Console.Write(text);
                    }

                    Console.WriteLine();

                    // Persist both the context state and the executor state.
                    ex2.Context.SaveState(modelStatePath);
                    await ex2.SaveState(executorStatePath);
                }
            }

            // Load the state and ask question 2. The answer is not the same as
            // when both questions are asked in one session.
            using (var model3 = await LLamaWeights.LoadFromFileAsync(parameters))
            {
                using (var context3 = model3.CreateContext(parameters))
                {
                    var ex3 = new InteractiveExecutor(context3);

                    var ctx3 = ex3.Context;
                    ctx3.LoadState(modelStatePath);
                    ex3 = new InteractiveExecutor(ctx3);
                    await ex3.LoadState(executorStatePath);

                    Console.WriteLine(promptSky);

                    // Answer for question 2.
                    await foreach (var text in ex3.InferAsync(promptSky, inferenceParams))
                    {
                        Console.Write(text);
                    }
                }
            }
        }
    }
}

Environment & Configuration

  • Operating system: Windows 10
  • .NET runtime version: 8.0
  • LLamaSharp version: 0.14.0
  • CUDA version (if you are using cuda backend): 12.3
  • CPU & GPU device: Intel I7 64 GB RAM, NVIDIA GeForce RTX 2070

Known Workarounds

None

@martindevans
Member

Unfortunately I think this is expected behaviour: llama.cpp itself is not entirely deterministic (even with a fixed seed etc.).
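
One way to observe this independently of save/restore is to run the identical prompt twice in two completely fresh model/context pairs and compare the outputs. A minimal sketch, assuming the same LLamaSharp 0.14.0 API used in the reproduction above (the model path is a placeholder; DeterminismCheck and RunOnce are hypothetical names, not part of the issue):

using LLama.Common;
using System.Text;

namespace LLama.Examples.Examples
{
    public class DeterminismCheck
    {
        // Generate one complete answer in a completely fresh model/context pair.
        private static async Task<string> RunOnce(string modelPath, string prompt)
        {
            var parameters = new ModelParams(modelPath) { Seed = 1337, GpuLayerCount = 20 };
            var inferenceParams = new InferenceParams() { Temperature = 0.6f, AntiPrompts = new List<string> { "User:" } };

            using var model = await LLamaWeights.LoadFromFileAsync(parameters);
            using var context = model.CreateContext(parameters);
            var executor = new InteractiveExecutor(context);

            var sb = new StringBuilder();
            await foreach (var text in executor.InferAsync(prompt, inferenceParams))
            {
                sb.Append(text);
            }
            return sb.ToString();
        }

        public static async Task Run()
        {
            string modelPath = @"...llama-2-7b-chat.Q5_K_M.gguf";

            // If llama.cpp were fully deterministic, two fresh runs with
            // identical parameters and seed would always match exactly.
            var first = await RunOnce(modelPath, "What is the color of the sky?");
            var second = await RunOnce(modelPath, "What is the color of the sky?");

            Console.WriteLine(first == second ? "Outputs identical" : "Outputs differ");
        }
    }
}

If the two outputs already differ here, the divergence after save/restore is more likely this underlying non-determinism than a bug in the state serialization itself.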
