Tasksync token based billing benefits #98

@benjiro29

This project now only works for the old request-based system. Unfortunately, Tasksync is deprecated due to recent changes in Copilot billing: GitHub Copilot is moving to usage-based billing.

I'd like to point out that this is actually the exact opposite of reality.

Tasksync's benefit of keeping a prompt session alive is that it also keeps the LLM server's KV cache alive. So when your prompts involve data that is already cached, Tasksync saves a TON of input token usage.

Yesterday I ran a test comparing OpenCode Go via Visual Studio Code against Visual Studio Code + Tasksync. Because OpenCode logs all input, output, and cache utilization, the difference is easy to see.

Testing a basic prompt, conclude, prompt, conclude agentic programming loop shows the following.

First step:

  • Loading the instructions
  • Loading the harness
  • Loading any agent or other .md files
  • Loading the prompt

This alone results in 25k tokens on every prompt.

Second step:

  • Loading files / data relevant to the prompt

Third step:

  • Loading thinking data
  • Loading files / data relevant to the thinking

Repeat steps doing the work:

  • ....

At this point, we have already reached 75k tokens of content being loaded.

Ending steps:

  • Task finalized ...

Now repeat the above steps again for

  • Prompt 2
  • Prompt 3
  • ...

With Tasksync active, we get the same activity for the first prompt ...

But for prompt 2:

  • Loads the new prompt information
  • Gets the same 25k tokens of data, but this is already cached, so it is served from the 10x cheaper cache.

Second step:

  • Loads data / files ... Most of it is already cached, so it is again served from the 10x cheaper cache. What used to be a 500 to 3000 token request now turns into a 40 to 100 token request.

Third step ...
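
The comparison above can be sketched as simple token accounting. This is a rough model, not a measurement: it assumes every prompt re-sends the same 75k tokens of context, that a cold prompt pays full price for all of it, and that cached tokens are billed at one tenth of the normal input rate (the "10x cheaper" figure from this post). The function name and the 100-token figure for new input are made up for illustration.

```python
# Assumption: cached input tokens cost 1/10 of uncached ones ("10x cheaper cache").
CACHE_DISCOUNT = 10


def billed_input_equivalent(context_tokens, new_tokens, prompts, cache_alive):
    """Input billed over `prompts` prompts, in full-price token equivalents."""
    total = 0.0
    for i in range(prompts):
        if cache_alive and i > 0:
            # Context is served from the cache; only the new tokens cost full price.
            total += context_tokens / CACHE_DISCOUNT + new_tokens
        else:
            # Cold start: the whole context is processed at full price.
            total += context_tokens + new_tokens
    return total


# Three prompts, 75k tokens of context each, 100 new tokens per prompt.
without_cache = billed_input_equivalent(75_000, 100, prompts=3, cache_alive=False)
with_cache = billed_input_equivalent(75_000, 100, prompts=3, cache_alive=True)
print(without_cache, with_cache, round(without_cache / with_cache, 2))
```

In this model the gap widens with every extra prompt, because the one-time cold start is spread over more cached prompts.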


Checking the logs of OpenCode Go shows this effect ...

  • A 40-minute session with Tasksync active shows inputs of 50, 100, 200, maybe 1000 tokens, with barely any billing effect.
  • A 5-minute session without Tasksync showed around 8x the cost, which was easily traced back to a large number of input tokens, as each prompt rebuilt the entire cache again and again ...
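
If you want to repeat this check on your own logs, a cache hit ratio per session is a quick number to compare. The sketch below is only an illustration: the `Request` fields are hypothetical and do not reflect OpenCode's actual log format, and the sample session is invented, so map the fields onto whatever your provider reports.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical fields, not OpenCode's real log schema.
    input_tokens: int   # uncached input tokens, billed at full price
    cached_tokens: int  # input tokens served from the cache


def cache_hit_ratio(requests):
    """Share of all input tokens in a session that came from the cache."""
    cached = sum(r.cached_tokens for r in requests)
    total = cached + sum(r.input_tokens for r in requests)
    return cached / total if total else 0.0


# Invented example: one cold prompt, then two prompts that mostly hit the cache.
session = [Request(25_000, 0), Request(100, 25_000), Request(200, 25_000)]
print(f"{cache_hit_ratio(session):.1%}")
```

A session where every prompt rebuilds the cache would stay close to 0%, while a long-lived session should climb toward 100% as more prompts reuse the same context.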

This is also the main reason why some Copilot users get those insane projected token cost bills. I have had multiple days with three 5-hour sessions each (until hitting the limit) with GPT 5.5 + Tasksync. The actual reported token usage was between $4 and $6, because Tasksync kept the content in the server's cache layer. So instead of spending $5 on input, I spent $0.5 on the same input. And while Tasksync is technically slightly heavier on output tokens, this is offset by avoiding the initial output tokens that come with every new prompt request anyway.

If you develop agentically, you tend to focus on specific code ... You're not developing all over your code project. As a result, you tend to hit the same files over and over again until something works or looks the way you want it to.
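
The $5 versus $0.5 figure is what you get when all of the input is served from a cache that is 10x cheaper. A small sketch of the same arithmetic for partial hit rates, assuming a flat 10x discount and ignoring output tokens and any cache-write surcharge a provider may apply (the function name is made up for illustration):

```python
def blended_input_cost(full_price_cost, hit_rate, discount=10):
    """Input cost when a fraction `hit_rate` of tokens is billed at 1/discount."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return full_price_cost * ((1 - hit_rate) + hit_rate / discount)


# $5 of input at full price, at different cache hit rates.
for rate in (0.0, 0.5, 0.9, 1.0):
    print(rate, round(blended_input_cost(5.0, rate), 2))
```

Real sessions will land somewhere between the two extremes, since the first prompt and any newly loaded files are never cached.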

So I disagree wholeheartedly that this project is deprecated because of the GH changes.

In fact, it's even more powerful than before with Copilot's new billing. And it does not stop there: it works with OpenCode Go and probably any other token-based billing provider.

Is this an issue for Microsoft or any other provider? I doubt it, as more people hitting the cache results in a lot more free capacity on their servers. The only providers that may be unhappy are those that act as pure resellers of LLM models and earn a fixed margin.

You can try the same tests, and I think you will concur with me that TaskSync has a new life ahead of itself: not as a sneaky way to avoid Premium Prompts, but as a token-saver tool.
