As you mentioned, sliding window sticks out as the thing to do. A state during the iteration will look like [stuff on the left] [window of size k] [stuff on the right]. At any iteration, you will also have the length of the encoding. Initialization of this is easy: spend the O(N) time to compute the encoding length for S[k..n-1].
The effect of updating the sliding window (moving to the right by 1) is deleting a character on the right and adding one on the left. Based on the last character of [stuff on the left] and first character of [stuff on the right], you compute the new encoding length (taking into account merging left/right if necessary). I won't present the details, but you can imagine that such a computation is possible in O(1), since we're only making a very local change.
We can keep track of the minimal encoding length throughout this process and return the best answer.
Implementation note: I would probably represent the "stuff on the left/right" as list of (character, count) pairs. Also, to save space, I think it is only necessary to remember one thing (the last pair) on the left.