You have to build a dodgy graph.

First, all distances in the graph will not be one number, but a pair - the actual path length and the number of turns. When summing the two lengths, add them one by one, and when comparing them, compare the lengths first, and then the number of turns. Theoretically, you can collapse everything back into one number if you count 100000* + . But there can be special effects if the paths are very complex and have more than 100000 turns. Also, beware of overflow.

For each cell, make 4 vertices, which will mean that you are in that cell and "looking" to one of the 4 sides.

Create 4 edges between adjacent directions with length {0, 1} (length 0, but 1 turn). From each vertex also create an edge of length {1, 0} (length - 1, 0 turns) to the vertex in the neighboring cell with the same direction (from the "top" of 4 vertices make an edge to the "top" vertex in the top cell. Similarly for the other three directions).

Now the shortest path in this graph will be what you need. There is also an issue with the initial endpoints. You can add a "central" vertex to the start and end cells, which is connected by edges of length {0, 1} (it is important to make 1 turn, otherwise you can twist in place through this central vertex). Think of this vertex as "looking at the floor." You start in some cell, looking at the floor, and you need to end up in another cell, again looking at the floor. You can turn around in the cage, or move to the cage in front of you.

Since there are already different edge lengths here, traversing the width will no longer look for the shortest distance. But you can use dextra/A*.

The implementation doesn't even need to store all vertices and edges. You just have a vertex described by {x, y, d} instead of {x, y, d}. Instead of trying 4 directions [-1,0],[0,-1],[0,1],[1,0] - you make the transition {x+dx[d],y+dy[d],d} or change d to (d+1)%d and (d+3)%4. And you don't keep a single number in the queues, but a pair of {length, turns}. For A* you still need to estimate the remaining length - well you take the length you already have, and take 1 for the number of turns if the beginning and end have different directions.

By playing with the lengths of the edges and their comparisons you can, for example, allow slightly longer paths if they have a lot fewer turns. (e.g. you can go around the perimeter of the maze than a slightly shorter but zig-zag path inside). To do this, the length of the edge can be length*K+turns.

Since there are 4-5 times as many vertices here, it will be 4-5 times slower.