Search of 17clues phase 1 
http://pagespersoorange.fr/gpenet/SK17/PH1/ph1.htm 
The search of 17 clues Sudoku “as of blue” in solution grids can be divided in four sub tasks
Produce a pseudo entry file band1+band2 + relevant attached bands3 Do preliminary tasks UA generation Valid bands expansion …. Find Bands 1+2 of interest Search band 3 in bands 1+2 having potential
A good understanding of the phase 1 of the process (Find Bands 1+2 of interest) is a prerequisite to analyze the preliminary tasks.
Core treatment principles
The key idea in the blue search of 17 clues puzzles is to combine first a valid solution in band 1 against the gangster band 1 to a valid solution in band 2 against the gangster band 2 If this pair gives a valid solution in bands 1+2 against the gangster 1+2, then the relevant set of bands 3 is processed.
To be closer to the code, this can be expressed in a slightly different way.
The process combines a X valid solution of “x given” in the “band a” against the gangster of this band To a Y valid solution of “y given” in the “band b” against the gangster of this band
In the process, we have x<=y and with the 665 distribution, x=5, y=6. In the search of 17 clues with band or stack having more than 6 clues, the condition is x+y < =10
Generally speaking, X will be once band 1 and once band 2. In few cases, the minimum number of given to solve a band gangster is 6. The corresponding band can not be X in this search 566 656 And if (very few cases) both bands have a minimum of 6 clues than the pair of bands can be skipped.
The minimum for x,y has been studied in this thread http://forum.enjoysudoku.com/bandsandlowcluepuzzlest30218135.html
We can see in the table that the minimum is 2, with many of the lexically minimal band 1 having a minimum of 3 or 4 clues.
All 5 clues valid solutions and 6 clues valid solutions have redundant clues (in fact a majority of them). I tried to use this to speed up the process, but without any success, so, in line with blue’s code, our set of valid solution has the redundant clues required to reach the count “x” or “y”.
Performance issue
A) The ratio valid bands 1+2 / number of XY is very low (usually some 1/1000 in the tests done) and the brute force very costly. B) The average number of XY is a huge number 10 000 x 100 000 as order of magnitude.
For both reason, going from a given XY to a valid XY is a critical code. As we will see, the process uses many tables , and mainly, in this part of the code,
To define the list of XY To clear XY keeping untouched one or more UAs of the gangster 2 bands
This induce another issue, seen but not clearly identified by blue in his version of the code. We have a potential huge cache effect.
UAs bands 1+2 filter
This is by far the most important filter A XY must hit all UAs in bands 1+2. As we are in the critical code, this is checked in the simplest possible way, very similar to what Gary Mc Guire did in his code. Vectors of UAs not hit (one bit per UA) are built for each X and each Y. (Vx,Vy)
If Vx&Vy is not null, the XY is discarded.
The Vx and Vy vectors are built at specific moments in the process (see below). In between, some more UAs appear when a brute force (bands 1+2) shows a multiple solution. These UAs can be used as filters as well, but they are not immediately in “vector xy mode”. They are stored in 2 FIFO tables and the check is done later, before the brute force validation.
Stack control
This is specific to the 665 distribution, No stack can pass the count of 6. The count per stack is prepared for each X and each Y in a 64 bits field. Adding these 2 field gives the sum in one instruction. This control can be done just after the UAs vector filter.
Steps with 4 fixed cells, 5 fixed cells (2X2Y 2X3Y)
The process for this first phase (valid XY selection) will be divided in chunks to optimize the cache effect. A primary division has been done to optimize the UAs filter.
In the expansion of a band to produce valid X or valid Y, the first cells are taken out of the smaller UAs. The files are by construction sorted on these first cells, and, by construction again, we have a limited number of “first cells” .
Here, we have an external double loop defining a “step” as the process of all X;Y having the same 2 first cells in X and 2/3 first cells in Y;
The corresponding index is worked out during bands expansion. In the 665 distribution, we always apply this split. In the search of 17 clues with one band having >6 clues, it could be better if the number of X is small to use a split on one cell for X (1X2Y); This is why the corresponding index is worked out. It is not in use here. Reversely, when a step 2x2 would lead to a big nX x nY count, 3 fixed cells are used for “Y”.
A step has a penalty (shrink of the UA tables and vectors building), and a bonus (fresh UAs are used in vector mode). The program apply a 3y step when the nX x nY count is over 2000000 (LIM3Y #define).
Step and UAs; socket UAs shrink . With 4/5 known cells, many UAs are hit. For each step, the tables of UAs Guas are shrinked and this is the place where the UAs cells vectors are built. We have also by construction subsets of the X tables and of the Y tables belonging to this step. UAs hit by the first 2 cells (X 2 cells or Y 2/3 cells) are ignored. The vectors are applied to these X and Y to produce the Vx and Vy. The process is limited to 512 bands 1+2 UAs., In most cases, at the start we have around 64 UAs for a step. Later after addition of UAs seen at the check of an XY passing all filters, the limit of 512 can be reached.
A similar process is done for socket UAs, They are here stored (shrink GUAs) in a unique table with an “index vector” giving the location of the UAs per socket. We have to keep many more socket UAs . A limit per socket specific to the step tries to keep the global table at a reasonable size;
We have here the point where UAs found downstream (during the validation of an XY for example) can not contribute to the vector filter. The UAs of interest can be added to the primary tables, but they will be available for the vector process in the next step only.
In between, some FIFO tables empty at the start of a step, will store the last (#define G17MORESIZE 32) UAS in 2 different tables, one for small UAs, the other one for bigger UAs.
A similar effect will come in socket UAs if an attempt is done later to add the missing ones. This has been tested but nothing good came out.
In a step, we have usually 2 runs, one for the 566 distribution, one for the 656 distribution. The code using the “revered entry file” avoiding the 665 case has only the 656 distribution
Chunks
A step still has too many X and too many Y to optimize the cache effect. The chunk division is strictly a cache optimization split. There is nothing special in the process for a chunk.
A chunk will process a subgroup of “X” with a subgroup of “Y”. In the step loop, “X” is the outer table and “Y” the inner table.
A “X” subgroup will appear once only (per run 566;656) . Some preparation task can be done in the chunk split outer loop.
The chunk size is controlled by #define G17CHKX 256 #define G17CHKY 256
More tests are needed to see where is the right value.
2 Chunk critical loop split in chunks
