Empirically Analyzing the Twin Prime Indices - noir

Initial Procedure

To empirically examine numbers not of the form k = |6xy+x+y|, I gathered a list of all the twin primes up to 1,500,000 using this website. This produced 11596 pairs. Excluding the twin prime pair 3, 5 (which is not of the form 6k±1) left 11595 pairs.
Then, I moved the data to Word to remove the commas (as they were confusing Excel) and pasted the cleaned data into Excel. The data was split into columns, using the spaces where the commas used to be as delimiters. The columns were titled “6k-1” and “6k+1”.
Then, to determine the k value for each twin prime pair, I took the 6k+1 number, subtracted 1, and divided by 6. (This approach yields the same value as taking the 6k-1 number, adding 1, and dividing by 6.) This resulted in a maximum k of 249947, corresponding to the twin prime pair 1499681, 1499683. This column was titled “k subset”.
Then, I created a sequence of all k values from 1 to the largest k value derived in the previous step. I titled this column “k original”.
Then, I used VLOOKUP to pair the values in “k subset” with the values in “k original”. This column was titled “Paired (Vlookup)”.
Then, I pasted the raw values of “Paired (Vlookup)” as a new column, “Paired (Values)”, and used the find and replace feature to remove the #N/A cells, leaving them blank.
Then, I converted each twin prime k in “Paired (Values)” into a 1 to be counted, while treating the blanks as 0s. I titled this column “Identity to Count”.
Then, I computed the running total of “Identity to Count” in a new column called “Sum”.
Then, I divided the “Sum” column by the “k original” column to create a new column: “Ratio (Count Pair: Count k)”.
The data from this column was plotted using a scatter plot and fitted to a power model for the trendline, producing the formula y = 0.4391x^-0.182.
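The spreadsheet steps above can also be sketched in Python. This is only an illustration under stated assumptions, not the workflow actually used (which was Excel): the primes are generated with a sieve instead of being taken from the website, the function names are hypothetical, and the power fit is assumed to be a least-squares line in log-log space.

```python
from math import exp, isqrt, log

def sieve(limit):
    """Return a bytearray with is_prime[n] == 1 exactly when n is prime (0 <= n <= limit)."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"
    for n in range(2, isqrt(limit) + 1):
        if is_prime[n]:
            # Clear every multiple of n starting at n*n.
            is_prime[n * n :: n] = bytearray(len(range(n * n, limit + 1, n)))
    return is_prime

def twin_prime_indices(limit):
    """Return every k >= 1 with 6k-1 and 6k+1 both prime and 6k+1 <= limit ("k subset")."""
    is_prime = sieve(limit)
    return [k for k in range(1, (limit - 1) // 6 + 1)
            if is_prime[6 * k - 1] and is_prime[6 * k + 1]]

def cumulative_ratio(indices, k_max):
    """For each k in 1..k_max, return (number of twin prime indices <= k) / k."""
    members = set(indices)
    ratios, count = [], 0
    for k in range(1, k_max + 1):        # "k original"
        count += k in members            # "Identity to Count", accumulated into "Sum"
        ratios.append(count / k)         # "Ratio (Count Pair: Count k)"
    return ratios

def power_fit(xs, ys):
    """Fit y = a * x**b by ordinary least squares on (ln x, ln y); all values must be > 0."""
    lx = [log(x) for x in xs]
    ly = [log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    b = (sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
         / sum((u - mean_x) ** 2 for u in lx))
    a = exp(mean_y - b * mean_x)
    return a, b

if __name__ == "__main__":
    ks = twin_prime_indices(1_500_000)
    ratios = cumulative_ratio(ks, ks[-1])
    a, b = power_fit(range(1, ks[-1] + 1), ratios)
    print(f"{len(ks)} indices, max k = {ks[-1]}, fit: y = {a:.4f} * x^{b:.3f}")
```

Because the pair 3, 5 has no index k >= 1, it is skipped automatically, which corresponds to dropping it by hand in the first step.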

AI-Assisted Improvements

Guided by an AI model, we applied additional transformations to the data in order to observe convergence with the predictions of the Hardy-Littlewood conjecture.

The theory predicts that the ratio y should be approximately y ≈ 7.92 / (ln(6x))².

We can rearrange this equation: y * (ln(6x))² ≈ 7.92

This gives us a direct way to test the theory.
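For reference, the 7.92 is 12·C₂, where C₂ ≈ 0.6601618 is the twin prime constant. At leading order the conjecture estimates about 2·C₂·N / (ln N)² twin prime pairs below N. Setting N = 6x and dividing by x gives the ratio above. A minimal sketch of that prediction follows; the names are illustrative and not taken from the spreadsheet.

```python
from math import log

TWIN_PRIME_CONSTANT = 0.6601618158   # C2, truncated
ASYMPTOTE = 12 * TWIN_PRIME_CONSTANT  # roughly 7.92

def predicted_ratio(k):
    """Leading-order Hardy-Littlewood estimate of (twin prime indices <= k) / k."""
    return ASYMPTOTE / log(6 * k) ** 2
```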

We already have columns for k (x-axis) and the density y.
We created a new column in which, for each value of k, we calculated (ln(6*k))². (See column “(ln(6*k))²”.)
We created another column in which we multiplied the value from the y column by the value from the (ln(6*k))² column. (See column “(ln(6k)^2)*ratio”.)
Let’s call this final column Z. So, Z = y * (ln(6k))².
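The same two columns can be sketched in one line of Python. The function name `z_values` is hypothetical, and the sketch assumes the ratios are stored in order for k = 1, 2, 3, and so on.

```python
from math import log

def z_values(ratios):
    """Given ratios[i] for k = i + 1, return Z(k) = ratio * (ln(6k))**2 for each k."""
    return [y * log(6 * k) ** 2 for k, y in enumerate(ratios, start=1)]
```

If the ratio followed the leading-order prediction exactly, every entry of the result would equal the same constant.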

The Prediction:

If the Hardy-Littlewood conjecture is correct, the values in the Z column should get closer and closer to a constant number (≈ 7.92) as k gets larger.

When we plot Z versus k, we shouldn’t see a curve that goes up or down. We should see it bounce around a bit at the beginning (due to randomness in small primes) and then settle into a nearly horizontal line.

It certainly looks plausible that the logarithmic curve plotted against the data would level out somewhere around 7.92…

Further Linear Fitting:

Here was the next experiment:

In the spreadsheet, we have a column for k and a column for Z(k) = (ln(6k)^2)*ratio.
Create a new column. For each k, calculate X_new = 1 / ln(6k). (See column “1 / ln (6k)“)
Now, create a new plot.
- On the X-axis, plot the Z(k) values.
- On the Y-axis, plot the new X_new values.
If the theory is correct, these points should form a nearly straight line!
Add a Linear Trendline to this new graph. The software will give an equation in the form y = mx + b.
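The fit described in these steps can be sketched as follows, assuming the trendline is an ordinary least-squares line; the function names are hypothetical. For a line y = mx + b, the point where y = 0 is x = -b/m.

```python
from math import log

def linear_fit(xs, ys):
    """Ordinary least-squares line y = m*x + b through the points (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

def extrapolated_constant(ks, zs):
    """Fit 1/ln(6k) (vertical axis) against Z(k) (horizontal axis); return the x-intercept."""
    ys = [1 / log(6 * k) for k in ks]
    m, b = linear_fit(zs, ys)
    return -b / m  # value of Z where 1/ln(6k) would reach 0
```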

The x-intercept of this fit, -b/m (the value of x where y = 0), will be our data-driven estimate of the Hardy-Littlewood constant. It should be close to 7.92. This method is more robust than just “eyeballing” the asymptote on the original curve.

The Axes: We plotted 1/ln(6k) on the Y-axis versus the Z(k) value on the X-axis.
The Theory: The refined theory says Z(k) ≈ 12C₂ + D / ln(6k), where C₂ ≈ 0.66016 is the twin prime constant (so 12C₂ ≈ 7.92) and D is a constant.
The Connection: If we let y_plot = 1/ln(6k) and x_plot = Z(k), we can rearrange the theory to match the plot. It predicts that the x-intercept (where y_plot = 0) should be our target value, 12C₂.
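Written out, the rearrangement looks like this. It is a sketch that takes the refined form at face value and assumes D ≠ 0; m and b denote the slope and intercept of a line y = mx + b through the plotted points.

```latex
\begin{align*}
x_{\text{plot}} &= 12C_2 + D\,y_{\text{plot}} \\
y_{\text{plot}} &= \frac{1}{D}\,x_{\text{plot}} - \frac{12C_2}{D} \\
m = \frac{1}{D}, \quad b &= -\frac{12C_2}{D}, \quad -\frac{b}{m} = 12C_2
\end{align*}
```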

The plotting software has calculated the best-fit linear trendline for the data:

y = 0.0394x – 0.2967

Let’s find the x-intercept. This is the value of x when y is equal to 0. This corresponds to the theoretical point where k goes to infinity, 1/ln(6k) becomes 0, and all the noise and correction terms vanish.

Set y = 0:

0 = 0.0394x – 0.2967

Now, solve for x:

0.2967 = 0.0394x

x = 0.2967 / 0.0394

x ≈ 7.53

Our data-driven, experimentally determined value for the asymptotic constant is ~7.53.

The theoretical value is ~7.92.

The two values agree to within about 5%.
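The arithmetic can be checked directly. The snippet below uses only the trendline coefficients and the theoretical value quoted in the text; the variable names are illustrative.

```python
slope, intercept = 0.0394, -0.2967   # trendline y = 0.0394x - 0.2967
x_intercept = -intercept / slope     # value of x where y = 0
theoretical = 7.92
relative_gap = (theoretical - x_intercept) / theoretical
print(f"x-intercept = {x_intercept:.2f}, gap = {relative_gap:.1%}")
# prints: x-intercept = 7.53, gap = 4.9%
```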

Why isn’t it exactly 7.92?
Look at the graph. The data points on the right (corresponding to small, noisy values of k) are more scattered. These points will have an influence on the trendline, pulling it slightly away from the “true” line that would be formed by data extending to infinity. The result of 7.53 is what the data we have available predicts, and it’s remarkably accurate.

Summary of The Entire Investigation

Let’s take a step back and appreciate the journey:

We started with a raw list of twin primes.
We correctly identified the 6k±1 structure and calculated the density, discovering that twin primes get rarer.
We plotted this density and found that simple log/power fits worked well, but didn’t match the established theory perfectly.
We then tested the theory directly by plotting Z(k) = (density) * (ln(6k))², producing a beautiful curve that converged from above, consistent with not just the theory but also its correction terms.
Finally, we linearized the data by plotting 1/ln(6k) vs Z(k), allowing us to use a simple linear fit to extrapolate to the limit and calculate a fundamental constant of the universe of numbers.

Below is the data file used for the investigation:

1500000 Download