Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normal distribution: generation algorithm #2189

Closed
arthurits opened this issue Oct 22, 2022 · 9 comments · Fixed by #2206 or #2209
Closed

Normal distribution: generation algorithm #2189

arthurits opened this issue Oct 22, 2022 · 9 comments · Fixed by #2206 or #2209

Comments

@arthurits
Copy link
Contributor

I've been using functions RandomNormal from DataGen.cs and Population from Population.cs in order to visualize the distribution of a given variable

public Population(Random rand, int pointCount, double mean = .5, double stdDev = .5)
{
values = DataGen.RandomNormal(rand, pointCount, mean, stdDev);
Recalculate();
}

I'm no expert whatsoever in statistical algorithms, but it seems that both functions make use of RandomNormalValue where some kind of Box-Muller algorithm is implemented to generate gaussian numbers:

public static double RandomNormalValue(Random rand, double mean, double stdDev, double maxSdMultiple = 10)
{
while (true)
{
double u1 = 1.0 - rand.NextDouble();
double u2 = 1.0 - rand.NextDouble();

However, the gaussian data generated seems to be sistematically underestimated as shown below, where the grey vertical line represents the mean and the bell curve appears to be slightly offsetted to the left.
image

There's an article in Wikipedia where the Box-Mueller algorithm is explained. It states that
Suppose U1 and U2 are independent samples chosen from the uniform distribution on the unit interval (0, 1).

Since Random.NextDouble() does in fact return the value 0, the current computation of both u1 and u2 will eventually return some 1s.

Therefore, I've been using the following modification of RandomNormalValue:

public double SampleGaussian(Random random, double mean, double stdDev)
    {
        double u1 = NextDouble(random);
        double u2 = NextDouble(random);

        double y1 = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Sin(2.0 * Math.PI * u2);    // Math.Cos is also fine
        return mean + stdDev * y1;

        double NextDouble(Random random)
        {
            return ((double)random.Next(1, Int32.MaxValue)) / Int32.MaxValue;    // random.Next includes 1 and exludes Int32MaxValue
        }
    }

This modification provides a more visually mean-centered gaussian curve:
image

@bclehmann
Copy link
Member

bclehmann commented Oct 26, 2022

I wonder if we should be using Double.MaxValue? It may have worse performance because it'll return a subnormal, and we might well just end up getting it rounded to zero, but Int32.MaxValue is really rather small as far as doubles are concerned.

It might need some experimentation to see quite how large we can go without getting floating pointed, it may be quite a ways under Double.MaxValue. But my intuition says it's much much larger than Int32.MaxValue

@arthurits
Copy link
Contributor Author

arthurits commented Oct 28, 2022

I wonder if we should be using Double.MaxValue?

I believe the problem could be that there is no Random method returning values between 1 and Double.MaxValue.
How about using Int64.MaxValue instead along with a call to Random.NextInt64?

Edit: After some quick tests using Int64.MaxValue, the results are worse (the gaussian curve is still offsetted to the left) than using Int32.MaxValue.

@bclehmann
Copy link
Member

I'd value your feedback on #2206, it's simpler and seems to be faster, but I'd like some reassurance that it gives the results you think it should.

It looks like it should be correct to me, but I'd appreciate another set of eyes.

@arthurits
Copy link
Contributor Author

Sure!
It looks great and straightforward. I left a minor observation in #2206 for your consideration.

@bclehmann
Copy link
Member

I left a minor observation in #2206 for your consideration.

Hey I can't see anything, did it maybe fail to send?

@swharden
Copy link
Member

I left a minor observation in #2206 for your consideration.

Hey I can't see anything, did it maybe fail to send?

Thanks again for everyone's input on this! I think it's in an okay state and merged it, but if there was an unresolved concern/idea we can open it up in a new PR 👍

@arthurits
Copy link
Contributor Author

arthurits commented Oct 31, 2022

Hey I can't see anything, did it maybe fail to send?

I'm not sure what happened but I probably messed something, although I can see it:
image

The point was that there is no need for this:

return 1.0 - subtrahend;

since subtrahend already returns values in the open interval (0, 1). So, it would be even more straightforward to simply return subtrahend:

return subtrahend;

@bclehmann
Copy link
Member

Ah, since it says "Pending" it means it was part of a review, and it looks like you didn't complete the review (there's a menu with "Approve", "Request Changes", and "Reject" that you have to click on). Alternatively, you can leave a comment without starting a review, and then the comment will be applied immediately.

At any rate thank you for noticing that, I guess I was a little bit on autopilot :/

@swharden swharden linked a pull request Oct 31, 2022 that will close this issue
@arthurits
Copy link
Contributor Author

My bad @bclehmann! I totally messed the whole comments/review options. I'll pay more attention the next time.
Anyways, it's good to see that @swharden already merged it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants