[RFC] real random and dedup

Yuval Kashtan
Hello,
Attached is a proposal for two new Write Data Patterns:
1) 'Real' random data: fresh random data is generated before each write.
2) Random data with dedup control: like the above, plus control over the number of times each random block is written (to control dedup rates).
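
To make the second pattern concrete, here is a minimal sketch of the idea, not the code from random.patch. The class name DedupBlockSource and its parameters are hypothetical, and std::mt19937_64 from the C++11 <random> header stands in for whatever generator the patch actually uses. Each random block is handed out a fixed number of times before a new one is generated, so the repeat count sets the expected dedup rate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <random>
#include <vector>

// Hypothetical helper (not from random.patch): hands out random blocks,
// repeating each one `repeats` times before generating a fresh one.
class DedupBlockSource {
public:
    DedupBlockSource(std::size_t block_size, unsigned repeats, std::uint64_t seed)
        : block_(block_size), repeats_(repeats), remaining_(0), rng_(seed) {
        assert(block_size > 0 && block_size % sizeof(std::uint64_t) == 0);
        assert(repeats > 0);
    }

    // Copies the current block into dst (which must hold block_size bytes).
    // A new random block is generated once the current one has been
    // handed out `repeats` times.
    void next(unsigned char* dst) {
        if (remaining_ == 0) {
            refill();
            remaining_ = repeats_;
        }
        std::memcpy(dst, block_.data(), block_.size());
        --remaining_;
    }

private:
    // Overwrites the stored block with fresh generator output, 8 bytes at a time.
    void refill() {
        for (std::size_t off = 0; off < block_.size(); off += sizeof(std::uint64_t)) {
            const std::uint64_t word = rng_();
            std::memcpy(block_.data() + off, &word, sizeof word);
        }
    }

    std::vector<unsigned char> block_;
    unsigned repeats_;
    unsigned remaining_;
    std::mt19937_64 rng_;
};
```

With repeats set to 1 this degenerates into the first pattern (new data for every block); with repeats set to N, each unique block is written N times.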

We had a problem with the existing Full Random data pattern: it eventually repeats itself (for a storage device with 4k blocks this happens after 64GB).
For instance, writing 64TB gives a dedup rate of about 1:1,000.
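
The arithmetic behind those figures can be checked with a short sketch. It assumes binary units (1 GB = 2^30 bytes) and takes the 64GB repeat period quoted above as given; the function names are only for illustration.

```cpp
#include <cassert>
#include <cstdint>

const std::uint64_t KiB = 1024;
const std::uint64_t GiB = KiB * KiB * KiB;
const std::uint64_t TiB = GiB * KiB;

// Number of distinct blocks the pattern produces before it starts repeating.
inline std::uint64_t unique_blocks(std::uint64_t period_bytes, std::uint64_t block_bytes) {
    assert(block_bytes > 0);
    return period_bytes / block_bytes;
}

// Number of copies of each distinct block on the device after writing
// written_bytes of a pattern that repeats every period_bytes.
inline std::uint64_t copies_per_block(std::uint64_t written_bytes, std::uint64_t period_bytes) {
    assert(period_bytes > 0);
    return written_bytes / period_bytes;
}
```

unique_blocks(64 * GiB, 4 * KiB) is 16,777,216 (2^24) distinct blocks, and copies_per_block(64 * TiB, 64 * GiB) is 1,024, which matches the roughly 1:1,000 rate.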
To solve this, I had to add the MT (Mersenne Twister) random number generator (http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html), because the Windows rand() only generates numbers between 0 and 2^15 - 1 (32,767).
New data is then generated before each write.
I also had to take the transfer request size into account and fill the request correctly, in accordance with the storage block size (to avoid dedup).
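
A rough sketch of the fill step, under stated assumptions: this is not the patch itself, std::mt19937_64 is used in place of the reference MT code linked above, and fill_transfer and distinct_blocks are hypothetical names. The whole transfer request is overwritten with fresh generator output before each write, so a request spanning several storage blocks contains no two identical blocks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <random>
#include <set>
#include <string>
#include <vector>

// Overwrites the whole transfer buffer with fresh generator output.
// A trailing partial word is handled, so any buffer size works.
void fill_transfer(std::vector<unsigned char>& buf, std::mt19937_64& rng) {
    std::size_t off = 0;
    while (off < buf.size()) {
        const std::uint64_t word = rng();
        const std::size_t n = std::min(sizeof word, buf.size() - off);
        std::memcpy(buf.data() + off, &word, n);
        off += n;
    }
}

// Counts the distinct block-sized slices in a transfer. If the result equals
// buf.size() / block_size, the transfer contains nothing to deduplicate.
std::size_t distinct_blocks(const std::vector<unsigned char>& buf, std::size_t block_size) {
    assert(block_size > 0 && buf.size() % block_size == 0);
    std::set<std::string> seen;
    for (std::size_t off = 0; off < buf.size(); off += block_size) {
        seen.insert(std::string(reinterpret_cast<const char*>(buf.data() + off), block_size));
    }
    return seen.size();
}
```

For a 64k transfer on a 4k-block device, calling fill_transfer before every write should leave 16 distinct blocks in the request, and the next call replaces all of them.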

The patch is sent only as an RFC, because it currently overrides the existing full random and repeating bytes implementations (respectively).

please comment
:)

TIA,
Yuval Kashtan


------------------------------------------------------------------------------
This SF email is sponsored by:
Try Windows Azure free for 90 days Click Here
http://p.sf.net/sfu/sfd2d-msazure
_______________________________________________
Iometer-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/iometer-devel

Attachment: random.patch (11K)

Re: [RFC] real random and dedup

Fabian Tillier
Hi Yuval,
 
Have you looked at the other random number generators available through the <random> header?
 
Cheers,
-Fab
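
As an illustration of that suggestion (a sketch only, not code from the thread): the C++11 <random> header already ships a 64-bit Mersenne Twister, std::mt19937_64, whose output spans the full 64-bit range, whereas rand() only guarantees RAND_MAX >= 32767. The helper names rand_bits and draw_below are made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <random>

// Number of bits needed to represent RAND_MAX, i.e. how many random bits a
// single rand() call can deliver on this platform.
int rand_bits() {
    int bits = 0;
    for (unsigned long m = RAND_MAX; m != 0; m >>= 1) {
        ++bits;
    }
    return bits;
}

// Draws a uniformly distributed value in [0, bound) from the standard
// library's 64-bit Mersenne Twister.
std::uint64_t draw_below(std::mt19937_64& rng, std::uint64_t bound) {
    assert(bound > 0);
    std::uniform_int_distribution<std::uint64_t> dist(0, bound - 1);
    return dist(rng);
}
```

Seeding std::mt19937_64 with a fixed value yields the same sequence on every run, which may be convenient for reproducible tests; a seed from std::random_device gives a different stream each time.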

Re: [RFC] real random and dedup

Fabian Tillier
Actually, wouldn't using RtlGenRandom to fill the file buffer give you what you want?
 
 
-Fab

Re: [RFC] real random and dedup

Yuval Kashtan
Microsoft's advice is to use CryptGenRandom instead of RtlGenRandom.

In any case, the advantage of the MT implementation is that it is very fast, produces a very wide range of random numbers, and is cross-platform, so it can be used on non-Microsoft platforms as well.

But this is just a proposal. If needed, any good random number generator can be used.

Sincerely,
Yuval Kashtan

