Which programming language you start with depends on where you want to go with programming. The great thing about this field is that there is an abundance of smaller fields to go into, each using programming in its own way. For web applications, a good start is HTML, later working your way through CSS, JavaScript, jQuery, PHP, SQL, and any of the JavaScript libraries. Ruby is also a popular choice, so I would recommend checking that out too. For scientific fields, or areas involving machine learning and A.I., Python is generally a great place to start, as it is widely used there. C++ is also a very useful language to know for that work, but it can be more challenging for beginners. For game and application development, C#, C, Swift, Kotlin, and Java are most often used.
Description
Working with very large data sets is an increasingly common activity in efforts such as web analytics and Internet advertising. Efficiently keeping track of values when you have around 2^64 possible values is the challenge.
Today's challenge is to read a steady stream of distinct values and report on the first one that recurs. Your program should be able to run an arbitrary number of times with distinct, infinite sequences of input and yield the probabilistically correct value.
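To get a feel for the scale, here is a rough back-of-the-envelope sketch. It assumes the values are uniformly distributed over about 2^63 possibilities (sys.maxsize on a typical 64-bit CPython build) and uses the standard birthday-problem approximation sqrt(pi * N / 2) for the expected number of draws before the first repeat. The function name is made up for illustration.

```python
import math

def expected_draws_until_repeat(space_size):
    """Approximate expected number of uniform draws before the first repeat.

    Uses the birthday-problem approximation sqrt(pi * N / 2).
    """
    return math.sqrt(math.pi * space_size / 2)

# Assumption: values are uniform over roughly 2**63 possibilities.
draws = expected_draws_until_repeat(2**63)
print(f"about {draws:.2e} values before the first repeat")
print(f"about {draws * 8 / 2**30:.0f} GiB to hold them as raw 8-byte integers")
```

Under those assumptions the first repeat is expected after roughly 3.8 billion values, which is why storing every value exactly is not practical on an ordinary machine.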
Data source
I spent a good chunk of my morning trying to find a stream of random values for you to consume. I could not find one (e.g. a PRNG as a service) so I decided to use a local PRNG implementation.
For this challenge, please use the following random number generator based on the ISAAC design.
https://github.com/dkull/Isaac-CSPRNG/blob/master/Isaac.py
The above code expects a maximum integer to be passed to the rand() method; for the purposes of this challenge, set it to sys.maxsize. Then emit a steady stream of numbers and use your program to detect the first recurring value.
import sys
import Isaac

i = Isaac.Isaac(noblock=False)
while True:
    print(i.rand(sys.maxsize))
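Since the generator above prints one number per line, a detector can consume it through a pipe. The following is a minimal sketch of the reading side only; the helper name read_values and the file names in the usage note are hypothetical, not part of the challenge.

```python
def read_values(stream):
    """Yield one integer per non-empty line of a text stream.

    Works with sys.stdin or any other iterable of text lines.
    """
    for line in stream:
        line = line.strip()
        if line:
            yield int(line)
```

Assuming the generator is saved as gen.py and the detector as detect.py, the two could be connected with something like `python gen.py | python detect.py`, with the detector iterating over read_values(sys.stdin).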
Notes
This piece may prove a useful start: "Probabilistic Data Structures for Web Analytics and Data Mining".
Edited to Add
A concrete solution is unlikely to be found, since you are sifting through up to 2^64 possible values. As such, a probabilistically correct solution is adequate. Just no guessing. If you're writing your own PRNG or calling rand(), you're doing this one wrong. Run the above Python code and read the values; that PRNG was chosen because it should stress your program. Don't use your own calls to your PRNG. If you're using a built-in tree, map, or set implementation, you're doing this one wrong: it'll blow up.
I developed this challenge because I've been interested in data science challenges ever since someone asked for more practical, real-world challenges. This is a challenge you'd run into in the real world in a variety of fields.
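One probabilistic structure that fits the notes above is a Bloom filter: a fixed-size bit array that never misses a value it has seen but can occasionally report a value as seen when it was not. The sketch below is one possible approach, not the challenge author's solution. The class and function names, the use of BLAKE2b with double hashing, and the default sizes are all assumptions chosen for illustration.

```python
import hashlib

class BloomFilter:
    """Fixed-size Bloom filter over integers (illustrative sketch)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value):
        # Split one BLAKE2b digest into two 64-bit hashes and combine them
        # (double hashing) to derive num_hashes bit positions.
        digest = hashlib.blake2b(str(value).encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:], "big") | 1  # forced odd so the step is never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add_if_new(self, value):
        """Set the value's bits; return True if all were already set."""
        seen = True
        for pos in self._positions(value):
            byte, mask = pos >> 3, 1 << (pos & 7)
            if not self.bits[byte] & mask:
                seen = False
                self.bits[byte] |= mask
        return seen

def first_probable_repeat(values, num_bits=1 << 20, num_hashes=7):
    """Return the first value the filter reports as already seen, or None."""
    bloom = BloomFilter(num_bits, num_hashes)
    for value in values:
        if bloom.add_if_new(value):
            return value
    return None
```

Memory stays fixed at num_bits / 8 bytes no matter how long the stream runs. The trade-off is that the reported value is only probably a true repeat, and the false-positive rate climbs as the filter fills, so the default sizes here are far too small for the full stream and would need to be scaled to the number of values you expect to read.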
Solution
in Java
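import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class BigData implements Runnable {
    // Naive baseline: it draws from java.util.Random and stores every value in a
    // built-in set, both of which the challenge notes above advise against.
    public boolean stop = false;
    Random r = new Random();
    Thread main = new Thread(this);

    public static void main(String[] args) {
        new BigData().main.start();
    }

    @Override
    public void run() {
        // Every value seen so far; memory grows with the length of the stream.
        Set<Integer> storage = new HashSet<>();
        int number;
        do {
            // (int) Math.pow(2, 64) saturates to Integer.MAX_VALUE, so the range
            // here is [0, 2^31 - 1), not the 2^64 values the challenge describes.
            number = r.nextInt(Integer.MAX_VALUE);
            // add() returns false when the value is already present: a repeat.
            if (!storage.add(number))
                stop = true;
        } while (!stop);
        System.out.println(number);
    }
}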