What programming language you start with really all depends on where you want to go with programming/coding. The great thing about this field is that there are an absolute abundance of smaller fields that you can go into, all using programming in their own unique ways. For web applications, a good start would be with HTML and later moving your way through CSS, JavaScript, JQuery, PHP, SQL, and any of the JavaScript libraries. Ruby is also a popular choice, so I would recommend checking that out too. For more scientific fields or areas with more machine learning and A.I., Python is generally a great place to start as it is widely used in that field of study. C++ is also a very useful language to know for that, but it can be a little more challenging for beginners. For game and application design, languages such as C#, C, Swift, Kotlin, and Java are most often used for that.
Problem description
Let's consider a simple search engine: one that searches over a large list of short, pithy sayings. It can take a 5+ letter string as an input, and it returns any sayings that contain that sequence (ignoring whitespace and punctuation). For example:
Search: jacka
Matches: Jack and Jill went up the hill to fetch a pail of water.
All work and no play makes Jack a dull boy.
The Manchester United Junior Athletic Club (MUJAC) karate team was super good at kicking.
Search: layma
Matches: All work and no play makes Jack a dull boy.
The MUJAC playmaker actually kinda sucked at karate.
Typically, a search engine does not provide an easy way to simply search "everything", especially if it is a private service. Having people get access to all your data generally devalues the usefulness of only showing small bits of it (as a search engine does).
We are going to force this (hypothetical) search engine to give us all of its results, by coming up with just the right inputs such that every one of its sayings is output at least once by all those searches. We will also be minimizing the number of searches we do, so we don't "overload" the search engine.
Formal input/output
The input will be a (possibly very long) list of short sayings, one per line. Each has at least 5 letters.
The output must be a list of 5+ letter search queries. Each saying in the input must match at least one of the output queries. Minimize the number of queries you output.
Sample input
Jack and Jill went up the hill to fetch a pail of water.
All work and no play makes Jack and Jill a dull couple.
The Manchester United Junior Athletic Club (MUJAC) karate team was super good at kicking.
The MUJAC playmaker actually kinda sucked at karate.
Sample output
layma
jacka
There are multiple possible valid outputs. For example, this is another solution:
djill
mujac
Also, while this is technically a valid solution, it is not an optimal one, since it does not have the minimum possible (in this case, 2) search queries:
jacka
allwo
thema
themu
Challenge input
Use this file of 3877 one-line UNIX fortunes: https://raw.githubusercontent.com/fsufitch/dailyprogrammer/master/common/oneliners.txt
Notes
This is a hard problem not just via its tag here on /r/dailyprogrammer; it's in a class of problems that is generally known to computer scientists to be difficult to find efficient solutions to. I picked a "5+ letter" limit on the outputs since it makes brute-forcing hard: 265 = 11,881,376 different combinations, checked against 3,877 lines each is 46 billion comparisons. That serves as a very big challenge. If you would like to make it easier while developing, you could turn the 5 character limit down to fewer -- reducing the number of possible outputs.
Solution
in Perl
my %idx-to-line = 0 .. * Z=> lines».lc.map(-> $line is copy { $line ~~ s:g/ <-[A .. z]> //; $line });
my @parts;
for %idx-to-line.values -> Str $line {
for (0 .. $line.chars - 5).map( -> $offset { $line.substr($offset, 5) }) -> $new_part {
@parts.append($new_part) if $new_part ~~ none @parts;
}
}
my Str @best_parts;
while %idx-to-line {
my %occurences_of;
for %idx-to-line.kv -> Str $idx, Str $line {
for @parts.grep( -> Str $part, { $line.contains($part) }) -> Str $match {
%occurences_of{$match}<count>++;
%occurences_of{$match}<indexes_of_lines>.append($idx);
}
}
my Str $best_match =
%occurences_of.keys.sort({ %occurences_of{$^a}<count> <=> %occurences_of{$^b}<count> })[*-1];
@best_parts.push($best_match);
%idx-to-line{ %occurences_of{$best_match}<indexes_of_lines>.flat }:delete;
}
@best_parts».say;
Let's consider a simple search engine: one that searches over a large list of short, pithy sayings. It can take a 5+ letter string as an input, and it returns any sayings that contain that sequence (ignoring whitespace and punctuation). For example:
Search: jacka
Matches: Jack and Jill went up the hill to fetch a pail of water.
All work and no play makes Jack a dull boy.
The Manchester United Junior Athletic Club (MUJAC) karate team was super good at kicking.
Search: layma
Matches: All work and no play makes Jack a dull boy.
The MUJAC playmaker actually kinda sucked at karate.
Typically, a search engine does not provide an easy way to simply search "everything", especially if it is a private service. Having people get access to all your data generally devalues the usefulness of only showing small bits of it (as a search engine does).
We are going to force this (hypothetical) search engine to give us all of its results, by coming up with just the right inputs such that every one of its sayings is output at least once by all those searches. We will also be minimizing the number of searches we do, so we don't "overload" the search engine.
Formal input/output
The input will be a (possibly very long) list of short sayings, one per line. Each has at least 5 letters.
The output must be a list of 5+ letter search queries. Each saying in the input must match at least one of the output queries. Minimize the number of queries you output.
Sample input
Jack and Jill went up the hill to fetch a pail of water.
All work and no play makes Jack and Jill a dull couple.
The Manchester United Junior Athletic Club (MUJAC) karate team was super good at kicking.
The MUJAC playmaker actually kinda sucked at karate.
Sample output
layma
jacka
There are multiple possible valid outputs. For example, this is another solution:
djill
mujac
Also, while this is technically a valid solution, it is not an optimal one, since it does not have the minimum possible (in this case, 2) search queries:
jacka
allwo
thema
themu
Challenge input
Use this file of 3877 one-line UNIX fortunes: https://raw.githubusercontent.com/fsufitch/dailyprogrammer/master/common/oneliners.txt
Notes
This is a hard problem not just via its tag here on /r/dailyprogrammer; it's in a class of problems that is generally known to computer scientists to be difficult to find efficient solutions to. I picked a "5+ letter" limit on the outputs since it makes brute-forcing hard: 265 = 11,881,376 different combinations, checked against 3,877 lines each is 46 billion comparisons. That serves as a very big challenge. If you would like to make it easier while developing, you could turn the 5 character limit down to fewer -- reducing the number of possible outputs.
Solution
in Perl
my %idx-to-line = 0 .. * Z=> lines».lc.map(-> $line is copy { $line ~~ s:g/ <-[A .. z]> //; $line });
my @parts;
for %idx-to-line.values -> Str $line {
for (0 .. $line.chars - 5).map( -> $offset { $line.substr($offset, 5) }) -> $new_part {
@parts.append($new_part) if $new_part ~~ none @parts;
}
}
my Str @best_parts;
while %idx-to-line {
my %occurences_of;
for %idx-to-line.kv -> Str $idx, Str $line {
for @parts.grep( -> Str $part, { $line.contains($part) }) -> Str $match {
%occurences_of{$match}<count>++;
%occurences_of{$match}<indexes_of_lines>.append($idx);
}
}
my Str $best_match =
%occurences_of.keys.sort({ %occurences_of{$^a}<count> <=> %occurences_of{$^b}<count> })[*-1];
@best_parts.push($best_match);
%idx-to-line{ %occurences_of{$best_match}<indexes_of_lines>.flat }:delete;
}
@best_parts».say;
Comments
Post a Comment