What programming language you start with really all depends on where you want to go with programming/coding. The great thing about this field is that there are an absolute abundance of smaller fields that you can go into, all using programming in their own unique ways. For web applications, a good start would be with HTML and later moving your way through CSS, JavaScript, JQuery, PHP, SQL, and any of the JavaScript libraries. Ruby is also a popular choice, so I would recommend checking that out too. For more scientific fields or areas with more machine learning and A.I., Python is generally a great place to start as it is widely used in that field of study. C++ is also a very useful language to know for that, but it can be a little more challenging for beginners. For game and application design, languages such as C#, C, Swift, Kotlin, and Java are most often used for that.
Description
In theoretical computer science, the closest string is an NP-hard computational problem, which tries to find the geometrical center of a set of input strings. To understand the word "center", it is necessary to define a distance between two strings. Usually, this problem is studied with the Hamming distance in mind. This center must be one of the input strings.
In bioinformatics, the closest string problem is an intensively studied facet of the problem of finding signals in DNA. In keeping with the bioinformatics utility, we'll use DNA sequences as examples.
Consider the following DNA sequences:
ATCAATATCAA
ATTAAATAACT
AATCCTTAAAC
CTACTTTCTTT
TCCCATCCTTT
ACTTCAATATA
Using the Hamming distance (the number of different characters between two sequences of the same length), the all-pairs distances of the above 6 sequences puts ATTAAATAACT at the center.
Input Description
You'll be given input with the first line an integer N telling you how many lines to read for the input, then that number of lines of strings. All strings will be the same length. Example:
4
CTCCATCACAC
AATATCTACAT
ACATTCTCCAT
CCTCCCCACTC
Output Description
Your program should emit the string from the input that's closest to all of them. Example:
AATATCTACAT
Challenge Input
11
AACACCCTATA
CTTCATCCACA
TTTCAATTTTC
ACAATCAAACC
ATTCTACAACT
ATTCCTTATTC
ACTTCTCTATT
TAAAACTCACC
CTTTTCCCACC
ACCTTTTCTCA
TACCACTACTT
21
ACAAAATCCTATCAAAAACTACCATACCAAT
ACTATACTTCTAATATCATTCATTACACTTT
TTAACTCCCATTATATATTATTAATTTACCC
CCAACATACTAAACTTATTTTTTAACTACCA
TTCTAAACATTACTCCTACACCTACATACCT
ATCATCAATTACCTAATAATTCCCAATTTAT
TCCCTAATCATACCATTTTACACTCAAAAAC
AATTCAAACTTTACACACCCCTCTCATCATC
CTCCATCTTATCATATAATAAACCAAATTTA
AAAAATCCATCATTTTTTAATTCCATTCCTT
CCACTCCAAACACAAAATTATTACAATAACA
ATATTTACTCACACAAACAATTACCATCACA
TTCAAATACAAATCTCAAAATCACCTTATTT
TCCTTTAACAACTTCCCTTATCTATCTATTC
CATCCATCCCAAAACTCTCACACATAACAAC
ATTACTTATACAAAATAACTACTCCCCAATA
TATATTTTAACCACTTACCAAAATCTCTACT
TCTTTTATATCCATAAATCCAACAACTCCTA
CTCTCAAACATATATTTCTATAACTCTTATC
ACAAATAATAAAACATCCATTTCATTCATAA
CACCACCAAACCTTATAATCCCCAACCACAC
Challenge Output
ATTCTACAACT
TTAACTCCCATTATATATTATTAATTTACCC
Solution
in C++
#include <algorithm>
#include <cstdio>
#include <iostream>
#include <numeric>
#include <string>
#include <vector>
namespace {
template<typename C, typename Distance>
Distance hammingDistance(const C& from, const C& to) {
if (from.size() != to.size()) {
throw std::runtime_error("hamming distance: sequences must be equal size");
}
std::vector<Distance> same(from.size());
std::transform(
from.begin(),
from.end(),
to.begin(),
same.begin(),
[](const auto& a, const auto& b) {
return static_cast<Distance>(a != b);
});
return std::accumulate(same.begin(), same.end(), Distance{0});
}
template<typename C, typename Distance>
Distance levenshteinDistance(const C& from, const C& to) {
const auto cols = from.size() + 1;
const auto rows = to.size() + 1;
std::vector<Distance> lattice(rows * cols);
std::iota(lattice.begin(), lattice.begin() + cols, Distance{0});
for (size_t i = 0; i < rows; i++) {
lattice[i * cols] = i;
}
for (auto& cell : lattice) {
if (cell >= 0) {
continue;
}
auto idx = std::distance(lattice.data(), &cell);
auto r = idx / cols;
auto c = idx % cols;
auto ins = lattice[idx - cols] + 1;
auto del = lattice[idx - 1] + 1;
auto sub =
lattice[idx - cols - 1] + static_cast<Distance>(from[c] != to[r]);
cell = std::min({ ins, del, sub });
}
return lattice.back();
}
template<typename C, typename M>
auto totalDistance(const C& from, const std::vector<C>& strings, M metric) {
using Distance = decltype(metric(from, strings.front()));
return std::accumulate(
strings.begin(),
strings.end(),
Distance{0},
[&from,metric](auto total, const auto& to) {
return total + metric(from, to);
});
}
template<typename C, typename M>
const C& center(const std::vector<C>& strings, M metric) {
return *std::min_element(
strings.begin(),
strings.end(),
[&strings,metric](const auto& a, const auto& b) {
return totalDistance(a, strings, metric)
< totalDistance(b, strings, metric);
});
}
}
int main() {
std::vector<std::string> lines;
std::string line;
std::getline(std::cin, line); // skip number
while (std::getline(std::cin, line)) {
lines.emplace_back(std::move(line));
}
std::puts(center(lines, hammingDistance<std::string, int>).c_str());
}
In theoretical computer science, the closest string is an NP-hard computational problem, which tries to find the geometrical center of a set of input strings. To understand the word "center", it is necessary to define a distance between two strings. Usually, this problem is studied with the Hamming distance in mind. This center must be one of the input strings.
In bioinformatics, the closest string problem is an intensively studied facet of the problem of finding signals in DNA. In keeping with the bioinformatics utility, we'll use DNA sequences as examples.
Consider the following DNA sequences:
ATCAATATCAA
ATTAAATAACT
AATCCTTAAAC
CTACTTTCTTT
TCCCATCCTTT
ACTTCAATATA
Using the Hamming distance (the number of different characters between two sequences of the same length), the all-pairs distances of the above 6 sequences puts ATTAAATAACT at the center.
Input Description
You'll be given input with the first line an integer N telling you how many lines to read for the input, then that number of lines of strings. All strings will be the same length. Example:
4
CTCCATCACAC
AATATCTACAT
ACATTCTCCAT
CCTCCCCACTC
Output Description
Your program should emit the string from the input that's closest to all of them. Example:
AATATCTACAT
Challenge Input
11
AACACCCTATA
CTTCATCCACA
TTTCAATTTTC
ACAATCAAACC
ATTCTACAACT
ATTCCTTATTC
ACTTCTCTATT
TAAAACTCACC
CTTTTCCCACC
ACCTTTTCTCA
TACCACTACTT
21
ACAAAATCCTATCAAAAACTACCATACCAAT
ACTATACTTCTAATATCATTCATTACACTTT
TTAACTCCCATTATATATTATTAATTTACCC
CCAACATACTAAACTTATTTTTTAACTACCA
TTCTAAACATTACTCCTACACCTACATACCT
ATCATCAATTACCTAATAATTCCCAATTTAT
TCCCTAATCATACCATTTTACACTCAAAAAC
AATTCAAACTTTACACACCCCTCTCATCATC
CTCCATCTTATCATATAATAAACCAAATTTA
AAAAATCCATCATTTTTTAATTCCATTCCTT
CCACTCCAAACACAAAATTATTACAATAACA
ATATTTACTCACACAAACAATTACCATCACA
TTCAAATACAAATCTCAAAATCACCTTATTT
TCCTTTAACAACTTCCCTTATCTATCTATTC
CATCCATCCCAAAACTCTCACACATAACAAC
ATTACTTATACAAAATAACTACTCCCCAATA
TATATTTTAACCACTTACCAAAATCTCTACT
TCTTTTATATCCATAAATCCAACAACTCCTA
CTCTCAAACATATATTTCTATAACTCTTATC
ACAAATAATAAAACATCCATTTCATTCATAA
CACCACCAAACCTTATAATCCCCAACCACAC
Challenge Output
ATTCTACAACT
TTAACTCCCATTATATATTATTAATTTACCC
Solution
in C++
#include <algorithm>
#include <cstdio>
#include <iostream>
#include <numeric>
#include <string>
#include <vector>
namespace {
template<typename C, typename Distance>
Distance hammingDistance(const C& from, const C& to) {
if (from.size() != to.size()) {
throw std::runtime_error("hamming distance: sequences must be equal size");
}
std::vector<Distance> same(from.size());
std::transform(
from.begin(),
from.end(),
to.begin(),
same.begin(),
[](const auto& a, const auto& b) {
return static_cast<Distance>(a != b);
});
return std::accumulate(same.begin(), same.end(), Distance{0});
}
template<typename C, typename Distance>
Distance levenshteinDistance(const C& from, const C& to) {
const auto cols = from.size() + 1;
const auto rows = to.size() + 1;
std::vector<Distance> lattice(rows * cols);
std::iota(lattice.begin(), lattice.begin() + cols, Distance{0});
for (size_t i = 0; i < rows; i++) {
lattice[i * cols] = i;
}
for (auto& cell : lattice) {
if (cell >= 0) {
continue;
}
auto idx = std::distance(lattice.data(), &cell);
auto r = idx / cols;
auto c = idx % cols;
auto ins = lattice[idx - cols] + 1;
auto del = lattice[idx - 1] + 1;
auto sub =
lattice[idx - cols - 1] + static_cast<Distance>(from[c] != to[r]);
cell = std::min({ ins, del, sub });
}
return lattice.back();
}
template<typename C, typename M>
auto totalDistance(const C& from, const std::vector<C>& strings, M metric) {
using Distance = decltype(metric(from, strings.front()));
return std::accumulate(
strings.begin(),
strings.end(),
Distance{0},
[&from,metric](auto total, const auto& to) {
return total + metric(from, to);
});
}
template<typename C, typename M>
const C& center(const std::vector<C>& strings, M metric) {
return *std::min_element(
strings.begin(),
strings.end(),
[&strings,metric](const auto& a, const auto& b) {
return totalDistance(a, strings, metric)
< totalDistance(b, strings, metric);
});
}
}
int main() {
std::vector<std::string> lines;
std::string line;
std::getline(std::cin, line); // skip number
while (std::getline(std::cin, line)) {
lines.emplace_back(std::move(line));
}
std::puts(center(lines, hammingDistance<std::string, int>).c_str());
}
Comments
Post a Comment