Skip to main content

Featured Post

Your First Programming Language

What programming language you start with really all depends on where you want to go with programming/coding. The great thing about this field is that there are an absolute abundance of smaller fields that you can go into, all using programming in their own unique ways. For web applications, a good start would be with HTML and later moving your way through CSS, JavaScript, JQuery, PHP, SQL, and any of the JavaScript libraries. Ruby is also a popular choice, so I would recommend checking that out too. For more scientific fields or areas with more machine learning and A.I., Python is generally a great place to start as it is widely used in that field of study. C++ is also a very useful language to know for that, but it can be a little more challenging for beginners. For game and application design, languages such as C#, C, Swift, Kotlin, and Java are most often used for that.

DNA Sequencing

Description

DNA sequences are made up of a 4 character alphabet - A, C, T or G, that describe the nucleotide bases in a gene sequence. To ascertain the sequence of DNA, scientists use chemical methods to identify the component nucleotides in a method called DNA sequencing. DNA shotgun sequencing is a method whereby DNA subsequences of the same larger sequence are produced at massive parallel scale by DNA sequencing methods, and the overlap between segments is used to reconstruct the input gene. This is a fast and accurate method, and is dropping in price. Shotgun sequencing was used to perform the first entire sequence of a human's DNA, for example. For additional background information, see Wikipedia on shotgun sequencing.

You're working in a DNA laboratory and you have to reconstruct a gene's sequence from a series of fragments!

Formal Input Description
You'll be given a series of DNA sequence fragments, which include overlaps with neighbor sequences, but not in any specific order - it's random. Your job is to read in the fragments and reconstruct the input DNA sequence that lead to the fragments.

Formal Output Description
Your program should emit the DNA sequence it calculated.

Sample Input
You'll be given the DNA sequence of one of the strands of DNA (e.g. no complementarity calculations or inferences required) using the DNA alphabet of "a,t,c,g". Assume no read errors, also. Example:

    tgca
    taggcta
    gtcatgcttaggcta
    agcatgctgcag
    tcatgct

Sample Output
Your program should emit the shortest DNA sequence that would contain the above fragments. Example:

    agcatgctgcagtcatgcttaggcta

Challenge Input
    gatccatctggatcctatagttcatggaaagccgctgc
    tatttcaacattaattgttggttttgatacagatggtacacca
    aaaagaaattcaaaaagaacaagaaaaatctgaaaaacaacaaaa
    ggaatgtcaatcctatagattaactgttgaagattcaccatcagttg
    tggaaataaaaatattgaaattgcagtcattagaaataaacaac
    tcaagtagaatatgccatggaagcagtaagaaaaggtactgttg
    tgcaagatcaattagaaaaatcgttaaattagatgaccacatt
    tgtcgttgaagctgaaaaagaaattcaaaaagaacaagaaaaatct
    gaaaaacaacaaaaataaattacatcaaattccttttttt
    caatcgttttattagatgaacaagaaattgataaattagttgc
    aatctttatcaaactgatccatctggatcctatagttcatg
    gaaattgcagtcattagaaataaacaaccaatcgttttattagatg
    atcgttaaattagatgaccacatttgtttaacctttgctggt
    aattatacagacgttagtgaagaggaatcaattaaattagcagtta
    tatactcaaagtggtggtgttagaccatttggtatttcaacattaat
    ttttaggtgttgaaaagaaagcaaccgctaaacttcaaga
    aagaaagcaaccgctaaacttcaagatgcaagatcaattagaaaa
    ccccacctttttttttaattatcttcaagtttttttaaaaaaaaaaaaaaaa
    gaatttttagaaaagaattatacagacgttagtgaagaggaatc
    agtgcaagatacgatagagcaattacagttttctcaccagatg
    aattaaattagcagttagagctttattagagattgttgaaag
    cagttggtgtacgtggtaaagatgttattgttttaggtgttgaa
    ttcaacaacgttatactcaaagtggtggtgttagaccatttgg
    ataaattacatcaaattcctttttttccccacctttttttt
    aattggtcgtagttcaaagagtgttggtgaatttttagaaaag
    aatatatttctaaatttattgctggtattcaacaacgt
    aacaagaaattgataaattagttgctgtcgttgaagctga
    gagctttattagagattgttgaaagtggaaataaaaatatt
    ttaactgccgattcacgtgtattaattagtaaagcattaat
    acgatagagcaattacagttttctcaccagatggtcatctttt
    aaggtactgttgcagttggtgtacgtggtaaagatgttattg
    tgtttaacctttgctggtttaactgccgattcacgtgtattaatt
    aataatataatatatatataaatacataataatgtcaagtgcaagat
    agtaaagcattaatggaatgtcaatcctatagattaactgt
    tgaagattcaccatcagttgaatatatttctaaatttattgctggta
    gaaagccgctgcaattggtcgtagttcaaagagtgttggt
    gtcatctttttcaagtagaatatgccatggaagcagtaagaa
    tgttggttttgatacagatggtacaccaaatctttatcaaact

Challenge Input Solution
    aataatataatatatatataaatacataataatgtcaagtgcaagatacgatagagcaattacagttttctcaccagatggtcatctttttcaagtagaatatgccatggaagcagtaagaaaaggtactgttgcagttggtgtacgtggtaaagatgttattgttttaggtgttgaaaagaaagcaaccgctaaacttcaagatgcaagatcaattagaaaaatcgttaaattagatgaccacatttgtttaacctttgctggtttaactgccgattcacgtgtattaattagtaaagcattaatggaatgtcaatcctatagattaactgttgaagattcaccatcagttgaatatatttctaaatttattgctggtattcaacaacgttatactcaaagtggtggtgttagaccatttggtatttcaacattaattgttggttttgatacagatggtacaccaaatctttatcaaactgatccatctggatcctatagttcatggaaagccgctgcaattggtcgtagttcaaagagtgttggtgaatttttagaaaagaattatacagacgttagtgaagaggaatcaattaaattagcagttagagctttattagagattgttgaaagtggaaataaaaatattgaaattgcagtcattagaaataaacaaccaatcgttttattagatgaacaagaaattgataaattagttgctgtcgttgaagctgaaaaagaaattcaaaaagaacaagaaaaatctgaaaaacaacaaaaataaattacatcaaattcctttttttccccacctttttttttaattatcttcaagtttttttaaaaaaaaaaaaaaaa



Solution
in C++

#include <iostream>
#include <fstream>
#include <vector>
#include <algorithm>
#include<array>

bool check(std::string s, std::vector<std::string> r) {
    for (auto str: r)
        if (s.find(str) == std::string::npos) {
            std::cout << str << " not found in " << s << std::endl;
            return false;
        }
    return true;
}


/* Returns pair of number x,y
 * x - length of common edge in the end of S and begiinning of T
 * y - length of common edge in the end of T and begiinning of S
 * based on longest common substring algorithm
 * */
std::pair<size_t, size_t> LCSubstr(std::string S, std::string T) {
    size_t m = S.length(), n = T.length();
    std::vector<std::vector<size_t>> L(m);

    for (auto &t : L)
        t.resize(n);


    std::pair<size_t, size_t> x{0, 0};
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j) {
            if (S[i] == T[j]) {
                if (i == 0 || j == 0)
                    L[i][j] = 1;
                else
                    L[i][j] = L[i - 1][j - 1] + 1;

                if (i == m - 1 && j == L[i][j] - 1 && L[i][j] > x.first) {
                    x.first = L[i][j];
                }
                if (j == n - 1 && i == L[i][j] - 1 && L[i][j] > x.second)
                    x.second = L[i][j];
            } else
                L[i][j] = 0;
        }

    return x;
}

int main() {
    std::vector<std::string> reads, r2;

    std::ifstream t;
    t.open("dna.txt");
    while (t) {
        std::string buffer;
        std::getline(t, buffer);
        if (buffer == "")
            continue;
        reads.push_back(buffer);
    }
    t.close();
    r2 = reads;

    //remove substrings
    std::sort(reads.begin(), reads.end(), [](std::string a, std::string b) { return a.length() > b.length(); });
    for (size_t i = 0; i < reads.size(); ++i)
        for (size_t j = i + 1; j < reads.size(); ++j) {
            if (reads[i].find(reads[j]) != std::string::npos) {
                reads.erase(reads.begin() + j);
                --j;
            }
        }

    std::vector<std::vector<std::pair<size_t, size_t>>> lengths(reads.size());
    for (auto &l : lengths)
        l.resize(reads.size());

    //fill data about common edges lengths
    for (size_t i = 0; i < lengths.size(); ++i)
        for (size_t j = i + 1; j < lengths[i].size(); ++j)
            lengths[i][j] = LCSubstr(reads[i], reads[j]);

    while (reads.size() > 1) {

        size_t x = 0, y = 1;
        bool first = true;
        size_t len = lengths[0][1].first;

        //find longest edge
        for (size_t i = 0; i < lengths.size(); ++i)
            for (size_t j = i + 1; j < lengths[i].size(); ++j) {
                if (lengths[i][j].first > len) {
                    len = lengths[i][j].first;
                    x = i;
                    y = j;
                    first = true;
                }
                if (lengths[i][j].second > len) {
                    len = lengths[i][j].second;
                    x = i;
                    y = j;
                    first = false;
                }
            }

        //join strings and information
        if (first) {
            reads[x] += reads[y].substr(len);
            for (size_t i = 0; i < lengths[x].size(); ++i)
                lengths[x][i].first = lengths[y][i].first;
            for (auto &l : lengths)
                l[x].second = l[y].second;
        } else {
            reads[x] = reads[y] + reads[x].substr(len);
            for (size_t i = 0; i < lengths[x].size(); ++i)
                lengths[x][i].second = lengths[y][i].second;
            for (auto &l : lengths)
                l[x].first = l[y].first;
        }

        reads.erase(reads.begin() + y);
        lengths.erase(lengths.begin() + y);
        for (auto &l : lengths)
            l.erase(l.begin() + y);

    }

    std::cout << reads[0] << std::endl;
    std::cout << check(reads[0], r2);

}

Comments

Popular posts from this blog

Decipher A Seven Segment Display

Description Today's challenge will be to create a program to decipher a seven segment display, commonly seen on many older electronic devices. Input Description For this challenge, you will receive 3 lines of input, with each line being 27 characters long (representing 9 total numbers), with the digits spread across the 3 lines. Your job is to return the represented digits. You don't need to account for odd spacing or missing segments. Output Description Your program should print the numbers contained in the display. Challenge Inputs     _  _     _  _  _  _  _   | _| _||_||_ |_   ||_||_|   ||_  _|  | _||_|  ||_| _|     _  _  _  _  _  _  _  _ |_| _| _||_|| ||_ |_| _||_   | _| _||_||_| _||_||_  _|  _  _  _  _  _  _  _  _  _ |_  _||_ |_| _|  ||_ | ||_|  _||_ |_||_| _...

Continued Fraction

Description In mathematics, a continued fraction is an expression obtained through an iterative process of representing a number as the sum of its integer part and the reciprocal of another number, then writing this other number as the sum of its integer part and another reciprocal, and so on. A continued fraction is an expression of the form             1     x + ----------                1         y + -------                   1             z + ----                  ... and so forth, where x, y, z, and such are real numbers, rational numbers, or complex numbers. Using Gauss notation, this may be abbreviated as [x; y, z, ...] To convert a continued fraction to an ordinary fraction, we just simplify from the right side, which may be an improper fraction, ...

Kolakoski Sequence

Description A Kolakoski sequence (A000002) is an infinite sequence of symbols {1, 2} that is its own run-length encoding. It alternates between "runs" of symbols. The sequence begins: 12211212212211211221211212211... The first three symbols of the sequence are 122, which are the output of the first two iterations. After this, on the i-th iteration read the value x[i] of the output (one-indexed). If i is odd, output x[i] copies of the number 1. If i is even, output x[i] copies of the number 2. There is an unproven conjecture that the density of 1s in the sequence is 1/2 (50%). In today's challenge we'll be searching for numerical evidence of this by tallying the ratio of 1s and 2s for some initial N symbols of the sequence. Input Description As input you will receive the number of outputs to generate and tally. Output Description As output, print the ratio of 1s to 2s in the first n symbols. Sample Input 10 100 1000 Sample Output 5:5 49:51 502:498...

Advanced pacman

Description This challenge takes its roots from the world-famous game Pacman. To finish the game, pacman needs to gather all pacgum on the map. The goal of this chalenge is to have a time-limited pacman. Pacman must gather as much pacgum as possible in the given time. To simplify, we will say that 1 move (no diagonals) = 1 unit of time. Formal Inputs & Outputs Input description You will be given a number, the time pacman has to gather as much pacgum as possible, and a table, being the map pacman has to explore. Every square of this map can be one of those things : A number N between (1 and 9) of pacgums that pacman can gather in one unit of time. "X" squares cannot be gone through. "C" will be where pacman starts. "O" (the letter, not zero ) will be a warp to another "O". There can be only 2 "O" on one map; Output description Your program should output the maximum number of pacgums pacman can gather in the given t...

Puzzle Me This

Description First they took our jerbs, now they're taking our puzzles! (with your help) Today we're gonna find a way to solve jigsaw puzzles using computers Input Description As I am no designer the input will be purely numerical, feel free to make some visual version of the jigsaw puzzles :) You will first be given the dimension as X, Y Afterwards you will be given list of puzzle pieces and what type their 4 sides connect to (given as up, right, down, left) Their side-connection is given as a number, They connect with their negated number this means that a 1 and -1 connects, 2 and -2 connects etc. 0 means that it doesnt connect with anything. Assume pieces are rotated in the correct direction. fx: 2, 2 0: 0,1,2,0 1: 0,0,2,-1 2: -2,0,0,2 3: -2,-2,0,0 Output Description Output is a 2D picture/matrix of the pieces in their correct position for the example this would be 0 1 3 2 Challenge Input Challenges are generated, so there is a slight chanc...

Everyone's A Winner

Description Today's challenge comes from the website fivethirtyeight.com, which runs a weekly Riddler column. Today's dailyprogrammer challenge was the riddler on 2018-04-06. From Matt Gold, a chance, perhaps, to redeem your busted bracket: On Monday, Villanova won the NCAA men’s basketball national title. But I recently overheard some boisterous Butler fans calling themselves the “transitive national champions,” because Butler beat Villanova earlier in the season. Of course, other teams also beat Butler during the season and their fans could therefore make exactly the same claim. How many transitive national champions were there this season? Or, maybe more descriptively, how many teams weren’t transitive national champions? (All of this season’s college basketball results are here. To get you started, Villanova lost to Butler, St. John’s, Providence and Creighton this season, all of whom can claim a transitive title. But remember, teams beat those teams, too.) Outp...

Star Battle solver

Background Star Battle is a grid-based logic puzzle. You are given a SxS square grid divided into S connected regions, and a number N. You must find the unique way to place N*S stars into the grid such that: Every row has exactly N stars. Every column has exactly N stars. Every region has exactly N stars. No two stars are horizontally, vertically, or diagonally adjacent. If you would like more information: Star Battle rules and info YouTube tutorial and written tutorial of solving Star Battle puzzles by hand There are many Star Battle puzzles available on Grandmaster Puzzles. Just be aware that some are variants that follow different rules. Challenge Write a program to solve a Star Battle puzzle in a reasonable amount of time. There's no strict time requirement, but you should run your program through to completion for at least one (N, S) = (2, 10) puzzle for it to count as a solution. Feel free to use whatever input/output format is most convenient for you. In the ...