Place the steps involved in creating a DNA library into the correct sequence.

Each dNTP contains a phosphate group, a sugar group, and one of four nitrogenous bases [adenine (A), thymine (T), guanine (G), or cytosine (C)]. The dNTPs are strung together in a linear fashion by phosphodiester covalent bonds between the sugar of one dNTP and the phosphate group of the next; this repeated sugar-phosphate pattern makes up the sugar-phosphate backbone.

The nitrogenous bases of the two separate strands are bound together by hydrogen bonds between complementary bases to form the double-stranded DNA helix.

How to Read Sanger Sequencing Results

Reading Sanger sequencing results properly depends on which of the two complementary DNA strands is of interest and which primer is available. Call the two strands A and B. If strand A is of interest but the primer is better for strand B, the newly synthesized fragments are complementary to strand B and are therefore identical to strand A. On the other hand, if strand A is of interest and the primer is better for strand A, the output will be identical to strand B and must be converted back to strand A.

So, if the sequence of interest reads “TACG” and the primer is best for that strand, the output will be “ATGC” and, therefore, must be converted back to “TACG”. However, if the primer is better for the complementary strand (“ATGC”), then the output will be “TACG”, which is the correct sequence.
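The conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not part of any sequencing software; the function names are made up for the example. Note that the text's example pairs bases position by position and leaves strand direction aside. When 5' to 3' orientation matters, the conversion usually wanted is the reverse complement, so both are shown.

```python
# Watson-Crick pairing: A<->T, G<->C
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(seq: str) -> str:
    """Base-by-base complement in the same left-to-right order (as in the text's example)."""
    return seq.upper().translate(COMPLEMENT)

def reverse_complement(seq: str) -> str:
    """Complement read in the opposite direction, i.e. the other strand written 5'->3'."""
    return complement(seq)[::-1]

read = "ATGC"                    # output of the sequencing reaction
print(complement(read))          # TACG
print(reverse_complement(read))  # GCAT
```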

In short, before starting, you need to know what you’re targeting and how you’re going to get there! With this in mind, here is a worked version of the first case (TACG -> ATGC -> TACG). If the dideoxynucleotide labels are T = yellow, A = pink, C = dark blue, and G = light blue, you will end up with the short fragments primer-A, primer-AT, primer-ATG, and primer-ATGC. Once the fragments have been separated by electrophoresis, the laser will read them in order of length (pink, yellow, light blue, and dark blue) and produce a chromatogram. The computer will then convert each letter to its complement, so the final sequence is the correct TACG.
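The worked example can be traced in code. The sketch below is a toy model under simplifying assumptions: every possible terminated fragment is present exactly once, the primer is omitted from the fragment strings, and the dye colors are the ones chosen in the example. All function names are hypothetical.

```python
# Dye colors taken from the example in the text.
DYE = {"T": "yellow", "A": "pink", "C": "dark blue", "G": "light blue"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def terminated_fragments(synthesized: str) -> list[str]:
    """Every prefix of the newly synthesized strand; each ends in a labeled ddNTP."""
    return [synthesized[:n] for n in range(1, len(synthesized) + 1)]

def read_chromatogram(fragments: list[str]) -> tuple[list[str], str]:
    """Order fragments by length (as electrophoresis does) and read each terminal base."""
    ordered = sorted(fragments, key=len)
    colors = [DYE[frag[-1]] for frag in ordered]
    called = "".join(frag[-1] for frag in ordered)
    return colors, called

fragments = terminated_fragments("ATGC")
colors, called = read_chromatogram(fragments)
target = called.translate(COMPLEMENT)  # convert the read back to the strand of interest

print(fragments)             # ['A', 'AT', 'ATG', 'ATGC']
print(colors)                # ['pink', 'yellow', 'light blue', 'dark blue']
print(called, "->", target)  # ATGC -> TACG
```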

Sanger Sequencing vs. PCR

Sanger sequencing and PCR use similar starting materials and can be used in conjunction with each other, but neither can replace the other.

PCR is used to amplify a target DNA sequence in its entirety. While fragments of varying lengths may be produced by accident (e.g., the DNA polymerase might fall off), the goal is to duplicate the entire target sequence. To that end, the “ingredients” are the target DNA, nucleotides, DNA primers, and DNA polymerase (specifically Taq polymerase, which can survive the high temperatures required in PCR).

In contrast, the goal of Sanger sequencing is to generate every possible length of DNA up to the full length of the target DNA. That is why, in addition to the PCR starting materials, the dideoxynucleotides are necessary.

Sanger sequencing and PCR can be brought together when generating the starting material for a Sanger sequencing protocol. PCR can be used to create many copies of the DNA that is to be sequenced.

Having more than one template to work from makes the Sanger protocol more efficient. If the target sequence is 1,000 nucleotides long and there is only one copy of the template, it is going to take longer to generate the 1,000 tagged fragments. However, if there are several copies of the template, in theory it will take less time to generate all 1,000 of the tagged fragments.

A genomic library is a collection of the total genomic DNA from a single organism. The DNA is stored in a population of identical vectors, each containing a different insert of DNA. In order to construct a genomic library, the organism's DNA is extracted from cells and then digested with a restriction enzyme to cut the DNA into fragments of a specific size. The fragments are then inserted into the vector using DNA ligase. Next, the vector DNA can be taken up by a host organism, commonly a population of Escherichia coli or yeast, with each cell containing only one vector molecule. Using a host cell to carry the vector allows for easy amplification and retrieval of specific clones from the library for analysis.

There are several kinds of vectors available with various insert capacities. Generally, libraries made from organisms with larger genomes require vectors that accept larger inserts, so that fewer vector molecules are needed to make the library. When choosing a vector, researchers can also consider the ideal insert size in order to arrive at the number of clones needed for full genome coverage.

Genomic libraries are commonly used for sequencing applications. They have played an important role in the whole genome sequencing of several organisms, including the human genome and several model organisms.

History

The first DNA-based genome to be fully sequenced was that of the bacteriophage phi X 174, a feat achieved in 1977 by two-time Nobel Prize winner Frederick Sanger. Sanger and his team of scientists created a library of the bacteriophage for use in DNA sequencing. This success contributed to the ever-increasing demand for genome sequencing in gene therapy research. Teams are now able to catalog polymorphisms in genomes and investigate candidate genes contributing to maladies such as Parkinson's disease, Alzheimer's disease, multiple sclerosis, rheumatoid arthritis, and Type 1 diabetes. This is due to the advance of genome-wide association studies, which the ability to create and sequence genomic libraries made possible. Previously, linkage and candidate-gene studies were some of the only approaches.

Genomic library construction

Construction of a genomic library involves creating many recombinant DNA molecules. An organism's genomic DNA is extracted and then digested with a restriction enzyme. For organisms with very small genomes (~10 kb), the digested fragments can be separated by gel electrophoresis. The separated fragments can then be excised and cloned into the vector separately. However, when a large genome is digested with a restriction enzyme, there are far too many fragments to excise individually. The entire set of fragments must be cloned together with the vector, and separation of clones can occur after. In either case, the fragments are ligated into a vector that has been digested with the same restriction enzyme. The vector containing the inserted fragments of genomic DNA can then be introduced into a host organism.

Below are the steps for creating a genomic library from a large genome.

  1. Extract and purify DNA.
  2. Digest the DNA with a restriction enzyme. This creates fragments that are similar in size, each containing one or more genes.
  3. Insert the fragments of DNA into vectors that were cut with the same restriction enzyme. Use the enzyme DNA ligase to seal the DNA fragments into the vector. This creates a large pool of recombinant molecules.
  4. Introduce the recombinant molecules into host bacteria by transformation. The resulting population of transformed cells is the DNA library.
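The steps above can be mimicked on strings to make the order of operations concrete. This is only a sketch: it uses the EcoRI recognition site GAATTC (cut after the first G), treats the genome as a single short linear strand, and ignores sticky ends. The genome string, the vector arm labels, and the function names are invented for illustration.

```python
def digest(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Step 2: cut `dna` at every occurrence of `site`, `cut_offset` bases into the site."""
    fragments, start = [], 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return [frag for frag in fragments if frag]

def ligate_into_vector(fragments: list[str],
                       left_arm: str = "VECTOR_L|",
                       right_arm: str = "|VECTOR_R") -> list[str]:
    """Step 3: place each fragment between the two arms of a cut vector."""
    return [left_arm + frag + right_arm for frag in fragments]

def transform(recombinants: list[str]) -> dict[str, str]:
    """Step 4: one recombinant molecule per host cell; the set of clones is the library."""
    return {f"clone_{n}": rec for n, rec in enumerate(recombinants, start=1)}

genome = "TTAGGAATTCCGATCGGAATTCAAGT"   # step 1: hypothetical purified DNA
fragments = digest(genome)
library = transform(ligate_into_vector(fragments))

print(fragments)  # ['TTAGG', 'AATTCCGATCGG', 'AATTCAAGT']
for clone, molecule in library.items():
    print(clone, molecule)
```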

Figure: Genomic library construction (diagram of the steps outlined above).

Determining titer of library

After a genomic library is constructed with a viral vector, such as lambda phage, the titer of the library can be determined. Calculating the titer allows researchers to approximate how many infectious viral particles were successfully created in the library. To do this, dilutions of the library are used to infect cultures of E. coli of known concentrations. The cultures are then plated on agar plates and incubated overnight. The number of viral plaques is counted and can be used to calculate the total number of infectious viral particles in the library. Most viral vectors also carry a marker that allows clones containing an insert to be distinguished from those that do not have an insert. This allows researchers to also determine the percentage of infectious viral particles actually carrying a fragment of the library.
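The arithmetic behind a plaque-count titer can be sketched as follows. It uses the standard relation titer = plaques / (dilution × volume plated). The function names and every number in the usage lines are hypothetical values chosen for illustration, not data from a real library.

```python
def titer_pfu_per_ml(plaques: int, dilution: float, volume_plated_ml: float) -> float:
    """Plaque-forming units per mL of the undiluted library."""
    return plaques / (dilution * volume_plated_ml)

def total_particles(titer: float, library_volume_ml: float) -> float:
    """Total infectious particles in the whole library stock."""
    return titer * library_volume_ml

def fraction_with_insert(recombinant_plaques: int, total_plaques: int) -> float:
    """Share of plaques whose marker indicates an insert is present."""
    return recombinant_plaques / total_plaques

# Hypothetical plate: 150 plaques from 0.1 mL of a 1:10,000 dilution,
# 135 of which score as recombinant; library stock volume of 0.5 mL.
titer = titer_pfu_per_ml(150, 1e-4, 0.1)
print(f"{titer:.2e} pfu/mL")                       # 1.50e+07 pfu/mL
print(f"{total_particles(titer, 0.5):.2e} total")  # 7.50e+06 total
print(f"{fraction_with_insert(135, 150):.0%} carry an insert")  # 90% carry an insert
```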

A similar method can be used to titer genomic libraries made with non-viral vectors, such as plasmids and BACs. A test ligation of the library can be used to transform E. coli. The transformation is then spread on agar plates and incubated overnight. The titer of the transformation is determined by counting the number of colonies present on the plates. These vectors generally have a selectable marker allowing the differentiation of clones containing an insert from those that do not. By doing this test, researchers can also determine the efficiency of the ligation and make adjustments as needed to ensure they get the desired number of clones for the library.

Screening library

Colony Blot Hybridization

In order to isolate clones that contain regions of interest from a library, the library must first be screened. One method of screening is hybridization. Each transformed host cell of a library will contain only one vector with one insert of DNA. The whole library can be plated onto a filter over media. The filter and colonies are prepared for hybridization and then incubated with a labeled probe. The target DNA (the insert of interest) can be identified by a detection method such as autoradiography because of its hybridization with the probe.

Another method of screening is with polymerase chain reaction (PCR). Some libraries are stored as pools of clones and screening by PCR is an efficient way to identify pools containing specific clones.

Types of vectors

Genome size varies among different organisms and the cloning vector must be selected accordingly. For a large genome, a vector with a large capacity should be chosen so that a relatively small number of clones are sufficient for coverage of the entire genome. However, it is often more difficult to characterize an insert contained in a higher capacity vector.

The following sections describe several kinds of vectors commonly used for genomic libraries and the insert size that each generally holds.

Plasmids

A plasmid is a double stranded circular DNA molecule commonly used for molecular cloning. Plasmids are generally 2 to 4 kilobase-pairs (kb) in length and are capable of carrying inserts up to 15kb. Plasmids contain an origin of replication allowing them to replicate inside a bacterium independently of the host chromosome. Plasmids commonly carry a gene for antibiotic resistance that allows for the selection of bacterial cells containing the plasmid. Many plasmids also carry a reporter gene that allows researchers to distinguish clones containing an insert from those that do not.

Phage lambda (λ)

Phage λ is a double-stranded DNA virus that infects E. coli. The λ chromosome is 48.5kb long and can carry inserts up to 25kb. These inserts replace non-essential viral sequences in the λ chromosome, while the genes required for formation of viral particles and infection remain intact. The insert DNA is replicated with the viral DNA; thus, together they are packaged into viral particles. These particles are very efficient at infection and multiplication leading to a higher production of the recombinant λ chromosomes. However, due to the smaller insert size, libraries made with λ phage may require many clones for full genome coverage.

Cosmids

Cosmid vectors are plasmids that contain a small region of bacteriophage λ DNA called the cos sequence. This sequence allows the cosmid to be packaged into bacteriophage λ particles. These particles, each containing a linearized cosmid, are introduced into the host cell by transduction. Once inside the host, the cosmids circularize with the aid of the host's DNA ligase and then function as plasmids. Cosmids are capable of carrying inserts up to 40kb in size.

Bacteriophage P1 vectors

Bacteriophage P1 vectors can hold inserts 70 to 100kb in size. They begin as linear DNA molecules packaged into bacteriophage P1 particles. These particles are injected into an E. coli strain expressing Cre recombinase. The linear P1 vector becomes circularized by recombination between two loxP sites in the vector. P1 vectors generally contain a gene for antibiotic resistance and a positive selection marker to distinguish clones containing an insert from those that do not. P1 vectors also contain a P1 plasmid replicon, which ensures only one copy of the vector is present in a cell. However, there is a second P1 replicon, called the P1 lytic replicon, that is controlled by an inducible promoter. This promoter allows the amplification of more than one copy of the vector per cell prior to DNA extraction.

P1 artificial chromosomes

P1 artificial chromosomes (PACs) have features of both P1 vectors and Bacterial Artificial Chromosomes (BACs). Similar to P1 vectors, they contain a plasmid and a lytic replicon as described above. Unlike P1 vectors, they do not need to be packaged into bacteriophage particles for transduction. Instead they are introduced into E. coli as circular DNA molecules through electroporation just as BACs are. Also similar to BACs, these are relatively harder to prepare due to a single origin of replication.

Bacterial artificial chromosomes

Bacterial artificial chromosomes (BACs) are circular DNA molecules, usually about 7kb in length, that are capable of holding inserts up to 300kb in size. BAC vectors contain a replicon derived from E. coli F factor, which ensures they are maintained at one copy per cell. Once an insert is ligated into a BAC, the BAC is introduced into recombination-deficient strains of E. coli by electroporation. Most BAC vectors contain a gene for antibiotic resistance and also a positive selection marker. The accompanying figure depicts a BAC vector being cut with a restriction enzyme, followed by the insertion of foreign DNA that is sealed in place by a ligase. Overall, BACs are very stable vectors, but, like PACs, they may be hard to prepare because they have a single origin of replication.

Yeast artificial chromosomes

Yeast artificial chromosomes (YACs) are linear DNA molecules containing the necessary features of an authentic yeast chromosome, including telomeres, a centromere, and an origin of replication. Large inserts of DNA can be ligated into the middle of the YAC so that there is an “arm” of the YAC on either side of the insert. The recombinant YAC is introduced into yeast by transformation; selectable markers present in the YAC allow for the identification of successful transformants. YACs can hold inserts up to 2000kb, but most YAC libraries contain inserts 250-400kb in size. Theoretically there is no upper limit on the size of insert a YAC can hold; it is the quality of the DNA preparation used for inserts that determines the size limit. The most challenging aspect of using YACs is that they are prone to rearrangement.
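The upper insert sizes quoted in the sections above can be collected into a small lookup to see which vectors could hold a given insert. This is a sketch only: it uses upper limits alone (so the 70kb lower bound quoted for P1 vectors is ignored), it omits PACs because no capacity is stated for them here, and the function name is hypothetical.

```python
# Upper insert sizes in kb, as stated in the vector descriptions above.
MAX_INSERT_KB = {
    "plasmid": 15,
    "phage lambda": 25,
    "cosmid": 40,
    "bacteriophage P1": 100,
    "BAC": 300,
    "YAC": 2000,
}

def vectors_that_fit(insert_kb: float) -> list[str]:
    """Vectors whose stated upper limit accommodates the insert, smallest capacity first."""
    fits = [(cap, name) for name, cap in MAX_INSERT_KB.items() if cap >= insert_kb]
    return [name for cap, name in sorted(fits)]

print(vectors_that_fit(150))  # ['BAC', 'YAC']
print(vectors_that_fit(20))   # ['phage lambda', 'cosmid', 'bacteriophage P1', 'BAC', 'YAC']
```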

How to select a vector

Vector selection must ensure that the library is representative of the entire genome. Any insert of the genome derived from a restriction enzyme should have the same chance of being in the library as any other insert. Furthermore, recombinant molecules should contain inserts large enough that the library remains a convenient size to handle. This is determined largely by the number of clones the library needs to contain. The number of clones required to sample all the genes depends on the size of the organism's genome and on the average insert size. This is represented by the following formula (also known as the Carbon and Clarke formula):

$$N = \frac{\ln(1-P)}{\ln(1-f)}$$

where

$N$ is the necessary number of recombinants,

$P$ is the desired probability that any fragment in the genome will occur at least once in the library created, and

$f$ is the fractional proportion of the genome in a single recombinant.

$f$ can be further shown to be:

$$f = \frac{i}{g}$$

where

$i$ is the insert size and

$g$ is the genome size.

Thus, increasing the insert size (by choice of vector) would allow for fewer clones needed to represent a genome. The proportion of the insert size versus the genome size represents the proportion of the respective genome in a single clone. Here is the equation with all parts considered:

$$N = \frac{\ln(1-P)}{\ln\left(1-\frac{i}{g}\right)}$$
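For readers who want to see where the formula comes from, here is a short derivation sketch. It assumes that each clone is an independent, random sample of the genome, which is an idealization of real library construction.

```latex
\begin{align*}
\Pr(\text{a given sequence is absent from one clone}) &= 1 - f \\
\Pr(\text{it is absent from all } N \text{ clones}) &= (1 - f)^{N} \\
P = 1 - (1 - f)^{N} \quad &\Longrightarrow \quad (1 - f)^{N} = 1 - P \\
N \ln(1 - f) = \ln(1 - P) \quad &\Longrightarrow \quad
N = \frac{\ln(1 - P)}{\ln(1 - f)}, \qquad f = \frac{i}{g}
\end{align*}
```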

Vector selection example

The above formula can be used to determine how many clones are needed for a 99% probability that any given sequence in a genome is represented, using a vector with an insert size of twenty thousand base pairs (such as the phage lambda vector). The genome size of the organism is three billion base pairs in this example.

$$N = \frac{\ln(1-0.99)}{\ln\left[1-\frac{2.0\times 10^{4}\ \text{base pairs}}{3.0\times 10^{9}\ \text{base pairs}}\right]}$$

$$N = \frac{-4.61}{-6.7\times 10^{-6}}$$

$$N \approx 688{,}060 \text{ clones}$$

Thus, approximately 688,060 clones are required for a 99% probability that a given DNA sequence from this three-billion-base-pair genome will be present in a library using a vector with an insert size of twenty thousand base pairs. This figure follows from the rounded intermediate values shown above; carrying more digits through the calculation gives roughly 690,800 clones.
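The same calculation can be checked numerically. The sketch below evaluates N = ln(1 − P) / ln(1 − i/g) without rounding the intermediate values, which is why it lands near 6.9 × 10^5 rather than exactly on the hand-calculated 688,060. The function name is hypothetical, and the 300,000 base pair comparison simply reuses the upper BAC insert size mentioned earlier.

```python
import math

def clones_needed(p: float, insert_bp: float, genome_bp: float) -> int:
    """Number of clones N = ln(1 - P) / ln(1 - i/g), rounded up to a whole clone."""
    f = insert_bp / genome_bp
    # log1p(-x) evaluates ln(1 - x) accurately even when x is very small
    return math.ceil(math.log1p(-p) / math.log1p(-f))

n_lambda = clones_needed(0.99, 2.0e4, 3.0e9)  # 20 kb inserts, as in the example
n_bac = clones_needed(0.99, 3.0e5, 3.0e9)     # 300 kb inserts for comparison

print(n_lambda)  # roughly 6.9e5 clones
print(n_bac)     # far fewer clones with the larger insert
```

Raising the insert size by a factor of fifteen cuts the required number of clones by about the same factor, which is the trade-off the section describes.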

After a library is created, the genome of an organism can be sequenced to elucidate how genes affect an organism or to compare similar organisms at the genome level. The aforementioned genome-wide association studies can identify candidate genes for many functional traits. Genes can be isolated through genomic libraries and used in human cell lines or animal models for further research. Furthermore, high-fidelity clones with accurate genome representation and no stability issues serve well as intermediates for shotgun sequencing or for the study of complete genes in functional analysis.

Hierarchical sequencing

Whole genome shotgun sequencing versus Hierarchical shotgun sequencing

One major use of genomic libraries is hierarchical shotgun sequencing, which is also called top-down, map-based or clone-by-clone sequencing. This strategy was developed in the 1980s for sequencing whole genomes before high-throughput techniques for sequencing were available. Individual clones from genomic libraries can be sheared into smaller fragments, usually 500bp to 1000bp, which are more manageable for sequencing. Once a clone from a genomic library is sequenced, the sequence can be used to screen the library for other clones containing inserts which overlap with the sequenced clone. Any new overlapping clones can then be sequenced, forming a contig. This technique, called chromosome walking, can be exploited to sequence entire chromosomes.
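The idea of extending a contig through overlapping clones can be illustrated with plain string matching. This is a toy sketch under strong assumptions: sequences are error-free, all clones are on the same strand, the walk only proceeds in one direction, and overlap is found by comparing strings rather than by screening a library with a probe. The sequences and function names are invented for the example.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def walk(seed: str, clones: list[str], min_len: int = 3) -> str:
    """Extend `seed` rightward by repeatedly adding the clone with the longest overlap."""
    contig, remaining = seed, list(clones)
    while remaining:
        best = max(remaining, key=lambda clone: overlap(contig, clone, min_len))
        n = overlap(contig, best, min_len)
        if n == 0:
            break  # no overlapping clone is left, so the walk stops here
        contig += best[n:]
        remaining.remove(best)
    return contig

seed = "ATCGGCT"                             # the first sequenced clone
clones = ["GTTGACCA", "GGCTAAC", "AACGTT"]   # other inserts in the library
contig = walk(seed, clones)
print(contig)  # ATCGGCTAACGTTGACCA
```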

Whole genome shotgun sequencing is another method of genome sequencing that does not require a library of high-capacity vectors. Rather, it uses computer algorithms to assemble short sequence reads to cover the entire genome. Genomic libraries are often used in combination with whole genome shotgun sequencing for this reason. A high resolution map can be created by sequencing both ends of inserts from several clones in a genomic library. This map provides sequences of known distances apart, which can be used to help with the assembly of sequence reads acquired through shotgun sequencing. The human genome sequence, which was declared complete in 2003, was assembled using both a BAC library and shotgun sequencing.

Genome-wide association studies

Genome-wide association studies are broadly used to find specific gene targets and polymorphisms within the human population. In fact, the International HapMap project was created through a partnership of scientists and agencies from several countries to catalog and utilize this data. The goal of this project is to compare genetic sequences of different individuals to elucidate similarities and differences within chromosomal regions. Scientists from all of the participating nations are cataloging these attributes with data from populations of African, Asian, and European ancestry. Such genome-wide assessments may lead to further diagnostic and drug therapies while also helping future teams design therapeutics with genetic features in mind. These concepts are already being exploited in genetic engineering. For example, a research team has constructed a PAC shuttle vector that creates a library representing two-fold coverage of the human genome. This could serve as a valuable resource for identifying genes, or sets of genes, causing disease. Moreover, these studies can serve as a powerful way to investigate transcriptional regulation, as has been seen in the study of baculoviruses. Overall, advances in genome library construction and DNA sequencing have allowed for efficient discovery of different molecular targets. Assimilation of these features through such efficient methods can hasten the employment of novel drug candidates.


What is the correct sequence for DNA?

3' TGACGACTACAACTTAATCT 5'. Adenine bonds with thymine; guanine bonds with cytosine. This is the complementary DNA sequence. RNA polymerase reads the DNA template in the 3' to 5' direction to make mRNA.

Which sequence of steps should be followed when preparing a genomic library?

In order to construct a genomic library, the organism's DNA is extracted from cells and then digested with a restriction enzyme to cut the DNA into fragments of a specific size. The fragments are then inserted into the vector using DNA ligase.

Which methods are used for preparation of DNA library?

The main library preparation methods are ligation-based library preparation, tagmentation-based library preparation, and amplicon library preparation.

What is the first step in library preparation for whole genome sequencing?

In whole genome shotgun sequencing, about ten equivalents of a genome are cut into small pieces and sequenced. Computers are then used to put the sequences of the pieces together to determine the sequence of the intact genome.