Gene and Genome Analysis
March 9, 2013
Mapping and amplifying an unknown gene from the plant Arabidopsis thaliana.
The purpose of this experiment was to build a gene model that defined the major structural features of an unknown gene from Arabidopsis thaliana and to design PCR primers to amplify the gene from the genome. The derived gene map and protein sequence would then be used to predict the gene's function in a future study. To predict the promoter and protein-coding regions of the unknown gene correctly, both an ab initio method, implemented in Genscan software, and a homology-based approach were used. PCR primers were designed to amplify a region that included restriction enzyme sites. The PCR primers and DNA concentrations were tested to determine the best conditions for PCR. The amplified DNA was then purified, digested with restriction enzymes, and assessed via gel electrophoresis, which confirmed that the product was the expected sequence.
Materials & Methods
Building a Working Map of our Unknown Gene:
To make a working map of the unknown gene, a web-based program called Web Map was used. Web Map analyzed the input sequence and produced a working map that included all 6 bp restriction enzyme sites and complete translations in all six reading frames. Two approaches were used to identify the protein encoded by the unknown gene: an ab initio method using Genscan software and a homology-based method using GeneSeqer software. The ab initio method applies known rules about features that distinguish protein-coding regions from non-coding regions; Genscan thus provided the predicted coding sequence of the unknown gene and the sequence of the predicted protein. In the homology-based approach, sequences related to the gene were identified among expressed sequence tags (ESTs). ESTs whose sequences matched the sequence of the unknown gene were collected by similarity searching via BLAST. Five near-perfect matches with at least 90% similarity were collected. Because several of the collected ESTs had the potential to overlap, the EST information was combined into a larger contiguous cDNA sequence (contig) with CAP3 software, which used the 5 ESTs to generate the smallest possible number of contiguous sequences. Next, the EST contig sequence was used with GeneSeqer to predict intron/exon boundaries accurately. A full-length cDNA sequence for the unknown gene was found by BLAST and compared to the contig that CAP3 formed. Finally, our gene was located in The Arabidopsis Information Resource (TAIR).
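To illustrate what "complete translations in all six frames" means, here is a minimal Python sketch of a six-frame translation. It is not the Web Map implementation; the function names and the frame labels (+1 to +3 for the forward strand, -1 to -3 for the reverse complement) are assumptions made for this example, and the codon table is the standard genetic code.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Translate three frames on each strand; frame labels are hypothetical."""
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames
```

Calling `six_frame_translations` on a DNA string returns a dictionary with one protein string per frame, where `*` marks a stop codon. Long stretches without `*` in one frame are the kind of open reading frame a working map helps to spot.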
Designing PCR primers: Using Primer3, a widely used web-based primer-picking program, we searched for an amplification region about 1000 base pairs long within an exon that contained 2 restriction enzyme sites between 200 and 600 base pairs apart. The restriction enzymes BamHI and SpeI were chosen because they came closest to meeting these requirements. The left primer was named oLKH_L and the right primer oLKH_R. The left primer was 23 bp long with a Tm of 67.04°C. The right primer was also 23 bp long, with a Tm of 66.94°C.
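The site-spacing requirement can be checked with a short Python sketch like the one below. It is an illustration only, not part of Primer3 or Web Map. The recognition sequences GGATCC (BamHI) and ACTAGT (SpeI) are the known ones for these enzymes; the function names, the choice to measure distance between the start positions of the two sites, and the requirement that the two sites belong to different enzymes are assumptions made for this example.

```python
import re

# Recognition sequences; both are palindromic, so scanning one strand suffices.
SITES = {"BamHI": "GGATCC", "SpeI": "ACTAGT"}


def find_sites(seq, recognition):
    """Return 0-based start positions of every occurrence of a recognition site."""
    # The lookahead lets overlapping occurrences be reported as well.
    return [m.start() for m in re.finditer(f"(?={recognition})", seq.upper())]


def site_pairs_in_range(seq, min_gap=200, max_gap=600):
    """List pairs of sites from different enzymes whose starts are min_gap..max_gap bp apart.

    Assumption: distance is measured between site start positions.
    """
    hits = sorted(
        (pos, name)
        for name, recognition in SITES.items()
        for pos in find_sites(seq, recognition)
    )
    pairs = []
    for i, (pos1, name1) in enumerate(hits):
        for pos2, name2 in hits[i + 1:]:
            if name1 != name2 and min_gap <= pos2 - pos1 <= max_gap:
                pairs.append((name1, pos1, name2, pos2))
    return pairs
```

Running `site_pairs_in_range` on a candidate amplification region returns each qualifying pair as `(first enzyme, position, second enzyme, position)`; an empty list means no BamHI/SpeI pair falls in the 200 to 600 bp window.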
Testing PCR Conditions: The PCR primers were tested to determine the best conditions for PCR. First, the thermocycler was programmed. The annealing temperature was calculated by subtracting 5°C from the lower of the two primer Tm values and rounding down to the nearest whole number: 66.94°C minus 5°C is 61.94°C, which rounds down to 61°C. The PCR protocol was programmed as 30 sec. at 95°C, followed by 32 cycles of 15 sec. at 95°C, 30 sec. at 61°C, and 1 minute for extension. PCR products from three dilutions of the genomic DNA were run by gel electrophoresis to determine which concentration worked best. The dilutions were 1/100, 1/50,…