Bioinformatics

Quality Control

All data that comes into the facility, whether we generate it or you provide it to us, undergoes quality control testing. For sequencing data we primarily use FastQC, and we use other tools and tests where appropriate.
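FastQC reports metrics such as the mean Phred quality score at each read position. The sketch below computes that one metric from a FASTQ file to illustrate the idea. It is not FastQC itself. It assumes four-line FASTQ records with Phred+33 quality encoding (standard for current Illumina data), and the function names are hypothetical.

```python
"""Per-position mean base quality from FASTQ records (a FastQC-style metric)."""
from io import StringIO
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from a FASTQ stream with 4 lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], seq, qual


def per_base_mean_quality(handle: TextIO, offset: int = 33) -> List[float]:
    """Mean Phred score at each read position; assumes Phred+33 encoding by default."""
    totals: List[int] = []
    counts: List[int] = []
    for _name, _seq, qual in read_fastq(handle):
        for i, symbol in enumerate(qual):
            if i == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(symbol) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


if __name__ == "__main__":
    # Two toy reads: 'I' encodes Q40, '#' Q2, '5' Q20 under Phred+33.
    toy = StringIO("@read1\nACGT\n+\nIIII\n@read2\nACG\n+\n##5\n")
    print(per_base_mean_quality(toy))  # [21.0, 21.0, 30.0, 40.0]
```

Reads of different lengths are handled by averaging each position only over the reads that reach it. Positions whose mean falls below a chosen cutoff would be candidates for trimming.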

RNA-Seq Analysis

RNA-Seq analysis shares many steps with microarray analysis, but it is complicated by the fact that the data do not come from well-designed, well-annotated oligonucleotide probes. Instead, they consist of short reads of uncertain origin. A major step in the analysis is therefore assigning those reads to their source, either by aligning them to a reference genome or transcriptome or by assembling a de novo transcriptome.

Once these steps are complete, the analysis follows a path very similar to that of microarray analysis. We use STAR for mapping and DESeq2 to calculate fold changes and test for significant differential expression. After mapping, Partek can also be used for the downstream analysis.
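Before comparing conditions, DESeq2 corrects for differences in sequencing depth using "median-of-ratios" size factors. The sketch below reproduces that normalization idea in plain Python and then computes a naive log2 fold change between group means. It is an illustration only: it omits DESeq2's negative binomial model, dispersion estimation, significance tests, and fold-change shrinkage. The function names, pseudocount, and example counts are hypothetical.

```python
"""Median-of-ratios size factors (the DESeq2 normalization idea) and naive log2 fold changes."""
import math
from statistics import mean, median
from typing import Dict, List


def size_factors(counts: Dict[str, List[int]]) -> List[float]:
    """One scaling factor per sample.

    counts maps gene -> raw read counts, listed in the same sample order for every gene.
    Genes with a zero in any sample are skipped because their geometric mean is zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) == 0:
            continue
        log_geo_mean = mean(math.log(c) for c in row)
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # The size factor of a sample is the median of its count / geometric-mean ratios.
    return [math.exp(median(r)) for r in log_ratios]


def log2_fold_changes(counts: Dict[str, List[int]], groups: List[str],
                      case: str, control: str, pseudocount: float = 1.0) -> Dict[str, float]:
    """log2((mean normalized case + pseudocount) / (mean normalized control + pseudocount))."""
    factors = size_factors(counts)
    result: Dict[str, float] = {}
    for gene, row in counts.items():
        normalized = [c / s for c, s in zip(row, factors)]
        case_mean = mean(v for v, g in zip(normalized, groups) if g == case)
        ctrl_mean = mean(v for v, g in zip(normalized, groups) if g == control)
        result[gene] = math.log2((case_mean + pseudocount) / (ctrl_mean + pseudocount))
    return result


if __name__ == "__main__":
    toy_counts = {  # hypothetical counts: gene_e rises four-fold in the "treated" samples
        "gene_a": [100, 100, 100, 100],
        "gene_b": [100, 100, 100, 100],
        "gene_c": [100, 100, 100, 100],
        "gene_d": [100, 100, 100, 100],
        "gene_e": [50, 50, 200, 200],
    }
    design = ["control", "control", "treated", "treated"]
    print(size_factors(toy_counts))
    print(log2_fold_changes(toy_counts, design, case="treated", control="control"))
```

Taking the median ratio makes the size factors robust to the minority of genes that genuinely change, which is why the single changed gene above does not distort the normalization.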

What we need from you

Fastq files and the experimental design. We also need a reference genome and, ideally, an annotation file (.gff, .gff3, .gtf).

What you will get from us

Largely the same deliverables as for the microarray analysis described below, though the QC figures will differ. If the sequencing depth is sufficient, we can also analyze alternative splicing and search for novel splice sites.

Polymorphism Analysis

Single nucleotide polymorphisms (SNPs) and structural polymorphisms (copy number variants, presence/absence variants) can be discovered and characterized by resequencing individuals from species for which a reference genome exists. They can also be characterized in species without a reference by first assembling those genomes from scratch (de novo assembly).

What we need from you

Fastq files and, if available, a reference genome and, ideally, an annotation file (.gff, .gff3, .gtf).

What you will get from us

We will perform the alignments or assemblies and provide a list of polymorphisms. If needed, and provided an annotation file is available, we can also summarize the effects of those polymorphisms on protein-coding sequences.
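Summarizing a polymorphism's effect on a protein-coding sequence comes down to translating the affected codon before and after the change. The minimal sketch below does this for a single-nucleotide substitution using the standard genetic code (NCBI translation table 1). It assumes a forward-strand, in-frame coding sequence with introns already removed, and the function name is hypothetical. Dedicated annotation tools additionally handle strand, splicing, indels, and alternative transcripts.

```python
"""Effect of a single-nucleotide substitution on a coding sequence (standard genetic code)."""
BASES = "TCAG"
# NCBI translation table 1; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def classify_snp(cds: str, pos: int, alt: str) -> str:
    """Classify replacing cds[pos] (0-based) with alt.

    Assumes cds is a forward-strand, in-frame coding sequence beginning at its first codon.
    """
    cds, alt = cds.upper(), alt.upper()
    if not 0 <= pos < len(cds) or len(alt) != 1 or alt not in BASES:
        raise ValueError("position out of range or alt is not one of A, C, G, T")
    start = pos - pos % 3  # first base of the affected codon
    ref_codon = cds[start:start + 3]
    if ref_codon not in CODON_TABLE:  # incomplete trailing codon or ambiguous base
        raise ValueError(f"cannot translate codon {ref_codon!r}")
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"  # premature stop codon
    if ref_aa == "*":
        return "stop_lost"
    return "missense"


if __name__ == "__main__":
    toy_cds = "ATGGCTTGG"  # hypothetical: Met-Ala-Trp
    for pos, alt in [(5, "C"), (3, "C"), (8, "A")]:
        print(pos, alt, classify_snp(toy_cds, pos, alt))
```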

Metagenomic Analysis

Increasingly, entire communities of organisms are sequenced in bulk from samples collected in the clinic or in the environment. Bulk sequencing reveals the microbial composition of these samples, providing a relatively unbiased view of the diversity of organisms and genes present and of how they change with environmental or clinical conditions.

What we need from you

An experimental design, fastq files and, if available, reference genomes and annotation files (.gff, .gff3, .gtf). We also need a reference database containing sequences of organisms likely to be present in the sample, or at least of their close relatives.

What you will get from us

We will perform the assemblies of the sequences (amplicons or whole genome shotgun), and will return a phylogenetic profile and/or gene composition for the community, within the limits of the provided reference database.
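Once reads have been assigned to taxa using the reference database, a community profile is essentially a table of relative abundances, often summarized with a diversity index. The sketch below computes relative abundances and the Shannon index H' = -Σ p_i ln p_i from per-taxon read counts. The taxon names and counts are hypothetical, and unclassified reads are assumed to have been excluded beforehand.

```python
"""Relative abundances and Shannon diversity from per-taxon read counts."""
import math
from typing import Dict


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of classified reads assigned to each taxon (taxa with zero reads are dropped)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no classified reads")
    return {taxon: n / total for taxon, n in counts.items() if n > 0}


def shannon_index(counts: Dict[str, int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i), using the natural logarithm."""
    return -sum(p * math.log(p) for p in relative_abundance(counts).values())


if __name__ == "__main__":
    toy_counts = {"taxon_A": 600, "taxon_B": 300, "taxon_C": 100}  # hypothetical
    print(relative_abundance(toy_counts))  # {'taxon_A': 0.6, 'taxon_B': 0.3, 'taxon_C': 0.1}
    print(round(shannon_index(toy_counts), 3))
```

H' is 0 for a single-taxon community and grows as reads are spread more evenly across more taxa; taxa missing from the reference database simply cannot appear in the profile.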

MicroRNA

We will determine the presence and relative abundance of known microRNAs. We can also search for novel microRNAs or circular RNAs, but this falls under the category of ‘uncommon tasks’ and will be billed accordingly.
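Quantifying known microRNAs can be pictured as matching adapter-trimmed small-RNA reads against a list of known mature sequences and scaling the counts to reads per million (RPM). The sketch below uses exact matching only; real pipelines also tolerate mismatches and length variants (isomiRs). The miRNA names and sequences are toy placeholders, not real annotations, and the source of known mature sequences (for example, a database such as miRBase) is an assumption.

```python
"""Exact-match counting of known mature microRNAs, reported as reads per million (RPM)."""
from collections import Counter
from typing import Dict, Iterable


def count_known_mirnas(reads: Iterable[str], mature: Dict[str, str]) -> Dict[str, float]:
    """RPM for each known miRNA.

    reads: adapter-trimmed read sequences. mature: miRNA name -> mature sequence
    (U or T alphabet). Normalized by the total number of reads supplied; if two miRNAs
    share an identical sequence, only the last one listed receives the counts.
    """
    lookup = {seq.upper().replace("U", "T"): name for name, seq in mature.items()}
    hits: Counter = Counter()
    total = 0
    for read in reads:
        total += 1
        name = lookup.get(read.upper().replace("U", "T"))
        if name is not None:
            hits[name] += 1
    if total == 0:
        return {name: 0.0 for name in mature}
    return {name: hits[name] * 1_000_000 / total for name in mature}


if __name__ == "__main__":
    toy_mature = {"mir-toy-1": "UGAGGUAG", "mir-toy-2": "AACCGGUU"}  # placeholders
    toy_reads = ["TGAGGTAG", "TGAGGTAG", "AACCGGTT", "GGGGGGGG"]
    print(count_known_mirnas(toy_reads, toy_mature))
    # {'mir-toy-1': 500000.0, 'mir-toy-2': 250000.0}
```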

ChIP-Seq and HITS-CLIP

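Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) identifies the genomic binding sites of any protein for which you have a suitable antibody. Mapping histone methylation marks is a popular application of this technique. HITS-CLIP (high-throughput sequencing of RNA isolated by crosslinking immunoprecipitation) is similar, but it identifies the RNA sites bound by an RNA-binding protein: protein-RNA complexes are crosslinked and immunoprecipitated, and the bound RNA fragments are sequenced.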
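A crude way to picture how binding sites are found is to count ChIP reads and input (control) reads in fixed-width windows and flag windows with many more ChIP reads than the scaled control predicts. The sketch below does that for a single chromosome. The window size, thresholds, function names, and read positions are all hypothetical, and dedicated peak callers such as MACS2 model the local background with a statistical test rather than a fixed fold cutoff.

```python
"""Flag fixed-width windows where ChIP reads exceed a library-size-scaled input control."""
from typing import List, Tuple


def bin_counts(positions: List[int], chrom_length: int, bin_size: int) -> List[int]:
    """Count read positions (0-based, < chrom_length) falling in each window."""
    counts = [0] * ((chrom_length + bin_size - 1) // bin_size)
    for p in positions:
        counts[p // bin_size] += 1
    return counts


def enriched_windows(chip: List[int], control: List[int], chrom_length: int,
                     bin_size: int = 200, min_fold: float = 4.0,
                     min_reads: int = 5) -> List[Tuple[int, int, float]]:
    """Return (start, end, fold_enrichment) for windows passing both thresholds."""
    chip_bins = bin_counts(chip, chrom_length, bin_size)
    control_bins = bin_counts(control, chrom_length, bin_size)
    # Scale the control to the ChIP library size so counts are comparable.
    scale = len(chip) / max(len(control), 1)
    windows = []
    for i, (observed, background) in enumerate(zip(chip_bins, control_bins)):
        expected = max(background * scale, 1.0)  # floor avoids division by zero
        fold = observed / expected
        if observed >= min_reads and fold >= min_fold:
            start = i * bin_size
            windows.append((start, min(start + bin_size, chrom_length), fold))
    return windows


if __name__ == "__main__":
    # Hypothetical read start positions on a 1 kb chromosome.
    chip_reads = [450] * 30 + [50, 250, 650, 850, 860]
    input_reads = [b * 200 + off for b in range(5) for off in (10, 20, 30)]
    print(enriched_windows(chip_reads, input_reads, chrom_length=1000))
```

The input control matters because some regions yield many reads regardless of the antibody; dividing by the scaled control keeps those regions from being reported as binding sites.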

What we need from you

Fastq files and a reference genome.

What you will get from us

Binding sites of your protein.

Microarray Analysis

Microarray analysis generally consists of three parts: normalizing the data, calculating fold changes and their statistical significance, and interpreting biological significance through ontology and pathway analysis. For the first two parts we primarily use R and Bioconductor, following several standard procedures that produce publishable results. We also use the Partek Genomics Suite, a commercial product with a graphical interface and automated downloading of reference data. Partek offers some pathway analysis, though we have not validated its usefulness for that purpose. For ontology and pathway analysis we use either DAVID or Ingenuity Pathway Analysis.
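Normalization is the first of these three parts. Quantile normalization, a component of widely used Bioconductor preprocessing such as the RMA method for Affymetrix data, forces every array to share the same intensity distribution. The minimal sketch below implements it in plain Python. It is not necessarily the exact procedure applied here, the function name is hypothetical, and tied values are ranked by order of appearance as a simplification.

```python
"""Quantile normalization of a probe-by-sample intensity matrix."""
from typing import List


def quantile_normalize(matrix: List[List[float]]) -> List[List[float]]:
    """Give every sample (column) the same distribution of values.

    matrix[i][j] is the intensity of probe i on array j. The shared distribution is the
    mean, across arrays, of each array's sorted values. Tied values are ranked by their
    order of appearance, a simplification.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    sorted_columns = [sorted(col) for col in columns]
    reference = [sum(col[r] for col in sorted_columns) / n_cols for r in range(n_rows)]
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        # Rank the probes on array j, then give each the reference value for its rank.
        order = sorted(range(n_rows), key=col.__getitem__)
        for rank, i in enumerate(order):
            normalized[i][j] = reference[rank]
    return normalized


if __name__ == "__main__":
    toy = [[5.0, 4.0, 3.0],  # hypothetical intensities: 4 probes x 3 arrays
           [2.0, 1.0, 4.0],
           [3.0, 4.0, 6.0],
           [4.0, 2.0, 8.0]]
    for row in quantile_normalize(toy):
        print([round(v, 2) for v in row])
```

After normalization every column contains exactly the same set of values, so differences between arrays that stem from overall brightness or scanner settings no longer affect the comparison.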

What we need from you

If the data were produced in our microarray facility, all we need is the experimental design, i.e., which samples are cases and which are controls, which are biological replicates, and which are technical replicates.

If the data were produced elsewhere, we will need, in addition to the experimental design, all of the .cel files produced and the type of array used. (The array type is encoded in the .cel files, but we like to make sure everything agrees.) Custom arrays will require additional information.

What you will get from us

You will get a list of genes with significant changes, GO terms, and possibly affected pathways. In addition, we can produce heat maps, clusterings, volcano plots, and various QC figures. These deliverables are standard; if your experiment is less conventional, we can adapt our methods to suit your needs.
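Because thousands of genes are tested at once, a list of significantly changed genes is normally based on p-values adjusted for multiple testing. The Benjamini-Hochberg false discovery rate procedure is a widely used choice (it is, for example, the default adjustment in DESeq2's results()); treating it as the correction applied here is an assumption. The sketch below is intended to mirror R's p.adjust(method = "BH"), and the p-values are hypothetical.

```python
"""Benjamini-Hochberg adjusted p-values (false discovery rate)."""
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """BH-adjusted p-values, returned in the input order.

    Scale the i-th smallest p-value by m / i, enforce monotonicity working down from the
    largest p-value, and cap the result at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank among p-values sorted in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    toy_p = [0.01, 0.04, 0.03, 0.2]  # hypothetical raw p-values for four genes
    adjusted = benjamini_hochberg(toy_p)
    print(adjusted)
    significant = [i for i, q in enumerate(adjusted) if q < 0.05]
    print(significant)  # [0]
```

In a volcano plot, the fold change is drawn against the (adjusted or raw) p-value on a -log10 scale, so genes passing both a fold-change and an FDR cutoff stand out in the upper corners.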

Department of Biology Eberly College of Arts & Sciences
