High Throughput Sequencing of Non-Model Organisms
High-throughput sequencing (HTS) is making the sequencing of complete genomes increasingly affordable; it has even been possible for PhD students at the FBA to assemble high-quality vertebrate genomes without external funding. This has also led to the appearance of a number of linked projects whose objective is to determine the genome sequences of all known species. However, this does not mean an end to the need for sequencing, as many research questions can only be addressed by analysing how genomes vary, both at the sequence level and at the epigenetic level (i.e. base modifications).
HTS is particularly useful for the study of non-model organisms for which reliable genomic resources are unavailable. Early versions of HTS provided a massive increase in the quantity of sequence over existing technologies (automated Sanger sequencing), but were only able to provide short sequence reads. Both sequence throughput (the total quantity) and read lengths have since increased drastically, and sequences of several million base pairs have now been determined from single DNA molecules using Nanopore sequencing.
Long-read sequencing is particularly suitable for non-model organisms as it makes it much easier to assemble primary sequences into contigs that can be close to chromosome size. Long reads also make it easy to determine transcriptomes, as complete RNA molecules can be sequenced, thus removing the need to assemble transcripts from short reads. This both enables annotation and greatly facilitates the analysis of gene expression. The most recent methodologies also allow the direct detection of base modifications, enabling research in epigenetics.
In previous versions of DR425F we covered short-read sequencing with Illumina and Ion Torrent. From 2025 onwards we will focus on long-read technologies, and on Oxford Nanopore (ONT) sequencing specifically. This is because ONT sequencing is particularly suitable for non-model organisms and for small-scale exploratory projects where funding may be limited.
Making efficient use of any form of HTS requires an understanding of the full process of sequence generation, from sample preparation to the evaluation and analysis of complete genomes. The course will thus include both laboratory (week 1) and data analysis (week 2) components, as well as lectures on topics related to the use and analysis of HTS in general and long-read HTS specifically.
The course includes two full weeks of in-person teaching activities. Students will also need to allocate time for independent study in order to prepare presentations and to complete the written assignment.
Course components
Laboratory component
1. Sample preparation.
2. DNA extraction and purification.
3. Library preparation.
4. Sequencing, with repeated library loading.
Sequence data analysis and handling
1. Base calling.
2. Crash course in using command line interfaces (Bash) on remote servers.
3. Sequence file formats and data encoding.
4. Raw sequence quality assessment.
5. Assembly and assembly quality estimates.
6. Variant detection and base modification analyses (primarily DNA methylation).
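As an illustration of the kind of calculation behind items 3 to 5, the following is a minimal sketch rather than course material. It assumes FASTQ quality strings in the common Phred+33 encoding and uses N50 as one example of an assembly quality estimate; the function names are hypothetical and chosen only for this example.

```python
import math


def phred33_to_error_probs(quality_string):
    """Convert a FASTQ quality string to per-base error probabilities.

    Assumes Phred+33 encoding: Q = ord(character) - 33, P(error) = 10^(-Q/10).
    """
    return [10 ** (-(ord(ch) - 33) / 10) for ch in quality_string]


def mean_read_quality(quality_string):
    """Return the Phred score that corresponds to the mean error probability.

    Averaging error probabilities (rather than Q scores directly) avoids
    overstating the quality of reads that contain a few very poor bases.
    """
    if not quality_string:
        raise ValueError("empty quality string")
    probs = phred33_to_error_probs(quality_string)
    return -10 * math.log10(sum(probs) / len(probs))


def n50(lengths):
    """Return the N50 of a list of contig (or read) lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total number of bases.
    """
    if not lengths:
        raise ValueError("no lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
```

For example, `n50([2, 3, 4, 5, 6])` returns 5, because the two longest sequences (6 + 5 = 11) already cover more than half of the 20 bases in total.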
Course dates for spring 2025: 5 - 16 May
Knowledge
The candidate should:
- Be at the forefront of knowledge within the academic field in HTS of non-model organisms
- Be able to understand the power and limitations of HTS technologies
- Be able to understand and communicate the specific strengths and weaknesses of alternative HTS approaches
- Be able to understand the principles of de novo genome and transcriptome assembly
- Be able to understand how to assess transcript prevalence from RNA-seq data
Skills
The candidate should:
- Be able to prepare a genomic library and perform a sequencing run on an NGS platform
- Be able to make use of basic computational resources for NGS
- Know how to transfer large data sets between computers
- Be able to execute scripts and run extended analyses
- Be able to map short-read sequence data to sequenced genomes and query the mapping for validation
- Read, understand and communicate up-front reviews and research literature on HTS of non-model organisms
General competence
The candidate should:
- Be able to address biological questions in non-model species using short sequence reads
- Be able to choose the most suitable sequencing technology and analytical tools to be used to address the problem at hand
- Be able to convey essential topics, exchange experiences, and keep updated within the field of HTS and NGS platforms
Lectures will address how HTS and long read technologies are facilitating genome assembly and downstream analyses. These will not be limited to Nanopore data, but will also cover how the different technologies are used in the full assembly process (scaffolding and refinement) and downstream analyses. The topics included will cover all or most of:
- The history and current developments in HTS.
- The use of different and complementing technologies in genome assembly.
Use of HTS and long reads in:
- Population genomics
- Transcriptome analysis and annotation
- Base modification detection and epigenetics
Note that the specific topics may vary due to the availability of lecturers and funding.
As part of the course students will also be required to present a research article chosen by us and to complete a written assignment. This presentation and accompanying assignment will constitute the course exam.
Note that physical presence at the course is a requirement for obtaining the study points.