Projects

An overview of my bioinformatics projects, with links to relevant pages and posts.

Methplotlib

A visualization and analysis tool for nucleotide modifications from nanopore sequencing. More information is available on GitHub, examples are in this blog post, and the tool is published in Bioinformatics.

Nanocode

A suite of Python scripts and modules for working with Oxford Nanopore Technologies long-read sequencing data. All are available through pip and bioconda; see the README files on GitHub for further instructions. Published in Bioinformatics.

  • Scripts:
    • NanoPlot
      Create plots from reads (fastq), alignments (bam) and guppy summary files. See also this gallery for examples.
    • NanoComp
      Compare multiple sequencing runs or datasets for read length, quality, accuracy, and throughput.
    • NanoFilt
      Trim and filter reads (while streaming) on length and quality.
      See also this blog post. Optionally takes an albacore summary file to perform faster and more accurate filtering; see this post.
    • NanoStat
      Performs fast extraction of statistics from reads (fastq), alignments (bam) and guppy summary files.
    • NanoQC
      Investigates and plots sequence composition and quality at read ends similar to FastQC.
  • Modules:
    • nanoget
      Functions for extracting features from reads, alignments and guppy summary data.
    • nanomath
      Functions for mathematical processing and calculating statistics.
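
To illustrate the kind of streaming length and quality filtering described for NanoFilt above, here is a minimal Python sketch. It is not NanoFilt's actual code: the function names (read_fastq, mean_quality, filter_reads) and parameters (min_length, min_quality, headcrop) are hypothetical and chosen for illustration only. It assumes simple four-line FASTQ records with Phred+33 qualities, and averages quality via error probabilities rather than raw Phred scores, which is one common convention.

```python
import math


def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of stream (or a blank line) stops the iteration
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def mean_quality(qual, offset=33):
    """Average Phred score, computed by averaging per-base error probabilities."""
    probs = [10 ** (-(ord(char) - offset) / 10) for char in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_reads(handle, min_length=0, min_quality=0.0, headcrop=0):
    """Trim `headcrop` bases from the read start, then keep reads passing both cutoffs.

    Reads are processed one at a time, so memory use does not grow with file size.
    """
    for header, seq, qual in read_fastq(handle):
        seq, qual = seq[headcrop:], qual[headcrop:]
        # An empty read is dropped before its quality is evaluated.
        if seq and len(seq) >= min_length and mean_quality(qual) >= min_quality:
            yield header, seq, qual
```

Because filter_reads is a generator over a file handle, it can read from sys.stdin and write to sys.stdout, so the filter can sit in the middle of a shell pipeline without holding the dataset in memory.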

Structural variant workflow for long reads

I have developed a Snakemake workflow for structural variant analysis from long read sequencing, published together with our analysis of structural variants in NA19240 from PromethION sequencing in Genome Research.

Surpyvor structural variant tools

I have developed surpyvor, a (so far incomplete) Python wrapper around SURVIVOR with sensible default settings, a more convenient command-line interface, and some additional plots and convenience functions. Published as part of our SV analysis of NA19240 in Genome Research.

DEA.R

A command-line R script for convenient and reproducible differential expression analysis using the DESeq2, edgeR and limma-voom algorithms. The script is available on GitHub. It also performs counting using featureCounts or takes counts as prepared by Salmon. Appropriate plots and various output tables with results and normalized counts are produced. A bash script for creating a test dataset is also available.

enrichr-cli

A command-line Python script to perform gene set enrichment analysis using the Enrichr database. Available on GitHub.

PromisingPreprints

A Python Twitter bot that tweets as soon as a bioRxiv preprint reaches the top 10% of Altmetric attention scores for bioRxiv preprints in the first month after publication. See also this post.