{ "nbformat_minor": 0, "nbformat": 4, "cells": [ { "execution_count": null, "cell_type": "code", "source": [ "%matplotlib inline" ], "outputs": [], "metadata": { "collapsed": false } }, { "source": [ "\n# Centromere calling on the Duan et al. yeast data\n\n\nThis small example shows how to perform a quick centromere call on the first five\nchromosomes of the Duan et al. yeast data.\n\n" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "source": [ "import numpy as np\nfrom centurion.externals import iced\nfrom centurion import centromeres_calls\nimport matplotlib.pyplot as plt\nfrom matplotlib import colors" ], "outputs": [], "metadata": { "collapsed": false } }, { "source": [ "First, load the sample data available in the iced package.\nThe data consists of the first five chromosomes of the budding yeast.\n\n" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "source": [ "counts, lengths = iced.datasets.load_sample_yeast()" ], "outputs": [], "metadata": { "collapsed": false } }, { "source": [ "Then apply centurion's centromere calling algorithm. This yields the\nestimated position of the centromeres. The counts argument is a numpy array\ncontaining the contact counts. The lengths argument is a 1D numpy vector containing\nthe number of bins associated with each chromosome. As such, counts should be a\nsquare ndarray whose side equals the sum of the lengths vector.\nIn addition, we provide the resolution of the data. 
Here, the data provided\nis at 10 kb.\n\n" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "source": [ "centromeres = centromeres_calls.centromeres_calls(\n counts, lengths,\n resolution=10000)" ], "outputs": [], "metadata": { "collapsed": false } }, { "source": [ "Normalize the data for the sake of visualization.\n\n" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "source": [ "counts = iced.filter.filter_low_counts(counts, percentage=0.04)\ncounts = iced.normalization.ICE_normalization(counts)" ], "outputs": [], "metadata": { "collapsed": false } }, { "source": [ "Then remove the intra-chromosomal contacts, again for the sake of visualization.\n\n" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "source": [ "mask = iced.utils.get_intra_mask(lengths)\ncounts[mask] = np.nan" ], "outputs": [], "metadata": { "collapsed": false } }, { "source": [ "In order to visualize the position of the centromeres, we need to map each\ncentromere's position to the corresponding position in the ndarray.\n\n" ], "cell_type": "markdown", "metadata": {} }, { "execution_count": null, "cell_type": "code", "source": [ "# Convert centromere positions to bin units (the data is at 10 kb resolution)\ncentro = centromeres / 10000\n# Offset each chromosome's centromere by the number of bins preceding it\ncentro[1:] += lengths.cumsum()[:-1]\n\nfig, ax = plt.subplots()\nax.matshow(counts, cmap=\"RdBu\", norm=colors.LogNorm())\n# Mark each centromere with dashed lines on both axes\nfor i in centro:\n    ax.axhline(i, color=\"#000000\", linestyle=\"--\")\n    ax.axvline(i, color=\"#000000\", linestyle=\"--\")" ], "outputs": [], "metadata": { "collapsed": false } } ], "metadata": { "kernelspec": { "display_name": "Python 2", "name": "python2", "language": "python" }, "language_info": { "mimetype": "text/x-python", "nbconvert_exporter": "python", "name": "python", "file_extension": ".py", "version": "2.7.12", "pygments_lexer": "ipython2", "codemirror_mode": { "version": 2, "name": "ipython" } } } }