Analyze a sharded dataset

import lamindb as ln
import lnschema_bionty as lb

ln.track()
💡 loaded instance: testuser1/test-facs (lamindb 0.54.2)
💡 notebook imports: lamindb==0.54.2 lnschema_bionty==0.31.2 scanpy==1.9.5
💡 Transform(id='zzJzdgJ763Dyz8', name='Analyze a sharded dataset', short_name='facs3', version='0', type=notebook, updated_at=2023-09-27 19:04:03, created_by_id='DzTjkKse')
💡 Run(id='DRCTFuwx8bCnWhtb0XV7', run_at=2023-09-27 19:04:03, transform_id='zzJzdgJ763Dyz8', created_by_id='DzTjkKse')
ln.Dataset.filter().df()
name description version hash reference reference_type transform_id run_id file_id initial_version_id updated_at created_by_id
id
8RZdIbll16NTrAwo7lRL My versioned FACS dataset None 1 fnzTGHE8BlkiMMIqHLDjyA None None OWuTtS4SAponz8 6LxzHJKBOJu5s56VPocZ 8RZdIbll16NTrAwo7lRL None 2023-09-27 19:03:42 DzTjkKse
8RZdIbll16NTrAwo7l1h My versioned FACS dataset None 2 H2N4gXSjQN7Qy3LOcETW None None SmQmhrhigFPLz8 iKuM6oyOKGVBDvx0YQ7P None 8RZdIbll16NTrAwo7lRL 2023-09-27 19:03:53 DzTjkKse
dataset = ln.Dataset.filter(name="My versioned FACS dataset", version="2").one()
adata = dataset.load()
/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/anndata/_core/anndata.py:1838: UserWarning: Observation names are not unique. To make them unique, call `.obs_names_make_unique`.
  utils.warn_names_duplicates("obs")
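
The warning above appears because the concatenated shards reuse the same observation names. Besides calling `.obs_names_make_unique` as the warning suggests, one option is to prefix each name with the file it came from. The following is a minimal sketch on a toy pandas table standing in for `adata.obs`; the file ids and observation names are hypothetical, not taken from the dataset above.

```python
import pandas as pd

# Toy stand-in for adata.obs after concatenating two shards (hypothetical values).
obs = pd.DataFrame(
    {"file_id": ["fileA", "fileA", "fileB", "fileB"]},
    index=["0", "1", "0", "1"],  # each shard numbers its observations from 0
)

print(obs.index.is_unique)  # False: "0" and "1" each occur twice

# Prefix every observation name with the id of the shard it belongs to.
obs.index = [f"{fid}-{name}" for fid, name in zip(obs["file_id"], obs.index)]

print(obs.index.is_unique)  # True
print(list(obs.index))      # ['fileA-0', 'fileA-1', 'fileB-0', 'fileB-1']
```

Prefixing keeps the provenance visible in the name itself, whereas numeric suffixes only guarantee uniqueness.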

The loaded AnnData records which file each observation came from in the file_id column of its .obs annotations:

adata.obs.file_id.cat.categories
Index(['8RZdIbll16NTrAwo7lRL', 'RY2suClyjVaJf7WYC0SH'], dtype='object')
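
Because the file reference is an ordinary categorical column, standard pandas operations can be used to count or select observations per file. The sketch below uses a toy table with hypothetical file ids in place of `adata.obs`; on the real object the same expressions would apply to `adata.obs`.

```python
import pandas as pd

# Toy stand-in for adata.obs with a categorical file_id column (hypothetical ids).
obs = pd.DataFrame({"file_id": pd.Categorical(["fileA", "fileA", "fileB"])})

# The categories list every file that contributed observations.
print(obs["file_id"].cat.categories.tolist())  # ['fileA', 'fileB']

# Number of observations contributed by each file.
counts = obs["file_id"].value_counts()
print(counts["fileA"], counts["fileB"])  # 2 1

# Select only the observations that came from one file.
shard = obs[obs["file_id"] == "fileB"]
print(len(shard))  # 1
```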

By default, only the features shared by all files (their intersection) are kept:

adata.var.index
Index(['CD57', 'Cd19', 'Cd4', 'CD8', 'CD3', 'CD27', 'Cd14', 'Ccr7', 'CD127',
       'CD28'],
      dtype='object')
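
To illustrate what taking the intersection means, here is a small plain-Python sketch. The two marker panels are hypothetical and only loosely modeled on the names above, and `intersect_features` is an illustrative helper, not part of lamindb or anndata.

```python
# Hypothetical per-shard marker panels; only names present in every panel survive.
shard_features = {
    "shard_1": ["CD57", "Cd19", "Cd4", "CD8", "CD3", "CCR5"],
    "shard_2": ["CD57", "Cd19", "Cd4", "CD8", "CD45"],
}


def intersect_features(panels):
    """Return the features present in every panel, in the order of the first panel."""
    panels = list(panels)
    shared = set(panels[0]).intersection(*panels[1:])
    return [name for name in panels[0] if name in shared]


common = intersect_features(shard_features.values())
print(common)  # ['CD57', 'Cd19', 'Cd4', 'CD8']
```

Features measured in only one shard (here CD3, CCR5 and CD45) are dropped, so every remaining column has values for every observation.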

Let us create a plot:

markers = lb.CellMarker.lookup()
import scanpy as sc

sc.pp.pca(adata)
sc.pl.pca(adata, color=markers.cd14.name, save="_cd14")
filepath = "./figures/pca_cd14.pdf"  # the path scanpy reports saving to
WARNING: saving figure to file figures/pca_cd14.pdf
https://d33wubrfki0l68.cloudfront.net/6e47b2a90ee1573df9c36d67be4db88d37f0c8e6/d37be/_images/3de88ed8d1398feb1e5ecb2d51b0dd7652c5a8ea3f788809935b77065838ff14.png
file = ln.File(filepath, description="My result on CD14")
file.save()
file.view_flow()
https://d33wubrfki0l68.cloudfront.net/081382525d44be51e0acb277af23a43684a5b8f3/3e7dc/_images/3da488b78650e98bf54f2e6cf21a6e0337d3ed3604cc07d17fa3d39a5860c7e5.svg
# clean up test instance
!lamin delete --force test-facs
!rm -r test-facs
💡 deleting instance testuser1/test-facs
✅     deleted instance settings file: /home/runner/.lamin/instance--testuser1--test-facs.env
✅     instance cache deleted
✅     deleted '.lndb' sqlite file
❗     consider manually deleting your stored data: /home/runner/work/lamin-usecases/lamin-usecases/docs/test-facs