How to use the RNAlysis analysis reports
+How to use the RNAlysis analysis reports
Understanding the Graph
+Understanding the Graph
Each node represents a data table, gene set, or function call. Arrows show the analysis workflow direction.
Node Details at a Glance
+Node Details at a Glance
Hover over a node to see a preview and list of parameters associated with that step in the analysis.
Accessing Associated Files
+Accessing Associated Files
Click on the link in the hover popup to open the file associated with that node.
Fitting the Graph to View
+Fitting the Graph to View
Click the box-shaped button on the bottom-right to fit the entire graph into your current view.
Tracing Analysis Paths
+Tracing Analysis Paths
Click on a node to highlight the path of analysis that led to it. Click again anywhere on the graph to reset the view.
Filtering by Node Type
+Filtering by Node Type
Click on a node type in the legend to highlight all nodes of that type across the graph. Click again to reset the view.
API Reference
+API Reference
Warning
Private functions are not meant to be used out of context, and doing so may lead to unexpected results.
rnalysis.fastq module
+rnalysis.fastq module
The fastq module provides a unified programmatic interface to external tools that process FASTQ files. Those currently include the CutAdapt adapter-trimming tool, the kallisto RNA-sequencing quantification tool, the bowtie2 alignment tool, and the featureCounts feature counting tool.
@@ -134,7 +134,7 @@ rnalysis.fastq module
Bases: _FASTQPipeline
-
-_func_signature(func: function, args: tuple, kwargs: dict)
+_func_signature(func: LambdaType, args: tuple, kwargs: dict)
Returns a function signature string for the given function and arguments.
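The change from `function` to `LambdaType` in the signature above looks cosmetic rather than behavioral. In CPython, `types.LambdaType` and `types.FunctionType` are two names for the same class, so a regular `def` function satisfies either annotation. A minimal check, where the `greet` helper is made up purely for illustration:

```python
import types


def greet(name: str) -> str:
    """A hypothetical function used only for this check."""
    return f"hello {name}"


# LambdaType is an alias of FunctionType, not a separate class.
same_class = types.LambdaType is types.FunctionType

# A def-function and a lambda are both instances of that single class.
def_is_lambda_type = isinstance(greet, types.LambdaType)
lambda_is_function_type = isinstance(lambda x: x, types.FunctionType)
```

All three values should be `True`, which is why the rendered annotation can change without any change in what the private helper accepts.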
- Parameters:
@@ -175,7 +175,7 @@ rnalysis.fastq module
-
-_readable_func_signature(func: function, args: tuple, kwargs: dict)
+_readable_func_signature(func: LambdaType, args: tuple, kwargs: dict)
Returns a human-readable function signature string for the given function and arguments.
- Parameters:
@@ -196,7 +196,7 @@ rnalysis.fastq module
-
-export_pipeline(filename: Optional[Union[str, Path]]) → Union[None, str]
+export_pipeline(filename: str | Path | None) → None | str
Export a Pipeline to a Pipeline YAML file or YAML-like string.
- Parameters:
@@ -216,7 +216,7 @@ rnalysis.fastq module
-
-classmethod import_pipeline(filename: Union[str, Path]) → GenericPipeline
+classmethod import_pipeline(filename: str | Path) → GenericPipeline
Import a Pipeline from a Pipeline YAML file or YAML-like string.
- Parameters:
@@ -263,7 +263,7 @@ rnalysis.fastq module
Bases: _FASTQPipeline
-
-_func_signature(func: function, args: tuple, kwargs: dict)
+_func_signature(func: LambdaType, args: tuple, kwargs: dict)
Returns a function signature string for the given function and arguments.
- Parameters:
@@ -304,7 +304,7 @@ rnalysis.fastq module
-
-_readable_func_signature(func: function, args: tuple, kwargs: dict)
+_readable_func_signature(func: LambdaType, args: tuple, kwargs: dict)
Returns a human-readable function signature string for the given function and arguments.
- Parameters:
@@ -325,7 +325,7 @@ rnalysis.fastq module
-
-export_pipeline(filename: Optional[Union[str, Path]]) → Union[None, str]
+export_pipeline(filename: str | Path | None) → None | str
Export a Pipeline to a Pipeline YAML file or YAML-like string.
- Parameters:
@@ -345,7 +345,7 @@ rnalysis.fastq module
-
-classmethod import_pipeline(filename: Union[str, Path]) → GenericPipeline
+classmethod import_pipeline(filename: str | Path) → GenericPipeline
Import a Pipeline from a Pipeline YAML file or YAML-like string.
- Parameters:
@@ -392,7 +392,7 @@ rnalysis.fastq module
Bases: GenericPipeline, ABC
-
-_func_signature(func: function, args: tuple, kwargs: dict)
+_func_signature(func: LambdaType, args: tuple, kwargs: dict)
Returns a function signature string for the given function and arguments.
- Parameters:
@@ -433,7 +433,7 @@ rnalysis.fastq module
-
-_readable_func_signature(func: function, args: tuple, kwargs: dict)
+_readable_func_signature(func: LambdaType, args: tuple, kwargs: dict)
Returns a human-readable function signature string for the given function and arguments.
- Parameters:
@@ -454,7 +454,7 @@ rnalysis.fastq module
-
-export_pipeline(filename: Optional[Union[str, Path]]) → Union[None, str]
+export_pipeline(filename: str | Path | None) → None | str
Export a Pipeline to a Pipeline YAML file or YAML-like string.
- Parameters:
@@ -474,7 +474,7 @@ rnalysis.fastq module
-
-classmethod import_pipeline(filename: Union[str, Path]) → GenericPipeline
+classmethod import_pipeline(filename: str | Path) → GenericPipeline
Import a Pipeline from a Pipeline YAML file or YAML-like string.
- Parameters:
@@ -517,32 +517,32 @@ rnalysis.fastq module
-
-rnalysis.fastq._merge_kallisto_outputs(output_folder: Union[str, Path], new_sample_names: List[str])
+rnalysis.fastq._merge_kallisto_outputs(output_folder: str | Path, new_sample_names: List[str])
Outputs a merged CSV file of transcript estimated counts and a merged CSV file of transcript estimated TPMs.
-
-rnalysis.fastq.bowtie2_align_paired_end(r1_files: List[str], r2_files: List[str], output_folder: Union[str, Path], index_file: Union[str, Path], bowtie2_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto', 'smart']] = 'smart', mode: Literal['end-to-end', 'local'] = 'end-to-end', settings_preset: Literal['very-fast', 'fast', 'sensitive', 'very-sensitive'] = 'very-sensitive', ignore_qualities: bool = False, quality_score_type: Literal['phred33', 'phred64', 'solexa-quals', 'int-quals'] = 'phred33', mate_orientations: Literal['fwd-rev', 'rev-fwd', 'fwd-fwd'] = 'fwd-rev', min_fragment_length: NonNegativeInt = 0, max_fragment_length: PositiveInt = 500, allow_individual_alignment: bool = True, allow_disconcordant_alignment: bool = True, random_seed: NonNegativeInt = 0, threads: PositiveInt = 1)
+rnalysis.fastq.bowtie2_align_paired_end(r1_files: List[str], r2_files: List[str], output_folder: str | Path, index_file: str | Path, bowtie2_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto', 'smart'] = 'smart', mode: Literal['end-to-end', 'local'] = 'end-to-end', settings_preset: Literal['very-fast', 'fast', 'sensitive', 'very-sensitive'] = 'very-sensitive', ignore_qualities: bool = False, quality_score_type: Literal['phred33', 'phred64', 'solexa-quals', 'int-quals'] = 'phred33', mate_orientations: Literal['fwd-rev', 'rev-fwd', 'fwd-fwd'] = 'fwd-rev', min_fragment_length: NonNegativeInt = 0, max_fragment_length: PositiveInt = 500, allow_individual_alignment: bool = True, allow_disconcordant_alignment: bool = True, random_seed: NonNegativeInt = 0, threads: PositiveInt = 1)
Align paired-end reads from FASTQ files to a reference sequence using the bowtie2 aligner. The FASTQ file pairs will be individually aligned, and the aligned SAM files will be saved in the output folder. You can read more about how bowtie2 works in the bowtie2 manual.
- Parameters:
-r1_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#1 files. The files should be sorted in tandem with r2_files, so that they line up to form pairs of R1 and R2 files.
-r2_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#2 files. The files should be sorted in tandem with r1_files, so that they line up to form pairs of R1 and R2 files.
+r1_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#1 files. The files should be sorted in tandem with r2_files, so that they line up to form pairs of R1 and R2 files.
+r2_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#2 files. The files should be sorted in tandem with r1_files, so that they line up to form pairs of R1 and R2 files.
output_folder (str/Path to an existing folder) – Path to a folder in which the aligned reads, as well as the log files, will be saved.
index_file (str or Path) – Path to a pre-built bowtie2 index of the target genome. Can either be downloaded from the bowtie2 website (menu on the right), or generated manually from FASTA files using the function ‘bowtie2_create_index’. Note that bowtie2 indices are composed of multiple files ending with the ‘.bt2’ suffix. All of those files should be in the same location. It is enough to specify the path to one of those files (for example, ‘path/to/index.1.bt2’), or to the main name of the index (for example, ‘path/to/index’).
bowtie2_installation_folder (str, Path, or 'auto' (default='auto')) – Path to the installation folder of bowtie2. For example: ‘C:/Program Files/bowtie2-2.5.1’. If the installation folder is set to ‘auto’, RNAlysis will attempt to find it automatically.
-new_sample_names (list of str, 'auto', or 'smart' (default='smart')) – Give a new name to each aligned sample (optional). If new_sample_names=’auto’, sample names will be given automatically. If a list of new names is given instead, the order of the names should match the order of the file pairs.
+new_sample_names (list of str, 'auto', or 'smart' (default='smart')) – Give a new name to each aligned sample (optional). If new_sample_names=’auto’, sample names will be given automatically. If a list of new names is given instead, the order of the names should match the order of the file pairs.
mode ('end-to-end' or 'local' (default='end-to-end')) – determines the alignment mode of bowtie2. end-to-end mode will look for alignments involving all the read characters. local mode will allow ‘clipping’ of nucleotides from both sides of the read, if that maximizes the alignment score.
settings_preset ('very-sensitive', 'sensitive', 'fast', or 'very-fast' (default='very-sensitive')) – determines the alignment sensitivity preset. Higher sensitivity will result in more accurate alignments, but will take longer to calculate. You can read more about the settings presets in the bowtie2 manual.
ignore_qualities (bool (default=False)) – if True, bowtie2 will ignore the qualities of the reads and treat them all as maximum quality.
quality_score_type ('phred33', 'phred64', 'solexa-quals', or 'int-quals' (default='phred33')) – determines the encoding type of the read quality scores. Most modern sequencing setups use phred+33.
-mate_orientations ('fwd-rev', 'rev-fwd', or 'fwd-fwd' (default='fwd-rev')) –
+mate_orientations ('fwd-rev', 'rev-fwd', or 'fwd-fwd' (default='fwd-rev'))
min_fragment_length (int >= 0 (default=0)) – The minimum fragment length for valid paired-end alignments.
max_fragment_length (int > 0 (default=500)) – The maximum fragment length for valid paired-end alignments.
-allow_individual_alignment (bool (default=True)) –
-allow_disconcordant_alignment (bool (default=True)) –
+allow_individual_alignment (bool (default=True))
+allow_disconcordant_alignment (bool (default=True))
random_seed (int >=0 (default=0)) – determines the seed for pseudo-random number generator.
threads (int > 0 (default=1)) – number of threads to run bowtie2 on. More threads will generally make alignment faster.
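Because r1_files and r2_files must line up pair by pair, it can help to build both lists from a single sorted listing instead of typing them by hand. The sketch below is one way to do that. It assumes a hypothetical naming convention of `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz`, and the helper names `pair_fastq_files` and `align_pairs` are illustrative rather than part of RNAlysis. Only the keyword arguments listed in the signature above are passed to `bowtie2_align_paired_end`.

```python
from pathlib import Path
from typing import List, Tuple


def pair_fastq_files(folder: Path) -> Tuple[List[str], List[str]]:
    """Return R1 and R2 path lists sorted in tandem.

    Assumes mates are named '<sample>_R1.fastq.gz' and
    '<sample>_R2.fastq.gz' (a hypothetical convention; adapt as needed).
    """
    r1_files = sorted(folder.glob("*_R1.fastq.gz"))
    r2_files = []
    for r1 in r1_files:
        # Derive the expected mate name from the R1 file name.
        r2 = r1.with_name(r1.name.replace("_R1.fastq.gz", "_R2.fastq.gz"))
        if not r2.exists():
            raise FileNotFoundError(f"missing mate for {r1.name}")
        r2_files.append(r2)
    return [str(p) for p in r1_files], [str(p) for p in r2_files]


def align_pairs(fastq_folder: Path, output_folder: Path, index_file: Path) -> None:
    """Align every R1/R2 pair in fastq_folder (needs RNAlysis and bowtie2 installed)."""
    # Imported lazily so pair_fastq_files can be used without RNAlysis.
    from rnalysis import fastq

    r1_files, r2_files = pair_fastq_files(fastq_folder)
    fastq.bowtie2_align_paired_end(
        r1_files, r2_files, output_folder, index_file,
        mode="end-to-end", settings_preset="very-sensitive", threads=4,
    )
```

Deriving each R2 path from its R1 partner, rather than sorting two independent listings, makes a missing mate fail loudly instead of silently shifting every later pair.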
@@ -552,7 +552,7 @@ rnalysis.fastq module
-
-rnalysis.fastq.bowtie2_align_single_end(fastq_folder: Union[str, Path], output_folder: Union[str, Path], index_file: Union[str, Path], bowtie2_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto']] = 'auto', mode: Literal['end-to-end', 'local'] = 'end-to-end', settings_preset: Literal['very-fast', 'fast', 'sensitive', 'very-sensitive'] = 'very-sensitive', ignore_qualities: bool = False, quality_score_type: Literal['phred33', 'phred64', 'solexa-quals', 'int-quals'] = 'phred33', random_seed: NonNegativeInt = 0, threads: PositiveInt = 1)
+rnalysis.fastq.bowtie2_align_single_end(fastq_folder: str | Path, output_folder: str | Path, index_file: str | Path, bowtie2_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto'] = 'auto', mode: Literal['end-to-end', 'local'] = 'end-to-end', settings_preset: Literal['very-fast', 'fast', 'sensitive', 'very-sensitive'] = 'very-sensitive', ignore_qualities: bool = False, quality_score_type: Literal['phred33', 'phred64', 'solexa-quals', 'int-quals'] = 'phred33', random_seed: NonNegativeInt = 0, threads: PositiveInt = 1)
Align single-end reads from FASTQ files to a reference sequence using the bowtie2 aligner. The FASTQ files will be individually aligned, and the aligned SAM files will be saved in the output folder. You can read more about how bowtie2 works in the bowtie2 manual.
- Parameters:
@@ -562,7 +562,7 @@ rnalysis.fastq module
index_file (str or Path) –
Path to a pre-built bowtie2 index of the target genome. Can either be downloaded from the bowtie2 website (menu on the right), or generated manually from FASTA files using the function ‘bowtie2_create_index’. Note that bowtie2 indices are composed of multiple files ending with the ‘.bt2’ suffix. All of those files should be in the same location. It is enough to specify the path to one of those files (for example, ‘path/to/index.1.bt2’), or to the main name of the index (for example, ‘path/to/index’).
bowtie2_installation_folder (str, Path, or 'auto' (default='auto')) – Path to the installation folder of bowtie2. For example: ‘C:/Program Files/bowtie2-2.5.1’. If the installation folder is set to ‘auto’, RNAlysis will attempt to find it automatically.
-new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each aligned sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the files.
+new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each aligned sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the files.
mode ('end-to-end' or 'local' (default='end-to-end')) – determines the alignment mode of bowtie2. end-to-end mode will look for alignments involving all the read characters. local mode will allow ‘clipping’ of nucleotides from both sides of the read, if that maximizes the alignment score.
settings_preset ('very-sensitive', 'sensitive', 'fast', or 'very-fast' (default='very-sensitive')) – determines the alignment sensitivity preset. Higher sensitivity will result in more accurate alignments, but will take longer to calculate. You can read more about the settings presets in the bowtie2 manual.
ignore_qualities (bool (default=False)) – if True, bowtie2 will ignore the qualities of the reads and treat them all as maximum quality.
@@ -576,12 +576,12 @@ rnalysis.fastq module
-
-rnalysis.fastq.bowtie2_create_index(genome_fastas: List[Union[str, Path]], output_folder: Union[str, Path], index_name: Union[str, Literal['auto']] = 'auto', bowtie2_installation_folder: Union[str, Path, Literal['auto']] = 'auto', random_seed: Optional[NonNegativeInt] = None, threads: PositiveInt = 1)
+rnalysis.fastq.bowtie2_create_index(genome_fastas: List[str | Path], output_folder: str | Path, index_name: str | Literal['auto'] = 'auto', bowtie2_installation_folder: str | Path | Literal['auto'] = 'auto', random_seed: NonNegativeInt | None = None, threads: PositiveInt = 1)
Builds a bowtie2 index from FASTA-formatted files of target sequences (genome). The index files will be saved in the output folder, with the .bt2 suffix. Be aware that there are pre-built bowtie2 indices for popular model organisms. These can be downloaded from the bowtie2 website (from the menu on the right).
- Parameters:
-genome_fastas (list of str or Path) – Path to the FASTA file/files which contain reference sequences to be aligned to.
+genome_fastas (list of str or Path) – Path to the FASTA file/files which contain reference sequences to be aligned to.
output_folder (str or Path) – Path to the folder in which the bowtie2 index files will be saved.
index_name (str or 'auto' (default='auto')) – The basename of the index files. bowtie2 will create files named index_name.1.bt2, index_name.2.bt2, index_name.3.bt2, index_name.4.bt2, index_name.rev.1.bt2, and index_name.rev.2.bt2. if index_name=’auto’, the index name used will be the stem of the first supplied genome FASTA file (for example: if the first genome FASTA file is ‘path/to/genome.fa.gz’, the index name will be ‘genome’).
bowtie2_installation_folder – Path to the installation folder of bowtie2. For example:
@@ -598,7 +598,7 @@ rnalysis.fastq module
-
-rnalysis.fastq.convert_sam_format(input_folder: Union[str, Path], output_folder: Union[str, Path], picard_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto']] = 'auto', output_format: Literal['sam', 'bam'] = 'bam')
+rnalysis.fastq.convert_sam_format(input_folder: str | Path, output_folder: str | Path, picard_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto'] = 'auto', output_format: Literal['sam', 'bam'] = 'bam')
Convert SAM files to BAM files or vice versa using Picard SamFormatConverter.
@@ -615,7 +615,7 @@ rnalysis.fastq module
-
-rnalysis.fastq.fastq_to_sam_paired(r1_files: List[str], r2_files: List[str], output_folder: Union[str, Path], picard_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto']] = 'auto', output_format: Literal['sam', 'bam'] = 'bam', quality_score_type: Union[Literal['auto'], Literal['phred33', 'phred64', 'solexa-quals', 'int-quals']] = 'auto')
+rnalysis.fastq.fastq_to_sam_paired(r1_files: List[str], r2_files: List[str], output_folder: str | Path, picard_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto'] = 'auto', output_format: Literal['sam', 'bam'] = 'bam', quality_score_type: Literal['auto'] | Literal['phred33', 'phred64', 'solexa-quals', 'int-quals'] = 'auto')
Convert paired-end FASTQ files to SAM/BAM files using Picard FastqToSam.
- Returns:
@@ -637,7 +637,7 @@ rnalysis.fastq module
-
-rnalysis.fastq.fastq_to_sam_single(input_folder: Union[str, Path], output_folder: Union[str, Path], picard_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto']] = 'auto', output_format: Literal['sam', 'bam'] = 'bam', quality_score_type: Union[Literal['auto'], Literal['phred33', 'phred64', 'solexa-quals', 'int-quals']] = 'auto')
+rnalysis.fastq.fastq_to_sam_single(input_folder: str | Path, output_folder: str | Path, picard_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto'] = 'auto', output_format: Literal['sam', 'bam'] = 'bam', quality_score_type: Literal['auto'] | Literal['phred33', 'phred64', 'solexa-quals', 'int-quals'] = 'auto')
Convert single-end FASTQ files to SAM/BAM files using Picard FastqToSam.
- Returns:
@@ -659,7 +659,7 @@ rnalysis.fastq module
-
-rnalysis.fastq.featurecounts_paired_end(input_folder: Union[str, Path], output_folder: Union[str, Path], gtf_file: Union[str, Path], gtf_feature_type: str = 'exon', gtf_attr_name: str = 'gene_id', r_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto']] = 'auto', stranded: Literal['no', 'forward', 'reverse'] = 'no', min_mapping_quality: int = 0, count_multi_mapping_reads: bool = False, count_multi_overlapping_reads: bool = False, ignore_secondary: bool = True, count_fractionally: bool = False, is_long_read: bool = False, require_both_mapped: bool = True, count_chimeric_fragments: bool = False, min_fragment_length: NonNegativeInt = 50, max_fragment_length: Optional[PositiveInt] = 600, report_read_assignment: Optional[Literal['bam', 'sam', 'core']] = None, threads: PositiveInt = 1) Tuple[CountFilter, DataFrame, DataFrame]
+rnalysis.fastq.featurecounts_paired_end(input_folder: str | Path, output_folder: str | Path, gtf_file: str | Path, gtf_feature_type: str = 'exon', gtf_attr_name: str = 'gene_id', r_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto'] = 'auto', stranded: Literal['no', 'forward', 'reverse'] = 'no', min_mapping_quality: int = 0, count_multi_mapping_reads: bool = False, count_multi_overlapping_reads: bool = False, ignore_secondary: bool = True, count_fractionally: bool = False, is_long_read: bool = False, require_both_mapped: bool = True, count_chimeric_fragments: bool = False, min_fragment_length: NonNegativeInt = 50, max_fragment_length: PositiveInt | None = 600, report_read_assignment: Literal['bam', 'sam', 'core'] | None = None, threads: PositiveInt = 1) Tuple[CountFilter, DataFrame, DataFrame]
Assign mapped paired-end sequencing reads to specified genomic features using RSubread featureCounts. Returns a count matrix (CountFilter) containing feature counts for all input files, a DataFrame summarizing the features reads were aligned to, and a DataFrame summarizing the alignment statistics.
- Parameters:
@@ -670,7 +670,7 @@ rnalysis.fastq module
gtf_feature_type (str (default='exon')) – the feature type or types used to select rows in the GTF annotation which will be used for read summarization.
gtf_attr_name (str (default='gene_id')) – the attribute type in the GTF annotation which will be used to group features (e.g. exons) into meta-features (e.g. genes).
r_installation_folder (str, Path, or 'auto' (default='auto')) – Path to the installation folder of R. For example: ‘C:/Program Files/R/R-4.2.1’
-new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each quantified sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the file pairs.
+new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each quantified sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the file pairs.
stranded ('no', 'forward', 'reverse' (default='no')) – Indicates the strandedness of the data. ‘no’ indicates the data is not stranded. ‘forward’ indicates the data is stranded, where the first read in the pair aligns to the forward strand of a transcript. ‘reverse’ indicates the data is stranded, where the first read in the pair aligns to the reverse strand of a transcript.
min_mapping_quality (int >= 0 (default=0)) – the minimum mapping quality score a read must satisfy in order to be counted. For paired-end reads, at least one end should satisfy this criterion.
count_multi_mapping_reads (bool (default=False)) – indicates whether multi-mapping reads/fragments should be counted (‘NH’ tag in BAM/SAM files).
@@ -697,7 +697,7 @@ rnalysis.fastq module
-
-rnalysis.fastq.featurecounts_single_end(input_folder: Union[str, Path], output_folder: Union[str, Path], gtf_file: Union[str, Path], gtf_feature_type: str = 'exon', gtf_attr_name: str = 'gene_id', r_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto']] = 'auto', stranded: Literal['no', 'forward', 'reverse'] = 'no', min_mapping_quality: int = 0, count_multi_mapping_reads: bool = False, count_multi_overlapping_reads: bool = False, ignore_secondary: bool = True, count_fractionally: bool = False, is_long_read: bool = False, report_read_assignment: Optional[Literal['bam', 'sam', 'core']] = None, threads: PositiveInt = 1) Tuple[CountFilter, DataFrame, DataFrame]
+rnalysis.fastq.featurecounts_single_end(input_folder: str | Path, output_folder: str | Path, gtf_file: str | Path, gtf_feature_type: str = 'exon', gtf_attr_name: str = 'gene_id', r_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto'] = 'auto', stranded: Literal['no', 'forward', 'reverse'] = 'no', min_mapping_quality: int = 0, count_multi_mapping_reads: bool = False, count_multi_overlapping_reads: bool = False, ignore_secondary: bool = True, count_fractionally: bool = False, is_long_read: bool = False, report_read_assignment: Literal['bam', 'sam', 'core'] | None = None, threads: PositiveInt = 1) Tuple[CountFilter, DataFrame, DataFrame]
Assign mapped single-end sequencing reads to specified genomic features using RSubread featureCounts.
Returns a count matrix (CountFilter) containing feature counts for all input files, a DataFrame summarizing the features reads were aligned to, and a DataFrame summarizing the alignment statistics.
@@ -709,7 +709,7 @@ rnalysis.fastq module
gtf_feature_type (str (default='exon')) – the feature type or types used to select rows in the GTF annotation which will be used for read summarization.
gtf_attr_name (str (default='gene_id')) – the attribute type in the GTF annotation which will be used to group features (e.g. exons) into meta-features (e.g. genes).
r_installation_folder (str, Path, or 'auto' (default='auto')) – Path to the installation folder of R. For example: ‘C:/Program Files/R/R-4.2.1’
-new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each quantified sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the alphabetical order of the files in the directory.
+new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each quantified sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the alphabetical order of the files in the directory.
stranded ('no', 'forward', 'reverse' (default='no')) – Indicates the strandedness of the data. ‘no’ indicates the data is not stranded. ‘forward’ indicates the data is stranded, where the reads align to the forward strand of a transcript. ‘reverse’ indicates the data is stranded, where the reads align to the reverse strand of a transcript.
min_mapping_quality (int >= 0 (default=0)) – the minimum mapping quality score a read must satisfy in order to be counted.
count_multi_mapping_reads (bool (default=False)) – indicates whether multi-mapping reads/fragments should be counted (‘NH’ tag in BAM/SAM files).
@@ -732,7 +732,7 @@ rnalysis.fastq module
-
-rnalysis.fastq.find_duplicates(input_folder: Union[str, Path], output_folder: Union[str, Path], picard_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto']] = 'auto', output_format: Literal['sam', 'bam'] = 'bam', duplicate_handling: Literal['mark', 'remove_optical', 'remove_all'] = 'remove_all', duplicate_scoring_strategy: Literal['reference_length', 'sum_of_base_qualities', 'random'] = 'sum_of_base_qualities', optical_duplicate_pixel_distance: int = 100)
+rnalysis.fastq.find_duplicates(input_folder: str | Path, output_folder: str | Path, picard_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto'] = 'auto', output_format: Literal['sam', 'bam'] = 'bam', duplicate_handling: Literal['mark', 'remove_optical', 'remove_all'] = 'remove_all', duplicate_scoring_strategy: Literal['reference_length', 'sum_of_base_qualities', 'random'] = 'sum_of_base_qualities', optical_duplicate_pixel_distance: int = 100)
Find duplicate reads in SAM/BAM files using Picard MarkDuplicates.
- Parameters:
@@ -740,7 +740,7 @@ rnalysis.fastq module
input_folder (str or Path) – Path to the folder containing the SAM/BAM files you want to check for duplicates.
output_folder (str or Path) – Path to a folder in which the processed SAM/BAM files will be saved.
picard_installation_folder (str, Path, or 'auto' (default='auto')) – Path to the installation folder of Picard. For example: ‘C:/Program Files/Picard’
-new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each converted sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the files in the directory.
+new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each converted sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the files in the directory.
output_format ('sam' or 'bam' (default='bam')) – Format of the output file.
duplicate_handling ('mark', 'remove_optical', or 'remove_all' (default='remove_all')) – How to handle detected duplicate reads. If ‘mark’, duplicate reads will be marked with a 1024 flag. If ‘remove_optical’, ‘optical’ duplicates and other duplicates that appear to have arisen from the sequencing process instead of the library preparation process will be removed. If ‘remove_all’, all duplicate reads will be removed.
duplicate_scoring_strategy ('reference_length', 'sum_of_base_qualities', or 'random' (default='sum_of_base_qualities')) – How to score duplicate reads. If ‘reference_length’, the length of the reference sequence will be used. If ‘sum_of_base_qualities’, the sum of the base qualities will be used.
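With duplicate_handling='mark', duplicates stay in the output and are identified by the 1024 bit of the SAM FLAG field. The sketch below shows how that bit can be tested on a raw SAM record. The two alignment records are made up for illustration, and `is_marked_duplicate` is a hypothetical helper rather than an RNAlysis function.

```python
DUPLICATE_FLAG = 0x400  # 1024: the "PCR or optical duplicate" bit of the SAM FLAG field


def is_marked_duplicate(sam_line: str) -> bool:
    """Return True if an alignment record carries the duplicate flag."""
    flag = int(sam_line.split("\t")[1])  # FLAG is the second SAM column
    return bool(flag & DUPLICATE_FLAG)


# Two made-up records: flag 1123 = 1024 + 99, while flag 99 has no duplicate bit.
records = [
    "read1\t1123\tchrI\t100\t42\t50M\t=\t300\t250\t*\t*",
    "read2\t99\tchrI\t100\t42\t50M\t=\t300\t250\t*\t*",
]
duplicate_status = [is_marked_duplicate(record) for record in records]
```

The FLAG field is a bit mask, so the check uses a bitwise AND instead of comparing the whole value to 1024. A duplicate read usually has other bits set as well.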
@@ -752,7 +752,7 @@ rnalysis.fastq module
-
-rnalysis.fastq.kallisto_create_index(transcriptome_fasta: Union[str, Path], kallisto_installation_folder: Union[str, Path, Literal['auto']] = 'auto', kmer_length: PositiveInt = 31, make_unique: bool = False)
+rnalysis.fastq.kallisto_create_index(transcriptome_fasta: str | Path, kallisto_installation_folder: str | Path | Literal['auto'] = 'auto', kmer_length: PositiveInt = 31, make_unique: bool = False)
Builds a kallisto index from a FASTA-formatted file of target sequences (transcriptome). The index file will be saved in the same folder as your FASTA file, with the .idx suffix. Be aware that there are pre-built kallisto indices for popular model organisms. These can be downloaded from the kallisto transcriptome indices site.
- Parameters:
@@ -768,21 +768,21 @@ rnalysis.fastq module
-
-rnalysis.fastq.kallisto_quantify_paired_end(r1_files: List[str], r2_files: List[str], output_folder: Union[str, Path], index_file: Union[str, Path], gtf_file: Union[str, Path], kallisto_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto', 'smart']] = 'smart', stranded: Literal['no', 'forward', 'reverse'] = 'no', summation_method: Literal['scaled_tpm', 'raw'] = 'scaled_tpm', bootstrap_samples: Optional[PositiveInt] = None, **legacy_args) CountFilter
+rnalysis.fastq.kallisto_quantify_paired_end(r1_files: List[str], r2_files: List[str], output_folder: str | Path, index_file: str | Path, gtf_file: str | Path, kallisto_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto', 'smart'] = 'smart', stranded: Literal['no', 'forward', 'reverse'] = 'no', summation_method: Literal['scaled_tpm', 'raw'] = 'scaled_tpm', bootstrap_samples: PositiveInt | None = None, **legacy_args) CountFilter
Quantify transcript abundance in paired-end mRNA sequencing data using kallisto. The FASTQ file pairs will be individually quantified and saved in the output folder, each in its own sub-folder. Alongside these files, three .csv files will be saved: a per-transcript count estimate table, a per-transcript TPM estimate table, and a per-gene scaled output table. The per-gene scaled output table is generated using the scaledTPM method (scaling the TPM estimates up to the library size) as described by Soneson et al 2015 and used in the tximport R package. This table format is considered un-normalized for library size, and can therefore be used directly by count-based statistical inference tools such as DESeq2.
RNAlysis will return this table once the analysis is finished.
- Parameters:
-summation_method ('scaled_tpm' or 'raw' (default='scaled_tpm')) –
-r1_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#1 files. The files should be sorted in tandem with r2_files, so that they line up to form pairs of R1 and R2 files.
-r2_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#2 files. The files should be sorted in tandem with r1_files, so that they line up to form pairs of R1 and R2 files.
+summation_method ('scaled_tpm' or 'raw' (default='scaled_tpm'))
+r1_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#1 files. The files should be sorted in tandem with r2_files, so that they line up to form pairs of R1 and R2 files.
+r2_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#2 files. The files should be sorted in tandem with r1_files, so that they line up to form pairs of R1 and R2 files.
output_folder (str/Path to an existing folder) – Path to a folder in which the quantified results, as well as the log files, will be saved. The individual output of each pair of FASTQ files will reside in a different sub-folder within the output folder, and a summarized results table will be saved in the output folder itself.
index_file (str or Path) –
Path to a pre-built kallisto index of the target transcriptome. Can either be downloaded from the kallisto transcriptome indices site, or generated manually from a FASTA file using the function kallisto_create_index.
gtf_file (str or Path) – Path to a GTF annotation file. This file will be used to map per-transcript abundances to per-gene estimated counts. The transcript names in the GTF files should match the ones in the index file - we recommend downloading cDNA FASTA/index files and GTF files from the same data source.
kallisto_installation_folder (str, Path, or 'auto' (default='auto')) – Path to the installation folder of kallisto. For example: ‘C:/Program Files/kallisto’. if installation folder is set to ‘auto’, RNAlysis will attempt to find it automatically.
-new_sample_names (list of str, 'auto', or 'smart' (default='smart')) – Give a new name to each quantified sample (optional). If new_sample_names=’auto’, sample names will be given automatically. If a list is given, it should contain the new names, with the order of the names matching the order of the file pairs.
+new_sample_names (list of str, 'auto', or 'smart' (default='smart')) – Give a new name to each quantified sample (optional). If new_sample_names=’auto’, sample names will be given automatically. If a list is given, it should contain the new names, with the order of the names matching the order of the file pairs.
stranded ('no', 'forward', 'reverse' (default='no')) – Indicates the strandedness of the data. ‘no’ indicates the data is not stranded. ‘forward’ indicates the data is stranded, where the first read in the pair pseudoaligns to the forward strand of a transcript. ‘reverse’ indicates the data is stranded, where the first read in the pair pseudoaligns to the reverse strand of a transcript.
summation_method – Determines the method used to sum the transcript-level abundances to gene-level abundances. ‘scaled_tpm’ sums the transcript TPM estimates to the gene level, and then scales them to the library size. ‘raw’ sums the transcript estimated counts to the gene level without scaling.
learn_bias (bool (default=False)) – if True, kallisto learns parameters for a model of sequence-specific bias and corrects the abundances accordingly. Note that this feature is not supported by kallisto versions beyond 0.48.0.
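As a rough sketch of how the paired-end parameters above fit together, the snippet below sorts Read#1 and Read#2 files in tandem and collects keyword arguments named after the signature of kallisto_quantify_paired_end. The `_R1`/`_R2` file-name convention, the example paths, and the helper names `pair_fastq_files`, `build_quantify_kwargs` and `run_paired_quantification` are assumptions made for illustration only. The actual call is wrapped in a function so the snippet can be read and run without RNAlysis or kallisto installed.

```python
from pathlib import Path


def pair_fastq_files(fastq_paths):
    """Split FASTQ paths into R1 and R2 lists that are sorted in tandem.

    Assumes (hypothetically) that mates are named '<sample>_R1...' and '<sample>_R2...'.
    """
    r1_files = sorted(str(p) for p in fastq_paths if "_R1" in Path(p).name)
    r2_files = sorted(str(p) for p in fastq_paths if "_R2" in Path(p).name)
    if len(r1_files) != len(r2_files):
        raise ValueError("every Read#1 file needs a matching Read#2 file")
    # Verify that the i-th R1 file really lines up with the i-th R2 file.
    for r1, r2 in zip(r1_files, r2_files):
        if Path(r1).name.replace("_R1", "_R2") != Path(r2).name:
            raise ValueError(f"files do not form a pair: {r1}, {r2}")
    return r1_files, r2_files


def build_quantify_kwargs(r1_files, r2_files):
    """Collect keyword arguments using the parameter names from the signature."""
    return dict(
        r1_files=r1_files,
        r2_files=r2_files,
        output_folder="kallisto_output",   # hypothetical path to an existing folder
        index_file="transcriptome.idx",    # hypothetical pre-built kallisto index
        gtf_file="annotation.gtf",         # hypothetical GTF from the same data source
        stranded="no",
        summation_method="scaled_tpm",
    )


def run_paired_quantification(kwargs):
    """Run the quantification; requires RNAlysis and kallisto to be installed."""
    from rnalysis import fastq
    return fastq.kallisto_quantify_paired_end(**kwargs)


r1, r2 = pair_fastq_files(["b_R2.fastq", "a_R1.fastq", "b_R1.fastq", "a_R2.fastq"])
kwargs = build_quantify_kwargs(r1, r2)
```

Checking the pairing before the call is a cheap safeguard, since a misaligned pair of lists would otherwise only show up as poor quantification results.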
@@ -795,13 +795,13 @@ rnalysis.fastq module
-
-rnalysis.fastq.kallisto_quantify_single_end(fastq_folder: Union[str, Path], output_folder: Union[str, Path], index_file: Union[str, Path], gtf_file: Union[str, Path], average_fragment_length: float, stdev_fragment_length: float, kallisto_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto']] = 'auto', stranded: Literal['no', 'forward', 'reverse'] = 'no', summation_method: Literal['scaled_tpm', 'raw'] = 'scaled_tpm', bootstrap_samples: Optional[PositiveInt] = None, **legacy_args) CountFilter
+rnalysis.fastq.kallisto_quantify_single_end(fastq_folder: str | Path, output_folder: str | Path, index_file: str | Path, gtf_file: str | Path, average_fragment_length: float, stdev_fragment_length: float, kallisto_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto'] = 'auto', stranded: Literal['no', 'forward', 'reverse'] = 'no', summation_method: Literal['scaled_tpm', 'raw'] = 'scaled_tpm', bootstrap_samples: PositiveInt | None = None, **legacy_args) CountFilter
Quantify transcript abundance in single-end mRNA sequencing data using kallisto. The FASTQ files will be individually quantified and saved in the output folder, each in its own sub-folder. Alongside these files, three .csv files will be saved: a per-transcript count estimate table, a per-transcript TPM estimate table, and a per-gene scaled output table. The per-gene scaled output table is generated using the scaledTPM method (scaling the TPM estimates up to the library size) as described by Soneson et al 2015 and used in the tximport R package. This table format is considered un-normalized for library size, and can therefore be used directly by count-based statistical inference tools such as DESeq2.
RNAlysis will return this table once the analysis is finished.
- Parameters:
-summation_method ('scaled_tpm' or 'raw' (default='scaled_tpm')) –
+summation_method ('scaled_tpm' or 'raw' (default='scaled_tpm'))
fastq_folder (str or Path) – Path to the folder containing the FASTQ files you want to quantify
output_folder (str/Path to an existing folder) – Path to a folder in which the quantified results, as well as the log files, will be saved. The individual output of each FASTQ file will reside in a different sub-folder within the output folder, and a summarized results table will be saved in the output folder itself.
index_file (str or Path) –
Path to a pre-built kallisto index of the target transcriptome. Can either be downloaded from the kallisto transcriptome indices site, or generated manually from a FASTA file using the function kallisto_create_index.
@@ -810,7 +810,7 @@ rnalysis.fastq module
average_fragment_length (float > 0) – Estimated average fragment length. Typical Illumina libraries produce fragment lengths ranging from 180–200bp, but it’s best to determine this from a library quantification with an instrument such as an Agilent Bioanalyzer.
stdev_fragment_length (float > 0) – Estimated standard deviation of fragment length. Typical Illumina libraries produce fragment lengths ranging from 180–200bp, but it’s best to determine this from a library quantification with an instrument such as an Agilent Bioanalyzer.
kallisto_installation_folder (str, Path, or 'auto' (default='auto')) – Path to the installation folder of kallisto. For example: ‘C:/Program Files/kallisto’. if installation folder is set to ‘auto’, RNAlysis will attempt to find it automatically.
-new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each quantified sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the FASTQ files.
+new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each quantified sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the FASTQ files.
stranded ('no', 'forward', 'reverse' (default='no')) – Indicates the strandedness of the data. ‘no’ indicates the data is not stranded. ‘forward’ indicates the data is stranded, where the first read in the pair pseudoaligns to the forward strand of a transcript. ‘reverse’ indicates the data is stranded, where the first read in the pair pseudoaligns to the reverse strand of a transcript.
summation_method – Determines the method used to sum the transcript-level abundances to gene-level abundances. ‘scaled_tpm’ sums the transcript TPM estimates to the gene level, and then scales them to the library size. ‘raw’ sums the transcript estimated counts to the gene level without scaling.
learn_bias (bool (default=False)) – if True, kallisto learns parameters for a model of sequence-specific bias and corrects the abundances accordingly. Note that this feature is not supported by kallisto versions beyond 0.48.0.
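The difference between the two summation_method options can be illustrated with a small, self-contained sketch. It follows the description above (sum TPM per gene, then rescale so the column total equals the library size, taken here as the sum of estimated counts). This is one reading of the scaledTPM idea and not necessarily how RNAlysis implements it; the function name `sum_to_gene_level` and the toy data are hypothetical.

```python
from collections import defaultdict


def sum_to_gene_level(transcripts, tx2gene, summation_method="scaled_tpm"):
    """Summarize per-transcript estimates to per-gene values.

    transcripts: dict mapping transcript ID -> (tpm, estimated_counts)
    tx2gene: dict mapping transcript ID -> gene ID (e.g. derived from a GTF file)
    """
    gene_tpm = defaultdict(float)
    gene_counts = defaultdict(float)
    for transcript_id, (tpm, est_counts) in transcripts.items():
        gene = tx2gene[transcript_id]
        gene_tpm[gene] += tpm
        gene_counts[gene] += est_counts

    if summation_method == "raw":
        # 'raw': summed estimated counts, no scaling
        return dict(gene_counts)
    if summation_method != "scaled_tpm":
        raise ValueError(f"unknown summation method: {summation_method!r}")

    # 'scaled_tpm': summed TPM, rescaled so that the total equals the library size
    total_tpm = sum(gene_tpm.values())
    library_size = sum(gene_counts.values())
    if total_tpm == 0:
        return {gene: 0.0 for gene in gene_tpm}
    return {gene: tpm / total_tpm * library_size for gene, tpm in gene_tpm.items()}


# Toy sample: two transcripts of geneA and one transcript of geneB
toy_transcripts = {"tx1": (200000.0, 100.0), "tx2": (300000.0, 300.0), "tx3": (500000.0, 600.0)}
toy_tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
scaled = sum_to_gene_level(toy_transcripts, toy_tx2gene, "scaled_tpm")
raw = sum_to_gene_level(toy_transcripts, toy_tx2gene, "raw")
```

In this toy sample both methods give a total of 1000, but the per-gene values differ because TPM accounts for transcript length while raw counts do not.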
@@ -823,7 +823,7 @@ rnalysis.fastq module
-
-rnalysis.fastq.sam_to_fastq_paired(input_folder: Union[str, Path], output_folder: Union[str, Path], picard_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto']] = 'auto', re_reverse_reads: bool = True, include_non_primary_alignments: bool = False, quality_trim: Optional[PositiveInt] = None, return_new_filenames: bool = False)
+rnalysis.fastq.sam_to_fastq_paired(input_folder: str | Path, output_folder: str | Path, picard_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto'] = 'auto', re_reverse_reads: bool = True, include_non_primary_alignments: bool = False, quality_trim: PositiveInt | None = None, return_new_filenames: bool = False)
Convert SAM/BAM files to FASTQ files using Picard SamToFastq.
@@ -848,7 +848,7 @@ rnalysis.fastq module
-
-rnalysis.fastq.sam_to_fastq_single(input_folder: Union[str, Path], output_folder: Union[str, Path], picard_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto']] = 'auto', re_reverse_reads: bool = True, include_non_primary_alignments: bool = False, quality_trim: Optional[PositiveInt] = None)
+rnalysis.fastq.sam_to_fastq_single(input_folder: str | Path, output_folder: str | Path, picard_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto'] = 'auto', re_reverse_reads: bool = True, include_non_primary_alignments: bool = False, quality_trim: PositiveInt | None = None)
Convert SAM/BAM files to FASTQ files using Picard SamToFastq.
- Parameters:
@@ -856,7 +856,7 @@ rnalysis.fastq module
input_folder (str or Path) – Path to the folder containing the SAM/BAM files you want to convert.
output_folder (str or Path) – Path to a folder in which the converted FASTQ files will be saved.
picard_installation_folder (str, Path, or 'auto' (default='auto')) – Path to the installation folder of Picard. For example: ‘C:/Program Files/Picard’
-new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each converted sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the files in the directory.
+new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each converted sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the files in the directory.
re_reverse_reads (bool (default=True)) – Re-reverse bases and qualities of reads with the negative-strand flag before writing them to FASTQ.
include_non_primary_alignments (bool (default=False)) – If true, include non-primary alignments in the output. Support of non-primary alignments in SamToFastq is not comprehensive, so there may be exceptions if this is set to true and there are paired reads with non-primary alignments.
quality_trim (positive int or None (default=None)) – If enabled, end-trim reads using the phred/bwa quality trimming algorithm and this quality.
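To make the effect of re_reverse_reads concrete: SAM/BAM files store reads that aligned to the negative strand in reverse-complemented form, so restoring the original read means reverse-complementing the bases and reversing the quality string. The sketch below illustrates that idea in plain Python; it is not Picard's implementation, and the function name `to_fastq_record` is hypothetical.

```python
# Translation table for complementing bases (N stays N)
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
REVERSE_STRAND_FLAG = 0x10  # SAM flag bit: read is mapped to the reverse strand


def to_fastq_record(name, flag, seq, qual, re_reverse_reads=True):
    """Format one SAM alignment (name, flag, sequence, qualities) as a FASTQ record."""
    if re_reverse_reads and flag & REVERSE_STRAND_FLAG:
        seq = seq.translate(_COMPLEMENT)[::-1]  # reverse-complement the bases
        qual = qual[::-1]                       # reverse the qualities to match
    return f"@{name}\n{seq}\n+\n{qual}\n"


reverse_record = to_fastq_record("read1", 16, "AACG", "IIH#")
forward_record = to_fastq_record("read2", 0, "AACG", "IIH#")
```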
@@ -873,7 +873,7 @@ rnalysis.fastq module
-
-rnalysis.fastq.shortstack_align_smallrna(fastq_folder: Union[str, Path], output_folder: Union[str, Path], genome_fasta: Union[str, Path], shortstack_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto']] = 'auto', known_rnas: Optional[Union[str, Path]] = None, trim_adapter: Optional[Union[str, Literal['autotrim']]] = None, autotrim_key: str = 'TCGGACCAGGCTTCATTCCCC', multimap_mode: Literal['fractional', 'unique', 'random'] = 'fractional', align_only: bool = False, show_secondary_alignments: bool = False, dicer_min_length: PositiveInt = 21, dicer_max_length: PositiveInt = 24, loci_file: Optional[Union[str, Path]] = None, locus: Optional[str] = None, search_microrna: Union[None, Literal['de-novo', 'known-rnas']] = 'known-rnas', strand_cutoff: Fraction = 0.8, min_coverage: float = 2, pad: PositiveInt = 75, threads: PositiveInt = 1)
+rnalysis.fastq.shortstack_align_smallrna(fastq_folder: str | Path, output_folder: str | Path, genome_fasta: str | Path, shortstack_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto'] = 'auto', known_rnas: str | Path | None = None, trim_adapter: str | Literal['autotrim'] | None = None, autotrim_key: str = 'TCGGACCAGGCTTCATTCCCC', multimap_mode: Literal['fractional', 'unique', 'random'] = 'fractional', align_only: bool = False, show_secondary_alignments: bool = False, dicer_min_length: PositiveInt = 21, dicer_max_length: PositiveInt = 24, loci_file: str | Path | None = None, locus: str | None = None, search_microrna: None | Literal['de-novo', 'known-rnas'] = 'known-rnas', strand_cutoff: Fraction = 0.8, min_coverage: float = 2, pad: PositiveInt = 75, threads: PositiveInt = 1)
Align small RNA single-end reads from FASTQ files to a reference sequence using the ShortStack aligner (version 4). ShortStack is currently not supported on computers running Windows.
- Parameters:
@@ -882,7 +882,7 @@ rnalysis.fastq module
output_folder (str/Path to an existing folder) – Path to a folder in which the aligned reads, as well as the log files, will be saved.
genome_fasta (str or Path) – Path to the FASTA file which contains the reference sequences to be aligned to.
shortstack_installation_folder (str, Path, or 'auto' (default='auto')) – Path to the installation folder of ShortStack. For example: ‘/home/myuser/anaconda3/envs/myenv/bin’. if installation folder is set to ‘auto’, RNAlysis will attempt to find it automatically.
-new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each aligned sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the FASTQ files.
+new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each aligned sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the FASTQ files.
known_rnas (str, Path, or None (default=None)) – Path to FASTA-formatted file of known small RNAs. FASTA must be formatted such that a single RNA sequence is on one line only. ATCGUatcgu characters are acceptable. These RNAs are typically the sequences of known microRNAs. For instance, a FASTA file of mature miRNAs pulled from https://www.mirbase.org. Providing these data increases the accuracy of MIRNA locus identification.
trim_adapter (str, 'autotrim', or None (default=None)) – Determines whether ShortStack will attempt to trim the supplied reads. If trim_adapter is not provided (default), no trimming will be run. If trim_adapter is set to ‘autotrim’, ShortStack will automatically infer the 3’ adapter sequence of the untrimmed reads, and then use that to coordinate read trimming. If trim_adapter is a DNA sequence, ShortStack will trim the reads using the given DNA sequence as the 3’ adapter.
autotrim_key (str (default="TCGGACCAGGCTTCATTCCCC" (miR166))) – A DNA sequence to use as a known suffix during the autotrim procedure. This parameter is used only if trim_adapter is set to ‘autotrim’. ShortStack’s autotrim discovers the 3’ adapter by scanning for reads that begin with the sequence given by autotrim_key. This should be the sequence of a small RNA that is known to be highly abundant in all the libraries. The default sequence is for miR166, a microRNA that is present in nearly all plants at high levels. For non-plant experiments, or if the default is not working well, consider providing an alternative to the default.
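The idea behind autotrim_key can be illustrated with a toy example: reads that begin with a known, abundant small RNA should be followed directly by the 3’ adapter, so the most common sequence after the key is a plausible adapter candidate. The sketch below only illustrates this idea and is not ShortStack's actual algorithm; the function name `infer_adapter`, the `adapter_length` parameter and the example reads are all hypothetical.

```python
from collections import Counter


def infer_adapter(reads, autotrim_key, adapter_length=12):
    """Guess the 3' adapter as the most common sequence following the key."""
    candidates = Counter()
    for read in reads:
        if read.startswith(autotrim_key):
            # Whatever follows the known small RNA is assumed to be adapter sequence
            tail = read[len(autotrim_key):len(autotrim_key) + adapter_length]
            if len(tail) == adapter_length:
                candidates[tail] += 1
    if not candidates:
        return None  # the key was not found; a different key may be needed
    return candidates.most_common(1)[0][0]


key = "TCGGACCAGGCTTCATTCCCC"  # the default key (miR166) from the signature
toy_reads = [
    key + "TGGAATTCTCGG",
    key + "TGGAATTCTCGG",
    key + "AAAAAAAAAAAA",
    "GGGGACCAGGCTTCATTCCCCTGGAATTCTCGG",  # does not start with the key, ignored
]
adapter = infer_adapter(toy_reads, key)
```

If the key is rare or absent in a library, no candidate is found, which matches the advice above to supply an alternative key for non-plant experiments.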
@@ -905,7 +905,7 @@ rnalysis.fastq module
-
-rnalysis.fastq.sort_sam(input_folder: Union[str, Path], output_folder: Union[str, Path], picard_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto']] = 'auto', sort_order: Literal['coordinate', 'queryname', 'duplicate'] = 'coordinate')
+rnalysis.fastq.sort_sam(input_folder: str | Path, output_folder: str | Path, picard_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto'] = 'auto', sort_order: Literal['coordinate', 'queryname', 'duplicate'] = 'coordinate')
Sort SAM/BAM files using Picard SortSam.
@@ -922,20 +922,20 @@ rnalysis.fastq module
-
-rnalysis.fastq.trim_adapters_paired_end(r1_files: List[Union[str, Path]], r2_files: List[Union[str, Path]], output_folder: Union[str, Path], three_prime_adapters_r1: Union[None, str, List[str]], three_prime_adapters_r2: Union[None, str, List[str]], five_prime_adapters_r1: Union[None, str, List[str]] = None, five_prime_adapters_r2: Union[None, str, List[str]] = None, any_position_adapters_r1: Union[None, str, List[str]] = None, any_position_adapters_r2: Union[None, str, List[str]] = None, new_sample_names: Union[List[str], Literal['auto']] = 'auto', quality_trimming: Optional[NonNegativeInt] = 20, trim_n: bool = True, minimum_read_length: NonNegativeInt = 10, maximum_read_length: Optional[PositiveInt] = None, discard_untrimmed_reads: bool = True, pair_filter_if: Literal['both', 'any', 'first'] = 'both', error_tolerance: Fraction = 0.1, minimum_overlap: NonNegativeInt = 3, allow_indels: bool = True, parallel: bool = True, gzip_output: bool = False, return_new_filenames: bool = False)
+rnalysis.fastq.trim_adapters_paired_end(r1_files: List[str | Path], r2_files: List[str | Path], output_folder: str | Path, three_prime_adapters_r1: None | str | List[str], three_prime_adapters_r2: None | str | List[str], five_prime_adapters_r1: None | str | List[str] = None, five_prime_adapters_r2: None | str | List[str] = None, any_position_adapters_r1: None | str | List[str] = None, any_position_adapters_r2: None | str | List[str] = None, new_sample_names: List[str] | Literal['auto'] = 'auto', quality_trimming: NonNegativeInt | None = 20, trim_n: bool = True, minimum_read_length: NonNegativeInt = 10, maximum_read_length: PositiveInt | None = None, discard_untrimmed_reads: bool = True, pair_filter_if: Literal['both', 'any', 'first'] = 'both', error_tolerance: Fraction = 0.1, minimum_overlap: NonNegativeInt = 3, allow_indels: bool = True, parallel: bool = True, gzip_output: bool = False, return_new_filenames: bool = False)
Trim adapters from paired-end reads using CutAdapt.
- Parameters:
-r1_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#1 files. The files should be sorted in tandem with r2_files, so that they line up to form pairs of R1 and R2 files.
-r2_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#2 files. The files should be sorted in tandem with r1_files, so that they line up to form pairs of R1 and R2 files.
+r1_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#1 files. The files should be sorted in tandem with r2_files, so that they line up to form pairs of R1 and R2 files.
+r2_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#2 files. The files should be sorted in tandem with r1_files, so that they line up to form pairs of R1 and R2 files.
output_folder (str/Path to an existing folder) – Path to a folder in which the trimmed FASTQ files, as well as the log files, will be saved.
-three_prime_adapters_r1 (str, list of str, or None) – the sequence of the adapter/adapters to trim from the 3’ end of the reads in Read#1 files.
-three_prime_adapters_r2 (str, list of str, or None) – the sequence of the adapter/adapters to trim from the 3’ end of the reads in Read#2 files.
-five_prime_adapters_r1 (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from the 5’ end of the reads in Read#1 files.
-five_prime_adapters_r2 (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from the 5’ end of the reads in Read#2 files.
-any_position_adapters_r1 (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from either end (or the middle) of the reads in Read#1 files.
-any_position_adapters_r2 (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from either end (or the middle) of the reads in Read#2 files.
+three_prime_adapters_r1 (str, list of str, or None) – the sequence of the adapter/adapters to trim from the 3’ end of the reads in Read#1 files.
+three_prime_adapters_r2 (str, list of str, or None) – the sequence of the adapter/adapters to trim from the 3’ end of the reads in Read#2 files.
+five_prime_adapters_r1 (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from the 5’ end of the reads in Read#1 files.
+five_prime_adapters_r2 (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from the 5’ end of the reads in Read#2 files.
+any_position_adapters_r1 (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from either end (or the middle) of the reads in Read#1 files.
+any_position_adapters_r2 (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from either end (or the middle) of the reads in Read#2 files.
quality_trimming (int or None (default=20)) – if specified, trim low-quality 3’ end from the reads. Any bases with quality score below the specified value will be trimmed from the 3’ end of the read.
trim_n (bool (default=True)) – if True, remove flanking N bases from each read. For example, a read with the sequence ‘NNACGTACGTNNNN’ will be trimmed down to ‘ACGTACGT’. This occurs after adapter trimming.
minimum_read_length (int or None (default=10)) – if specified (default), discard processed reads that are shorter than minimum_read_length.
@@ -947,7 +947,7 @@ rnalysis.fastq module
allow_indels (bool (default=True)) – if False, insertions and deletions in the adapter sequence are not allowed - only mismatches.
parallel (bool (default=True)) – if True, runs CutAdapt on all available cores in parallel. Otherwise, run CutAdapt on a single processor only.
gzip_output (bool (default=False)) – if True, gzips the output FASTQ files.
-new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each trimmed sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the file pairs.
+new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each trimmed sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the file pairs.
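The trim_n and minimum_read_length behaviours described above can be sketched in a few lines of plain Python. This is only an illustration of the documented behaviour, not CutAdapt's implementation; the helper names `trim_flanking_n` and `passes_length_filter` are hypothetical.

```python
def trim_flanking_n(seq, qual):
    """Remove N bases from both ends of a read, trimming the qualities in parallel."""
    start = len(seq) - len(seq.lstrip("N"))  # number of leading N bases
    end = len(seq.rstrip("N"))               # index just past the last non-N base
    return seq[start:end], qual[start:end]


def passes_length_filter(seq, minimum_read_length=10):
    """Return True if the processed read is long enough to keep."""
    return len(seq) >= minimum_read_length


# The example from the trim_n description
trimmed_seq, trimmed_qual = trim_flanking_n("NNACGTACGTNNNN", "ABCDEFGHIJKLMN")
kept = passes_length_filter(trimmed_seq, minimum_read_length=10)
```

Note that with the default minimum_read_length of 10, the 8-base read left over in this example would be discarded.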
@@ -955,16 +955,16 @@ rnalysis.fastq module
-
-rnalysis.fastq.trim_adapters_single_end(fastq_folder: Union[str, Path], output_folder: Union[str, Path], three_prime_adapters: Union[None, str, List[str]], five_prime_adapters: Union[None, str, List[str]] = None, any_position_adapters: Union[None, str, List[str]] = None, new_sample_names: Union[List[str], Literal['auto']] = 'auto', quality_trimming: Optional[NonNegativeInt] = 20, trim_n: bool = True, minimum_read_length: NonNegativeInt = 10, maximum_read_length: Optional[PositiveInt] = None, discard_untrimmed_reads: bool = True, error_tolerance: Fraction = 0.1, minimum_overlap: NonNegativeInt = 3, allow_indels: bool = True, parallel: bool = True, gzip_output: bool = False)
+rnalysis.fastq.trim_adapters_single_end(fastq_folder: str | Path, output_folder: str | Path, three_prime_adapters: None | str | List[str], five_prime_adapters: None | str | List[str] = None, any_position_adapters: None | str | List[str] = None, new_sample_names: List[str] | Literal['auto'] = 'auto', quality_trimming: NonNegativeInt | None = 20, trim_n: bool = True, minimum_read_length: NonNegativeInt = 10, maximum_read_length: PositiveInt | None = None, discard_untrimmed_reads: bool = True, error_tolerance: Fraction = 0.1, minimum_overlap: NonNegativeInt = 3, allow_indels: bool = True, parallel: bool = True, gzip_output: bool = False)
Trim adapters from single-end reads using CutAdapt.
- Parameters:
fastq_folder (str/Path to an existing folder) – Path to the folder containing your untrimmed FASTQ files
output_folder (str/Path to an existing folder) – Path to a folder in which the trimmed FASTQ files, as well as the log files, will be saved.
-three_prime_adapters (str, list of str, or None) – the sequence of the adapter/adapters to trim from the 3’ end of the reads.
-five_prime_adapters (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from the 5’ end of the reads.
-any_position_adapters (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from either end (or from the middle) of the reads.
+three_prime_adapters (str, list of str, or None) – the sequence of the adapter/adapters to trim from the 3’ end of the reads.
+five_prime_adapters (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from the 5’ end of the reads.
+any_position_adapters (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from either end (or from the middle) of the reads.
quality_trimming (int or None (default=20)) – if specified, trim low-quality 3’ end from the reads. Any bases with quality score below the specified value will be trimmed from the 3’ end of the read.
trim_n (bool (default=True)) – if True, remove flanking N bases from each read. For example, a read with the sequence ‘NNACGTACGTNNNN’ will be trimmed down to ‘ACGTACGT’. This occurs after adapter trimming.
minimum_read_length (int or None (default=10)) – if specified (default), discard processed reads that are shorter than minimum_read_length.
@@ -975,7 +975,7 @@ rnalysis.fastq module
allow_indels (bool (default=True)) – if False, insertions and deletions in the adapter sequence are not allowed - only mismatches.
parallel (bool (default=True)) – if True, runs CutAdapt on all available cores in parallel. Otherwise, run CutAdapt on a single processor only.
gzip_output (bool (default=False)) – if True, gzips the output FASTQ files.
-new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each trimmed sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the alphabetical order of the input files.
+new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each trimmed sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the alphabetical order of the input files.
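The quality_trimming description above is a simplification. CutAdapt's documentation describes a BWA-style procedure: the cutoff is subtracted from every quality value, partial sums are computed from the 3’ end, and the read is cut where that sum is minimal. The sketch below follows that description for the 3’ end only; the function name `quality_trim_index` is hypothetical and the code is not taken from CutAdapt.

```python
def quality_trim_index(qualities, cutoff):
    """Return the number of bases to keep after 3' quality trimming (BWA-style)."""
    running_sum = 0
    minimum_sum = 0
    keep = len(qualities)
    # Walk from the 3' end towards the 5' end
    for index in range(len(qualities) - 1, -1, -1):
        running_sum += qualities[index] - cutoff
        if running_sum > 0:
            break  # the sum turned positive, stop scanning
        if running_sum < minimum_sum:
            minimum_sum = running_sum
            keep = index  # best cut position found so far
    return keep


read = "ACGTACGTAC"
read_qualities = [42, 40, 26, 27, 8, 7, 11, 4, 2, 3]
keep = quality_trim_index(read_qualities, cutoff=10)
trimmed_read = read[:keep]
```

One consequence of this approach is that an isolated base above the cutoff (quality 11 in the example) is still trimmed when it is surrounded by low-quality bases.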
@@ -983,7 +983,7 @@ rnalysis.fastq module
-
-rnalysis.fastq.validate_sam(input_folder: Union[str, Path], output_folder: Union[str, Path], picard_installation_folder: Union[str, Path, Literal['auto']] = 'auto', verbose: bool = True, is_bisulfite_sequenced: bool = False)
+rnalysis.fastq.validate_sam(input_folder: str | Path, output_folder: str | Path, picard_installation_folder: str | Path | Literal['auto'] = 'auto', verbose: bool = True, is_bisulfite_sequenced: bool = False)
Validate SAM/BAM files using Picard ValidateSamFile.
- Parameters:
@@ -991,7 +991,7 @@ rnalysis.fastq module
input_folder (str or Path) – Path to the folder containing the SAM/BAM files you want to validate.
output_folder (str or Path) – Path to a folder in which the validation reports will be saved.
picard_installation_folder (str, Path, or 'auto' (default='auto')) – Path to the installation folder of Picard. For example: ‘C:/Program Files/Picard’
-new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each converted sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the files in the directory.
+new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each converted sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the files in the directory.
verbose (bool (default=True)) – If True, the validation report will be verbose. If False, the validation report will be a summary.
is_bisulfite_sequenced (bool (default=False)) – Indicates whether the SAM/BAM file consists of bisulfite sequenced reads. If so, C->T is not counted as an error in computing the value of the NM tag.
@@ -1007,13 +1007,13 @@ rnalysis.fastq module
-rnalysis.filtering module
+rnalysis.filtering module
This module can filter, normalize, intersect and visualize tabular data such as read counts and differential expression data.
Any tabular data saved in a csv format can be imported. Use this module to perform various filtering operations on your data, normalize your data, perform set operations (union, intersection, etc), run basic exploratory analyses and plots (such as PCA, clustergram, violin plots, scatter, etc), save the filtered data to your computer, and more.
When you save filtered/modified data, its new file name will include by default all of the operations performed on it, in the order they were performed, to allow easy traceback of your analyses.
-
-class rnalysis.filtering.CountFilter(fname: Union[str, Path, tuple], drop_columns: Union[str, List[str]] = None, is_normalized: bool = False, suppress_warnings: bool = False)
+class rnalysis.filtering.CountFilter(fname: str | Path | tuple, drop_columns: str | List[str] = None, is_normalized: bool = False, suppress_warnings: bool = False)
Bases: Filter
A class that receives a count matrix and can filter it according to various characteristics.
Attributes
@@ -1035,7 +1035,7 @@ rnalysis.filtering module
-
-_avg_subsamples(sample_grouping: GroupedColumns, function: Literal['mean', 'median', 'geometric_mean'] = 'mean', new_column_names: Union[Literal['auto'], Literal['display'], List[str]] = 'display')
+_avg_subsamples(sample_grouping: GroupedColumns, function: Literal['mean', 'median', 'geometric_mean'] = 'mean', new_column_names: Literal['auto'] | Literal['display'] | List[str] = 'display')
Averages subsamples/replicates according to the specified sample list. Every member in the sample list should be either a name of a single sample (str), or a list of multiple sample names to be averaged (list).
- Parameters:
@@ -1104,7 +1104,7 @@ rnalysis.filtering module
-
-static _pca_plot(final_df: DataFrame, pc1_var: float, pc2_var: float, sample_grouping: GroupedColumns, labels: bool, title: str, title_fontsize: float, label_fontsize: float, tick_fontsize: float, proportional_axes: bool, plot_grid: bool, legend: Optional[List[str]]) Figure
+static _pca_plot(final_df: DataFrame, pc1_var: float, pc2_var: float, sample_grouping: GroupedColumns, labels: bool, title: str, title_fontsize: float, label_fontsize: float, tick_fontsize: float, proportional_axes: bool, plot_grid: bool, legend: List[str] | None) Figure
Internal method, used to plot the results from CountFilter.pca().
- Parameters:
@@ -1145,13 +1145,13 @@ rnalysis.filtering module
-
-_sort(by: Union[str, List[str]], ascending: Union[bool, List[bool]] = True, na_position: str = 'last')
+_sort(by: str | List[str], ascending: bool | List[bool] = True, na_position: str = 'last')
Sort the rows by the values of specified column or columns.
- Parameters:
-by (str or list of str) – Names of the column or columns to sort by.
-ascending (bool or list of bool, default True) – Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, it must have the same length as ‘by’.
+by (str or list of str) – Names of the column or columns to sort by.
+ascending (bool or list of bool, default True) – Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, it must have the same length as ‘by’.
na_position ('first' or 'last', default 'last') – If ‘first’, puts NaNs at the beginning; if ‘last’, puts NaNs at the end.
inplace (bool, default True) – If True, perform operation in-place. Otherwise, returns a sorted copy of the Filter object without modifying the original.
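The interaction between `by`, `ascending` and `na_position` can be sketched in plain Python. This is a minimal stand-in that works on a list of dict rows, not the RNAlysis implementation; the helper name `sort_rows` and the example rows are assumptions made for illustration.

```python
import math


def sort_rows(rows, by, ascending=True, na_position="last"):
    """Sort a list of dict rows by one or more keys.

    Illustrative stand-in for the documented behaviour, not RNAlysis code.
    """
    keys = [by] if isinstance(by, str) else list(by)
    orders = [ascending] * len(keys) if isinstance(ascending, bool) else list(ascending)
    if len(orders) != len(keys):
        raise ValueError("'ascending' must have the same length as 'by'")
    if na_position not in ("first", "last"):
        raise ValueError("na_position must be 'first' or 'last'")

    def is_nan(value):
        return isinstance(value, float) and math.isnan(value)

    result = list(rows)
    # Python's sort is stable, so sorting by the last key first and the
    # first key last yields a correct multi-key ordering.
    for key, asc in reversed(list(zip(keys, orders))):
        present = [row for row in result if not is_nan(row[key])]
        missing = [row for row in result if is_nan(row[key])]
        present.sort(key=lambda row: row[key], reverse=not asc)
        result = missing + present if na_position == "first" else present + missing
    return result


rows = [
    {"gene": "a", "fc": 2.0, "p": 0.05},
    {"gene": "b", "fc": float("nan"), "p": 0.01},
    {"gene": "c", "fc": 2.0, "p": 0.01},
    {"gene": "d", "fc": -1.0, "p": 0.2},
]
# Sort by fold change (descending), breaking ties by p-value (ascending).
ordered = sort_rows(rows, by=["fc", "p"], ascending=[False, True])
print([row["gene"] for row in ordered])
```

Rows whose sort key is NaN are set aside and re-attached at the requested end, which is why `na_position` is independent of the ascending/descending choice.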
@@ -1164,13 +1164,13 @@ rnalysis.filtering module
-
-average_replicate_samples(sample_grouping: GroupedColumns, new_column_names: Union[Literal['auto'], List[str]] = 'auto', function: Literal['mean', 'median', 'geometric_mean'] = 'mean', inplace: bool = True) CountFilter
+average_replicate_samples(sample_grouping: GroupedColumns, new_column_names: Literal['auto'] | List[str] = 'auto', function: Literal['mean', 'median', 'geometric_mean'] = 'mean', inplace: bool = True) CountFilter
Average the expression values of each group of replicate samples. Each group of samples (e.g. biological/technical replicates) is averaged separately.
- Parameters:
-sample_grouping (nested list of column names) – grouping of the samples into conditions. Each grouping should contain all replicates of the same condition. Each condition will be averaged separately.
-new_column_names (list of str or 'auto' (default='auto')) – names to be given to the columns in the new count matrix. Each new name should match a group of samples to be averaged. If `new_column_names`=’auto’, names will be generated automatically.
+sample_grouping (nested list of column names) – grouping of the samples into conditions. Each grouping should contain all replicates of the same condition. Each condition will be averaged separately.
+new_column_names (list of str or 'auto' (default='auto')) – names to be given to the columns in the new count matrix. Each new name should match a group of samples to be averaged. If `new_column_names`=’auto’, names will be generated automatically.
function ('mean', 'median', or 'geometric_mean' (default='mean')) – the function which will be used to average the values within each group.
inplace (bool (default=True)) – If True (default), averaging will be applied to the current CountFilter object. If False, the function will return a new CountFilter instance and the current instance will not be affected.
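To make the grouping semantics concrete, here is a small sketch that averages one row of counts per group of replicates using the three documented `function` options. It relies only on the standard library; the helper `average_replicates`, the sample names, and the way new column names are joined are assumptions for illustration, not RNAlysis behaviour.

```python
from statistics import geometric_mean, mean, median

# The three averaging options named in the `function` parameter.
AVERAGERS = {"mean": mean, "median": median, "geometric_mean": geometric_mean}


def average_replicates(row, sample_grouping, function="mean"):
    """Collapse one row of counts into one value per group of replicates."""
    averager = AVERAGERS[function]
    averaged = {}
    for group in sample_grouping:
        # A bare string is a single sample; a list is a group of replicates.
        members = [group] if isinstance(group, str) else list(group)
        name = ";".join(members)  # hypothetical naming, not RNAlysis's
        averaged[name] = averager([row[sample] for sample in members])
    return averaged


row = {"cond1_rep1": 10.0, "cond1_rep2": 20.0, "cond2_rep1": 4.0, "cond2_rep2": 16.0}
grouping = [["cond1_rep1", "cond1_rep2"], ["cond2_rep1", "cond2_rep2"]]
print(average_replicates(row, grouping, "mean"))
print(average_replicates(row, grouping, "geometric_mean"))
```

The geometric mean is pulled less by a single large replicate than the arithmetic mean, which is one reason to prefer it for skewed count data.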
@@ -1180,7 +1180,7 @@ rnalysis.filtering module
-
-biotypes_from_gtf(gtf_path: Union[str, Path], attribute_name: Union[Literal['biotype', 'gene_biotype', 'transcript_biotype', 'gene_type', 'transcript_type'], str] = 'gene_biotype', feature_type: Literal['gene', 'transcript'] = 'gene', long_format: bool = False) DataFrame
+biotypes_from_gtf(gtf_path: str | Path, attribute_name: Literal['biotype', 'gene_biotype', 'transcript_biotype', 'gene_type', 'transcript_type'] | str = 'gene_biotype', feature_type: Literal['gene', 'transcript'] = 'gene', long_format: bool = False) DataFrame
Returns a DataFrame describing the biotypes in the table and their count. The data about feature biotypes is drawn from a GTF (Gene transfer format) file supplied by the user.
- Parameters:
@@ -1199,7 +1199,7 @@ rnalysis.filtering module
-
-biotypes_from_ref_table(long_format: bool = False, ref: Union[str, Path, Literal['predefined']] = 'predefined') DataFrame
+biotypes_from_ref_table(long_format: bool = False, ref: str | Path | Literal['predefined'] = 'predefined') DataFrame
Returns a DataFrame describing the biotypes in the table and their count. The data about feature biotypes is drawn from a Biotype Reference Table supplied by the user.
@@ -1239,7 +1239,7 @@ rnalysis.filtering module
-
-box_plot(samples: Union[GroupedColumns, Literal['all']] = 'all', notch: bool = True, scatter: bool = False, ylabel: str = 'log10(Normalized reads + 1)', title: Union[str, Literal['auto']] = 'auto', title_fontsize: float = 20, label_fontsize: float = 16, tick_fontsize: float = 12) Figure
+box_plot(samples: GroupedColumns | Literal['all'] = 'all', notch: bool = True, scatter: bool = False, ylabel: str = 'log10(Normalized reads + 1)', title: str | Literal['auto'] = 'auto', title_fontsize: float = 20, label_fontsize: float = 16, tick_fontsize: float = 12) Figure
Generates a box plot of the specified samples in the CountFilter object in log10 scale. Can plot both single samples and average multiple replicates. It is recommended to use this function on normalized values and not on absolute read values. The box indicates the 25th and 75th percentiles, and the white dot indicates the median.
- Parameters:
@@ -1251,7 +1251,7 @@ rnalysis.filtering moduleReturn type:
@@ -1261,14 +1261,14 @@ rnalysis.filtering module
-
+
-
-clustergram(sample_names: Union[ColumnNames, Literal['all']] = 'all', metric: Union[Literal['Correlation', 'Cosine', 'Euclidean', 'Jaccard'], str] = 'Euclidean', linkage: Literal['Single', 'Average', 'Complete', 'Ward', 'Weighted', 'Centroid', 'Median'] = 'Average', title: Union[str, Literal['auto']] = 'auto', title_fontsize: float = 20, tick_fontsize: float = 12, colormap: ColorMap = 'inferno', colormap_label: Union[Literal['auto'], str] = 'auto', cluster_columns: bool = True, log_transform: bool = True, z_score_rows: bool = False) Figure
+clustergram(sample_names: ColumnNames | Literal['all'] = 'all', metric: Literal['Correlation', 'Cosine', 'Euclidean', 'Jaccard'] | str = 'Euclidean', linkage: Literal['Single', 'Average', 'Complete', 'Ward', 'Weighted', 'Centroid', 'Median'] = 'Average', title: str | Literal['auto'] = 'auto', title_fontsize: float = 20, tick_fontsize: float = 12, colormap: ColorMap = 'inferno', colormap_label: Literal['auto'] | str = 'auto', cluster_columns: bool = True, log_transform: bool = True, z_score_rows: bool = False) Figure
Performs hierarchical clustering and plots a clustergram on the base-2 log of a given set of samples.
@@ -1314,11 +1314,11 @@ rnalysis.filtering module
-
-describe(percentiles: Union[float, List[float]] = (0.01, 0.25, 0.5, 0.75, 0.99)) DataFrame
+describe(percentiles: float | List[float] = (0.01, 0.25, 0.5, 0.75, 0.99)) DataFrame
Generate descriptive statistics that summarize the central tendency, dispersion and shape of the dataset’s distribution, excluding NaN values. For more information see the documentation of pandas.DataFrame.describe.
- Parameters:
-percentiles (list-like of floats (default=(0.01, 0.25, 0.5, 0.75, 0.99))) – The percentiles to include in the output. All should fall between 0 and 1. The default is (0.01, 0.25, 0.5, 0.75, 0.99), which returns the 1st, 25th, 50th, 75th, and 99th percentiles.
+percentiles (list-like of floats (default=(0.01, 0.25, 0.5, 0.75, 0.99))) – The percentiles to include in the output. All should fall between 0 and 1. The default is (0.01, 0.25, 0.5, 0.75, 0.99), which returns the 1st, 25th, 50th, 75th, and 99th percentiles.
- Returns:
Summary statistics of the dataset.
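The meaning of the `percentiles` argument can be sketched with a small standard-library function. This assumes linear interpolation between neighbouring values and drops NaN before computing, in line with the description above; the helper `percentile` and the example counts are illustrative and are not the code that `describe` runs.

```python
def percentile(values, q):
    """Return the q-th quantile (0 <= q <= 1) by linear interpolation, ignoring NaN."""
    if not 0 <= q <= 1:
        raise ValueError("percentiles must fall between 0 and 1")
    data = sorted(v for v in values if v == v)  # NaN != NaN, so this drops NaN
    if not data:
        raise ValueError("no non-NaN values to summarize")
    position = q * (len(data) - 1)
    lower = int(position)
    upper = min(lower + 1, len(data) - 1)
    fraction = position - lower
    return data[lower] + (data[upper] - data[lower]) * fraction


counts = [0, 2, 4, 6, 8, float("nan")]
# Same percentile set as the documented default.
summary = {q: percentile(counts, q) for q in (0.01, 0.25, 0.5, 0.75, 0.99)}
print(summary)
```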
@@ -1374,7 +1374,7 @@ rnalysis.filtering module
-
-difference(*others: Union[Filter, set], return_type: Literal['set', 'str'] = 'set', inplace: bool = False)
+difference(*others: Filter | set, return_type: Literal['set', 'str'] = 'set', inplace: bool = False)
Keep only the features that exist in the first Filter object/set but NOT in the others. Can be done inplace on the first Filter object, or return a set/string of features.
- Parameters:
@@ -1412,16 +1412,16 @@ rnalysis.filtering module
-
-differential_expression_deseq2(design_matrix: Union[str, Path], comparisons: Iterable[Tuple[str, str, str]], covariates: Iterable[str] = (), lrt_factors: Iterable[str] = (), model_factors: Union[Literal['auto'], Iterable[str]] = 'auto', r_installation_folder: Union[str, Path, Literal['auto']] = 'auto', output_folder: Optional[Union[str, Path]] = None, return_design_matrix: bool = False, scaling_factors: Optional[Union[str, Path]] = None, cooks_cutoff: bool = True, return_code: bool = False) Tuple[DESeqFilter, ...]
+differential_expression_deseq2(design_matrix: str | Path, comparisons: Iterable[Tuple[str, str, str]], covariates: Iterable[str] = (), lrt_factors: Iterable[str] = (), model_factors: Literal['auto'] | Iterable[str] = 'auto', r_installation_folder: str | Path | Literal['auto'] = 'auto', output_folder: str | Path | None = None, return_design_matrix: bool = False, scaling_factors: str | Path | None = None, cooks_cutoff: bool = True, return_code: bool = False) Tuple[DESeqFilter, ...]
Run differential expression analysis on the count matrix using the DESeq2 algorithm. The count matrix you are analyzing should be unnormalized (meaning, raw read counts). The analysis will be based on a design matrix supplied by the user. The design matrix should contain at least two columns: the first column contains all the sample names, and each of the following columns contains an experimental design factor (e.g. ‘condition’, ‘replicate’, etc). (see the User Guide and Tutorial for a complete example). The analysis formula will contain all the factors in the design matrix. To run this function, a version of R must be installed.
- Parameters:
design_matrix (str or Path) – path to a csv file containing the experiment’s design matrix. The design matrix should contain at least two columns: the first column contains all the sample names, and each of the following columns contains an experimental design factor (e.g. ‘condition’, ‘replicate’, etc). (see the User Guide and Tutorial for a complete example). The analysis formula will contain all the factors in the design matrix.
-comparisons (Iterable of tuple(factor, numerator_value, denominator_value)) – specifies what comparisons to build results tables out of. Each individual comparison should be a tuple with exactly three elements: the name of a factor in the design formula, the name of the numerator level for the fold change, and the name of the denominator level for the fold change.
-lrt_factors (Iterable of factor names (default=tuple())) – optionally, specify factors to be tested using the likelihood ratio test (LRT). If a factor is a continuous variable, you can also specify the polynomial degree to fit.
-covariates (Iterable of covariate names (default=tuple())) – optionally, specify a list of continuous covariates to include in the analysis. The covariates should be column names in the design matrix. The reported fold change values correspond to the expected fold change for every increase of 1 unit in the covariate.
-model_factors (Iterable of factor names or 'auto' (default='auto')) – optionally, specify a list of factors to include in the differential expression model. If ‘auto’, all factors in the design matrix will be included.
+comparisons (Iterable of tuple(factor, numerator_value, denominator_value)) – specifies what comparisons to build results tables out of. Each individual comparison should be a tuple with exactly three elements: the name of a factor in the design formula, the name of the numerator level for the fold change, and the name of the denominator level for the fold change.
+lrt_factors (Iterable of factor names (default=tuple())) – optionally, specify factors to be tested using the likelihood ratio test (LRT). If a factor is a continuous variable, you can also specify the polynomial degree to fit.
+covariates (Iterable of covariate names (default=tuple())) – optionally, specify a list of continuous covariates to include in the analysis. The covariates should be column names in the design matrix. The reported fold change values correspond to the expected fold change for every increase of 1 unit in the covariate.
+model_factors (Iterable of factor names or 'auto' (default='auto')) – optionally, specify a list of factors to include in the differential expression model. If ‘auto’, all factors in the design matrix will be included.
r_installation_folder (str, Path, or 'auto' (default='auto')) – Path to the installation folder of R. For example: ‘C:/Program Files/R/R-4.2.1’
output_folder (str, Path, or None) – Path to a folder in which the analysis results, as well as the log files and R script used to generate them, will be saved. if output_folder is None, the results will not be saved to a specified directory.
return_design_matrix (bool (default=False)) – if True, the function will return the sanitized design matrix used in the analysis.
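The relationship between the design matrix and the `comparisons` tuples can be sketched as follows. The example builds a tiny design matrix in memory and checks that every `(factor, numerator, denominator)` tuple refers to a factor and levels that exist in it. The helpers `read_design_levels` and `check_comparisons`, as well as the sample and level names, are hypothetical; RNAlysis may validate its inputs differently.

```python
import csv
import io

# First column: sample names. Remaining columns: design factors.
DESIGN_CSV = """sample,condition,replicate
ctrl_1,control,1
ctrl_2,control,2
treat_1,treated,1
treat_2,treated,2
"""


def read_design_levels(handle):
    """Map each design factor to the set of levels found in the design matrix."""
    reader = csv.reader(handle)
    header = next(reader)
    factors = header[1:]  # the first column holds the sample names
    levels = {factor: set() for factor in factors}
    for record in reader:
        for factor, value in zip(factors, record[1:]):
            levels[factor].add(value)
    return levels


def check_comparisons(levels, comparisons):
    """Raise ValueError if a comparison names an unknown factor or level."""
    for comparison in comparisons:
        if len(comparison) != 3:
            raise ValueError(f"expected exactly three elements, got {comparison!r}")
        factor, numerator, denominator = comparison
        if factor not in levels:
            raise ValueError(f"unknown factor {factor!r}")
        for level in (numerator, denominator):
            if level not in levels[factor]:
                raise ValueError(f"unknown level {level!r} for factor {factor!r}")


levels = read_design_levels(io.StringIO(DESIGN_CSV))
comparisons = [("condition", "treated", "control")]
check_comparisons(levels, comparisons)  # passes silently
print(levels)
```

A list shaped like `comparisons` here is what the `comparisons` parameter in the signature above expects, alongside the path to the design matrix file.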
@@ -1436,13 +1436,13 @@ rnalysis.filtering module
-
-differential_expression_deseq2_simplified(design_matrix: Union[str, Path], comparisons: Iterable[Tuple[str, str, str]], r_installation_folder: Union[str, Path, Literal['auto']] = 'auto', output_folder: Optional[Union[str, Path]] = None, return_design_matrix: bool = False, return_code: bool = False) Tuple[DESeqFilter, ...]
+differential_expression_deseq2_simplified(design_matrix: str | Path, comparisons: Iterable[Tuple[str, str, str]], r_installation_folder: str | Path | Literal['auto'] = 'auto', output_folder: str | Path | None = None, return_design_matrix: bool = False, return_code: bool = False) Tuple[DESeqFilter, ...]
Run differential expression analysis on the count matrix using the DESeq2 algorithm. The simplified mode supports only pairwise comparisons. The count matrix you are analyzing should be unnormalized (meaning, raw read counts). The analysis will be based on a design matrix supplied by the user. The design matrix should contain at least two columns: the first column contains all the sample names, and each of the following columns contains an experimental design factor (e.g. ‘condition’, ‘replicate’, etc). (see the User Guide and Tutorial for a complete example). The analysis formula will contain all the factors in the design matrix. To run this function, a version of R must be installed.
- Parameters:
design_matrix (str or Path) – path to a csv file containing the experiment’s design matrix. The design matrix should contain at least two columns: the first column contains all the sample names, and each of the following columns contains an experimental design factor (e.g. ‘condition’, ‘replicate’, etc). (see the User Guide and Tutorial for a complete example). The analysis formula will contain all the factors in the design matrix.
-comparisons (Iterable of tuple(factor, numerator_value, denominator_value)) – specifies what comparisons to build results tables out of. each individual comparison should be a tuple with exactly three elements: the name of a factor in the design formula, the name of the numerator level for the fold change, and the name of the denominator level for the fold change.
+comparisons (Iterable of tuple(factor, numerator_value, denominator_value)) – specifies what comparisons to build results tables out of. each individual comparison should be a tuple with exactly three elements: the name of a factor in the design formula, the name of the numerator level for the fold change, and the name of the denominator level for the fold change.
r_installation_folder (str, Path, or 'auto' (default='auto')) – Path to the installation folder of R. For example: ‘C:/Program Files/R/R-4.2.1’
output_folder (str, Path, or None) – Path to a folder in which the analysis results, as well as the log files and R script used to generate them, will be saved. if output_folder is None, the results will not be saved to a specified directory.
return_design_matrix (bool (default=False)) – if True, the function will return the sanitized design matrix used in the analysis.
@@ -1457,7 +1457,7 @@ rnalysis.filtering module
-
-differential_expression_limma_voom(design_matrix: Union[str, Path], comparisons: Iterable[Tuple[str, str, str]], covariates: Iterable[str] = (), lrt_factors: Iterable[str] = (), model_factors: Union[Literal['auto'], Iterable[str]] = 'auto', r_installation_folder: Union[str, Path, Literal['auto']] = 'auto', output_folder: Optional[Union[str, Path]] = None, random_effect: Optional[str] = None, quality_weights: bool = False, return_design_matrix: bool = False, return_code: bool = False) Tuple[DESeqFilter, ...]
+differential_expression_limma_voom(design_matrix: str | Path, comparisons: Iterable[Tuple[str, str, str]], covariates: Iterable[str] = (), lrt_factors: Iterable[str] = (), model_factors: Literal['auto'] | Iterable[str] = 'auto', r_installation_folder: str | Path | Literal['auto'] = 'auto', output_folder: str | Path | None = None, random_effect: str | None = None, quality_weights: bool = False, return_design_matrix: bool = False, return_code: bool = False) Tuple[DESeqFilter, ...]
Run differential expression analysis on the count matrix using the Limma-Voom pipeline. The count matrix you are analyzing should be normalized (typically to Reads Per Million). The analysis will be based on a design matrix supplied by the user. The design matrix should contain at least two columns: the first column contains all the sample names, and each of the following columns contains an experimental design factor (e.g. ‘condition’, ‘replicate’, etc). (see the User Guide and Tutorial for a complete example). The analysis formula will contain all the factors in the design matrix. To run this function, a version of R must be installed.
@@ -1497,7 +1497,7 @@ rnalysis.filtering module
-
-differential_expression_limma_voom_simplified(design_matrix: Union[str, Path], comparisons: Iterable[Tuple[str, str, str]], r_installation_folder: Union[str, Path, Literal['auto']] = 'auto', output_folder: Optional[Union[str, Path]] = None, random_effect: Optional[str] = None, quality_weights: bool = False, return_design_matrix: bool = False, return_code: bool = False) Tuple[DESeqFilter, ...]
+differential_expression_limma_voom_simplified(design_matrix: str | Path, comparisons: Iterable[Tuple[str, str, str]], r_installation_folder: str | Path | Literal['auto'] = 'auto', output_folder: str | Path | None = None, random_effect: str | None = None, quality_weights: bool = False, return_design_matrix: bool = False, return_code: bool = False) Tuple[DESeqFilter, ...]
Run differential expression analysis on the count matrix using the Limma-Voom pipeline. The simplified mode supports only pairwise comparisons. The count matrix you are analyzing should be normalized (typically to Reads Per Million). The analysis will be based on a design matrix supplied by the user. The design matrix should contain at least two columns: the first column contains all the sample names, and each of the following columns contains an experimental design factor (e.g. ‘condition’, ‘replicate’, etc). (see the User Guide and Tutorial for a complete example). The analysis formula will contain all the factors in the design matrix. To run this function, a version of R must be installed.
@@ -1556,7 +1556,7 @@ rnalysis.filtering module
- Parameters:
-columns (str or list of str) – The names of the column/columns to be dropped from the table.
+columns (str or list of str) – The names of the column/columns to be dropped from the table.
inplace (bool (default=True)) – If True (default), filtering will be applied to the current Filter object. If False, the function will return a new Filter instance and the current instance will not be affected.
@@ -1568,7 +1568,7 @@ rnalysis.filtering module
-
-enhanced_box_plot(samples: Union[GroupedColumns, Literal['all']] = 'all', scatter: bool = False, ylabel: str = 'log10(Normalized reads + 1)', title: Union[str, Literal['auto']] = 'auto', title_fontsize: float = 20, label_fontsize: float = 16, tick_fontsize: float = 12) Figure
+enhanced_box_plot(samples: GroupedColumns | Literal['all'] = 'all', scatter: bool = False, ylabel: str = 'log10(Normalized reads + 1)', title: str | Literal['auto'] = 'auto', title_fontsize: float = 20, label_fontsize: float = 16, tick_fontsize: float = 12) Figure
Generates an enhanced box-plot of the specified samples in the CountFilter object in log10 scale. Can plot both single samples and average multiple replicates. It is recommended to use this function on normalized values and not on absolute read values. The box indicates the 25th and 75th percentiles, and the white dot indicates the median.
-
-filter_biotype_from_gtf(gtf_path: Union[str, Path], biotype: Union[Literal['protein_coding', 'pseudogene', 'lincRNA', 'miRNA', 'ncRNA', 'piRNA', 'rRNA', 'snoRNA', 'snRNA', 'tRNA'], str, List[str]] = 'protein_coding', attribute_name: Union[Literal['biotype', 'gene_biotype', 'transcript_biotype', 'gene_type', 'transcript_type'], str] = 'gene_biotype', feature_type: Literal['gene', 'transcript'] = 'gene', opposite: bool = False, inplace: bool = True)
+filter_biotype_from_gtf(gtf_path: str | Path, biotype: Literal['protein_coding', 'pseudogene', 'lincRNA', 'miRNA', 'ncRNA', 'piRNA', 'rRNA', 'snoRNA', 'snRNA', 'tRNA'] | str | List[str] = 'protein_coding', attribute_name: Literal['biotype', 'gene_biotype', 'transcript_biotype', 'gene_type', 'transcript_type'] | str = 'gene_biotype', feature_type: Literal['gene', 'transcript'] = 'gene', opposite: bool = False, inplace: bool = True)
Filters out all features that do not match the indicated biotype/biotypes (for example: ‘protein_coding’, ‘ncRNA’, etc). The data about feature biotypes is drawn from a GTF (Gene transfer format) file supplied by the user.
- Parameters:
gtf_path (str or Path) – Path to your GTF (Gene transfer format) file. The file should match the type of gene names/IDs you use in your table, and should contain an attribute describing biotype.
-biotype (str or list of strings) – the biotypes which will not be filtered out.
+biotype (str or list of strings) – the biotypes which will not be filtered out.
attribute_name (str (default='gene_biotype')) – name of the attribute in your GTF file that describes feature biotype.
feature_type ('gene' or 'transcript' (default='gene')) – determines whether the features/rows in your data table describe individual genes or transcripts.
opposite (bool) – If True, the output of the filtering will be the OPPOSITE of the specified (instead of filtering out X, the function will filter out anything BUT X). If False (default), the function will filter as expected.
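The parameters above can be illustrated with a standard-library sketch that reads the biotype attribute from GTF lines and then keeps or drops features accordingly. It assumes the usual nine tab-separated GTF columns with `key "value";` attribute pairs. The gene IDs, the helper names `biotypes_from_gtf_lines` and `filter_by_biotype`, and the regular expression are assumptions for illustration, not the RNAlysis parser.

```python
import re

# Hypothetical GTF records: 9 tab-separated columns, attributes in the last one.
GTF_LINES = [
    '#!genome-build example',
    'I\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "geneA"; gene_biotype "protein_coding";',
    'I\tsrc\ttranscript\t100\t900\t.\t+\t.\tgene_id "geneA"; transcript_id "txA1";',
    'I\tsrc\tgene\t1500\t1600\t.\t-\t.\tgene_id "geneB"; gene_biotype "ncRNA";',
    'II\tsrc\tgene\t50\t700\t.\t+\t.\tgene_id "geneC"; gene_biotype "protein_coding";',
]

ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def biotypes_from_gtf_lines(lines, attribute_name="gene_biotype", feature_type="gene"):
    """Map feature ID -> biotype for records of the requested feature type."""
    biotypes = {}
    for line in lines:
        if line.startswith("#"):
            continue  # comment / header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != feature_type:
            continue
        attributes = dict(ATTRIBUTE.findall(fields[8]))
        feature_id = attributes.get(f"{feature_type}_id")
        if feature_id is not None and attribute_name in attributes:
            biotypes[feature_id] = attributes[attribute_name]
    return biotypes


def filter_by_biotype(feature_ids, biotypes, biotype="protein_coding", opposite=False):
    """Keep features whose biotype is wanted; with opposite=True keep the rest."""
    wanted = {biotype} if isinstance(biotype, str) else set(biotype)
    return [f for f in feature_ids if (biotypes.get(f) in wanted) != opposite]


biotypes = biotypes_from_gtf_lines(GTF_LINES)
kept = filter_by_biotype(["geneA", "geneB", "geneC"], biotypes)
dropped = filter_by_biotype(["geneA", "geneB", "geneC"], biotypes, opposite=True)
print(kept, dropped)
```

With `opposite=True` the same membership test is simply inverted, so the two results partition the input list.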
@@ -1614,12 +1614,12 @@ rnalysis.filtering module
-
-filter_biotype_from_ref_table(biotype: Union[Literal['protein_coding', 'pseudogene', 'lincRNA', 'miRNA', 'ncRNA', 'piRNA', 'rRNA', 'snoRNA', 'snRNA', 'tRNA'], str, List[str]] = 'protein_coding', ref: Union[str, Path, Literal['predefined']] = 'predefined', opposite: bool = False, inplace: bool = True)
+filter_biotype_from_ref_table(biotype: Literal['protein_coding', 'pseudogene', 'lincRNA', 'miRNA', 'ncRNA', 'piRNA', 'rRNA', 'snoRNA', 'snRNA', 'tRNA'] | str | List[str] = 'protein_coding', ref: str | Path | Literal['predefined'] = 'predefined', opposite: bool = False, inplace: bool = True)
Filters out all features that do not match the indicated biotype/biotypes (for example: ‘protein_coding’, ‘ncRNA’, etc). The data about feature biotypes is drawn from a Biotype Reference Table supplied by the user.
- Parameters:
-biotype (string or list of strings) – the biotypes which will not be filtered out.
+biotype (string or list of strings) – the biotypes which will not be filtered out.
ref – Name of the biotype reference file used to determine biotypes. Default is the path defined by the user in the settings.yaml file.
opposite (bool) – If True, the output of the filtering will be the OPPOSITE of the specified (instead of filtering out X, the function will filter out anything BUT X). If False (default), the function will filter as expected.
inplace (bool (default=True)) – If True (default), filtering will be applied to the current Filter object. If False, the function will return a new Filter instance and the current instance will not be affected.
@@ -1645,12 +1645,12 @@ rnalysis.filtering module
-
-filter_by_attribute(attributes: Union[str, List[str]] = None, mode: Literal['union', 'intersection'] = 'union', ref: Union[str, Path, Literal['predefined']] = 'predefined', opposite: bool = False, inplace: bool = True)
+filter_by_attribute(attributes: str | List[str] = None, mode: Literal['union', 'intersection'] = 'union', ref: str | Path | Literal['predefined'] = 'predefined', opposite: bool = False, inplace: bool = True)
Filters features according to user-defined attributes from an Attribute Reference Table. When multiple attributes are given, filtering can be done in ‘union’ mode (where features that belong to at least one attribute are not filtered out), or in ‘intersection’ mode (where only features that belong to ALL attributes are not filtered out). To learn more about user-defined attributes and Attribute Reference Tables, read the user guide.
- Parameters:
-attributes (string or list of strings, which are column titles in the user-defined Attribute Reference Table.) – attributes to filter by.
+attributes (string or list of strings, which are column titles in the user-defined Attribute Reference Table.) – attributes to filter by.
mode ('union' or 'intersection'.) – If ‘union’, filters out every genomic feature that does not belong to one or more of the indicated attributes. If ‘intersection’, filters out every genomic feature that does not belong to ALL of the indicated attributes.
ref (str or pathlib.Path (default='predefined')) – filename/path of the attribute reference table to be used as reference.
opposite (bool (default=False)) – If True, the output of the filtering will be the OPPOSITE of the specified (instead of filtering out X, the function will filter out anything BUT X). If False (default), the function will filter as expected.
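The difference between the ‘union’ and ‘intersection’ modes, and the effect of `opposite`, can be sketched with plain sets. Here the Attribute Reference Table is modelled as a dict from attribute name to the set of features that carry it; that representation, the helper name `filter_features`, and the example attributes are assumptions for illustration only.

```python
def filter_features(features, attribute_table, attributes, mode="union", opposite=False):
    """Keep features belonging to any ('union') or all ('intersection') attributes."""
    names = [attributes] if isinstance(attributes, str) else list(attributes)
    if not names:
        raise ValueError("at least one attribute is required")
    member_sets = [set(attribute_table[name]) for name in names]
    if mode == "union":
        selected = set().union(*member_sets)
    elif mode == "intersection":
        selected = set.intersection(*member_sets)
    else:
        raise ValueError("mode must be 'union' or 'intersection'")
    # opposite=True inverts the membership test.
    return [f for f in features if (f in selected) != opposite]


# Hypothetical attribute table: attribute name -> features with that attribute.
table = {
    "germline": {"gene1", "gene2"},
    "kinase": {"gene2", "gene3"},
}
features = ["gene1", "gene2", "gene3", "gene4"]

union = filter_features(features, table, ["germline", "kinase"], mode="union")
both = filter_features(features, table, ["germline", "kinase"], mode="intersection")
neither = filter_features(features, table, ["germline", "kinase"], opposite=True)
print(union, both, neither)
```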
@@ -1701,12 +1701,12 @@ rnalysis.filtering module
-
-filter_by_go_annotations(go_ids: Union[str, List[str]], mode: Literal['union', 'intersection'] = 'union', organism: Union[str, int, Literal['auto'], Literal['Arabodopsis thaliana', 'Caenorhabditis elegans', 'Danio rerio', 'Drosophila melanogaster', 'Escherichia coli', 'Homo sapiens', 'Mus musculus', 'Saccharomyces cerevisiae', 'Schizosaccharomyces pombe']] = 'auto', gene_id_type: Union[str, Literal['auto'], Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB']] = 'auto', propagate_annotations: bool = True, evidence_types: Union[Literal['any', 'experimental', 'phylogenetic', 'computational', 'author', 'curator', 'electronic'], Iterable[Literal['experimental', 'phylogenetic', 'computational', 'author', 'curator', 'electronic']]] = 'any', excluded_evidence_types: Union[Literal['experimental', 'phylogenetic', 'computational', 'author', 'curator', 'electronic'], Iterable[Literal['experimental', 'phylogenetic', 'computational', 'author', 'curator', 'electronic']]] = (), databases: Union[str, Iterable[str]] = 'any', excluded_databases: 
Union[str, Iterable[str]] = (), qualifiers: Union[Literal['any', 'not', 'contributes_to', 'colocalizes_with'], Iterable[Literal['not', 'contributes_to', 'colocalizes_with']]] = 'any', excluded_qualifiers: Union[Literal['not', 'contributes_to', 'colocalizes_with'], Iterable[Literal['not', 'contributes_to', 'colocalizes_with']]] = 'not', opposite: bool = False, inplace: bool = True)
+filter_by_go_annotations(go_ids: str | List[str], mode: Literal['union', 'intersection'] = 'union', organism: str | int | Literal['auto'] | Literal['Arabodopsis thaliana', 'Caenorhabditis elegans', 'Danio rerio', 'Drosophila melanogaster', 'Escherichia coli', 'Homo sapiens', 'Mus musculus', 'Saccharomyces cerevisiae', 'Schizosaccharomyces pombe'] = 'auto', gene_id_type: str | Literal['auto'] | Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB'] = 'auto', propagate_annotations: bool = True, evidence_types: Literal['any', 'experimental', 'phylogenetic', 'computational', 'author', 'curator', 'electronic'] | Iterable[Literal['experimental', 'phylogenetic', 'computational', 'author', 'curator', 'electronic']] = 'any', excluded_evidence_types: Literal['experimental', 'phylogenetic', 'computational', 'author', 'curator', 'electronic'] | Iterable[Literal['experimental', 'phylogenetic', 'computational', 'author', 'curator', 'electronic']] = (), databases: str | Iterable[str] = 'any', excluded_databases: str | Iterable[str] = (), qualifiers: 
Literal['any', 'not', 'contributes_to', 'colocalizes_with'] | Iterable[Literal['not', 'contributes_to', 'colocalizes_with']] = 'any', excluded_qualifiers: Literal['not', 'contributes_to', 'colocalizes_with'] | Iterable[Literal['not', 'contributes_to', 'colocalizes_with']] = 'not', opposite: bool = False, inplace: bool = True)
Filters genes according to GO annotations, keeping only genes that are annotated with a specific GO term. When multiple GO terms are given, filtering can be done in ‘union’ mode (where genes that belong to at least one GO term are not filtered out), or in ‘intersection’ mode (where only genes that belong to ALL GO terms are not filtered out).
- Parameters:
-go_ids (str or list of str) –
+go_ids (str or list of str)
mode ('union' or 'intersection'.) – If ‘union’, filters out every genomic feature that does not belong to one or more of the indicated attributes. If ‘intersection’, filters out every genomic feature that does not belong to ALL of the indicated attributes.
@@ -1740,12 +1740,12 @@ rnalysis.filtering module
-
-filter_by_kegg_annotations(kegg_ids: Union[str, List[str]], mode: Literal['union', 'intersection'] = 'union', organism: Union[str, int, Literal['auto'], Literal['Arabodopsis thaliana', 'Caenorhabditis elegans', 'Danio rerio', 'Drosophila melanogaster', 'Escherichia coli', 'Homo sapiens', 'Mus musculus', 'Saccharomyces cerevisiae', 'Schizosaccharomyces pombe']] = 'auto', gene_id_type: Union[str, Literal['auto'], Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB']] = 'auto', opposite: bool = False, inplace: bool = True)
+filter_by_kegg_annotations(kegg_ids: str | List[str], mode: Literal['union', 'intersection'] = 'union', organism: str | int | Literal['auto'] | Literal['Arabodopsis thaliana', 'Caenorhabditis elegans', 'Danio rerio', 'Drosophila melanogaster', 'Escherichia coli', 'Homo sapiens', 'Mus musculus', 'Saccharomyces cerevisiae', 'Schizosaccharomyces pombe'] = 'auto', gene_id_type: str | Literal['auto'] | Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB'] = 'auto', opposite: bool = False, inplace: bool = True)
Filters genes according to KEGG pathways, keeping only genes that belong to a specific KEGG pathway. When multiple KEGG IDs are given, filtering can be done in ‘union’ mode (where genes that belong to at least one pathway are not filtered out), or in ‘intersection’ mode (where only genes that belong to ALL pathways are not filtered out).
- Parameters:
-kegg_ids (str or list of str) – the KEGG pathway IDs according to which the table will be filtered. An example for a legal KEGG pathway ID would be ‘path:cel04020’ for the C. elegans calcium signaling pathway.
+kegg_ids (str or list of str) – the KEGG pathway IDs according to which the table will be filtered. An example for a legal KEGG pathway ID would be ‘path:cel04020’ for the C. elegans calcium signaling pathway.
mode ('union' or 'intersection') – If ‘union’, filters out every genomic feature that does not belong to one or more of the indicated attributes. If ‘intersection’, filters out every genomic feature that does not belong to ALL of the indicated attributes.
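The difference between the two modes can be illustrated with plain Python sets. This is only a sketch of the semantics described above, not the RNAlysis implementation: the `keep_by_annotation` helper, the gene names, and the membership mapping are all hypothetical, and the second pathway ID is a placeholder.

```python
def keep_by_annotation(genes, annotation_members, ids, mode="union", opposite=False):
    """Return the genes that survive filtering (illustrative helper, not part of RNAlysis)."""
    if isinstance(ids, str):
        ids = [ids]
    if not ids:
        raise ValueError("at least one annotation ID is required")
    member_sets = [set(annotation_members[i]) for i in ids]
    if mode == "union":
        # keep genes that belong to at least one of the annotations
        annotated = set().union(*member_sets)
    elif mode == "intersection":
        # keep only genes that belong to ALL of the annotations
        annotated = set.intersection(*member_sets)
    else:
        raise ValueError("mode must be 'union' or 'intersection'")
    if opposite:
        # invert the result: keep everything the normal call would have removed
        return [g for g in genes if g not in annotated]
    return [g for g in genes if g in annotated]


# Hypothetical data: which genes belong to which pathway
pathway_members = {
    "path:cel04020": {"geneA", "geneB"},
    "path:cel00010": {"geneB", "geneC"},
}
genes = ["geneA", "geneB", "geneC", "geneD"]

kept_union = keep_by_annotation(genes, pathway_members, list(pathway_members), mode="union")
kept_inter = keep_by_annotation(genes, pathway_members, list(pathway_members), mode="intersection")
```

In this toy example, ‘union’ keeps geneA, geneB and geneC, while ‘intersection’ keeps only geneB.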
@@ -1763,12 +1763,12 @@ rnalysis.filtering module
-
-filter_by_row_name(row_names: Union[str, List[str]], opposite: bool = False, inplace: bool = True)
+filter_by_row_name(row_names: str | List[str], opposite: bool = False, inplace: bool = True)
Filter out specific rows from the table by their name (index).
- Parameters:
-row_names (str or list of str) – list of row names to be removed from the table.
+row_names (str or list of str) – list of row names to be removed from the table.
opposite (bool) – If True, the output of the filtering will be the OPPOSITE of the specified (instead of filtering out X, the function will filter out anything BUT X). If False (default), the function will filter as expected.
inplace (bool (default=True)) – If True (default), filtering will be applied to the current Filter object. If False, the function will return a new Filter instance and the current instance will not be affected.
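How `opposite` and `inplace` interact can be sketched with a small stand-in class. `MiniTable` below is hypothetical and only mimics the behaviour described in the parameter list; it does not use or reproduce the real Filter class.

```python
class MiniTable:
    """Hypothetical stand-in for a Filter object: maps row names to values."""

    def __init__(self, rows):
        self.rows = dict(rows)

    def filter_by_row_name(self, row_names, opposite=False, inplace=True):
        if isinstance(row_names, str):
            row_names = [row_names]
        names = set(row_names)
        if opposite:
            # opposite=True: remove everything BUT the named rows
            kept = {k: v for k, v in self.rows.items() if k in names}
        else:
            # default: remove the named rows
            kept = {k: v for k, v in self.rows.items() if k not in names}
        if inplace:
            # modify the current object and return nothing
            self.rows = kept
            return None
        # leave the current object untouched and return a new one
        return MiniTable(kept)


table = MiniTable({"gene1": 1, "gene2": 2, "gene3": 3})
without_gene1 = table.filter_by_row_name(["gene1"], inplace=False)  # table is unchanged
table.filter_by_row_name("gene2", opposite=True)  # table now holds only gene2
```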
@@ -1852,7 +1852,7 @@ rnalysis.filtering module
-
-filter_missing_values(columns: Union[ColumnNames, Literal['all']] = 'all', opposite: bool = False, inplace: bool = True)
+filter_missing_values(columns: ColumnNames | Literal['all'] = 'all', opposite: bool = False, inplace: bool = True)
Remove all rows whose values in the specified columns are missing (NaN).
@@ -1915,14 +1915,14 @@ rnalysis.filtering module
-
-filter_top_n(by: ColumnNames, n: PositiveInt = 100, ascending: Union[bool, List[bool]] = True, na_position: str = 'last', opposite: bool = False, inplace: bool = True)
+filter_top_n(by: ColumnNames, n: PositiveInt = 100, ascending: bool | List[bool] = True, na_position: str = 'last', opposite: bool = False, inplace: bool = True)
Sort the rows by the values of specified column or columns, then keep only the top ‘n’ rows.
- Parameters:
-by (name of column/columns (str/List[str])) – Names of the column or columns to sort and then filter by.
+by (name of column/columns (str/List[str])) – Names of the column or columns to sort and then filter by.
n (int) – How many features to keep in the Filter object.
-ascending (bool or list of bools (default=True)) – Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, it must have the same length as ‘by’.
+ascending (bool or list of bools (default=True)) – Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, it must have the same length as ‘by’.
na_position ('first' or 'last', default 'last') – If ‘first’, puts NaNs at the beginning; if ‘last’, puts NaNs at the end.
opposite (bool) – If True, the output of the filtering will be the OPPOSITE of the specified (instead of filtering out X, the function will filter out anything BUT X). If False (default), the function will filter as expected.
inplace (bool (default=True)) – If True (default), filtering will be applied to the current Filter object. If False, the function will return a new Filter instance and the current instance will not be affected.
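The sort-then-truncate behaviour, including a per-column `ascending` list and `na_position`, can be sketched in plain Python. The `top_n` helper and the sample rows are hypothetical, and the real method may treat ties and missing values differently.

```python
import math


def top_n(rows, by, n=100, ascending=True, na_position="last"):
    """Sort a list of dict rows by one or more keys, then keep the first n (illustrative only)."""
    keys = [by] if isinstance(by, str) else list(by)
    orders = [ascending] * len(keys) if isinstance(ascending, bool) else list(ascending)
    if len(orders) != len(keys):
        raise ValueError("'ascending' must have the same length as 'by'")
    ordered = list(rows)
    # Stable sorts applied from the least significant key to the most significant one
    for key, asc in reversed(list(zip(keys, orders))):
        def sort_key(row, key=key, asc=asc):
            value = row[key]
            missing = isinstance(value, float) and math.isnan(value)
            # The flag places NaNs first or last; it is inverted for descending
            # sorts because reverse=True flips the flag ordering as well.
            flag = missing if na_position == "last" else not missing
            if not asc:
                flag = not flag
            return (flag, 0 if missing else value)
        ordered.sort(key=sort_key, reverse=not asc)
    return ordered[:n]


rows = [
    {"gene": "a", "fc": 2.0, "p": 0.1},
    {"gene": "b", "fc": float("nan"), "p": 0.01},
    {"gene": "c", "fc": 5.0, "p": 0.2},
    {"gene": "d", "fc": 5.0, "p": 0.05},
]
# Highest 'fc' first; ties broken by the lowest 'p'
best = top_n(rows, by=["fc", "p"], n=3, ascending=[False, True])
```

Here `best` contains genes d, c and a, in that order; gene b is sorted last because its `fc` value is missing.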
@@ -1952,7 +1952,7 @@ rnalysis.filtering module
-
-find_paralogs_ensembl(organism: Union[Literal['auto'], str, int, Literal['Acanthochromis polyacanthus', 'Accipiter nisus', 'Ailuropoda melanoleuca', 'Amazona collaria', 'Amphilophus citrinellus', 'Amphiprion ocellaris', 'Amphiprion percula', 'Anabas testudineus', 'Anas platyrhynchos', 'Anas platyrhynchos platyrhynchos', 'Anas zonorhyncha', 'Anolis carolinensis', 'Anser brachyrhynchus', 'Anser cygnoides', 'Aotus nancymaae', 'Apteryx haastii', 'Apteryx owenii', 'Apteryx rowi', 'Aquila chrysaetos chrysaetos', 'Astatotilapia calliptera', 'Astyanax mexicanus', 'Astyanax mexicanus pachon', 'Athene cunicularia', 'Balaenoptera musculus', 'Betta splendens', 'Bison bison bison', 'Bos grunniens', 'Bos indicus hybrid', 'Bos mutus', 'Bos taurus', 'Bos taurus hybrid', 'Bubo bubo', 'Buteo japonicus', 'Caenorhabditis elegans', 'Cairina moschata domestica', 'Calidris pugnax', 'Calidris pygmaea', 'Callithrix jacchus', 'Callorhinchus milii', 'Camarhynchus parvulus', 'Camelus dromedarius', 'Canis lupus dingo', 'Canis lupus familiaris', 'Canis lupus familiarisbasenji', 'Canis lupus familiarisboxer', 'Canis lupus familiarisgreatdane', 'Canis lupus familiarisgsd', 'Capra hircus', 'Capra hircus blackbengal', 'Carassius auratus', 'Carlito syrichta', 'Castor canadensis', 'Catagonus wagneri', 'Catharus ustulatus', 'Cavia aperea', 'Cavia porcellus', 'Cebus imitator', 'Cercocebus atys', 'Cervus hanglu yarkandensis', 'Chelonoidis abingdonii', 'Chelydra serpentina', 'Chinchilla lanigera', 'Chlorocebus sabaeus', 'Choloepus hoffmanni', 'Chrysemys picta bellii', 'Chrysolophus pictus', 'Ciona intestinalis', 'Ciona savignyi', 'Clupea harengus', 'Colobus angolensis palliatus', 'Corvus moneduloides', 'Cottoperca gobio', 'Coturnix japonica', 'Cricetulus griseus chok1gshd', 'Cricetulus griseus crigri', 'Cricetulus griseus picr', 'Crocodylus porosus', 'Cyanistes caeruleus', 'Cyclopterus lumpus', 'Cynoglossus semilaevis', 'Cyprinodon variegatus', 'Cyprinus carpio carpio', 'Cyprinus carpio germanmirror', 
'Cyprinus carpio hebaored', 'Cyprinus carpio huanghe', 'Danio rerio', 'Dasypus novemcinctus', 'Delphinapterus leucas', 'Denticeps clupeoides', 'Dicentrarchus labrax', 'Dipodomys ordii', 'Dromaius novaehollandiae', 'Drosophila melanogaster', 'Echeneis naucrates', 'Echinops telfairi', 'Electrophorus electricus', 'Eptatretus burgeri', 'Equus asinus', 'Equus caballus', 'Erinaceus europaeus', 'Erpetoichthys calabaricus', 'Erythrura gouldiae', 'Esox lucius', 'Falco tinnunculus', 'Felis catus', 'Ficedula albicollis', 'Fukomys damarensis', 'Fundulus heteroclitus', 'Gadus morhua', 'Gadus morhua gca010882105v1', 'Gallus gallus', 'Gallus gallus gca000002315v5', 'Gallus gallus gca016700215v2', 'Gambusia affinis', 'Gasterosteus aculeatus', 'Gasterosteus aculeatus gca006229185v1', 'Gasterosteus aculeatus gca006232265v1', 'Gasterosteus aculeatus gca006232285v1', 'Geospiza fortis', 'Gopherus agassizii', 'Gopherus evgoodei', 'Gorilla gorilla', 'Gouania willdenowi', 'Haplochromis burtoni', 'Heterocephalus glaber female', 'Heterocephalus glaber male', 'Hippocampus comes', 'Homo sapiens', 'Hucho hucho', 'Ictalurus punctatus', 'Ictidomys tridecemlineatus', 'Jaculus jaculus', 'Junco hyemalis', 'Kryptolebias marmoratus', 'Labrus bergylta', 'Larimichthys crocea', 'Lates calcarifer', 'Laticauda laticaudata', 'Latimeria chalumnae', 'Lepidothrix coronata', 'Lepisosteus oculatus', 'Leptobrachium leishanense', 'Lonchura striata domestica', 'Loxodonta africana', 'Lynx canadensis', 'Macaca fascicularis', 'Macaca mulatta', 'Macaca nemestrina', 'Malurus cyaneus samueli', 'Manacus vitellinus', 'Mandrillus leucophaeus', 'Marmota marmota marmota', 'Mastacembelus armatus', 'Maylandia zebra', 'Meleagris gallopavo', 'Melopsittacus undulatus', 'Meriones unguiculatus', 'Mesocricetus auratus', 'Microcebus murinus', 'Microtus ochrogaster', 'Mola mola', 'Monodelphis domestica', 'Monodon monoceros', 'Monopterus albus', 'Moschus moschiferus', 'Mus caroli', 'Mus musculus', 'Mus musculus 129s1svimj', 'Mus 
musculus aj', 'Mus musculus akrj', 'Mus musculus balbcj', 'Mus musculus c3hhej', 'Mus musculus c57bl6nj', 'Mus musculus casteij', 'Mus musculus cbaj', 'Mus musculus dba2j', 'Mus musculus fvbnj', 'Mus musculus lpj', 'Mus musculus nodshiltj', 'Mus musculus nzohlltj', 'Mus musculus pwkphj', 'Mus musculus wsbeij', 'Mus pahari', 'Mus spicilegus', 'Mus spretus', 'Mustela putorius furo', 'Myotis lucifugus', 'Myripristis murdjan', 'Naja naja', 'Nannospalax galili', 'Neogobius melanostomus', 'Neolamprologus brichardi', 'Neovison vison', 'Nomascus leucogenys', 'Notamacropus eugenii', 'Notechis scutatus', 'Nothobranchius furzeri', 'Nothoprocta perdicaria', 'Numida meleagris', 'Ochotona princeps', 'Octodon degus', 'Oncorhynchus kisutch', 'Oncorhynchus mykiss', 'Oncorhynchus tshawytscha', 'Oreochromis aureus', 'Oreochromis niloticus', 'Ornithorhynchus anatinus', 'Oryctolagus cuniculus', 'Oryzias javanicus', 'Oryzias latipes', 'Oryzias latipes hni', 'Oryzias latipes hsok', 'Oryzias melastigma', 'Oryzias sinensis', 'Otolemur garnettii', 'Otus sunia', 'Ovis aries', 'Ovis aries rambouillet', 'Pan paniscus', 'Pan troglodytes', 'Panthera leo', 'Panthera pardus', 'Panthera tigris altaica', 'Papio anubis', 'Parambassis ranga', 'Paramormyrops kingsleyae', 'Parus major', 'Pavo cristatus', 'Pelodiscus sinensis', 'Pelusios castaneus', 'Periophthalmus magnuspinnatus', 'Peromyscus maniculatus bairdii', 'Petromyzon marinus', 'Phascolarctos cinereus', 'Phasianus colchicus', 'Phocoena sinus', 'Physeter catodon', 'Piliocolobus tephrosceles', 'Podarcis muralis', 'Poecilia formosa', 'Poecilia latipinna', 'Poecilia mexicana', 'Poecilia reticulata', 'Pogona vitticeps', 'Pongo abelii', 'Procavia capensis', 'Prolemur simus', 'Propithecus coquereli', 'Pseudonaja textilis', 'Pteropus vampyrus', 'Pundamilia nyererei', 'Pygocentrus nattereri', 'Rattus norvegicus', 'Rattus norvegicus shrspbbbutx', 'Rattus norvegicus shrutx', 'Rattus norvegicus wkybbb', 'Rhinolophus ferrumequinum', 'Rhinopithecus bieti', 
'Rhinopithecus roxellana', 'Saccharomyces cerevisiae', 'Saimiri boliviensis boliviensis', 'Salarias fasciatus', 'Salmo salar', 'Salmo salar gca021399835v1', 'Salmo salar gca923944775v1', 'Salmo salar gca931346935v2', 'Salmo trutta', 'Salvator merianae', 'Sander lucioperca', 'Sarcophilus harrisii', 'Sciurus vulgaris', 'Scleropages formosus', 'Scophthalmus maximus', 'Serinus canaria', 'Seriola dumerili', 'Seriola lalandi dorsalis', 'Sinocyclocheilus anshuiensis', 'Sinocyclocheilus grahami', 'Sinocyclocheilus rhinocerous', 'Sorex araneus', 'Sparus aurata', 'Spermophilus dauricus', 'Sphaeramia orbicularis', 'Sphenodon punctatus', 'Stachyris ruficeps', 'Stegastes partitus', 'Strigops habroptila', 'Strix occidentalis caurina', 'Struthio camelus australis', 'Suricata suricatta', 'Sus scrofa', 'Sus scrofa bamei', 'Sus scrofa berkshire', 'Sus scrofa hampshire', 'Sus scrofa jinhua', 'Sus scrofa landrace', 'Sus scrofa largewhite', 'Sus scrofa meishan', 'Sus scrofa pietrain', 'Sus scrofa rongchang', 'Sus scrofa tibetan', 'Sus scrofa usmarc', 'Sus scrofa wuzhishan', 'Taeniopygia guttata', 'Takifugu rubripes', 'Terrapene carolina triunguis', 'Tetraodon nigroviridis', 'Theropithecus gelada', 'Tupaia belangeri', 'Tursiops truncatus', 'Urocitellus parryii', 'Ursus americanus', 'Ursus maritimus', 'Ursus thibetanus thibetanus', 'Varanus komodoensis', 'Vicugna pacos', 'Vombatus ursinus', 'Vulpes vulpes', 'Xenopus tropicalis', 'Xiphophorus couchianus', 'Xiphophorus maculatus', 'Zalophus californianus', 'Zonotrichia albicollis', 'Zosterops lateralis melanops']] = 'auto', gene_id_type: Union[str, Literal['auto'], Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 
'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB']] = 'auto', filter_percent_identity: bool = True)
+find_paralogs_ensembl(organism: Literal['auto'] | str | int | Literal['Acanthochromis polyacanthus', 'Accipiter nisus', 'Ailuropoda melanoleuca', 'Amazona collaria', 'Amphilophus citrinellus', 'Amphiprion ocellaris', 'Amphiprion percula', 'Anabas testudineus', 'Anas platyrhynchos', 'Anas platyrhynchos platyrhynchos', 'Anas zonorhyncha', 'Anolis carolinensis', 'Anser brachyrhynchus', 'Anser cygnoides', 'Aotus nancymaae', 'Apteryx haastii', 'Apteryx owenii', 'Apteryx rowi', 'Aquila chrysaetos chrysaetos', 'Astatotilapia calliptera', 'Astyanax mexicanus', 'Astyanax mexicanus pachon', 'Athene cunicularia', 'Balaenoptera musculus', 'Betta splendens', 'Bison bison bison', 'Bos grunniens', 'Bos indicus hybrid', 'Bos mutus', 'Bos taurus', 'Bos taurus hybrid', 'Bubo bubo', 'Buteo japonicus', 'Caenorhabditis elegans', 'Cairina moschata domestica', 'Calidris pugnax', 'Calidris pygmaea', 'Callithrix jacchus', 'Callorhinchus milii', 'Camarhynchus parvulus', 'Camelus dromedarius', 'Canis lupus dingo', 'Canis lupus familiaris', 'Canis lupus familiarisbasenji', 'Canis lupus familiarisboxer', 'Canis lupus familiarisgreatdane', 'Canis lupus familiarisgsd', 'Capra hircus', 'Capra hircus blackbengal', 'Carassius auratus', 'Carlito syrichta', 'Castor canadensis', 'Catagonus wagneri', 'Catharus ustulatus', 'Cavia aperea', 'Cavia porcellus', 'Cebus imitator', 'Cercocebus atys', 'Cervus hanglu yarkandensis', 'Chelonoidis abingdonii', 'Chelydra serpentina', 'Chinchilla lanigera', 'Chlorocebus sabaeus', 'Choloepus hoffmanni', 'Chrysemys picta bellii', 'Chrysolophus pictus', 'Ciona intestinalis', 'Ciona savignyi', 'Clupea harengus', 'Colobus angolensis palliatus', 'Corvus moneduloides', 'Cottoperca gobio', 'Coturnix japonica', 'Cricetulus griseus chok1gshd', 'Cricetulus griseus crigri', 'Cricetulus griseus picr', 'Crocodylus porosus', 'Cyanistes caeruleus', 'Cyclopterus lumpus', 'Cynoglossus semilaevis', 'Cyprinodon variegatus', 'Cyprinus carpio carpio', 'Cyprinus carpio germanmirror', 
'Cyprinus carpio hebaored', 'Cyprinus carpio huanghe', 'Danio rerio', 'Dasypus novemcinctus', 'Delphinapterus leucas', 'Denticeps clupeoides', 'Dicentrarchus labrax', 'Dipodomys ordii', 'Dromaius novaehollandiae', 'Drosophila melanogaster', 'Echeneis naucrates', 'Echinops telfairi', 'Electrophorus electricus', 'Eptatretus burgeri', 'Equus asinus', 'Equus caballus', 'Erinaceus europaeus', 'Erpetoichthys calabaricus', 'Erythrura gouldiae', 'Esox lucius', 'Falco tinnunculus', 'Felis catus', 'Ficedula albicollis', 'Fukomys damarensis', 'Fundulus heteroclitus', 'Gadus morhua', 'Gadus morhua gca010882105v1', 'Gallus gallus', 'Gallus gallus gca000002315v5', 'Gallus gallus gca016700215v2', 'Gambusia affinis', 'Gasterosteus aculeatus', 'Gasterosteus aculeatus gca006229185v1', 'Gasterosteus aculeatus gca006232265v1', 'Gasterosteus aculeatus gca006232285v1', 'Geospiza fortis', 'Gopherus agassizii', 'Gopherus evgoodei', 'Gorilla gorilla', 'Gouania willdenowi', 'Haplochromis burtoni', 'Heterocephalus glaber female', 'Heterocephalus glaber male', 'Hippocampus comes', 'Homo sapiens', 'Hucho hucho', 'Ictalurus punctatus', 'Ictidomys tridecemlineatus', 'Jaculus jaculus', 'Junco hyemalis', 'Kryptolebias marmoratus', 'Labrus bergylta', 'Larimichthys crocea', 'Lates calcarifer', 'Laticauda laticaudata', 'Latimeria chalumnae', 'Lepidothrix coronata', 'Lepisosteus oculatus', 'Leptobrachium leishanense', 'Lonchura striata domestica', 'Loxodonta africana', 'Lynx canadensis', 'Macaca fascicularis', 'Macaca mulatta', 'Macaca nemestrina', 'Malurus cyaneus samueli', 'Manacus vitellinus', 'Mandrillus leucophaeus', 'Marmota marmota marmota', 'Mastacembelus armatus', 'Maylandia zebra', 'Meleagris gallopavo', 'Melopsittacus undulatus', 'Meriones unguiculatus', 'Mesocricetus auratus', 'Microcebus murinus', 'Microtus ochrogaster', 'Mola mola', 'Monodelphis domestica', 'Monodon monoceros', 'Monopterus albus', 'Moschus moschiferus', 'Mus caroli', 'Mus musculus', 'Mus musculus 129s1svimj', 'Mus 
musculus aj', 'Mus musculus akrj', 'Mus musculus balbcj', 'Mus musculus c3hhej', 'Mus musculus c57bl6nj', 'Mus musculus casteij', 'Mus musculus cbaj', 'Mus musculus dba2j', 'Mus musculus fvbnj', 'Mus musculus lpj', 'Mus musculus nodshiltj', 'Mus musculus nzohlltj', 'Mus musculus pwkphj', 'Mus musculus wsbeij', 'Mus pahari', 'Mus spicilegus', 'Mus spretus', 'Mustela putorius furo', 'Myotis lucifugus', 'Myripristis murdjan', 'Naja naja', 'Nannospalax galili', 'Neogobius melanostomus', 'Neolamprologus brichardi', 'Neovison vison', 'Nomascus leucogenys', 'Notamacropus eugenii', 'Notechis scutatus', 'Nothobranchius furzeri', 'Nothoprocta perdicaria', 'Numida meleagris', 'Ochotona princeps', 'Octodon degus', 'Oncorhynchus kisutch', 'Oncorhynchus mykiss', 'Oncorhynchus tshawytscha', 'Oreochromis aureus', 'Oreochromis niloticus', 'Ornithorhynchus anatinus', 'Oryctolagus cuniculus', 'Oryzias javanicus', 'Oryzias latipes', 'Oryzias latipes hni', 'Oryzias latipes hsok', 'Oryzias melastigma', 'Oryzias sinensis', 'Otolemur garnettii', 'Otus sunia', 'Ovis aries', 'Ovis aries rambouillet', 'Pan paniscus', 'Pan troglodytes', 'Panthera leo', 'Panthera pardus', 'Panthera tigris altaica', 'Papio anubis', 'Parambassis ranga', 'Paramormyrops kingsleyae', 'Parus major', 'Pavo cristatus', 'Pelodiscus sinensis', 'Pelusios castaneus', 'Periophthalmus magnuspinnatus', 'Peromyscus maniculatus bairdii', 'Petromyzon marinus', 'Phascolarctos cinereus', 'Phasianus colchicus', 'Phocoena sinus', 'Physeter catodon', 'Piliocolobus tephrosceles', 'Podarcis muralis', 'Poecilia formosa', 'Poecilia latipinna', 'Poecilia mexicana', 'Poecilia reticulata', 'Pogona vitticeps', 'Pongo abelii', 'Procavia capensis', 'Prolemur simus', 'Propithecus coquereli', 'Pseudonaja textilis', 'Pteropus vampyrus', 'Pundamilia nyererei', 'Pygocentrus nattereri', 'Rattus norvegicus', 'Rattus norvegicus shrspbbbutx', 'Rattus norvegicus shrutx', 'Rattus norvegicus wkybbb', 'Rhinolophus ferrumequinum', 'Rhinopithecus bieti', 
'Rhinopithecus roxellana', 'Saccharomyces cerevisiae', 'Saimiri boliviensis boliviensis', 'Salarias fasciatus', 'Salmo salar', 'Salmo salar gca021399835v1', 'Salmo salar gca923944775v1', 'Salmo salar gca931346935v2', 'Salmo trutta', 'Salvator merianae', 'Sander lucioperca', 'Sarcophilus harrisii', 'Sciurus vulgaris', 'Scleropages formosus', 'Scophthalmus maximus', 'Serinus canaria', 'Seriola dumerili', 'Seriola lalandi dorsalis', 'Sinocyclocheilus anshuiensis', 'Sinocyclocheilus grahami', 'Sinocyclocheilus rhinocerous', 'Sorex araneus', 'Sparus aurata', 'Spermophilus dauricus', 'Sphaeramia orbicularis', 'Sphenodon punctatus', 'Stachyris ruficeps', 'Stegastes partitus', 'Strigops habroptila', 'Strix occidentalis caurina', 'Struthio camelus australis', 'Suricata suricatta', 'Sus scrofa', 'Sus scrofa bamei', 'Sus scrofa berkshire', 'Sus scrofa hampshire', 'Sus scrofa jinhua', 'Sus scrofa landrace', 'Sus scrofa largewhite', 'Sus scrofa meishan', 'Sus scrofa pietrain', 'Sus scrofa rongchang', 'Sus scrofa tibetan', 'Sus scrofa usmarc', 'Sus scrofa wuzhishan', 'Taeniopygia guttata', 'Takifugu rubripes', 'Terrapene carolina triunguis', 'Tetraodon nigroviridis', 'Theropithecus gelada', 'Tupaia belangeri', 'Tursiops truncatus', 'Urocitellus parryii', 'Ursus americanus', 'Ursus maritimus', 'Ursus thibetanus thibetanus', 'Varanus komodoensis', 'Vicugna pacos', 'Vombatus ursinus', 'Vulpes vulpes', 'Xenopus tropicalis', 'Xiphophorus couchianus', 'Xiphophorus maculatus', 'Zalophus californianus', 'Zonotrichia albicollis', 'Zosterops lateralis melanops'] = 'auto', gene_id_type: str | Literal['auto'] | Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 
'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB'] = 'auto', filter_percent_identity: bool = True)
Find paralogs within the same species using the Ensembl database.
- Parameters:
@@ -1970,7 +1970,7 @@ rnalysis.filtering module
-
-find_paralogs_panther(organism: Union[Literal['auto'], str, int, Literal['Amborella trichopoda', 'Anolis carolinensis', 'Anopheles gambiae', 'Aquifex aeolicus', 'Arabidopsis thaliana', 'Bacillus cereus', 'Bacillus subtilis', 'Bacteroides thetaiotaomicron', 'Batrachochytrium dendrobatidis', 'Bos taurus', 'Brachypodium distachyon', 'Bradyrhizobium diazoefficiens', 'Branchiostoma floridae', 'Brassica campestris', 'Brassica napus', 'Caenorhabditis briggsae', 'Caenorhabditis elegans', 'Candida albicans', 'Canis lupus familiaris', 'Capsicum annuum', 'Chlamydia trachomatis', 'Chlamydomonas reinhardtii', 'Chloroflexus aurantiacus', 'Ciona intestinalis', 'Citrus sinensis', 'Clostridium botulinum', 'Coxiella burnetii', 'Cryptococcus neoformans', 'Cucumis sativus', 'Danio rerio', 'Daphnia pulex', 'Deinococcus radiodurans', 'Dictyoglomus turgidum', 'Dictyostelium discoideum', 'Dictyostelium purpureum', 'Drosophila melanogaster', 'Emericella nidulans', 'Entamoeba histolytica', 'Equus caballus', 'Eremothecium gossypii', 'Erythranthe guttata', 'Escherichia coli', 'Eucalyptus grandis', 'Felis catus', 'Fusobacterium nucleatum', 'Gallus gallus', 'Geobacter sulfurreducens', 'Giardia intestinalis', 'Gloeobacter violaceus', 'Glycine max', 'Gorilla gorilla gorilla', 'Gossypium hirsutum', 'Haemophilus influenzae', 'Halobacterium salinarum', 'Helianthus annuus', 'Helicobacter pylori', 'Homo sapiens', 'Hordeum vulgare subsp. vulgare', 'Ixodes scapularis', 'Juglans regia', 'Klebsormidium nitens', 'Korarchaeum cryptofilum', 'Lactuca sativa', 'Leishmania major', 'Leptospira interrogans', 'Listeria monocytogenes', 'Macaca mulatta', 'Manihot esculenta', 'Marchantia polymorpha', 'Medicago truncatula', 'Methanocaldococcus jannaschii', 'Methanosarcina acetivorans', 'Monodelphis domestica', 'Monosiga brevicollis', 'Mus musculus', 'Musa acuminata subsp. 
malaccensis', 'Mycobacterium tuberculosis', 'Neisseria meningitidis serogroup b', 'Nelumbo nucifera', 'Nematostella vectensis', 'Neosartorya fumigata', 'Neurospora crassa', 'Nicotiana tabacum', 'Nitrosopumilus maritimus', 'Ornithorhynchus anatinus', 'Oryza sativa', 'Oryzias latipes', 'Pan troglodytes', 'Paramecium tetraurelia', 'Phaeosphaeria nodorum', 'Physcomitrella patens', 'Phytophthora ramorum', 'Plasmodium falciparum', 'Populus trichocarpa', 'Pristionchus pacificus', 'Prunus persica', 'Pseudomonas aeruginosa', 'Puccinia graminis', 'Pyrobaculum aerophilum', 'Rattus norvegicus', 'Rhodopirellula baltica', 'Ricinus communis', 'Saccharomyces cerevisiae', 'Salmonella typhimurium', 'Schizosaccharomyces japonicus', 'Schizosaccharomyces pombe', 'Sclerotinia sclerotiorum', 'Selaginella moellendorffii', 'Setaria italica', 'Shewanella oneidensis', 'Solanum lycopersicum', 'Solanum tuberosum', 'Sorghum bicolor', 'Spinacia oleracea', 'Staphylococcus aureus', 'Streptococcus pneumoniae', 'Streptomyces coelicolor', 'Strongylocentrotus purpuratus', 'Sulfolobus solfataricus', 'Sus scrofa', 'Synechocystis', 'Thalassiosira pseudonana', 'Theobroma cacao', 'Thermococcus kodakaraensis', 'Thermodesulfovibrio yellowstonii', 'Thermotoga maritima', 'Tribolium castaneum', 'Trichomonas vaginalis', 'Trichoplax adhaerens', 'Triticum aestivum', 'Trypanosoma brucei', 'Ustilago maydis', 'Vibrio cholerae', 'Vitis vinifera', 'Xanthomonas campestris', 'Xenopus laevis', 'Xenopus tropicalis', 'Yarrowia lipolytica', 'Yersinia pestis', 'Zea mays', 'Zostera marina', 'helobdella robusta', 'lepisosteus oculatus', 'mycoplasma genitalium']] = 'auto', gene_id_type: Union[str, Literal['auto'], Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 
'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB']] = 'auto')
+find_paralogs_panther(organism: Literal['auto'] | str | int | Literal['Amborella trichopoda', 'Anolis carolinensis', 'Anopheles gambiae', 'Aquifex aeolicus', 'Arabidopsis thaliana', 'Bacillus cereus', 'Bacillus subtilis', 'Bacteroides thetaiotaomicron', 'Batrachochytrium dendrobatidis', 'Bos taurus', 'Brachypodium distachyon', 'Bradyrhizobium diazoefficiens', 'Branchiostoma floridae', 'Brassica campestris', 'Brassica napus', 'Caenorhabditis briggsae', 'Caenorhabditis elegans', 'Candida albicans', 'Canis lupus familiaris', 'Capsicum annuum', 'Chlamydia trachomatis', 'Chlamydomonas reinhardtii', 'Chloroflexus aurantiacus', 'Ciona intestinalis', 'Citrus sinensis', 'Clostridium botulinum', 'Coxiella burnetii', 'Cryptococcus neoformans', 'Cucumis sativus', 'Danio rerio', 'Daphnia pulex', 'Deinococcus radiodurans', 'Dictyoglomus turgidum', 'Dictyostelium discoideum', 'Dictyostelium purpureum', 'Drosophila melanogaster', 'Emericella nidulans', 'Entamoeba histolytica', 'Equus caballus', 'Eremothecium gossypii', 'Erythranthe guttata', 'Escherichia coli', 'Eucalyptus grandis', 'Felis catus', 'Fusobacterium nucleatum', 'Gallus gallus', 'Geobacter sulfurreducens', 'Giardia intestinalis', 'Gloeobacter violaceus', 'Glycine max', 'Gorilla gorilla gorilla', 'Gossypium hirsutum', 'Haemophilus influenzae', 'Halobacterium salinarum', 'Helianthus annuus', 'Helicobacter pylori', 'Homo sapiens', 'Hordeum vulgare subsp. vulgare', 'Ixodes scapularis', 'Juglans regia', 'Klebsormidium nitens', 'Korarchaeum cryptofilum', 'Lactuca sativa', 'Leishmania major', 'Leptospira interrogans', 'Listeria monocytogenes', 'Macaca mulatta', 'Manihot esculenta', 'Marchantia polymorpha', 'Medicago truncatula', 'Methanocaldococcus jannaschii', 'Methanosarcina acetivorans', 'Monodelphis domestica', 'Monosiga brevicollis', 'Mus musculus', 'Musa acuminata subsp. 
malaccensis', 'Mycobacterium tuberculosis', 'Neisseria meningitidis serogroup b', 'Nelumbo nucifera', 'Nematostella vectensis', 'Neosartorya fumigata', 'Neurospora crassa', 'Nicotiana tabacum', 'Nitrosopumilus maritimus', 'Ornithorhynchus anatinus', 'Oryza sativa', 'Oryzias latipes', 'Pan troglodytes', 'Paramecium tetraurelia', 'Phaeosphaeria nodorum', 'Physcomitrella patens', 'Phytophthora ramorum', 'Plasmodium falciparum', 'Populus trichocarpa', 'Pristionchus pacificus', 'Prunus persica', 'Pseudomonas aeruginosa', 'Puccinia graminis', 'Pyrobaculum aerophilum', 'Rattus norvegicus', 'Rhodopirellula baltica', 'Ricinus communis', 'Saccharomyces cerevisiae', 'Salmonella typhimurium', 'Schizosaccharomyces japonicus', 'Schizosaccharomyces pombe', 'Sclerotinia sclerotiorum', 'Selaginella moellendorffii', 'Setaria italica', 'Shewanella oneidensis', 'Solanum lycopersicum', 'Solanum tuberosum', 'Sorghum bicolor', 'Spinacia oleracea', 'Staphylococcus aureus', 'Streptococcus pneumoniae', 'Streptomyces coelicolor', 'Strongylocentrotus purpuratus', 'Sulfolobus solfataricus', 'Sus scrofa', 'Synechocystis', 'Thalassiosira pseudonana', 'Theobroma cacao', 'Thermococcus kodakaraensis', 'Thermodesulfovibrio yellowstonii', 'Thermotoga maritima', 'Tribolium castaneum', 'Trichomonas vaginalis', 'Trichoplax adhaerens', 'Triticum aestivum', 'Trypanosoma brucei', 'Ustilago maydis', 'Vibrio cholerae', 'Vitis vinifera', 'Xanthomonas campestris', 'Xenopus laevis', 'Xenopus tropicalis', 'Yarrowia lipolytica', 'Yersinia pestis', 'Zea mays', 'Zostera marina', 'helobdella robusta', 'lepisosteus oculatus', 'mycoplasma genitalium'] = 'auto', gene_id_type: str | Literal['auto'] | Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 'ArachnoServer', 
'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB'] = 'auto')
Find paralogs within the same species using the PantherDB database.
- Parameters:
@@ -1997,8 +1997,8 @@ rnalysis.filtering module
- Parameters:
-numerator (str, or list of strs) – the CountFilter columns to be used as the numerator. If multiple arguments are given in a list, they will be averaged.
-denominator (str, or list of strs) – the CountFilter columns to be used as the denominator. If multiple arguments are given in a list, they will be averaged.
+numerator (str, or list of strs) – the CountFilter columns to be used as the numerator. If multiple arguments are given in a list, they will be averaged.
+denominator (str, or list of strs) – the CountFilter columns to be used as the denominator. If multiple arguments are given in a list, they will be averaged.
numer_name (str or 'default') – name to give the numerator condition. If ‘default’, the name will be generated automatically from the names of numerator columns.
denom_name (str or 'default') – name to give the denominator condition. If ‘default’, the name will be generated automatically from the names of denominator columns.
@@ -2198,7 +2198,7 @@ rnalysis.filtering module
-
-intersection(*others: Union[Filter, set], return_type: Literal['set', 'str'] = 'set', inplace: bool = False)
+intersection(*others: Filter | set, return_type: Literal['set', 'str'] = 'set', inplace: bool = False)
Keep only the features that exist in ALL of the given Filter objects/sets. Can be done either inplace on the first Filter object, or return a set/string of features.
- Parameters:
@@ -2232,18 +2232,18 @@ rnalysis.filtering module
-
-ma_plot(ref_column: Union[Literal['auto'], ColumnName] = 'auto', columns: Union[ColumnNames, Literal['all']] = 'all', split_plots: bool = False, title: Union[str, Literal['auto']] = 'auto', title_fontsize: float = 20, label_fontsize: Union[float, Literal['auto']] = 'auto', tick_fontsize: float = 12) List[Figure]
+ma_plot(ref_column: Literal['auto'] | ColumnName = 'auto', columns: ColumnNames | Literal['all'] = 'all', split_plots: bool = False, title: str | Literal['auto'] = 'auto', title_fontsize: float = 20, label_fontsize: float | Literal['auto'] = 'auto', tick_fontsize: float = 12) List[Figure]
Generates M-A (log-ratio vs. log-intensity) plots for selected columns in the dataset. This plot is particularly useful for indicating whether a dataset is properly normalized.
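As a rough illustration of what an M-A plot displays, the sketch below computes the two quantities for one sample column against a reference column: M is the log2 ratio and A is the mean log2 intensity. The helper name `ma_values` and the pseudocount of 1 are assumptions made for this example; they are not taken from the RNAlysis source, which may compute these values differently.

```python
import math


def ma_values(sample, reference, pseudocount=1.0):
    """Return (M, A) lists for two equal-length columns of counts.

    Hypothetical helper for illustration only; the pseudocount is an
    assumption used here to avoid taking log2 of zero.
    """
    m_vals, a_vals = [], []
    for x, r in zip(sample, reference):
        log_x = math.log2(x + pseudocount)
        log_r = math.log2(r + pseudocount)
        m_vals.append(log_x - log_r)        # M: log-ratio
        a_vals.append((log_x + log_r) / 2)  # A: mean log-intensity
    return m_vals, a_vals


sample = [7, 15, 3]
reference = [3, 15, 7]
m, a = ma_values(sample, reference)
print(m)  # [1.0, 0.0, -1.0]
print(a)  # [2.5, 4.0, 2.5]
```

In a well-normalized dataset, the M values are expected to scatter around zero across the range of A, which is why the plot is useful as a normalization check.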
- Parameters:
-ref_column (name of a column or 'auto' (default='auto')) – the column to be used as reference for MA plot. If ‘auto’, then the reference column will be chosen automatically to be the column whose upper quartile is closest to the mean upper quartile.
-columns (str or list of str) – A list of the column names to generate an MA plot for.
+ref_column (name of a column or 'auto' (default='auto')) – the column to be used as reference for MA plot. If ‘auto’, then the reference column will be chosen automatically to be the column whose upper quartile is closest to the mean upper quartile.
+columns (str or list of str) – A list of the column names to generate an MA plot for.
split_plots (bool (default=False)) – if True, each individual MA plot will be plotted in its own Figure. Otherwise, all MA plots will be plotted on the same Figure.
title (str or 'auto' (default='auto')) – The title of the plot. If ‘auto’, a title will be generated automatically.
title_fontsize (float (default=20)) – determines the font size of the graph title.
label_fontsize (float (default=15)) – determines the font size of the X and Y axis labels.
-tick_fontsize – determines the font size of the X and Y tick labels.
+tick_fontsize – determines the font size of the X and Y tick labels.
- Return type:
@@ -2254,7 +2254,7 @@ rnalysis.filtering module
-
-majority_vote_intersection(*others: Union[Filter, set], majority_threshold: float = 0.5, return_type: Literal['set', 'str'] = 'set')
+majority_vote_intersection(*others: Filter | set, majority_threshold: float = 0.5, return_type: Literal['set', 'str'] = 'set')
Returns a set/string of the features that appear in at least (majority_threshold * 100)% of the given Filter objects/sets. Majority-vote intersection with majority_threshold=0 is equivalent to Union. Majority-vote intersection with majority_threshold=1 is equivalent to Intersection.
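The thresholding rule described above can be sketched with plain Python sets. The function `majority_vote` below is a hypothetical stand-in written for this example, not the RNAlysis implementation; it assumes that every collection passed in counts equally toward the vote.

```python
from collections import Counter


def majority_vote(sets, majority_threshold=0.5):
    """Keep features present in at least majority_threshold of the sets.

    Hypothetical illustration of the rule, using plain sets only.
    """
    # Count in how many of the sets each feature appears.
    counts = Counter(feature for s in sets for feature in set(s))
    needed = majority_threshold * len(sets)
    return {feature for feature, n in counts.items() if n >= needed}


a = {"gene1", "gene2", "gene3"}
b = {"gene2", "gene3", "gene4"}
c = {"gene3", "gene5"}

majority = majority_vote([a, b, c], 0.5)    # features found in 2 or more sets
as_union = majority_vote([a, b, c], 0)      # same as a | b | c
as_intersection = majority_vote([a, b, c], 1)  # same as a & b & c
print(sorted(majority))  # ['gene2', 'gene3']
```

The two boundary cases match the description: a threshold of 0 admits every feature that appears at all, while a threshold of 1 admits only features present in every set.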
- Parameters:
@@ -2286,7 +2286,7 @@ rnalysis.filtering module
-
-map_orthologs_ensembl(map_to_organism: Union[str, int, Literal['Acanthochromis polyacanthus', 'Accipiter nisus', 'Ailuropoda melanoleuca', 'Amazona collaria', 'Amphilophus citrinellus', 'Amphiprion ocellaris', 'Amphiprion percula', 'Anabas testudineus', 'Anas platyrhynchos', 'Anas platyrhynchos platyrhynchos', 'Anas zonorhyncha', 'Anolis carolinensis', 'Anser brachyrhynchus', 'Anser cygnoides', 'Aotus nancymaae', 'Apteryx haastii', 'Apteryx owenii', 'Apteryx rowi', 'Aquila chrysaetos chrysaetos', 'Astatotilapia calliptera', 'Astyanax mexicanus', 'Astyanax mexicanus pachon', 'Athene cunicularia', 'Balaenoptera musculus', 'Betta splendens', 'Bison bison bison', 'Bos grunniens', 'Bos indicus hybrid', 'Bos mutus', 'Bos taurus', 'Bos taurus hybrid', 'Bubo bubo', 'Buteo japonicus', 'Caenorhabditis elegans', 'Cairina moschata domestica', 'Calidris pugnax', 'Calidris pygmaea', 'Callithrix jacchus', 'Callorhinchus milii', 'Camarhynchus parvulus', 'Camelus dromedarius', 'Canis lupus dingo', 'Canis lupus familiaris', 'Canis lupus familiarisbasenji', 'Canis lupus familiarisboxer', 'Canis lupus familiarisgreatdane', 'Canis lupus familiarisgsd', 'Capra hircus', 'Capra hircus blackbengal', 'Carassius auratus', 'Carlito syrichta', 'Castor canadensis', 'Catagonus wagneri', 'Catharus ustulatus', 'Cavia aperea', 'Cavia porcellus', 'Cebus imitator', 'Cercocebus atys', 'Cervus hanglu yarkandensis', 'Chelonoidis abingdonii', 'Chelydra serpentina', 'Chinchilla lanigera', 'Chlorocebus sabaeus', 'Choloepus hoffmanni', 'Chrysemys picta bellii', 'Chrysolophus pictus', 'Ciona intestinalis', 'Ciona savignyi', 'Clupea harengus', 'Colobus angolensis palliatus', 'Corvus moneduloides', 'Cottoperca gobio', 'Coturnix japonica', 'Cricetulus griseus chok1gshd', 'Cricetulus griseus crigri', 'Cricetulus griseus picr', 'Crocodylus porosus', 'Cyanistes caeruleus', 'Cyclopterus lumpus', 'Cynoglossus semilaevis', 'Cyprinodon variegatus', 'Cyprinus carpio carpio', 'Cyprinus carpio germanmirror', 'Cyprinus 
carpio hebaored', 'Cyprinus carpio huanghe', 'Danio rerio', 'Dasypus novemcinctus', 'Delphinapterus leucas', 'Denticeps clupeoides', 'Dicentrarchus labrax', 'Dipodomys ordii', 'Dromaius novaehollandiae', 'Drosophila melanogaster', 'Echeneis naucrates', 'Echinops telfairi', 'Electrophorus electricus', 'Eptatretus burgeri', 'Equus asinus', 'Equus caballus', 'Erinaceus europaeus', 'Erpetoichthys calabaricus', 'Erythrura gouldiae', 'Esox lucius', 'Falco tinnunculus', 'Felis catus', 'Ficedula albicollis', 'Fukomys damarensis', 'Fundulus heteroclitus', 'Gadus morhua', 'Gadus morhua gca010882105v1', 'Gallus gallus', 'Gallus gallus gca000002315v5', 'Gallus gallus gca016700215v2', 'Gambusia affinis', 'Gasterosteus aculeatus', 'Gasterosteus aculeatus gca006229185v1', 'Gasterosteus aculeatus gca006232265v1', 'Gasterosteus aculeatus gca006232285v1', 'Geospiza fortis', 'Gopherus agassizii', 'Gopherus evgoodei', 'Gorilla gorilla', 'Gouania willdenowi', 'Haplochromis burtoni', 'Heterocephalus glaber female', 'Heterocephalus glaber male', 'Hippocampus comes', 'Homo sapiens', 'Hucho hucho', 'Ictalurus punctatus', 'Ictidomys tridecemlineatus', 'Jaculus jaculus', 'Junco hyemalis', 'Kryptolebias marmoratus', 'Labrus bergylta', 'Larimichthys crocea', 'Lates calcarifer', 'Laticauda laticaudata', 'Latimeria chalumnae', 'Lepidothrix coronata', 'Lepisosteus oculatus', 'Leptobrachium leishanense', 'Lonchura striata domestica', 'Loxodonta africana', 'Lynx canadensis', 'Macaca fascicularis', 'Macaca mulatta', 'Macaca nemestrina', 'Malurus cyaneus samueli', 'Manacus vitellinus', 'Mandrillus leucophaeus', 'Marmota marmota marmota', 'Mastacembelus armatus', 'Maylandia zebra', 'Meleagris gallopavo', 'Melopsittacus undulatus', 'Meriones unguiculatus', 'Mesocricetus auratus', 'Microcebus murinus', 'Microtus ochrogaster', 'Mola mola', 'Monodelphis domestica', 'Monodon monoceros', 'Monopterus albus', 'Moschus moschiferus', 'Mus caroli', 'Mus musculus', 'Mus musculus 129s1svimj', 'Mus musculus aj', 
'Mus musculus akrj', 'Mus musculus balbcj', 'Mus musculus c3hhej', 'Mus musculus c57bl6nj', 'Mus musculus casteij', 'Mus musculus cbaj', 'Mus musculus dba2j', 'Mus musculus fvbnj', 'Mus musculus lpj', 'Mus musculus nodshiltj', 'Mus musculus nzohlltj', 'Mus musculus pwkphj', 'Mus musculus wsbeij', 'Mus pahari', 'Mus spicilegus', 'Mus spretus', 'Mustela putorius furo', 'Myotis lucifugus', 'Myripristis murdjan', 'Naja naja', 'Nannospalax galili', 'Neogobius melanostomus', 'Neolamprologus brichardi', 'Neovison vison', 'Nomascus leucogenys', 'Notamacropus eugenii', 'Notechis scutatus', 'Nothobranchius furzeri', 'Nothoprocta perdicaria', 'Numida meleagris', 'Ochotona princeps', 'Octodon degus', 'Oncorhynchus kisutch', 'Oncorhynchus mykiss', 'Oncorhynchus tshawytscha', 'Oreochromis aureus', 'Oreochromis niloticus', 'Ornithorhynchus anatinus', 'Oryctolagus cuniculus', 'Oryzias javanicus', 'Oryzias latipes', 'Oryzias latipes hni', 'Oryzias latipes hsok', 'Oryzias melastigma', 'Oryzias sinensis', 'Otolemur garnettii', 'Otus sunia', 'Ovis aries', 'Ovis aries rambouillet', 'Pan paniscus', 'Pan troglodytes', 'Panthera leo', 'Panthera pardus', 'Panthera tigris altaica', 'Papio anubis', 'Parambassis ranga', 'Paramormyrops kingsleyae', 'Parus major', 'Pavo cristatus', 'Pelodiscus sinensis', 'Pelusios castaneus', 'Periophthalmus magnuspinnatus', 'Peromyscus maniculatus bairdii', 'Petromyzon marinus', 'Phascolarctos cinereus', 'Phasianus colchicus', 'Phocoena sinus', 'Physeter catodon', 'Piliocolobus tephrosceles', 'Podarcis muralis', 'Poecilia formosa', 'Poecilia latipinna', 'Poecilia mexicana', 'Poecilia reticulata', 'Pogona vitticeps', 'Pongo abelii', 'Procavia capensis', 'Prolemur simus', 'Propithecus coquereli', 'Pseudonaja textilis', 'Pteropus vampyrus', 'Pundamilia nyererei', 'Pygocentrus nattereri', 'Rattus norvegicus', 'Rattus norvegicus shrspbbbutx', 'Rattus norvegicus shrutx', 'Rattus norvegicus wkybbb', 'Rhinolophus ferrumequinum', 'Rhinopithecus bieti', 'Rhinopithecus 
roxellana', 'Saccharomyces cerevisiae', 'Saimiri boliviensis boliviensis', 'Salarias fasciatus', 'Salmo salar', 'Salmo salar gca021399835v1', 'Salmo salar gca923944775v1', 'Salmo salar gca931346935v2', 'Salmo trutta', 'Salvator merianae', 'Sander lucioperca', 'Sarcophilus harrisii', 'Sciurus vulgaris', 'Scleropages formosus', 'Scophthalmus maximus', 'Serinus canaria', 'Seriola dumerili', 'Seriola lalandi dorsalis', 'Sinocyclocheilus anshuiensis', 'Sinocyclocheilus grahami', 'Sinocyclocheilus rhinocerous', 'Sorex araneus', 'Sparus aurata', 'Spermophilus dauricus', 'Sphaeramia orbicularis', 'Sphenodon punctatus', 'Stachyris ruficeps', 'Stegastes partitus', 'Strigops habroptila', 'Strix occidentalis caurina', 'Struthio camelus australis', 'Suricata suricatta', 'Sus scrofa', 'Sus scrofa bamei', 'Sus scrofa berkshire', 'Sus scrofa hampshire', 'Sus scrofa jinhua', 'Sus scrofa landrace', 'Sus scrofa largewhite', 'Sus scrofa meishan', 'Sus scrofa pietrain', 'Sus scrofa rongchang', 'Sus scrofa tibetan', 'Sus scrofa usmarc', 'Sus scrofa wuzhishan', 'Taeniopygia guttata', 'Takifugu rubripes', 'Terrapene carolina triunguis', 'Tetraodon nigroviridis', 'Theropithecus gelada', 'Tupaia belangeri', 'Tursiops truncatus', 'Urocitellus parryii', 'Ursus americanus', 'Ursus maritimus', 'Ursus thibetanus thibetanus', 'Varanus komodoensis', 'Vicugna pacos', 'Vombatus ursinus', 'Vulpes vulpes', 'Xenopus tropicalis', 'Xiphophorus couchianus', 'Xiphophorus maculatus', 'Zalophus californianus', 'Zonotrichia albicollis', 'Zosterops lateralis melanops']], map_from_organism: Union[Literal['auto'], str, int, Literal['Acanthochromis polyacanthus', 'Accipiter nisus', 'Ailuropoda melanoleuca', 'Amazona collaria', 'Amphilophus citrinellus', 'Amphiprion ocellaris', 'Amphiprion percula', 'Anabas testudineus', 'Anas platyrhynchos', 'Anas platyrhynchos platyrhynchos', 'Anas zonorhyncha', 'Anolis carolinensis', 'Anser brachyrhynchus', 'Anser cygnoides', 'Aotus nancymaae', 'Apteryx haastii', 'Apteryx 
owenii', 'Apteryx rowi', 'Aquila chrysaetos chrysaetos', 'Astatotilapia calliptera', 'Astyanax mexicanus', 'Astyanax mexicanus pachon', 'Athene cunicularia', 'Balaenoptera musculus', 'Betta splendens', 'Bison bison bison', 'Bos grunniens', 'Bos indicus hybrid', 'Bos mutus', 'Bos taurus', 'Bos taurus hybrid', 'Bubo bubo', 'Buteo japonicus', 'Caenorhabditis elegans', 'Cairina moschata domestica', 'Calidris pugnax', 'Calidris pygmaea', 'Callithrix jacchus', 'Callorhinchus milii', 'Camarhynchus parvulus', 'Camelus dromedarius', 'Canis lupus dingo', 'Canis lupus familiaris', 'Canis lupus familiarisbasenji', 'Canis lupus familiarisboxer', 'Canis lupus familiarisgreatdane', 'Canis lupus familiarisgsd', 'Capra hircus', 'Capra hircus blackbengal', 'Carassius auratus', 'Carlito syrichta', 'Castor canadensis', 'Catagonus wagneri', 'Catharus ustulatus', 'Cavia aperea', 'Cavia porcellus', 'Cebus imitator', 'Cercocebus atys', 'Cervus hanglu yarkandensis', 'Chelonoidis abingdonii', 'Chelydra serpentina', 'Chinchilla lanigera', 'Chlorocebus sabaeus', 'Choloepus hoffmanni', 'Chrysemys picta bellii', 'Chrysolophus pictus', 'Ciona intestinalis', 'Ciona savignyi', 'Clupea harengus', 'Colobus angolensis palliatus', 'Corvus moneduloides', 'Cottoperca gobio', 'Coturnix japonica', 'Cricetulus griseus chok1gshd', 'Cricetulus griseus crigri', 'Cricetulus griseus picr', 'Crocodylus porosus', 'Cyanistes caeruleus', 'Cyclopterus lumpus', 'Cynoglossus semilaevis', 'Cyprinodon variegatus', 'Cyprinus carpio carpio', 'Cyprinus carpio germanmirror', 'Cyprinus carpio hebaored', 'Cyprinus carpio huanghe', 'Danio rerio', 'Dasypus novemcinctus', 'Delphinapterus leucas', 'Denticeps clupeoides', 'Dicentrarchus labrax', 'Dipodomys ordii', 'Dromaius novaehollandiae', 'Drosophila melanogaster', 'Echeneis naucrates', 'Echinops telfairi', 'Electrophorus electricus', 'Eptatretus burgeri', 'Equus asinus', 'Equus caballus', 'Erinaceus europaeus', 'Erpetoichthys calabaricus', 'Erythrura gouldiae', 'Esox lucius', 
'Falco tinnunculus', 'Felis catus', 'Ficedula albicollis', 'Fukomys damarensis', 'Fundulus heteroclitus', 'Gadus morhua', 'Gadus morhua gca010882105v1', 'Gallus gallus', 'Gallus gallus gca000002315v5', 'Gallus gallus gca016700215v2', 'Gambusia affinis', 'Gasterosteus aculeatus', 'Gasterosteus aculeatus gca006229185v1', 'Gasterosteus aculeatus gca006232265v1', 'Gasterosteus aculeatus gca006232285v1', 'Geospiza fortis', 'Gopherus agassizii', 'Gopherus evgoodei', 'Gorilla gorilla', 'Gouania willdenowi', 'Haplochromis burtoni', 'Heterocephalus glaber female', 'Heterocephalus glaber male', 'Hippocampus comes', 'Homo sapiens', 'Hucho hucho', 'Ictalurus punctatus', 'Ictidomys tridecemlineatus', 'Jaculus jaculus', 'Junco hyemalis', 'Kryptolebias marmoratus', 'Labrus bergylta', 'Larimichthys crocea', 'Lates calcarifer', 'Laticauda laticaudata', 'Latimeria chalumnae', 'Lepidothrix coronata', 'Lepisosteus oculatus', 'Leptobrachium leishanense', 'Lonchura striata domestica', 'Loxodonta africana', 'Lynx canadensis', 'Macaca fascicularis', 'Macaca mulatta', 'Macaca nemestrina', 'Malurus cyaneus samueli', 'Manacus vitellinus', 'Mandrillus leucophaeus', 'Marmota marmota marmota', 'Mastacembelus armatus', 'Maylandia zebra', 'Meleagris gallopavo', 'Melopsittacus undulatus', 'Meriones unguiculatus', 'Mesocricetus auratus', 'Microcebus murinus', 'Microtus ochrogaster', 'Mola mola', 'Monodelphis domestica', 'Monodon monoceros', 'Monopterus albus', 'Moschus moschiferus', 'Mus caroli', 'Mus musculus', 'Mus musculus 129s1svimj', 'Mus musculus aj', 'Mus musculus akrj', 'Mus musculus balbcj', 'Mus musculus c3hhej', 'Mus musculus c57bl6nj', 'Mus musculus casteij', 'Mus musculus cbaj', 'Mus musculus dba2j', 'Mus musculus fvbnj', 'Mus musculus lpj', 'Mus musculus nodshiltj', 'Mus musculus nzohlltj', 'Mus musculus pwkphj', 'Mus musculus wsbeij', 'Mus pahari', 'Mus spicilegus', 'Mus spretus', 'Mustela putorius furo', 'Myotis lucifugus', 'Myripristis murdjan', 'Naja naja', 'Nannospalax galili', 
'Neogobius melanostomus', 'Neolamprologus brichardi', 'Neovison vison', 'Nomascus leucogenys', 'Notamacropus eugenii', 'Notechis scutatus', 'Nothobranchius furzeri', 'Nothoprocta perdicaria', 'Numida meleagris', 'Ochotona princeps', 'Octodon degus', 'Oncorhynchus kisutch', 'Oncorhynchus mykiss', 'Oncorhynchus tshawytscha', 'Oreochromis aureus', 'Oreochromis niloticus', 'Ornithorhynchus anatinus', 'Oryctolagus cuniculus', 'Oryzias javanicus', 'Oryzias latipes', 'Oryzias latipes hni', 'Oryzias latipes hsok', 'Oryzias melastigma', 'Oryzias sinensis', 'Otolemur garnettii', 'Otus sunia', 'Ovis aries', 'Ovis aries rambouillet', 'Pan paniscus', 'Pan troglodytes', 'Panthera leo', 'Panthera pardus', 'Panthera tigris altaica', 'Papio anubis', 'Parambassis ranga', 'Paramormyrops kingsleyae', 'Parus major', 'Pavo cristatus', 'Pelodiscus sinensis', 'Pelusios castaneus', 'Periophthalmus magnuspinnatus', 'Peromyscus maniculatus bairdii', 'Petromyzon marinus', 'Phascolarctos cinereus', 'Phasianus colchicus', 'Phocoena sinus', 'Physeter catodon', 'Piliocolobus tephrosceles', 'Podarcis muralis', 'Poecilia formosa', 'Poecilia latipinna', 'Poecilia mexicana', 'Poecilia reticulata', 'Pogona vitticeps', 'Pongo abelii', 'Procavia capensis', 'Prolemur simus', 'Propithecus coquereli', 'Pseudonaja textilis', 'Pteropus vampyrus', 'Pundamilia nyererei', 'Pygocentrus nattereri', 'Rattus norvegicus', 'Rattus norvegicus shrspbbbutx', 'Rattus norvegicus shrutx', 'Rattus norvegicus wkybbb', 'Rhinolophus ferrumequinum', 'Rhinopithecus bieti', 'Rhinopithecus roxellana', 'Saccharomyces cerevisiae', 'Saimiri boliviensis boliviensis', 'Salarias fasciatus', 'Salmo salar', 'Salmo salar gca021399835v1', 'Salmo salar gca923944775v1', 'Salmo salar gca931346935v2', 'Salmo trutta', 'Salvator merianae', 'Sander lucioperca', 'Sarcophilus harrisii', 'Sciurus vulgaris', 'Scleropages formosus', 'Scophthalmus maximus', 'Serinus canaria', 'Seriola dumerili', 'Seriola lalandi dorsalis', 'Sinocyclocheilus 
anshuiensis', 'Sinocyclocheilus grahami', 'Sinocyclocheilus rhinocerous', 'Sorex araneus', 'Sparus aurata', 'Spermophilus dauricus', 'Sphaeramia orbicularis', 'Sphenodon punctatus', 'Stachyris ruficeps', 'Stegastes partitus', 'Strigops habroptila', 'Strix occidentalis caurina', 'Struthio camelus australis', 'Suricata suricatta', 'Sus scrofa', 'Sus scrofa bamei', 'Sus scrofa berkshire', 'Sus scrofa hampshire', 'Sus scrofa jinhua', 'Sus scrofa landrace', 'Sus scrofa largewhite', 'Sus scrofa meishan', 'Sus scrofa pietrain', 'Sus scrofa rongchang', 'Sus scrofa tibetan', 'Sus scrofa usmarc', 'Sus scrofa wuzhishan', 'Taeniopygia guttata', 'Takifugu rubripes', 'Terrapene carolina triunguis', 'Tetraodon nigroviridis', 'Theropithecus gelada', 'Tupaia belangeri', 'Tursiops truncatus', 'Urocitellus parryii', 'Ursus americanus', 'Ursus maritimus', 'Ursus thibetanus thibetanus', 'Varanus komodoensis', 'Vicugna pacos', 'Vombatus ursinus', 'Vulpes vulpes', 'Xenopus tropicalis', 'Xiphophorus couchianus', 'Xiphophorus maculatus', 'Zalophus californianus', 'Zonotrichia albicollis', 'Zosterops lateralis melanops']] = 'auto', gene_id_type: Union[str, Literal['auto'], Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq 
Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB']] = 'auto', filter_percent_identity: bool = True, non_unique_mode: Literal['first', 'last', 'random', 'none'] = 'first', remove_unmapped_genes: bool = False, inplace: bool = True)
+map_orthologs_ensembl(map_to_organism: str | int | Literal['Acanthochromis polyacanthus', 'Accipiter nisus', 'Ailuropoda melanoleuca', 'Amazona collaria', 'Amphilophus citrinellus', 'Amphiprion ocellaris', 'Amphiprion percula', 'Anabas testudineus', 'Anas platyrhynchos', 'Anas platyrhynchos platyrhynchos', 'Anas zonorhyncha', 'Anolis carolinensis', 'Anser brachyrhynchus', 'Anser cygnoides', 'Aotus nancymaae', 'Apteryx haastii', 'Apteryx owenii', 'Apteryx rowi', 'Aquila chrysaetos chrysaetos', 'Astatotilapia calliptera', 'Astyanax mexicanus', 'Astyanax mexicanus pachon', 'Athene cunicularia', 'Balaenoptera musculus', 'Betta splendens', 'Bison bison bison', 'Bos grunniens', 'Bos indicus hybrid', 'Bos mutus', 'Bos taurus', 'Bos taurus hybrid', 'Bubo bubo', 'Buteo japonicus', 'Caenorhabditis elegans', 'Cairina moschata domestica', 'Calidris pugnax', 'Calidris pygmaea', 'Callithrix jacchus', 'Callorhinchus milii', 'Camarhynchus parvulus', 'Camelus dromedarius', 'Canis lupus dingo', 'Canis lupus familiaris', 'Canis lupus familiarisbasenji', 'Canis lupus familiarisboxer', 'Canis lupus familiarisgreatdane', 'Canis lupus familiarisgsd', 'Capra hircus', 'Capra hircus blackbengal', 'Carassius auratus', 'Carlito syrichta', 'Castor canadensis', 'Catagonus wagneri', 'Catharus ustulatus', 'Cavia aperea', 'Cavia porcellus', 'Cebus imitator', 'Cercocebus atys', 'Cervus hanglu yarkandensis', 'Chelonoidis abingdonii', 'Chelydra serpentina', 'Chinchilla lanigera', 'Chlorocebus sabaeus', 'Choloepus hoffmanni', 'Chrysemys picta bellii', 'Chrysolophus pictus', 'Ciona intestinalis', 'Ciona savignyi', 'Clupea harengus', 'Colobus angolensis palliatus', 'Corvus moneduloides', 'Cottoperca gobio', 'Coturnix japonica', 'Cricetulus griseus chok1gshd', 'Cricetulus griseus crigri', 'Cricetulus griseus picr', 'Crocodylus porosus', 'Cyanistes caeruleus', 'Cyclopterus lumpus', 'Cynoglossus semilaevis', 'Cyprinodon variegatus', 'Cyprinus carpio carpio', 'Cyprinus carpio germanmirror', 'Cyprinus 
carpio hebaored', 'Cyprinus carpio huanghe', 'Danio rerio', 'Dasypus novemcinctus', 'Delphinapterus leucas', 'Denticeps clupeoides', 'Dicentrarchus labrax', 'Dipodomys ordii', 'Dromaius novaehollandiae', 'Drosophila melanogaster', 'Echeneis naucrates', 'Echinops telfairi', 'Electrophorus electricus', 'Eptatretus burgeri', 'Equus asinus', 'Equus caballus', 'Erinaceus europaeus', 'Erpetoichthys calabaricus', 'Erythrura gouldiae', 'Esox lucius', 'Falco tinnunculus', 'Felis catus', 'Ficedula albicollis', 'Fukomys damarensis', 'Fundulus heteroclitus', 'Gadus morhua', 'Gadus morhua gca010882105v1', 'Gallus gallus', 'Gallus gallus gca000002315v5', 'Gallus gallus gca016700215v2', 'Gambusia affinis', 'Gasterosteus aculeatus', 'Gasterosteus aculeatus gca006229185v1', 'Gasterosteus aculeatus gca006232265v1', 'Gasterosteus aculeatus gca006232285v1', 'Geospiza fortis', 'Gopherus agassizii', 'Gopherus evgoodei', 'Gorilla gorilla', 'Gouania willdenowi', 'Haplochromis burtoni', 'Heterocephalus glaber female', 'Heterocephalus glaber male', 'Hippocampus comes', 'Homo sapiens', 'Hucho hucho', 'Ictalurus punctatus', 'Ictidomys tridecemlineatus', 'Jaculus jaculus', 'Junco hyemalis', 'Kryptolebias marmoratus', 'Labrus bergylta', 'Larimichthys crocea', 'Lates calcarifer', 'Laticauda laticaudata', 'Latimeria chalumnae', 'Lepidothrix coronata', 'Lepisosteus oculatus', 'Leptobrachium leishanense', 'Lonchura striata domestica', 'Loxodonta africana', 'Lynx canadensis', 'Macaca fascicularis', 'Macaca mulatta', 'Macaca nemestrina', 'Malurus cyaneus samueli', 'Manacus vitellinus', 'Mandrillus leucophaeus', 'Marmota marmota marmota', 'Mastacembelus armatus', 'Maylandia zebra', 'Meleagris gallopavo', 'Melopsittacus undulatus', 'Meriones unguiculatus', 'Mesocricetus auratus', 'Microcebus murinus', 'Microtus ochrogaster', 'Mola mola', 'Monodelphis domestica', 'Monodon monoceros', 'Monopterus albus', 'Moschus moschiferus', 'Mus caroli', 'Mus musculus', 'Mus musculus 129s1svimj', 'Mus musculus aj', 
'Mus musculus akrj', 'Mus musculus balbcj', 'Mus musculus c3hhej', 'Mus musculus c57bl6nj', 'Mus musculus casteij', 'Mus musculus cbaj', 'Mus musculus dba2j', 'Mus musculus fvbnj', 'Mus musculus lpj', 'Mus musculus nodshiltj', 'Mus musculus nzohlltj', 'Mus musculus pwkphj', 'Mus musculus wsbeij', 'Mus pahari', 'Mus spicilegus', 'Mus spretus', 'Mustela putorius furo', 'Myotis lucifugus', 'Myripristis murdjan', 'Naja naja', 'Nannospalax galili', 'Neogobius melanostomus', 'Neolamprologus brichardi', 'Neovison vison', 'Nomascus leucogenys', 'Notamacropus eugenii', 'Notechis scutatus', 'Nothobranchius furzeri', 'Nothoprocta perdicaria', 'Numida meleagris', 'Ochotona princeps', 'Octodon degus', 'Oncorhynchus kisutch', 'Oncorhynchus mykiss', 'Oncorhynchus tshawytscha', 'Oreochromis aureus', 'Oreochromis niloticus', 'Ornithorhynchus anatinus', 'Oryctolagus cuniculus', 'Oryzias javanicus', 'Oryzias latipes', 'Oryzias latipes hni', 'Oryzias latipes hsok', 'Oryzias melastigma', 'Oryzias sinensis', 'Otolemur garnettii', 'Otus sunia', 'Ovis aries', 'Ovis aries rambouillet', 'Pan paniscus', 'Pan troglodytes', 'Panthera leo', 'Panthera pardus', 'Panthera tigris altaica', 'Papio anubis', 'Parambassis ranga', 'Paramormyrops kingsleyae', 'Parus major', 'Pavo cristatus', 'Pelodiscus sinensis', 'Pelusios castaneus', 'Periophthalmus magnuspinnatus', 'Peromyscus maniculatus bairdii', 'Petromyzon marinus', 'Phascolarctos cinereus', 'Phasianus colchicus', 'Phocoena sinus', 'Physeter catodon', 'Piliocolobus tephrosceles', 'Podarcis muralis', 'Poecilia formosa', 'Poecilia latipinna', 'Poecilia mexicana', 'Poecilia reticulata', 'Pogona vitticeps', 'Pongo abelii', 'Procavia capensis', 'Prolemur simus', 'Propithecus coquereli', 'Pseudonaja textilis', 'Pteropus vampyrus', 'Pundamilia nyererei', 'Pygocentrus nattereri', 'Rattus norvegicus', 'Rattus norvegicus shrspbbbutx', 'Rattus norvegicus shrutx', 'Rattus norvegicus wkybbb', 'Rhinolophus ferrumequinum', 'Rhinopithecus bieti', 'Rhinopithecus 
roxellana', 'Saccharomyces cerevisiae', 'Saimiri boliviensis boliviensis', 'Salarias fasciatus', 'Salmo salar', 'Salmo salar gca021399835v1', 'Salmo salar gca923944775v1', 'Salmo salar gca931346935v2', 'Salmo trutta', 'Salvator merianae', 'Sander lucioperca', 'Sarcophilus harrisii', 'Sciurus vulgaris', 'Scleropages formosus', 'Scophthalmus maximus', 'Serinus canaria', 'Seriola dumerili', 'Seriola lalandi dorsalis', 'Sinocyclocheilus anshuiensis', 'Sinocyclocheilus grahami', 'Sinocyclocheilus rhinocerous', 'Sorex araneus', 'Sparus aurata', 'Spermophilus dauricus', 'Sphaeramia orbicularis', 'Sphenodon punctatus', 'Stachyris ruficeps', 'Stegastes partitus', 'Strigops habroptila', 'Strix occidentalis caurina', 'Struthio camelus australis', 'Suricata suricatta', 'Sus scrofa', 'Sus scrofa bamei', 'Sus scrofa berkshire', 'Sus scrofa hampshire', 'Sus scrofa jinhua', 'Sus scrofa landrace', 'Sus scrofa largewhite', 'Sus scrofa meishan', 'Sus scrofa pietrain', 'Sus scrofa rongchang', 'Sus scrofa tibetan', 'Sus scrofa usmarc', 'Sus scrofa wuzhishan', 'Taeniopygia guttata', 'Takifugu rubripes', 'Terrapene carolina triunguis', 'Tetraodon nigroviridis', 'Theropithecus gelada', 'Tupaia belangeri', 'Tursiops truncatus', 'Urocitellus parryii', 'Ursus americanus', 'Ursus maritimus', 'Ursus thibetanus thibetanus', 'Varanus komodoensis', 'Vicugna pacos', 'Vombatus ursinus', 'Vulpes vulpes', 'Xenopus tropicalis', 'Xiphophorus couchianus', 'Xiphophorus maculatus', 'Zalophus californianus', 'Zonotrichia albicollis', 'Zosterops lateralis melanops'], map_from_organism: Literal['auto'] | str | int | Literal['Acanthochromis polyacanthus', 'Accipiter nisus', 'Ailuropoda melanoleuca', 'Amazona collaria', 'Amphilophus citrinellus', 'Amphiprion ocellaris', 'Amphiprion percula', 'Anabas testudineus', 'Anas platyrhynchos', 'Anas platyrhynchos platyrhynchos', 'Anas zonorhyncha', 'Anolis carolinensis', 'Anser brachyrhynchus', 'Anser cygnoides', 'Aotus nancymaae', 'Apteryx haastii', 'Apteryx owenii', 
'Apteryx rowi', 'Aquila chrysaetos chrysaetos', 'Astatotilapia calliptera', 'Astyanax mexicanus', 'Astyanax mexicanus pachon', 'Athene cunicularia', 'Balaenoptera musculus', 'Betta splendens', 'Bison bison bison', 'Bos grunniens', 'Bos indicus hybrid', 'Bos mutus', 'Bos taurus', 'Bos taurus hybrid', 'Bubo bubo', 'Buteo japonicus', 'Caenorhabditis elegans', 'Cairina moschata domestica', 'Calidris pugnax', 'Calidris pygmaea', 'Callithrix jacchus', 'Callorhinchus milii', 'Camarhynchus parvulus', 'Camelus dromedarius', 'Canis lupus dingo', 'Canis lupus familiaris', 'Canis lupus familiarisbasenji', 'Canis lupus familiarisboxer', 'Canis lupus familiarisgreatdane', 'Canis lupus familiarisgsd', 'Capra hircus', 'Capra hircus blackbengal', 'Carassius auratus', 'Carlito syrichta', 'Castor canadensis', 'Catagonus wagneri', 'Catharus ustulatus', 'Cavia aperea', 'Cavia porcellus', 'Cebus imitator', 'Cercocebus atys', 'Cervus hanglu yarkandensis', 'Chelonoidis abingdonii', 'Chelydra serpentina', 'Chinchilla lanigera', 'Chlorocebus sabaeus', 'Choloepus hoffmanni', 'Chrysemys picta bellii', 'Chrysolophus pictus', 'Ciona intestinalis', 'Ciona savignyi', 'Clupea harengus', 'Colobus angolensis palliatus', 'Corvus moneduloides', 'Cottoperca gobio', 'Coturnix japonica', 'Cricetulus griseus chok1gshd', 'Cricetulus griseus crigri', 'Cricetulus griseus picr', 'Crocodylus porosus', 'Cyanistes caeruleus', 'Cyclopterus lumpus', 'Cynoglossus semilaevis', 'Cyprinodon variegatus', 'Cyprinus carpio carpio', 'Cyprinus carpio germanmirror', 'Cyprinus carpio hebaored', 'Cyprinus carpio huanghe', 'Danio rerio', 'Dasypus novemcinctus', 'Delphinapterus leucas', 'Denticeps clupeoides', 'Dicentrarchus labrax', 'Dipodomys ordii', 'Dromaius novaehollandiae', 'Drosophila melanogaster', 'Echeneis naucrates', 'Echinops telfairi', 'Electrophorus electricus', 'Eptatretus burgeri', 'Equus asinus', 'Equus caballus', 'Erinaceus europaeus', 'Erpetoichthys calabaricus', 'Erythrura gouldiae', 'Esox lucius', 'Falco 
tinnunculus', 'Felis catus', 'Ficedula albicollis', 'Fukomys damarensis', 'Fundulus heteroclitus', 'Gadus morhua', 'Gadus morhua gca010882105v1', 'Gallus gallus', 'Gallus gallus gca000002315v5', 'Gallus gallus gca016700215v2', 'Gambusia affinis', 'Gasterosteus aculeatus', 'Gasterosteus aculeatus gca006229185v1', 'Gasterosteus aculeatus gca006232265v1', 'Gasterosteus aculeatus gca006232285v1', 'Geospiza fortis', 'Gopherus agassizii', 'Gopherus evgoodei', 'Gorilla gorilla', 'Gouania willdenowi', 'Haplochromis burtoni', 'Heterocephalus glaber female', 'Heterocephalus glaber male', 'Hippocampus comes', 'Homo sapiens', 'Hucho hucho', 'Ictalurus punctatus', 'Ictidomys tridecemlineatus', 'Jaculus jaculus', 'Junco hyemalis', 'Kryptolebias marmoratus', 'Labrus bergylta', 'Larimichthys crocea', 'Lates calcarifer', 'Laticauda laticaudata', 'Latimeria chalumnae', 'Lepidothrix coronata', 'Lepisosteus oculatus', 'Leptobrachium leishanense', 'Lonchura striata domestica', 'Loxodonta africana', 'Lynx canadensis', 'Macaca fascicularis', 'Macaca mulatta', 'Macaca nemestrina', 'Malurus cyaneus samueli', 'Manacus vitellinus', 'Mandrillus leucophaeus', 'Marmota marmota marmota', 'Mastacembelus armatus', 'Maylandia zebra', 'Meleagris gallopavo', 'Melopsittacus undulatus', 'Meriones unguiculatus', 'Mesocricetus auratus', 'Microcebus murinus', 'Microtus ochrogaster', 'Mola mola', 'Monodelphis domestica', 'Monodon monoceros', 'Monopterus albus', 'Moschus moschiferus', 'Mus caroli', 'Mus musculus', 'Mus musculus 129s1svimj', 'Mus musculus aj', 'Mus musculus akrj', 'Mus musculus balbcj', 'Mus musculus c3hhej', 'Mus musculus c57bl6nj', 'Mus musculus casteij', 'Mus musculus cbaj', 'Mus musculus dba2j', 'Mus musculus fvbnj', 'Mus musculus lpj', 'Mus musculus nodshiltj', 'Mus musculus nzohlltj', 'Mus musculus pwkphj', 'Mus musculus wsbeij', 'Mus pahari', 'Mus spicilegus', 'Mus spretus', 'Mustela putorius furo', 'Myotis lucifugus', 'Myripristis murdjan', 'Naja naja', 'Nannospalax galili', 
'Neogobius melanostomus', 'Neolamprologus brichardi', 'Neovison vison', 'Nomascus leucogenys', 'Notamacropus eugenii', 'Notechis scutatus', 'Nothobranchius furzeri', 'Nothoprocta perdicaria', 'Numida meleagris', 'Ochotona princeps', 'Octodon degus', 'Oncorhynchus kisutch', 'Oncorhynchus mykiss', 'Oncorhynchus tshawytscha', 'Oreochromis aureus', 'Oreochromis niloticus', 'Ornithorhynchus anatinus', 'Oryctolagus cuniculus', 'Oryzias javanicus', 'Oryzias latipes', 'Oryzias latipes hni', 'Oryzias latipes hsok', 'Oryzias melastigma', 'Oryzias sinensis', 'Otolemur garnettii', 'Otus sunia', 'Ovis aries', 'Ovis aries rambouillet', 'Pan paniscus', 'Pan troglodytes', 'Panthera leo', 'Panthera pardus', 'Panthera tigris altaica', 'Papio anubis', 'Parambassis ranga', 'Paramormyrops kingsleyae', 'Parus major', 'Pavo cristatus', 'Pelodiscus sinensis', 'Pelusios castaneus', 'Periophthalmus magnuspinnatus', 'Peromyscus maniculatus bairdii', 'Petromyzon marinus', 'Phascolarctos cinereus', 'Phasianus colchicus', 'Phocoena sinus', 'Physeter catodon', 'Piliocolobus tephrosceles', 'Podarcis muralis', 'Poecilia formosa', 'Poecilia latipinna', 'Poecilia mexicana', 'Poecilia reticulata', 'Pogona vitticeps', 'Pongo abelii', 'Procavia capensis', 'Prolemur simus', 'Propithecus coquereli', 'Pseudonaja textilis', 'Pteropus vampyrus', 'Pundamilia nyererei', 'Pygocentrus nattereri', 'Rattus norvegicus', 'Rattus norvegicus shrspbbbutx', 'Rattus norvegicus shrutx', 'Rattus norvegicus wkybbb', 'Rhinolophus ferrumequinum', 'Rhinopithecus bieti', 'Rhinopithecus roxellana', 'Saccharomyces cerevisiae', 'Saimiri boliviensis boliviensis', 'Salarias fasciatus', 'Salmo salar', 'Salmo salar gca021399835v1', 'Salmo salar gca923944775v1', 'Salmo salar gca931346935v2', 'Salmo trutta', 'Salvator merianae', 'Sander lucioperca', 'Sarcophilus harrisii', 'Sciurus vulgaris', 'Scleropages formosus', 'Scophthalmus maximus', 'Serinus canaria', 'Seriola dumerili', 'Seriola lalandi dorsalis', 'Sinocyclocheilus 
anshuiensis', 'Sinocyclocheilus grahami', 'Sinocyclocheilus rhinocerous', 'Sorex araneus', 'Sparus aurata', 'Spermophilus dauricus', 'Sphaeramia orbicularis', 'Sphenodon punctatus', 'Stachyris ruficeps', 'Stegastes partitus', 'Strigops habroptila', 'Strix occidentalis caurina', 'Struthio camelus australis', 'Suricata suricatta', 'Sus scrofa', 'Sus scrofa bamei', 'Sus scrofa berkshire', 'Sus scrofa hampshire', 'Sus scrofa jinhua', 'Sus scrofa landrace', 'Sus scrofa largewhite', 'Sus scrofa meishan', 'Sus scrofa pietrain', 'Sus scrofa rongchang', 'Sus scrofa tibetan', 'Sus scrofa usmarc', 'Sus scrofa wuzhishan', 'Taeniopygia guttata', 'Takifugu rubripes', 'Terrapene carolina triunguis', 'Tetraodon nigroviridis', 'Theropithecus gelada', 'Tupaia belangeri', 'Tursiops truncatus', 'Urocitellus parryii', 'Ursus americanus', 'Ursus maritimus', 'Ursus thibetanus thibetanus', 'Varanus komodoensis', 'Vicugna pacos', 'Vombatus ursinus', 'Vulpes vulpes', 'Xenopus tropicalis', 'Xiphophorus couchianus', 'Xiphophorus maculatus', 'Zalophus californianus', 'Zonotrichia albicollis', 'Zosterops lateralis melanops'] = 'auto', gene_id_type: str | Literal['auto'] | Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq 
Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB'] = 'auto', filter_percent_identity: bool = True, non_unique_mode: Literal['first', 'last', 'random', 'none'] = 'first', remove_unmapped_genes: bool = False, inplace: bool = True)
Map genes to their nearest orthologs in a different species using the Ensembl database. This function generates and returns a table describing all discovered ortholog pairs (both unique and non-unique). It can also translate the genes in this data table into their nearest orthologs and remove unmapped genes.
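How the ortholog-pair table, `non_unique_mode`, and `remove_unmapped_genes` might interact can be sketched in plain Python. This is an illustration under stated assumptions, not the RNAlysis implementation. The helper names `resolve_non_unique` and `translate` are hypothetical. The sketch assumes that `'first'` and `'last'` keep the first or last ortholog found for a gene, `'random'` picks one at random, and `'none'` discards genes with more than one ortholog. It makes no database queries.

```python
import random
from collections import defaultdict


def resolve_non_unique(pairs, non_unique_mode="first", seed=None):
    """Reduce (gene, ortholog) pairs to at most one ortholog per gene.

    Hypothetical helper: the handling of each mode is an assumption
    based on the parameter's option names.
    """
    if non_unique_mode not in ("first", "last", "random", "none"):
        raise ValueError(f"unknown non_unique_mode: {non_unique_mode!r}")

    # Group every ortholog found for each gene, preserving input order.
    candidates = defaultdict(list)
    for gene, ortholog in pairs:
        candidates[gene].append(ortholog)

    rng = random.Random(seed)
    mapping = {}
    for gene, orthologs in candidates.items():
        if len(orthologs) == 1 or non_unique_mode == "first":
            mapping[gene] = orthologs[0]
        elif non_unique_mode == "last":
            mapping[gene] = orthologs[-1]
        elif non_unique_mode == "random":
            mapping[gene] = rng.choice(orthologs)
        # 'none': genes with several orthologs are left unmapped.
    return mapping


def translate(genes, mapping, remove_unmapped_genes=False):
    """Translate gene IDs; unmapped genes are dropped or kept unchanged (assumed)."""
    translated = []
    for gene in genes:
        if gene in mapping:
            translated.append(mapping[gene])
        elif not remove_unmapped_genes:
            translated.append(gene)
    return translated


# Toy ortholog-pair table: geneA has two candidate orthologs, geneC has none.
pairs = [("geneA", "orthA1"), ("geneA", "orthA2"), ("geneB", "orthB1")]
first_mapping = resolve_non_unique(pairs, "first")
print(translate(["geneA", "geneB", "geneC"], first_mapping))
print(translate(["geneA", "geneB", "geneC"], first_mapping, remove_unmapped_genes=True))
```

With the real function, the pair table is the returned value, and the translation is applied to the data table itself when `inplace=True`.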
- Parameters:
@@ -2308,7 +2308,7 @@ rnalysis.filtering module
-
-map_orthologs_orthoinspector(map_to_organism: Union[str, int, Literal['Amborella trichopoda', 'Anolis carolinensis', 'Anopheles gambiae', 'Aquifex aeolicus', 'Arabidopsis thaliana', 'Bacillus cereus', 'Bacillus subtilis', 'Bacteroides thetaiotaomicron', 'Batrachochytrium dendrobatidis', 'Bos taurus', 'Brachypodium distachyon', 'Bradyrhizobium diazoefficiens', 'Branchiostoma floridae', 'Brassica campestris', 'Brassica napus', 'Caenorhabditis briggsae', 'Caenorhabditis elegans', 'Candida albicans', 'Canis lupus familiaris', 'Capsicum annuum', 'Chlamydia trachomatis', 'Chlamydomonas reinhardtii', 'Chloroflexus aurantiacus', 'Ciona intestinalis', 'Citrus sinensis', 'Clostridium botulinum', 'Coxiella burnetii', 'Cryptococcus neoformans', 'Cucumis sativus', 'Danio rerio', 'Daphnia pulex', 'Deinococcus radiodurans', 'Dictyoglomus turgidum', 'Dictyostelium discoideum', 'Dictyostelium purpureum', 'Drosophila melanogaster', 'Emericella nidulans', 'Entamoeba histolytica', 'Equus caballus', 'Eremothecium gossypii', 'Erythranthe guttata', 'Escherichia coli', 'Eucalyptus grandis', 'Felis catus', 'Fusobacterium nucleatum', 'Gallus gallus', 'Geobacter sulfurreducens', 'Giardia intestinalis', 'Gloeobacter violaceus', 'Glycine max', 'Gorilla gorilla gorilla', 'Gossypium hirsutum', 'Haemophilus influenzae', 'Halobacterium salinarum', 'Helianthus annuus', 'Helicobacter pylori', 'Homo sapiens', 'Hordeum vulgare subsp. vulgare', 'Ixodes scapularis', 'Juglans regia', 'Klebsormidium nitens', 'Korarchaeum cryptofilum', 'Lactuca sativa', 'Leishmania major', 'Leptospira interrogans', 'Listeria monocytogenes', 'Macaca mulatta', 'Manihot esculenta', 'Marchantia polymorpha', 'Medicago truncatula', 'Methanocaldococcus jannaschii', 'Methanosarcina acetivorans', 'Monodelphis domestica', 'Monosiga brevicollis', 'Mus musculus', 'Musa acuminata subsp. 
malaccensis', 'Mycobacterium tuberculosis', 'Neisseria meningitidis serogroup b', 'Nelumbo nucifera', 'Nematostella vectensis', 'Neosartorya fumigata', 'Neurospora crassa', 'Nicotiana tabacum', 'Nitrosopumilus maritimus', 'Ornithorhynchus anatinus', 'Oryza sativa', 'Oryzias latipes', 'Pan troglodytes', 'Paramecium tetraurelia', 'Phaeosphaeria nodorum', 'Physcomitrella patens', 'Phytophthora ramorum', 'Plasmodium falciparum', 'Populus trichocarpa', 'Pristionchus pacificus', 'Prunus persica', 'Pseudomonas aeruginosa', 'Puccinia graminis', 'Pyrobaculum aerophilum', 'Rattus norvegicus', 'Rhodopirellula baltica', 'Ricinus communis', 'Saccharomyces cerevisiae', 'Salmonella typhimurium', 'Schizosaccharomyces japonicus', 'Schizosaccharomyces pombe', 'Sclerotinia sclerotiorum', 'Selaginella moellendorffii', 'Setaria italica', 'Shewanella oneidensis', 'Solanum lycopersicum', 'Solanum tuberosum', 'Sorghum bicolor', 'Spinacia oleracea', 'Staphylococcus aureus', 'Streptococcus pneumoniae', 'Streptomyces coelicolor', 'Strongylocentrotus purpuratus', 'Sulfolobus solfataricus', 'Sus scrofa', 'Synechocystis', 'Thalassiosira pseudonana', 'Theobroma cacao', 'Thermococcus kodakaraensis', 'Thermodesulfovibrio yellowstonii', 'Thermotoga maritima', 'Tribolium castaneum', 'Trichomonas vaginalis', 'Trichoplax adhaerens', 'Triticum aestivum', 'Trypanosoma brucei', 'Ustilago maydis', 'Vibrio cholerae', 'Vitis vinifera', 'Xanthomonas campestris', 'Xenopus laevis', 'Xenopus tropicalis', 'Yarrowia lipolytica', 'Yersinia pestis', 'Zea mays', 'Zostera marina', 'helobdella robusta', 'lepisosteus oculatus', 'mycoplasma genitalium']], map_from_organism: Union[Literal['auto'], str, int, Literal['Amborella trichopoda', 'Anolis carolinensis', 'Anopheles gambiae', 'Aquifex aeolicus', 'Arabidopsis thaliana', 'Bacillus cereus', 'Bacillus subtilis', 'Bacteroides thetaiotaomicron', 'Batrachochytrium dendrobatidis', 'Bos taurus', 'Brachypodium distachyon', 'Bradyrhizobium diazoefficiens', 'Branchiostoma 
floridae', 'Brassica campestris', 'Brassica napus', 'Caenorhabditis briggsae', 'Caenorhabditis elegans', 'Candida albicans', 'Canis lupus familiaris', 'Capsicum annuum', 'Chlamydia trachomatis', 'Chlamydomonas reinhardtii', 'Chloroflexus aurantiacus', 'Ciona intestinalis', 'Citrus sinensis', 'Clostridium botulinum', 'Coxiella burnetii', 'Cryptococcus neoformans', 'Cucumis sativus', 'Danio rerio', 'Daphnia pulex', 'Deinococcus radiodurans', 'Dictyoglomus turgidum', 'Dictyostelium discoideum', 'Dictyostelium purpureum', 'Drosophila melanogaster', 'Emericella nidulans', 'Entamoeba histolytica', 'Equus caballus', 'Eremothecium gossypii', 'Erythranthe guttata', 'Escherichia coli', 'Eucalyptus grandis', 'Felis catus', 'Fusobacterium nucleatum', 'Gallus gallus', 'Geobacter sulfurreducens', 'Giardia intestinalis', 'Gloeobacter violaceus', 'Glycine max', 'Gorilla gorilla gorilla', 'Gossypium hirsutum', 'Haemophilus influenzae', 'Halobacterium salinarum', 'Helianthus annuus', 'Helicobacter pylori', 'Homo sapiens', 'Hordeum vulgare subsp. vulgare', 'Ixodes scapularis', 'Juglans regia', 'Klebsormidium nitens', 'Korarchaeum cryptofilum', 'Lactuca sativa', 'Leishmania major', 'Leptospira interrogans', 'Listeria monocytogenes', 'Macaca mulatta', 'Manihot esculenta', 'Marchantia polymorpha', 'Medicago truncatula', 'Methanocaldococcus jannaschii', 'Methanosarcina acetivorans', 'Monodelphis domestica', 'Monosiga brevicollis', 'Mus musculus', 'Musa acuminata subsp. 
malaccensis', 'Mycobacterium tuberculosis', 'Neisseria meningitidis serogroup b', 'Nelumbo nucifera', 'Nematostella vectensis', 'Neosartorya fumigata', 'Neurospora crassa', 'Nicotiana tabacum', 'Nitrosopumilus maritimus', 'Ornithorhynchus anatinus', 'Oryza sativa', 'Oryzias latipes', 'Pan troglodytes', 'Paramecium tetraurelia', 'Phaeosphaeria nodorum', 'Physcomitrella patens', 'Phytophthora ramorum', 'Plasmodium falciparum', 'Populus trichocarpa', 'Pristionchus pacificus', 'Prunus persica', 'Pseudomonas aeruginosa', 'Puccinia graminis', 'Pyrobaculum aerophilum', 'Rattus norvegicus', 'Rhodopirellula baltica', 'Ricinus communis', 'Saccharomyces cerevisiae', 'Salmonella typhimurium', 'Schizosaccharomyces japonicus', 'Schizosaccharomyces pombe', 'Sclerotinia sclerotiorum', 'Selaginella moellendorffii', 'Setaria italica', 'Shewanella oneidensis', 'Solanum lycopersicum', 'Solanum tuberosum', 'Sorghum bicolor', 'Spinacia oleracea', 'Staphylococcus aureus', 'Streptococcus pneumoniae', 'Streptomyces coelicolor', 'Strongylocentrotus purpuratus', 'Sulfolobus solfataricus', 'Sus scrofa', 'Synechocystis', 'Thalassiosira pseudonana', 'Theobroma cacao', 'Thermococcus kodakaraensis', 'Thermodesulfovibrio yellowstonii', 'Thermotoga maritima', 'Tribolium castaneum', 'Trichomonas vaginalis', 'Trichoplax adhaerens', 'Triticum aestivum', 'Trypanosoma brucei', 'Ustilago maydis', 'Vibrio cholerae', 'Vitis vinifera', 'Xanthomonas campestris', 'Xenopus laevis', 'Xenopus tropicalis', 'Yarrowia lipolytica', 'Yersinia pestis', 'Zea mays', 'Zostera marina', 'helobdella robusta', 'lepisosteus oculatus', 'mycoplasma genitalium']] = 'auto', gene_id_type: Union[str, Literal['auto'], Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 
'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB']] = 'auto', non_unique_mode: Literal['first', 'last', 'random', 'none'] = 'first', remove_unmapped_genes: bool = False, inplace: bool = True)
+map_orthologs_orthoinspector(map_to_organism: str | int | Literal['Amborella trichopoda', 'Anolis carolinensis', 'Anopheles gambiae', 'Aquifex aeolicus', 'Arabidopsis thaliana', 'Bacillus cereus', 'Bacillus subtilis', 'Bacteroides thetaiotaomicron', 'Batrachochytrium dendrobatidis', 'Bos taurus', 'Brachypodium distachyon', 'Bradyrhizobium diazoefficiens', 'Branchiostoma floridae', 'Brassica campestris', 'Brassica napus', 'Caenorhabditis briggsae', 'Caenorhabditis elegans', 'Candida albicans', 'Canis lupus familiaris', 'Capsicum annuum', 'Chlamydia trachomatis', 'Chlamydomonas reinhardtii', 'Chloroflexus aurantiacus', 'Ciona intestinalis', 'Citrus sinensis', 'Clostridium botulinum', 'Coxiella burnetii', 'Cryptococcus neoformans', 'Cucumis sativus', 'Danio rerio', 'Daphnia pulex', 'Deinococcus radiodurans', 'Dictyoglomus turgidum', 'Dictyostelium discoideum', 'Dictyostelium purpureum', 'Drosophila melanogaster', 'Emericella nidulans', 'Entamoeba histolytica', 'Equus caballus', 'Eremothecium gossypii', 'Erythranthe guttata', 'Escherichia coli', 'Eucalyptus grandis', 'Felis catus', 'Fusobacterium nucleatum', 'Gallus gallus', 'Geobacter sulfurreducens', 'Giardia intestinalis', 'Gloeobacter violaceus', 'Glycine max', 'Gorilla gorilla gorilla', 'Gossypium hirsutum', 'Haemophilus influenzae', 'Halobacterium salinarum', 'Helianthus annuus', 'Helicobacter pylori', 'Homo sapiens', 'Hordeum vulgare subsp. vulgare', 'Ixodes scapularis', 'Juglans regia', 'Klebsormidium nitens', 'Korarchaeum cryptofilum', 'Lactuca sativa', 'Leishmania major', 'Leptospira interrogans', 'Listeria monocytogenes', 'Macaca mulatta', 'Manihot esculenta', 'Marchantia polymorpha', 'Medicago truncatula', 'Methanocaldococcus jannaschii', 'Methanosarcina acetivorans', 'Monodelphis domestica', 'Monosiga brevicollis', 'Mus musculus', 'Musa acuminata subsp. 
malaccensis', 'Mycobacterium tuberculosis', 'Neisseria meningitidis serogroup b', 'Nelumbo nucifera', 'Nematostella vectensis', 'Neosartorya fumigata', 'Neurospora crassa', 'Nicotiana tabacum', 'Nitrosopumilus maritimus', 'Ornithorhynchus anatinus', 'Oryza sativa', 'Oryzias latipes', 'Pan troglodytes', 'Paramecium tetraurelia', 'Phaeosphaeria nodorum', 'Physcomitrella patens', 'Phytophthora ramorum', 'Plasmodium falciparum', 'Populus trichocarpa', 'Pristionchus pacificus', 'Prunus persica', 'Pseudomonas aeruginosa', 'Puccinia graminis', 'Pyrobaculum aerophilum', 'Rattus norvegicus', 'Rhodopirellula baltica', 'Ricinus communis', 'Saccharomyces cerevisiae', 'Salmonella typhimurium', 'Schizosaccharomyces japonicus', 'Schizosaccharomyces pombe', 'Sclerotinia sclerotiorum', 'Selaginella moellendorffii', 'Setaria italica', 'Shewanella oneidensis', 'Solanum lycopersicum', 'Solanum tuberosum', 'Sorghum bicolor', 'Spinacia oleracea', 'Staphylococcus aureus', 'Streptococcus pneumoniae', 'Streptomyces coelicolor', 'Strongylocentrotus purpuratus', 'Sulfolobus solfataricus', 'Sus scrofa', 'Synechocystis', 'Thalassiosira pseudonana', 'Theobroma cacao', 'Thermococcus kodakaraensis', 'Thermodesulfovibrio yellowstonii', 'Thermotoga maritima', 'Tribolium castaneum', 'Trichomonas vaginalis', 'Trichoplax adhaerens', 'Triticum aestivum', 'Trypanosoma brucei', 'Ustilago maydis', 'Vibrio cholerae', 'Vitis vinifera', 'Xanthomonas campestris', 'Xenopus laevis', 'Xenopus tropicalis', 'Yarrowia lipolytica', 'Yersinia pestis', 'Zea mays', 'Zostera marina', 'helobdella robusta', 'lepisosteus oculatus', 'mycoplasma genitalium'], map_from_organism: Literal['auto'] | str | int | Literal['Amborella trichopoda', 'Anolis carolinensis', 'Anopheles gambiae', 'Aquifex aeolicus', 'Arabidopsis thaliana', 'Bacillus cereus', 'Bacillus subtilis', 'Bacteroides thetaiotaomicron', 'Batrachochytrium dendrobatidis', 'Bos taurus', 'Brachypodium distachyon', 'Bradyrhizobium diazoefficiens', 'Branchiostoma 
floridae', 'Brassica campestris', 'Brassica napus', 'Caenorhabditis briggsae', 'Caenorhabditis elegans', 'Candida albicans', 'Canis lupus familiaris', 'Capsicum annuum', 'Chlamydia trachomatis', 'Chlamydomonas reinhardtii', 'Chloroflexus aurantiacus', 'Ciona intestinalis', 'Citrus sinensis', 'Clostridium botulinum', 'Coxiella burnetii', 'Cryptococcus neoformans', 'Cucumis sativus', 'Danio rerio', 'Daphnia pulex', 'Deinococcus radiodurans', 'Dictyoglomus turgidum', 'Dictyostelium discoideum', 'Dictyostelium purpureum', 'Drosophila melanogaster', 'Emericella nidulans', 'Entamoeba histolytica', 'Equus caballus', 'Eremothecium gossypii', 'Erythranthe guttata', 'Escherichia coli', 'Eucalyptus grandis', 'Felis catus', 'Fusobacterium nucleatum', 'Gallus gallus', 'Geobacter sulfurreducens', 'Giardia intestinalis', 'Gloeobacter violaceus', 'Glycine max', 'Gorilla gorilla gorilla', 'Gossypium hirsutum', 'Haemophilus influenzae', 'Halobacterium salinarum', 'Helianthus annuus', 'Helicobacter pylori', 'Homo sapiens', 'Hordeum vulgare subsp. vulgare', 'Ixodes scapularis', 'Juglans regia', 'Klebsormidium nitens', 'Korarchaeum cryptofilum', 'Lactuca sativa', 'Leishmania major', 'Leptospira interrogans', 'Listeria monocytogenes', 'Macaca mulatta', 'Manihot esculenta', 'Marchantia polymorpha', 'Medicago truncatula', 'Methanocaldococcus jannaschii', 'Methanosarcina acetivorans', 'Monodelphis domestica', 'Monosiga brevicollis', 'Mus musculus', 'Musa acuminata subsp. 
malaccensis', 'Mycobacterium tuberculosis', 'Neisseria meningitidis serogroup b', 'Nelumbo nucifera', 'Nematostella vectensis', 'Neosartorya fumigata', 'Neurospora crassa', 'Nicotiana tabacum', 'Nitrosopumilus maritimus', 'Ornithorhynchus anatinus', 'Oryza sativa', 'Oryzias latipes', 'Pan troglodytes', 'Paramecium tetraurelia', 'Phaeosphaeria nodorum', 'Physcomitrella patens', 'Phytophthora ramorum', 'Plasmodium falciparum', 'Populus trichocarpa', 'Pristionchus pacificus', 'Prunus persica', 'Pseudomonas aeruginosa', 'Puccinia graminis', 'Pyrobaculum aerophilum', 'Rattus norvegicus', 'Rhodopirellula baltica', 'Ricinus communis', 'Saccharomyces cerevisiae', 'Salmonella typhimurium', 'Schizosaccharomyces japonicus', 'Schizosaccharomyces pombe', 'Sclerotinia sclerotiorum', 'Selaginella moellendorffii', 'Setaria italica', 'Shewanella oneidensis', 'Solanum lycopersicum', 'Solanum tuberosum', 'Sorghum bicolor', 'Spinacia oleracea', 'Staphylococcus aureus', 'Streptococcus pneumoniae', 'Streptomyces coelicolor', 'Strongylocentrotus purpuratus', 'Sulfolobus solfataricus', 'Sus scrofa', 'Synechocystis', 'Thalassiosira pseudonana', 'Theobroma cacao', 'Thermococcus kodakaraensis', 'Thermodesulfovibrio yellowstonii', 'Thermotoga maritima', 'Tribolium castaneum', 'Trichomonas vaginalis', 'Trichoplax adhaerens', 'Triticum aestivum', 'Trypanosoma brucei', 'Ustilago maydis', 'Vibrio cholerae', 'Vitis vinifera', 'Xanthomonas campestris', 'Xenopus laevis', 'Xenopus tropicalis', 'Yarrowia lipolytica', 'Yersinia pestis', 'Zea mays', 'Zostera marina', 'helobdella robusta', 'lepisosteus oculatus', 'mycoplasma genitalium'] = 'auto', gene_id_type: str | Literal['auto'] | Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 'ArachnoServer', 
'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB'] = 'auto', non_unique_mode: Literal['first', 'last', 'random', 'none'] = 'first', remove_unmapped_genes: bool = False, inplace: bool = True)
Map genes to their nearest orthologs in a different species using the OrthoInspector database. This function generates and returns a table describing all discovered ortholog pairs (both unique and non-unique). It can also translate the genes in this data table into their nearest orthologs and remove unmapped genes.
- Parameters:
@@ -2329,7 +2329,7 @@ rnalysis.filtering module
-
-map_orthologs_panther(map_to_organism: Union[str, int, Literal['Amborella trichopoda', 'Anolis carolinensis', 'Anopheles gambiae', 'Aquifex aeolicus', 'Arabidopsis thaliana', 'Bacillus cereus', 'Bacillus subtilis', 'Bacteroides thetaiotaomicron', 'Batrachochytrium dendrobatidis', 'Bos taurus', 'Brachypodium distachyon', 'Bradyrhizobium diazoefficiens', 'Branchiostoma floridae', 'Brassica campestris', 'Brassica napus', 'Caenorhabditis briggsae', 'Caenorhabditis elegans', 'Candida albicans', 'Canis lupus familiaris', 'Capsicum annuum', 'Chlamydia trachomatis', 'Chlamydomonas reinhardtii', 'Chloroflexus aurantiacus', 'Ciona intestinalis', 'Citrus sinensis', 'Clostridium botulinum', 'Coxiella burnetii', 'Cryptococcus neoformans', 'Cucumis sativus', 'Danio rerio', 'Daphnia pulex', 'Deinococcus radiodurans', 'Dictyoglomus turgidum', 'Dictyostelium discoideum', 'Dictyostelium purpureum', 'Drosophila melanogaster', 'Emericella nidulans', 'Entamoeba histolytica', 'Equus caballus', 'Eremothecium gossypii', 'Erythranthe guttata', 'Escherichia coli', 'Eucalyptus grandis', 'Felis catus', 'Fusobacterium nucleatum', 'Gallus gallus', 'Geobacter sulfurreducens', 'Giardia intestinalis', 'Gloeobacter violaceus', 'Glycine max', 'Gorilla gorilla gorilla', 'Gossypium hirsutum', 'Haemophilus influenzae', 'Halobacterium salinarum', 'Helianthus annuus', 'Helicobacter pylori', 'Homo sapiens', 'Hordeum vulgare subsp. vulgare', 'Ixodes scapularis', 'Juglans regia', 'Klebsormidium nitens', 'Korarchaeum cryptofilum', 'Lactuca sativa', 'Leishmania major', 'Leptospira interrogans', 'Listeria monocytogenes', 'Macaca mulatta', 'Manihot esculenta', 'Marchantia polymorpha', 'Medicago truncatula', 'Methanocaldococcus jannaschii', 'Methanosarcina acetivorans', 'Monodelphis domestica', 'Monosiga brevicollis', 'Mus musculus', 'Musa acuminata subsp. 
malaccensis', 'Mycobacterium tuberculosis', 'Neisseria meningitidis serogroup b', 'Nelumbo nucifera', 'Nematostella vectensis', 'Neosartorya fumigata', 'Neurospora crassa', 'Nicotiana tabacum', 'Nitrosopumilus maritimus', 'Ornithorhynchus anatinus', 'Oryza sativa', 'Oryzias latipes', 'Pan troglodytes', 'Paramecium tetraurelia', 'Phaeosphaeria nodorum', 'Physcomitrella patens', 'Phytophthora ramorum', 'Plasmodium falciparum', 'Populus trichocarpa', 'Pristionchus pacificus', 'Prunus persica', 'Pseudomonas aeruginosa', 'Puccinia graminis', 'Pyrobaculum aerophilum', 'Rattus norvegicus', 'Rhodopirellula baltica', 'Ricinus communis', 'Saccharomyces cerevisiae', 'Salmonella typhimurium', 'Schizosaccharomyces japonicus', 'Schizosaccharomyces pombe', 'Sclerotinia sclerotiorum', 'Selaginella moellendorffii', 'Setaria italica', 'Shewanella oneidensis', 'Solanum lycopersicum', 'Solanum tuberosum', 'Sorghum bicolor', 'Spinacia oleracea', 'Staphylococcus aureus', 'Streptococcus pneumoniae', 'Streptomyces coelicolor', 'Strongylocentrotus purpuratus', 'Sulfolobus solfataricus', 'Sus scrofa', 'Synechocystis', 'Thalassiosira pseudonana', 'Theobroma cacao', 'Thermococcus kodakaraensis', 'Thermodesulfovibrio yellowstonii', 'Thermotoga maritima', 'Tribolium castaneum', 'Trichomonas vaginalis', 'Trichoplax adhaerens', 'Triticum aestivum', 'Trypanosoma brucei', 'Ustilago maydis', 'Vibrio cholerae', 'Vitis vinifera', 'Xanthomonas campestris', 'Xenopus laevis', 'Xenopus tropicalis', 'Yarrowia lipolytica', 'Yersinia pestis', 'Zea mays', 'Zostera marina', 'helobdella robusta', 'lepisosteus oculatus', 'mycoplasma genitalium']], map_from_organism: Union[Literal['auto'], str, int, Literal['Amborella trichopoda', 'Anolis carolinensis', 'Anopheles gambiae', 'Aquifex aeolicus', 'Arabidopsis thaliana', 'Bacillus cereus', 'Bacillus subtilis', 'Bacteroides thetaiotaomicron', 'Batrachochytrium dendrobatidis', 'Bos taurus', 'Brachypodium distachyon', 'Bradyrhizobium diazoefficiens', 'Branchiostoma 
floridae', 'Brassica campestris', 'Brassica napus', 'Caenorhabditis briggsae', 'Caenorhabditis elegans', 'Candida albicans', 'Canis lupus familiaris', 'Capsicum annuum', 'Chlamydia trachomatis', 'Chlamydomonas reinhardtii', 'Chloroflexus aurantiacus', 'Ciona intestinalis', 'Citrus sinensis', 'Clostridium botulinum', 'Coxiella burnetii', 'Cryptococcus neoformans', 'Cucumis sativus', 'Danio rerio', 'Daphnia pulex', 'Deinococcus radiodurans', 'Dictyoglomus turgidum', 'Dictyostelium discoideum', 'Dictyostelium purpureum', 'Drosophila melanogaster', 'Emericella nidulans', 'Entamoeba histolytica', 'Equus caballus', 'Eremothecium gossypii', 'Erythranthe guttata', 'Escherichia coli', 'Eucalyptus grandis', 'Felis catus', 'Fusobacterium nucleatum', 'Gallus gallus', 'Geobacter sulfurreducens', 'Giardia intestinalis', 'Gloeobacter violaceus', 'Glycine max', 'Gorilla gorilla gorilla', 'Gossypium hirsutum', 'Haemophilus influenzae', 'Halobacterium salinarum', 'Helianthus annuus', 'Helicobacter pylori', 'Homo sapiens', 'Hordeum vulgare subsp. vulgare', 'Ixodes scapularis', 'Juglans regia', 'Klebsormidium nitens', 'Korarchaeum cryptofilum', 'Lactuca sativa', 'Leishmania major', 'Leptospira interrogans', 'Listeria monocytogenes', 'Macaca mulatta', 'Manihot esculenta', 'Marchantia polymorpha', 'Medicago truncatula', 'Methanocaldococcus jannaschii', 'Methanosarcina acetivorans', 'Monodelphis domestica', 'Monosiga brevicollis', 'Mus musculus', 'Musa acuminata subsp. 
malaccensis', 'Mycobacterium tuberculosis', 'Neisseria meningitidis serogroup b', 'Nelumbo nucifera', 'Nematostella vectensis', 'Neosartorya fumigata', 'Neurospora crassa', 'Nicotiana tabacum', 'Nitrosopumilus maritimus', 'Ornithorhynchus anatinus', 'Oryza sativa', 'Oryzias latipes', 'Pan troglodytes', 'Paramecium tetraurelia', 'Phaeosphaeria nodorum', 'Physcomitrella patens', 'Phytophthora ramorum', 'Plasmodium falciparum', 'Populus trichocarpa', 'Pristionchus pacificus', 'Prunus persica', 'Pseudomonas aeruginosa', 'Puccinia graminis', 'Pyrobaculum aerophilum', 'Rattus norvegicus', 'Rhodopirellula baltica', 'Ricinus communis', 'Saccharomyces cerevisiae', 'Salmonella typhimurium', 'Schizosaccharomyces japonicus', 'Schizosaccharomyces pombe', 'Sclerotinia sclerotiorum', 'Selaginella moellendorffii', 'Setaria italica', 'Shewanella oneidensis', 'Solanum lycopersicum', 'Solanum tuberosum', 'Sorghum bicolor', 'Spinacia oleracea', 'Staphylococcus aureus', 'Streptococcus pneumoniae', 'Streptomyces coelicolor', 'Strongylocentrotus purpuratus', 'Sulfolobus solfataricus', 'Sus scrofa', 'Synechocystis', 'Thalassiosira pseudonana', 'Theobroma cacao', 'Thermococcus kodakaraensis', 'Thermodesulfovibrio yellowstonii', 'Thermotoga maritima', 'Tribolium castaneum', 'Trichomonas vaginalis', 'Trichoplax adhaerens', 'Triticum aestivum', 'Trypanosoma brucei', 'Ustilago maydis', 'Vibrio cholerae', 'Vitis vinifera', 'Xanthomonas campestris', 'Xenopus laevis', 'Xenopus tropicalis', 'Yarrowia lipolytica', 'Yersinia pestis', 'Zea mays', 'Zostera marina', 'helobdella robusta', 'lepisosteus oculatus', 'mycoplasma genitalium']] = 'auto', gene_id_type: Union[str, Literal['auto'], Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 
'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB']] = 'auto', filter_least_diverged: bool = True, non_unique_mode: Literal['first', 'last', 'random', 'none'] = 'first', remove_unmapped_genes: bool = False, inplace: bool = True)
+map_orthologs_panther(map_to_organism: str | int | Literal['Amborella trichopoda', 'Anolis carolinensis', 'Anopheles gambiae', 'Aquifex aeolicus', 'Arabidopsis thaliana', 'Bacillus cereus', 'Bacillus subtilis', 'Bacteroides thetaiotaomicron', 'Batrachochytrium dendrobatidis', 'Bos taurus', 'Brachypodium distachyon', 'Bradyrhizobium diazoefficiens', 'Branchiostoma floridae', 'Brassica campestris', 'Brassica napus', 'Caenorhabditis briggsae', 'Caenorhabditis elegans', 'Candida albicans', 'Canis lupus familiaris', 'Capsicum annuum', 'Chlamydia trachomatis', 'Chlamydomonas reinhardtii', 'Chloroflexus aurantiacus', 'Ciona intestinalis', 'Citrus sinensis', 'Clostridium botulinum', 'Coxiella burnetii', 'Cryptococcus neoformans', 'Cucumis sativus', 'Danio rerio', 'Daphnia pulex', 'Deinococcus radiodurans', 'Dictyoglomus turgidum', 'Dictyostelium discoideum', 'Dictyostelium purpureum', 'Drosophila melanogaster', 'Emericella nidulans', 'Entamoeba histolytica', 'Equus caballus', 'Eremothecium gossypii', 'Erythranthe guttata', 'Escherichia coli', 'Eucalyptus grandis', 'Felis catus', 'Fusobacterium nucleatum', 'Gallus gallus', 'Geobacter sulfurreducens', 'Giardia intestinalis', 'Gloeobacter violaceus', 'Glycine max', 'Gorilla gorilla gorilla', 'Gossypium hirsutum', 'Haemophilus influenzae', 'Halobacterium salinarum', 'Helianthus annuus', 'Helicobacter pylori', 'Homo sapiens', 'Hordeum vulgare subsp. vulgare', 'Ixodes scapularis', 'Juglans regia', 'Klebsormidium nitens', 'Korarchaeum cryptofilum', 'Lactuca sativa', 'Leishmania major', 'Leptospira interrogans', 'Listeria monocytogenes', 'Macaca mulatta', 'Manihot esculenta', 'Marchantia polymorpha', 'Medicago truncatula', 'Methanocaldococcus jannaschii', 'Methanosarcina acetivorans', 'Monodelphis domestica', 'Monosiga brevicollis', 'Mus musculus', 'Musa acuminata subsp. 
malaccensis', 'Mycobacterium tuberculosis', 'Neisseria meningitidis serogroup b', 'Nelumbo nucifera', 'Nematostella vectensis', 'Neosartorya fumigata', 'Neurospora crassa', 'Nicotiana tabacum', 'Nitrosopumilus maritimus', 'Ornithorhynchus anatinus', 'Oryza sativa', 'Oryzias latipes', 'Pan troglodytes', 'Paramecium tetraurelia', 'Phaeosphaeria nodorum', 'Physcomitrella patens', 'Phytophthora ramorum', 'Plasmodium falciparum', 'Populus trichocarpa', 'Pristionchus pacificus', 'Prunus persica', 'Pseudomonas aeruginosa', 'Puccinia graminis', 'Pyrobaculum aerophilum', 'Rattus norvegicus', 'Rhodopirellula baltica', 'Ricinus communis', 'Saccharomyces cerevisiae', 'Salmonella typhimurium', 'Schizosaccharomyces japonicus', 'Schizosaccharomyces pombe', 'Sclerotinia sclerotiorum', 'Selaginella moellendorffii', 'Setaria italica', 'Shewanella oneidensis', 'Solanum lycopersicum', 'Solanum tuberosum', 'Sorghum bicolor', 'Spinacia oleracea', 'Staphylococcus aureus', 'Streptococcus pneumoniae', 'Streptomyces coelicolor', 'Strongylocentrotus purpuratus', 'Sulfolobus solfataricus', 'Sus scrofa', 'Synechocystis', 'Thalassiosira pseudonana', 'Theobroma cacao', 'Thermococcus kodakaraensis', 'Thermodesulfovibrio yellowstonii', 'Thermotoga maritima', 'Tribolium castaneum', 'Trichomonas vaginalis', 'Trichoplax adhaerens', 'Triticum aestivum', 'Trypanosoma brucei', 'Ustilago maydis', 'Vibrio cholerae', 'Vitis vinifera', 'Xanthomonas campestris', 'Xenopus laevis', 'Xenopus tropicalis', 'Yarrowia lipolytica', 'Yersinia pestis', 'Zea mays', 'Zostera marina', 'helobdella robusta', 'lepisosteus oculatus', 'mycoplasma genitalium'], map_from_organism: Literal['auto'] | str | int | Literal['Amborella trichopoda', 'Anolis carolinensis', 'Anopheles gambiae', 'Aquifex aeolicus', 'Arabidopsis thaliana', 'Bacillus cereus', 'Bacillus subtilis', 'Bacteroides thetaiotaomicron', 'Batrachochytrium dendrobatidis', 'Bos taurus', 'Brachypodium distachyon', 'Bradyrhizobium diazoefficiens', 'Branchiostoma 
floridae', 'Brassica campestris', 'Brassica napus', 'Caenorhabditis briggsae', 'Caenorhabditis elegans', 'Candida albicans', 'Canis lupus familiaris', 'Capsicum annuum', 'Chlamydia trachomatis', 'Chlamydomonas reinhardtii', 'Chloroflexus aurantiacus', 'Ciona intestinalis', 'Citrus sinensis', 'Clostridium botulinum', 'Coxiella burnetii', 'Cryptococcus neoformans', 'Cucumis sativus', 'Danio rerio', 'Daphnia pulex', 'Deinococcus radiodurans', 'Dictyoglomus turgidum', 'Dictyostelium discoideum', 'Dictyostelium purpureum', 'Drosophila melanogaster', 'Emericella nidulans', 'Entamoeba histolytica', 'Equus caballus', 'Eremothecium gossypii', 'Erythranthe guttata', 'Escherichia coli', 'Eucalyptus grandis', 'Felis catus', 'Fusobacterium nucleatum', 'Gallus gallus', 'Geobacter sulfurreducens', 'Giardia intestinalis', 'Gloeobacter violaceus', 'Glycine max', 'Gorilla gorilla gorilla', 'Gossypium hirsutum', 'Haemophilus influenzae', 'Halobacterium salinarum', 'Helianthus annuus', 'Helicobacter pylori', 'Homo sapiens', 'Hordeum vulgare subsp. vulgare', 'Ixodes scapularis', 'Juglans regia', 'Klebsormidium nitens', 'Korarchaeum cryptofilum', 'Lactuca sativa', 'Leishmania major', 'Leptospira interrogans', 'Listeria monocytogenes', 'Macaca mulatta', 'Manihot esculenta', 'Marchantia polymorpha', 'Medicago truncatula', 'Methanocaldococcus jannaschii', 'Methanosarcina acetivorans', 'Monodelphis domestica', 'Monosiga brevicollis', 'Mus musculus', 'Musa acuminata subsp. 
malaccensis', 'Mycobacterium tuberculosis', 'Neisseria meningitidis serogroup b', 'Nelumbo nucifera', 'Nematostella vectensis', 'Neosartorya fumigata', 'Neurospora crassa', 'Nicotiana tabacum', 'Nitrosopumilus maritimus', 'Ornithorhynchus anatinus', 'Oryza sativa', 'Oryzias latipes', 'Pan troglodytes', 'Paramecium tetraurelia', 'Phaeosphaeria nodorum', 'Physcomitrella patens', 'Phytophthora ramorum', 'Plasmodium falciparum', 'Populus trichocarpa', 'Pristionchus pacificus', 'Prunus persica', 'Pseudomonas aeruginosa', 'Puccinia graminis', 'Pyrobaculum aerophilum', 'Rattus norvegicus', 'Rhodopirellula baltica', 'Ricinus communis', 'Saccharomyces cerevisiae', 'Salmonella typhimurium', 'Schizosaccharomyces japonicus', 'Schizosaccharomyces pombe', 'Sclerotinia sclerotiorum', 'Selaginella moellendorffii', 'Setaria italica', 'Shewanella oneidensis', 'Solanum lycopersicum', 'Solanum tuberosum', 'Sorghum bicolor', 'Spinacia oleracea', 'Staphylococcus aureus', 'Streptococcus pneumoniae', 'Streptomyces coelicolor', 'Strongylocentrotus purpuratus', 'Sulfolobus solfataricus', 'Sus scrofa', 'Synechocystis', 'Thalassiosira pseudonana', 'Theobroma cacao', 'Thermococcus kodakaraensis', 'Thermodesulfovibrio yellowstonii', 'Thermotoga maritima', 'Tribolium castaneum', 'Trichomonas vaginalis', 'Trichoplax adhaerens', 'Triticum aestivum', 'Trypanosoma brucei', 'Ustilago maydis', 'Vibrio cholerae', 'Vitis vinifera', 'Xanthomonas campestris', 'Xenopus laevis', 'Xenopus tropicalis', 'Yarrowia lipolytica', 'Yersinia pestis', 'Zea mays', 'Zostera marina', 'helobdella robusta', 'lepisosteus oculatus', 'mycoplasma genitalium'] = 'auto', gene_id_type: str | Literal['auto'] | Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 'ArachnoServer', 
'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB'] = 'auto', filter_least_diverged: bool = True, non_unique_mode: Literal['first', 'last', 'random', 'none'] = 'first', remove_unmapped_genes: bool = False, inplace: bool = True)
Map genes to their nearest orthologs in a different species using the PantherDB database. This function generates a table describing all matching discovered ortholog pairs (both unique and non-unique) and returns it, and can also translate the genes in this data table into their nearest ortholog, as well as remove unmapped genes.
- Parameters:
@@ -2351,7 +2351,7 @@ rnalysis.filtering module
-
-map_orthologs_phylomedb(map_to_organism: Union[str, int, Literal], map_from_organism: Union[Literal['auto'], str, int, Literal] = 'auto', gene_id_type: Union[str, Literal['auto'], Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB']] = 'auto', consistency_score_threshold: Fraction = 0.5, filter_consistency_score: bool = True, non_unique_mode: Literal['first', 'last', 'random', 'none'] = 'first', remove_unmapped_genes: bool = False, inplace: bool = True)
+map_orthologs_phylomedb(map_to_organism: str | int | Literal, map_from_organism: Literal['auto'] | str | int | Literal = 'auto', gene_id_type: str | Literal['auto'] | Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB'] = 'auto', consistency_score_threshold: Fraction = 0.5, filter_consistency_score: bool = True, non_unique_mode: Literal['first', 'last', 'random', 'none'] = 'first', remove_unmapped_genes: bool = False, inplace: bool = True)
Map genes to their nearest orthologs in a different species using the PhylomeDB database. This function generates a table describing all matching discovered ortholog pairs (both unique and non-unique) and returns it, and can also translate the genes in this data table into their nearest ortholog, as well as remove unmapped genes.
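The way a score threshold and `non_unique_mode` might interact can be illustrated with a minimal sketch. The helper `resolve_orthologs`, the `(gene, ortholog, score)` tuple layout, and the meaning given to each mode (in particular, that 'none' drops genes with more than one candidate) are assumptions made for illustration only; they are not taken from the RNAlysis implementation.

```python
import random
from collections import defaultdict

VALID_MODES = ("first", "last", "random", "none")


def resolve_orthologs(pairs, threshold=0.5, non_unique_mode="first", seed=0):
    """Pick at most one ortholog per gene from (gene, ortholog, score) tuples.

    Hypothetical helper: the semantics of each mode are assumed, not documented.
    """
    if non_unique_mode not in VALID_MODES:
        raise ValueError(f"unknown non_unique_mode: {non_unique_mode!r}")

    # Keep only pairs whose score reaches the threshold (assumed filtering rule).
    candidates = defaultdict(list)
    for gene, ortholog, score in pairs:
        if score >= threshold:
            candidates[gene].append(ortholog)

    rng = random.Random(seed)
    mapping = {}
    for gene, orthologs in candidates.items():
        if len(orthologs) == 1 or non_unique_mode == "first":
            mapping[gene] = orthologs[0]
        elif non_unique_mode == "last":
            mapping[gene] = orthologs[-1]
        elif non_unique_mode == "random":
            mapping[gene] = rng.choice(orthologs)
        # non_unique_mode == "none": assumed to leave ambiguous genes unmapped
    return mapping


example_pairs = [
    ("g1", "o1", 0.9),
    ("g1", "o2", 0.8),
    ("g2", "o3", 0.4),
    ("g3", "o4", 0.7),
]
first_mapping = resolve_orthologs(example_pairs, non_unique_mode="first")
none_mapping = resolve_orthologs(example_pairs, non_unique_mode="none")
```

In this sketch, `g2` is left unmapped in every mode because its only candidate falls below the threshold, which is the kind of gene that `remove_unmapped_genes=True` would presumably discard.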
@@ -2419,7 +2419,7 @@ rnalysis.filtering module
- Parameters:
-
_FASTQPipeline
Returns a string function signature for the given function and arguments.
- Parameters: @@ -175,7 +175,7 @@
- -_readable_func_signature(func: function, args: tuple, kwargs: dict) +_readable_func_signature(func: LambdaType, args: tuple, kwargs: dict)
Returns a human-readable string function signature for the given function and arguments.
- Parameters: @@ -196,7 +196,7 @@
- -export_pipeline(filename: Optional[Union[str, Path]]) Union[None, str] +export_pipeline(filename: str | Path | None) None | str
Export a Pipeline to a Pipeline YAML file or YAML-like string.
- Parameters: @@ -216,7 +216,7 @@
- -classmethod import_pipeline(filename: Union[str, Path]) GenericPipeline +classmethod import_pipeline(filename: str | Path) GenericPipeline
Import a Pipeline from a Pipeline YAML file or YAML-like string.
- Parameters: @@ -263,7 +263,7 @@
- -_func_signature(func: function, args: tuple, kwargs: dict) +_func_signature(func: LambdaType, args: tuple, kwargs: dict)
Returns a string function signature for the given function and arguments.
- Parameters: @@ -304,7 +304,7 @@
- -_readable_func_signature(func: function, args: tuple, kwargs: dict) +_readable_func_signature(func: LambdaType, args: tuple, kwargs: dict)
Returns a human-readable string function signature for the given function and arguments.
- Parameters: @@ -325,7 +325,7 @@
- -export_pipeline(filename: Optional[Union[str, Path]]) Union[None, str] +export_pipeline(filename: str | Path | None) None | str
Export a Pipeline to a Pipeline YAML file or YAML-like string.
- Parameters: @@ -345,7 +345,7 @@
- -classmethod import_pipeline(filename: Union[str, Path]) GenericPipeline +classmethod import_pipeline(filename: str | Path) GenericPipeline
Import a Pipeline from a Pipeline YAML file or YAML-like string.
- Parameters: @@ -392,7 +392,7 @@
- -_func_signature(func: function, args: tuple, kwargs: dict) +_func_signature(func: LambdaType, args: tuple, kwargs: dict)
Returns a string function signature for the given function and arguments.
- Parameters: @@ -433,7 +433,7 @@
- -_readable_func_signature(func: function, args: tuple, kwargs: dict) +_readable_func_signature(func: LambdaType, args: tuple, kwargs: dict)
Returns a human-readable string function signature for the given function and arguments.
- Parameters: @@ -454,7 +454,7 @@
- -export_pipeline(filename: Optional[Union[str, Path]]) Union[None, str] +export_pipeline(filename: str | Path | None) None | str
Export a Pipeline to a Pipeline YAML file or YAML-like string.
- Parameters: @@ -474,7 +474,7 @@
- -classmethod import_pipeline(filename: Union[str, Path]) GenericPipeline +classmethod import_pipeline(filename: str | Path) GenericPipeline
Import a Pipeline from a Pipeline YAML file or YAML-like string.
- Parameters: @@ -517,32 +517,32 @@
- -rnalysis.fastq._merge_kallisto_outputs(output_folder: Union[str, Path], new_sample_names: List[str]) +rnalysis.fastq._merge_kallisto_outputs(output_folder: str | Path, new_sample_names: List[str])
Outputs a merged CSV file of transcript estimated counts, and a merged CSV file of transcript estimated TPMs.
- -rnalysis.fastq.bowtie2_align_paired_end(r1_files: List[str], r2_files: List[str], output_folder: Union[str, Path], index_file: Union[str, Path], bowtie2_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto', 'smart']] = 'smart', mode: Literal['end-to-end', 'local'] = 'end-to-end', settings_preset: Literal['very-fast', 'fast', 'sensitive', 'very-sensitive'] = 'very-sensitive', ignore_qualities: bool = False, quality_score_type: Literal['phred33', 'phred64', 'solexa-quals', 'int-quals'] = 'phred33', mate_orientations: Literal['fwd-rev', 'rev-fwd', 'fwd-fwd'] = 'fwd-rev', min_fragment_length: NonNegativeInt = 0, max_fragment_length: PositiveInt = 500, allow_individual_alignment: bool = True, allow_disconcordant_alignment: bool = True, random_seed: NonNegativeInt = 0, threads: PositiveInt = 1) +rnalysis.fastq.bowtie2_align_paired_end(r1_files: List[str], r2_files: List[str], output_folder: str | Path, index_file: str | Path, bowtie2_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto', 'smart'] = 'smart', mode: Literal['end-to-end', 'local'] = 'end-to-end', settings_preset: Literal['very-fast', 'fast', 'sensitive', 'very-sensitive'] = 'very-sensitive', ignore_qualities: bool = False, quality_score_type: Literal['phred33', 'phred64', 'solexa-quals', 'int-quals'] = 'phred33', mate_orientations: Literal['fwd-rev', 'rev-fwd', 'fwd-fwd'] = 'fwd-rev', min_fragment_length: NonNegativeInt = 0, max_fragment_length: PositiveInt = 500, allow_individual_alignment: bool = True, allow_disconcordant_alignment: bool = True, random_seed: NonNegativeInt = 0, threads: PositiveInt = 1)
Align paired-end reads from FASTQ files to a reference sequence using the bowtie2 aligner. The FASTQ file pairs will be individually aligned, and the aligned SAM files will be saved in the output folder. You can read more about how bowtie2 works in the bowtie2 manual.
- Parameters:
-
-
r1_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#1 files. The files should be sorted in tandem with r2_files, so that they line up to form pairs of R1 and R2 files.
-r2_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#2 files. The files should be sorted in tandem with r1_files, so that they line up to form pairs of R1 and R2 files.
+r1_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#1 files. The files should be sorted in tandem with r2_files, so that they line up to form pairs of R1 and R2 files.
+r2_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#2 files. The files should be sorted in tandem with r1_files, so that they line up to form pairs of R1 and R2 files.
output_folder (str/Path to an existing folder) – Path to a folder in which the aligned reads, as well as the log files, will be saved.
index_file (str or Path) – Path to a pre-built bowtie2 index of the target genome. Can either be downloaded from the bowtie2 website (menu on the right), or generated manually from FASTA files using the function ‘bowtie2_create_index’. Note that bowtie2 indices are composed of multiple files ending with the ‘.bt2’ suffix. All of those files should be in the same location. It is enough to specify the path to one of those files (for example, ‘path/to/index.1.bt2’), or to the main name of the index (for example, ‘path/to/index’).
bowtie2_installation_folder (str, Path, or 'auto' (default='auto')) – Path to the installation folder of bowtie2. For example: ‘C:/Program Files/bowtie2-2.5.1’. If the installation folder is set to ‘auto’, RNAlysis will attempt to find it automatically.
-new_sample_names (list of str, 'auto', or 'smart' (default='smart')) – Give a new name to each quantified sample (optional). If new_sample_names=’auto’, sample names will be given automatically. If a list is given, it should contain the new names, with the order of the names matching the order of the file pairs.
+new_sample_names (list of str, 'auto', or 'smart' (default='smart')) – Give a new name to each quantified sample (optional). If new_sample_names=’auto’, sample names will be given automatically. If a list is given, it should contain the new names, with the order of the names matching the order of the file pairs.
mode ('end-to-end' or 'local' (default='end-to-end')) – determines the alignment mode of bowtie2. end-to-end mode will look for alignments involving all the read characters. local mode will allow ‘clipping’ of nucleotides from both sides of the read, if that maximizes the alignment score.
settings_preset ('very-sensitive', 'sensitive', 'fast', or 'very-fast' (default='very-sensitive')) – determines the alignment sensitivity preset. Higher sensitivity will result in more accurate alignments, but will take longer to calculate. You can read more about the settings presets in the bowtie2 manual.
ignore_qualities (bool (default=False)) – if True, bowtie2 will ignore the qualities of the reads and treat them all as maximum quality.
quality_score_type ('phred33', 'phred64', 'solexa-quals', or 'int-quals' (default='phred33')) – determines the encoding type of the read quality scores. Most modern sequencing setups use phred+33.
-mate_orientations ('fwd-rev', 'rev-fwd', or 'fwd-fwd' (default='fwd-rev')) –
+mate_orientations ('fwd-rev', 'rev-fwd', or 'fwd-fwd' (default='fwd-rev'))
min_fragment_length (int >= 0 (default=0)) – The minimum fragment length for valid paired-end alignments.
max_fragment_length (int > 0 (default=500)) – The maximum fragment length for valid paired-end alignments.
-allow_individual_alignment (bool (default=True)) –
-allow_disconcordant_alignment (bool (default=True)) –
+allow_individual_alignment (bool (default=True))
+allow_disconcordant_alignment (bool (default=True))
random_seed (int >=0 (default=0)) – determines the seed for pseudo-random number generator.
threads (int > 0 (default=1)) – number of threads to run bowtie2 on. More threads will generally make alignment faster.
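Two of the conventions described in the parameters above can be sketched in plain Python: R1 and R2 lists that line up position by position, and an index path that may point either at one ‘.bt2’ file or at the main name of the index. The helpers `pair_read_files` and `index_basename` are hypothetical names introduced for illustration; they are not part of RNAlysis and do not describe how it resolves these inputs internally.

```python
import re


def pair_read_files(r1_files, r2_files):
    """Zip two lists that are already sorted in tandem into (R1, R2) pairs."""
    if len(r1_files) != len(r2_files):
        raise ValueError("r1_files and r2_files must have the same length")
    return list(zip(r1_files, r2_files))


def index_basename(index_file):
    """Strip a trailing '.N.bt2' or '.rev.N.bt2' suffix, if there is one."""
    return re.sub(r"(\.rev)?\.\d+\.bt2$", "", str(index_file))


pairs = pair_read_files(
    ["sampleA_R1.fastq", "sampleB_R1.fastq"],
    ["sampleA_R2.fastq", "sampleB_R2.fastq"],
)
base_from_file = index_basename("path/to/index.1.bt2")
base_from_name = index_basename("path/to/index")
```

Checking the pairing up front is cheap and catches the most common mistake, a missing mate file, before any alignment is started.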
- -rnalysis.fastq.bowtie2_align_single_end(fastq_folder: Union[str, Path], output_folder: Union[str, Path], index_file: Union[str, Path], bowtie2_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto']] = 'auto', mode: Literal['end-to-end', 'local'] = 'end-to-end', settings_preset: Literal['very-fast', 'fast', 'sensitive', 'very-sensitive'] = 'very-sensitive', ignore_qualities: bool = False, quality_score_type: Literal['phred33', 'phred64', 'solexa-quals', 'int-quals'] = 'phred33', random_seed: NonNegativeInt = 0, threads: PositiveInt = 1)
+rnalysis.fastq.bowtie2_align_single_end(fastq_folder: str | Path, output_folder: str | Path, index_file: str | Path, bowtie2_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto'] = 'auto', mode: Literal['end-to-end', 'local'] = 'end-to-end', settings_preset: Literal['very-fast', 'fast', 'sensitive', 'very-sensitive'] = 'very-sensitive', ignore_qualities: bool = False, quality_score_type: Literal['phred33', 'phred64', 'solexa-quals', 'int-quals'] = 'phred33', random_seed: NonNegativeInt = 0, threads: PositiveInt = 1)
Align single-end reads from FASTQ files to a reference sequence using the bowtie2 aligner. The FASTQ files will be individually aligned, and the aligned SAM files will be saved in the output folder. You can read more about how bowtie2 works in the bowtie2 manual.
- Parameters: @@ -562,7 +562,7 @@
bowtie2_installation_folder (str, Path, or 'auto' (default='auto')) – Path to the installation folder of bowtie2. For example: ‘C:/Program Files/bowtie2-2.5.1’. If the installation folder is set to ‘auto’, RNAlysis will attempt to find it automatically.
-new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each quantified sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the files.
+new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each quantified sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the files.
mode ('end-to-end' or 'local' (default='end-to-end')) – determines the alignment mode of bowtie2. end-to-end mode will look for alignments involving all the read characters. local mode will allow ‘clipping’ of nucleotides from both sides of the read, if that maximizes the alignment score.
settings_preset ('very-sensitive', 'sensitive', 'fast', or 'very-fast' (default='very-sensitive')) – determines the alignment sensitivity preset. Higher sensitivity will result in more accurate alignments, but will take longer to calculate. You can read more about the settings presets in the bowtie2 manual.
ignore_qualities (bool (default=False)) – if True, bowtie2 will ignore the qualities of the reads and treat them all as maximum quality.
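For context on what the read qualities mentioned above encode, here is a minimal sketch of decoding a Phred+33 quality string, the default `quality_score_type` in the signature. The function names are hypothetical and the code is independent of both bowtie2 and RNAlysis.

```python
def decode_phred33(quality_string):
    """Convert a FASTQ quality line (Phred+33) into integer quality scores."""
    return [ord(char) - 33 for char in quality_string]


def error_probability(quality):
    """Probability that a base call is wrong for a given Phred score."""
    return 10 ** (-quality / 10)


scores = decode_phred33("!I5")
```

A score of 20 corresponds to a 1% chance of a wrong base call and a score of 40 to 0.01%, which is why treating every base as maximum quality (`ignore_qualities=True`) discards real information.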
@@ -576,12 +576,12 @@ - -rnalysis.fastq.bowtie2_create_index(genome_fastas: List[Union[str, Path]], output_folder: Union[str, Path], index_name: Union[str, Literal['auto']] = 'auto', bowtie2_installation_folder: Union[str, Path, Literal['auto']] = 'auto', random_seed: Optional[NonNegativeInt] = None, threads: PositiveInt = 1) +rnalysis.fastq.bowtie2_create_index(genome_fastas: List[str | Path], output_folder: str | Path, index_name: str | Literal['auto'] = 'auto', bowtie2_installation_folder: str | Path | Literal['auto'] = 'auto', random_seed: NonNegativeInt | None = None, threads: PositiveInt = 1)
Builds a bowtie2 index from FASTA-formatted files of target sequences (genome). The index files will be saved in the same folder as your first FASTA file, with the .bt2 suffix. Be aware that there are pre-built bowtie2 indices for popular model organisms. These can be downloaded from the bowtie2 website (from the menu on the right).
- Parameters:
-
-
genome_fastas (list of str or Path) – Path to the FASTA file/files which contain reference sequences to be aligned to.
+genome_fastas (list of str or Path) – Path to the FASTA file/files which contain reference sequences to be aligned to.
output_folder (str or Path) – Path to the folder in which the bowtie2 index files will be saved.
index_name (str or 'auto' (default='auto')) – The basename of the index files. bowtie2 will create files named index_name.1.bt2, index_name.2.bt2, index_name.3.bt2, index_name.4.bt2, index_name.rev.1.bt2, and index_name.rev.2.bt2. if index_name=’auto’, the index name used will be the stem of the first supplied genome FASTA file (for example: if the first genome FASTA file is ‘path/to/genome.fa.gz’, the index name will be ‘genome’).
bowtie2_installation_folder – Path to the installation folder of bowtie2. For example:
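The `index_name='auto'` rule and the list of index files described above can be sketched as follows. This is an illustration under stated assumptions: the helper names and the set of recognized FASTA suffixes are hypothetical, and RNAlysis may derive the stem differently.

```python
from pathlib import Path

# Assumed set of suffixes to strip; not taken from the RNAlysis source.
FASTA_SUFFIXES = {".fa", ".fasta", ".fna", ".gz"}


def auto_index_name(first_fasta):
    """Derive an index name from the first FASTA path by stripping known suffixes."""
    name = Path(first_fasta).name
    while Path(name).suffix.lower() in FASTA_SUFFIXES:
        name = Path(name).stem
    return name


def expected_index_files(index_name):
    """List the six file names that make up a bowtie2 index with this basename."""
    parts = ["1", "2", "3", "4", "rev.1", "rev.2"]
    return [f"{index_name}.{part}.bt2" for part in parts]


name = auto_index_name("path/to/genome.fa.gz")
files = expected_index_files(name)
```

Stripping suffixes in a loop matters because `Path.stem` removes only the last suffix, so a single call on ‘genome.fa.gz’ would yield ‘genome.fa’.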
@@ -598,7 +598,7 @@ - -rnalysis.fastq.convert_sam_format(input_folder: Union[str, Path], output_folder: Union[str, Path], picard_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto']] = 'auto', output_format: Literal['sam', 'bam'] = 'bam') +rnalysis.fastq.convert_sam_format(input_folder: str | Path, output_folder: str | Path, picard_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto'] = 'auto', output_format: Literal['sam', 'bam'] = 'bam')
Convert SAM files to BAM files or vice versa using Picard SamFormatConverter.
@@ -615,7 +615,7 @@ - -rnalysis.fastq.fastq_to_sam_paired(r1_files: List[str], r2_files: List[str], output_folder: Union[str, Path], picard_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto']] = 'auto', output_format: Literal['sam', 'bam'] = 'bam', quality_score_type: Union[Literal['auto'], Literal['phred33', 'phred64', 'solexa-quals', 'int-quals']] = 'auto') +rnalysis.fastq.fastq_to_sam_paired(r1_files: List[str], r2_files: List[str], output_folder: str | Path, picard_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto'] = 'auto', output_format: Literal['sam', 'bam'] = 'bam', quality_score_type: Literal['auto'] | Literal['phred33', 'phred64', 'solexa-quals', 'int-quals'] = 'auto')
Convert paired-end FASTQ files to SAM/BAM files using Picard FastqToSam.
- Returns: @@ -637,7 +637,7 @@
- -rnalysis.fastq.fastq_to_sam_single(input_folder: Union[str, Path], output_folder: Union[str, Path], picard_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto']] = 'auto', output_format: Literal['sam', 'bam'] = 'bam', quality_score_type: Union[Literal['auto'], Literal['phred33', 'phred64', 'solexa-quals', 'int-quals']] = 'auto') +rnalysis.fastq.fastq_to_sam_single(input_folder: str | Path, output_folder: str | Path, picard_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto'] = 'auto', output_format: Literal['sam', 'bam'] = 'bam', quality_score_type: Literal['auto'] | Literal['phred33', 'phred64', 'solexa-quals', 'int-quals'] = 'auto')
Convert single-end FASTQ files to SAM/BAM files using Picard FastqToSam.
- Returns: @@ -659,7 +659,7 @@
- -rnalysis.fastq.featurecounts_paired_end(input_folder: Union[str, Path], output_folder: Union[str, Path], gtf_file: Union[str, Path], gtf_feature_type: str = 'exon', gtf_attr_name: str = 'gene_id', r_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto']] = 'auto', stranded: Literal['no', 'forward', 'reverse'] = 'no', min_mapping_quality: int = 0, count_multi_mapping_reads: bool = False, count_multi_overlapping_reads: bool = False, ignore_secondary: bool = True, count_fractionally: bool = False, is_long_read: bool = False, require_both_mapped: bool = True, count_chimeric_fragments: bool = False, min_fragment_length: NonNegativeInt = 50, max_fragment_length: Optional[PositiveInt] = 600, report_read_assignment: Optional[Literal['bam', 'sam', 'core']] = None, threads: PositiveInt = 1) Tuple[CountFilter, DataFrame, DataFrame] +rnalysis.fastq.featurecounts_paired_end(input_folder: str | Path, output_folder: str | Path, gtf_file: str | Path, gtf_feature_type: str = 'exon', gtf_attr_name: str = 'gene_id', r_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto'] = 'auto', stranded: Literal['no', 'forward', 'reverse'] = 'no', min_mapping_quality: int = 0, count_multi_mapping_reads: bool = False, count_multi_overlapping_reads: bool = False, ignore_secondary: bool = True, count_fractionally: bool = False, is_long_read: bool = False, require_both_mapped: bool = True, count_chimeric_fragments: bool = False, min_fragment_length: NonNegativeInt = 50, max_fragment_length: PositiveInt | None = 600, report_read_assignment: Literal['bam', 'sam', 'core'] | None = None, threads: PositiveInt = 1) Tuple[CountFilter, DataFrame, DataFrame]
Assign mapped paired-end sequencing reads to specified genomic features using RSubread featureCounts. Returns a count matrix (CountFilter) containing feature counts for all input files, a DataFrame summarizing the features reads were aligned to, and a DataFrame summarizing the alignment statistics.
- Parameters: @@ -670,7 +670,7 @@
gtf_attr_name (str (default='gene_id')) – the attribute type in the GTF annotation which will be used to group features (e.g. exons) into meta-features (e.g. genes).
r_installation_folder (str, Path, or 'auto' (default='auto')) – Path to the installation folder of R. For example: ‘C:/Program Files/R/R-4.2.1’
-new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each quantified sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the file pairs.
+new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each quantified sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the file pairs.
stranded ('no', 'forward', 'reverse' (default='no')) – Indicates the strandedness of the data. ‘no’ indicates the data is not stranded. ‘forward’ indicates the data is stranded, where the first read in the pair aligns to the forward strand of a transcript. ‘reverse’ indicates the data is stranded, where the first read in the pair aligns to the reverse strand of a transcript.
min_mapping_quality (int >= 0 (default=0)) – the minimum mapping quality score a read must satisfy in order to be counted. For paired-end reads, at least one end should satisfy this criterion.
count_multi_mapping_reads (bool (default=False)) – indicates whether multi-mapping reads/fragments should be counted (‘NH’ tag in BAM/SAM files).
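The paired-end rule for `min_mapping_quality` stated above (at least one end must reach the threshold) can be written as a one-line check. The function name is hypothetical, and this is a sketch of the stated rule, not of featureCounts itself.

```python
def fragment_passes_mapq(mapq_first, mapq_second, min_mapping_quality=0):
    """Return True if at least one mate reaches the minimum mapping quality."""
    return max(mapq_first, mapq_second) >= min_mapping_quality


one_good_mate = fragment_passes_mapq(5, 30, min_mapping_quality=20)
no_good_mate = fragment_passes_mapq(5, 10, min_mapping_quality=20)
```

With the default threshold of 0, every fragment passes, so the parameter only has an effect once it is raised.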
@@ -697,7 +697,7 @@ - -rnalysis.fastq.featurecounts_single_end(input_folder: Union[str, Path], output_folder: Union[str, Path], gtf_file: Union[str, Path], gtf_feature_type: str = 'exon', gtf_attr_name: str = 'gene_id', r_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto']] = 'auto', stranded: Literal['no', 'forward', 'reverse'] = 'no', min_mapping_quality: int = 0, count_multi_mapping_reads: bool = False, count_multi_overlapping_reads: bool = False, ignore_secondary: bool = True, count_fractionally: bool = False, is_long_read: bool = False, report_read_assignment: Optional[Literal['bam', 'sam', 'core']] = None, threads: PositiveInt = 1) Tuple[CountFilter, DataFrame, DataFrame] +rnalysis.fastq.featurecounts_single_end(input_folder: str | Path, output_folder: str | Path, gtf_file: str | Path, gtf_feature_type: str = 'exon', gtf_attr_name: str = 'gene_id', r_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto'] = 'auto', stranded: Literal['no', 'forward', 'reverse'] = 'no', min_mapping_quality: int = 0, count_multi_mapping_reads: bool = False, count_multi_overlapping_reads: bool = False, ignore_secondary: bool = True, count_fractionally: bool = False, is_long_read: bool = False, report_read_assignment: Literal['bam', 'sam', 'core'] | None = None, threads: PositiveInt = 1) Tuple[CountFilter, DataFrame, DataFrame]
Assign mapped single-end sequencing reads to specified genomic features using RSubread featureCounts. Returns a count matrix (CountFilter) containing feature counts for all input files, a DataFrame summarizing the features reads were aligned to, and a DataFrame summarizing the alignment statistics.
-
@@ -709,7 +709,7 @@
gtf_attr_name (str (default='gene_id')) – the attribute type in the GTF annotation which will be used to group features (e.g. exons) into meta-features (e.g. genes).
r_installation_folder (str, Path, or 'auto' (default='auto')) – Path to the installation folder of R. For example: ‘C:/Program Files/R/R-4.2.1’
-new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each quantified sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the alphabetical order of the files in the directory.
+new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each quantified sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the alphabetical order of the files in the directory.
stranded ('no', 'forward', 'reverse' (default='no')) – Indicates the strandedness of the data. ‘no’ indicates the data is not stranded. ‘forward’ indicates the data is stranded, where the reads align to the forward strand of a transcript. ‘reverse’ indicates the data is stranded, where the reads align to the reverse strand of a transcript.
min_mapping_quality (int >= 0 (default=0)) – the minimum mapping quality score a read must satisfy in order to be counted.
count_multi_mapping_reads (bool (default=False)) – indicates whether multi-mapping reads/fragments should be counted (‘NH’ tag in BAM/SAM files).
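The ‘NH’ tag mentioned above is an optional SAM field that stores the number of reported alignments for a read. A minimal sketch of how a multi-mapping read could be recognized from a single SAM record follows; the function names are hypothetical, and featureCounts performs this check internally in its own way.

```python
def reported_alignments(sam_line):
    """Return the value of the NH:i tag of a SAM record, or None if it is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 fields of a SAM record are mandatory; optional tags follow.
    for field in fields[11:]:
        if field.startswith("NH:i:"):
            return int(field[len("NH:i:"):])
    return None


def is_multi_mapping(sam_line):
    """A read is treated as multi-mapping when its NH tag is greater than 1."""
    nh_value = reported_alignments(sam_line)
    return nh_value is not None and nh_value > 1


mandatory = ["read1", "0", "chrI", "100", "1", "4M", "*", "0", "0", "ACGT", "IIII"]
multi_record = "\t".join(mandatory + ["NH:i:3"])
unique_record = "\t".join(mandatory + ["NH:i:1"])
```

Reads without an NH tag are treated as uniquely mapped in this sketch, so whether multi-mapping reads can be detected at all depends on the aligner having written the tag.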
@@ -732,7 +732,7 @@ - -rnalysis.fastq.find_duplicates(input_folder: Union[str, Path], output_folder: Union[str, Path], picard_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto']] = 'auto', output_format: Literal['sam', 'bam'] = 'bam', duplicate_handling: Literal['mark', 'remove_optical', 'remove_all'] = 'remove_all', duplicate_scoring_strategy: Literal['reference_length', 'sum_of_base_qualities', 'random'] = 'sum_of_base_qualities', optical_duplicate_pixel_distance: int = 100) +rnalysis.fastq.find_duplicates(input_folder: str | Path, output_folder: str | Path, picard_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto'] = 'auto', output_format: Literal['sam', 'bam'] = 'bam', duplicate_handling: Literal['mark', 'remove_optical', 'remove_all'] = 'remove_all', duplicate_scoring_strategy: Literal['reference_length', 'sum_of_base_qualities', 'random'] = 'sum_of_base_qualities', optical_duplicate_pixel_distance: int = 100)
Find duplicate reads in SAM/BAM files using Picard MarkDuplicates.
- Parameters: @@ -740,7 +740,7 @@
output_folder (str or Path) – Path to a folder in which the sorted SAM/BAM files will be saved.
picard_installation_folder (str, Path, or 'auto' (default='auto')) – Path to the installation folder of Picard. For example: ‘C:/Program Files/Picard’
-new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each converted sample (optional). If new_sample_names=‘auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the files in the directory.
+new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each converted sample (optional). If new_sample_names=‘auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the files in the directory.
output_format ('sam' or 'bam' (default='bam')) – Format of the output file.
duplicate_handling ('mark', 'remove_optical', or 'remove_all' (default='remove_all')) – How to handle detected duplicate reads. If ‘mark’, duplicate reads will be marked with a 1024 flag. If ‘remove_optical’, ‘optical’ duplicates and other duplicates that appear to have arisen from the sequencing process instead of the library preparation process will be removed. If ‘remove_all’, all duplicate reads will be removed.
duplicate_scoring_strategy ('reference_length', 'sum_of_base_qualities', or 'random' (default='sum_of_base_qualities')) – How to score duplicate reads. If ‘reference_length’, the length of the reference sequence will be used. If ‘sum_of_base_qualities’, the sum of the base qualities will be used.
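As a rough illustration of how the parameters above fit together, the sketch below checks the two choice-type arguments against the values listed in the signature and then passes them on to `find_duplicates`. The helper names `build_find_duplicates_kwargs` and `mark_duplicates` are hypothetical and not part of RNAlysis. The actual call needs RNAlysis and Picard to be installed, so the import is deferred until the function runs.

```python
from pathlib import Path

# Allowed values, copied from the find_duplicates signature above.
DUPLICATE_HANDLING = ("mark", "remove_optical", "remove_all")
SCORING_STRATEGIES = ("reference_length", "sum_of_base_qualities", "random")


def build_find_duplicates_kwargs(input_folder, output_folder,
                                 duplicate_handling="remove_all",
                                 duplicate_scoring_strategy="sum_of_base_qualities"):
    """Hypothetical helper: validate the choices and collect keyword arguments."""
    if duplicate_handling not in DUPLICATE_HANDLING:
        raise ValueError(f"duplicate_handling must be one of {DUPLICATE_HANDLING}")
    if duplicate_scoring_strategy not in SCORING_STRATEGIES:
        raise ValueError(f"duplicate_scoring_strategy must be one of {SCORING_STRATEGIES}")
    return {
        "input_folder": Path(input_folder),
        "output_folder": Path(output_folder),
        "output_format": "bam",
        "duplicate_handling": duplicate_handling,
        "duplicate_scoring_strategy": duplicate_scoring_strategy,
    }


def mark_duplicates(input_folder, output_folder):
    """Mark duplicates instead of removing them (hypothetical wrapper)."""
    from rnalysis import fastq  # deferred: requires RNAlysis and Picard

    kwargs = build_find_duplicates_kwargs(
        input_folder, output_folder, duplicate_handling="mark")
    return fastq.find_duplicates(**kwargs)
```

Validating the choices before the call is only a convenience: it surfaces a misspelled option before Picard is launched.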
@@ -752,7 +752,7 @@ - -rnalysis.fastq.kallisto_create_index(transcriptome_fasta: Union[str, Path], kallisto_installation_folder: Union[str, Path, Literal['auto']] = 'auto', kmer_length: PositiveInt = 31, make_unique: bool = False) +rnalysis.fastq.kallisto_create_index(transcriptome_fasta: str | Path, kallisto_installation_folder: str | Path | Literal['auto'] = 'auto', kmer_length: PositiveInt = 31, make_unique: bool = False)
Builds a kallisto index from a FASTA formatted file of target sequences (transcriptome). The index file will be saved in the same folder as your FASTA file, with the .idx suffix. Be aware that there are pre-built kallisto indices for popular model organisms. These can be downloaded from the kallisto transcriptome indices site.
- Parameters: @@ -768,21 +768,21 @@
- -rnalysis.fastq.kallisto_quantify_paired_end(r1_files: List[str], r2_files: List[str], output_folder: Union[str, Path], index_file: Union[str, Path], gtf_file: Union[str, Path], kallisto_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto', 'smart']] = 'smart', stranded: Literal['no', 'forward', 'reverse'] = 'no', summation_method: Literal['scaled_tpm', 'raw'] = 'scaled_tpm', bootstrap_samples: Optional[PositiveInt] = None, **legacy_args) CountFilter +rnalysis.fastq.kallisto_quantify_paired_end(r1_files: List[str], r2_files: List[str], output_folder: str | Path, index_file: str | Path, gtf_file: str | Path, kallisto_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto', 'smart'] = 'smart', stranded: Literal['no', 'forward', 'reverse'] = 'no', summation_method: Literal['scaled_tpm', 'raw'] = 'scaled_tpm', bootstrap_samples: PositiveInt | None = None, **legacy_args) CountFilter
Quantify transcript abundance in paired-end mRNA sequencing data using kallisto. The FASTQ file pairs will be individually quantified and saved in the output folder, each in its own sub-folder. Alongside these files, three .csv files will be saved: a per-transcript count estimate table, a per-transcript TPM estimate table, and a per-gene scaled output table. The per-gene scaled output table is generated using the scaledTPM method (scaling the TPM estimates up to the library size) as described by Soneson et al 2015 and used in the tximport R package. This table format is considered un-normalized for library size, and can therefore be used directly by count-based statistical inference tools such as DESeq2. RNAlysis will return this table once the analysis is finished.
- Parameters:
-
-
summation_method ('scaled_tpm' or 'raw' (default='scaled_tpm')) –
-r1_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#1 files. The files should be sorted in tandem with r2_files, so that they line up to form pairs of R1 and R2 files.
-r2_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#2 files. The files should be sorted in tandem with r1_files, so that they line up to form pairs of R1 and R2 files.
+summation_method ('scaled_tpm' or 'raw' (default='scaled_tpm'))
+r1_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#1 files. The files should be sorted in tandem with r2_files, so that they line up to form pairs of R1 and R2 files.
+r2_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#2 files. The files should be sorted in tandem with r1_files, so that they line up to form pairs of R1 and R2 files.
output_folder (str/Path to an existing folder) – Path to a folder in which the quantified results, as well as the log files, will be saved. The individual output of each pair of FASTQ files will reside in a different sub-folder within the output folder, and a summarized results table will be saved in the output folder itself.
index_file (str or Path) –
Path to a pre-built kallisto index of the target transcriptome. Can either be downloaded from the kallisto transcriptome indices site, or generated manually from a FASTA file using the function kallisto_create_index.
gtf_file (str or Path) – Path to a GTF annotation file. This file will be used to map per-transcript abundances to per-gene estimated counts. The transcript names in the GTF files should match the ones in the index file - we recommend downloading cDNA FASTA/index files and GTF files from the same data source.
kallisto_installation_folder (str, Path, or 'auto' (default='auto')) – Path to the installation folder of kallisto. For example: ‘C:/Program Files/kallisto’. If the installation folder is set to ‘auto’, RNAlysis will attempt to find it automatically.
-new_sample_names (list of str, 'auto', or 'smart' (default='smart')) – Give a new name to each quantified sample (optional). If new_sample_names=‘auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the file pairs.
+new_sample_names (list of str, 'auto', or 'smart' (default='smart')) – Give a new name to each quantified sample (optional). If new_sample_names=‘auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the file pairs.
stranded ('no', 'forward', 'reverse' (default='no')) – Indicates the strandedness of the data. ‘no’ indicates the data is not stranded. ‘forward’ indicates the data is stranded, where the first read in the pair pseudoaligns to the forward strand of a transcript. ‘reverse’ indicates the data is stranded, where the first read in the pair pseudoaligns to the reverse strand of a transcript.
summation_method – Determines the method used to sum the transcript-level abundances to gene-level abundances. ‘scaled_tpm’ sums the transcript TPM estimates to the gene level, and then scales them to the library size. ‘raw’ sums the transcript estimated counts to the gene level without scaling.
learn_bias (bool (default=False)) – If True, kallisto learns parameters for a model of sequence-specific bias and corrects the abundances accordingly. Note that this feature is not supported by kallisto versions beyond 0.48.0.
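The requirement that r1_files and r2_files be "sorted in tandem" can be sketched as follows. This is a minimal example that assumes mate files differ only by an `_R1`/`_R2` tag in the file name; that naming convention, the helper `pair_fastq_files`, and the wrapper `quantify_paired` are all hypothetical. The call to `kallisto_quantify_paired_end` uses only parameters from the signature above and needs RNAlysis and kallisto installed, so its import is deferred.

```python
from pathlib import Path


def pair_fastq_files(paths, r1_tag="_R1", r2_tag="_R2"):
    """Split paths into R1 and R2 lists that line up pair by pair.

    Assumes (hypothetically) that mates differ only by r1_tag/r2_tag.
    """
    r1_files = sorted(str(p) for p in paths if r1_tag in Path(p).name)
    r2_files = sorted(str(p) for p in paths if r2_tag in Path(p).name)
    if len(r1_files) != len(r2_files):
        raise ValueError("every R1 file needs a matching R2 file")
    for r1, r2 in zip(r1_files, r2_files):
        # Verify that each R1 file really sits next to its own mate.
        if Path(r1).name.replace(r1_tag, r2_tag) != Path(r2).name:
            raise ValueError(f"{r1} and {r2} do not form a pair")
    return r1_files, r2_files


def quantify_paired(fastq_paths, output_folder, index_file, gtf_file):
    """Hypothetical wrapper around kallisto_quantify_paired_end."""
    from rnalysis import fastq  # deferred: requires RNAlysis and kallisto

    r1_files, r2_files = pair_fastq_files(fastq_paths)
    return fastq.kallisto_quantify_paired_end(
        r1_files=r1_files,
        r2_files=r2_files,
        output_folder=output_folder,
        index_file=index_file,
        gtf_file=gtf_file,
        stranded="no",
        summation_method="scaled_tpm",
    )
```

According to the signature, the call returns a CountFilter holding the per-gene table described above.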
@@ -795,13 +795,13 @@ - -rnalysis.fastq.kallisto_quantify_single_end(fastq_folder: Union[str, Path], output_folder: Union[str, Path], index_file: Union[str, Path], gtf_file: Union[str, Path], average_fragment_length: float, stdev_fragment_length: float, kallisto_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto']] = 'auto', stranded: Literal['no', 'forward', 'reverse'] = 'no', summation_method: Literal['scaled_tpm', 'raw'] = 'scaled_tpm', bootstrap_samples: Optional[PositiveInt] = None, **legacy_args) CountFilter +rnalysis.fastq.kallisto_quantify_single_end(fastq_folder: str | Path, output_folder: str | Path, index_file: str | Path, gtf_file: str | Path, average_fragment_length: float, stdev_fragment_length: float, kallisto_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto'] = 'auto', stranded: Literal['no', 'forward', 'reverse'] = 'no', summation_method: Literal['scaled_tpm', 'raw'] = 'scaled_tpm', bootstrap_samples: PositiveInt | None = None, **legacy_args) CountFilter
Quantify transcript abundance in single-end mRNA sequencing data using kallisto. The FASTQ files will be individually quantified and saved in the output folder, each in its own sub-folder. Alongside these files, three .csv files will be saved: a per-transcript count estimate table, a per-transcript TPM estimate table, and a per-gene scaled output table. The per-gene scaled output table is generated using the scaledTPM method (scaling the TPM estimates up to the library size) as described by Soneson et al 2015 and used in the tximport R package. This table format is considered un-normalized for library size, and can therefore be used directly by count-based statistical inference tools such as DESeq2. RNAlysis will return this table once the analysis is finished.
- Parameters:
-
-
summation_method ('scaled_tpm' or 'raw' (default='scaled_tpm')) –
+summation_method ('scaled_tpm' or 'raw' (default='scaled_tpm'))
fastq_folder (str or Path) – Path to the folder containing the FASTQ files you want to quantify.
output_folder (str/Path to an existing folder) – Path to a folder in which the quantified results, as well as the log files, will be saved. The individual output of each FASTQ file will reside in a different sub-folder within the output folder, and a summarized results table will be saved in the output folder itself.
index_file (str or Path) –
Path to a pre-built kallisto index of the target transcriptome. Can either be downloaded from the kallisto transcriptome indices site, or generated manually from a FASTA file using the function kallisto_create_index.
@@ -810,7 +810,7 @@rnalysis.fastq module
average_fragment_length (float > 0) – Estimated average fragment length. Typical Illumina libraries produce fragment lengths ranging from 180–200bp, but it’s best to determine this from a library quantification with an instrument such as an Agilent Bioanalyzer.
stdev_fragment_length (float > 0) – Estimated standard deviation of fragment length. Typical Illumina libraries produce fragment lengths ranging from 180–200bp, but it’s best to determine this from a library quantification with an instrument such as an Agilent Bioanalyzer.
kallisto_installation_folder (str, Path, or 'auto' (default='auto')) – Path to the installation folder of kallisto. For example: ‘C:/Program Files/kallisto’. If the installation folder is set to ‘auto’, RNAlysis will attempt to find it automatically.
-new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each quantified sample (optional). If new_sample_names=‘auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the files.
+new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each quantified sample (optional). If new_sample_names=‘auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the files.
stranded ('no', 'forward', 'reverse' (default='no')) – Indicates the strandedness of the data. ‘no’ indicates the data is not stranded. ‘forward’ indicates the data is stranded, where the reads pseudoalign to the forward strand of a transcript. ‘reverse’ indicates the data is stranded, where the reads pseudoalign to the reverse strand of a transcript.
summation_method – Determines the method used to sum the transcript-level abundances to gene-level abundances. ‘scaled_tpm’ sums the transcript TPM estimates to the gene level, and then scales them to the library size. ‘raw’ sums the transcript estimated counts to the gene level without scaling.
learn_bias (bool (default=False)) – If True, kallisto learns parameters for a model of sequence-specific bias and corrects the abundances accordingly. Note that this feature is not supported by kallisto versions beyond 0.48.0.
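The difference between the two summation_method options can be made concrete with a small arithmetic sketch. It follows the description above (sum to the gene level, then scale the TPM totals to the library size) and is not the RNAlysis implementation. The function names and the toy numbers are hypothetical, and the library size is assumed here to be the sum of the estimated counts in the sample.

```python
from collections import defaultdict


def sum_to_gene_level(transcript_values, transcript_to_gene):
    """Sum per-transcript values into per-gene totals."""
    totals = defaultdict(float)
    for transcript, value in transcript_values.items():
        totals[transcript_to_gene[transcript]] += value
    return dict(totals)


def summarize(transcript_tpm, transcript_counts, transcript_to_gene,
              summation_method="scaled_tpm"):
    """Gene-level summary for one sample (illustrative sketch only)."""
    if summation_method == "raw":
        # Sum the estimated counts without any scaling.
        return sum_to_gene_level(transcript_counts, transcript_to_gene)
    if summation_method == "scaled_tpm":
        gene_tpm = sum_to_gene_level(transcript_tpm, transcript_to_gene)
        # Assumption: library size = total estimated counts in the sample.
        library_size = sum(transcript_counts.values())
        total_tpm = sum(gene_tpm.values())
        return {gene: tpm / total_tpm * library_size
                for gene, tpm in gene_tpm.items()}
    raise ValueError("summation_method must be 'scaled_tpm' or 'raw'")


# Toy data: two transcripts of gene gA and one transcript of gene gB.
tpm = {"t1": 250000.0, "t2": 250000.0, "t3": 500000.0}
counts = {"t1": 10.0, "t2": 30.0, "t3": 60.0}
tx2gene = {"t1": "gA", "t2": "gA", "t3": "gB"}

scaled = summarize(tpm, counts, tx2gene, "scaled_tpm")
raw = summarize(tpm, counts, tx2gene, "raw")
```

With these toy values the two genes have equal TPM totals, so 'scaled_tpm' splits the 100 estimated counts evenly between them, whereas 'raw' keeps the original 40/60 split of the counts.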
@@ -823,7 +823,7 @@ - -rnalysis.fastq.sam_to_fastq_paired(input_folder: Union[str, Path], output_folder: Union[str, Path], picard_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto']] = 'auto', re_reverse_reads: bool = True, include_non_primary_alignments: bool = False, quality_trim: Optional[PositiveInt] = None, return_new_filenames: bool = False) +rnalysis.fastq.sam_to_fastq_paired(input_folder: str | Path, output_folder: str | Path, picard_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto'] = 'auto', re_reverse_reads: bool = True, include_non_primary_alignments: bool = False, quality_trim: PositiveInt | None = None, return_new_filenames: bool = False)
Convert SAM/BAM files to FASTQ files using Picard SamToFastq.
- -rnalysis.fastq.sam_to_fastq_single(input_folder: Union[str, Path], output_folder: Union[str, Path], picard_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto']] = 'auto', re_reverse_reads: bool = True, include_non_primary_alignments: bool = False, quality_trim: Optional[PositiveInt] = None)
+rnalysis.fastq.sam_to_fastq_single(input_folder: str | Path, output_folder: str | Path, picard_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto'] = 'auto', re_reverse_reads: bool = True, include_non_primary_alignments: bool = False, quality_trim: PositiveInt | None = None)
Convert SAM/BAM files to FASTQ files using Picard SamToFastq.
- Parameters: @@ -856,7 +856,7 @@
output_folder (str or Path) – Path to a folder in which the converted FASTQ files will be saved.
picard_installation_folder (str, Path, or 'auto' (default='auto')) – Path to the installation folder of Picard. For example: ‘C:/Program Files/Picard’
-new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each converted sample (optional). If new_sample_names=‘auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the files in the directory.
+new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each converted sample (optional). If new_sample_names=‘auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the files in the directory.
re_reverse_reads (bool (default=True)) – Re-reverse bases and qualities of reads with the negative-strand flag before writing them to FASTQ.
include_non_primary_alignments (bool (default=False)) – If true, include non-primary alignments in the output. Support of non-primary alignments in SamToFastq is not comprehensive, so there may be exceptions if this is set to true and there are paired reads with non-primary alignments.
quality_trim (positive int or None (default=None)) – If enabled, end-trim reads using the phred/bwa quality trimming algorithm and this quality.
@@ -873,7 +873,7 @@ - -rnalysis.fastq.shortstack_align_smallrna(fastq_folder: Union[str, Path], output_folder: Union[str, Path], genome_fasta: Union[str, Path], shortstack_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto']] = 'auto', known_rnas: Optional[Union[str, Path]] = None, trim_adapter: Optional[Union[str, Literal['autotrim']]] = None, autotrim_key: str = 'TCGGACCAGGCTTCATTCCCC', multimap_mode: Literal['fractional', 'unique', 'random'] = 'fractional', align_only: bool = False, show_secondary_alignments: bool = False, dicer_min_length: PositiveInt = 21, dicer_max_length: PositiveInt = 24, loci_file: Optional[Union[str, Path]] = None, locus: Optional[str] = None, search_microrna: Union[None, Literal['de-novo', 'known-rnas']] = 'known-rnas', strand_cutoff: Fraction = 0.8, min_coverage: float = 2, pad: PositiveInt = 75, threads: PositiveInt = 1) +rnalysis.fastq.shortstack_align_smallrna(fastq_folder: str | Path, output_folder: str | Path, genome_fasta: str | Path, shortstack_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto'] = 'auto', known_rnas: str | Path | None = None, trim_adapter: str | Literal['autotrim'] | None = None, autotrim_key: str = 'TCGGACCAGGCTTCATTCCCC', multimap_mode: Literal['fractional', 'unique', 'random'] = 'fractional', align_only: bool = False, show_secondary_alignments: bool = False, dicer_min_length: PositiveInt = 21, dicer_max_length: PositiveInt = 24, loci_file: str | Path | None = None, locus: str | None = None, search_microrna: None | Literal['de-novo', 'known-rnas'] = 'known-rnas', strand_cutoff: Fraction = 0.8, min_coverage: float = 2, pad: PositiveInt = 75, threads: PositiveInt = 1)
Align small RNA single-end reads from FASTQ files to a reference sequence using the ShortStack aligner (version 4). ShortStack is currently not supported on computers running Windows.
- Parameters: @@ -882,7 +882,7 @@
genome_fasta (str or Path) – Path to the FASTA file which contains the reference sequences to be aligned to.
shortstack_installation_folder (str, Path, or 'auto' (default='auto')) – Path to the installation folder of ShortStack. For example: ‘/home/myuser/anaconda3/envs/myenv/bin’. If the installation folder is set to ‘auto’, RNAlysis will attempt to find it automatically.
-new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each quantified sample (optional). If new_sample_names=‘auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the files.
+new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each quantified sample (optional). If new_sample_names=‘auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the files.
known_rnas (str, Path, or None (default=None)) – Path to FASTA-formatted file of known small RNAs. FASTA must be formatted such that a single RNA sequence is on one line only. ATCGUatcgu characters are acceptable. These RNAs are typically the sequences of known microRNAs. For instance, a FASTA file of mature miRNAs pulled from https://www.mirbase.org. Providing these data increases the accuracy of MIRNA locus identification.
trim_adapter (str, 'autotrim', or None (default=None)) – Determines whether ShortStack will attempt to trim the supplied reads. If trim_adapter is not provided (default), no trimming will be run. If trim_adapter is set to ‘autotrim’, ShortStack will automatically infer the 3’ adapter sequence of the untrimmed reads, and then use it to coordinate read trimming. If trim_adapter is a DNA sequence, ShortStack will trim the reads using the given DNA sequence as the 3’ adapter.
autotrim_key (str (default="TCGGACCAGGCTTCATTCCCC" (miR166))) – A DNA sequence to use as a known suffix during the autotrim procedure. This parameter is used only if trim_adapter is set to ‘autotrim’. ShortStack’s autotrim discovers the 3’ adapter by scanning for reads that begin with the sequence given by autotrim_key. This should be the sequence of a small RNA that is known to be highly abundant in all the libraries. The default sequence is for miR166, a microRNA that is present in nearly all plants at high levels. For non-plant experiments, or if the default is not working well, consider providing an alternative to the default.
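The three modes of trim_adapter described above can be sketched as a small pre-flight check. The helper `classify_trim_adapter` and the wrapper `align_small_rna` are hypothetical, and restricting adapter sequences to the letters A/C/G/T is an assumption made for this example rather than a documented ShortStack rule. The wrapper also refuses to run on Windows, in line with the note that ShortStack is not supported there.

```python
import platform
import re


def classify_trim_adapter(trim_adapter):
    """Return 'none', 'autotrim', or 'sequence' for a trim_adapter value."""
    if trim_adapter is None:
        return "none"  # default: no trimming is run
    if trim_adapter == "autotrim":
        return "autotrim"  # ShortStack infers the 3' adapter itself
    # Assumption for this sketch: adapters contain only A, C, G, and T.
    if re.fullmatch(r"[ACGTacgt]+", trim_adapter):
        return "sequence"  # trim using the given 3' adapter
    raise ValueError(f"unexpected trim_adapter value: {trim_adapter!r}")


def align_small_rna(fastq_folder, output_folder, genome_fasta,
                    trim_adapter=None, threads=1):
    """Hypothetical wrapper around shortstack_align_smallrna."""
    if platform.system() == "Windows":
        raise OSError("ShortStack is not supported on Windows")
    classify_trim_adapter(trim_adapter)  # raises ValueError on bad input
    from rnalysis import fastq  # deferred: requires RNAlysis and ShortStack

    return fastq.shortstack_align_smallrna(
        fastq_folder=fastq_folder,
        output_folder=output_folder,
        genome_fasta=genome_fasta,
        trim_adapter=trim_adapter,
        threads=threads,
    )
```

When trim_adapter is ‘autotrim’, autotrim_key keeps its miR166 default in this sketch; for non-plant data the text above suggests supplying a different, highly abundant small RNA sequence.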
@@ -905,7 +905,7 @@ - -rnalysis.fastq.sort_sam(input_folder: Union[str, Path], output_folder: Union[str, Path], picard_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto']] = 'auto', sort_order: Literal['coordinate', 'queryname', 'duplicate'] = 'coordinate') +rnalysis.fastq.sort_sam(input_folder: str | Path, output_folder: str | Path, picard_installation_folder: str | Path | Literal['auto'] = 'auto', new_sample_names: List[str] | Literal['auto'] = 'auto', sort_order: Literal['coordinate', 'queryname', 'duplicate'] = 'coordinate')
Sort SAM/BAM files using Picard SortSam.
output_folder (str/Path to an existing folder) – Path to a folder in which the aligned reads, as well as the log files, will be saved.
input_folder (str or Path) – Path to the folder containing the SAM/BAM files you want to convert.
@@ -922,20 +922,20 @@ - -rnalysis.fastq.trim_adapters_paired_end(r1_files: List[Union[str, Path]], r2_files: List[Union[str, Path]], output_folder: Union[str, Path], three_prime_adapters_r1: Union[None, str, List[str]], three_prime_adapters_r2: Union[None, str, List[str]], five_prime_adapters_r1: Union[None, str, List[str]] = None, five_prime_adapters_r2: Union[None, str, List[str]] = None, any_position_adapters_r1: Union[None, str, List[str]] = None, any_position_adapters_r2: Union[None, str, List[str]] = None, new_sample_names: Union[List[str], Literal['auto']] = 'auto', quality_trimming: Optional[NonNegativeInt] = 20, trim_n: bool = True, minimum_read_length: NonNegativeInt = 10, maximum_read_length: Optional[PositiveInt] = None, discard_untrimmed_reads: bool = True, pair_filter_if: Literal['both', 'any', 'first'] = 'both', error_tolerance: Fraction = 0.1, minimum_overlap: NonNegativeInt = 3, allow_indels: bool = True, parallel: bool = True, gzip_output: bool = False, return_new_filenames: bool = False) +rnalysis.fastq.trim_adapters_paired_end(r1_files: List[str | Path], r2_files: List[str | Path], output_folder: str | Path, three_prime_adapters_r1: None | str | List[str], three_prime_adapters_r2: None | str | List[str], five_prime_adapters_r1: None | str | List[str] = None, five_prime_adapters_r2: None | str | List[str] = None, any_position_adapters_r1: None | str | List[str] = None, any_position_adapters_r2: None | str | List[str] = None, new_sample_names: List[str] | Literal['auto'] = 'auto', quality_trimming: NonNegativeInt | None = 20, trim_n: bool = True, minimum_read_length: NonNegativeInt = 10, maximum_read_length: PositiveInt | None = None, discard_untrimmed_reads: bool = True, pair_filter_if: Literal['both', 'any', 'first'] = 'both', error_tolerance: Fraction = 0.1, minimum_overlap: NonNegativeInt = 3, allow_indels: bool = True, parallel: bool = True, gzip_output: bool = False, return_new_filenames: bool = False)
Trim adapters from paired-end reads using CutAdapt.
- Parameters:
-
-
r1_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#1 files. The files should be sorted in tandem with r2_files, so that they line up to form pairs of R1 and R2 files.
-r2_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#2 files. The files should be sorted in tandem with r1_files, so that they line up to form pairs of R1 and R2 files.
+r1_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#1 files. The files should be sorted in tandem with r2_files, so that they line up to form pairs of R1 and R2 files.
+r2_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#2 files. The files should be sorted in tandem with r1_files, so that they line up to form pairs of R1 and R2 files.
output_folder (str/Path to an existing folder) – Path to a folder in which the trimmed FASTQ files, as well as the log files, will be saved.
-three_prime_adapters_r1 (str, list of str, or None) – the sequence of the adapter/adapters to trim from the 3’ end of the reads in Read#1 files.
-three_prime_adapters_r2 (str, list of str, or None) – the sequence of the adapter/adapters to trim from the 3’ end of the reads in Read#2 files.
-five_prime_adapters_r1 (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from the 5’ end of the reads in Read#1 files.
-five_prime_adapters_r2 (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from the 5’ end of the reads in Read#2 files.
-any_position_adapters_r1 (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from either end (or the middle) of the reads in Read#1 files.
-any_position_adapters_r2 (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from either end (or the middle) of the reads in Read#2 files.
+three_prime_adapters_r1 (str, list of str, or None) – the sequence of the adapter/adapters to trim from the 3’ end of the reads in Read#1 files.
+three_prime_adapters_r2 (str, list of str, or None) – the sequence of the adapter/adapters to trim from the 3’ end of the reads in Read#2 files.
+five_prime_adapters_r1 (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from the 5’ end of the reads in Read#1 files.
+five_prime_adapters_r2 (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from the 5’ end of the reads in Read#2 files.
+any_position_adapters_r1 (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from either end (or the middle) of the reads in Read#1 files.
+any_position_adapters_r2 (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from either end (or the middle) of the reads in Read#2 files.
quality_trimming (int or None (default=20)) – if specified, trim low-quality 3’ end from the reads. Any bases with quality score below the specified value will be trimmed from the 3’ end of the read.
trim_n (bool (default=True)) – if True, removes flanking N bases from each read. For example, a read with the sequence ‘NNACGTACGTNNNN’ will be trimmed down to ‘ACGTACGT’. This occurs after adapter trimming.
minimum_read_length (int or None (default=10)) – if specified (default), discard processed reads that are shorter than minimum_read_length.
@@ -947,7 +947,7 @@ parallel (bool (default=True)) – if True, runs CutAdapt on all available cores in parallel. Otherwise, run CutAdapt on a single processor only.
gzip_output (bool (default=False)) – if True, gzips the output FASTQ files.
-new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each trimmed sample (optional). If new_sample_names=‘auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the file pairs.
+new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each trimmed sample (optional). If new_sample_names=‘auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the file pairs.
allow_indels (bool (default=True)) – if False, insertions and deletions in the adapter sequence are not allowed - only mismatches.
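A paired-end trimming call might be assembled as in the sketch below. The helper names are hypothetical, and the adapter sequences are placeholders to be replaced with the adapters of your own library preparation kit. Only parameters that appear in the `trim_adapters_paired_end` signature are passed. The call itself needs RNAlysis and CutAdapt installed, so the import is deferred.

```python
def build_paired_trim_kwargs(r1_files, r2_files, output_folder,
                             adapter_r1, adapter_r2):
    """Hypothetical helper: collect arguments for trim_adapters_paired_end."""
    if len(r1_files) != len(r2_files):
        raise ValueError("r1_files and r2_files must have the same length")
    return {
        "r1_files": list(r1_files),
        "r2_files": list(r2_files),
        "output_folder": output_folder,
        "three_prime_adapters_r1": adapter_r1,
        "three_prime_adapters_r2": adapter_r2,
        "quality_trimming": 20,            # same as the documented default
        "trim_n": True,                    # same as the documented default
        "minimum_read_length": 10,         # same as the documented default
        "discard_untrimmed_reads": False,  # example choice; default is True
        "pair_filter_if": "both",
        "gzip_output": True,               # example choice; default is False
    }


def trim_pairs(r1_files, r2_files, output_folder, adapter_r1, adapter_r2):
    """Hypothetical wrapper around trim_adapters_paired_end."""
    from rnalysis import fastq  # deferred: requires RNAlysis and CutAdapt

    kwargs = build_paired_trim_kwargs(
        r1_files, r2_files, output_folder, adapter_r1, adapter_r2)
    return fastq.trim_adapters_paired_end(**kwargs)
```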
- -rnalysis.fastq.trim_adapters_single_end(fastq_folder: Union[str, Path], output_folder: Union[str, Path], three_prime_adapters: Union[None, str, List[str]], five_prime_adapters: Union[None, str, List[str]] = None, any_position_adapters: Union[None, str, List[str]] = None, new_sample_names: Union[List[str], Literal['auto']] = 'auto', quality_trimming: Optional[NonNegativeInt] = 20, trim_n: bool = True, minimum_read_length: NonNegativeInt = 10, maximum_read_length: Optional[PositiveInt] = None, discard_untrimmed_reads: bool = True, error_tolerance: Fraction = 0.1, minimum_overlap: NonNegativeInt = 3, allow_indels: bool = True, parallel: bool = True, gzip_output: bool = False)
+rnalysis.fastq.trim_adapters_single_end(fastq_folder: str | Path, output_folder: str | Path, three_prime_adapters: None | str | List[str], five_prime_adapters: None | str | List[str] = None, any_position_adapters: None | str | List[str] = None, new_sample_names: List[str] | Literal['auto'] = 'auto', quality_trimming: NonNegativeInt | None = 20, trim_n: bool = True, minimum_read_length: NonNegativeInt = 10, maximum_read_length: PositiveInt | None = None, discard_untrimmed_reads: bool = True, error_tolerance: Fraction = 0.1, minimum_overlap: NonNegativeInt = 3, allow_indels: bool = True, parallel: bool = True, gzip_output: bool = False)
Trim adapters from single-end reads using CutAdapt.
- Parameters:
fastq_folder (str/Path to an existing folder) – Path to the folder containing your untrimmed FASTQ files
output_folder (str/Path to an existing folder) – Path to a folder in which the trimmed FASTQ files, as well as the log files, will be saved.
-three_prime_adapters (str, list of str, or None) – the sequence of the adapter/adapters to trim from the 3’ end of the reads.
-five_prime_adapters (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from the 5’ end of the reads.
-any_position_adapters (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from either end (or from the middle) of the reads.
+three_prime_adapters (str, list of str, or None) – the sequence of the adapter/adapters to trim from the 3’ end of the reads.
+five_prime_adapters (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from the 5’ end of the reads.
+any_position_adapters (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from either end (or from the middle) of the reads.
quality_trimming (int or None (default=20)) – if specified, trim low-quality 3’ end from the reads. Any bases with quality score below the specified value will be trimmed from the 3’ end of the read.
trim_n (bool (default=True)) – if True, removes flanking N bases from each read. For example, a read with the sequence ‘NNACGTACGTNNNN’ will be trimmed down to ‘ACGTACGT’. This occurs after adapter trimming.
minimum_read_length (int or None (default=10)) – if specified (default), discard processed reads that are shorter than minimum_read_length.
@@ -975,7 +975,7 @@ parallel (bool (default=True)) – if True, runs CutAdapt on all available cores in parallel. Otherwise, run CutAdapt on a single processor only.
gzip_output (bool (default=False)) – if True, gzips the output FASTQ files.
-new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each trimmed sample (optional). If new_sample_names=‘auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the alphabetical order of the input files.
+new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each trimmed sample (optional). If new_sample_names=‘auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the alphabetical order of the input files.
allow_indels (bool (default=True)) – if False, insertions and deletions in the adapter sequence are not allowed - only mismatches.
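The effect of trim_n and minimum_read_length on a single read can be illustrated in plain Python. This is only a sketch of the behaviour described in the parameter list, not CutAdapt's implementation, and the function names are hypothetical.

```python
def trim_flanking_n(read):
    """Remove N bases from both ends of a read, as trim_n=True describes."""
    return read.strip("N")


def passes_length_filter(read, minimum_read_length=10):
    """Keep a processed read only if it is not shorter than the minimum."""
    return len(read) >= minimum_read_length


trimmed = trim_flanking_n("NNACGTACGTNNNN")  # the example read from the text
kept = passes_length_filter(trimmed)
```

The example read from the parameter description shrinks to 8 bases after N-trimming, so with the default minimum_read_length of 10 this sketch would discard it. Internal N bases are left untouched because only flanking bases are stripped.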
rnalysis.fastq module
- -rnalysis.fastq.validate_sam(input_folder: Union[str, Path], output_folder: Union[str, Path], picard_installation_folder: Union[str, Path, Literal['auto']] = 'auto', verbose: bool = True, is_bisulfite_sequenced: bool = False)
+rnalysis.fastq.validate_sam(input_folder: str | Path, output_folder: str | Path, picard_installation_folder: str | Path | Literal['auto'] = 'auto', verbose: bool = True, is_bisulfite_sequenced: bool = False)Validate SAM/BAM files using Picard ValidateSamFile.
- Parameters: @@ -991,7 +991,7 @@
output_folder (str or Path) – Path to a folder in which the validation reports will be saved.
picard_installation_folder (str, Path, or 'auto' (default='auto')) – Path to the installation folder of Picard. For example: ‘C:/Program Files/Picard’
-new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each converted sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the files in the directory.
+new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each converted sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the files in the directory.
verbose (bool (default=True)) – If True, the validation report will be verbose. If False, the validation report will be a summary.
is_bisulfite_sequenced (bool (default=False)) – Indicates whether the SAM/BAM file consists of bisulfite sequenced reads. If so, C->T is not counted as an error in computing the value of the NM tag.
@@ -1007,13 +1007,13 @@ - -class rnalysis.filtering.CountFilter(fname: Union[str, Path, tuple], drop_columns: Union[str, List[str]] = None, is_normalized: bool = False, suppress_warnings: bool = False) +class rnalysis.filtering.CountFilter(fname: str | Path | tuple, drop_columns: str | List[str] = None, is_normalized: bool = False, suppress_warnings: bool = False)
Bases:
Filter
A class that receives a count matrix and can filter it according to various characteristics.
Attributes
@@ -1035,7 +1035,7 @@rnalysis.filtering module
- -_avg_subsamples(sample_grouping: GroupedColumns, function: Literal['mean', 'median', 'geometric_mean'] = 'mean', new_column_names: Union[Literal['auto'], Literal['display'], List[str]] = 'display')
+_avg_subsamples(sample_grouping: GroupedColumns, function: Literal['mean', 'median', 'geometric_mean'] = 'mean', new_column_names: Literal['auto'] | Literal['display'] | List[str] = 'display') Averages subsamples/replicates according to the specified sample list. Every member in the sample list should be either a name of a single sample (str), or a list of multiple sample names to be averaged (list).
- Parameters: @@ -1104,7 +1104,7 @@
- -static _pca_plot(final_df: DataFrame, pc1_var: float, pc2_var: float, sample_grouping: GroupedColumns, labels: bool, title: str, title_fontsize: float, label_fontsize: float, tick_fontsize: float, proportional_axes: bool, plot_grid: bool, legend: Optional[List[str]]) Figure +static _pca_plot(final_df: DataFrame, pc1_var: float, pc2_var: float, sample_grouping: GroupedColumns, labels: bool, title: str, title_fontsize: float, label_fontsize: float, tick_fontsize: float, proportional_axes: bool, plot_grid: bool, legend: List[str] | None) Figure
Internal method, used to plot the results from CountFilter.pca().
- Parameters: @@ -1145,13 +1145,13 @@
- -_sort(by: Union[str, List[str]], ascending: Union[bool, List[bool]] = True, na_position: str = 'last') +_sort(by: str | List[str], ascending: bool | List[bool] = True, na_position: str = 'last')
Sort the rows by the values of specified column or columns.
- Parameters:
-
-
by (str or list of str) – Names of the column or columns to sort by.
-ascending (bool or list of bool, default True) – Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, it must have the same length as ‘by’.
+by (str or list of str) – Names of the column or columns to sort by.
+ascending (bool or list of bool, default True) – Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, it must have the same length as ‘by’.
na_position ('first' or 'last', default 'last') – If ‘first’, puts NaNs at the beginning; if ‘last’, puts NaNs at the end.
inplace (bool, default True) – If True, perform operation in-place. Otherwise, returns a sorted copy of the Filter object without modifying the original.
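The interaction between ‘by’ and ‘ascending’ described above can be sketched without pandas or RNAlysis. The function sort_rows below is a hypothetical stand-in that works on a list of dictionaries; it ignores na_position and NaN handling, which the real method supports.

```python
from typing import Dict, List, Sequence, Union

Row = Dict[str, object]


def sort_rows(rows: Sequence[Row],
              by: Union[str, List[str]],
              ascending: Union[bool, List[bool]] = True) -> List[Row]:
    """Return a sorted copy of rows, sorted by one or more columns."""
    by_list = [by] if isinstance(by, str) else list(by)
    if isinstance(ascending, bool):
        ascending_list = [ascending] * len(by_list)
    else:
        ascending_list = list(ascending)
    # A list of bools must match the number of sort columns.
    if len(ascending_list) != len(by_list):
        raise ValueError("'ascending' must have the same length as 'by'")

    result = list(rows)
    # Python's sort is stable, so sorting by the last key first and the
    # first key last yields a correct multi-column ordering.
    for column, is_ascending in reversed(list(zip(by_list, ascending_list))):
        result.sort(key=lambda row: row[column], reverse=not is_ascending)
    return result


# Hypothetical table rows for demonstration.
table = [
    {"gene": "a", "x": 1, "y": 2},
    {"gene": "b", "x": 1, "y": 5},
    {"gene": "c", "x": 0, "y": 3},
]
ordered = sort_rows(table, by=["x", "y"], ascending=[True, False])
order_of_genes = [row["gene"] for row in ordered]
```

Here the rows are ordered by ascending x first, and ties in x are broken by descending y.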
rnalysis.filtering module
- -average_replicate_samples(sample_grouping: GroupedColumns, new_column_names: Union[Literal['auto'], List[str]] = 'auto', function: Literal['mean', 'median', 'geometric_mean'] = 'mean', inplace: bool = True) CountFilter
+average_replicate_samples(sample_grouping: GroupedColumns, new_column_names: Literal['auto'] | List[str] = 'auto', function: Literal['mean', 'median', 'geometric_mean'] = 'mean', inplace: bool = True) CountFilterAverage the expression values of gene expression for each group of replicate samples. Each group of samples (e.g. biological/technical replicates)
- Parameters:
-
-
sample_grouping (nested list of column names) – grouping of the samples into conditions. Each grouping should contain all replicates of the same condition. Each condition will be averaged separately.
-new_column_names (list of str or 'auto' (default='auto')) – names to be given to the columns in the new count matrix. Each new name should match a group of samples to be averaged. If `new_column_names`=’auto’, names will be generated automatically.
+sample_grouping (nested list of column names) – grouping of the samples into conditions. Each grouping should contain all replicates of the same condition. Each condition will be averaged separately.
+new_column_names (list of str or 'auto' (default='auto')) – names to be given to the columns in the new count matrix. Each new name should match a group of samples to be averaged. If `new_column_names`=’auto’, names will be generated automatically.
function ('mean', 'median', or 'geometric_mean' (default='mean')) – the function which will be used to average the values within each group.
inplace (bool (default=True)) – If True (default), averaging will be applied to the current CountFilter object. If False, the function will return a new CountFilter instance and the current instance will not be affected.
rnalysis.filtering module
- -biotypes_from_gtf(gtf_path: Union[str, Path], attribute_name: Union[Literal['biotype', 'gene_biotype', 'transcript_biotype', 'gene_type', 'transcript_type'], str] = 'gene_biotype', feature_type: Literal['gene', 'transcript'] = 'gene', long_format: bool = False) DataFrame
+biotypes_from_gtf(gtf_path: str | Path, attribute_name: Literal['biotype', 'gene_biotype', 'transcript_biotype', 'gene_type', 'transcript_type'] | str = 'gene_biotype', feature_type: Literal['gene', 'transcript'] = 'gene', long_format: bool = False) DataFrameReturns a DataFrame describing the biotypes in the table and their count. The data about feature biotypes is drawn from a GTF (Gene transfer format) file supplied by the user.
- Parameters: @@ -1199,7 +1199,7 @@
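To make the idea of counting biotypes from a GTF file concrete, here is a minimal standard-library sketch. It assumes the usual nine tab-separated GTF columns with attributes written as key "value" pairs; count_biotypes and the example records are hypothetical and do not reflect how RNAlysis parses GTF files internally.

```python
import re
from collections import Counter
from typing import Iterable

# Matches attribute pairs of the form: key "value"
ATTRIBUTE_PATTERN = re.compile(r'(\S+) "([^"]*)"')


def count_biotypes(gtf_lines: Iterable[str],
                   attribute_name: str = "gene_biotype",
                   feature_type: str = "gene") -> Counter:
    """Count how many features of the given type carry each biotype."""
    counts: Counter = Counter()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != feature_type:
            continue  # malformed record, or a different feature type
        attributes = dict(ATTRIBUTE_PATTERN.findall(fields[8]))
        if attribute_name in attributes:
            counts[attributes[attribute_name]] += 1
    return counts


def make_record(feature: str, attributes: str) -> str:
    """Build a hypothetical GTF record for the demonstration below."""
    return "\t".join(["I", "example", feature, "100", "500", ".", "+", ".", attributes])


gtf_lines = [
    "#!example header",
    make_record("gene", 'gene_id "g1"; gene_biotype "protein_coding";'),
    make_record("gene", 'gene_id "g2"; gene_biotype "protein_coding";'),
    make_record("gene", 'gene_id "g3"; gene_biotype "lincRNA";'),
    make_record("transcript", 'gene_id "g1"; transcript_biotype "protein_coding";'),
]
biotype_counts = count_biotypes(gtf_lines)
```

Only records whose feature type matches feature_type are counted, which is why the transcript record is ignored when counting genes.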
- -biotypes_from_ref_table(long_format: bool = False, ref: Union[str, Path, Literal['predefined']] = 'predefined') DataFrame +biotypes_from_ref_table(long_format: bool = False, ref: str | Path | Literal['predefined'] = 'predefined') DataFrame
Returns a DataFrame describing the biotypes in the table and their count. The data about feature biotypes is drawn from a Biotype Reference Table supplied by the user.
rnalysis.filtering module
- -box_plot(samples: Union[GroupedColumns, Literal['all']] = 'all', notch: bool = True, scatter: bool = False, ylabel: str = 'log10(Normalized reads + 1)', title: Union[str, Literal['auto']] = 'auto', title_fontsize: float = 20, label_fontsize: float = 16, tick_fontsize: float = 12) Figure
+box_plot(samples: GroupedColumns | Literal['all'] = 'all', notch: bool = True, scatter: bool = False, ylabel: str = 'log10(Normalized reads + 1)', title: str | Literal['auto'] = 'auto', title_fontsize: float = 20, label_fontsize: float = 16, tick_fontsize: float = 12) FigureGenerates a box plot of the specified samples in the CountFilter object in log10 scale. Can plot both single samples and average multiple replicates. It is recommended to use this function on normalized values and not on absolute read values. The box indicates 25% and 75% percentiles, and the white dot indicates the median.
- Parameters: @@ -1251,7 +1251,7 @@
rnalysis.filtering moduleReturn type: @@ -1261,14 +1261,14 @@
- -clustergram(sample_names: Union[ColumnNames, Literal['all']] = 'all', metric: Union[Literal['Correlation', 'Cosine', 'Euclidean', 'Jaccard'], str] = 'Euclidean', linkage: Literal['Single', 'Average', 'Complete', 'Ward', 'Weighted', 'Centroid', 'Median'] = 'Average', title: Union[str, Literal['auto']] = 'auto', title_fontsize: float = 20, tick_fontsize: float = 12, colormap: ColorMap = 'inferno', colormap_label: Union[Literal['auto'], str] = 'auto', cluster_columns: bool = True, log_transform: bool = True, z_score_rows: bool = False) Figure +clustergram(sample_names: ColumnNames | Literal['all'] = 'all', metric: Literal['Correlation', 'Cosine', 'Euclidean', 'Jaccard'] | str = 'Euclidean', linkage: Literal['Single', 'Average', 'Complete', 'Ward', 'Weighted', 'Centroid', 'Median'] = 'Average', title: str | Literal['auto'] = 'auto', title_fontsize: float = 20, tick_fontsize: float = 12, colormap: ColorMap = 'inferno', colormap_label: Literal['auto'] | str = 'auto', cluster_columns: bool = True, log_transform: bool = True, z_score_rows: bool = False) Figure
Performs hierarchical clustering and plots a clustergram on the base-2 log of a given set of samples.
rnalysis.filtering module
- -describe(percentiles: Union[float, List[float]] = (0.01, 0.25, 0.5, 0.75, 0.99)) DataFrame
+describe(percentiles: float | List[float] = (0.01, 0.25, 0.5, 0.75, 0.99)) DataFrameGenerate descriptive statistics that summarize the central tendency, dispersion and shape of the dataset’s distribution, excluding NaN values. For more information see the documentation of pandas.DataFrame.describe.
- Parameters: -
percentiles (list-like of floats (default=(0.01, 0.25, 0.5, 0.75, 0.99))) – The percentiles to include in the output. All should fall between 0 and 1. The default is (0.01, 0.25, 0.5, 0.75, 0.99), which returns the 1st, 25th, 50th, 75th, and 99th percentiles.
+percentiles (list-like of floats (default=(0.01, 0.25, 0.5, 0.75, 0.99))) – The percentiles to include in the output. All should fall between 0 and 1. The default is (0.01, 0.25, 0.5, 0.75, 0.99), which returns the 1st, 25th, 50th, 75th, and 99th percentiles.
- Returns:
Summary statistics of the dataset.
@@ -1374,7 +1374,7 @@rnalysis.filtering module
- -difference(*others: Union[Filter, set], return_type: Literal['set', 'str'] = 'set', inplace: bool = False)
+difference(*others: Filter | set, return_type: Literal['set', 'str'] = 'set', inplace: bool = False)Keep only the features that exist in the first Filter object/set but NOT in the others. Can be done inplace on the first Filter object, or return a set/string of features.
- Parameters: @@ -1412,16 +1412,16 @@
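The set logic behind difference can be sketched with built-in Python sets. The function below is a hypothetical illustration that only covers the 'set' return type; it does not reproduce the inplace behaviour or the string output.

```python
from typing import Iterable, Set


def difference(first: Iterable[str], *others: Iterable[str]) -> Set[str]:
    """Keep only the features in `first` that appear in none of `others`."""
    remaining = set(first)
    for other in others:
        remaining -= set(other)
    return remaining


# Hypothetical feature names.
first_table = {"gene1", "gene2", "gene3", "gene4"}
second_table = {"gene2"}
third_set = {"gene4", "gene5"}

unique_to_first = difference(first_table, second_table, third_set)
```

The operation is not symmetric: features that exist only in the other objects, such as gene5 here, never appear in the result.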
- -differential_expression_deseq2(design_matrix: Union[str, Path], comparisons: Iterable[Tuple[str, str, str]], covariates: Iterable[str] = (), lrt_factors: Iterable[str] = (), model_factors: Union[Literal['auto'], Iterable[str]] = 'auto', r_installation_folder: Union[str, Path, Literal['auto']] = 'auto', output_folder: Optional[Union[str, Path]] = None, return_design_matrix: bool = False, scaling_factors: Optional[Union[str, Path]] = None, cooks_cutoff: bool = True, return_code: bool = False) Tuple[DESeqFilter, ...] +differential_expression_deseq2(design_matrix: str | Path, comparisons: Iterable[Tuple[str, str, str]], covariates: Iterable[str] = (), lrt_factors: Iterable[str] = (), model_factors: Literal['auto'] | Iterable[str] = 'auto', r_installation_folder: str | Path | Literal['auto'] = 'auto', output_folder: str | Path | None = None, return_design_matrix: bool = False, scaling_factors: str | Path | None = None, cooks_cutoff: bool = True, return_code: bool = False) Tuple[DESeqFilter, ...]
Run differential expression analysis on the count matrix using the DESeq2 algorithm. The count matrix you are analyzing should be unnormalized (meaning, raw read counts). The analysis will be based on a design matrix supplied by the user. The design matrix should contain at least two columns: the first column contains all the sample names, and each of the following columns contains an experimental design factor (e.g. ‘condition’, ‘replicate’, etc). (see the User Guide and Tutorial for a complete example). The analysis formula will contain all the factors in the design matrix. To run this function, a version of R must be installed.
- Parameters:
design_matrix (str or Path) – path to a csv file containing the experiment’s design matrix. The design matrix should contain at least two columns: the first column contains all the sample names, and each of the following columns contains an experimental design factor (e.g. ‘condition’, ‘replicate’, etc). (see the User Guide and Tutorial for a complete example). The analysis formula will contain all the factors in the design matrix.
-comparisons (Iterable of tuple(factor, numerator_value, denominator_value)) – specifies what comparisons to build results tables out of. each individual comparison should be a tuple with exactly three elements: the name of a factor in the design formula, the name of the numerator level for the fold change, and the name of the denominator level for the fold change.
-lrt_factors (Iterable of factor names (default=tuple())) – optionally, specify factors to be tested using the likelihood ratio test (LRT). If a factor is a continuous variable, you can also specify the polynomial degree to fit.
-covariates (Iterable of covariate names (default=tuple())) – optionally, specify a list of continuous covariates to include in the analysis. The covariates should be column names in the design matrix. The reported fold change values correspond to the expected fold change for every increase of 1 unit in the covariate.
-model_factors (Iterable of factor names or 'auto' (default='auto')) – optionally, specify a list of factors to include in the differential expression model. If ‘auto’, all factors in the design matrix will be included.
+comparisons (Iterable of tuple(factor, numerator_value, denominator_value)) – specifies what comparisons to build results tables out of. each individual comparison should be a tuple with exactly three elements: the name of a factor in the design formula, the name of the numerator level for the fold change, and the name of the denominator level for the fold change.
+lrt_factors (Iterable of factor names (default=tuple())) – optionally, specify factors to be tested using the likelihood ratio test (LRT). If a factor is a continuous variable, you can also specify the polynomial degree to fit.
+covariates (Iterable of covariate names (default=tuple())) – optionally, specify a list of continuous covariates to include in the analysis. The covariates should be column names in the design matrix. The reported fold change values correspond to the expected fold change for every increase of 1 unit in the covariate.
+model_factors (Iterable of factor names or 'auto' (default='auto')) – optionally, specify a list of factors to include in the differential expression model. If ‘auto’, all factors in the design matrix will be included.
r_installation_folder (str, Path, or 'auto' (default='auto')) – Path to the installation folder of R. For example: ‘C:/Program Files/R/R-4.2.1’
output_folder (str, Path, or None) – Path to a folder in which the analysis results, as well as the log files and R script used to generate them, will be saved. if output_folder is None, the results will not be saved to a specified directory.
return_design_matrix (bool (default=False)) – if True, the function will return the sanitized design matrix used in the analysis.
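The relationship between the design matrix and the comparisons argument can be illustrated with a small pre-flight check written with the standard library. The CSV content, factor names, and the check_comparisons helper are hypothetical; this is a sketch of the documented constraints, not the validation RNAlysis performs internally.

```python
import csv
import io
from typing import Iterable, List, Tuple

# Hypothetical design matrix: first column holds sample names,
# every following column is an experimental design factor.
DESIGN_MATRIX_CSV = """sample,condition,replicate
ctrl_1,control,1
ctrl_2,control,2
treat_1,treated,1
treat_2,treated,2
"""


def check_comparisons(design_csv: str,
                      comparisons: Iterable[Tuple[str, ...]]) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    rows = list(csv.DictReader(io.StringIO(design_csv)))
    factors = list(rows[0].keys())[1:]  # skip the sample-name column
    problems = []
    for comparison in comparisons:
        if len(comparison) != 3:
            problems.append(f"{comparison!r}: expected exactly three elements")
            continue
        factor, numerator, denominator = comparison
        if factor not in factors:
            problems.append(f"{comparison!r}: unknown factor {factor!r}")
            continue
        levels = {row[factor] for row in rows}
        for level in (numerator, denominator):
            if level not in levels:
                problems.append(f"{comparison!r}: {level!r} is not a level of {factor!r}")
    return problems


# (factor, numerator level, denominator level)
comparisons = [("condition", "treated", "control")]
problems = check_comparisons(DESIGN_MATRIX_CSV, comparisons)
```

In this example the fold change would be reported for ‘treated’ relative to ‘control’, because the numerator level comes second in the tuple and the denominator level third.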
@@ -1436,13 +1436,13 @@ - -differential_expression_deseq2_simplified(design_matrix: Union[str, Path], comparisons: Iterable[Tuple[str, str, str]], r_installation_folder: Union[str, Path, Literal['auto']] = 'auto', output_folder: Optional[Union[str, Path]] = None, return_design_matrix: bool = False, return_code: bool = False) Tuple[DESeqFilter, ...] +differential_expression_deseq2_simplified(design_matrix: str | Path, comparisons: Iterable[Tuple[str, str, str]], r_installation_folder: str | Path | Literal['auto'] = 'auto', output_folder: str | Path | None = None, return_design_matrix: bool = False, return_code: bool = False) Tuple[DESeqFilter, ...]
Run differential expression analysis on the count matrix using the DESeq2 algorithm. The simplified mode supports only pairwise comparisons. The count matrix you are analyzing should be unnormalized (meaning, raw read counts). The analysis will be based on a design matrix supplied by the user. The design matrix should contain at least two columns: the first column contains all the sample names, and each of the following columns contains an experimental design factor (e.g. ‘condition’, ‘replicate’, etc). (see the User Guide and Tutorial for a complete example). The analysis formula will contain all the factors in the design matrix. To run this function, a version of R must be installed.
- Parameters:
design_matrix (str or Path) – path to a csv file containing the experiment’s design matrix. The design matrix should contain at least two columns: the first column contains all the sample names, and each of the following columns contains an experimental design factor (e.g. ‘condition’, ‘replicate’, etc). (see the User Guide and Tutorial for a complete example). The analysis formula will contain all the factors in the design matrix.
-comparisons (Iterable of tuple(factor, numerator_value, denominator_value)) – specifies what comparisons to build results tables out of. each individual comparison should be a tuple with exactly three elements: the name of a factor in the design formula, the name of the numerator level for the fold change, and the name of the denominator level for the fold change.
+comparisons (Iterable of tuple(factor, numerator_value, denominator_value)) – specifies what comparisons to build results tables out of. each individual comparison should be a tuple with exactly three elements: the name of a factor in the design formula, the name of the numerator level for the fold change, and the name of the denominator level for the fold change.
r_installation_folder (str, Path, or 'auto' (default='auto')) – Path to the installation folder of R. For example: ‘C:/Program Files/R/R-4.2.1’
output_folder (str, Path, or None) – Path to a folder in which the analysis results, as well as the log files and R script used to generate them, will be saved. if output_folder is None, the results will not be saved to a specified directory.
return_design_matrix (bool (default=False)) – if True, the function will return the sanitized design matrix used in the analysis.
@@ -1457,7 +1457,7 @@ - -differential_expression_limma_voom(design_matrix: Union[str, Path], comparisons: Iterable[Tuple[str, str, str]], covariates: Iterable[str] = (), lrt_factors: Iterable[str] = (), model_factors: Union[Literal['auto'], Iterable[str]] = 'auto', r_installation_folder: Union[str, Path, Literal['auto']] = 'auto', output_folder: Optional[Union[str, Path]] = None, random_effect: Optional[str] = None, quality_weights: bool = False, return_design_matrix: bool = False, return_code: bool = False) Tuple[DESeqFilter, ...] +differential_expression_limma_voom(design_matrix: str | Path, comparisons: Iterable[Tuple[str, str, str]], covariates: Iterable[str] = (), lrt_factors: Iterable[str] = (), model_factors: Literal['auto'] | Iterable[str] = 'auto', r_installation_folder: str | Path | Literal['auto'] = 'auto', output_folder: str | Path | None = None, random_effect: str | None = None, quality_weights: bool = False, return_design_matrix: bool = False, return_code: bool = False) Tuple[DESeqFilter, ...]
Run differential expression analysis on the count matrix using the Limma-Voom pipeline. The count matrix you are analyzing should be normalized (typically to Reads Per Million). The analysis will be based on a design matrix supplied by the user. The design matrix should contain at least two columns: the first column contains all the sample names, and each of the following columns contains an experimental design factor (e.g. ‘condition’, ‘replicate’, etc). (see the User Guide and Tutorial for a complete example). The analysis formula will contain all the factors in the design matrix. To run this function, a version of R must be installed.
-
@@ -1497,7 +1497,7 @@
- -differential_expression_limma_voom_simplified(design_matrix: Union[str, Path], comparisons: Iterable[Tuple[str, str, str]], r_installation_folder: Union[str, Path, Literal['auto']] = 'auto', output_folder: Optional[Union[str, Path]] = None, random_effect: Optional[str] = None, quality_weights: bool = False, return_design_matrix: bool = False, return_code: bool = False) Tuple[DESeqFilter, ...] +differential_expression_limma_voom_simplified(design_matrix: str | Path, comparisons: Iterable[Tuple[str, str, str]], r_installation_folder: str | Path | Literal['auto'] = 'auto', output_folder: str | Path | None = None, random_effect: str | None = None, quality_weights: bool = False, return_design_matrix: bool = False, return_code: bool = False) Tuple[DESeqFilter, ...]
Run differential expression analysis on the count matrix using the Limma-Voom pipeline. The simplified mode supports only pairwise comparisons. The count matrix you are analyzing should be normalized (typically to Reads Per Million). The analysis will be based on a design matrix supplied by the user. The design matrix should contain at least two columns: the first column contains all the sample names, and each of the following columns contains an experimental design factor (e.g. ‘condition’, ‘replicate’, etc). (see the User Guide and Tutorial for a complete example). The analysis formula will contain all the factors in the design matrix. To run this function, a version of R must be installed.
-
@@ -1556,7 +1556,7 @@
- Parameters:
-
-
columns (str or list of str) – The names of the column/columns to be dropped from the table.
+columns (str or list of str) – The names of the column/columns to be dropped from the table.
inplace (bool (default=True)) – If True (default), filtering will be applied to the current Filter object. If False, the function will return a new Filter instance and the current instance will not be affected.
@@ -1568,7 +1568,7 @@ - -enhanced_box_plot(samples: Union[GroupedColumns, Literal['all']] = 'all', scatter: bool = False, ylabel: str = 'log10(Normalized reads + 1)', title: Union[str, Literal['auto']] = 'auto', title_fontsize: float = 20, label_fontsize: float = 16, tick_fontsize: float = 12) Figure +enhanced_box_plot(samples: GroupedColumns | Literal['all'] = 'all', scatter: bool = False, ylabel: str = 'log10(Normalized reads + 1)', title: str | Literal['auto'] = 'auto', title_fontsize: float = 20, label_fontsize: float = 16, tick_fontsize: float = 12) Figure
Generates an enhanced box-plot of the specified samples in the CountFilter object in log10 scale. Can plot both single samples and average multiple replicates. It is recommended to use this function on normalized values and not on absolute read values. The box indicates 25% and 75% percentiles, and the white dot indicates the median.
rnalysis.filtering module
- -filter_biotype_from_gtf(gtf_path: Union[str, Path], biotype: Union[Literal['protein_coding', 'pseudogene', 'lincRNA', 'miRNA', 'ncRNA', 'piRNA', 'rRNA', 'snoRNA', 'snRNA', 'tRNA'], str, List[str]] = 'protein_coding', attribute_name: Union[Literal['biotype', 'gene_biotype', 'transcript_biotype', 'gene_type', 'transcript_type'], str] = 'gene_biotype', feature_type: Literal['gene', 'transcript'] = 'gene', opposite: bool = False, inplace: bool = True) +filter_biotype_from_gtf(gtf_path: str | Path, biotype: Literal['protein_coding', 'pseudogene', 'lincRNA', 'miRNA', 'ncRNA', 'piRNA', 'rRNA', 'snoRNA', 'snRNA', 'tRNA'] | str | List[str] = 'protein_coding', attribute_name: Literal['biotype', 'gene_biotype', 'transcript_biotype', 'gene_type', 'transcript_type'] | str = 'gene_biotype', feature_type: Literal['gene', 'transcript'] = 'gene', opposite: bool = False, inplace: bool = True)
Filters out all features that do not match the indicated biotype/biotypes (for example: ‘protein_coding’, ‘ncRNA’, etc). The data about feature biotypes is drawn from a GTF (Gene transfer format) file supplied by the user.
- Parameters:
gtf_path (str or Path) – Path to your GTF (Gene transfer format) file. The file should match the type of gene names/IDs you use in your table, and should contain an attribute describing biotype.
-biotype (str or list of strings) – the biotypes which will not be filtered out.
+biotype (str or list of strings) – the biotypes which will not be filtered out.
attribute_name (str (default='gene_biotype')) – name of the attribute in your GTF file that describes feature biotype.
feature_type ('gene' or 'transcript' (default='gene')) – determines whether the features/rows in your data table describe individual genes or transcripts.
opposite (bool) – If True, the output of the filtering will be the OPPOSITE of the specified (instead of filtering out X, the function will filter out anything BUT X). If False (default), the function will filter as expected.
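The effect of the biotype and opposite parameters can be sketched on plain Python data. The function filter_biotype and the gene-to-biotype mapping below are hypothetical; in RNAlysis the biotypes would come from the GTF file rather than from a dictionary.

```python
from typing import Dict, List, Union


def filter_biotype(features: List[str],
                   biotype_of: Dict[str, str],
                   biotype: Union[str, List[str]] = "protein_coding",
                   opposite: bool = False) -> List[str]:
    """Keep features whose biotype is in `biotype`; invert the choice if `opposite`."""
    wanted = {biotype} if isinstance(biotype, str) else set(biotype)
    # A feature is kept when "it matches" differs from "opposite":
    # opposite=False keeps matches, opposite=True keeps non-matches.
    return [f for f in features if (biotype_of.get(f) in wanted) != opposite]


# Hypothetical annotation, standing in for the biotype attribute of a GTF file.
biotype_of = {"gene1": "protein_coding", "gene2": "piRNA", "gene3": "rRNA"}
features = ["gene1", "gene2", "gene3"]

coding_only = filter_biotype(features, biotype_of)
small_rnas = filter_biotype(features, biotype_of, ["piRNA", "rRNA"])
non_coding = filter_biotype(features, biotype_of, "protein_coding", opposite=True)
```

In this sketch, a feature without a known biotype is never kept when opposite is False and is always kept when opposite is True; the real method may treat such features differently.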
@@ -1614,12 +1614,12 @@ - -filter_biotype_from_ref_table(biotype: Union[Literal['protein_coding', 'pseudogene', 'lincRNA', 'miRNA', 'ncRNA', 'piRNA', 'rRNA', 'snoRNA', 'snRNA', 'tRNA'], str, List[str]] = 'protein_coding', ref: Union[str, Path, Literal['predefined']] = 'predefined', opposite: bool = False, inplace: bool = True) +filter_biotype_from_ref_table(biotype: Literal['protein_coding', 'pseudogene', 'lincRNA', 'miRNA', 'ncRNA', 'piRNA', 'rRNA', 'snoRNA', 'snRNA', 'tRNA'] | str | List[str] = 'protein_coding', ref: str | Path | Literal['predefined'] = 'predefined', opposite: bool = False, inplace: bool = True)
Filters out all features that do not match the indicated biotype/biotypes (for example: ‘protein_coding’, ‘ncRNA’, etc). The data about feature biotypes is drawn from a Biotype Reference Table supplied by the user.
- Parameters:
-
-
biotype (string or list of strings) – the biotypes which will not be filtered out.
+biotype (string or list of strings) – the biotypes which will not be filtered out.
ref – Name of the biotype reference file used to determine biotypes. Default is the path defined by the user in the settings.yaml file.
opposite (bool) – If True, the output of the filtering will be the OPPOSITE of the specified (instead of filtering out X, the function will filter out anything BUT X). If False (default), the function will filter as expected.
inplace (bool (default=True)) – If True (default), filtering will be applied to the current Filter object. If False, the function will return a new Filter instance and the current instance will not be affected.
@@ -1645,12 +1645,12 @@ - -filter_by_attribute(attributes: Union[str, List[str]] = None, mode: Literal['union', 'intersection'] = 'union', ref: Union[str, Path, Literal['predefined']] = 'predefined', opposite: bool = False, inplace: bool = True) +filter_by_attribute(attributes: str | List[str] = None, mode: Literal['union', 'intersection'] = 'union', ref: str | Path | Literal['predefined'] = 'predefined', opposite: bool = False, inplace: bool = True)
Filters features according to user-defined attributes from an Attribute Reference Table. When multiple attributes are given, filtering can be done in ‘union’ mode (where features that belong to at least one attribute are not filtered out), or in ‘intersection’ mode (where only features that belong to ALL attributes are not filtered out). To learn more about user-defined attributes and Attribute Reference Tables, read the user guide.
- Parameters:
-
-
attributes (string or list of strings, which are column titles in the user-defined Attribute Reference Table.) – attributes to filter by.
+attributes (string or list of strings, which are column titles in the user-defined Attribute Reference Table.) – attributes to filter by.
mode ('union' or 'intersection'.) – If ‘union’, filters out every genomic feature that does not belong to one or more of the indicated attributes. If ‘intersection’, filters out every genomic feature that does not belong to ALL of the indicated attributes.
ref (str or pathlib.Path (default='predefined')) – filename/path of the attribute reference table to be used as reference.
opposite (bool (default=False)) – If True, the output of the filtering will be the OPPOSITE of the specified (instead of filtering out X, the function will filter out anything BUT X). If False (default), the function will filter as expected.
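The difference between ‘union’ and ‘intersection’ mode can be demonstrated with sets. The function filter_features and the attribute table below are hypothetical and merely mimic an Attribute Reference Table as a dictionary of sets.

```python
from typing import Dict, List, Set, Union


def filter_features(feature_attributes: Dict[str, Set[str]],
                    attributes: Union[str, List[str]],
                    mode: str = "union") -> List[str]:
    """Return the features that survive filtering by the given attributes."""
    wanted = {attributes} if isinstance(attributes, str) else set(attributes)
    if mode == "union":
        # keep features that belong to at least one requested attribute
        return [f for f, own in feature_attributes.items() if own & wanted]
    if mode == "intersection":
        # keep only features that belong to ALL requested attributes
        return [f for f, own in feature_attributes.items() if wanted <= own]
    raise ValueError("mode must be 'union' or 'intersection'")


# Hypothetical attribute table: feature name -> attributes it belongs to.
feature_attributes = {
    "gene1": {"attribute1", "attribute2"},
    "gene2": {"attribute1"},
    "gene3": {"attribute3"},
}

in_either = filter_features(feature_attributes, ["attribute1", "attribute2"], "union")
in_both = filter_features(feature_attributes, ["attribute1", "attribute2"], "intersection")
```

With a single attribute the two modes give the same result; they only diverge once two or more attributes are requested.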
@@ -1701,12 +1701,12 @@ - -filter_by_go_annotations(go_ids: Union[str, List[str]], mode: Literal['union', 'intersection'] = 'union', organism: Union[str, int, Literal['auto'], Literal['Arabodopsis thaliana', 'Caenorhabditis elegans', 'Danio rerio', 'Drosophila melanogaster', 'Escherichia coli', 'Homo sapiens', 'Mus musculus', 'Saccharomyces cerevisiae', 'Schizosaccharomyces pombe']] = 'auto', gene_id_type: Union[str, Literal['auto'], Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB']] = 'auto', propagate_annotations: bool = True, evidence_types: Union[Literal['any', 'experimental', 'phylogenetic', 'computational', 'author', 'curator', 'electronic'], Iterable[Literal['experimental', 'phylogenetic', 'computational', 'author', 'curator', 'electronic']]] = 'any', excluded_evidence_types: Union[Literal['experimental', 'phylogenetic', 'computational', 'author', 'curator', 'electronic'], Iterable[Literal['experimental', 'phylogenetic', 'computational', 'author', 'curator', 'electronic']]] = (), databases: Union[str, Iterable[str]] = 
'any', excluded_databases: Union[str, Iterable[str]] = (), qualifiers: Union[Literal['any', 'not', 'contributes_to', 'colocalizes_with'], Iterable[Literal['not', 'contributes_to', 'colocalizes_with']]] = 'any', excluded_qualifiers: Union[Literal['not', 'contributes_to', 'colocalizes_with'], Iterable[Literal['not', 'contributes_to', 'colocalizes_with']]] = 'not', opposite: bool = False, inplace: bool = True) +filter_by_go_annotations(go_ids: str | List[str], mode: Literal['union', 'intersection'] = 'union', organism: str | int | Literal['auto'] | Literal['Arabodopsis thaliana', 'Caenorhabditis elegans', 'Danio rerio', 'Drosophila melanogaster', 'Escherichia coli', 'Homo sapiens', 'Mus musculus', 'Saccharomyces cerevisiae', 'Schizosaccharomyces pombe'] = 'auto', gene_id_type: str | Literal['auto'] | Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB'] = 'auto', propagate_annotations: bool = True, evidence_types: Literal['any', 'experimental', 'phylogenetic', 'computational', 'author', 'curator', 'electronic'] | 
Iterable[Literal['experimental', 'phylogenetic', 'computational', 'author', 'curator', 'electronic']] = 'any', excluded_evidence_types: Literal['experimental', 'phylogenetic', 'computational', 'author', 'curator', 'electronic'] | Iterable[Literal['experimental', 'phylogenetic', 'computational', 'author', 'curator', 'electronic']] = (), databases: str | Iterable[str] = 'any', excluded_databases: str | Iterable[str] = (), qualifiers: Literal['any', 'not', 'contributes_to', 'colocalizes_with'] | Iterable[Literal['not', 'contributes_to', 'colocalizes_with']] = 'any', excluded_qualifiers: Literal['not', 'contributes_to', 'colocalizes_with'] | Iterable[Literal['not', 'contributes_to', 'colocalizes_with']] = 'not', opposite: bool = False, inplace: bool = True)
Filters genes according to GO annotations, keeping only genes that are annotated with a specific GO term. When multiple GO terms are given, filtering can be done in ‘union’ mode (where genes that belong to at least one GO term are not filtered out), or in ‘intersection’ mode (where only genes that belong to ALL GO terms are not filtered out).
- Parameters:
-
-
go_ids (str or list of str) –
+go_ids (str or list of str)
mode ('union' or 'intersection') – If ‘union’, filters out every genomic feature that does not belong to at least one of the indicated attributes. If ‘intersection’, filters out every genomic feature that does not belong to ALL of the indicated attributes.
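The difference between the two modes can be sketched with plain Python sets. This is only an illustration of the semantics described above, not the library's implementation: the helper `keep_by_annotations`, the gene names, and the GO IDs are hypothetical placeholders.

```python
from typing import Dict, Iterable, Set


def keep_by_annotations(annotations: Dict[str, Set[str]], wanted: Iterable[str],
                        mode: str = 'union', opposite: bool = False) -> Set[str]:
    """Return the gene names that survive filtering (hypothetical helper)."""
    wanted = set(wanted)
    if mode == 'union':
        # keep genes annotated with at least one of the requested terms
        kept = {gene for gene, terms in annotations.items() if terms & wanted}
    elif mode == 'intersection':
        # keep genes annotated with every requested term
        kept = {gene for gene, terms in annotations.items() if wanted <= terms}
    else:
        raise ValueError(f"mode must be 'union' or 'intersection', got {mode!r}")
    if opposite:
        # invert the selection: keep exactly the genes that would have been removed
        kept = set(annotations) - kept
    return kept


# made-up annotation table: gene name -> set of GO IDs
example_annotations = {
    'geneA': {'GO:0000001', 'GO:0000002'},
    'geneB': {'GO:0000001'},
    'geneC': {'GO:0000003'},
}
requested = ['GO:0000001', 'GO:0000002']

union_kept = keep_by_annotations(example_annotations, requested, mode='union')
intersection_kept = keep_by_annotations(example_annotations, requested, mode='intersection')
opposite_kept = keep_by_annotations(example_annotations, requested, mode='union', opposite=True)
```

In this toy table, ‘union’ keeps geneA and geneB, ‘intersection’ keeps only geneA, and ‘union’ with opposite=True keeps only geneC.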
@@ -1740,12 +1740,12 @@ - -filter_by_kegg_annotations(kegg_ids: Union[str, List[str]], mode: Literal['union', 'intersection'] = 'union', organism: Union[str, int, Literal['auto'], Literal['Arabodopsis thaliana', 'Caenorhabditis elegans', 'Danio rerio', 'Drosophila melanogaster', 'Escherichia coli', 'Homo sapiens', 'Mus musculus', 'Saccharomyces cerevisiae', 'Schizosaccharomyces pombe']] = 'auto', gene_id_type: Union[str, Literal['auto'], Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB']] = 'auto', opposite: bool = False, inplace: bool = True) +filter_by_kegg_annotations(kegg_ids: str | List[str], mode: Literal['union', 'intersection'] = 'union', organism: str | int | Literal['auto'] | Literal['Arabodopsis thaliana', 'Caenorhabditis elegans', 'Danio rerio', 'Drosophila melanogaster', 'Escherichia coli', 'Homo sapiens', 'Mus musculus', 'Saccharomyces cerevisiae', 'Schizosaccharomyces pombe'] = 'auto', gene_id_type: str | Literal['auto'] | Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 
'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB'] = 'auto', opposite: bool = False, inplace: bool = True)
Filters genes according to KEGG pathways, keeping only genes that belong to specific KEGG pathways. When multiple KEGG IDs are given, filtering can be done in ‘union’ mode (where genes that belong to at least one pathway are not filtered out), or in ‘intersection’ mode (where only genes that belong to ALL pathways are not filtered out).
- Parameters:
-
-
kegg_ids (str or list of str) – the KEGG pathway IDs according to which the table will be filtered. An example of a valid KEGG pathway ID is ‘path:cel04020’ for the C. elegans calcium signaling pathway.
+kegg_ids (str or list of str) – the KEGG pathway IDs according to which the table will be filtered. An example of a valid KEGG pathway ID is ‘path:cel04020’ for the C. elegans calcium signaling pathway.
mode ('union' or 'intersection') – If ‘union’, filters out every genomic feature that does not belong to at least one of the indicated attributes. If ‘intersection’, filters out every genomic feature that does not belong to ALL of the indicated attributes.
@@ -1763,12 +1763,12 @@ - -filter_by_row_name(row_names: Union[str, List[str]], opposite: bool = False, inplace: bool = True) +filter_by_row_name(row_names: str | List[str], opposite: bool = False, inplace: bool = True)
Filter out specific rows from the table by their name (index).
- Parameters:
-
-
row_names (str or list of str) – list of row names to be removed from the table.
+row_names (str or list of str) – list of row names to be removed from the table.
opposite (bool) – If True, the output of the filtering will be the OPPOSITE of the specified (instead of filtering out X, the function will filter out anything BUT X). If False (default), the function will filter as expected.
inplace (bool (default=True)) – If True (default), filtering will be applied to the current Filter object. If False, the function will return a new Filter instance and the current instance will not be affected.
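How `opposite` and `inplace` interact can be sketched with a small stand-in class. `TinyFilter` below is hypothetical and only mimics the behaviour described in the parameter list; it is not the library's Filter class.

```python
from typing import Dict, List, Optional, Union


class TinyFilter:
    """Hypothetical stand-in for a Filter object: row name -> row value."""

    def __init__(self, rows: Dict[str, float]):
        self.rows = dict(rows)

    def filter_by_row_name(self, row_names: Union[str, List[str]],
                           opposite: bool = False,
                           inplace: bool = True) -> Optional['TinyFilter']:
        names = {row_names} if isinstance(row_names, str) else set(row_names)
        if opposite:
            # keep ONLY the named rows
            kept = {k: v for k, v in self.rows.items() if k in names}
        else:
            # default: remove the named rows
            kept = {k: v for k, v in self.rows.items() if k not in names}
        if inplace:
            self.rows = kept
            return None
        # inplace=False: leave this object untouched and return a new one
        return TinyFilter(kept)


table = TinyFilter({'gene1': 1.0, 'gene2': 2.0, 'gene3': 3.0})

# a new object is returned; `table` still holds all three rows
without_gene2 = table.filter_by_row_name('gene2', inplace=False)
only_gene2 = table.filter_by_row_name('gene2', opposite=True, inplace=False)

# the in-place call modifies `table` itself and returns None
returned = table.filter_by_row_name(['gene1', 'gene3'])
```

Passing inplace=False is the safer choice when the unfiltered table is still needed later in the analysis.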
rnalysis.filtering module
- -filter_missing_values(columns: Union[ColumnNames, Literal['all']] = 'all', opposite: bool = False, inplace: bool = True)
+filter_missing_values(columns: ColumnNames | Literal['all'] = 'all', opposite: bool = False, inplace: bool = True)
Remove all rows whose values in the specified columns are missing (NaN).
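A minimal sketch of the row-dropping rule, assuming that a row is removed when any of the checked columns is missing. The helper `drop_missing` and the example table are hypothetical, and missing values are modelled as None or NaN.

```python
import math
from typing import Dict, List, Optional, Union

Row = Dict[str, Optional[float]]


def is_missing(value: Optional[float]) -> bool:
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing(rows: Dict[str, Row], columns: Union[str, List[str]] = 'all',
                 opposite: bool = False) -> Dict[str, Row]:
    """Hypothetical helper: keep rows with no missing value in `columns`."""
    kept = {}
    for name, row in rows.items():
        if columns == 'all':
            checked = list(row)
        elif isinstance(columns, str):
            checked = [columns]
        else:
            checked = list(columns)
        has_missing = any(is_missing(row[col]) for col in checked)
        # normally keep complete rows; with opposite=True keep the incomplete ones
        if has_missing == opposite:
            kept[name] = row
    return kept


example_rows = {
    'gene1': {'cond1': 5.0, 'cond2': 7.0},
    'gene2': {'cond1': float('nan'), 'cond2': 3.0},
    'gene3': {'cond1': 2.0, 'cond2': None},
}

complete_rows = drop_missing(example_rows)                 # checks every column
cond1_ok = drop_missing(example_rows, columns='cond1')     # checks one column only
incomplete_rows = drop_missing(example_rows, opposite=True)
```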
rnalysis.filtering module
- -filter_top_n(by: ColumnNames, n: PositiveInt = 100, ascending: Union[bool, List[bool]] = True, na_position: str = 'last', opposite: bool = False, inplace: bool = True)
+filter_top_n(by: ColumnNames, n: PositiveInt = 100, ascending: bool | List[bool] = True, na_position: str = 'last', opposite: bool = False, inplace: bool = True)
Sort the rows by the values of specified column or columns, then keep only the top ‘n’ rows.
- Parameters:
-
-
by (name of column/columns (str/List[str])) – Names of the column or columns to sort and then filter by.
+by (name of column/columns (str/List[str])) – Names of the column or columns to sort and then filter by.
n (int) – How many features to keep in the Filter object.
-ascending (bool or list of bools (default=True)) – Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, it must have the same length as ‘by’.
+ascending (bool or list of bools (default=True)) – Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, it must have the same length as ‘by’.
na_position ('first' or 'last', default 'last') – If ‘first’, puts NaNs at the beginning; if ‘last’, puts NaNs at the end.
opposite (bool) – If True, the output of the filtering will be the OPPOSITE of the specified (instead of filtering out X, the function will filter out anything BUT X). If False (default), the function will filter as expected.
inplace (bool (default=True)) – If True (default), filtering will be applied to the current Filter object. If False, the function will return a new Filter instance and the current instance will not be affected.
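The sort-then-truncate behaviour described by these parameters can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: `top_n` is a hypothetical helper, missing values are modelled as None, and ties keep their original order.

```python
from typing import Dict, List, Optional, Union

Row = Dict[str, Optional[float]]


def top_n(rows: Dict[str, Row], by: Union[str, List[str]], n: int = 100,
          ascending: Union[bool, List[bool]] = True, na_position: str = 'last',
          opposite: bool = False) -> Dict[str, Row]:
    """Hypothetical helper: sort rows by one or more columns, keep the first n."""
    by_cols = [by] if isinstance(by, str) else list(by)
    orders = [ascending] * len(by_cols) if isinstance(ascending, bool) else list(ascending)
    if len(orders) != len(by_cols):
        raise ValueError("'ascending' must have the same length as 'by'")

    names = list(rows)
    # stable multi-pass sort: apply the least significant column first
    for col, asc in reversed(list(zip(by_cols, orders))):
        present = [r for r in names if rows[r][col] is not None]
        missing = [r for r in names if rows[r][col] is None]
        present.sort(key=lambda r: rows[r][col], reverse=not asc)
        names = missing + present if na_position == 'first' else present + missing

    # opposite=True keeps everything except the top n rows
    chosen = names[n:] if opposite else names[:n]
    return {r: rows[r] for r in chosen}


example_rows = {
    'gene1': {'log2fc': 2.0},
    'gene2': {'log2fc': None},
    'gene3': {'log2fc': 5.0},
    'gene4': {'log2fc': 1.0},
}

largest_two = top_n(example_rows, by='log2fc', n=2, ascending=False)
smallest_two = top_n(example_rows, by='log2fc', n=2, ascending=True)
all_but_largest_two = top_n(example_rows, by='log2fc', n=2, ascending=False, opposite=True)
```

Note that with ascending=True (the default) the "top" rows are those with the smallest values, so ascending=False is needed to keep the largest ones.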
@@ -1952,7 +1952,7 @@ - -find_paralogs_ensembl(organism: Union[Literal['auto'], str, int, Literal['Acanthochromis polyacanthus', 'Accipiter nisus', 'Ailuropoda melanoleuca', 'Amazona collaria', 'Amphilophus citrinellus', 'Amphiprion ocellaris', 'Amphiprion percula', 'Anabas testudineus', 'Anas platyrhynchos', 'Anas platyrhynchos platyrhynchos', 'Anas zonorhyncha', 'Anolis carolinensis', 'Anser brachyrhynchus', 'Anser cygnoides', 'Aotus nancymaae', 'Apteryx haastii', 'Apteryx owenii', 'Apteryx rowi', 'Aquila chrysaetos chrysaetos', 'Astatotilapia calliptera', 'Astyanax mexicanus', 'Astyanax mexicanus pachon', 'Athene cunicularia', 'Balaenoptera musculus', 'Betta splendens', 'Bison bison bison', 'Bos grunniens', 'Bos indicus hybrid', 'Bos mutus', 'Bos taurus', 'Bos taurus hybrid', 'Bubo bubo', 'Buteo japonicus', 'Caenorhabditis elegans', 'Cairina moschata domestica', 'Calidris pugnax', 'Calidris pygmaea', 'Callithrix jacchus', 'Callorhinchus milii', 'Camarhynchus parvulus', 'Camelus dromedarius', 'Canis lupus dingo', 'Canis lupus familiaris', 'Canis lupus familiarisbasenji', 'Canis lupus familiarisboxer', 'Canis lupus familiarisgreatdane', 'Canis lupus familiarisgsd', 'Capra hircus', 'Capra hircus blackbengal', 'Carassius auratus', 'Carlito syrichta', 'Castor canadensis', 'Catagonus wagneri', 'Catharus ustulatus', 'Cavia aperea', 'Cavia porcellus', 'Cebus imitator', 'Cercocebus atys', 'Cervus hanglu yarkandensis', 'Chelonoidis abingdonii', 'Chelydra serpentina', 'Chinchilla lanigera', 'Chlorocebus sabaeus', 'Choloepus hoffmanni', 'Chrysemys picta bellii', 'Chrysolophus pictus', 'Ciona intestinalis', 'Ciona savignyi', 'Clupea harengus', 'Colobus angolensis palliatus', 'Corvus moneduloides', 'Cottoperca gobio', 'Coturnix japonica', 'Cricetulus griseus chok1gshd', 'Cricetulus griseus crigri', 'Cricetulus griseus picr', 'Crocodylus porosus', 'Cyanistes caeruleus', 'Cyclopterus lumpus', 'Cynoglossus semilaevis', 'Cyprinodon variegatus', 'Cyprinus carpio carpio', 'Cyprinus 
carpio germanmirror', 'Cyprinus carpio hebaored', 'Cyprinus carpio huanghe', 'Danio rerio', 'Dasypus novemcinctus', 'Delphinapterus leucas', 'Denticeps clupeoides', 'Dicentrarchus labrax', 'Dipodomys ordii', 'Dromaius novaehollandiae', 'Drosophila melanogaster', 'Echeneis naucrates', 'Echinops telfairi', 'Electrophorus electricus', 'Eptatretus burgeri', 'Equus asinus', 'Equus caballus', 'Erinaceus europaeus', 'Erpetoichthys calabaricus', 'Erythrura gouldiae', 'Esox lucius', 'Falco tinnunculus', 'Felis catus', 'Ficedula albicollis', 'Fukomys damarensis', 'Fundulus heteroclitus', 'Gadus morhua', 'Gadus morhua gca010882105v1', 'Gallus gallus', 'Gallus gallus gca000002315v5', 'Gallus gallus gca016700215v2', 'Gambusia affinis', 'Gasterosteus aculeatus', 'Gasterosteus aculeatus gca006229185v1', 'Gasterosteus aculeatus gca006232265v1', 'Gasterosteus aculeatus gca006232285v1', 'Geospiza fortis', 'Gopherus agassizii', 'Gopherus evgoodei', 'Gorilla gorilla', 'Gouania willdenowi', 'Haplochromis burtoni', 'Heterocephalus glaber female', 'Heterocephalus glaber male', 'Hippocampus comes', 'Homo sapiens', 'Hucho hucho', 'Ictalurus punctatus', 'Ictidomys tridecemlineatus', 'Jaculus jaculus', 'Junco hyemalis', 'Kryptolebias marmoratus', 'Labrus bergylta', 'Larimichthys crocea', 'Lates calcarifer', 'Laticauda laticaudata', 'Latimeria chalumnae', 'Lepidothrix coronata', 'Lepisosteus oculatus', 'Leptobrachium leishanense', 'Lonchura striata domestica', 'Loxodonta africana', 'Lynx canadensis', 'Macaca fascicularis', 'Macaca mulatta', 'Macaca nemestrina', 'Malurus cyaneus samueli', 'Manacus vitellinus', 'Mandrillus leucophaeus', 'Marmota marmota marmota', 'Mastacembelus armatus', 'Maylandia zebra', 'Meleagris gallopavo', 'Melopsittacus undulatus', 'Meriones unguiculatus', 'Mesocricetus auratus', 'Microcebus murinus', 'Microtus ochrogaster', 'Mola mola', 'Monodelphis domestica', 'Monodon monoceros', 'Monopterus albus', 'Moschus moschiferus', 'Mus caroli', 'Mus musculus', 'Mus musculus 
129s1svimj', 'Mus musculus aj', 'Mus musculus akrj', 'Mus musculus balbcj', 'Mus musculus c3hhej', 'Mus musculus c57bl6nj', 'Mus musculus casteij', 'Mus musculus cbaj', 'Mus musculus dba2j', 'Mus musculus fvbnj', 'Mus musculus lpj', 'Mus musculus nodshiltj', 'Mus musculus nzohlltj', 'Mus musculus pwkphj', 'Mus musculus wsbeij', 'Mus pahari', 'Mus spicilegus', 'Mus spretus', 'Mustela putorius furo', 'Myotis lucifugus', 'Myripristis murdjan', 'Naja naja', 'Nannospalax galili', 'Neogobius melanostomus', 'Neolamprologus brichardi', 'Neovison vison', 'Nomascus leucogenys', 'Notamacropus eugenii', 'Notechis scutatus', 'Nothobranchius furzeri', 'Nothoprocta perdicaria', 'Numida meleagris', 'Ochotona princeps', 'Octodon degus', 'Oncorhynchus kisutch', 'Oncorhynchus mykiss', 'Oncorhynchus tshawytscha', 'Oreochromis aureus', 'Oreochromis niloticus', 'Ornithorhynchus anatinus', 'Oryctolagus cuniculus', 'Oryzias javanicus', 'Oryzias latipes', 'Oryzias latipes hni', 'Oryzias latipes hsok', 'Oryzias melastigma', 'Oryzias sinensis', 'Otolemur garnettii', 'Otus sunia', 'Ovis aries', 'Ovis aries rambouillet', 'Pan paniscus', 'Pan troglodytes', 'Panthera leo', 'Panthera pardus', 'Panthera tigris altaica', 'Papio anubis', 'Parambassis ranga', 'Paramormyrops kingsleyae', 'Parus major', 'Pavo cristatus', 'Pelodiscus sinensis', 'Pelusios castaneus', 'Periophthalmus magnuspinnatus', 'Peromyscus maniculatus bairdii', 'Petromyzon marinus', 'Phascolarctos cinereus', 'Phasianus colchicus', 'Phocoena sinus', 'Physeter catodon', 'Piliocolobus tephrosceles', 'Podarcis muralis', 'Poecilia formosa', 'Poecilia latipinna', 'Poecilia mexicana', 'Poecilia reticulata', 'Pogona vitticeps', 'Pongo abelii', 'Procavia capensis', 'Prolemur simus', 'Propithecus coquereli', 'Pseudonaja textilis', 'Pteropus vampyrus', 'Pundamilia nyererei', 'Pygocentrus nattereri', 'Rattus norvegicus', 'Rattus norvegicus shrspbbbutx', 'Rattus norvegicus shrutx', 'Rattus norvegicus wkybbb', 'Rhinolophus ferrumequinum', 
'Rhinopithecus bieti', 'Rhinopithecus roxellana', 'Saccharomyces cerevisiae', 'Saimiri boliviensis boliviensis', 'Salarias fasciatus', 'Salmo salar', 'Salmo salar gca021399835v1', 'Salmo salar gca923944775v1', 'Salmo salar gca931346935v2', 'Salmo trutta', 'Salvator merianae', 'Sander lucioperca', 'Sarcophilus harrisii', 'Sciurus vulgaris', 'Scleropages formosus', 'Scophthalmus maximus', 'Serinus canaria', 'Seriola dumerili', 'Seriola lalandi dorsalis', 'Sinocyclocheilus anshuiensis', 'Sinocyclocheilus grahami', 'Sinocyclocheilus rhinocerous', 'Sorex araneus', 'Sparus aurata', 'Spermophilus dauricus', 'Sphaeramia orbicularis', 'Sphenodon punctatus', 'Stachyris ruficeps', 'Stegastes partitus', 'Strigops habroptila', 'Strix occidentalis caurina', 'Struthio camelus australis', 'Suricata suricatta', 'Sus scrofa', 'Sus scrofa bamei', 'Sus scrofa berkshire', 'Sus scrofa hampshire', 'Sus scrofa jinhua', 'Sus scrofa landrace', 'Sus scrofa largewhite', 'Sus scrofa meishan', 'Sus scrofa pietrain', 'Sus scrofa rongchang', 'Sus scrofa tibetan', 'Sus scrofa usmarc', 'Sus scrofa wuzhishan', 'Taeniopygia guttata', 'Takifugu rubripes', 'Terrapene carolina triunguis', 'Tetraodon nigroviridis', 'Theropithecus gelada', 'Tupaia belangeri', 'Tursiops truncatus', 'Urocitellus parryii', 'Ursus americanus', 'Ursus maritimus', 'Ursus thibetanus thibetanus', 'Varanus komodoensis', 'Vicugna pacos', 'Vombatus ursinus', 'Vulpes vulpes', 'Xenopus tropicalis', 'Xiphophorus couchianus', 'Xiphophorus maculatus', 'Zalophus californianus', 'Zonotrichia albicollis', 'Zosterops lateralis melanops']] = 'auto', gene_id_type: Union[str, Literal['auto'], Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 
'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB']] = 'auto', filter_percent_identity: bool = True) +find_paralogs_ensembl(organism: Literal['auto'] | str | int | Literal['Acanthochromis polyacanthus', 'Accipiter nisus', 'Ailuropoda melanoleuca', 'Amazona collaria', 'Amphilophus citrinellus', 'Amphiprion ocellaris', 'Amphiprion percula', 'Anabas testudineus', 'Anas platyrhynchos', 'Anas platyrhynchos platyrhynchos', 'Anas zonorhyncha', 'Anolis carolinensis', 'Anser brachyrhynchus', 'Anser cygnoides', 'Aotus nancymaae', 'Apteryx haastii', 'Apteryx owenii', 'Apteryx rowi', 'Aquila chrysaetos chrysaetos', 'Astatotilapia calliptera', 'Astyanax mexicanus', 'Astyanax mexicanus pachon', 'Athene cunicularia', 'Balaenoptera musculus', 'Betta splendens', 'Bison bison bison', 'Bos grunniens', 'Bos indicus hybrid', 'Bos mutus', 'Bos taurus', 'Bos taurus hybrid', 'Bubo bubo', 'Buteo japonicus', 'Caenorhabditis elegans', 'Cairina moschata domestica', 'Calidris pugnax', 'Calidris pygmaea', 'Callithrix jacchus', 'Callorhinchus milii', 'Camarhynchus parvulus', 'Camelus dromedarius', 'Canis lupus dingo', 'Canis lupus familiaris', 'Canis lupus familiarisbasenji', 'Canis lupus familiarisboxer', 'Canis lupus familiarisgreatdane', 'Canis lupus familiarisgsd', 'Capra hircus', 'Capra hircus blackbengal', 'Carassius auratus', 'Carlito syrichta', 'Castor canadensis', 'Catagonus wagneri', 'Catharus 
ustulatus', 'Cavia aperea', 'Cavia porcellus', 'Cebus imitator', 'Cercocebus atys', 'Cervus hanglu yarkandensis', 'Chelonoidis abingdonii', 'Chelydra serpentina', 'Chinchilla lanigera', 'Chlorocebus sabaeus', 'Choloepus hoffmanni', 'Chrysemys picta bellii', 'Chrysolophus pictus', 'Ciona intestinalis', 'Ciona savignyi', 'Clupea harengus', 'Colobus angolensis palliatus', 'Corvus moneduloides', 'Cottoperca gobio', 'Coturnix japonica', 'Cricetulus griseus chok1gshd', 'Cricetulus griseus crigri', 'Cricetulus griseus picr', 'Crocodylus porosus', 'Cyanistes caeruleus', 'Cyclopterus lumpus', 'Cynoglossus semilaevis', 'Cyprinodon variegatus', 'Cyprinus carpio carpio', 'Cyprinus carpio germanmirror', 'Cyprinus carpio hebaored', 'Cyprinus carpio huanghe', 'Danio rerio', 'Dasypus novemcinctus', 'Delphinapterus leucas', 'Denticeps clupeoides', 'Dicentrarchus labrax', 'Dipodomys ordii', 'Dromaius novaehollandiae', 'Drosophila melanogaster', 'Echeneis naucrates', 'Echinops telfairi', 'Electrophorus electricus', 'Eptatretus burgeri', 'Equus asinus', 'Equus caballus', 'Erinaceus europaeus', 'Erpetoichthys calabaricus', 'Erythrura gouldiae', 'Esox lucius', 'Falco tinnunculus', 'Felis catus', 'Ficedula albicollis', 'Fukomys damarensis', 'Fundulus heteroclitus', 'Gadus morhua', 'Gadus morhua gca010882105v1', 'Gallus gallus', 'Gallus gallus gca000002315v5', 'Gallus gallus gca016700215v2', 'Gambusia affinis', 'Gasterosteus aculeatus', 'Gasterosteus aculeatus gca006229185v1', 'Gasterosteus aculeatus gca006232265v1', 'Gasterosteus aculeatus gca006232285v1', 'Geospiza fortis', 'Gopherus agassizii', 'Gopherus evgoodei', 'Gorilla gorilla', 'Gouania willdenowi', 'Haplochromis burtoni', 'Heterocephalus glaber female', 'Heterocephalus glaber male', 'Hippocampus comes', 'Homo sapiens', 'Hucho hucho', 'Ictalurus punctatus', 'Ictidomys tridecemlineatus', 'Jaculus jaculus', 'Junco hyemalis', 'Kryptolebias marmoratus', 'Labrus bergylta', 'Larimichthys crocea', 'Lates calcarifer', 'Laticauda 
laticaudata', 'Latimeria chalumnae', 'Lepidothrix coronata', 'Lepisosteus oculatus', 'Leptobrachium leishanense', 'Lonchura striata domestica', 'Loxodonta africana', 'Lynx canadensis', 'Macaca fascicularis', 'Macaca mulatta', 'Macaca nemestrina', 'Malurus cyaneus samueli', 'Manacus vitellinus', 'Mandrillus leucophaeus', 'Marmota marmota marmota', 'Mastacembelus armatus', 'Maylandia zebra', 'Meleagris gallopavo', 'Melopsittacus undulatus', 'Meriones unguiculatus', 'Mesocricetus auratus', 'Microcebus murinus', 'Microtus ochrogaster', 'Mola mola', 'Monodelphis domestica', 'Monodon monoceros', 'Monopterus albus', 'Moschus moschiferus', 'Mus caroli', 'Mus musculus', 'Mus musculus 129s1svimj', 'Mus musculus aj', 'Mus musculus akrj', 'Mus musculus balbcj', 'Mus musculus c3hhej', 'Mus musculus c57bl6nj', 'Mus musculus casteij', 'Mus musculus cbaj', 'Mus musculus dba2j', 'Mus musculus fvbnj', 'Mus musculus lpj', 'Mus musculus nodshiltj', 'Mus musculus nzohlltj', 'Mus musculus pwkphj', 'Mus musculus wsbeij', 'Mus pahari', 'Mus spicilegus', 'Mus spretus', 'Mustela putorius furo', 'Myotis lucifugus', 'Myripristis murdjan', 'Naja naja', 'Nannospalax galili', 'Neogobius melanostomus', 'Neolamprologus brichardi', 'Neovison vison', 'Nomascus leucogenys', 'Notamacropus eugenii', 'Notechis scutatus', 'Nothobranchius furzeri', 'Nothoprocta perdicaria', 'Numida meleagris', 'Ochotona princeps', 'Octodon degus', 'Oncorhynchus kisutch', 'Oncorhynchus mykiss', 'Oncorhynchus tshawytscha', 'Oreochromis aureus', 'Oreochromis niloticus', 'Ornithorhynchus anatinus', 'Oryctolagus cuniculus', 'Oryzias javanicus', 'Oryzias latipes', 'Oryzias latipes hni', 'Oryzias latipes hsok', 'Oryzias melastigma', 'Oryzias sinensis', 'Otolemur garnettii', 'Otus sunia', 'Ovis aries', 'Ovis aries rambouillet', 'Pan paniscus', 'Pan troglodytes', 'Panthera leo', 'Panthera pardus', 'Panthera tigris altaica', 'Papio anubis', 'Parambassis ranga', 'Paramormyrops kingsleyae', 'Parus major', 'Pavo cristatus', 
'Pelodiscus sinensis', 'Pelusios castaneus', 'Periophthalmus magnuspinnatus', 'Peromyscus maniculatus bairdii', 'Petromyzon marinus', 'Phascolarctos cinereus', 'Phasianus colchicus', 'Phocoena sinus', 'Physeter catodon', 'Piliocolobus tephrosceles', 'Podarcis muralis', 'Poecilia formosa', 'Poecilia latipinna', 'Poecilia mexicana', 'Poecilia reticulata', 'Pogona vitticeps', 'Pongo abelii', 'Procavia capensis', 'Prolemur simus', 'Propithecus coquereli', 'Pseudonaja textilis', 'Pteropus vampyrus', 'Pundamilia nyererei', 'Pygocentrus nattereri', 'Rattus norvegicus', 'Rattus norvegicus shrspbbbutx', 'Rattus norvegicus shrutx', 'Rattus norvegicus wkybbb', 'Rhinolophus ferrumequinum', 'Rhinopithecus bieti', 'Rhinopithecus roxellana', 'Saccharomyces cerevisiae', 'Saimiri boliviensis boliviensis', 'Salarias fasciatus', 'Salmo salar', 'Salmo salar gca021399835v1', 'Salmo salar gca923944775v1', 'Salmo salar gca931346935v2', 'Salmo trutta', 'Salvator merianae', 'Sander lucioperca', 'Sarcophilus harrisii', 'Sciurus vulgaris', 'Scleropages formosus', 'Scophthalmus maximus', 'Serinus canaria', 'Seriola dumerili', 'Seriola lalandi dorsalis', 'Sinocyclocheilus anshuiensis', 'Sinocyclocheilus grahami', 'Sinocyclocheilus rhinocerous', 'Sorex araneus', 'Sparus aurata', 'Spermophilus dauricus', 'Sphaeramia orbicularis', 'Sphenodon punctatus', 'Stachyris ruficeps', 'Stegastes partitus', 'Strigops habroptila', 'Strix occidentalis caurina', 'Struthio camelus australis', 'Suricata suricatta', 'Sus scrofa', 'Sus scrofa bamei', 'Sus scrofa berkshire', 'Sus scrofa hampshire', 'Sus scrofa jinhua', 'Sus scrofa landrace', 'Sus scrofa largewhite', 'Sus scrofa meishan', 'Sus scrofa pietrain', 'Sus scrofa rongchang', 'Sus scrofa tibetan', 'Sus scrofa usmarc', 'Sus scrofa wuzhishan', 'Taeniopygia guttata', 'Takifugu rubripes', 'Terrapene carolina triunguis', 'Tetraodon nigroviridis', 'Theropithecus gelada', 'Tupaia belangeri', 'Tursiops truncatus', 'Urocitellus parryii', 'Ursus americanus', 'Ursus 
maritimus', 'Ursus thibetanus thibetanus', 'Varanus komodoensis', 'Vicugna pacos', 'Vombatus ursinus', 'Vulpes vulpes', 'Xenopus tropicalis', 'Xiphophorus couchianus', 'Xiphophorus maculatus', 'Zalophus californianus', 'Zonotrichia albicollis', 'Zosterops lateralis melanops'] = 'auto', gene_id_type: str | Literal['auto'] | Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB'] = 'auto', filter_percent_identity: bool = True)
Find paralogs within the same species using the Ensembl database.
- Parameters:
@@ -1970,7 +1970,7 @@
- -find_paralogs_panther(organism: Union[Literal['auto'], str, int, Literal['Amborella trichopoda', 'Anolis carolinensis', 'Anopheles gambiae', 'Aquifex aeolicus', 'Arabidopsis thaliana', 'Bacillus cereus', 'Bacillus subtilis', 'Bacteroides thetaiotaomicron', 'Batrachochytrium dendrobatidis', 'Bos taurus', 'Brachypodium distachyon', 'Bradyrhizobium diazoefficiens', 'Branchiostoma floridae', 'Brassica campestris', 'Brassica napus', 'Caenorhabditis briggsae', 'Caenorhabditis elegans', 'Candida albicans', 'Canis lupus familiaris', 'Capsicum annuum', 'Chlamydia trachomatis', 'Chlamydomonas reinhardtii', 'Chloroflexus aurantiacus', 'Ciona intestinalis', 'Citrus sinensis', 'Clostridium botulinum', 'Coxiella burnetii', 'Cryptococcus neoformans', 'Cucumis sativus', 'Danio rerio', 'Daphnia pulex', 'Deinococcus radiodurans', 'Dictyoglomus turgidum', 'Dictyostelium discoideum', 'Dictyostelium purpureum', 'Drosophila melanogaster', 'Emericella nidulans', 'Entamoeba histolytica', 'Equus caballus', 'Eremothecium gossypii', 'Erythranthe guttata', 'Escherichia coli', 'Eucalyptus grandis', 'Felis catus', 'Fusobacterium nucleatum', 'Gallus gallus', 'Geobacter sulfurreducens', 'Giardia intestinalis', 'Gloeobacter violaceus', 'Glycine max', 'Gorilla gorilla gorilla', 'Gossypium hirsutum', 'Haemophilus influenzae', 'Halobacterium salinarum', 'Helianthus annuus', 'Helicobacter pylori', 'Homo sapiens', 'Hordeum vulgare subsp. vulgare', 'Ixodes scapularis', 'Juglans regia', 'Klebsormidium nitens', 'Korarchaeum cryptofilum', 'Lactuca sativa', 'Leishmania major', 'Leptospira interrogans', 'Listeria monocytogenes', 'Macaca mulatta', 'Manihot esculenta', 'Marchantia polymorpha', 'Medicago truncatula', 'Methanocaldococcus jannaschii', 'Methanosarcina acetivorans', 'Monodelphis domestica', 'Monosiga brevicollis', 'Mus musculus', 'Musa acuminata subsp. 
malaccensis', 'Mycobacterium tuberculosis', 'Neisseria meningitidis serogroup b', 'Nelumbo nucifera', 'Nematostella vectensis', 'Neosartorya fumigata', 'Neurospora crassa', 'Nicotiana tabacum', 'Nitrosopumilus maritimus', 'Ornithorhynchus anatinus', 'Oryza sativa', 'Oryzias latipes', 'Pan troglodytes', 'Paramecium tetraurelia', 'Phaeosphaeria nodorum', 'Physcomitrella patens', 'Phytophthora ramorum', 'Plasmodium falciparum', 'Populus trichocarpa', 'Pristionchus pacificus', 'Prunus persica', 'Pseudomonas aeruginosa', 'Puccinia graminis', 'Pyrobaculum aerophilum', 'Rattus norvegicus', 'Rhodopirellula baltica', 'Ricinus communis', 'Saccharomyces cerevisiae', 'Salmonella typhimurium', 'Schizosaccharomyces japonicus', 'Schizosaccharomyces pombe', 'Sclerotinia sclerotiorum', 'Selaginella moellendorffii', 'Setaria italica', 'Shewanella oneidensis', 'Solanum lycopersicum', 'Solanum tuberosum', 'Sorghum bicolor', 'Spinacia oleracea', 'Staphylococcus aureus', 'Streptococcus pneumoniae', 'Streptomyces coelicolor', 'Strongylocentrotus purpuratus', 'Sulfolobus solfataricus', 'Sus scrofa', 'Synechocystis', 'Thalassiosira pseudonana', 'Theobroma cacao', 'Thermococcus kodakaraensis', 'Thermodesulfovibrio yellowstonii', 'Thermotoga maritima', 'Tribolium castaneum', 'Trichomonas vaginalis', 'Trichoplax adhaerens', 'Triticum aestivum', 'Trypanosoma brucei', 'Ustilago maydis', 'Vibrio cholerae', 'Vitis vinifera', 'Xanthomonas campestris', 'Xenopus laevis', 'Xenopus tropicalis', 'Yarrowia lipolytica', 'Yersinia pestis', 'Zea mays', 'Zostera marina', 'helobdella robusta', 'lepisosteus oculatus', 'mycoplasma genitalium']] = 'auto', gene_id_type: Union[str, Literal['auto'], Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 
'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB']] = 'auto') +find_paralogs_panther(organism: Literal['auto'] | str | int | Literal['Amborella trichopoda', 'Anolis carolinensis', 'Anopheles gambiae', 'Aquifex aeolicus', 'Arabidopsis thaliana', 'Bacillus cereus', 'Bacillus subtilis', 'Bacteroides thetaiotaomicron', 'Batrachochytrium dendrobatidis', 'Bos taurus', 'Brachypodium distachyon', 'Bradyrhizobium diazoefficiens', 'Branchiostoma floridae', 'Brassica campestris', 'Brassica napus', 'Caenorhabditis briggsae', 'Caenorhabditis elegans', 'Candida albicans', 'Canis lupus familiaris', 'Capsicum annuum', 'Chlamydia trachomatis', 'Chlamydomonas reinhardtii', 'Chloroflexus aurantiacus', 'Ciona intestinalis', 'Citrus sinensis', 'Clostridium botulinum', 'Coxiella burnetii', 'Cryptococcus neoformans', 'Cucumis sativus', 'Danio rerio', 'Daphnia pulex', 'Deinococcus radiodurans', 'Dictyoglomus turgidum', 'Dictyostelium discoideum', 'Dictyostelium purpureum', 'Drosophila melanogaster', 'Emericella nidulans', 'Entamoeba histolytica', 'Equus caballus', 'Eremothecium gossypii', 'Erythranthe guttata', 'Escherichia coli', 'Eucalyptus grandis', 'Felis catus', 'Fusobacterium nucleatum', 'Gallus gallus', 'Geobacter sulfurreducens', 'Giardia intestinalis', 'Gloeobacter violaceus', 'Glycine max', 'Gorilla gorilla gorilla', 'Gossypium hirsutum', 
'Haemophilus influenzae', 'Halobacterium salinarum', 'Helianthus annuus', 'Helicobacter pylori', 'Homo sapiens', 'Hordeum vulgare subsp. vulgare', 'Ixodes scapularis', 'Juglans regia', 'Klebsormidium nitens', 'Korarchaeum cryptofilum', 'Lactuca sativa', 'Leishmania major', 'Leptospira interrogans', 'Listeria monocytogenes', 'Macaca mulatta', 'Manihot esculenta', 'Marchantia polymorpha', 'Medicago truncatula', 'Methanocaldococcus jannaschii', 'Methanosarcina acetivorans', 'Monodelphis domestica', 'Monosiga brevicollis', 'Mus musculus', 'Musa acuminata subsp. malaccensis', 'Mycobacterium tuberculosis', 'Neisseria meningitidis serogroup b', 'Nelumbo nucifera', 'Nematostella vectensis', 'Neosartorya fumigata', 'Neurospora crassa', 'Nicotiana tabacum', 'Nitrosopumilus maritimus', 'Ornithorhynchus anatinus', 'Oryza sativa', 'Oryzias latipes', 'Pan troglodytes', 'Paramecium tetraurelia', 'Phaeosphaeria nodorum', 'Physcomitrella patens', 'Phytophthora ramorum', 'Plasmodium falciparum', 'Populus trichocarpa', 'Pristionchus pacificus', 'Prunus persica', 'Pseudomonas aeruginosa', 'Puccinia graminis', 'Pyrobaculum aerophilum', 'Rattus norvegicus', 'Rhodopirellula baltica', 'Ricinus communis', 'Saccharomyces cerevisiae', 'Salmonella typhimurium', 'Schizosaccharomyces japonicus', 'Schizosaccharomyces pombe', 'Sclerotinia sclerotiorum', 'Selaginella moellendorffii', 'Setaria italica', 'Shewanella oneidensis', 'Solanum lycopersicum', 'Solanum tuberosum', 'Sorghum bicolor', 'Spinacia oleracea', 'Staphylococcus aureus', 'Streptococcus pneumoniae', 'Streptomyces coelicolor', 'Strongylocentrotus purpuratus', 'Sulfolobus solfataricus', 'Sus scrofa', 'Synechocystis', 'Thalassiosira pseudonana', 'Theobroma cacao', 'Thermococcus kodakaraensis', 'Thermodesulfovibrio yellowstonii', 'Thermotoga maritima', 'Tribolium castaneum', 'Trichomonas vaginalis', 'Trichoplax adhaerens', 'Triticum aestivum', 'Trypanosoma brucei', 'Ustilago maydis', 'Vibrio cholerae', 'Vitis vinifera', 'Xanthomonas 
campestris', 'Xenopus laevis', 'Xenopus tropicalis', 'Yarrowia lipolytica', 'Yersinia pestis', 'Zea mays', 'Zostera marina', 'helobdella robusta', 'lepisosteus oculatus', 'mycoplasma genitalium'] = 'auto', gene_id_type: str | Literal['auto'] | Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB'] = 'auto')
Find paralogs within the same species using the PantherDB database.
- Parameters: @@ -1997,8 +1997,8 @@
- Parameters:
-
-
numerator (str, or list of strs) – the CountFilter columns to be used as the numerator. If multiple arguments are given in a list, they will be averaged.
-denominator (str, or list of strs) – the CountFilter columns to be used as the denominator. If multiple arguments are given in a list, they will be averaged.
+numerator (str, or list of strs) – the CountFilter columns to be used as the numerator. If multiple arguments are given in a list, they will be averaged.
+denominator (str, or list of strs) – the CountFilter columns to be used as the denominator. If multiple arguments are given in a list, they will be averaged.
numer_name (str or 'default') – name to give the numerator condition. If ‘default’, the name will be generated automatically from the names of numerator columns.
denom_name (str or 'default') – name to give the denominator condition. If ‘default’, the name will be generated automatically from the names of denominator columns.
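The averaging rule described for numerator and denominator can be sketched in plain Python. This is a minimal illustration and not the RNAlysis implementation: the helper `fold_change`, the dictionary layout of `counts`, and the handling of a zero denominator are all assumptions made for the example.

```python
from statistics import mean


def fold_change(table, numerator, denominator):
    """Per-feature ratio of averaged numerator columns to averaged denominator columns.

    `table` maps feature name -> {column name -> value}. This helper and its
    data layout are hypothetical and are not part of RNAlysis.
    """
    # Accept a single column name or a list of names, as the parameter docs allow.
    numer_cols = [numerator] if isinstance(numerator, str) else list(numerator)
    denom_cols = [denominator] if isinstance(denominator, str) else list(denominator)

    result = {}
    for feature, row in table.items():
        # Multiple columns are averaged before the ratio is taken.
        numer_avg = mean(row[col] for col in numer_cols)
        denom_avg = mean(row[col] for col in denom_cols)
        # Assumption: a zero denominator yields infinity; the library may behave differently.
        result[feature] = numer_avg / denom_avg if denom_avg != 0 else float("inf")
    return result


# Hypothetical count data with two replicates of one condition and one of another.
counts = {
    "gene1": {"cond1_rep1": 10.0, "cond1_rep2": 14.0, "cond2_rep1": 6.0},
    "gene2": {"cond1_rep1": 3.0, "cond1_rep2": 5.0, "cond2_rep1": 8.0},
}
ratios = fold_change(counts, ["cond1_rep1", "cond1_rep2"], "cond2_rep1")
print(ratios)
```

Here gene1 averages to 12 in the numerator and is divided by 6, while gene2 averages to 4 and is divided by 8.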
rnalysis.filtering module
- -intersection(*others: Union[Filter, set], return_type: Literal['set', 'str'] = 'set', inplace: bool = False)
+intersection(*others: Filter | set, return_type: Literal['set', 'str'] = 'set', inplace: bool = False)
Keep only the features that exist in ALL of the given Filter objects/sets. This can be done either in place on the first Filter object, or by returning a set/string of features.
- Parameters: @@ -2232,18 +2232,18 @@
- -ma_plot(ref_column: Union[Literal['auto'], ColumnName] = 'auto', columns: Union[ColumnNames, Literal['all']] = 'all', split_plots: bool = False, title: Union[str, Literal['auto']] = 'auto', title_fontsize: float = 20, label_fontsize: Union[float, Literal['auto']] = 'auto', tick_fontsize: float = 12) List[Figure] +ma_plot(ref_column: Literal['auto'] | ColumnName = 'auto', columns: ColumnNames | Literal['all'] = 'all', split_plots: bool = False, title: str | Literal['auto'] = 'auto', title_fontsize: float = 20, label_fontsize: float | Literal['auto'] = 'auto', tick_fontsize: float = 12) List[Figure]
Generates M-A (log-ratio vs. log-intensity) plots for selected columns in the dataset. This plot is particularly useful for indicating whether a dataset is properly normalized.
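As a rough illustration of what an M-A plot displays, the sketch below computes M (log-ratio) and A (mean log-intensity) values for one column against a reference, and picks a reference column by comparing upper quartiles. It is not the RNAlysis implementation: the helper names, the use of base-2 logarithms, the requirement of strictly positive values, and the quartile method (the default of `statistics.quantiles`) are assumptions.

```python
import math
from statistics import mean, quantiles


def ma_values(sample, reference):
    """Return (M, A) pairs for two equal-length lists of positive expression values."""
    pairs = []
    for x, ref in zip(sample, reference):
        m = math.log2(x) - math.log2(ref)          # M: log-ratio
        a = 0.5 * (math.log2(x) + math.log2(ref))  # A: mean log-intensity
        pairs.append((m, a))
    return pairs


def pick_reference(columns):
    """Pick the column whose upper quartile is closest to the mean upper quartile."""
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is the upper one.
    upper = {name: quantiles(values, n=4)[2] for name, values in columns.items()}
    target = mean(upper.values())
    return min(upper, key=lambda name: abs(upper[name] - target))


# Hypothetical expression columns; each is the previous one scaled by 2.
columns = {
    "s1": [1.0, 2.0, 4.0, 8.0],
    "s2": [2.0, 4.0, 8.0, 16.0],
    "s3": [4.0, 8.0, 16.0, 32.0],
}
ref_name = pick_reference(columns)
points = ma_values(columns["s1"], columns[ref_name])
print(ref_name, points)
```

In a well-normalized dataset the M values are expected to scatter around zero across the range of A; the constant offset in this toy data is the kind of systematic shift such a plot makes visible.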
- Parameters:
-
-
ref_column (name of a column or 'auto' (default='auto')) – the column to be used as reference for MA plot. If ‘auto’, then the reference column will be chosen automatically to be the column whose upper quartile is closest to the mean upper quartile.
-columns (str or list of str) – A list of the column names to generate an MA plot for.
+ref_column (name of a column or 'auto' (default='auto')) – the column to be used as reference for MA plot. If ‘auto’, then the reference column will be chosen automatically to be the column whose upper quartile is closest to the mean upper quartile.
+columns (str or list of str) – A list of the column names to generate an MA plot for.
split_plots (bool (default=False)) – if True, each individual MA plot will be plotted in its own Figure. Otherwise, all MA plots will be plotted on the same Figure.
title (str or 'auto' (default='auto')) – The title of the plot. If ‘auto’, a title will be generated automatically.
title_fontsize (float (default=20)) – determines the font size of the graph title.
label_fontsize (float or 'auto' (default='auto')) – determines the font size of the X and Y axis labels.
tick_fontsize (float (default=12)) – determines the font size of the X and Y tick labels.
- Return type: @@ -2254,7 +2254,7 @@
- -majority_vote_intersection(*others: Union[Filter, set], majority_threshold: float = 0.5, return_type: Literal['set', 'str'] = 'set') +majority_vote_intersection(*others: Filter | set, majority_threshold: float = 0.5, return_type: Literal['set', 'str'] = 'set')
Returns a set/string of the features that appear in at least (majority_threshold * 100)% of the given Filter objects/sets. Majority-vote intersection with majority_threshold=0 is equivalent to Union. Majority-vote intersection with majority_threshold=1 is equivalent to Intersection.
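The thresholding rule can be sketched with plain Python sets. This is a minimal illustration under stated assumptions, not the library's code: the standalone function `majority_vote` is hypothetical and takes all sets as arguments, whereas the documented method is called on a Filter object and receives the remaining objects through `*others`.

```python
from collections import Counter


def majority_vote(*feature_sets, majority_threshold=0.5):
    """Return the features present in at least `majority_threshold` of the given sets."""
    if not feature_sets:
        raise ValueError("at least one set is required")
    # Count in how many of the sets each feature appears.
    counts = Counter(feature for s in feature_sets for feature in set(s))
    total = len(feature_sets)
    return {f for f, c in counts.items() if c / total >= majority_threshold}


# Hypothetical feature sets.
a = {"g1", "g2", "g3"}
b = {"g2", "g3", "g4"}
c = {"g3", "g5"}

majority = majority_vote(a, b, c, majority_threshold=0.5)
as_union = majority_vote(a, b, c, majority_threshold=0)
as_intersection = majority_vote(a, b, c, majority_threshold=1)
print(majority, as_union, as_intersection)
```

With a threshold of 0 every counted feature passes, which gives the union; with a threshold of 1 only features found in every set pass, which gives the intersection.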
- Parameters: @@ -2286,7 +2286,7 @@
- -map_orthologs_ensembl(map_to_organism: Union[str, int, Literal['Acanthochromis polyacanthus', 'Accipiter nisus', 'Ailuropoda melanoleuca', 'Amazona collaria', 'Amphilophus citrinellus', 'Amphiprion ocellaris', 'Amphiprion percula', 'Anabas testudineus', 'Anas platyrhynchos', 'Anas platyrhynchos platyrhynchos', 'Anas zonorhyncha', 'Anolis carolinensis', 'Anser brachyrhynchus', 'Anser cygnoides', 'Aotus nancymaae', 'Apteryx haastii', 'Apteryx owenii', 'Apteryx rowi', 'Aquila chrysaetos chrysaetos', 'Astatotilapia calliptera', 'Astyanax mexicanus', 'Astyanax mexicanus pachon', 'Athene cunicularia', 'Balaenoptera musculus', 'Betta splendens', 'Bison bison bison', 'Bos grunniens', 'Bos indicus hybrid', 'Bos mutus', 'Bos taurus', 'Bos taurus hybrid', 'Bubo bubo', 'Buteo japonicus', 'Caenorhabditis elegans', 'Cairina moschata domestica', 'Calidris pugnax', 'Calidris pygmaea', 'Callithrix jacchus', 'Callorhinchus milii', 'Camarhynchus parvulus', 'Camelus dromedarius', 'Canis lupus dingo', 'Canis lupus familiaris', 'Canis lupus familiarisbasenji', 'Canis lupus familiarisboxer', 'Canis lupus familiarisgreatdane', 'Canis lupus familiarisgsd', 'Capra hircus', 'Capra hircus blackbengal', 'Carassius auratus', 'Carlito syrichta', 'Castor canadensis', 'Catagonus wagneri', 'Catharus ustulatus', 'Cavia aperea', 'Cavia porcellus', 'Cebus imitator', 'Cercocebus atys', 'Cervus hanglu yarkandensis', 'Chelonoidis abingdonii', 'Chelydra serpentina', 'Chinchilla lanigera', 'Chlorocebus sabaeus', 'Choloepus hoffmanni', 'Chrysemys picta bellii', 'Chrysolophus pictus', 'Ciona intestinalis', 'Ciona savignyi', 'Clupea harengus', 'Colobus angolensis palliatus', 'Corvus moneduloides', 'Cottoperca gobio', 'Coturnix japonica', 'Cricetulus griseus chok1gshd', 'Cricetulus griseus crigri', 'Cricetulus griseus picr', 'Crocodylus porosus', 'Cyanistes caeruleus', 'Cyclopterus lumpus', 'Cynoglossus semilaevis', 'Cyprinodon variegatus', 'Cyprinus carpio carpio', 'Cyprinus carpio germanmirror', 'Cyprinus 
carpio hebaored', 'Cyprinus carpio huanghe', 'Danio rerio', 'Dasypus novemcinctus', 'Delphinapterus leucas', 'Denticeps clupeoides', 'Dicentrarchus labrax', 'Dipodomys ordii', 'Dromaius novaehollandiae', 'Drosophila melanogaster', 'Echeneis naucrates', 'Echinops telfairi', 'Electrophorus electricus', 'Eptatretus burgeri', 'Equus asinus', 'Equus caballus', 'Erinaceus europaeus', 'Erpetoichthys calabaricus', 'Erythrura gouldiae', 'Esox lucius', 'Falco tinnunculus', 'Felis catus', 'Ficedula albicollis', 'Fukomys damarensis', 'Fundulus heteroclitus', 'Gadus morhua', 'Gadus morhua gca010882105v1', 'Gallus gallus', 'Gallus gallus gca000002315v5', 'Gallus gallus gca016700215v2', 'Gambusia affinis', 'Gasterosteus aculeatus', 'Gasterosteus aculeatus gca006229185v1', 'Gasterosteus aculeatus gca006232265v1', 'Gasterosteus aculeatus gca006232285v1', 'Geospiza fortis', 'Gopherus agassizii', 'Gopherus evgoodei', 'Gorilla gorilla', 'Gouania willdenowi', 'Haplochromis burtoni', 'Heterocephalus glaber female', 'Heterocephalus glaber male', 'Hippocampus comes', 'Homo sapiens', 'Hucho hucho', 'Ictalurus punctatus', 'Ictidomys tridecemlineatus', 'Jaculus jaculus', 'Junco hyemalis', 'Kryptolebias marmoratus', 'Labrus bergylta', 'Larimichthys crocea', 'Lates calcarifer', 'Laticauda laticaudata', 'Latimeria chalumnae', 'Lepidothrix coronata', 'Lepisosteus oculatus', 'Leptobrachium leishanense', 'Lonchura striata domestica', 'Loxodonta africana', 'Lynx canadensis', 'Macaca fascicularis', 'Macaca mulatta', 'Macaca nemestrina', 'Malurus cyaneus samueli', 'Manacus vitellinus', 'Mandrillus leucophaeus', 'Marmota marmota marmota', 'Mastacembelus armatus', 'Maylandia zebra', 'Meleagris gallopavo', 'Melopsittacus undulatus', 'Meriones unguiculatus', 'Mesocricetus auratus', 'Microcebus murinus', 'Microtus ochrogaster', 'Mola mola', 'Monodelphis domestica', 'Monodon monoceros', 'Monopterus albus', 'Moschus moschiferus', 'Mus caroli', 'Mus musculus', 'Mus musculus 129s1svimj', 'Mus musculus aj', 
'Mus musculus akrj', 'Mus musculus balbcj', 'Mus musculus c3hhej', 'Mus musculus c57bl6nj', 'Mus musculus casteij', 'Mus musculus cbaj', 'Mus musculus dba2j', 'Mus musculus fvbnj', 'Mus musculus lpj', 'Mus musculus nodshiltj', 'Mus musculus nzohlltj', 'Mus musculus pwkphj', 'Mus musculus wsbeij', 'Mus pahari', 'Mus spicilegus', 'Mus spretus', 'Mustela putorius furo', 'Myotis lucifugus', 'Myripristis murdjan', 'Naja naja', 'Nannospalax galili', 'Neogobius melanostomus', 'Neolamprologus brichardi', 'Neovison vison', 'Nomascus leucogenys', 'Notamacropus eugenii', 'Notechis scutatus', 'Nothobranchius furzeri', 'Nothoprocta perdicaria', 'Numida meleagris', 'Ochotona princeps', 'Octodon degus', 'Oncorhynchus kisutch', 'Oncorhynchus mykiss', 'Oncorhynchus tshawytscha', 'Oreochromis aureus', 'Oreochromis niloticus', 'Ornithorhynchus anatinus', 'Oryctolagus cuniculus', 'Oryzias javanicus', 'Oryzias latipes', 'Oryzias latipes hni', 'Oryzias latipes hsok', 'Oryzias melastigma', 'Oryzias sinensis', 'Otolemur garnettii', 'Otus sunia', 'Ovis aries', 'Ovis aries rambouillet', 'Pan paniscus', 'Pan troglodytes', 'Panthera leo', 'Panthera pardus', 'Panthera tigris altaica', 'Papio anubis', 'Parambassis ranga', 'Paramormyrops kingsleyae', 'Parus major', 'Pavo cristatus', 'Pelodiscus sinensis', 'Pelusios castaneus', 'Periophthalmus magnuspinnatus', 'Peromyscus maniculatus bairdii', 'Petromyzon marinus', 'Phascolarctos cinereus', 'Phasianus colchicus', 'Phocoena sinus', 'Physeter catodon', 'Piliocolobus tephrosceles', 'Podarcis muralis', 'Poecilia formosa', 'Poecilia latipinna', 'Poecilia mexicana', 'Poecilia reticulata', 'Pogona vitticeps', 'Pongo abelii', 'Procavia capensis', 'Prolemur simus', 'Propithecus coquereli', 'Pseudonaja textilis', 'Pteropus vampyrus', 'Pundamilia nyererei', 'Pygocentrus nattereri', 'Rattus norvegicus', 'Rattus norvegicus shrspbbbutx', 'Rattus norvegicus shrutx', 'Rattus norvegicus wkybbb', 'Rhinolophus ferrumequinum', 'Rhinopithecus bieti', 'Rhinopithecus 
roxellana', 'Saccharomyces cerevisiae', 'Saimiri boliviensis boliviensis', 'Salarias fasciatus', 'Salmo salar', 'Salmo salar gca021399835v1', 'Salmo salar gca923944775v1', 'Salmo salar gca931346935v2', 'Salmo trutta', 'Salvator merianae', 'Sander lucioperca', 'Sarcophilus harrisii', 'Sciurus vulgaris', 'Scleropages formosus', 'Scophthalmus maximus', 'Serinus canaria', 'Seriola dumerili', 'Seriola lalandi dorsalis', 'Sinocyclocheilus anshuiensis', 'Sinocyclocheilus grahami', 'Sinocyclocheilus rhinocerous', 'Sorex araneus', 'Sparus aurata', 'Spermophilus dauricus', 'Sphaeramia orbicularis', 'Sphenodon punctatus', 'Stachyris ruficeps', 'Stegastes partitus', 'Strigops habroptila', 'Strix occidentalis caurina', 'Struthio camelus australis', 'Suricata suricatta', 'Sus scrofa', 'Sus scrofa bamei', 'Sus scrofa berkshire', 'Sus scrofa hampshire', 'Sus scrofa jinhua', 'Sus scrofa landrace', 'Sus scrofa largewhite', 'Sus scrofa meishan', 'Sus scrofa pietrain', 'Sus scrofa rongchang', 'Sus scrofa tibetan', 'Sus scrofa usmarc', 'Sus scrofa wuzhishan', 'Taeniopygia guttata', 'Takifugu rubripes', 'Terrapene carolina triunguis', 'Tetraodon nigroviridis', 'Theropithecus gelada', 'Tupaia belangeri', 'Tursiops truncatus', 'Urocitellus parryii', 'Ursus americanus', 'Ursus maritimus', 'Ursus thibetanus thibetanus', 'Varanus komodoensis', 'Vicugna pacos', 'Vombatus ursinus', 'Vulpes vulpes', 'Xenopus tropicalis', 'Xiphophorus couchianus', 'Xiphophorus maculatus', 'Zalophus californianus', 'Zonotrichia albicollis', 'Zosterops lateralis melanops']], map_from_organism: Union[Literal['auto'], str, int, Literal['Acanthochromis polyacanthus', 'Accipiter nisus', 'Ailuropoda melanoleuca', 'Amazona collaria', 'Amphilophus citrinellus', 'Amphiprion ocellaris', 'Amphiprion percula', 'Anabas testudineus', 'Anas platyrhynchos', 'Anas platyrhynchos platyrhynchos', 'Anas zonorhyncha', 'Anolis carolinensis', 'Anser brachyrhynchus', 'Anser cygnoides', 'Aotus nancymaae', 'Apteryx haastii', 'Apteryx 
owenii', 'Apteryx rowi', 'Aquila chrysaetos chrysaetos', 'Astatotilapia calliptera', 'Astyanax mexicanus', 'Astyanax mexicanus pachon', 'Athene cunicularia', 'Balaenoptera musculus', 'Betta splendens', 'Bison bison bison', 'Bos grunniens', 'Bos indicus hybrid', 'Bos mutus', 'Bos taurus', 'Bos taurus hybrid', 'Bubo bubo', 'Buteo japonicus', 'Caenorhabditis elegans', 'Cairina moschata domestica', 'Calidris pugnax', 'Calidris pygmaea', 'Callithrix jacchus', 'Callorhinchus milii', 'Camarhynchus parvulus', 'Camelus dromedarius', 'Canis lupus dingo', 'Canis lupus familiaris', 'Canis lupus familiarisbasenji', 'Canis lupus familiarisboxer', 'Canis lupus familiarisgreatdane', 'Canis lupus familiarisgsd', 'Capra hircus', 'Capra hircus blackbengal', 'Carassius auratus', 'Carlito syrichta', 'Castor canadensis', 'Catagonus wagneri', 'Catharus ustulatus', 'Cavia aperea', 'Cavia porcellus', 'Cebus imitator', 'Cercocebus atys', 'Cervus hanglu yarkandensis', 'Chelonoidis abingdonii', 'Chelydra serpentina', 'Chinchilla lanigera', 'Chlorocebus sabaeus', 'Choloepus hoffmanni', 'Chrysemys picta bellii', 'Chrysolophus pictus', 'Ciona intestinalis', 'Ciona savignyi', 'Clupea harengus', 'Colobus angolensis palliatus', 'Corvus moneduloides', 'Cottoperca gobio', 'Coturnix japonica', 'Cricetulus griseus chok1gshd', 'Cricetulus griseus crigri', 'Cricetulus griseus picr', 'Crocodylus porosus', 'Cyanistes caeruleus', 'Cyclopterus lumpus', 'Cynoglossus semilaevis', 'Cyprinodon variegatus', 'Cyprinus carpio carpio', 'Cyprinus carpio germanmirror', 'Cyprinus carpio hebaored', 'Cyprinus carpio huanghe', 'Danio rerio', 'Dasypus novemcinctus', 'Delphinapterus leucas', 'Denticeps clupeoides', 'Dicentrarchus labrax', 'Dipodomys ordii', 'Dromaius novaehollandiae', 'Drosophila melanogaster', 'Echeneis naucrates', 'Echinops telfairi', 'Electrophorus electricus', 'Eptatretus burgeri', 'Equus asinus', 'Equus caballus', 'Erinaceus europaeus', 'Erpetoichthys calabaricus', 'Erythrura gouldiae', 'Esox lucius', 
'Falco tinnunculus', 'Felis catus', 'Ficedula albicollis', 'Fukomys damarensis', 'Fundulus heteroclitus', 'Gadus morhua', 'Gadus morhua gca010882105v1', 'Gallus gallus', 'Gallus gallus gca000002315v5', 'Gallus gallus gca016700215v2', 'Gambusia affinis', 'Gasterosteus aculeatus', 'Gasterosteus aculeatus gca006229185v1', 'Gasterosteus aculeatus gca006232265v1', 'Gasterosteus aculeatus gca006232285v1', 'Geospiza fortis', 'Gopherus agassizii', 'Gopherus evgoodei', 'Gorilla gorilla', 'Gouania willdenowi', 'Haplochromis burtoni', 'Heterocephalus glaber female', 'Heterocephalus glaber male', 'Hippocampus comes', 'Homo sapiens', 'Hucho hucho', 'Ictalurus punctatus', 'Ictidomys tridecemlineatus', 'Jaculus jaculus', 'Junco hyemalis', 'Kryptolebias marmoratus', 'Labrus bergylta', 'Larimichthys crocea', 'Lates calcarifer', 'Laticauda laticaudata', 'Latimeria chalumnae', 'Lepidothrix coronata', 'Lepisosteus oculatus', 'Leptobrachium leishanense', 'Lonchura striata domestica', 'Loxodonta africana', 'Lynx canadensis', 'Macaca fascicularis', 'Macaca mulatta', 'Macaca nemestrina', 'Malurus cyaneus samueli', 'Manacus vitellinus', 'Mandrillus leucophaeus', 'Marmota marmota marmota', 'Mastacembelus armatus', 'Maylandia zebra', 'Meleagris gallopavo', 'Melopsittacus undulatus', 'Meriones unguiculatus', 'Mesocricetus auratus', 'Microcebus murinus', 'Microtus ochrogaster', 'Mola mola', 'Monodelphis domestica', 'Monodon monoceros', 'Monopterus albus', 'Moschus moschiferus', 'Mus caroli', 'Mus musculus', 'Mus musculus 129s1svimj', 'Mus musculus aj', 'Mus musculus akrj', 'Mus musculus balbcj', 'Mus musculus c3hhej', 'Mus musculus c57bl6nj', 'Mus musculus casteij', 'Mus musculus cbaj', 'Mus musculus dba2j', 'Mus musculus fvbnj', 'Mus musculus lpj', 'Mus musculus nodshiltj', 'Mus musculus nzohlltj', 'Mus musculus pwkphj', 'Mus musculus wsbeij', 'Mus pahari', 'Mus spicilegus', 'Mus spretus', 'Mustela putorius furo', 'Myotis lucifugus', 'Myripristis murdjan', 'Naja naja', 'Nannospalax galili', 
'Neogobius melanostomus', 'Neolamprologus brichardi', 'Neovison vison', 'Nomascus leucogenys', 'Notamacropus eugenii', 'Notechis scutatus', 'Nothobranchius furzeri', 'Nothoprocta perdicaria', 'Numida meleagris', 'Ochotona princeps', 'Octodon degus', 'Oncorhynchus kisutch', 'Oncorhynchus mykiss', 'Oncorhynchus tshawytscha', 'Oreochromis aureus', 'Oreochromis niloticus', 'Ornithorhynchus anatinus', 'Oryctolagus cuniculus', 'Oryzias javanicus', 'Oryzias latipes', 'Oryzias latipes hni', 'Oryzias latipes hsok', 'Oryzias melastigma', 'Oryzias sinensis', 'Otolemur garnettii', 'Otus sunia', 'Ovis aries', 'Ovis aries rambouillet', 'Pan paniscus', 'Pan troglodytes', 'Panthera leo', 'Panthera pardus', 'Panthera tigris altaica', 'Papio anubis', 'Parambassis ranga', 'Paramormyrops kingsleyae', 'Parus major', 'Pavo cristatus', 'Pelodiscus sinensis', 'Pelusios castaneus', 'Periophthalmus magnuspinnatus', 'Peromyscus maniculatus bairdii', 'Petromyzon marinus', 'Phascolarctos cinereus', 'Phasianus colchicus', 'Phocoena sinus', 'Physeter catodon', 'Piliocolobus tephrosceles', 'Podarcis muralis', 'Poecilia formosa', 'Poecilia latipinna', 'Poecilia mexicana', 'Poecilia reticulata', 'Pogona vitticeps', 'Pongo abelii', 'Procavia capensis', 'Prolemur simus', 'Propithecus coquereli', 'Pseudonaja textilis', 'Pteropus vampyrus', 'Pundamilia nyererei', 'Pygocentrus nattereri', 'Rattus norvegicus', 'Rattus norvegicus shrspbbbutx', 'Rattus norvegicus shrutx', 'Rattus norvegicus wkybbb', 'Rhinolophus ferrumequinum', 'Rhinopithecus bieti', 'Rhinopithecus roxellana', 'Saccharomyces cerevisiae', 'Saimiri boliviensis boliviensis', 'Salarias fasciatus', 'Salmo salar', 'Salmo salar gca021399835v1', 'Salmo salar gca923944775v1', 'Salmo salar gca931346935v2', 'Salmo trutta', 'Salvator merianae', 'Sander lucioperca', 'Sarcophilus harrisii', 'Sciurus vulgaris', 'Scleropages formosus', 'Scophthalmus maximus', 'Serinus canaria', 'Seriola dumerili', 'Seriola lalandi dorsalis', 'Sinocyclocheilus 
anshuiensis', 'Sinocyclocheilus grahami', 'Sinocyclocheilus rhinocerous', 'Sorex araneus', 'Sparus aurata', 'Spermophilus dauricus', 'Sphaeramia orbicularis', 'Sphenodon punctatus', 'Stachyris ruficeps', 'Stegastes partitus', 'Strigops habroptila', 'Strix occidentalis caurina', 'Struthio camelus australis', 'Suricata suricatta', 'Sus scrofa', 'Sus scrofa bamei', 'Sus scrofa berkshire', 'Sus scrofa hampshire', 'Sus scrofa jinhua', 'Sus scrofa landrace', 'Sus scrofa largewhite', 'Sus scrofa meishan', 'Sus scrofa pietrain', 'Sus scrofa rongchang', 'Sus scrofa tibetan', 'Sus scrofa usmarc', 'Sus scrofa wuzhishan', 'Taeniopygia guttata', 'Takifugu rubripes', 'Terrapene carolina triunguis', 'Tetraodon nigroviridis', 'Theropithecus gelada', 'Tupaia belangeri', 'Tursiops truncatus', 'Urocitellus parryii', 'Ursus americanus', 'Ursus maritimus', 'Ursus thibetanus thibetanus', 'Varanus komodoensis', 'Vicugna pacos', 'Vombatus ursinus', 'Vulpes vulpes', 'Xenopus tropicalis', 'Xiphophorus couchianus', 'Xiphophorus maculatus', 'Zalophus californianus', 'Zonotrichia albicollis', 'Zosterops lateralis melanops']] = 'auto', gene_id_type: Union[str, Literal['auto'], Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq 
Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB']] = 'auto', filter_percent_identity: bool = True, non_unique_mode: Literal['first', 'last', 'random', 'none'] = 'first', remove_unmapped_genes: bool = False, inplace: bool = True) +map_orthologs_ensembl(map_to_organism: str | int | Literal['Acanthochromis polyacanthus', 'Accipiter nisus', 'Ailuropoda melanoleuca', 'Amazona collaria', 'Amphilophus citrinellus', 'Amphiprion ocellaris', 'Amphiprion percula', 'Anabas testudineus', 'Anas platyrhynchos', 'Anas platyrhynchos platyrhynchos', 'Anas zonorhyncha', 'Anolis carolinensis', 'Anser brachyrhynchus', 'Anser cygnoides', 'Aotus nancymaae', 'Apteryx haastii', 'Apteryx owenii', 'Apteryx rowi', 'Aquila chrysaetos chrysaetos', 'Astatotilapia calliptera', 'Astyanax mexicanus', 'Astyanax mexicanus pachon', 'Athene cunicularia', 'Balaenoptera musculus', 'Betta splendens', 'Bison bison bison', 'Bos grunniens', 'Bos indicus hybrid', 'Bos mutus', 'Bos taurus', 'Bos taurus hybrid', 'Bubo bubo', 'Buteo japonicus', 'Caenorhabditis elegans', 'Cairina moschata domestica', 'Calidris pugnax', 'Calidris pygmaea', 'Callithrix jacchus', 'Callorhinchus milii', 'Camarhynchus parvulus', 'Camelus dromedarius', 'Canis lupus dingo', 'Canis lupus familiaris', 'Canis lupus familiarisbasenji', 'Canis lupus familiarisboxer', 'Canis lupus familiarisgreatdane', 'Canis lupus familiarisgsd', 'Capra hircus', 'Capra hircus blackbengal', 'Carassius auratus', 'Carlito syrichta', 'Castor canadensis', 'Catagonus wagneri', 'Catharus ustulatus', 'Cavia aperea', 'Cavia porcellus', 'Cebus imitator', 'Cercocebus atys', 'Cervus hanglu yarkandensis', 'Chelonoidis abingdonii', 'Chelydra serpentina', 'Chinchilla lanigera', 'Chlorocebus sabaeus', 'Choloepus hoffmanni', 'Chrysemys picta bellii', 'Chrysolophus pictus', 'Ciona intestinalis', 'Ciona savignyi', 'Clupea harengus', 'Colobus angolensis 
palliatus', 'Corvus moneduloides', 'Cottoperca gobio', 'Coturnix japonica', 'Cricetulus griseus chok1gshd', 'Cricetulus griseus crigri', 'Cricetulus griseus picr', 'Crocodylus porosus', 'Cyanistes caeruleus', 'Cyclopterus lumpus', 'Cynoglossus semilaevis', 'Cyprinodon variegatus', 'Cyprinus carpio carpio', 'Cyprinus carpio germanmirror', 'Cyprinus carpio hebaored', 'Cyprinus carpio huanghe', 'Danio rerio', 'Dasypus novemcinctus', 'Delphinapterus leucas', 'Denticeps clupeoides', 'Dicentrarchus labrax', 'Dipodomys ordii', 'Dromaius novaehollandiae', 'Drosophila melanogaster', 'Echeneis naucrates', 'Echinops telfairi', 'Electrophorus electricus', 'Eptatretus burgeri', 'Equus asinus', 'Equus caballus', 'Erinaceus europaeus', 'Erpetoichthys calabaricus', 'Erythrura gouldiae', 'Esox lucius', 'Falco tinnunculus', 'Felis catus', 'Ficedula albicollis', 'Fukomys damarensis', 'Fundulus heteroclitus', 'Gadus morhua', 'Gadus morhua gca010882105v1', 'Gallus gallus', 'Gallus gallus gca000002315v5', 'Gallus gallus gca016700215v2', 'Gambusia affinis', 'Gasterosteus aculeatus', 'Gasterosteus aculeatus gca006229185v1', 'Gasterosteus aculeatus gca006232265v1', 'Gasterosteus aculeatus gca006232285v1', 'Geospiza fortis', 'Gopherus agassizii', 'Gopherus evgoodei', 'Gorilla gorilla', 'Gouania willdenowi', 'Haplochromis burtoni', 'Heterocephalus glaber female', 'Heterocephalus glaber male', 'Hippocampus comes', 'Homo sapiens', 'Hucho hucho', 'Ictalurus punctatus', 'Ictidomys tridecemlineatus', 'Jaculus jaculus', 'Junco hyemalis', 'Kryptolebias marmoratus', 'Labrus bergylta', 'Larimichthys crocea', 'Lates calcarifer', 'Laticauda laticaudata', 'Latimeria chalumnae', 'Lepidothrix coronata', 'Lepisosteus oculatus', 'Leptobrachium leishanense', 'Lonchura striata domestica', 'Loxodonta africana', 'Lynx canadensis', 'Macaca fascicularis', 'Macaca mulatta', 'Macaca nemestrina', 'Malurus cyaneus samueli', 'Manacus vitellinus', 'Mandrillus leucophaeus', 'Marmota marmota marmota', 'Mastacembelus 
armatus', 'Maylandia zebra', 'Meleagris gallopavo', 'Melopsittacus undulatus', 'Meriones unguiculatus', 'Mesocricetus auratus', 'Microcebus murinus', 'Microtus ochrogaster', 'Mola mola', 'Monodelphis domestica', 'Monodon monoceros', 'Monopterus albus', 'Moschus moschiferus', 'Mus caroli', 'Mus musculus', 'Mus musculus 129s1svimj', 'Mus musculus aj', 'Mus musculus akrj', 'Mus musculus balbcj', 'Mus musculus c3hhej', 'Mus musculus c57bl6nj', 'Mus musculus casteij', 'Mus musculus cbaj', 'Mus musculus dba2j', 'Mus musculus fvbnj', 'Mus musculus lpj', 'Mus musculus nodshiltj', 'Mus musculus nzohlltj', 'Mus musculus pwkphj', 'Mus musculus wsbeij', 'Mus pahari', 'Mus spicilegus', 'Mus spretus', 'Mustela putorius furo', 'Myotis lucifugus', 'Myripristis murdjan', 'Naja naja', 'Nannospalax galili', 'Neogobius melanostomus', 'Neolamprologus brichardi', 'Neovison vison', 'Nomascus leucogenys', 'Notamacropus eugenii', 'Notechis scutatus', 'Nothobranchius furzeri', 'Nothoprocta perdicaria', 'Numida meleagris', 'Ochotona princeps', 'Octodon degus', 'Oncorhynchus kisutch', 'Oncorhynchus mykiss', 'Oncorhynchus tshawytscha', 'Oreochromis aureus', 'Oreochromis niloticus', 'Ornithorhynchus anatinus', 'Oryctolagus cuniculus', 'Oryzias javanicus', 'Oryzias latipes', 'Oryzias latipes hni', 'Oryzias latipes hsok', 'Oryzias melastigma', 'Oryzias sinensis', 'Otolemur garnettii', 'Otus sunia', 'Ovis aries', 'Ovis aries rambouillet', 'Pan paniscus', 'Pan troglodytes', 'Panthera leo', 'Panthera pardus', 'Panthera tigris altaica', 'Papio anubis', 'Parambassis ranga', 'Paramormyrops kingsleyae', 'Parus major', 'Pavo cristatus', 'Pelodiscus sinensis', 'Pelusios castaneus', 'Periophthalmus magnuspinnatus', 'Peromyscus maniculatus bairdii', 'Petromyzon marinus', 'Phascolarctos cinereus', 'Phasianus colchicus', 'Phocoena sinus', 'Physeter catodon', 'Piliocolobus tephrosceles', 'Podarcis muralis', 'Poecilia formosa', 'Poecilia latipinna', 'Poecilia mexicana', 'Poecilia reticulata', 'Pogona 
vitticeps', 'Pongo abelii', 'Procavia capensis', 'Prolemur simus', 'Propithecus coquereli', 'Pseudonaja textilis', 'Pteropus vampyrus', 'Pundamilia nyererei', 'Pygocentrus nattereri', 'Rattus norvegicus', 'Rattus norvegicus shrspbbbutx', 'Rattus norvegicus shrutx', 'Rattus norvegicus wkybbb', 'Rhinolophus ferrumequinum', 'Rhinopithecus bieti', 'Rhinopithecus roxellana', 'Saccharomyces cerevisiae', 'Saimiri boliviensis boliviensis', 'Salarias fasciatus', 'Salmo salar', 'Salmo salar gca021399835v1', 'Salmo salar gca923944775v1', 'Salmo salar gca931346935v2', 'Salmo trutta', 'Salvator merianae', 'Sander lucioperca', 'Sarcophilus harrisii', 'Sciurus vulgaris', 'Scleropages formosus', 'Scophthalmus maximus', 'Serinus canaria', 'Seriola dumerili', 'Seriola lalandi dorsalis', 'Sinocyclocheilus anshuiensis', 'Sinocyclocheilus grahami', 'Sinocyclocheilus rhinocerous', 'Sorex araneus', 'Sparus aurata', 'Spermophilus dauricus', 'Sphaeramia orbicularis', 'Sphenodon punctatus', 'Stachyris ruficeps', 'Stegastes partitus', 'Strigops habroptila', 'Strix occidentalis caurina', 'Struthio camelus australis', 'Suricata suricatta', 'Sus scrofa', 'Sus scrofa bamei', 'Sus scrofa berkshire', 'Sus scrofa hampshire', 'Sus scrofa jinhua', 'Sus scrofa landrace', 'Sus scrofa largewhite', 'Sus scrofa meishan', 'Sus scrofa pietrain', 'Sus scrofa rongchang', 'Sus scrofa tibetan', 'Sus scrofa usmarc', 'Sus scrofa wuzhishan', 'Taeniopygia guttata', 'Takifugu rubripes', 'Terrapene carolina triunguis', 'Tetraodon nigroviridis', 'Theropithecus gelada', 'Tupaia belangeri', 'Tursiops truncatus', 'Urocitellus parryii', 'Ursus americanus', 'Ursus maritimus', 'Ursus thibetanus thibetanus', 'Varanus komodoensis', 'Vicugna pacos', 'Vombatus ursinus', 'Vulpes vulpes', 'Xenopus tropicalis', 'Xiphophorus couchianus', 'Xiphophorus maculatus', 'Zalophus californianus', 'Zonotrichia albicollis', 'Zosterops lateralis melanops'], map_from_organism: Literal['auto'] | str | int | Literal['Acanthochromis polyacanthus', 
'Accipiter nisus', 'Ailuropoda melanoleuca', 'Amazona collaria', 'Amphilophus citrinellus', 'Amphiprion ocellaris', 'Amphiprion percula', 'Anabas testudineus', 'Anas platyrhynchos', 'Anas platyrhynchos platyrhynchos', 'Anas zonorhyncha', 'Anolis carolinensis', 'Anser brachyrhynchus', 'Anser cygnoides', 'Aotus nancymaae', 'Apteryx haastii', 'Apteryx owenii', 'Apteryx rowi', 'Aquila chrysaetos chrysaetos', 'Astatotilapia calliptera', 'Astyanax mexicanus', 'Astyanax mexicanus pachon', 'Athene cunicularia', 'Balaenoptera musculus', 'Betta splendens', 'Bison bison bison', 'Bos grunniens', 'Bos indicus hybrid', 'Bos mutus', 'Bos taurus', 'Bos taurus hybrid', 'Bubo bubo', 'Buteo japonicus', 'Caenorhabditis elegans', 'Cairina moschata domestica', 'Calidris pugnax', 'Calidris pygmaea', 'Callithrix jacchus', 'Callorhinchus milii', 'Camarhynchus parvulus', 'Camelus dromedarius', 'Canis lupus dingo', 'Canis lupus familiaris', 'Canis lupus familiarisbasenji', 'Canis lupus familiarisboxer', 'Canis lupus familiarisgreatdane', 'Canis lupus familiarisgsd', 'Capra hircus', 'Capra hircus blackbengal', 'Carassius auratus', 'Carlito syrichta', 'Castor canadensis', 'Catagonus wagneri', 'Catharus ustulatus', 'Cavia aperea', 'Cavia porcellus', 'Cebus imitator', 'Cercocebus atys', 'Cervus hanglu yarkandensis', 'Chelonoidis abingdonii', 'Chelydra serpentina', 'Chinchilla lanigera', 'Chlorocebus sabaeus', 'Choloepus hoffmanni', 'Chrysemys picta bellii', 'Chrysolophus pictus', 'Ciona intestinalis', 'Ciona savignyi', 'Clupea harengus', 'Colobus angolensis palliatus', 'Corvus moneduloides', 'Cottoperca gobio', 'Coturnix japonica', 'Cricetulus griseus chok1gshd', 'Cricetulus griseus crigri', 'Cricetulus griseus picr', 'Crocodylus porosus', 'Cyanistes caeruleus', 'Cyclopterus lumpus', 'Cynoglossus semilaevis', 'Cyprinodon variegatus', 'Cyprinus carpio carpio', 'Cyprinus carpio germanmirror', 'Cyprinus carpio hebaored', 'Cyprinus carpio huanghe', 'Danio rerio', 'Dasypus novemcinctus', 
'Delphinapterus leucas', 'Denticeps clupeoides', 'Dicentrarchus labrax', 'Dipodomys ordii', 'Dromaius novaehollandiae', 'Drosophila melanogaster', 'Echeneis naucrates', 'Echinops telfairi', 'Electrophorus electricus', 'Eptatretus burgeri', 'Equus asinus', 'Equus caballus', 'Erinaceus europaeus', 'Erpetoichthys calabaricus', 'Erythrura gouldiae', 'Esox lucius', 'Falco tinnunculus', 'Felis catus', 'Ficedula albicollis', 'Fukomys damarensis', 'Fundulus heteroclitus', 'Gadus morhua', 'Gadus morhua gca010882105v1', 'Gallus gallus', 'Gallus gallus gca000002315v5', 'Gallus gallus gca016700215v2', 'Gambusia affinis', 'Gasterosteus aculeatus', 'Gasterosteus aculeatus gca006229185v1', 'Gasterosteus aculeatus gca006232265v1', 'Gasterosteus aculeatus gca006232285v1', 'Geospiza fortis', 'Gopherus agassizii', 'Gopherus evgoodei', 'Gorilla gorilla', 'Gouania willdenowi', 'Haplochromis burtoni', 'Heterocephalus glaber female', 'Heterocephalus glaber male', 'Hippocampus comes', 'Homo sapiens', 'Hucho hucho', 'Ictalurus punctatus', 'Ictidomys tridecemlineatus', 'Jaculus jaculus', 'Junco hyemalis', 'Kryptolebias marmoratus', 'Labrus bergylta', 'Larimichthys crocea', 'Lates calcarifer', 'Laticauda laticaudata', 'Latimeria chalumnae', 'Lepidothrix coronata', 'Lepisosteus oculatus', 'Leptobrachium leishanense', 'Lonchura striata domestica', 'Loxodonta africana', 'Lynx canadensis', 'Macaca fascicularis', 'Macaca mulatta', 'Macaca nemestrina', 'Malurus cyaneus samueli', 'Manacus vitellinus', 'Mandrillus leucophaeus', 'Marmota marmota marmota', 'Mastacembelus armatus', 'Maylandia zebra', 'Meleagris gallopavo', 'Melopsittacus undulatus', 'Meriones unguiculatus', 'Mesocricetus auratus', 'Microcebus murinus', 'Microtus ochrogaster', 'Mola mola', 'Monodelphis domestica', 'Monodon monoceros', 'Monopterus albus', 'Moschus moschiferus', 'Mus caroli', 'Mus musculus', 'Mus musculus 129s1svimj', 'Mus musculus aj', 'Mus musculus akrj', 'Mus musculus balbcj', 'Mus musculus c3hhej', 'Mus musculus 
c57bl6nj', 'Mus musculus casteij', 'Mus musculus cbaj', 'Mus musculus dba2j', 'Mus musculus fvbnj', 'Mus musculus lpj', 'Mus musculus nodshiltj', 'Mus musculus nzohlltj', 'Mus musculus pwkphj', 'Mus musculus wsbeij', 'Mus pahari', 'Mus spicilegus', 'Mus spretus', 'Mustela putorius furo', 'Myotis lucifugus', 'Myripristis murdjan', 'Naja naja', 'Nannospalax galili', 'Neogobius melanostomus', 'Neolamprologus brichardi', 'Neovison vison', 'Nomascus leucogenys', 'Notamacropus eugenii', 'Notechis scutatus', 'Nothobranchius furzeri', 'Nothoprocta perdicaria', 'Numida meleagris', 'Ochotona princeps', 'Octodon degus', 'Oncorhynchus kisutch', 'Oncorhynchus mykiss', 'Oncorhynchus tshawytscha', 'Oreochromis aureus', 'Oreochromis niloticus', 'Ornithorhynchus anatinus', 'Oryctolagus cuniculus', 'Oryzias javanicus', 'Oryzias latipes', 'Oryzias latipes hni', 'Oryzias latipes hsok', 'Oryzias melastigma', 'Oryzias sinensis', 'Otolemur garnettii', 'Otus sunia', 'Ovis aries', 'Ovis aries rambouillet', 'Pan paniscus', 'Pan troglodytes', 'Panthera leo', 'Panthera pardus', 'Panthera tigris altaica', 'Papio anubis', 'Parambassis ranga', 'Paramormyrops kingsleyae', 'Parus major', 'Pavo cristatus', 'Pelodiscus sinensis', 'Pelusios castaneus', 'Periophthalmus magnuspinnatus', 'Peromyscus maniculatus bairdii', 'Petromyzon marinus', 'Phascolarctos cinereus', 'Phasianus colchicus', 'Phocoena sinus', 'Physeter catodon', 'Piliocolobus tephrosceles', 'Podarcis muralis', 'Poecilia formosa', 'Poecilia latipinna', 'Poecilia mexicana', 'Poecilia reticulata', 'Pogona vitticeps', 'Pongo abelii', 'Procavia capensis', 'Prolemur simus', 'Propithecus coquereli', 'Pseudonaja textilis', 'Pteropus vampyrus', 'Pundamilia nyererei', 'Pygocentrus nattereri', 'Rattus norvegicus', 'Rattus norvegicus shrspbbbutx', 'Rattus norvegicus shrutx', 'Rattus norvegicus wkybbb', 'Rhinolophus ferrumequinum', 'Rhinopithecus bieti', 'Rhinopithecus roxellana', 'Saccharomyces cerevisiae', 'Saimiri boliviensis boliviensis', 
'Salarias fasciatus', 'Salmo salar', 'Salmo salar gca021399835v1', 'Salmo salar gca923944775v1', 'Salmo salar gca931346935v2', 'Salmo trutta', 'Salvator merianae', 'Sander lucioperca', 'Sarcophilus harrisii', 'Sciurus vulgaris', 'Scleropages formosus', 'Scophthalmus maximus', 'Serinus canaria', 'Seriola dumerili', 'Seriola lalandi dorsalis', 'Sinocyclocheilus anshuiensis', 'Sinocyclocheilus grahami', 'Sinocyclocheilus rhinocerous', 'Sorex araneus', 'Sparus aurata', 'Spermophilus dauricus', 'Sphaeramia orbicularis', 'Sphenodon punctatus', 'Stachyris ruficeps', 'Stegastes partitus', 'Strigops habroptila', 'Strix occidentalis caurina', 'Struthio camelus australis', 'Suricata suricatta', 'Sus scrofa', 'Sus scrofa bamei', 'Sus scrofa berkshire', 'Sus scrofa hampshire', 'Sus scrofa jinhua', 'Sus scrofa landrace', 'Sus scrofa largewhite', 'Sus scrofa meishan', 'Sus scrofa pietrain', 'Sus scrofa rongchang', 'Sus scrofa tibetan', 'Sus scrofa usmarc', 'Sus scrofa wuzhishan', 'Taeniopygia guttata', 'Takifugu rubripes', 'Terrapene carolina triunguis', 'Tetraodon nigroviridis', 'Theropithecus gelada', 'Tupaia belangeri', 'Tursiops truncatus', 'Urocitellus parryii', 'Ursus americanus', 'Ursus maritimus', 'Ursus thibetanus thibetanus', 'Varanus komodoensis', 'Vicugna pacos', 'Vombatus ursinus', 'Vulpes vulpes', 'Xenopus tropicalis', 'Xiphophorus couchianus', 'Xiphophorus maculatus', 'Zalophus californianus', 'Zonotrichia albicollis', 'Zosterops lateralis melanops'] = 'auto', gene_id_type: str | Literal['auto'] | Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 
'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB'] = 'auto', filter_percent_identity: bool = True, non_unique_mode: Literal['first', 'last', 'random', 'none'] = 'first', remove_unmapped_genes: bool = False, inplace: bool = True)
Map genes to their nearest orthologs in a different species using the Ensembl database. This function generates and returns a table describing all discovered ortholog pairs (both unique and non-unique). It can also translate the genes in this data table into their nearest orthologs and remove unmapped genes.
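The signature above exposes a non_unique_mode option ('first', 'last', 'random', 'none') for genes that match more than one ortholog. The sketch below is not RNAlysis code. It is a standalone illustration of one plausible reading of those four modes, assuming that 'none' discards genes with several matches. The helper name resolve_non_unique and the gene identifiers are made up for the example.

```python
import random
from collections import defaultdict


def resolve_non_unique(pairs, mode="first", seed=None):
    """Collapse (gene, ortholog) pairs to at most one ortholog per gene.

    Hypothetical helper: the mode semantics are an assumption based on the
    option names, not taken from the RNAlysis source.
    """
    if mode not in ("first", "last", "random", "none"):
        raise ValueError(f"unknown mode: {mode!r}")

    # Group orthologs by source gene, preserving the order of discovery.
    grouped = defaultdict(list)
    for gene, ortholog in pairs:
        grouped[gene].append(ortholog)

    rng = random.Random(seed)
    resolved = {}
    for gene, orthologs in grouped.items():
        if len(orthologs) == 1:
            # Unique mapping: kept regardless of mode.
            resolved[gene] = orthologs[0]
        elif mode == "first":
            resolved[gene] = orthologs[0]
        elif mode == "last":
            resolved[gene] = orthologs[-1]
        elif mode == "random":
            resolved[gene] = rng.choice(orthologs)
        # mode == "none": genes with several orthologs are left out.
    return resolved


# Made-up identifiers: geneA has two candidate orthologs, geneB has one.
pairs = [("geneA", "orthoA1"), ("geneA", "orthoA2"), ("geneB", "orthoB1")]
first = resolve_non_unique(pairs, "first")
dropped = resolve_non_unique(pairs, "none")
```

Under these assumptions, genes with a single match are always kept, so the mode only affects ambiguous genes such as geneA.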
- Parameters: @@ -2308,7 +2308,7 @@
- -map_orthologs_orthoinspector(map_to_organism: Union[str, int, Literal['Amborella trichopoda', 'Anolis carolinensis', 'Anopheles gambiae', 'Aquifex aeolicus', 'Arabidopsis thaliana', 'Bacillus cereus', 'Bacillus subtilis', 'Bacteroides thetaiotaomicron', 'Batrachochytrium dendrobatidis', 'Bos taurus', 'Brachypodium distachyon', 'Bradyrhizobium diazoefficiens', 'Branchiostoma floridae', 'Brassica campestris', 'Brassica napus', 'Caenorhabditis briggsae', 'Caenorhabditis elegans', 'Candida albicans', 'Canis lupus familiaris', 'Capsicum annuum', 'Chlamydia trachomatis', 'Chlamydomonas reinhardtii', 'Chloroflexus aurantiacus', 'Ciona intestinalis', 'Citrus sinensis', 'Clostridium botulinum', 'Coxiella burnetii', 'Cryptococcus neoformans', 'Cucumis sativus', 'Danio rerio', 'Daphnia pulex', 'Deinococcus radiodurans', 'Dictyoglomus turgidum', 'Dictyostelium discoideum', 'Dictyostelium purpureum', 'Drosophila melanogaster', 'Emericella nidulans', 'Entamoeba histolytica', 'Equus caballus', 'Eremothecium gossypii', 'Erythranthe guttata', 'Escherichia coli', 'Eucalyptus grandis', 'Felis catus', 'Fusobacterium nucleatum', 'Gallus gallus', 'Geobacter sulfurreducens', 'Giardia intestinalis', 'Gloeobacter violaceus', 'Glycine max', 'Gorilla gorilla gorilla', 'Gossypium hirsutum', 'Haemophilus influenzae', 'Halobacterium salinarum', 'Helianthus annuus', 'Helicobacter pylori', 'Homo sapiens', 'Hordeum vulgare subsp. vulgare', 'Ixodes scapularis', 'Juglans regia', 'Klebsormidium nitens', 'Korarchaeum cryptofilum', 'Lactuca sativa', 'Leishmania major', 'Leptospira interrogans', 'Listeria monocytogenes', 'Macaca mulatta', 'Manihot esculenta', 'Marchantia polymorpha', 'Medicago truncatula', 'Methanocaldococcus jannaschii', 'Methanosarcina acetivorans', 'Monodelphis domestica', 'Monosiga brevicollis', 'Mus musculus', 'Musa acuminata subsp. 
malaccensis', 'Mycobacterium tuberculosis', 'Neisseria meningitidis serogroup b', 'Nelumbo nucifera', 'Nematostella vectensis', 'Neosartorya fumigata', 'Neurospora crassa', 'Nicotiana tabacum', 'Nitrosopumilus maritimus', 'Ornithorhynchus anatinus', 'Oryza sativa', 'Oryzias latipes', 'Pan troglodytes', 'Paramecium tetraurelia', 'Phaeosphaeria nodorum', 'Physcomitrella patens', 'Phytophthora ramorum', 'Plasmodium falciparum', 'Populus trichocarpa', 'Pristionchus pacificus', 'Prunus persica', 'Pseudomonas aeruginosa', 'Puccinia graminis', 'Pyrobaculum aerophilum', 'Rattus norvegicus', 'Rhodopirellula baltica', 'Ricinus communis', 'Saccharomyces cerevisiae', 'Salmonella typhimurium', 'Schizosaccharomyces japonicus', 'Schizosaccharomyces pombe', 'Sclerotinia sclerotiorum', 'Selaginella moellendorffii', 'Setaria italica', 'Shewanella oneidensis', 'Solanum lycopersicum', 'Solanum tuberosum', 'Sorghum bicolor', 'Spinacia oleracea', 'Staphylococcus aureus', 'Streptococcus pneumoniae', 'Streptomyces coelicolor', 'Strongylocentrotus purpuratus', 'Sulfolobus solfataricus', 'Sus scrofa', 'Synechocystis', 'Thalassiosira pseudonana', 'Theobroma cacao', 'Thermococcus kodakaraensis', 'Thermodesulfovibrio yellowstonii', 'Thermotoga maritima', 'Tribolium castaneum', 'Trichomonas vaginalis', 'Trichoplax adhaerens', 'Triticum aestivum', 'Trypanosoma brucei', 'Ustilago maydis', 'Vibrio cholerae', 'Vitis vinifera', 'Xanthomonas campestris', 'Xenopus laevis', 'Xenopus tropicalis', 'Yarrowia lipolytica', 'Yersinia pestis', 'Zea mays', 'Zostera marina', 'helobdella robusta', 'lepisosteus oculatus', 'mycoplasma genitalium']], map_from_organism: Union[Literal['auto'], str, int, Literal['Amborella trichopoda', 'Anolis carolinensis', 'Anopheles gambiae', 'Aquifex aeolicus', 'Arabidopsis thaliana', 'Bacillus cereus', 'Bacillus subtilis', 'Bacteroides thetaiotaomicron', 'Batrachochytrium dendrobatidis', 'Bos taurus', 'Brachypodium distachyon', 'Bradyrhizobium diazoefficiens', 'Branchiostoma 
floridae', 'Brassica campestris', 'Brassica napus', 'Caenorhabditis briggsae', 'Caenorhabditis elegans', 'Candida albicans', 'Canis lupus familiaris', 'Capsicum annuum', 'Chlamydia trachomatis', 'Chlamydomonas reinhardtii', 'Chloroflexus aurantiacus', 'Ciona intestinalis', 'Citrus sinensis', 'Clostridium botulinum', 'Coxiella burnetii', 'Cryptococcus neoformans', 'Cucumis sativus', 'Danio rerio', 'Daphnia pulex', 'Deinococcus radiodurans', 'Dictyoglomus turgidum', 'Dictyostelium discoideum', 'Dictyostelium purpureum', 'Drosophila melanogaster', 'Emericella nidulans', 'Entamoeba histolytica', 'Equus caballus', 'Eremothecium gossypii', 'Erythranthe guttata', 'Escherichia coli', 'Eucalyptus grandis', 'Felis catus', 'Fusobacterium nucleatum', 'Gallus gallus', 'Geobacter sulfurreducens', 'Giardia intestinalis', 'Gloeobacter violaceus', 'Glycine max', 'Gorilla gorilla gorilla', 'Gossypium hirsutum', 'Haemophilus influenzae', 'Halobacterium salinarum', 'Helianthus annuus', 'Helicobacter pylori', 'Homo sapiens', 'Hordeum vulgare subsp. vulgare', 'Ixodes scapularis', 'Juglans regia', 'Klebsormidium nitens', 'Korarchaeum cryptofilum', 'Lactuca sativa', 'Leishmania major', 'Leptospira interrogans', 'Listeria monocytogenes', 'Macaca mulatta', 'Manihot esculenta', 'Marchantia polymorpha', 'Medicago truncatula', 'Methanocaldococcus jannaschii', 'Methanosarcina acetivorans', 'Monodelphis domestica', 'Monosiga brevicollis', 'Mus musculus', 'Musa acuminata subsp. 
malaccensis', 'Mycobacterium tuberculosis', 'Neisseria meningitidis serogroup b', 'Nelumbo nucifera', 'Nematostella vectensis', 'Neosartorya fumigata', 'Neurospora crassa', 'Nicotiana tabacum', 'Nitrosopumilus maritimus', 'Ornithorhynchus anatinus', 'Oryza sativa', 'Oryzias latipes', 'Pan troglodytes', 'Paramecium tetraurelia', 'Phaeosphaeria nodorum', 'Physcomitrella patens', 'Phytophthora ramorum', 'Plasmodium falciparum', 'Populus trichocarpa', 'Pristionchus pacificus', 'Prunus persica', 'Pseudomonas aeruginosa', 'Puccinia graminis', 'Pyrobaculum aerophilum', 'Rattus norvegicus', 'Rhodopirellula baltica', 'Ricinus communis', 'Saccharomyces cerevisiae', 'Salmonella typhimurium', 'Schizosaccharomyces japonicus', 'Schizosaccharomyces pombe', 'Sclerotinia sclerotiorum', 'Selaginella moellendorffii', 'Setaria italica', 'Shewanella oneidensis', 'Solanum lycopersicum', 'Solanum tuberosum', 'Sorghum bicolor', 'Spinacia oleracea', 'Staphylococcus aureus', 'Streptococcus pneumoniae', 'Streptomyces coelicolor', 'Strongylocentrotus purpuratus', 'Sulfolobus solfataricus', 'Sus scrofa', 'Synechocystis', 'Thalassiosira pseudonana', 'Theobroma cacao', 'Thermococcus kodakaraensis', 'Thermodesulfovibrio yellowstonii', 'Thermotoga maritima', 'Tribolium castaneum', 'Trichomonas vaginalis', 'Trichoplax adhaerens', 'Triticum aestivum', 'Trypanosoma brucei', 'Ustilago maydis', 'Vibrio cholerae', 'Vitis vinifera', 'Xanthomonas campestris', 'Xenopus laevis', 'Xenopus tropicalis', 'Yarrowia lipolytica', 'Yersinia pestis', 'Zea mays', 'Zostera marina', 'helobdella robusta', 'lepisosteus oculatus', 'mycoplasma genitalium']] = 'auto', gene_id_type: Union[str, Literal['auto'], Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 
'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB']] = 'auto', non_unique_mode: Literal['first', 'last', 'random', 'none'] = 'first', remove_unmapped_genes: bool = False, inplace: bool = True) +map_orthologs_orthoinspector(map_to_organism: str | int | Literal['Amborella trichopoda', 'Anolis carolinensis', 'Anopheles gambiae', 'Aquifex aeolicus', 'Arabidopsis thaliana', 'Bacillus cereus', 'Bacillus subtilis', 'Bacteroides thetaiotaomicron', 'Batrachochytrium dendrobatidis', 'Bos taurus', 'Brachypodium distachyon', 'Bradyrhizobium diazoefficiens', 'Branchiostoma floridae', 'Brassica campestris', 'Brassica napus', 'Caenorhabditis briggsae', 'Caenorhabditis elegans', 'Candida albicans', 'Canis lupus familiaris', 'Capsicum annuum', 'Chlamydia trachomatis', 'Chlamydomonas reinhardtii', 'Chloroflexus aurantiacus', 'Ciona intestinalis', 'Citrus sinensis', 'Clostridium botulinum', 'Coxiella burnetii', 'Cryptococcus neoformans', 'Cucumis sativus', 'Danio rerio', 'Daphnia pulex', 'Deinococcus radiodurans', 'Dictyoglomus turgidum', 'Dictyostelium discoideum', 'Dictyostelium purpureum', 'Drosophila melanogaster', 'Emericella nidulans', 'Entamoeba histolytica', 'Equus caballus', 'Eremothecium gossypii', 'Erythranthe guttata', 'Escherichia coli', 'Eucalyptus grandis', 'Felis catus', 'Fusobacterium nucleatum', 'Gallus gallus', 'Geobacter 
sulfurreducens', 'Giardia intestinalis', 'Gloeobacter violaceus', 'Glycine max', 'Gorilla gorilla gorilla', 'Gossypium hirsutum', 'Haemophilus influenzae', 'Halobacterium salinarum', 'Helianthus annuus', 'Helicobacter pylori', 'Homo sapiens', 'Hordeum vulgare subsp. vulgare', 'Ixodes scapularis', 'Juglans regia', 'Klebsormidium nitens', 'Korarchaeum cryptofilum', 'Lactuca sativa', 'Leishmania major', 'Leptospira interrogans', 'Listeria monocytogenes', 'Macaca mulatta', 'Manihot esculenta', 'Marchantia polymorpha', 'Medicago truncatula', 'Methanocaldococcus jannaschii', 'Methanosarcina acetivorans', 'Monodelphis domestica', 'Monosiga brevicollis', 'Mus musculus', 'Musa acuminata subsp. malaccensis', 'Mycobacterium tuberculosis', 'Neisseria meningitidis serogroup b', 'Nelumbo nucifera', 'Nematostella vectensis', 'Neosartorya fumigata', 'Neurospora crassa', 'Nicotiana tabacum', 'Nitrosopumilus maritimus', 'Ornithorhynchus anatinus', 'Oryza sativa', 'Oryzias latipes', 'Pan troglodytes', 'Paramecium tetraurelia', 'Phaeosphaeria nodorum', 'Physcomitrella patens', 'Phytophthora ramorum', 'Plasmodium falciparum', 'Populus trichocarpa', 'Pristionchus pacificus', 'Prunus persica', 'Pseudomonas aeruginosa', 'Puccinia graminis', 'Pyrobaculum aerophilum', 'Rattus norvegicus', 'Rhodopirellula baltica', 'Ricinus communis', 'Saccharomyces cerevisiae', 'Salmonella typhimurium', 'Schizosaccharomyces japonicus', 'Schizosaccharomyces pombe', 'Sclerotinia sclerotiorum', 'Selaginella moellendorffii', 'Setaria italica', 'Shewanella oneidensis', 'Solanum lycopersicum', 'Solanum tuberosum', 'Sorghum bicolor', 'Spinacia oleracea', 'Staphylococcus aureus', 'Streptococcus pneumoniae', 'Streptomyces coelicolor', 'Strongylocentrotus purpuratus', 'Sulfolobus solfataricus', 'Sus scrofa', 'Synechocystis', 'Thalassiosira pseudonana', 'Theobroma cacao', 'Thermococcus kodakaraensis', 'Thermodesulfovibrio yellowstonii', 'Thermotoga maritima', 'Tribolium castaneum', 'Trichomonas vaginalis', 'Trichoplax 
adhaerens', 'Triticum aestivum', 'Trypanosoma brucei', 'Ustilago maydis', 'Vibrio cholerae', 'Vitis vinifera', 'Xanthomonas campestris', 'Xenopus laevis', 'Xenopus tropicalis', 'Yarrowia lipolytica', 'Yersinia pestis', 'Zea mays', 'Zostera marina', 'helobdella robusta', 'lepisosteus oculatus', 'mycoplasma genitalium'], map_from_organism: Literal['auto'] | str | int | Literal['Amborella trichopoda', 'Anolis carolinensis', 'Anopheles gambiae', 'Aquifex aeolicus', 'Arabidopsis thaliana', 'Bacillus cereus', 'Bacillus subtilis', 'Bacteroides thetaiotaomicron', 'Batrachochytrium dendrobatidis', 'Bos taurus', 'Brachypodium distachyon', 'Bradyrhizobium diazoefficiens', 'Branchiostoma floridae', 'Brassica campestris', 'Brassica napus', 'Caenorhabditis briggsae', 'Caenorhabditis elegans', 'Candida albicans', 'Canis lupus familiaris', 'Capsicum annuum', 'Chlamydia trachomatis', 'Chlamydomonas reinhardtii', 'Chloroflexus aurantiacus', 'Ciona intestinalis', 'Citrus sinensis', 'Clostridium botulinum', 'Coxiella burnetii', 'Cryptococcus neoformans', 'Cucumis sativus', 'Danio rerio', 'Daphnia pulex', 'Deinococcus radiodurans', 'Dictyoglomus turgidum', 'Dictyostelium discoideum', 'Dictyostelium purpureum', 'Drosophila melanogaster', 'Emericella nidulans', 'Entamoeba histolytica', 'Equus caballus', 'Eremothecium gossypii', 'Erythranthe guttata', 'Escherichia coli', 'Eucalyptus grandis', 'Felis catus', 'Fusobacterium nucleatum', 'Gallus gallus', 'Geobacter sulfurreducens', 'Giardia intestinalis', 'Gloeobacter violaceus', 'Glycine max', 'Gorilla gorilla gorilla', 'Gossypium hirsutum', 'Haemophilus influenzae', 'Halobacterium salinarum', 'Helianthus annuus', 'Helicobacter pylori', 'Homo sapiens', 'Hordeum vulgare subsp. 
vulgare', 'Ixodes scapularis', 'Juglans regia', 'Klebsormidium nitens', 'Korarchaeum cryptofilum', 'Lactuca sativa', 'Leishmania major', 'Leptospira interrogans', 'Listeria monocytogenes', 'Macaca mulatta', 'Manihot esculenta', 'Marchantia polymorpha', 'Medicago truncatula', 'Methanocaldococcus jannaschii', 'Methanosarcina acetivorans', 'Monodelphis domestica', 'Monosiga brevicollis', 'Mus musculus', 'Musa acuminata subsp. malaccensis', 'Mycobacterium tuberculosis', 'Neisseria meningitidis serogroup b', 'Nelumbo nucifera', 'Nematostella vectensis', 'Neosartorya fumigata', 'Neurospora crassa', 'Nicotiana tabacum', 'Nitrosopumilus maritimus', 'Ornithorhynchus anatinus', 'Oryza sativa', 'Oryzias latipes', 'Pan troglodytes', 'Paramecium tetraurelia', 'Phaeosphaeria nodorum', 'Physcomitrella patens', 'Phytophthora ramorum', 'Plasmodium falciparum', 'Populus trichocarpa', 'Pristionchus pacificus', 'Prunus persica', 'Pseudomonas aeruginosa', 'Puccinia graminis', 'Pyrobaculum aerophilum', 'Rattus norvegicus', 'Rhodopirellula baltica', 'Ricinus communis', 'Saccharomyces cerevisiae', 'Salmonella typhimurium', 'Schizosaccharomyces japonicus', 'Schizosaccharomyces pombe', 'Sclerotinia sclerotiorum', 'Selaginella moellendorffii', 'Setaria italica', 'Shewanella oneidensis', 'Solanum lycopersicum', 'Solanum tuberosum', 'Sorghum bicolor', 'Spinacia oleracea', 'Staphylococcus aureus', 'Streptococcus pneumoniae', 'Streptomyces coelicolor', 'Strongylocentrotus purpuratus', 'Sulfolobus solfataricus', 'Sus scrofa', 'Synechocystis', 'Thalassiosira pseudonana', 'Theobroma cacao', 'Thermococcus kodakaraensis', 'Thermodesulfovibrio yellowstonii', 'Thermotoga maritima', 'Tribolium castaneum', 'Trichomonas vaginalis', 'Trichoplax adhaerens', 'Triticum aestivum', 'Trypanosoma brucei', 'Ustilago maydis', 'Vibrio cholerae', 'Vitis vinifera', 'Xanthomonas campestris', 'Xenopus laevis', 'Xenopus tropicalis', 'Yarrowia lipolytica', 'Yersinia pestis', 'Zea mays', 'Zostera marina', 'helobdella 
robusta', 'lepisosteus oculatus', 'mycoplasma genitalium'] = 'auto', gene_id_type: str | Literal['auto'] | Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB'] = 'auto', non_unique_mode: Literal['first', 'last', 'random', 'none'] = 'first', remove_unmapped_genes: bool = False, inplace: bool = True)
Map genes to their nearest orthologs in a different species using the OrthoInspector database. This function generates and returns a table describing all discovered ortholog pairs (both unique and non-unique). It can also translate the genes in this data table into their nearest orthologs and remove unmapped genes.
- Parameters: @@ -2329,7 +2329,7 @@
- -map_orthologs_panther(map_to_organism: Union[str, int, Literal['Amborella trichopoda', 'Anolis carolinensis', 'Anopheles gambiae', 'Aquifex aeolicus', 'Arabidopsis thaliana', 'Bacillus cereus', 'Bacillus subtilis', 'Bacteroides thetaiotaomicron', 'Batrachochytrium dendrobatidis', 'Bos taurus', 'Brachypodium distachyon', 'Bradyrhizobium diazoefficiens', 'Branchiostoma floridae', 'Brassica campestris', 'Brassica napus', 'Caenorhabditis briggsae', 'Caenorhabditis elegans', 'Candida albicans', 'Canis lupus familiaris', 'Capsicum annuum', 'Chlamydia trachomatis', 'Chlamydomonas reinhardtii', 'Chloroflexus aurantiacus', 'Ciona intestinalis', 'Citrus sinensis', 'Clostridium botulinum', 'Coxiella burnetii', 'Cryptococcus neoformans', 'Cucumis sativus', 'Danio rerio', 'Daphnia pulex', 'Deinococcus radiodurans', 'Dictyoglomus turgidum', 'Dictyostelium discoideum', 'Dictyostelium purpureum', 'Drosophila melanogaster', 'Emericella nidulans', 'Entamoeba histolytica', 'Equus caballus', 'Eremothecium gossypii', 'Erythranthe guttata', 'Escherichia coli', 'Eucalyptus grandis', 'Felis catus', 'Fusobacterium nucleatum', 'Gallus gallus', 'Geobacter sulfurreducens', 'Giardia intestinalis', 'Gloeobacter violaceus', 'Glycine max', 'Gorilla gorilla gorilla', 'Gossypium hirsutum', 'Haemophilus influenzae', 'Halobacterium salinarum', 'Helianthus annuus', 'Helicobacter pylori', 'Homo sapiens', 'Hordeum vulgare subsp. vulgare', 'Ixodes scapularis', 'Juglans regia', 'Klebsormidium nitens', 'Korarchaeum cryptofilum', 'Lactuca sativa', 'Leishmania major', 'Leptospira interrogans', 'Listeria monocytogenes', 'Macaca mulatta', 'Manihot esculenta', 'Marchantia polymorpha', 'Medicago truncatula', 'Methanocaldococcus jannaschii', 'Methanosarcina acetivorans', 'Monodelphis domestica', 'Monosiga brevicollis', 'Mus musculus', 'Musa acuminata subsp. 
malaccensis', 'Mycobacterium tuberculosis', 'Neisseria meningitidis serogroup b', 'Nelumbo nucifera', 'Nematostella vectensis', 'Neosartorya fumigata', 'Neurospora crassa', 'Nicotiana tabacum', 'Nitrosopumilus maritimus', 'Ornithorhynchus anatinus', 'Oryza sativa', 'Oryzias latipes', 'Pan troglodytes', 'Paramecium tetraurelia', 'Phaeosphaeria nodorum', 'Physcomitrella patens', 'Phytophthora ramorum', 'Plasmodium falciparum', 'Populus trichocarpa', 'Pristionchus pacificus', 'Prunus persica', 'Pseudomonas aeruginosa', 'Puccinia graminis', 'Pyrobaculum aerophilum', 'Rattus norvegicus', 'Rhodopirellula baltica', 'Ricinus communis', 'Saccharomyces cerevisiae', 'Salmonella typhimurium', 'Schizosaccharomyces japonicus', 'Schizosaccharomyces pombe', 'Sclerotinia sclerotiorum', 'Selaginella moellendorffii', 'Setaria italica', 'Shewanella oneidensis', 'Solanum lycopersicum', 'Solanum tuberosum', 'Sorghum bicolor', 'Spinacia oleracea', 'Staphylococcus aureus', 'Streptococcus pneumoniae', 'Streptomyces coelicolor', 'Strongylocentrotus purpuratus', 'Sulfolobus solfataricus', 'Sus scrofa', 'Synechocystis', 'Thalassiosira pseudonana', 'Theobroma cacao', 'Thermococcus kodakaraensis', 'Thermodesulfovibrio yellowstonii', 'Thermotoga maritima', 'Tribolium castaneum', 'Trichomonas vaginalis', 'Trichoplax adhaerens', 'Triticum aestivum', 'Trypanosoma brucei', 'Ustilago maydis', 'Vibrio cholerae', 'Vitis vinifera', 'Xanthomonas campestris', 'Xenopus laevis', 'Xenopus tropicalis', 'Yarrowia lipolytica', 'Yersinia pestis', 'Zea mays', 'Zostera marina', 'helobdella robusta', 'lepisosteus oculatus', 'mycoplasma genitalium']], map_from_organism: Union[Literal['auto'], str, int, Literal['Amborella trichopoda', 'Anolis carolinensis', 'Anopheles gambiae', 'Aquifex aeolicus', 'Arabidopsis thaliana', 'Bacillus cereus', 'Bacillus subtilis', 'Bacteroides thetaiotaomicron', 'Batrachochytrium dendrobatidis', 'Bos taurus', 'Brachypodium distachyon', 'Bradyrhizobium diazoefficiens', 'Branchiostoma 
floridae', 'Brassica campestris', 'Brassica napus', 'Caenorhabditis briggsae', 'Caenorhabditis elegans', 'Candida albicans', 'Canis lupus familiaris', 'Capsicum annuum', 'Chlamydia trachomatis', 'Chlamydomonas reinhardtii', 'Chloroflexus aurantiacus', 'Ciona intestinalis', 'Citrus sinensis', 'Clostridium botulinum', 'Coxiella burnetii', 'Cryptococcus neoformans', 'Cucumis sativus', 'Danio rerio', 'Daphnia pulex', 'Deinococcus radiodurans', 'Dictyoglomus turgidum', 'Dictyostelium discoideum', 'Dictyostelium purpureum', 'Drosophila melanogaster', 'Emericella nidulans', 'Entamoeba histolytica', 'Equus caballus', 'Eremothecium gossypii', 'Erythranthe guttata', 'Escherichia coli', 'Eucalyptus grandis', 'Felis catus', 'Fusobacterium nucleatum', 'Gallus gallus', 'Geobacter sulfurreducens', 'Giardia intestinalis', 'Gloeobacter violaceus', 'Glycine max', 'Gorilla gorilla gorilla', 'Gossypium hirsutum', 'Haemophilus influenzae', 'Halobacterium salinarum', 'Helianthus annuus', 'Helicobacter pylori', 'Homo sapiens', 'Hordeum vulgare subsp. vulgare', 'Ixodes scapularis', 'Juglans regia', 'Klebsormidium nitens', 'Korarchaeum cryptofilum', 'Lactuca sativa', 'Leishmania major', 'Leptospira interrogans', 'Listeria monocytogenes', 'Macaca mulatta', 'Manihot esculenta', 'Marchantia polymorpha', 'Medicago truncatula', 'Methanocaldococcus jannaschii', 'Methanosarcina acetivorans', 'Monodelphis domestica', 'Monosiga brevicollis', 'Mus musculus', 'Musa acuminata subsp. 
malaccensis', 'Mycobacterium tuberculosis', 'Neisseria meningitidis serogroup b', 'Nelumbo nucifera', 'Nematostella vectensis', 'Neosartorya fumigata', 'Neurospora crassa', 'Nicotiana tabacum', 'Nitrosopumilus maritimus', 'Ornithorhynchus anatinus', 'Oryza sativa', 'Oryzias latipes', 'Pan troglodytes', 'Paramecium tetraurelia', 'Phaeosphaeria nodorum', 'Physcomitrella patens', 'Phytophthora ramorum', 'Plasmodium falciparum', 'Populus trichocarpa', 'Pristionchus pacificus', 'Prunus persica', 'Pseudomonas aeruginosa', 'Puccinia graminis', 'Pyrobaculum aerophilum', 'Rattus norvegicus', 'Rhodopirellula baltica', 'Ricinus communis', 'Saccharomyces cerevisiae', 'Salmonella typhimurium', 'Schizosaccharomyces japonicus', 'Schizosaccharomyces pombe', 'Sclerotinia sclerotiorum', 'Selaginella moellendorffii', 'Setaria italica', 'Shewanella oneidensis', 'Solanum lycopersicum', 'Solanum tuberosum', 'Sorghum bicolor', 'Spinacia oleracea', 'Staphylococcus aureus', 'Streptococcus pneumoniae', 'Streptomyces coelicolor', 'Strongylocentrotus purpuratus', 'Sulfolobus solfataricus', 'Sus scrofa', 'Synechocystis', 'Thalassiosira pseudonana', 'Theobroma cacao', 'Thermococcus kodakaraensis', 'Thermodesulfovibrio yellowstonii', 'Thermotoga maritima', 'Tribolium castaneum', 'Trichomonas vaginalis', 'Trichoplax adhaerens', 'Triticum aestivum', 'Trypanosoma brucei', 'Ustilago maydis', 'Vibrio cholerae', 'Vitis vinifera', 'Xanthomonas campestris', 'Xenopus laevis', 'Xenopus tropicalis', 'Yarrowia lipolytica', 'Yersinia pestis', 'Zea mays', 'Zostera marina', 'helobdella robusta', 'lepisosteus oculatus', 'mycoplasma genitalium']] = 'auto', gene_id_type: Union[str, Literal['auto'], Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 
'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB']] = 'auto', filter_least_diverged: bool = True, non_unique_mode: Literal['first', 'last', 'random', 'none'] = 'first', remove_unmapped_genes: bool = False, inplace: bool = True) +map_orthologs_panther(map_to_organism: str | int | Literal['Amborella trichopoda', 'Anolis carolinensis', 'Anopheles gambiae', 'Aquifex aeolicus', 'Arabidopsis thaliana', 'Bacillus cereus', 'Bacillus subtilis', 'Bacteroides thetaiotaomicron', 'Batrachochytrium dendrobatidis', 'Bos taurus', 'Brachypodium distachyon', 'Bradyrhizobium diazoefficiens', 'Branchiostoma floridae', 'Brassica campestris', 'Brassica napus', 'Caenorhabditis briggsae', 'Caenorhabditis elegans', 'Candida albicans', 'Canis lupus familiaris', 'Capsicum annuum', 'Chlamydia trachomatis', 'Chlamydomonas reinhardtii', 'Chloroflexus aurantiacus', 'Ciona intestinalis', 'Citrus sinensis', 'Clostridium botulinum', 'Coxiella burnetii', 'Cryptococcus neoformans', 'Cucumis sativus', 'Danio rerio', 'Daphnia pulex', 'Deinococcus radiodurans', 'Dictyoglomus turgidum', 'Dictyostelium discoideum', 'Dictyostelium purpureum', 'Drosophila melanogaster', 'Emericella nidulans', 'Entamoeba histolytica', 'Equus caballus', 'Eremothecium gossypii', 'Erythranthe guttata', 'Escherichia coli', 'Eucalyptus grandis', 'Felis catus', 'Fusobacterium nucleatum', 'Gallus 
gallus', 'Geobacter sulfurreducens', 'Giardia intestinalis', 'Gloeobacter violaceus', 'Glycine max', 'Gorilla gorilla gorilla', 'Gossypium hirsutum', 'Haemophilus influenzae', 'Halobacterium salinarum', 'Helianthus annuus', 'Helicobacter pylori', 'Homo sapiens', 'Hordeum vulgare subsp. vulgare', 'Ixodes scapularis', 'Juglans regia', 'Klebsormidium nitens', 'Korarchaeum cryptofilum', 'Lactuca sativa', 'Leishmania major', 'Leptospira interrogans', 'Listeria monocytogenes', 'Macaca mulatta', 'Manihot esculenta', 'Marchantia polymorpha', 'Medicago truncatula', 'Methanocaldococcus jannaschii', 'Methanosarcina acetivorans', 'Monodelphis domestica', 'Monosiga brevicollis', 'Mus musculus', 'Musa acuminata subsp. malaccensis', 'Mycobacterium tuberculosis', 'Neisseria meningitidis serogroup b', 'Nelumbo nucifera', 'Nematostella vectensis', 'Neosartorya fumigata', 'Neurospora crassa', 'Nicotiana tabacum', 'Nitrosopumilus maritimus', 'Ornithorhynchus anatinus', 'Oryza sativa', 'Oryzias latipes', 'Pan troglodytes', 'Paramecium tetraurelia', 'Phaeosphaeria nodorum', 'Physcomitrella patens', 'Phytophthora ramorum', 'Plasmodium falciparum', 'Populus trichocarpa', 'Pristionchus pacificus', 'Prunus persica', 'Pseudomonas aeruginosa', 'Puccinia graminis', 'Pyrobaculum aerophilum', 'Rattus norvegicus', 'Rhodopirellula baltica', 'Ricinus communis', 'Saccharomyces cerevisiae', 'Salmonella typhimurium', 'Schizosaccharomyces japonicus', 'Schizosaccharomyces pombe', 'Sclerotinia sclerotiorum', 'Selaginella moellendorffii', 'Setaria italica', 'Shewanella oneidensis', 'Solanum lycopersicum', 'Solanum tuberosum', 'Sorghum bicolor', 'Spinacia oleracea', 'Staphylococcus aureus', 'Streptococcus pneumoniae', 'Streptomyces coelicolor', 'Strongylocentrotus purpuratus', 'Sulfolobus solfataricus', 'Sus scrofa', 'Synechocystis', 'Thalassiosira pseudonana', 'Theobroma cacao', 'Thermococcus kodakaraensis', 'Thermodesulfovibrio yellowstonii', 'Thermotoga maritima', 'Tribolium castaneum', 'Trichomonas 
vaginalis', 'Trichoplax adhaerens', 'Triticum aestivum', 'Trypanosoma brucei', 'Ustilago maydis', 'Vibrio cholerae', 'Vitis vinifera', 'Xanthomonas campestris', 'Xenopus laevis', 'Xenopus tropicalis', 'Yarrowia lipolytica', 'Yersinia pestis', 'Zea mays', 'Zostera marina', 'helobdella robusta', 'lepisosteus oculatus', 'mycoplasma genitalium'], map_from_organism: Literal['auto'] | str | int | Literal['Amborella trichopoda', 'Anolis carolinensis', 'Anopheles gambiae', 'Aquifex aeolicus', 'Arabidopsis thaliana', 'Bacillus cereus', 'Bacillus subtilis', 'Bacteroides thetaiotaomicron', 'Batrachochytrium dendrobatidis', 'Bos taurus', 'Brachypodium distachyon', 'Bradyrhizobium diazoefficiens', 'Branchiostoma floridae', 'Brassica campestris', 'Brassica napus', 'Caenorhabditis briggsae', 'Caenorhabditis elegans', 'Candida albicans', 'Canis lupus familiaris', 'Capsicum annuum', 'Chlamydia trachomatis', 'Chlamydomonas reinhardtii', 'Chloroflexus aurantiacus', 'Ciona intestinalis', 'Citrus sinensis', 'Clostridium botulinum', 'Coxiella burnetii', 'Cryptococcus neoformans', 'Cucumis sativus', 'Danio rerio', 'Daphnia pulex', 'Deinococcus radiodurans', 'Dictyoglomus turgidum', 'Dictyostelium discoideum', 'Dictyostelium purpureum', 'Drosophila melanogaster', 'Emericella nidulans', 'Entamoeba histolytica', 'Equus caballus', 'Eremothecium gossypii', 'Erythranthe guttata', 'Escherichia coli', 'Eucalyptus grandis', 'Felis catus', 'Fusobacterium nucleatum', 'Gallus gallus', 'Geobacter sulfurreducens', 'Giardia intestinalis', 'Gloeobacter violaceus', 'Glycine max', 'Gorilla gorilla gorilla', 'Gossypium hirsutum', 'Haemophilus influenzae', 'Halobacterium salinarum', 'Helianthus annuus', 'Helicobacter pylori', 'Homo sapiens', 'Hordeum vulgare subsp. 
vulgare', 'Ixodes scapularis', 'Juglans regia', 'Klebsormidium nitens', 'Korarchaeum cryptofilum', 'Lactuca sativa', 'Leishmania major', 'Leptospira interrogans', 'Listeria monocytogenes', 'Macaca mulatta', 'Manihot esculenta', 'Marchantia polymorpha', 'Medicago truncatula', 'Methanocaldococcus jannaschii', 'Methanosarcina acetivorans', 'Monodelphis domestica', 'Monosiga brevicollis', 'Mus musculus', 'Musa acuminata subsp. malaccensis', 'Mycobacterium tuberculosis', 'Neisseria meningitidis serogroup b', 'Nelumbo nucifera', 'Nematostella vectensis', 'Neosartorya fumigata', 'Neurospora crassa', 'Nicotiana tabacum', 'Nitrosopumilus maritimus', 'Ornithorhynchus anatinus', 'Oryza sativa', 'Oryzias latipes', 'Pan troglodytes', 'Paramecium tetraurelia', 'Phaeosphaeria nodorum', 'Physcomitrella patens', 'Phytophthora ramorum', 'Plasmodium falciparum', 'Populus trichocarpa', 'Pristionchus pacificus', 'Prunus persica', 'Pseudomonas aeruginosa', 'Puccinia graminis', 'Pyrobaculum aerophilum', 'Rattus norvegicus', 'Rhodopirellula baltica', 'Ricinus communis', 'Saccharomyces cerevisiae', 'Salmonella typhimurium', 'Schizosaccharomyces japonicus', 'Schizosaccharomyces pombe', 'Sclerotinia sclerotiorum', 'Selaginella moellendorffii', 'Setaria italica', 'Shewanella oneidensis', 'Solanum lycopersicum', 'Solanum tuberosum', 'Sorghum bicolor', 'Spinacia oleracea', 'Staphylococcus aureus', 'Streptococcus pneumoniae', 'Streptomyces coelicolor', 'Strongylocentrotus purpuratus', 'Sulfolobus solfataricus', 'Sus scrofa', 'Synechocystis', 'Thalassiosira pseudonana', 'Theobroma cacao', 'Thermococcus kodakaraensis', 'Thermodesulfovibrio yellowstonii', 'Thermotoga maritima', 'Tribolium castaneum', 'Trichomonas vaginalis', 'Trichoplax adhaerens', 'Triticum aestivum', 'Trypanosoma brucei', 'Ustilago maydis', 'Vibrio cholerae', 'Vitis vinifera', 'Xanthomonas campestris', 'Xenopus laevis', 'Xenopus tropicalis', 'Yarrowia lipolytica', 'Yersinia pestis', 'Zea mays', 'Zostera marina', 'helobdella 
robusta', 'lepisosteus oculatus', 'mycoplasma genitalium'] = 'auto', gene_id_type: str | Literal['auto'] | Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB'] = 'auto', filter_least_diverged: bool = True, non_unique_mode: Literal['first', 'last', 'random', 'none'] = 'first', remove_unmapped_genes: bool = False, inplace: bool = True)
Map genes to their nearest orthologs in a different species using the PantherDB database. This function generates a table describing all matching discovered ortholog pairs (both unique and non-unique) and returns it, and can also translate the genes in this data table into their nearest ortholog, as well as remove unmapped genes.
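The `non_unique_mode` options are not defined in this excerpt. The sketch below shows one plausible reading: a gene with several candidate orthologs is resolved by taking the first candidate, the last candidate, or a random one, and `'none'` leaves the gene unmapped. The helper `resolve_non_unique` and the sample identifiers are hypothetical and are not part of RNAlysis.

```python
import random
from typing import Dict, List, Optional


def resolve_non_unique(candidates: Dict[str, List[str]], mode: str = 'first',
                       seed: Optional[int] = None) -> Dict[str, str]:
    """Pick one ortholog per gene (hypothetical helper, not the RNAlysis implementation)."""
    if mode not in ('first', 'last', 'random', 'none'):
        raise ValueError(f"unknown mode: {mode!r}")
    rng = random.Random(seed)
    resolved = {}
    for gene, orthologs in candidates.items():
        if not orthologs:
            continue  # unmapped gene: nothing to pick
        if len(orthologs) == 1:
            resolved[gene] = orthologs[0]  # unique mapping, no ambiguity
        elif mode == 'first':
            resolved[gene] = orthologs[0]
        elif mode == 'last':
            resolved[gene] = orthologs[-1]
        elif mode == 'random':
            resolved[gene] = rng.choice(orthologs)
        # mode == 'none': assumed here to leave ambiguous genes unmapped
    return resolved


# Made-up ortholog candidates for illustration only
pairs = {'geneA': ['orth1'], 'geneB': ['orth2', 'orth3'], 'geneC': []}
print(resolve_non_unique(pairs, 'first'))
print(resolve_non_unique(pairs, 'none'))
```

Under this reading, setting `remove_unmapped_genes=True` would then drop genes such as `geneC` from the table, while the default keeps them.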
- Parameters:
@@ -2351,7 +2351,7 @@
- -map_orthologs_phylomedb(map_to_organism: Union[str, int, Literal], map_from_organism: Union[Literal['auto'], str, int, Literal] = 'auto', gene_id_type: Union[str, Literal['auto'], Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB']] = 'auto', consistency_score_threshold: Fraction = 0.5, filter_consistency_score: bool = True, non_unique_mode: Literal['first', 'last', 'random', 'none'] = 'first', remove_unmapped_genes: bool = False, inplace: bool = True) +map_orthologs_phylomedb(map_to_organism: str | int | Literal, map_from_organism: Literal['auto'] | str | int | Literal = 'auto', gene_id_type: str | Literal['auto'] | Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 'ArachnoServer', 'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 
'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB'] = 'auto', consistency_score_threshold: Fraction = 0.5, filter_consistency_score: bool = True, non_unique_mode: Literal['first', 'last', 'random', 'none'] = 'first', remove_unmapped_genes: bool = False, inplace: bool = True)
Map genes to their nearest orthologs in a different species using the PhylomeDB database. This function generates a table describing all matching discovered ortholog pairs (both unique and non-unique) and returns it, and can also translate the genes in this data table into their nearest ortholog, as well as remove unmapped genes.
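The signature above exposes `consistency_score_threshold` (a fraction, default 0.5) and `filter_consistency_score`. As a rough illustration, and assuming that pairs scoring at or above the threshold are kept (the exact comparison is not stated in this excerpt), the filtering step could look like the sketch below. The function name, the tuple layout, and the scores are hypothetical.

```python
from typing import List, Tuple

# (gene, ortholog, consistency score between 0 and 1); layout is an assumption
OrthologPair = Tuple[str, str, float]


def filter_by_consistency(pairs: List[OrthologPair], threshold: float = 0.5,
                          enabled: bool = True) -> List[OrthologPair]:
    """Keep ortholog pairs whose score reaches the threshold (illustrative only)."""
    if not 0 <= threshold <= 1:
        raise ValueError('threshold must be a fraction between 0 and 1')
    if not enabled:
        return list(pairs)  # filtering switched off: keep every pair
    return [pair for pair in pairs if pair[2] >= threshold]


# Made-up scores for illustration only
example = [('geneA', 'orth1', 0.9), ('geneB', 'orth2', 0.5), ('geneC', 'orth3', 0.2)]
print(filter_by_consistency(example))
print(filter_by_consistency(example, enabled=False))
```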
-
@@ -2419,7 +2419,7 @@
- Parameters:
-
-
rnalysis.filtering module
rnalysis.fastq module
input_folder (str or Path) – Path to the folder containing the SAM/BAM files you want to validate.
rnalysis.fastq module
- rnalysis.filtering module
+rnalysis.filtering module
This module can filter, normalize, intersect, and visualize tabular data such as read counts and differential expression data. Any tabular data saved in CSV format can be imported. Use this module to filter and normalize your data, perform set operations (union, intersection, etc.), run basic exploratory analyses and plots (such as PCA, clustergrams, violin plots, and scatter plots), save the filtered data to your computer, and more. When you save filtered or modified data, the new file name includes by default all of the operations performed on it, in the order they were performed, so that you can easily trace back your analyses.
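The file-naming behaviour described above can be pictured with a small sketch. The class `TracedTable`, its methods, and the operation labels are hypothetical and only illustrate the idea of appending each operation, in order, to the default file name. The exact naming scheme used by RNAlysis is not shown in this excerpt.

```python
from pathlib import Path
from typing import List


class TracedTable:
    """Toy stand-in for a table that remembers the operations applied to it."""

    def __init__(self, path: str):
        self.path = Path(path)
        self.operations: List[str] = []

    def record(self, operation: str) -> 'TracedTable':
        # Store the operation label; a real filter would also modify the data
        self.operations.append(operation)
        return self

    def default_save_name(self) -> str:
        # Append every recorded operation, in order, before the file extension
        suffix = ''.join(f'_{op}' for op in self.operations)
        return f'{self.path.stem}{suffix}{self.path.suffix}'


table = TracedTable('counts.csv').record('filtertop10').record('normalizetorpm')
print(table.default_save_name())  # counts_filtertop10_normalizetorpm.csv
```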
rnalysis.fastq module
input_folder (str or Path) – Path to the folder containing the SAM/BAM files you want to sort.
rnalysis.fastq module
gtf_feature_type (str (default='exon')) – The feature type or types used to select rows in the GTF annotation which will be used for read summarization.
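To make the role of the feature type concrete: in a GTF file each row has tab-separated columns, and the third column holds the feature type (for example `gene`, `exon`, or `CDS`). The sketch below selects rows by that column. The function `select_gtf_rows` and the sample rows are hypothetical, and this is not how featureCounts or RNAlysis is implemented.

```python
from typing import Iterable, Iterator, Set, Union


def select_gtf_rows(lines: Iterable[str],
                    feature_types: Union[str, Iterable[str]] = 'exon') -> Iterator[str]:
    """Yield GTF rows whose third column matches one of the requested feature types."""
    wanted: Set[str] = {feature_types} if isinstance(feature_types, str) else set(feature_types)
    for line in lines:
        if not line.strip() or line.startswith('#'):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip('\n').split('\t')
        if len(fields) >= 3 and fields[2] in wanted:
            yield line


# Made-up annotation rows for illustration only
gtf = [
    '#!genome-build example',
    'chrI\tsrc\tgene\t1\t900\t.\t+\t.\tgene_id "g1";',
    'chrI\tsrc\texon\t1\t300\t.\t+\t.\tgene_id "g1";',
    'chrI\tsrc\tCDS\t50\t300\t.\t+\t0\tgene_id "g1";',
]
exons = list(select_gtf_rows(gtf))
exons_and_cds = list(select_gtf_rows(gtf, ['exon', 'CDS']))
print(len(exons), len(exons_and_cds))
```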
rnalysis.fastq module
gtf_feature_type (str (default='exon')) – The feature type or types used to select rows in the GTF annotation which will be used for read summarization.
rnalysis.fastq module
index_file (str or Path) – Path to a pre-built bowtie2 index of the target genome. The index can either be downloaded from the bowtie2 website (menu on the right), or generated manually from FASTA files using the function ‘bowtie2_create_index’. Note that bowtie2 indices are composed of multiple files ending with the ‘.bt2’ suffix, and all of those files should be in the same location. It is enough to specify the path to one of those files (for example, ‘path/to/index.1.bt2’), or to the main name of the index (for example, ‘path/to/index’).
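Since either form of the index path is accepted, both have to be reduced to the same main name at some point. The sketch below shows one way to do that, assuming the usual bowtie2 file naming (`.1.bt2` to `.4.bt2`, `.rev.1.bt2`, `.rev.2.bt2`, and the `.bt2l` variant for large indices). The function `bowtie2_index_basename` is hypothetical and is not taken from RNAlysis.

```python
import re

# Matches a trailing bowtie2 index suffix such as '.1.bt2', '.rev.2.bt2' or '.3.bt2l'
_BT2_SUFFIX = re.compile(r'(\.rev)?\.\d+\.bt2l?$')


def bowtie2_index_basename(path: str) -> str:
    """Return the main name of a bowtie2 index from either accepted path form."""
    return _BT2_SUFFIX.sub('', path)


print(bowtie2_index_basename('path/to/index.1.bt2'))  # path/to/index
print(bowtie2_index_basename('path/to/index'))        # path/to/index
```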
rnalysis.fastq module
Bases: GenericPipeline, ABC
rnalysis.fastq module
Bases: _FASTQPipeline
rnalysis.fastq module