Getting started with swisspalmR
The purpose of the swisspalmR package is to make data from SwissPalm, a database on protein S-palmitoylation, more accessible to R users. It does this by using httr2 to mimic the HTTP requests that a web browser usually sends to SwissPalm. Check out the vignette on implementation details for more information.
The SwissPalm website
SwissPalm data is easily accessed in a web browser through a simple interface. Users may perform several actions:
- Use the central text bar and ‘Search’ button to get palmitoylation data for a few protein/gene names, separated by commas
- Use the ‘Batch search’ button to upload a file with many gene identifiers at once
- Use the ‘Palmitoyl-proteome comparison’ button to check a user’s protein/gene list against published palmitoyl-proteomes for a variety of species.
The header bar at the top of the website contains even more options, which you can explore yourself.
The SwissPalm database is an excellent source of information for researchers interested in S-palmitoylation, but it has not previously been easily accessible from R. The swisspalmR package resolves this by sending the same HTTP requests that the SwissPalm website sends from a web browser, then retrieving the S-palmitoylation data that SwissPalm returns.
Using swisspalmR
To use swisspalmR, install the package from GitHub.
# install.packages("devtools")
devtools::install_github("simpar1471/swisspalmR")
At present, swisspalmR has one function, swissPalm(), which retrieves the protein-level S-palmitoylation information usually accessible from https://www.swisspalm.org/proteins. This function returns a 25-column data frame with protein information (where available), which includes:
- Which organism the queried gene/protein ID is from
- Whether the protein/gene associated with a queried ID is reviewed in UniProt
- Whether the protein is palmitoylated, and if so:
  - At which residues palmitoylation occurs
  - The number of cysteine residues
  - Which PATs/APTs palmitoylate/depalmitoylate the queried protein
This information, along with the rest of the swissPalm() output, is useful for interpreting the pattern of palmitoylation in your own samples.
To get S-palmitoylation data without restricting the search by species or dataset, provide a character vector of protein identifiers to swissPalm(). These identifiers should be in one of the following valid formats:
- UniProt AC
- UniProt secondary AC
- UniProt ID
- UniProt gene name
- Ensembl protein
- Ensembl gene
- Refseq protein ID
- IPI ID
- UniGene ID
- PomBase ID
- MGI ID
- RGD ID
- TAIR protein ID
- EuPathDb ID
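swissPalm() takes a character vector, so an identifier list that arrives as one comma-separated string (as you would type into the website's search bar) or as a text file (as you would upload via ‘Batch search’) needs splitting first. Below is a minimal base-R sketch; the example string, the temporary file, and the one-identifier-per-line layout are assumptions standing in for your own list.

```r
# Hypothetical comma-separated string, as typed into the website's search bar
query_string <- "Q4WCM2, ENSP00000453745, CALML5"

# Split on commas, allowing optional whitespace after each comma
from_string <- strsplit(query_string, ",\\s*")[[1]]
from_string

# Hypothetical one-identifier-per-line file, written to a temporary path so
# the example is self-contained; replace id_file with your own file path
id_file <- tempfile(fileext = ".txt")
writeLines(c("Q4WCM2", "  ENSP00000453745", "", "CALML5", "CALML5"), id_file)

# Read the file, trim stray whitespace, then drop blank lines and duplicates
from_file <- trimws(readLines(id_file))
from_file <- unique(from_file[from_file != ""])
from_file
```

Either vector can then be passed straight to swissPalm().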
Let’s say you have a UniProt accession, an Ensembl protein ID, and a gene name. As SwissPalm recognises all of these, swissPalm() will give you information on each. Where a single gene name (e.g. "CALML5") maps onto multiple proteins, each of these proteins gets its own row.
inputs <- c("Q4WCM2", "ENSP00000453745", "CALML5")
# Include only first five columns to restrict printed output
swissPalm(inputs)[, 1:5]
#> Query_identifier UniProt_AC UniProt_ID UniProt_status
#> 1 ENSP00000453745 P08912 ACM5_HUMAN Reviewed
#> 2 CALML5 Q9NZT1 CALL5_HUMAN Reviewed
#> 3 CALML5 E2RHU8 E2RHU8_CANLF Unreviewed
#> 4 CALML5 A4IFQ6 A4IFQ6_BOVIN Unreviewed
#> 5 Q4WCM2 Q4WCM2 Q4WCM2_ASPFU Unreviewed
#> 6 CALML5 F7HMI0 F7HMI0_MACMU Unreviewed
#> 7 CALML5 F6ZF46 F6ZF46_HORSE Unreviewed
#> 8 CALML5 G1T3Q4 G1T3Q4_RABIT Unreviewed
#> 9 CALML5 F1RYW2 F1RYW2_PIG Unreviewed
#> Organism
#> 1 Homo sapiens
#> 2 Homo sapiens
#> 3 Canis familiaris
#> 4 Bos taurus
#> 5 Neosartorya fumigata (strain CEA10 / CBS 144.89 / FGSC A1163)
#> 6 Macaca mulatta
#> 7 Equus caballus
#> 8 Oryctolagus cuniculus
#> 9 Sus scrofa
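Because one gene name can match many proteins, it can help to count rows per query before going further. The sketch below rebuilds three of the printed columns as a hypothetical stand-in for swissPalm(inputs), so it runs without network access; with the real result you would skip the data.frame() step. Selecting columns by name rather than by position (as in [, 1:5]) is a design choice that keeps the code working if the column order ever changes.

```r
# Hypothetical stand-in for swissPalm(inputs), copied from the printed output
res <- data.frame(
  Query_identifier = c("ENSP00000453745", "CALML5", "CALML5", "CALML5", "Q4WCM2",
                       "CALML5", "CALML5", "CALML5", "CALML5"),
  UniProt_AC = c("P08912", "Q9NZT1", "E2RHU8", "A4IFQ6", "Q4WCM2",
                 "F7HMI0", "F6ZF46", "G1T3Q4", "F1RYW2"),
  Organism = c("Homo sapiens", "Homo sapiens", "Canis familiaris", "Bos taurus",
               "Neosartorya fumigata (strain CEA10 / CBS 144.89 / FGSC A1163)",
               "Macaca mulatta", "Equus caballus", "Oryctolagus cuniculus",
               "Sus scrofa"),
  stringsAsFactors = FALSE
)

# How many proteins did each query identifier map to?
hits_per_query <- table(res$Query_identifier)
hits_per_query

# Keep only the human protein for the gene name, selecting columns by name
human_calml5 <- res[res$Query_identifier == "CALML5" & res$Organism == "Homo sapiens",
                    c("Query_identifier", "UniProt_AC", "Organism")]
human_calml5
```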
Any query identifiers not found in SwissPalm will have NA in most columns, but the Found_in_SwissPalm column will tell you why they were not found in the SwissPalm search.
Limiting your searches
Sometimes, you may want to limit the search to only the species you are studying, or to data from specific sources. The species and dataset parameters in swissPalm() let you do this.
You can check the package objects swisspalmR::species and swisspalmR::datasets to see the available species and dataset parameter values:
# There are 92 species available as of Sep 12th 2023
swisspalmR::species[1:5]
#> All species in SwissPalm Homo sapiens Mus musculus
#> "" "1" "2"
#> Rattus norvegicus Saccharomyces cerevisiae
#> "3" "6"
# There are 7 datasets available as of Sep 12th 2023
swisspalmR::datasets
#> Dataset 1: All proteins
#> "all"
#> Dataset 2: Proteins predicted to be palmitoylated
#> "pred"
#> Dataset 3: Palmitoylation validated or found in at least 1 palmitoyl-proteome (SwissPalm annotated)
#> "palm"
#> Dataset 4: Palmitoylation validated proteins
#> "targ"
#> Dataset 5: Palmitoylation validated proteins or found in palmitoyl-proteomes using 2 independent methods
#> "meth"
#> Dataset 6: Found in palmitoyl-proteomes using 2 independent methods
#> "meth2"
#> Dataset 7: Dataset 6 grouped by gene
#> "validated_dataset"
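Judging by the printed output, swisspalmR::species and swisspalmR::datasets are named character vectors: the names are human-readable labels and the values are what gets sent to SwissPalm. A sketch of looking values up by partial name, and of going from a short dataset code back to its description, using local copies of a few printed entries (the local names species and datasets are stand-ins, not the package objects):

```r
# Local copies of some entries printed above (stand-ins for the package objects)
species <- c("All species in SwissPalm" = "", "Homo sapiens" = "1",
             "Mus musculus" = "2", "Rattus norvegicus" = "3",
             "Saccharomyces cerevisiae" = "6")
datasets <- c("Dataset 1: All proteins" = "all",
              "Dataset 2: Proteins predicted to be palmitoylated" = "pred")

# Find a species label by partial, case-insensitive match on the names
mouse_name <- grep("musculus", names(species), ignore.case = TRUE, value = TRUE)
mouse_name

# Index by that label to get the value you would pass as `species`
mouse_value <- species[mouse_name]
mouse_value

# Reverse lookup: which dataset description belongs to the code "pred"?
pred_label <- names(datasets)[datasets == "pred"]
pred_label
```

With the package installed, the same pattern works on the real objects, e.g. grep("Equus", names(swisspalmR::species), value = TRUE).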
For example, a gene name can be shared by many species. I can filter the results of a SwissPalm search down to a species of interest, such as horses, by passing that species to swissPalm():
gene_names <- c("ADAMTS1", "AGL", "ANGPTL4", "CALML5", "CEP131", "CD70", "CD97")
swissPalm(gene_names, species = swisspalmR::species["Equus caballus"])[, 1:5]
#> Query_identifier UniProt_AC UniProt_ID UniProt_status Organism
#> 1 AGL A8BQB4 GDE_HORSE Reviewed Equus caballus
#> 2 CD70 F6XHL7 F6XHL7_HORSE Unreviewed Equus caballus
#> 3 CALML5 F6ZF46 F6ZF46_HORSE Unreviewed Equus caballus
#> 4 ADAMTS1 <NA> <NA> <NA> <NA>
#> 5 ANGPTL4 <NA> <NA> <NA> <NA>
#> 6 CD97 <NA> <NA> <NA> <NA>
#> 7 CEP131 <NA> <NA> <NA> <NA>
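Rows for genes with no horse protein are NA in the UniProt columns, so base-R subsetting can separate hits from misses. In this sketch the printed columns are copied into a hypothetical stand-in data frame; on the real swissPalm() result, the Found_in_SwissPalm column of the missed rows would tell you why each was not found.

```r
# Hypothetical stand-in for the species-filtered result above (three columns);
# with network access, use the data frame returned by swissPalm() instead
horse <- data.frame(
  Query_identifier = c("AGL", "CD70", "CALML5", "ADAMTS1", "ANGPTL4", "CD97", "CEP131"),
  UniProt_AC = c("A8BQB4", "F6XHL7", "F6ZF46", NA, NA, NA, NA),
  Organism = c(rep("Equus caballus", 3), rep(NA, 4)),
  stringsAsFactors = FALSE
)

# A missing UniProt accession marks a query with no match in this species
is_hit <- !is.na(horse$UniProt_AC)

hits <- horse[is_hit, ]
missed <- horse$Query_identifier[!is_hit]

hits
missed
```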
Using the dataset parameter, I can further limit SwissPalm to returning information only for proteins predicted to be S-palmitoylated:
gene_names <- c("ADAMTS1", "AGL", "ANGPTL4", "CALML5", "CEP131", "CD70", "CD97")
swissPalm(
gene_names,
species = swisspalmR::species["Equus caballus"],
dataset = swisspalmR::datasets["Dataset 2: Proteins predicted to be palmitoylated"]
)[, 1:5]
#> Query_identifier UniProt_AC UniProt_ID UniProt_status Organism
#> 1 AGL A8BQB4 GDE_HORSE Reviewed Equus caballus
#> 2 CD70 F6XHL7 F6XHL7_HORSE Unreviewed Equus caballus
#> 3 ADAMTS1 <NA> <NA> <NA> <NA>
#> 4 ANGPTL4 <NA> <NA> <NA> <NA>
#> 5 CALML5 <NA> <NA> <NA> <NA>
#> 6 CD97 <NA> <NA> <NA> <NA>
#> 7 CEP131 <NA> <NA> <NA> <NA>
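Comparing the two horse searches shows what the dataset filter removed. Here is a sketch with a small helper, dropped_queries() (a hypothetical name, not part of swisspalmR), applied to stand-ins copied from the two printed outputs:

```r
# Hypothetical helper (not part of swisspalmR): query identifiers that matched
# at least one protein in `broad` but none in `narrow`. Both arguments are
# data frames with swissPalm()-style Query_identifier and UniProt_AC columns.
dropped_queries <- function(broad, narrow) {
  matched <- function(x) unique(x$Query_identifier[!is.na(x$UniProt_AC)])
  setdiff(matched(broad), matched(narrow))
}

genes <- c("AGL", "CD70", "CALML5", "ADAMTS1", "ANGPTL4", "CD97", "CEP131")

# Stand-in for the species-only search (rows in the printed order)
horse_all <- data.frame(
  Query_identifier = genes,
  UniProt_AC = c("A8BQB4", "F6XHL7", "F6ZF46", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Stand-in for the species + Dataset 2 search (rows in the printed order)
horse_pred <- data.frame(
  Query_identifier = genes[c(1, 2, 4, 5, 3, 6, 7)],
  UniProt_AC = c("A8BQB4", "F6XHL7", NA, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

dropped_queries(horse_all, horse_pred)
```

Only CALML5 drops out, which suggests that its horse protein (F6ZF46) is not in Dataset 2.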