There are multiple ways to upload raw data to ArrayExpress. Here is a short tutorial on how to do so from the command line over FTP. It is based on the official documentation, with small amendments and explanations.
Manual
“Manual” basically means that you will be typing the commands below in your terminal. If your files are large and you do not want to risk the upload being interrupted by a dropped connection to the server, see the Automated section below.
- Log into Annotare and create a new experiment submission.
- cd into the directory where the data are stored.
- Connect to the ArrayExpress FTP server and log in with the username annotare and the password annotare1.
- Within the ArrayExpress server, cd into the directory that ArrayExpress assigned to your submission.
- Upload files with put into the remote directory you just navigated to. If you use mput (for multiple files) instead of put, first run the prompt command to turn interactive mode off, so that you do not have to confirm each transfer.
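As a rough sketch, the manual session above boils down to the command sequence below. The server name, the remote directory and the `*.fastq.gz` pattern are placeholders, not values from this tutorial: take the real server name and directory from your Annotare submission. The `binary` line is an addition that is commonly used for compressed files and is not part of the steps above.

```shell
#!/usr/bin/env bash
# Sketch of the manual FTP session. Placeholders (assumptions, replace them):
#   FTP_HOST   - server name given in the Annotare FTP instructions
#   REMOTE_DIR - directory ArrayExpress assigned to your submission
#   DATA_DIR   - local folder that holds the raw files
FTP_HOST="${FTP_HOST:-ftp.example.org}"
REMOTE_DIR="${REMOTE_DIR:-my-assigned-directory}"
DATA_DIR="${DATA_DIR:-.}"

# Print the commands you would otherwise type at the ftp> prompt.
manual_ftp_commands() {
  printf 'user %s %s\n' annotare annotare1  # credentials quoted in the tutorial
  printf 'cd %s\n' "$REMOTE_DIR"            # go to the assigned remote directory
  printf 'binary\n'                         # assumption: binary transfer mode
  printf 'prompt\n'                         # turn interactive mode off before mput
  printf 'mput *.fastq.gz\n'                # hypothetical file pattern
  printf 'bye\n'
}

# To actually upload (needs network access and a real FTP_HOST):
#   cd "$DATA_DIR" && manual_ftp_commands | ftp -n "$FTP_HOST"
manual_ftp_commands
```

Typing the same lines by hand at the `ftp>` prompt is equivalent. The function only prints them, so you can review the sequence before sending anything.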
Automated
I have never experienced issues with the manual approach for “typical” bulk RNA-seq experiments, where the total data size is around 200-300 GB distributed over 20-30 files/samples. If you would like to use nohup so that the upload is not interrupted when you disconnect from the server, you can try the following:
- Create a .netrc file within your home directory. The ftp client checks this file and logs in automatically with the username and password listed there for the given FTP server, so you are not prompted for them.
- Populate ~/.netrc with the server name, username and password. For ArrayExpress, use the same credentials as in the manual steps: username annotare, password annotare1.
- Change the permissions so that only the owner can read the file, for example with chmod 600 ~/.netrc.
This does not matter if the only entry within ~/.netrc is the ArrayExpress FTP server info, since it is public anyway.
- Navigate to the folder with the raw data to be uploaded to ArrayExpress, create a bash script there and make it executable (for example with chmod +x).
It is normally advised that executable bash scripts have no extension, but I prefer the .sh extension for my bash scripts so that I know what I am executing.
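The automated steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the original script: the helper names `write_netrc` and `write_upload_script`, the server name `ftp.example.org`, the remote directory and the `*.fastq.gz` pattern are all hypothetical placeholders. Take the real server name and directory from your Annotare submission.

```shell
#!/usr/bin/env bash
# Placeholders (assumptions, replace with the values from Annotare):
FTP_HOST="${FTP_HOST:-ftp.example.org}"
REMOTE_DIR="${REMOTE_DIR:-my-assigned-directory}"

# Append an entry to a netrc file and restrict it to the owner.
# The login and password are the public ones quoted in the tutorial.
write_netrc() {
  local netrc_file="$1" host="$2"
  ( umask 077
    printf 'machine %s login %s password %s\n' \
      "$host" annotare annotare1 >> "$netrc_file" )
  chmod 600 "$netrc_file"
}

# Create an executable upload script. It relies on ~/.netrc for the login,
# and ftp -i turns interactive prompting off for mput.
write_upload_script() {
  local script="$1" host="$2" remote_dir="$3"
  cat > "$script" <<EOF
#!/usr/bin/env bash
ftp -i "$host" <<'FTP_COMMANDS'
cd $remote_dir
binary
mput *.fastq.gz
bye
FTP_COMMANDS
EOF
  chmod +x "$script"
}

# Usage, run from the folder with the raw data:
#   write_netrc "$HOME/.netrc" "$FTP_HOST"
#   write_upload_script upload.sh "$FTP_HOST" "$REMOTE_DIR"
#   nohup ./upload.sh > upload.log 2>&1 &
```

Starting the script with `nohup` and a trailing `&` keeps the upload running after you log out, and `upload.log` collects the ftp output so you can check later whether every file was transferred.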