# LUMI Advanced Course

28.--31.10.2024 9:00--16:00 (CET)
Zoom link: https://cscfi.zoom.us/j/65207108811?pwd=Mm8wZGUyNW1DQzdwL0hSY1VIMDBLQT09

:::info
Please ask your questions at [the bottom of this document](#EOF) <-- click here
:::

---

[TOC]

## General Information

- Link to this document: [https://md.sigma2.no/lumi-general-course-oct24](https://md.sigma2.no/lumi-general-course-oct24?both)
- [Schedule](https://lumi-supercomputer.github.io/LUMI-training-materials/4day-20241028/schedule/)
- Zoom link: https://cscfi.zoom.us/j/65207108811?pwd=Mm8wZGUyNW1DQzdwL0hSY1VIMDBLQT09
- On-site at SURF in Amsterdam
  - [SURF, Science Park 140, 1098 XG Amsterdam, The Netherlands](https://maps.app.goo.gl/GL1npKVy5Je5UFxJ7)
  - [More info including an excellent bar guide](https://lumi-supercomputer.github.io/LUMI-training-materials/4day-20241028/#course-organisation)

## Events

### AI workshop

__26.--27.11.24, Ostrava (Czech Republic)__

Moving your AI training jobs to LUMI: A Hands-On Workshop. A 2-day workshop on running PyTorch-based workflows on LUMI and scaling from one to multiple GPUs on a node.

Register here: https://www.lumi-supercomputer.eu/events/lumi-ai-workshop-nov2024/

### Next public HPC coffee break

**8.11.24, 13:00--13:45 (CET), 14:00--14:45 (EET)**

Meet the LUMI user support team, discuss problems, give feedback or suggestions on how to improve services, and get advice for your projects. Usually held on the last Wednesday of the month.

[Join via Zoom](https://cscfi.zoom.us/j/68857034104?pwd=UE9xV0FmemQ2QjZiQVFrbEpSSnVBQT09)

## Slides, exercises & recordings

The slides and exercises will be made available during the course on LUMI at `/project/project_465001362/Slides/` and `/project/project_465001362/Exercises/`. Download them from LUMI with `scp` or SFTP programs like FileZilla.

The training material, including recordings, will be published in the [LUMI Training Materials archive](https://lumi-supercomputer.github.io/LUMI-training-materials/4day-20241028/). These pages also contain pointers to the recordings and a permanent place where the slides will be stored, even after the training project ends.

The AMD exercises can be accessed [here](https://hackmd.io/@sfantao/lumi-training-ams-2024). There is also a [wider range of exercises](https://md.sigma2.no/uploads/16d7dfc4-a716-46a5-9463-a643d01d1712.png) in the AMD GitHub project that you may choose to try, though they are not specifically tuned for LUMI.
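As an example, a minimal sketch of fetching the material with `scp` from your own machine, assuming the standard `lumi.csc.fi` login address; `<username>` is a placeholder for your LUMI account name:

```bash
# Copy the slides and exercises from LUMI to the current directory.
scp -r <username>@lumi.csc.fi:/project/project_465001362/Slides/ ./Slides
scp -r <username>@lumi.csc.fi:/project/project_465001362/Exercises/ ./Exercises
```

`sftp` or FileZilla work against the same host name.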
## Q&A

### Course organisation + general questions

1. Why can `lumi-workspaces` just freeze after showing info about one of the projects, and thus not show the rest?
    - It makes live calls to Lustre and we do have occasional freezes on the file system, so it may have to do with that. There is an alternative command, `lumi-ldap-userinfo`, but that uses cached data so the delay can be up to one hour, and more if the synchronisation fails. I note that at the moment `lumi-workspaces` is particularly problematic, so there may be something going on on the system. UPDATE: It turns out there are problems with the flash filesystem, so `lumi-workspaces` hangs when it tries to check the quota on that filesystem.

2. I think annotation is enabled in the screen share; you might want to be careful.
    - Thank you. We will check it for the next share.

:::warning
Did you manage to join the training project?

Run `lumi-ldap-userinfo` and check for `project_465001362 - LUST Training / 2024-10-28-31 Advanced LUMI course Amsterdam`.
Or maybe a little quicker: run `groups` and check for `project_465001362` (which does not have any synchronisation delay).

Put down an `x`:
Yes: xxxxxxxxxx
No:
I'm currently waiting:
:::

### LUMI hardware and architecture

3. NUMA node = processors connected to the same socket?
    - Not on AMD. There are 4 NUMA nodes in each socket, as each socket is logically split into 4 quarters with truly uniform memory access within a quarter but a slightly longer delay to the other quarters. This is a BIOS setting that is useful for HPC; some AMD systems are instead configured with one NUMA domain per socket (and a different but slightly slower way of accessing memory).

4. Do we get access to the presentations?
    - Yes, they are uploaded after the presentations to the project at `/project/project_465001362/Slides/HPE/` and will be archived on LUMI; see the link towards the top of this page.

### Programming environment and modules

5. Question about Java availability.

6. You said you never use the GNU toolchain. Is it because it deteriorates the performance? It's usually the most compatible toolchain for scientific codes.
    - (Question recopied)
    - I think you misunderstood. We never use the GNU compilers directly, but use them via the wrappers (see the sketch at the end of this subsection). However, performance-wise, gfortran is bad compared to Cray Fortran. And the scientific community should get rid of GNU; we're one of the last communities still using it that much. E.g., all GPU compilers from vendors are based on LLVM with clang for C. For Fortran, though, the LLVM ecosystem is still a bit messy.

7. It was mentioned that the Cray compiler uses LLVM as a backend. Would other languages that use LLVM, like Julia and Rust, have good performance out of the box, or can the performance be improved by some configuration?
    - Not really. For C/C++ there are some optimizations at the front-end level (see the first presentation of this afternoon), then the standard LLVM (17.0.1) is used as the backend. Note that we don't provide the entire LLVM stack. Another LLVM-based compiler is AMD's.
    - Julia is not easy to build yourself, so you typically have to make do with the build the Julia people provide, which is a pity. I have no doubts about the quality of the JIT in the Julia that you download, but I have some doubts about proper support for the Slingshot interconnect.
    - Thanks for the answers!

8. I am mostly interested in training large machine learning models using PyTorch. Which parts of this training are most relevant for me?
    - Performance monitoring tools, in particular the AMD ones, the software stack talk tomorrow, and the demo on the last day. And of course you will also be using Slurm. If you need to extend your container with other software that you need to compile, then a lot more of the presentations become useful. Moreover, you need to understand why containers that you simply download from Docker Hub don't perform well on LUMI. These elements are in today's talks, but could have been highlighted better. I'll try to highlight some in the software stack talk tomorrow.
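A minimal sketch of what "using the compilers via the wrappers" means in practice; the source file `pi.f90` is just a hypothetical example:

```bash
# The Cray wrappers cc/CC/ftn call whatever compiler the loaded PrgEnv module
# selects, and automatically add MPI, math libraries, etc. to the link line.
module load PrgEnv-cray        # wrappers now drive the Cray Compiling Environment
ftn -O2 -o pi_cray pi.f90

module load PrgEnv-gnu         # Lmod swaps the programming environment; same wrapper, now gfortran
ftn -O3 -o pi_gnu pi.f90       # GNU defaults to -O0, so always set an optimisation level yourself
```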
### Running on Cray EX hardware

9. Is `--gres=gpu:8` the same as `--gpus-per-node=8` on LUMI?
    - Harvey says yes.

### Exercises

:::info
Exercise material is available on LUMI at `/project/project_465001362/Exercises/HPE/day1`.

For the duration of the course, the PDF files in that directory can be [viewed directly on the web via this link](https://462000265.lumidata.eu/4day-20241028/files/LUMI-4day-20241028-1_Exercises_day1.pdf) and the [second PDF here](https://462000265.lumidata.eu/4day-20241028/files/LUMI-4day-20241028-1_04a-ProgrammingModelExamples_SLURM.pdf).

For running the exercises during the course, use the reservation `lumic_ams` on the `standard` Slurm partition, or `source lumi_c.sh` (copy it from `/project/project_465001362/Exercises/HPE/lumi_c.sh`), which sets these for you.

- Copy the exercise files into your `$HOME` directory and, if needed, unpack the tar files with `tar xf FILE.tar`
- Use the folder `ProgrammingModels`
- Choose a different PrgEnv (cray, gnu or aocc)
- Follow the README.md on how to compile
:::

10. I get `Batch job submission failed: No partition specified or system default partition`. I have the following settings in the batch script:
    ```shell
    # Add any site options needed
    # SBATCH -A project_465001362
    # SBATCH -p standard
    # SBATCH -q <qos>
    # SBATCH --reservation=lumic_ams
    ```
    - Why do you specify `--qos`? That should not be necessary. Otherwise try sourcing the setup script at `/project/project_465001362/Exercises/HPE/lumi_c.sh`. That sets some environment variables that take precedence over the `SBATCH` parameters of the job script.
    - Thanks!
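Related to question 10: Slurm only honours directives written as `#SBATCH` with no space after the `#`; lines starting with `# SBATCH` are treated as ordinary comments and silently ignored. A minimal header sketch for these CPU exercises (node/task counts and the executable name are only illustrative):

```bash
#!/bin/bash
#SBATCH --account=project_465001362
#SBATCH --partition=standard
#SBATCH --reservation=lumic_ams
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --time=00:10:00

srun ./pi_mpi    # hypothetical executable from the ProgrammingModels exercise
```

Sourcing `lumi_c.sh` achieves the same by exporting environment variables (e.g. `SBATCH_PARTITION`; check the script for the exact names), which then override what is, or is not, in the script header.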
### Overview of Compilers and Parallel Programming Models

11. Together with Kokkos and Raja, there is Alpaka as a portability library. Thank you for mentioning it :)
    - Some of the slides probably come from the USA, as HPE is an American company, and they tend not to know Alpaka, unfortunately ;-)

12. Wojtek Hellwing: exercise #2: is it an expected result that when I switch compilers from Cray to GNU, the pi_threads and pi_hybrid tests run twice as fast (i.e. the Cray binaries run slower than the GNU-compiled versions)?
    - (Alfio) With a single thread? Do you get any warning? I can speculate that CCE does a default binding, so all threads will use the same cores. In any case, the exercise is not meant to check the performance, but rather to get familiar with modules and compilers.
    - (Kurt) A bit surprising, but I didn't check. The default OpenMP behaviour of CCE is actually very decent. On the other hand, the default in GNU is to use `-O0`, so if you don't specify proper optimisation options, the GNU compiler tends to be the slowest. Note though that if you are running on the login nodes, timings can be unreproducible.
    - Wojtek: Thanks for your insights. I was surprised too. I get the speed-up for all models: serial (just 20% though), threaded (OpenMP) and hybrid (MPI+OpenMP). I know it is just a simple test, but this was a funny result. Maybe the test is too small to show any performance gain from the Cray compiler.

### OpenMP and OpenACC offloading

13. The storage space in the home directory is quite limited, and the file-count quota is also rather small. When I install custom software or use EasyBuild to install a few packages, I quickly run out of space. How can I manage this situation and keep the number of files small?
    - In our documentation we discuss several options: https://docs.lumi-supercomputer.eu/storage/#about-the-number-of-files-quota. It depends a little bit on why you run into the quota. If you have too many files because of a Python installation, you can consider using containers. If it happens during compilation, you may consider using the scratch space. If not the number of files but the size is the problem, you can look at using your project storage. In any case, it may be a good idea to install your EasyBuild recipes in the project, for all members, instead of in your home folder (see the sketch below). See also: https://docs.lumi-supercomputer.eu/software/installing/easybuild/#preparation-set-the-location-for-your-easybuild-installation.
    - The home directory should only be used for the things that really belong in a home directory as Linux puts them there (like caches). It should not be used to store data from your project or to install software.
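A minimal sketch of installing with EasyBuild into the project rather than the home directory, following the documentation page linked above; the stack version `LUMI/24.03` and the easyconfig name are only placeholders:

```bash
# Point EasyBuild-user at a directory in the project so all members can use the modules.
export EBU_USER_PREFIX=/project/project_465001362/EasyBuild

module load LUMI/24.03 partition/C   # pick the stack and partition you want to build for
module load EasyBuild-user           # EasyBuild configured for user/project installations
eb some-package.eb -r                # hypothetical easyconfig; -r resolves dependencies
```

Project members then see the resulting modules after setting the same `EBU_USER_PREFIX` and loading the corresponding LUMI stack.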
### Advanced Placement

14. In the slide with the `lscpu | grep -Ei "CPU\ "` output we saw the distribution of CPUs over NUMA nodes. What do the last 4 rows refer to? E.g. "NUMA node0 CPU(s): 0-15, 64-79"
    - (Alfio) These are the core IDs, so node0 has cores with IDs 0 to 15 and then 64 to 79 (hyperthread cores).
    - Ah okay, so it does not have to do with the distance of those nodes from each other, right?
    - No, the "relative" distances are shown by the command `numactl -H`.

15. Does the `--exclusive` flag make the available memory on each node be distributed equally among the CPUs defined by `--cpus-per-node`? Or do we still need to set it via `--mem-per-cpu` or something like that?
    - (Alfio) `--exclusive` is only to get the entire node, without sharing it with other users.
    - (Kurt) Defaults set by the sysadmins still apply, so you may want to use, e.g., `--mem=0` to get all CPU memory on the node, or even better, `--mem=224G` or `--mem=480G` for regular LUMI-C and LUMI-G nodes respectively, as that protects you from getting nodes where a system memory leak may have reduced the amount of memory available to the user. Memory is always a pool per node and not limited per task. This is in fact essential to make communication through shared memory possible, and is also why `--gpu-bind` in Slurm does not work: it currently creates a so-called cgroup per GPU, which makes memory-to-memory communication impossible.

16. Why is `gpu_check` reporting Bus_ID dc? That's a PCI bridge, not a GPU (the GPU is de). Is it just an output issue in the `gpu_check` binary?
    - (Kurt) I'd have to check the code to see how the Bus_ID is determined. It is basically the code of the ORNL hello_jobstep program, a bit reworked with some extra output (but the determination of the ID is taken from the hello_jobstep code, if I remember correctly). This will be something to look into after the course. It is strange that it is only for this GPU; the other ones correspond to what I see in `lstopo`. Bug located: just a constant in the code with the wrong value. It will be fixed next week or the week after, at the next update of the software stack.

:::info
#### Icebreaker question of the day

How do you install software on LUMI?

- Not specifically on LUMI, but in general on clusters and supercomputers I tend to rely on what is provided and complement it by installing code from source. Plus, using pip and conda if using Python.
- I typically use containerized applications with Apptainer.
- I compile the codes I use. In the case of Python, I use miniconda.
- ...
:::

#### Exercises

:::info
Exercise material is available on LUMI at `/project/project_465001362/Exercises/HPE/day2/gpu_perf_binding`.

For the duration of the course, the PDF files in that directory can be [viewed directly on the web via this link](https://462000265.lumidata.eu/4day-20241028/files/LUMI-4day-20241028-1_Exercises_day2.pdf).

For running the exercises during the course, use the reservation `lumic_ams` on the `standard` Slurm partition, or `source lumi_c.sh` (copy that from `/project/project_465001362/Exercises/HPE/lumi_c.sh`).

- Copy the exercise files into your `$HOME` directory: `cp -r /project/project_465001362/Exercises/HPE/day2/gpu_perf_binding $HOME`
- Set the partition, reservation and project: `source /project/project_465001362/Exercises/HPE/lumi_g.sh`
- Followed by `source gpu_env.sh`
- Start with the `check` directory
:::

### HIP and ROCm

17. How can SIMD registers in a CU be used explicitly inside the kernel?
    - Do you have something specific in mind? The compiler automatically uses the registers for your code, depending on how many variables you declare, whether you have register spilling, etc.

18. Which module is needed to run rocminfo and rocm-smi on LUMI?
    - None. You just have to be on a GPU node; both commands are available there, in `/usr/bin`. There are also versions in all rocm modules. (See the sketch below for running them through Slurm.)
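A minimal sketch of running the ROCm query tools through Slurm; the partition name `small-g` and the one-GPU request are just an example, and during the course you can equally use the setup from the GPU exercise scripts above:

```bash
# rocminfo / rocm-smi only see GPUs when they run on a GPU node,
# so launch them through srun inside a GPU allocation.
srun --account=project_465001362 --partition=small-g \
     --gpus-per-node=1 --ntasks=1 --time=00:05:00 rocminfo | head -n 20

srun --account=project_465001362 --partition=small-g \
     --gpus-per-node=1 --ntasks=1 --time=00:05:00 rocm-smi
```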
19. If an array is large and does not fit in device memory, can there be any benefit in sequentially moving chunks of the array to the device for the computation?
    - You can declare the array either on the host or on the device, depending on what your algorithm does, and send part of it to the GPU. If the allocation does not fit, though, it will crash, so it needs a clever approach. It is better to have `array_h` for the host and `array_d` for the device.
    - I mean, if we deal with a large matrix-vector multiplication and, due to the limited device memory, we cannot store the whole array on the device even when using multiple GPUs, is there any suggestion on how to move chunks? Indeed, moving chunks requires multiple data transfers. I just wanted to know if there is any solution.
    - For matrix multiplication and similar operations use libraries; do not write your own implementations, and some of them are parallelized for you across multiple GPUs. I am not sure about your exact case, but if the data does not fit at all in a few GPUs, you can use unified memory (GPU and CPU memory together), although you will pay the cost of the slower data transfers.

#### Exercises

:::info
Exercise material is available at https://hackmd.io/@sfantao/lumi-training-ams-2024
:::

20. What could be the reason for this error in Exercises/AMD/HPCTrainingExamples/HIP/vectorAdd?
    ```
    System minor 0
    System major 9
    agent prop name AMD Instinct MI250X
    hip Device prop succeeded
    FAILED: 1048576 errors
    ```
    - In the makefile, declare: `HIPCC_FLAGS = -g -O2 -DNDEBUG --offload-arch=gfx90a`. On some systems we need to declare the target architecture if the compiler does not detect it during compilation. (This was a different error originally, but now it seems OK.)

21. I have tried one of the examples from the HPCTrainingExamples, i.e. `atomics_openmp`, but it looks like the reduction is not supported (it should be, but it does not work). For example:
    ```
    DEVID: 0 SGN:8 ConstWGSize:1024 args: 5 teamsXthrds:( 10X1024) reqd:( 0X 0) lds_usage:136B sgpr_count:28 vgpr_count:16 sgpr_spill_count:68 vgpr_spill_count:44 tripcount:10000 rpc:0 n:__omp_offloading_54bbb604_f2004170_main_l35
    AMDGPU fatal error 1: Memory access fault by GPU 4 (agent 0x3e0a40) at virtual address 0x4f0000. Reasons: Unknown (0)
    ```
    - I am not sure this exercise is ready for this training, as it requires other modules as well.

22. `rocminfo` reports:
    ```
    ROCk module is NOT loaded, possibly no GPU devices
    ```
    - First allocate resources. I assume the rocm module is loaded, but you need resources first. Also maybe use `srun -n 1 rocminfo`.
    - Note that on LUMI, when you use `salloc`, you get the default behaviour of Slurm, i.e., it just gives you a shell on the node where you called `salloc`, not on the first node of your allocation as some other Slurm clusters do. You're simply executing `rocminfo` on a node without a GPU...

23. What should the `HIP_PATH` variable be set to?
    - For which exercise?
    - Set it with `export HIP_PATH=$ROCM_PATH`. `ROCM_PATH` is set by the `rocm` module, which you can see if you do `module show rocm`. It is only necessary for this exercise.

### LUMI SW stacks

24. So you said that conda cannot be used, but it also says in the file that conda can be used if it comes in containers? Does that mean that we can use `mamba (miniconda)` in the containers, or can we also use `conda` there?
    - The problem is when using Anaconda. Conda is not a problem, as long as you follow the best practices (see the sketch below). See also the LUMI documentation: https://docs.lumi-supercomputer.eu/software/installing/python/.
    - (Kurt) There are two separate and independent issues. Firstly, any installation with lots of small files should be containerised, and most Python or conda installations are like that. Secondly, there is also a licensing issue specifically with Anaconda. I encourage you to have a look at the license, and [there is a link in the notes](https://lumi-supercomputer.github.io/LUMI-training-materials/4day-20241028/notes_2_07_LUMI_Software_Stacks/#software-policies). Anaconda cannot be legally used on LUMI (or on your university cluster or workstation, for the same reason). The miniconda program can be used, but not to install from the main Anaconda channel, only from conda-forge and external channels. Other public domain implementations come with their own license and may or may not be used.
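As an illustration of the container route for conda environments, here is a minimal sketch using the `cotainr` tool described in the LUMI documentation; the module names, flags and the environment file `env.yml` are assumptions based on that documentation, so check `cotainr build --help` on the system:

```bash
# Build a Singularity/Apptainer container that wraps a conda environment
# defined in env.yml (use conda-forge channels, not the default Anaconda channel).
module load CrayEnv cotainr
cotainr build my_env.sif --system=lumi-g --conda-env=env.yml

# Run Python from inside the container on a compute node (my_script.py is a placeholder).
srun --account=project_465001362 --partition=small-g --gpus-per-node=1 \
     --ntasks=1 --time=00:10:00 \
     singularity exec my_env.sif python my_script.py
```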
25. Since the list of available modules gets so big and the old ones are shown first, what is the policy for keeping support for outdated software?
    - We remove whole software stacks from time to time after major system upgrades, after first showing Lmod deprecation warnings and then hiding them. Note that the old-version issue only exists in the output of `module spider`. With `module avail` you will only see the versions that are relevant for the version of the LUMI stack that you have selected.

26. To install VASP + VTST + VASPsol++ + CP-VASP (VASP plugins) on LUMI, how should I proceed with the installation, and which installation method should I choose? If using EasyBuild (EB), how should I configure the installation files? (When I search using `eb -S VASP`, I cannot find any preconfigured easyconfig files that include VTST, VASPsol++, or other plugins (CP-VASP). Would it be possible for you to provide such easyconfigs or help to find them?)
    - We cannot fully support VASP. Even if there were no licensing issues, we cannot do all installations for everybody with all the plugins etc. that a single user may require, and we have to focus on those combinations for which we expect more demand. You will very likely have to do a manual installation, following the VASP installation instructions and knowing how to use the Cray PE compilers. I know someone in the Flemish local support team is looking into such an EasyConfig, which may or may not be adaptable to LUMI. But software with a license as restrictive as VASP's is nearly impossible to support: as we don't have access to the VASP sources due to its license, we cannot do anything ourselves and have to rely on external parties with access.

### Perftools

27. Do you need to provision a node on LUMI with special constraints in order to see these CPU/GPU HW counters? I saw some reference to a "WHITELIST" environment variable.
    - We will come back to hardware counters in the next talk. Please hold on to this question and we can check if it is answered after that talk.

### Exercises

:::info
[Slides with the exercises are available on the web temporarily](https://462000265.lumidata.eu/4day-20241028/files/LUMI-4day-20241028-3_Exercises_day3.pdf)
:::

28. I am doing the `perftools-lite` exercise. The Slurm output mentions GPU_SUPPORT_ENABLED, but I cannot find any reference to the GPUs in the code or in the Makefile:
    ```shell
    MPICH ERROR [Rank 0] [job id 8302257.0] [Wed Oct 30 10:59:58 2024] [nid001026] - Abort(-1) (rank 0 in comm 0): MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not linked
    (Other MPI error)
    aborting job:
    MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not linked
    srun: error: nid001026: task 0: Exited with exit code 255
    srun: Terminating StepId=8302257.0
    srun: error: nid001026: tasks 1-5,7: Segmentation fault
    srun: error: nid001026: task 6: Segmentation fault
    ```
    - Exactly which example are you running and which lumi_xx script did you load?
    - I am running the C example and I sourced `lumi_c.sh`.
    - Did you load any GPU modules (craype-accel.., rocm)? You should have the standard login modules and then just perftools-lite (see the sketch below). The GPU modules should only be loaded if you are running the GPU examples on LUMI-G. Unload those modules or log in again, and rebuild the example.
    - I have `rocm/6.0.3` from yesterday. Is that the problem?
    - How do I get back to the standard modules if I do a `module purge`?
    - If you used lumi_xxx.sh then just `exit` the shell and run the script again. Or log in again and start from there.
    - JY: You may want to check that the `MPICH_GPU_SUPPORT_ENABLED` environment variable is not set (otherwise do `unset MPICH_GPU_SUPPORT_ENABLED`). If it is set, Cray MPICH will check whether the GTL library, which is used for GPU-to-GPU communication, is linked into the binary.
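A minimal sketch of the perftools-lite workflow referred to above; the compile line is only an example, so build whichever exercise source you are actually using:

```bash
# Start from a clean login environment (no craype-accel-* or rocm modules),
# load perftools-lite, and rebuild: relinking instruments the binary.
module load perftools-lite
cc -O2 -o himeno himeno.c          # hypothetical C example from the exercise set

# Run as usual; perftools-lite appends a profiling summary to the job output
# and writes an experiment directory in the run directory.
srun --ntasks=8 ./himeno
```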
### MPI Topics on the HPE Cray EX supercomputer

29. For the 2D neighbour exchange example on the slides, isn't it easier to let MPI_Cart_create() deal with the reordering and just use whatever it returns? And then ask MPI_Cart_shift() for the neighbours?
    - Cart may not create an optimal solution. It may not be aware of things like node layout and NUMA domains.

### AMD Debugger: ROCgdb

30. In the exercises, no `roctracer.txt` is produced anywhere, and all the omnitrace-derived Perfetto output files are invalid (according to perfetto.dev at least; I cannot run Docker to test locally).
    - These exercises are meant for ROCgdb; I am not sure what you mean with roctracer.txt. Leading into the next presentation: the Perfetto UI takes JSON files (defaulting to results.json), not *.txt files. The TXT files are meant to be read directly.
    - It is under "Binary rewriting (to be used with MPI codes and decreases overhead)" in the omnitrace section.

:::warning
If you want to provide some feedback on the current AI situation on LUMI, you can participate in our AI questionnaire: https://link.webropolsurveys.com/Participation/Public/c62ffb41-714a-4425-aa37-69634dc22419?displayId=Fin3151145
:::

### rocprof profiler

:::info
The AMD exercises can be accessed [here](https://hackmd.io/@sfantao/lumi-training-ams-2024#Rocprof).
:::

### Python

:::info
#### Ice breaker question of the day

Which programming languages do you use most on LUMI?

- Python, as an interface to HIP/OpenMP/OpenACC in C/Fortran
- Python & C++
- Bash!
- Python and C/C++
- Fortran & Python
- Fortran and C++ with MPI. Python for general purposes.
:::

### Python & Performance Optimization at the node level

31. Why is it recommended to run Python codes that do not use multi-threading/multi-processing with `srun` anyway?
    - I am not sure I understand the question. `srun` lets you run on the compute nodes (otherwise, with `salloc`, you would be running on a login node).
    - Note also that in a batch job there is a difference between calling a program with and without `srun` (see the sketch below). Without `srun`, it runs in the context of the batch job step, which includes, e.g., all hyperthreads on all cores of the first node of the job, while with `srun` the LUMI default `--hint=nomultithread` would apply.
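A minimal sketch illustrating that difference inside a batch script; the script name `analysis.py` is just a placeholder:

```bash
#!/bin/bash
#SBATCH --account=project_465001362
#SBATCH --partition=small
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=00:15:00

module load cray-python   # one way to get a Python interpreter on LUMI

# Runs in the batch job step: sees all hyperthreads of all cores given to the
# job on this node, with no task binding applied.
python3 analysis.py

# Runs as a proper job step, with Slurm's CPU binding and the LUMI default
# --hint=nomultithread (one hardware thread per core).
srun python3 analysis.py
```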
32. How are cache misses actually measured?
    - Using hardware counters. Was that your question?
    - I mean, which tools are used? I might have missed it, sorry.
    - Profiling software will usually show them, if it can access the counters.

33. Can we programmatically see the size of cache lines and the associativity of the caches?
    - The cache line size is partially standardized by the size of memory transfers to DRAM, and so can be taken as a constant: 64 bytes. This can be tested programmatically by disabling prefetching. The associativity of the caches is most easily found in hardware manuals.

34. What block size is recommended for cache blocking for each precision?
    - As a cache line is 64 bytes, you could say 8 double precision or 16 single precision numbers. But it is more complicated than that. Basically, you make your block size such that the "hot" data fits in the L2 data cache and uses it as optimally as possible. Too small blocks may, e.g., give you too much loop overhead or make it impossible to use loop unrolling, which in turn could make it impossible to use the floating point units efficiently. On LUMI the vector units are half the cache line width and there are two of them, but it takes 4 or 5 cycles before you can use a result in the next computation, so you'd want to unroll loops to work with 8 or 10 times the vector width in a single loop iteration. The talk was really just an introduction to make you aware of the issues. It takes a longer course to learn all the tricks. PRACE used to organise such courses; I'd expect them to reappear in the EuroHPC programme. There are some people from FAU Erlangen-Nürnberg who are particularly good at teaching this, but when I took that course, it was a three-day course by itself rather than a 40-minute talk.

### Optimizing I/O

35. We used collective MPI I/O in a Fortran code and observed very slow performance in a multi-node case on GPFS. However, the Intel compiler was faster for the same size. What default settings can be different between Open MPI and Intel? Can the issue be solved by changing the settings as you mentioned for Lustre? We wrote a 3D array to a single binary or HDF file.
    - We can't really answer questions about other platforms here.
    - GPFS works in a completely different way than Lustre and has different tuning strategies. It doesn't work with chunk files on OSTs the way Lustre does, but spreads data directly across blocks on multiple servers. Those blocks can be fairly large, causing the actual size of small files to be a lot larger than on Lustre. You'll have to talk to people from the centre. If you are the person from the VUB in Belgium who registered, there is someone at their HPC service with experience in tuning for GPFS (Ward Poelmans). As far as I know, the MPI I/O code in other MPI implementations is derived from the one in MPICH, so in some sense MPICH is closer to the source of the technology, and I wouldn't be that surprised if MPICH does better than Open MPI.

### OmniTrace

36. Where is `omnitrace`? `module spider omnitrace` or `module load omnitrace` do not show anything.
    - It is part of the ROCm stack, so it is available as part of the rocm module. This is the case for ROCm 6.2; for older versions, look here: https://hackmd.io/@sfantao/lumi-training-ams-2024#Omnitrace. There is a module that you can load after adding a new path to the module search path.
    - Thanks!

### Omniperf

37. How do I query information about a job which has already finished?
    - What kind of info are you interested in?
    - Say, the running time.
    - `scontrol show job JOBID`
    - For finished jobs, that results in `slurm_load_jobs error: Invalid job id specified`.
    - Check the many output options of `sacct`; it works for finished jobs (see the sketch below). See also the man page of `sacct`. The link for the version that is currently on LUMI is in the documentation appendix that you'll find on the web site of this course at https://lumi-supercomputer.github.io/LUMI-training-materials/4day-20241028/. The `-o` or `--format` command line flag enables you to specify a lot of different fields with information about the job.
    - Thanks!
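A minimal `sacct` sketch; the job ID is a placeholder and the listed fields are just a small selection of what `--format` accepts:

```bash
# Elapsed time, state, exit code and memory high-water mark of a finished job
sacct -j 1234567 --format=JobID,JobName%20,Partition,Elapsed,State,ExitCode,MaxRSS
```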
### Tools in action: PyTorch

38. Is this course still available? "Moving your AI training jobs to LUMI: A Hands-On Workshop" (registration is not available at the moment, but you can register on the waitlist).
    - You'll be on the waiting list, so there is not that much chance that you get in, but it will likely be repeated in late January or February, in Helsinki, Finland. The precise date is not yet fixed though.

39. Could you explain the differences between these three storage directories?
    ```
    /projappl/project_465001362   158M/54G    2.2K/100K
    /scratch/project_465001362    1.3G/55T    4.4K/2.0M
    /flash/project_465001362      4.1K/2.2T   1/1.0M
    ```
    Specifically, I'd like to know:
    - The intended use cases for each storage type.
    - Whether there are any time limits on file storage, or if files can remain stored indefinitely as long as the project is active.
    - Taking into account factors such as speed and so on, what is each storage type suitable for?

    Thank you!

:::info
**Please write your questions above this note**
:::

###### EOF