|
|
## Cronjobs
|
|
|
Dentro de las tareas de línea de comandos, existen un conjunto de ellas que necesitan ejecutarse frecuentemente a través del uso de [cronjobs](https://en.wikipedia.org/wiki/Cron#Overview). En sistemas basados en Debian, existen [configuraciones de cronjobs a distintos niveles](https://debian-administration.org/article/56/Command_scheduling_with_cron), pero en particular, nuestras configuraciones de cronjobs las tendremos _centralizadas_ en archivos bajo el directorio `/etc/cron.d/`.
|
|
|
|
|
|
### Cronjob de tareas programadas de DSpace `dspace-config`
|
|
|
```bash
|
|
|
# /etc/cron.d/dspace-config: crontab fragment for dspace
|
|
|
# Edit this file to introduce tasks to be run by cron.
|
|
|
#
|
|
|
# Each task to run has to be defined through a single line
|
|
|
# indicating with different fields when the task will be run
|
|
|
# and what command to run for the task
|
|
|
#
|
|
|
# To define the time you can provide concrete values for
|
|
|
# minute (m), hour (h), day of month (dom), month (mon),
|
|
|
# and day of week (dow) or use '*' in these fields (for 'any').#
|
|
|
# Notice that tasks will be started based on the cron's system
|
|
|
# daemon's notion of time and timezones.
|
|
|
#
|
|
|
# Output of the crontab jobs (including errors) is sent through
|
|
|
# email to the user the crontab file belongs to (unless redirected).
|
|
|
#
|
|
|
# For more information see the manual pages of crontab(5) and cron(8)
|
|
|
#
|
|
|
# m h dom mon dow command
|
|
|
|
|
|
#-----------------
|
|
|
# GLOBAL VARIABLES
|
|
|
#-----------------
|
|
|
# Shell to use
|
|
|
SHELL=/bin/sh
|
|
|
|
|
|
# Add all major 'bin' directories to path
|
|
|
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
|
|
|
|
|
|
# Set JAVA_OPTS with defaults for DSpace Cron Jobs.
|
|
|
JAVA_OPTS="-Xmx1024M -Dfile.encoding=UTF-8"
|
|
|
|
|
|
# Full path of your local DSpace Installation (e.g. /home/dspace or /dspace or similar)
|
|
|
DSPACE=/var/dspace/install
|
|
|
|
|
|
# TODO: GeoIP2 License Key used for download Geolitev2 IP database (https://blog.maxmind.com/2019/12/18/significant-changes-to-accessing-and-using-geolite2-databases/)
|
|
|
GEOLITEv2_LICENSE_KEY=XXX-YYY-ZZZ
|
|
|
|
|
|
#--------------
|
|
|
# HOURLY TASKS (Recommended to be run multiple times per day, if possible)
|
|
|
# At a minimum these tasks should be run daily.
|
|
|
#--------------
|
|
|
|
|
|
# Regenerate DSpace Sitemaps every 8 hours (12AM, 8AM, 4PM).
|
|
|
# SiteMaps ensure that your content is more findable in Google, Google Scholar, and other major search engines.
|
|
|
0 0,8,16 * * * dspace $DSPACE/bin/dspace generate-sitemaps > /dev/null
|
|
|
|
|
|
# Update the OAI-PMH index with the newest content 4 times per hour
|
|
|
1 9,12,15,19 * * * dspace $DSPACE/bin/dspace oai import > /dev/null
|
|
|
|
|
|
#----------------
|
|
|
# DAILY TASKS
|
|
|
# (Recommended to be run once per day. Feel free to tweak the scheduled times below.)
|
|
|
#----------------
|
|
|
|
|
|
# Update the OAI-PMH index with the newest content (and re-optimize that index) at midnight every day
|
|
|
# (This ensures new content is available via OAI-PMH and ensures the OAI-PMH index is optimized for better performance)
|
|
|
0 6 * * * dspace $DSPACE/bin/dspace oai import -o > /dev/null
|
|
|
|
|
|
# Clean and Update the Discovery indexes at midnight every day
|
|
|
# (This ensures that any deleted documents are cleaned from the Discovery search/browse index)
|
|
|
0 0 * * * dspace $DSPACE/bin/dspace index-discovery > /dev/null
|
|
|
|
|
|
# Re-Optimize the Discovery indexes at 12:30 every day
|
|
|
# (This ensures that the Discovery Solr Index is re-optimized for better performance)
|
|
|
30 0 * * * dspace $DSPACE/bin/dspace index-discovery -o > /dev/null
|
|
|
|
|
|
# Mark Web Spiders from DSpace Statistics Solr Index at 01:00 every day
|
|
|
# (This removes any known web spiders from your usage statistics)
|
|
|
0 1 * * * dspace $DSPACE/bin/dspace stats-util -m
|
|
|
|
|
|
# Cleanup & Re-Optimize DSpace Statistics Solr Index at 01:30 every day
|
|
|
# (This ensures that the Statistics Solr Index is re-optimized for better performance)
|
|
|
30 1 * * * dspace $DSPACE/bin/dspace stats-util -f && $DSPACE/bin/dspace stats-util -o > /dev/null
|
|
|
|
|
|
# Send out subscription e-mails at 02:00 every day
|
|
|
# (This sends an email to any users who have "subscribed" to a Collection, notifying them of newly added content.)
|
|
|
0 2 * * * dspace $DSPACE/bin/dspace sub-daily
|
|
|
|
|
|
# Run the media filter at 01:01 and 17:01 every day.
|
|
|
# (This task ensures that thumbnails are generated for newly add images,
|
|
|
# and also ensures full text search is available for newly added PDF/Word/PPT/HTML documents)
|
|
|
1 1,17 * * * dspace $DSPACE/bin/dspace filter-media -m50 -q > /dev/null
|
|
|
|
|
|
# Run any Curation Tasks queued from the Admin UI at 04:00 every day
|
|
|
# (Ensures that any curation task that an administrator "queued" from the Admin UI is executed
|
|
|
# asynchronously behind the scenes)
|
|
|
#0 4 * * * dspace $DSPACE/bin/dspace curate -q admin_ui
|
|
|
|
|
|
#----------------
|
|
|
# WEEKLY TASKS
|
|
|
# (Recommended to be run once per week, but can be run more or less frequently, based on your local needs/policies)
|
|
|
#----------------
|
|
|
# Run the checksum checker at 04:00 every Saturday
|
|
|
# By default it runs through every file (-l) and also prunes old results (-p)
|
|
|
# (This re-verifies the checksums of all files stored in DSpace. If any files have been changed/corrupted, checksums will differ.)
|
|
|
0 4 * * 6 dspace $DSPACE/bin/dspace checker -l -p
|
|
|
# NOTE: LARGER SITES MAY WISH TO USE DIFFERENT OPTIONS. The above "-l" option tells DSpace to check *everything*.
|
|
|
# If your site is very large, you may need to only check a portion of your content per week. The below commented-out task
|
|
|
# would instead check all the content it can within *one hour*. The next week it would start again where it left off.
|
|
|
#0 4 * * 0 $DSPACE/bin/dspace checker -d 1h -p
|
|
|
|
|
|
# Mail the results of the checksum checker (see above) to the configured "mail.admin" at 05:00 every Sunday.
|
|
|
# (This ensures the system administrator is notified whether any checksums were found to be different.)
|
|
|
0 5 * * 0 dspace $DSPACE/bin/dspace checker-emailer
|
|
|
|
|
|
#----------------
|
|
|
# MONTHLY TASKS
|
|
|
# (Recommended to be run once per month, but can be run more or less frequently, based on your local needs/policies)
|
|
|
#----------------
|
|
|
# Permanently delete any bitstreams flagged as "deleted" in DSpace, on the first of every month at 01:00
|
|
|
# (This ensures that any files which were deleted from DSpace are actually removed from your local filesystem.
|
|
|
# By default they are just marked as deleted, but are not removed from the filesystem.)
|
|
|
0 1 1 * * dspace $DSPACE/bin/dspace cleanup > /dev/null
|
|
|
|
|
|
# Actualiza de la BD de Geolite el día 2 de cada mes
|
|
|
#0 5 2 * * dspace wget -qO- "https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-City&license_key=$GEOLITEv2_LICENSE_KEY&suffix=tar.gz" | tar --wildcards --directory $DSPACE/config/ -xz "GeoLite2-City_*/GeoLite2-City.mmdb" --strip-components 1
|
|
|
|
|
|
#----------------
|
|
|
# YEARLY TASKS (Recommended to be run once per year)
|
|
|
#----------------
|
|
|
# At 2:00AM every January 1, "shard" the DSpace Statistics Solr index.
|
|
|
# This ensures each year has its own Solr index, which improves performance.
|
|
|
# NOTE: This is scheduled here for 2:00AM so that it happens *after* the daily cleaning & re-optimization of this index.
|
|
|
0 2 1 1 * dspace $DSPACE/bin/dspace stats-util -s
|
|
|
``` |
|
|
\ No newline at end of file |