- Updated to Mesos 1.6.1.
- New update strategy added: Variable Batch Update. With this strategy, a job may be updated in
  batches of different sizes. For example, an update which modifies a total of 10 instances may
  be done in batch sizes of 2, 3, and 5. The number of updated instances must equal the current
  group size in order to move to the next group size. If the number of updated instances is
  greater than the sum of all group sizes, the last group size will be used in perpetuity until all
  instances are updated.
  A new field called `update_strategy` has been added to `UpdateConfig`. The update strategy may take a
  `QueueUpdateStrategy`, a `BatchUpdateStrategy`, or a `VariableBatchUpdateStrategy` object.
  `QueueUpdateStrategy` and `BatchUpdateStrategy` take a single integer argument while
  `VariableBatchUpdateStrategy` takes a list of positive integers as an argument.
- Users may now set a value for the URI fetcher to rename a downloaded artifact to after it has been downloaded.
- Auto pause feature added to VariableBatch strategy and Batch strategy. With this feature enabled,
  when an update is `ROLLING_FORWARD`, the update will automatically pause itself right before a new
  batch is started. (This feature is being released as tested but in beta state. We are looking to
  collect feedback before we consider it fully stable.)
- `loader.load()` now uses memoization on the config file path so that we only load and process each config file once.
- Instances run with custom executors will no longer show links to thermos observer.
- Add observer command line option `--disable_task_resource_collection` to disable the collection of CPU, memory, and disk metrics for observed tasks. This is useful in setups where metrics cannot be gathered reliably (e.g. when using PID namespaces) or when it is expensive due to hundreds of active tasks per host.
- Added flag `-sla_aware_kill_non_prod` which allows operators to enable SLA aware killing for non-production jobs. Jobs are considered non-production when they are preemptable and/or revocable.
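The batch progression described for the Variable Batch Update strategy can be sketched in plain Python. This is only an illustration of the rule stated in the note (advance through the group sizes, then reuse the last one); the function name `variable_batches` is hypothetical and is not part of the Aurora client or scheduler.

```python
from typing import Iterator, List


def variable_batches(total_instances: int, group_sizes: List[int]) -> Iterator[int]:
    """Yield the size of each batch until every instance has been updated.

    Hypothetical helper: it mirrors the behaviour described in the release
    note, not the scheduler's actual implementation.
    """
    if total_instances < 0 or not group_sizes or any(s <= 0 for s in group_sizes):
        raise ValueError("need a non-negative total and positive group sizes")
    remaining = total_instances
    index = 0
    while remaining > 0:
        # Once the list is exhausted, keep reusing the last group size.
        size = group_sizes[min(index, len(group_sizes) - 1)]
        batch = min(size, remaining)
        yield batch
        remaining -= batch
        index += 1


print(list(variable_batches(10, [2, 3, 5])))  # [2, 3, 5]
print(list(variable_batches(14, [2, 3, 5])))  # [2, 3, 5, 4]
```

With 14 instances the configured sizes only cover 10, so the last group size (5) is reused and the final batch holds the remaining 4 instances.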
- Deprecated use of Thrift fields `JobUpdateSettings.waitForBatchCompletion` and `JobUpdateSettings.updateGroupSize`. Please set the proper `JobUpdateSettings.updateStrategy` instead. Note that these same constructs, as represented in the Aurora DSL, are still valid as they will be converted to the new field automatically by the client for backwards compatibility.
- Backfill code for finding a matching tier for a Job has been removed. Tier will now be set
  when a Job is received by the scheduler. If no tier is provided, the default tier defined in
  `-tier_config` is used.
- Updated to Mesos 1.5.0.
- Introduce ability for tasks to specify custom SLA requirements via the new `SlaPolicy` structs. There are 3 different SLA policies currently supported: `CountSlaPolicy`, `PercentageSlaPolicy` and `CoordinatorSlaPolicy`. SLA policies based on count and percentage express the required number of `RUNNING` instances as either a count or percentage, and also allow specifying the duration window for which these requirements have to be satisfied. For applications that need more control over how SLA is determined, a custom SLA calculator (a.k.a. Coordinator) can be configured. Any action that can affect SLA will first check with the Coordinator before proceeding.
  IMPORTANT: The storage changes required for this feature will make scheduler snapshot backwards incompatible. Scheduler will be unable to read snapshot if rolled back to previous version. If rollback is absolutely necessary, perform the following steps:
  - Stop all host maintenance requests via `aurora_admin host_activate`.
  - Ensure a new snapshot is created by running `aurora_admin scheduler_snapshot <cluster>`
  - Rollback to previous version
  Note: The `Coordinator` interface required for the `CoordinatorSlaPolicy` is experimental at this juncture and is bound to change in the future.
- Introduced ability for updates to be 'SLA-aware', or only update instances if it is within SLA, using the new `sla_aware` field in `UpdateConfig`. See the bullet point above for an explanation of custom SLA requirements.
  NOTE: SLA-aware updates will use the desired config's SLA, not the existing config.
  Three additional scheduler options have been added to support this feature:
  - `max_update_action_batch_size` (default: 300): the number of update actions to process in a batch.
  - `sla_aware_kill_retry_min_delay` (default: 1mins): the minimum amount of time to wait before retrying an SLA-aware kill (using a truncated binary backoff).
  - `sla_aware_kill_retry_max_delay` (default: 5mins): the maximum amount of time to wait before retrying an SLA-aware kill (using a truncated binary backoff).
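For readers unfamiliar with the term, a truncated binary backoff doubles the wait after each failed attempt until it reaches a ceiling. The sketch below uses the two defaults listed above (1 minute and 5 minutes); the function `retry_delay_secs` is a hypothetical illustration, and whether the scheduler adds jitter or counts attempts the same way is not stated in these notes.

```python
def retry_delay_secs(attempt: int,
                     min_delay_secs: float = 60.0,
                     max_delay_secs: float = 300.0) -> float:
    """Delay before retry number `attempt` (0-based), doubling up to a cap.

    Hypothetical sketch of a truncated binary backoff; the defaults mirror
    sla_aware_kill_retry_min_delay and sla_aware_kill_retry_max_delay.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    # Cap the exponent so very large attempt counts cannot overflow a float.
    exponent = min(attempt, 32)
    return min(max_delay_secs, min_delay_secs * (2 ** exponent))


print([retry_delay_secs(n) for n in range(5)])  # [60.0, 120.0, 240.0, 300.0, 300.0]
```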
- Deprecated the `aurora_admin host_drain` command used for maintenance. With this release the SLA computations are moved to the scheduler and it is no longer required for the client to compute SLAs and watch the drains. The scheduler persists any host maintenance request and performs SLA-aware drain of the tasks, before marking the host as `DRAINED`. So maintenance requests survive across scheduler fail-overs. Use the newly introduced `aurora_admin sla_host_drain` to skip the SLA computations on the admin client.
- Removed resource fields (`numCpus`, `ramMb`, `diskMb`) from ResourceAggregate.
- Updated to Mesos 1.4.0.
- Added experimental support for the Mesos partition-aware APIs. The key idea is a new ScheduleStatus
  PARTITIONED that represents a task in an unknown state. Users of Aurora can have per-job policies
  of whether or not to reschedule and how long to wait for the partition to heal. Backwards
  compatibility with existing behavior (i.e. transition to LOST immediately on a partition) is
  maintained. The support is experimental due to bugs found in Mesos that would cause issues in
  a production cluster. For that reason, the functionality is behind a new flag `-partition_aware`
  that is disabled by default. When Mesos support is improved and the new behavior is vetted in
  production clusters, we'll enable this by default.
- Added the ability to inject custom offer holding and scheduling logic via the `-offer_set_module` scheduler flag. To take advantage of this feature, you will need to implement the `OfferSet` interface.
- Added `executor_config` field to the Job object of the DSL which will populate `JobConfiguration.TaskConfig.ExecutorConfig`. This allows for using custom executors defined through the `--custom_executor_config` scheduler flag. See our custom-executors documentation for more information.
- Added support in Thermos Observer for delegating disk usage monitoring to Mesos agent. The feature
  can be enabled via the `--enable_mesos_disk_collector` flag, in which case Observer will use the agent's containers HTTP API to query the amount of used bytes for each container. Note that disk isolation should be enabled in Mesos agent. This feature is not compatible with authentication enabled agents.
- Removed the ability to recover from SQL-based backups and snapshots. A 0.20.0 scheduler will not be able to recover backups or replicated log data created prior to 0.19.0.
- Removed task level resource fields (`numCpus`, `ramMb`, `diskMb`, `requestedPorts`).
- Removed the `-offer_order_modules` scheduler flag related to custom injectable offer orderings, since this will now be subsumed under custom `OfferSet` implementations (see the comment above).
- Added the ability to configure the executor's stop timeout, which is the maximum amount of time the executor will wait during a graceful shutdown sequence before continuing the 'Forceful Termination' process (see here for details).
- Added the ability to configure the wait period after calling the graceful shutdown endpoint and
  the shutdown endpoint using the `graceful_shutdown_wait_secs` and `shutdown_wait_secs` fields in
  `HttpLifecycleConfig` respectively. Previously, the executor would only wait 5 seconds between steps (adding up to a total of 10 seconds as there are 2 steps). The overall waiting period is bounded by the executor's stop timeout, which can be configured using the executor's `stop_timeout_in_secs` flag.
- Added the `thrift_method_interceptor_modules` scheduler flag that lets cluster operators inject custom Thrift method interceptors.
- Increase default ZooKeeper session timeout from 4 to 15 seconds.
- Added option `-zk_connection_timeout` to control the connection timeout of ZooKeeper connections.
- Added scheduler command line argument `-hold_offers_forever`, suitable for use in clusters where Aurora is the only framework. This setting disables other options such as `-min_offer_hold_time`, and allows the scheduler to more efficiently cache scheduling attempts.
- The scheduler no longer uses an internal H2 database for storage.
- There is a new Scheduler UI which, in addition to the facelift, provides the ability to inject your own custom UI components.
- Introduce a metadata field in the Job object of the DSL, which will populate TaskConfig.metadata.
- Removed the deprecated command line argument `-zk_use_curator`, removing the choice to use the legacy ZooKeeper client.
- Removed the `rewriteConfigs` thrift API call in the scheduler. This was a last-ditch mechanism to modify scheduler state on the fly. It was considered extremely risky to use since its inception, and is safer to abandon due to its lack of use and likelihood for code rot.
- Removed the Job environment validation from the command line client. Validation was moved to
  the scheduler side through the `allowed_job_environments` option. By default, it allows any of `devel`, `test`, `production`, and any value matching the regular expression `staging[0-9]*`.
- Removed scheduler command line arguments related to the internal H2 database, which is no longer
  used:
  - `-use_beta_db_task_store`
  - `-enable_db_metrics`
  - `-slow_query_log_threshold`
  - `-db_row_gc_interval`
  - `-db_lock_timeout`
  - `-db_max_active_connection_count`
  - `-db_max_idle_connection_count`
  - `-snapshot_hydrate_stores`
  - `-enable_h2_console`
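As an illustration of the default environment rule described above, the following sketch checks a name against the listed values. The pattern is assembled from the wording of this note and the helper `is_allowed_environment` is hypothetical; the scheduler's real default expression may be written differently.

```python
import re

# Assumed pattern, built from the values named in the release note.
ALLOWED_ENVIRONMENTS = re.compile(r"devel|test|production|staging[0-9]*")


def is_allowed_environment(name: str) -> bool:
    """Return True when the whole name matches one of the allowed forms."""
    return ALLOWED_ENVIRONMENTS.fullmatch(name) is not None


for env in ("devel", "staging", "staging42", "qa"):
    print(env, is_allowed_environment(env))
```

`fullmatch` is used so that a name such as `staging-eu` is rejected rather than accepted on a partial match.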
- Update to Shiro 1.2.5
- Update to Mesos 1.2.0. Please upgrade Aurora to 0.18 before upgrading Mesos to 1.2.0 if you rely on Mesos filesystem images.
- Add message parameter to `killTasks` RPC.
- Add `prune_tasks` endpoint to `aurora_admin`. See `aurora_admin prune_tasks -h` for usage information.
- Add support for per-task volume mounts for Mesos containers to the Aurora config DSL.
- Added the `-mesos_driver` flag to the scheduler with three possible options: `SCHEDULER_DRIVER`, `V0_MESOS`, `V1_MESOS`. The first uses the original driver and the latter two use two new drivers from `libmesos`. `V0_MESOS` uses the `SCHEDULER_DRIVER` under the hood and `V1_MESOS` uses a new HTTP API aware driver. Users that want to use the HTTP API should use `V1_MESOS`. Performance sensitive users should stick with the `SCHEDULER_DRIVER` or `V0_MESOS` drivers.
- Add observer command line options to control the resource collection interval for observed tasks. See here for details.
- Added support for reserving agents during job updates, which can substantially reduce update times in clusters with high contention for resources. Disabled by default, but can be enabled with enable_update_affinity option, and the reservation timeout can be controlled via update_affinity_reservation_hold_time.
- Add `task scp` command to the CLI client for easy transferring of files to/from/between task instances. See here for details. Currently only fully supported for Mesos containers (you can copy files from the Docker container sandbox but you cannot send files to it).
- Added ability to inject your own scheduling logic, via a lazy Guice module binding. This is an
  alpha-level feature and not subject to backwards compatibility considerations. You can specify
  your custom modules using the `task_assigner_modules` and `preemption_slot_finder_modules` options.
- Added support for resource bin-packing via the '-offer_order' option. You can choose from
  `CPU`, `MEMORY`, `DISK`, `RANDOM` or `REVOCABLE_CPU`. You can also compose secondary sorts by combining orders together: e.g. to bin-pack by CPU and MEMORY you could supply 'CPU,MEMORY'. The current default is `RANDOM`, which has the strong advantage that users can (usually) relocate their tasks due to noisy neighbors or machine issues with a task restart. With deterministic bin-packing, tasks may always end up on the same agent, so be careful enabling this without proper monitoring and remediation of host failures.
- Modified job update behavior to create new instances, then update existing instances, and then kill unwanted instances. Previously, a job update would modify each instance in the order of their instance ID.
- Added ability to whitelist TaskStateChanges in the webhook configuration file. You can specify a list of desired TaskStateChanges (represented by their task statuses) to be sent to a configured endpoint.
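The composed orderings accepted by '-offer_order' (for example 'CPU,MEMORY') amount to a sort with a primary and a secondary key. The sketch below illustrates that idea only: the `Offer` type and `order_offers` function are hypothetical, ascending order is an assumption, and the `RANDOM` and `REVOCABLE_CPU` orders are left out.

```python
from typing import Callable, Dict, List, NamedTuple


class Offer(NamedTuple):
    """Hypothetical, simplified view of a resource offer."""
    agent: str
    cpu: float
    memory_mb: int
    disk_mb: int


# One key function per order name covered by this sketch.
ORDER_KEYS: Dict[str, Callable[[Offer], float]] = {
    "CPU": lambda offer: offer.cpu,
    "MEMORY": lambda offer: offer.memory_mb,
    "DISK": lambda offer: offer.disk_mb,
}


def order_offers(offers: List[Offer], offer_order: str) -> List[Offer]:
    """Sort offers by each named order in turn, smallest first (assumed)."""
    names = [name.strip() for name in offer_order.split(",")]
    return sorted(offers, key=lambda offer: tuple(ORDER_KEYS[n](offer) for n in names))


offers = [
    Offer("agent-a", cpu=4.0, memory_mb=1024, disk_mb=100),
    Offer("agent-b", cpu=2.0, memory_mb=4096, disk_mb=100),
    Offer("agent-c", cpu=2.0, memory_mb=2048, disk_mb=100),
]
print([offer.agent for offer in order_offers(offers, "CPU,MEMORY")])
# ['agent-c', 'agent-b', 'agent-a']
```

Because the result is deterministic for a given set of offers, the same agent tends to be chosen again after a restart, which is the behaviour the note warns about.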
- Upgraded Mesos to 1.1.0.
- Added a new flag `--snapshot_hydrate_stores` that controls which H2-backed stores to write fully hydrated into the Scheduler snapshot. Can lead to significantly lower snapshot times for large clusters if you set this flag to an empty list. Old behavior is preserved by default, but see org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl for which stores we currently have duplicate writes for.
- A task's tier is now mapped to a label on the Mesos `TaskInfo` proto.
- The Aurora client is now using the Thrift binary protocol to communicate with the scheduler.
- Introduce a new `--ip` option to bind the Thermos observer to a specific interface rather than all interfaces.
- Fix error that prevents the scheduler from being launched with `-enable_revocable_ram`.
- The Aurora Scheduler API supports volume mounts per task for the Mesos
  Containerizer if the scheduler is running with the `-allow_container_volumes` flag.
- The executor will send SIGTERM to processes that self daemonize via double forking.
- Resolve docker tags to concrete identifiers for DockerContainerizer, so that job configuration
  is immutable across restarts. The feature introduces a new `{{docker.image[name][tag]}}` binder
  that can be used in the Aurora job configuration to resolve a docker image specified by its `name:tag` to a concrete identifier specified by its `registry/name@digest`. It requires version 2 of the Docker Registry.
- Use `RUNNING` state to indicate that the task is healthy and behaving as expected. Job updates can now rely purely on health checks rather than the `watch_secs` timeout when deciding an individual instance update state, by setting `watch_secs` to 0. A service will remain in `STARTING` state until `min_consecutive_successes` consecutive health checks have passed.
- The default logging output has been changed to remove line numbers and inner class information in exchange for faster logging.
- Support the deployment of the Aurora scheduler behind HTTPS-enabled reverse proxies: By launching
  scheduler via `-serverset_endpoint_name=https` you can ensure the Aurora client will correctly discover HTTPS support via the ZooKeeper-based discovery mechanism.
- Scheduling performance has been improved by scheduling multiple tasks per scheduling round.
- Preemption slot search logic is modified to improve its performance.
- Multiple reservations are made per task group per round.
- Multiple reservations are evaluated per round.
- New scheduler metrics are added to facilitate monitoring and performance studies (AURORA-1832).
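The `STARTING` to `RUNNING` rule described above for health-checked services can be sketched as a streak counter. This is a simplified illustration: `status_after` is a hypothetical function, and other executor settings (check interval, failure limits) are deliberately ignored.

```python
from typing import Iterable


def status_after(results: Iterable[bool], min_consecutive_successes: int = 1) -> str:
    """Return the task state after observing a sequence of health checks.

    Hypothetical sketch: the task stays in STARTING until enough health
    checks have passed in a row, then it is reported as RUNNING.
    """
    streak = 0
    for passed in results:
        # A failed check resets the run of consecutive successes.
        streak = streak + 1 if passed else 0
        if streak >= min_consecutive_successes:
            return "RUNNING"
    return "STARTING"


print(status_after([True, False, True], min_consecutive_successes=2))        # STARTING
print(status_after([True, False, True, True], min_consecutive_successes=2))  # RUNNING
```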
- Upgraded Mesos to 1.0.0. Note: as part of this upgrade we have switched from depending on the mesos.native egg for Thermos in favor of the stripped down mesos.executor egg. This means users launching Docker tasks with the Mesos DockerContainerizer are no longer required to use images that include all of Mesos's dependencies.
- Scheduler command line behavior has been modified to warn users of the deprecation of the
  `production` attribute in the `Job` thrift struct. The scheduler is queried for tier configurations and the user's choice of `tier` and `production` attributes is revised, if necessary. If `tier` is already set, the `production` attribute might be adjusted to match the `tier` selection. Otherwise, `tier` is selected based on the value of the `production` attribute. If a matching tier is not found, the `default` tier from the tier configuration file (`tiers.json`) is used.
- The `/offers` endpoint has been modified to display attributes of resource offers as received from Mesos. This has affected rendering of some of the existing attributes. Furthermore, it now dumps additional offer attributes including reservations and persistent volumes.
- The scheduler API now accepts both thrift JSON and binary thrift. If a request is sent without a
  `Content-Type` header, or a `Content-Type` header of `application/x-thrift` or `application/json` or `application/vnd.apache.thrift.json` the request is treated as thrift JSON. If a request is sent with a `Content-Type` header of `application/vnd.apache.thrift.binary` the request is treated as binary thrift. If the `Accept` header of the request is `application/vnd.apache.thrift.binary` then the response will be binary thrift. Any other value for `Accept` will result in thrift JSON.
- Scheduler is now able to launch jobs using more than one executor at a time. To use this feature
  the `-custom_executor_config` flag must point to a JSON file which contains at least one valid executor configuration as detailed in the configuration documentation.
- Add rollback API to the scheduler and new client command to support rolling back active update jobs to their initial state.
- The scheduler flag `-zk_use_curator` now defaults to `true` and care should be taken when upgrading from a configuration that does not pass the flag. The scheduler upgrade should be performed by bringing all schedulers down, and then bringing upgraded schedulers up. A rolling upgrade would result in no leading scheduler for the duration of the roll which could be confusing to monitor and debug.
- A new command `aurora_admin reconcile_tasks` is now available on the Aurora admin client that can trigger implicit and explicit task reconciliations.
- Add a new MTTS (Median Time To Starting) metric in addition to MTTA and MTTR.
- In addition to CPU resources, RAM resources can now be treated as revocable via the scheduler
  commandline flag `-enable_revocable_ram`.
- Introduce UpdateMetadata fields in JobUpdateRequest to allow clients to store metadata on update.
- Changed cronSchedule field inside of JobConfiguration schema to be optional for compatibility with Go.
- Update default value of command line option `-framework_name` to 'Aurora'.
- Tasks launched with filesystem images and the Mesos unified containerizer are now fully isolated from the host's filesystem. As such they are no longer required to include any of the executor's dependencies (e.g. Python 2.7) within the task's filesystem.
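The header rules given above for the scheduler API (thrift JSON versus binary thrift) can be summarised as a small decision function. This sketch only restates those rules; `negotiate_protocols` is a hypothetical name, and the handling of an unlisted `Content-Type` is left undecided because the note does not describe it.

```python
from typing import Optional, Tuple

JSON_CONTENT_TYPES = {
    "application/x-thrift",
    "application/json",
    "application/vnd.apache.thrift.json",
}
BINARY_CONTENT_TYPE = "application/vnd.apache.thrift.binary"


def negotiate_protocols(content_type: Optional[str],
                        accept: Optional[str]) -> Tuple[Optional[str], str]:
    """Return (request protocol, response protocol) for the given headers."""
    if content_type is None or content_type in JSON_CONTENT_TYPES:
        request = "json"
    elif content_type == BINARY_CONTENT_TYPE:
        request = "binary"
    else:
        # Not covered by the release note, so this sketch reports "unknown".
        request = None
    # Only an explicit binary Accept header yields a binary response.
    response = "binary" if accept == BINARY_CONTENT_TYPE else "json"
    return request, response


print(negotiate_protocols(None, None))                           # ('json', 'json')
print(negotiate_protocols(BINARY_CONTENT_TYPE, "*/*"))           # ('binary', 'json')
print(negotiate_protocols("application/json", BINARY_CONTENT_TYPE))  # ('json', 'binary')
```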
- The job configuration flag `production` is now deprecated. To achieve the same scheduling behavior that `production=true` used to provide, users should elect a `tier` for the job with attributes `preemptible=false` and `revocable=false`. For example, the `preferred` tier in the default tier configuration file (`tiers.json`) matches the above criteria.
- The `ExecutorInfo.source` field is deprecated and has been replaced with a label named `source`. It will be removed from Mesos in a future release.
- The scheduler flag `-zk_use_curator` has been deprecated. If you have never set the flag and are upgrading you should take care as described in the note above.
- The `key` argument of `getJobUpdateDetails` has been deprecated. Use the `query` argument instead.
- The --release-threshold option on `aurora job restart` has been removed.
- New scheduler commandline argument -enable_mesos_fetcher to allow job submissions to contain URIs which will be passed to the Mesos Fetcher and subsequently downloaded into the sandbox. Please note that enabling job submissions to download resources from arbitrary URIs may have security implications.
- Upgraded Mesos to 0.28.2.
- Upgraded Mesos to 0.27.2
- Added a new optional Apache Curator backend for performing scheduler leader election. You can enable this with the new `-zk_use_curator` scheduler argument.
- Added the --nosetuid-health-checks flag to control whether the executor runs health checks as the job's role's user.
- New scheduler command line argument `-offer_filter_duration` to control the time after which we expect Mesos to re-offer unused resources. A short duration improves scheduling performance in smaller clusters, but might lead to resource starvation for other frameworks if you run multiple ones in your cluster. Uses the Mesos default of 5s.
- New scheduler command line option `-framework_name` to change the name used for registering the Aurora framework with Mesos. The current default value is 'TwitterScheduler'.
- Added experimental support for launching tasks using filesystem images and the Mesos unified containerizer. See that linked documentation for details on configuring Mesos to use the unified containerizer. Note that earlier versions of Mesos do not fully support the unified containerizer. Mesos 0.28.x or later is recommended for anyone adopting task images via the Mesos containerizer.
- Upgraded to pystachio 0.8.1 to pick up support for the new Choice type.
- The `container` property of a `Job` is now a Choice of either a `Container` holder, or a direct reference to either a `Docker` or `Mesos` container.
- New scheduler command line argument `-ip` to control what IP address to bind the scheduler's HTTP server to.
- Added experimental support for Mesos GPU resource. This feature will be available in Mesos 1.0 and is disabled by default. Use the `-allow_gpu_resource` flag to enable it.
  IMPORTANT: once this feature is enabled, creating jobs with GPU resource will make scheduler snapshot backwards incompatible. Scheduler will be unable to read snapshot if rolled back to previous version. If rollback is absolutely necessary, perform the following steps:
  - Set `-allow_gpu_resource` to false
  - Delete all jobs with GPU resource (including cron job schedules if applicable)
  - Wait until GPU task history is pruned. You may speed it up by changing the history retention
    flags, e.g.: `-history_prune_threshold=1mins` and `-history_max_per_job_threshold=0`
  - In case there were GPU job updates created, prune job update history for affected jobs from
    the `/h2console` endpoint or reduce job update pruning thresholds, e.g.: `-job_update_history_pruning_threshold=1mins` and `-job_update_history_per_job_threshold=0`
  - Ensure a new snapshot is created by running `aurora_admin scheduler_snapshot <cluster>`
  - Rollback to previous version
- Experimental support for a webhook feature which POSTs all task state changes to a user defined endpoint.
- Added support for specifying the default tier name in tier configuration file (`tiers.json`). The `default` property is required and is initialized with the `preemptible` tier (`preemptible` tier tasks can be preempted but their resources cannot be revoked).
- Deprecated `--restart-threshold` option in the `aurora job restart` command to match the job updater behavior. This option has no effect now and will be removed in a future release.
- Deprecated `-framework_name` default argument 'TwitterScheduler'. In a future release this will change to 'aurora'. Please be aware that depending on your usage of Mesos, this will be a backward incompatible change. For details, see MESOS-703.
- The `-thermos_observer_root` command line arg has been removed from the scheduler. This was a relic from the time when executor checkpoints were written globally, rather than into a task's sandbox.
- Setting the `container` property of a `Job` to a `Container` holder is deprecated in favor of setting it directly to the appropriate (i.e. `Docker` or `Mesos`) container type.
- Deprecated `numCpus`, `ramMb` and `diskMb` fields in `TaskConfig` and `ResourceAggregate` thrift structs. Use `set<Resource> resources` to specify task resources or quota values.
- The endpoint `/slaves` is deprecated. Please use `/agents` instead.
- Deprecated `production` field in `TaskConfig` thrift struct. Use `tier` field to specify task scheduling and resource handling behavior.
- The scheduler `resources_*_ram_gb` and `resources_*_disk_gb` metrics have been renamed to `resources_*_ram_mb` and `resources_*_disk_mb` respectively. Note the unit change: GB -> MB.
- Upgraded Mesos to 0.26.0
- Added a new health endpoint (/leaderhealth) which can be used for load balancer health checks to always forward requests to the leading scheduler.
- Added a new `aurora job add` client command to scale out an existing job.
- Upgraded the scheduler ZooKeeper client from 3.4.6 to 3.4.8.
- Added support for dedicated constraints not exclusive to a particular role. See here for more details.
- Added a new argument `--announcer-hostname` to thermos executor to override hostname in service registry endpoint. See here for details.
- Descheduling a cron job that was not actually scheduled will no longer return an error.
- Added a new argument `-thermos_home_in_sandbox` to the scheduler for optionally changing HOME to the sandbox during thermos executor/runner execution. This is useful in cases where the root filesystem inside of the container is read-only, as it moves PEX extraction into the sandbox. See here for more detail.
- Support for ZooKeeper authentication in the executor announcer. See here for details.
- Scheduler H2 in-memory database is now using MVStore. In addition, scheduler thrift snapshots now support full DB dumps for faster restarts.
- Added scheduler argument `-require_docker_use_executor` that indicates whether the scheduler should accept tasks that use the Docker containerizer without an executor (experimental).
- Jobs referencing invalid tier name will be rejected by the scheduler.
- Added a new scheduler argument `--populate_discovery_info`. If set to true, Aurora will start to populate the DiscoveryInfo field on the Mesos TaskInfo. This could be used for alternative service discovery solutions like Mesos-DNS.
- Added support for automatic schema upgrades and downgrades when restoring a snapshot that contains a DB dump.
- Removed deprecated (now redundant) fields:
  - `Identity.role`
  - `TaskConfig.environment`
  - `TaskConfig.jobName`
  - `TaskQuery.owner`
- Removed deprecated `AddInstancesConfig` parameter to `addInstances` RPC.
- Removed deprecated executor argument `-announcer-enable`, which was a no-op in 0.12.0.
- Removed deprecated API constructs related to Locks:
  - removed RPCs that managed locks
    - `acquireLock`
    - `releaseLock`
    - `getLocks`
  - removed `Lock` parameters to RPCs
    - `createJob`
    - `scheduleCronJob`
    - `descheduleCronJob`
    - `restartShards`
    - `killTasks`
    - `addInstances`
    - `replaceCronTemplate`
- Task ID strings are no longer prefixed by a timestamp.
- Changes to the way the scheduler reads command line arguments:
  - Removed support for reading command line argument values from files.
  - Removed support for specifying command line argument names with fully-qualified class names.
- Upgraded Mesos to 0.25.0.
- Upgraded the scheduler ZooKeeper client from 3.3.4 to 3.4.6.
- Added support for configuring Mesos role by passing `-mesos_role` to Aurora scheduler at start time. This enables resource reservation for Aurora when running in a shared Mesos cluster.
- Aurora task metadata is now mapped to Mesos task labels. Labels are prefixed with `org.apache.aurora.metadata.` to prevent clashes with other, external label sources.
- Added new scheduler flag `-default_docker_parameters` to allow a cluster operator to specify a universal set of parameters that should be used for every container that does not have parameters explicitly configured at the job level.
- Added support for jobs to specify arbitrary ZooKeeper paths for service registration. See here for details.
- Log destination is configurable for the thermos runner. See the configuration reference for details on how to configure destination per-process. Command line options may also be passed through the scheduler in order to configure the global default behavior.
- Env variables can be passed through to task processes by passing `--preserve_env` to thermos.
- Changed scheduler logging to use logback. Operators wishing to customize logging may do so with standard logback configuration.
- When using `--read-json`, aurora can now load multiple jobs from one json file, similar to the usual pystachio structure: `{"jobs": [job1, job2, ...]}`. The older single-job json format is also still supported.
- `aurora config list` command now supports `--read-json`.
- Added scheduler command line argument `-shiro_after_auth_filter`. Optionally specify a class implementing javax.servlet.Filter that will be included in the Filter chain following the Shiro auth filters.
- The `addInstances` thrift RPC now increases the job instance count (scale out) based on the task template pointed to by the instance key.
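The metadata-to-label mapping mentioned above is a simple key prefixing. A minimal sketch, assuming metadata is available as a plain key/value dictionary (the function `metadata_to_labels` is hypothetical; only the prefix comes from these notes):

```python
from typing import Dict

LABEL_PREFIX = "org.apache.aurora.metadata."


def metadata_to_labels(metadata: Dict[str, str]) -> Dict[str, str]:
    """Prefix every metadata key so it cannot clash with external labels."""
    return {LABEL_PREFIX + key: value for key, value in metadata.items()}


print(metadata_to_labels({"team": "search"}))
# {'org.apache.aurora.metadata.team': 'search'}
```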
- Deprecated `AddInstancesConfig` argument in `addInstances` thrift RPC.
- Deprecated `TaskQuery` argument in `killTasks` thrift RPC to disallow killing tasks across multiple roles. The new safer approach is using `JobKey` with `instances` instead.
- Removed the deprecated field 'ConfigGroup.instanceIds' from the API.
- Removed the following deprecated `HealthCheckConfig` client-side configuration fields: `endpoint`, `expected_response`, `expected_response_code`. These are now set exclusively in like-named fields of `HttpHealthChecker`.
- Removed the deprecated 'JobUpdateSettings.maxWaitToInstanceRunningMs' thrift api field (
  UpdateConfig.restart_threshold in client-side configuration). This aspect of job restarts is now
  controlled exclusively via the client with `aurora job restart --restart-threshold=[seconds]`.
- Deprecated executor flag `--announcer-enable`. Enabling the announcer previously required both flags `--announcer-enable` and `--announcer-ensemble`, but now only `--announcer-ensemble` must be set. `--announcer-enable` is a no-op flag now and will be removed in a future version.
- Removed scheduler command line arguments:
  - `-enable_cors_support`. Enabling CORS is now implicit by setting the argument `-enable_cors_for`.
  - `-deduplicate_snapshots` and `-deflate_snapshots`. These features are good to always enable.
  - `-enable_job_updates` and `-enable_job_creation`
  - `-extra_modules`
  - `-logtostderr`, `-alsologtostderr`, `-vlog`, `-vmodule`, and `use_glog_formatter`. Removed in favor of the new logback configuration.
- Upgraded Mesos to 0.24.1.
- Added a new scheduler flag 'framework_announce_principal' to support use of authorization and rate limiting in Mesos.
- Added support for shell-based health checkers in addition to HTTP health checkers. In concert with
  this change the `HealthCheckConfig` schema has been restructured to more cleanly allow configuring varied health checkers.
- Added support for taking in an executor configuration in JSON via a command line argument
  `--custom_executor_config` which will override all other command line arguments and default values pertaining to the executor.
- Log rotation has been added to the thermos runner. See the configuration reference for details on how to configure rotation per-process. Command line options may also be passed through the scheduler in order to configure the global default behavior.
- The client-side updater has been removed, along with the CLI commands that used it: 'aurora job update' and 'aurora job cancel-update'. Users are encouraged to take advantage of scheduler-driven updates (see 'aurora update -h' for usage), which has been a stable feature for several releases.
- The following fields from `HealthCheckConfig` are now deprecated: `endpoint`, `expected_response`, `expected_response_code` in favor of setting them as part of an `HttpHealthChecker`.
- The field 'JobUpdateSettings.maxWaitToInstanceRunningMs' (UpdateConfig.restart_threshold in client-side configuration) is now deprecated. This setting was brittle in practice, and is ignored by the 0.11.0 scheduler.
- Upgraded Mesos to 0.23.0. NOTE: Aurora executor now requires openssl runtime dependencies that were not previously enforced. You will need libcurl available on every Mesos slave (or Docker container) to successfully launch Aurora executor. See here for more details on Mesos runtime dependencies.
- Resource quota is no longer consumed by production jobs with a dedicated constraint (AURORA-1457).
- The Python build layout has changed:
  - The `apache.thermos` package has been removed.
  - The `apache.gen.aurora` package has been renamed to `apache.aurora.thrift`.
  - The `apache.gen.thermos` package has been renamed to `apache.thermos.thrift`.
  - A new `apache.thermos.runner` package has been introduced, providing the `thermos_runner` binary.
  - A new `apache.aurora.kerberos` package has been introduced, containing the Kerberos-supporting versions of `aurora` and `aurora_admin` (`kaurora` and `kaurora_admin`).
  - Most BUILD targets under `src/main` have been removed, see here for details.
- Removed the `--root` option from the observer.
- Thrift `ConfigGroup.instanceIds` field has been deprecated. Use ConfigGroup.instances instead.
- Deprecated `SessionValidator` and `CapabilityValidator` interfaces have been removed. All `SessionKey`-typed arguments are now nullable and ignored by the scheduler Thrift API.
- Now requires JRE 8 or greater.
- GC executor is fully replaced by the task state reconciliation (AURORA-1047).
- The scheduler command line argument `-enable_legacy_constraints` has been removed, and the scheduler no longer automatically injects `host` and `rack` constraints for production services. (AURORA-1074)
- SLA metrics for non-production jobs have been disabled by default. They can
  be enabled via the scheduler command line. Metric names have changed from
  `...nonprod_ms` to `...ms_nonprod` (AURORA-1350).
- A new command line argument was added to the observer: `--mesos-root`. This must point to the same path as `--work_dir` on the mesos slave.
- Build targets for thermos and observer have changed, they are now:
  - `src/main/python/apache/aurora/tools:thermos`
  - `src/main/python/apache/aurora/tools:thermos_observer`