Checkpoint and Migration.
Where programs can be linked with Condor libraries, users can be assured that their jobs will eventually complete, even in the ever-changing environment that Condor utilizes. When a machine running a job submitted to Condor becomes unavailable, the job can be checkpointed and then resumed after migrating to another machine. Condor's periodic checkpoint feature also checkpoints a job at regular intervals even when no migration takes place, so that the computation time accumulated by the job is not lost in the event of a system failure, such as the machine being shut down or crashing.
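The value of periodic checkpointing can be illustrated with a small sketch. This is not how Condor does it: Condor checkpoints the whole process transparently through its linked library, whereas the sketch below saves state at the application level. The names `run_job`, `save_checkpoint`, `load_checkpoint`, and `CHECKPOINT_EVERY` are hypothetical and chosen only for illustration.

```python
import json
import os
import tempfile

CHECKPOINT_EVERY = 100  # hypothetical interval, in loop iterations


def save_checkpoint(path, state):
    """Write the state atomically so a crash mid-write cannot corrupt it."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_i": 0, "total": 0}


def run_job(path, n, fail_at=None):
    """Sum 0..n-1, checkpointing periodically; fail_at simulates a crash."""
    state = load_checkpoint(path)
    for i in range(state["next_i"], n):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated machine failure")
        state["total"] += i
        state["next_i"] = i + 1
        if state["next_i"] % CHECKPOINT_EVERY == 0:
            save_checkpoint(path, state)
    return state["total"]


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "job.ckpt")
    try:
        run_job(ckpt, 1000, fail_at=250)  # the "machine" dies at iteration 250
    except RuntimeError:
        pass
    print(run_job(ckpt, 1000))  # resumes from the last checkpoint, not from 0
```

In this sketch a failure costs at most `CHECKPOINT_EVERY` iterations of work, which is the same trade-off a periodic checkpoint interval makes: more frequent checkpoints mean less lost computation but more time spent saving state.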
Remote System Calls.
Although jobs run on remote machines, the Condor standard universe execution mode preserves the local execution environment via remote system calls. Users do not have to make data files available to remote workstations, or even obtain a login account on them, before Condor executes their programs there. Under Condor, the program behaves as if it were running as the submitting user on the workstation where it was originally submitted, no matter which machine it actually executes on.
No Changes Necessary to User's Source Code.
No special programming is required to use Condor. Condor is able to run non-interactive programs. Checkpointing and migration of programs by Condor are transparent and automatic, as is the use of remote system calls. If these facilities are desired, the user only needs to re-link the program; the code is neither recompiled nor changed.
Pools of Machines can be Hooked Together.
Flocking is a feature of Condor that allows jobs submitted within one pool of Condor machines to execute on a second pool. The mechanism is flexible: it follows requests from the job submission while allowing the second pool, or a subset of machines within it, to set policies over the conditions under which jobs are executed.
Jobs can be Ordered.
Condor easily handles the execution order required by dependencies among a set of jobs. The set is specified as a directed acyclic graph in which each job is a node, and jobs are submitted to Condor in the order that the graph's dependencies dictate.
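To make the dependency ordering concrete, here is a minimal sketch that uses Python's standard `graphlib` module (Python 3.9 or later) rather than Condor's own DAG input format, which is not shown here. The job names and the helper `submission_waves` are hypothetical, and the sketch assumes every job in a wave finishes successfully before the next wave is released.

```python
from graphlib import TopologicalSorter

# Hypothetical job set: each key maps to the set of jobs it depends on.
dependencies = {
    "prepare": set(),
    "simulate_a": {"prepare"},
    "simulate_b": {"prepare"},
    "summarize": {"simulate_a", "simulate_b"},
}


def submission_waves(deps):
    """Group jobs into waves; a job appears only after all of its parents."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose parents are all done
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave finished successfully
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(submission_waves(dependencies), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Jobs in the same wave have no dependencies on one another, so they could be submitted together; the acyclic requirement matters because a cycle would leave no job that is ever ready to run.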
Condor Enables Grid Computing.
As grid computing becomes a reality, Condor is already there. The glidein technique allows jobs submitted to Condor to be executed on grid machines in various locations worldwide. As the details of grid computing evolve, so do Condor's capabilities, starting with Globus-controlled resources.
Sensitive to the Desires of Machine Owners.
The owner of a machine has complete priority over its use. An owner is generally happy to let others compute on the machine while it is idle, but wants it back promptly upon returning, without having to take any special action to regain control. Condor handles this automatically.
ClassAds.
The ClassAd mechanism in Condor provides an extremely flexible, expressive framework for matching resource requests with resource offers. Users can easily state both job requirements and job preferences. For example, a user can require that a job run on a machine with 64 Mbytes of RAM, but state a preference for 128 Mbytes, if available. A workstation owner can state a preference that the workstation run jobs from a specified set of users. The owner can also require that no interactive workstation activity be detectable at certain hours before Condor can start a job. Job requirements and preferences, as well as resource availability constraints, can be described with powerful expressions, which lets Condor adapt to nearly any desired policy.
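The two-sided matching described above can be sketched in a few lines. This is a toy model, not the ClassAd language or Condor's matchmaker: the `Ad` class, the `best_match` function, the attribute names (`memory_mb`, `owner`), and the 900-second idle threshold are all assumptions made for illustration. Each ad carries a hard requirement and a preference (rank) over the other side's attributes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Ad:
    """A toy stand-in for a ClassAd: attributes plus two expressions."""
    attrs: dict
    requirements: Callable[[dict], bool]  # hard constraint on the other ad
    rank: Callable[[dict], float]         # preference among acceptable ads


def best_match(ad: Ad, candidates: list[Ad]) -> Optional[Ad]:
    """Return the candidate `ad` ranks highest among mutual matches."""
    acceptable = [
        c for c in candidates
        if ad.requirements(c.attrs) and c.requirements(ad.attrs)
    ]
    if not acceptable:
        return None
    return max(acceptable, key=lambda c: ad.rank(c.attrs))


def make_job(owner: str) -> Ad:
    """Job: needs at least 64 MB of RAM, prefers 128 MB or more."""
    return Ad(
        attrs={"owner": owner},
        requirements=lambda m: m["memory_mb"] >= 64,
        rank=lambda m: 1.0 if m["memory_mb"] >= 128 else 0.0,
    )


def make_machine(name: str, memory_mb: int, keyboard_idle_s: int) -> Ad:
    """Machine: runs jobs only when idle, prefers a set of favored users."""
    favored = {"alice", "bob"}  # hypothetical user set
    return Ad(
        attrs={"name": name, "memory_mb": memory_mb},
        requirements=lambda j: keyboard_idle_s >= 900,  # assumed threshold
        rank=lambda j: 1.0 if j["owner"] in favored else 0.0,
    )


machines = [
    make_machine("small", 64, 3600),
    make_machine("big", 256, 3600),
    make_machine("busy", 512, 5),  # owner is at the keyboard
]

if __name__ == "__main__":
    chosen = best_match(make_job("alice"), machines)
    print(chosen.attrs["name"] if chosen else "no match")
```

Keeping requirements and rank separate is the point of the design: a requirement removes a candidate outright (the busy machine is never considered, however much memory it has), while rank only orders the candidates that both sides already accept.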