- new -maxidlejobs throttle [for CDF (Igor Sfiligoi); see Message-Id
  <5.2.0.9.2.20031229065542.03a5aec0@cobalt.cs.wisc.edu>]

- one-at-a-time node constraint feature for Ewa Deelman (miron has
  thoughts on how this should be done more generally)

- add check for situation where dagman thinks there should be jobs in
  the queue but there are none... abort w/rescue file

- Fix so that we don't pull the userlog filename from a submit file
  until the job is actually submitted (so submit files can be created
  late)

- document, document, document
  - check CVS logs...

- scalability testing
- add recovery scenarios to test suite
- test stupid dags
  - no ready jobs
  - cycles
  - no jobs

- add an option to have DAGMan clean up all its temp/log files when
  it's done, if it finished successfully

- Add DAGManJobID to submit event so that two DAGMans with jobs
  writing to the same log can keep each other straight

? recursive-DAG-aware tools (condor_submit DAG-hunting, etc.)

=== new features

- replace _done kludge in script.C with a new job state?

- from skoranda:

1) A DAG file is at some level a "meta language", and so I think it
   would be useful to have some rudimentary programm language
   constructs.  Variables or macros, for example, would be
   useful. Then I could write something like

   HOME_PATH = /home/skoranda/LDR/libexec

   Job JobA HOME_PATH/JobA.sub
   Job JobB HOME_PATH/JobB.sub

   PARENT JobA CHILD JobB

2) It seems from experiment that when DAGman is limited by the number
   of jobs that can run at any given time (-MaxJobs 2 in my case) that
   it proceeds through the DAG by running all jobs at the first level
   in the tree, then the second level, and so on.

   I would prefer to have DAGman work down all the way to the lowest
   level or leaf before moving to the next branch. Perhaps there could
   be a flag to toggle these two modes?


=== Miron Ideas ===

- when you recover the log, every time you see a submit, make sure the
  waiting queue of that job is empty

- interface dagman with a db [low priority]



=== Tests to Write ===

to test

parsing:
	- DAGs which define >1 PRE and POST scripts for a given node
	- DAGs which define a PRE and POST script without a script name

running:


recovery:

other:



=== Dynamic Dag Support ===

dynamic dag modification

use daemoncoer socket to accept requests for modification
dagman can modify the dag

write a new temporary dag file
make sure that it's consistent, etc
get a commit command that swaps the new dag file for the old (while not processing new events)

start with add/remove
think about cases we can do safely, and those we reject (e.g., adding parents of already-submitted nodes)
in the future, look at rejected cases
in the future, add an undo

recovery?

race conditions

confirmation of changes to requestor (2 phase commit?)

what if we add a new parent to a node that hasn't completed?
- either consider it done or re-run

keep a clear audit trail
- miron: think of the dag file as a db, keep meta info about the nodes
- keep deleted nodes as comments, comment new nodes, etc.

- dag files will need to become unique to dag instances (since they will change them)
  - never have 2 dagmans using the same dag

check for duplicate node names, etc (everything we do in parse.C)


1/15/03:

1) perform add/del/etc internally
	- mark dynamic nodes with a bool (+maybe user+timestamp?)
2) write record to DagLog file
3) resume

when producing rescue files, add comments above dynamic nodes (w/user+timestamp?)

recovery from Rescue
	- same as before!

recovery from crash:
	1) parse DAG [same]
	2) apply DAGLog changes [new]
	3) read userlog [same]

- miron: when dagman starts running, it creates a shadow dag file
  (initially a copy) with current dag info
	- think carefully ~ recovery (e.g. crash before shadow file
	  exists, etc.)

