Implementation of "Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning" (https://arxiv.org/abs/1804.05448)
A usage example is provided in `slurm.sh`.