Support for L4 GPUs #57

@hartikainen

I see references to L4 GPUs, e.g. here:

L4_24TH = 68

This doesn't seem to work, though, because it hasn't been implemented. A simple test case in xmanager/cloud/vertex_test.py could look something like:

  def test_get_machine_spec_l4(self):
    job = xm.Job(
        executable=local_executables.GoogleContainerRegistryImage('name', ''),
        executor=local_executors.Vertex(
            requirements=xm.JobRequirements(L4_24TH=2)
        ),
        args={},
    )
    machine_spec = vertex.get_machine_spec(job)
    self.assertDictEqual(
        machine_spec,
        {
            'machine_type': 'g2-standard-4',
            'accelerator_type': vertex.aip_v1.AcceleratorType.NVIDIA_L4,
            'accelerator_count': 2,
        },
    )

This gives an error:

python -m xmanager.cloud.vertex_test
..........E.Creating CustomJob
CustomJob created. Resource name: <MagicMock name='WrappedClient().create_custom_job().name' id='4726134768'>
To use this CustomJob in another session:
custom_job = aiplatform.CustomJob.get('<MagicMock name='WrappedClient().create_custom_job().name' id='4726134768'>')
View Custom Job:
<MagicMock name='_dashboard_uri()' id='4726135440'>
Job launched at: <MagicMock name='_dashboard_uri()' id='4726135440'>
.
======================================================================
ERROR: test_get_machine_spec_l4 (__main__.VertexTest.test_get_machine_spec_l4)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/user/xmanager/main/xmanager/cloud/vertex_test.py", line 166, in test_get_machine_spec_l4
    machine_spec = vertex.get_machine_spec(job)
  File "/Users/user/xmanager/main/xmanager/cloud/vertex.py", line 305, in get_machine_spec
    spec['accelerator_type'] = aip_v1.AcceleratorType[accelerator_type]
                               ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.13/3.13.3/Frameworks/Python.framework/Versions/3.13/lib/python3.13/enum.py", line 791, in __getitem__
    return cls._member_map_[name]
           ~~~~~~~~~~~~~~~~^^^^^^
KeyError: 'NVIDIA_TESLA_L4_24TH'

----------------------------------------------------------------------
Ran 13 tests in 0.003s

FAILED (errors=1)
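
Judging from the traceback, get_machine_spec seems to build the enum key by prefixing the resource name with NVIDIA_TESLA_, which yields NVIDIA_TESLA_L4_24TH instead of NVIDIA_L4. Below is a minimal sketch of one possible fix, assuming that is how the key is built. The AcceleratorType here is a stand-in enum with placeholder values, and _ACCELERATOR_NAME_OVERRIDES and accelerator_enum_name are hypothetical names that don't exist in xmanager:

```python
import enum


class AcceleratorType(enum.Enum):
  """Stand-in for a subset of vertex.aip_v1.AcceleratorType.

  The numeric values are placeholders, not the real Vertex AI values.
  """
  NVIDIA_TESLA_T4 = 1
  NVIDIA_TESLA_V100 = 2
  NVIDIA_L4 = 3


# Hypothetical override table for resource names whose accelerator enum name
# does not follow the 'NVIDIA_TESLA_<name>' pattern.
_ACCELERATOR_NAME_OVERRIDES = {
    'L4_24TH': 'NVIDIA_L4',
}


def accelerator_enum_name(resource_name: str) -> str:
  """Maps a resource name such as 'T4' or 'L4_24TH' to an enum member name."""
  # Assumption: the current code always uses the prefixed form, which is what
  # produces the KeyError for 'L4_24TH'.
  default_name = 'NVIDIA_TESLA_' + resource_name
  return _ACCELERATOR_NAME_OVERRIDES.get(resource_name, default_name)


def lookup_accelerator(resource_name: str) -> AcceleratorType:
  """Resolves a resource name to a member of the stand-in enum."""
  return AcceleratorType[accelerator_enum_name(resource_name)]
```

With something like this, existing names such as T4 would still resolve through the prefixed form, and only L4_24TH would take the override path. The machine type selection (g2-standard-4 in the test above) would need separate handling.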
