I'm digging back into my musty past here so bear with me...
The D-A conversion is done automatically by the hardware on the card itself. Refresh rates and resolution can be controlled programmatically by setting values on the VGA controller chip; an easy way to do this is through BIOS interrupt calls.
In terms of pixel rendering, why waste the effort of building something when it is supported by all VGA cards in one form or another? For VESA VGA cards, you put the card into the resolution/color depth you want; again, BIOS calls can accomplish this. How the pixels get displayed depends on the mode.
Graphics modes on VGA cards are memory-mapped. The contents of the video memory determine what will be displayed on the screen (e.g. on a Trident VGA with 1MB RAM, that 1MB is the video memory). The amount of memory on the card also determines the available resolutions and color depths.
Many modes on the VGA card use indexed palettes, for example 640x480 in 16 or 256 colors. In the indexed modes, you load the RGB color information into the color palette. Palettes were used to conserve what was at the time expensive memory. Color values are 0-indexed: for 16 colors there would be 16 entries in the palette, and for 256 colors, 256 entries. (Each entry is conceptually a 24-bit RGB triple, though the actual VGA DAC only stores 6 bits per component, i.e. 18 bits per entry.)
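To make the indexed-color idea concrete, here's a rough C sketch of what a palette lookup amounts to. The entry values are the classic default 16 colors, and the type and function names are mine for illustration, not any real VGA API:

```c
#include <stdint.h>

/* One palette entry: an RGB triple. The real VGA DAC registers hold
   only 6 bits per component; 8 bits are used here for clarity. */
typedef struct { uint8_t r, g, b; } rgb_t;

/* A 16-entry palette for a 4-bit indexed mode (the classic defaults). */
static const rgb_t palette16[16] = {
    {0,0,0},     {0,0,170},   {0,170,0},   {0,170,170},
    {170,0,0},   {170,0,170}, {170,85,0},  {170,170,170},
    {85,85,85},  {85,85,255}, {85,255,85}, {85,255,255},
    {255,85,85}, {255,85,255},{255,255,85},{255,255,255}
};

/* What the hardware conceptually does: turn a stored index into a color. */
rgb_t lookup_color(uint8_t index) {
    return palette16[index & 0x0F]; /* mask to the 0-15 range */
}
```

The point is that video memory never stores colors in these modes, only small indexes; changing a palette entry instantly recolors every pixel that references it.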
Video memory is laid out as one contiguous space. The first byte of video memory corresponds to (0,0), the top-left pixel on the screen. As addresses advance, so do the pixels: the hardware reads the memory and renders pixels row by row, starting at the top-left, moving left to right, down to the bottom-right.
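In code, that contiguous layout boils down to one multiplication and one addition (a sketch assuming one byte per pixel; the helper name is mine):

```c
#include <stdint.h>
#include <stddef.h>

/* Byte offset of pixel (x, y) in a linear, byte-per-pixel framebuffer:
   rows are stored one after another, top-left pixel first. */
size_t pixel_offset(unsigned x, unsigned y, unsigned width) {
    return (size_t)y * width + x;
}
```

So in a 320-pixel-wide mode, pixel (0,0) is at offset 0 and the last pixel of the first row is at offset 319; the second row starts at offset 320.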
Here's where it gets fun. The color depth determines how many pixels a byte of video memory represents. For monochrome, a single byte represents 8 pixels: if a bit is set, the pixel is drawn in the foreground color; if it is clear, the background color. For 16-color modes, one byte equals two pixels: the high and low nibbles (4 bits each) each hold the index value for a single pixel. The 0-15 value is read out of the high or low nibble, looked up in the palette to determine what color it represents, and the appropriate color values are sent to the DAC for display as the memory is read. (Strictly speaking, the standard 16-color VGA modes spread each pixel across four bit planes rather than packing nibbles, but the packed model is the easiest way to think about it.) The same applies for the 256-color mode. 256-color modes are the easiest to program since one byte represents one pixel of a specific indexed color.
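The nibble juggling for the packed 16-color model looks something like this in C (my own helper names, just to illustrate the bit operations):

```c
#include <stdint.h>

/* Packed 4-bit model: one byte holds two pixels, high nibble first. */
uint8_t pack_pixels(uint8_t left, uint8_t right) {
    return (uint8_t)(((left & 0x0F) << 4) | (right & 0x0F));
}

/* Recover the 0-15 palette index of the left or right pixel. */
uint8_t unpack_pixel(uint8_t byte, int right) {
    return right ? (byte & 0x0F) : (byte >> 4);
}
```

This is also why setting a single pixel in these modes needs a read-modify-write: you must preserve the neighboring pixel sharing the byte.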
Higher color modes like the 4096-, 65536- and 16.7M-color modes work similarly, except that it now takes multiple bytes to represent individual pixels. For 4096 (12-bit) color, it takes 3 bytes to represent two pixels. 4096-color modes are typically not found on PCs; I use it only for reference.
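For the curious, packing two 12-bit pixels into three bytes could look like the sketch below. This is one plausible layout chosen for illustration; actual 12-bit hardware varied and I'm not claiming any specific device used exactly this arrangement:

```c
#include <stdint.h>

/* Two 12-bit pixels (a, b) packed into three bytes:
   out[0] = high 8 bits of a
   out[1] = low 4 bits of a, then high 4 bits of b
   out[2] = low 8 bits of b
   (An illustrative packing, not a documented hardware format.) */
void pack12(uint16_t a, uint16_t b, uint8_t out[3]) {
    out[0] = (uint8_t)(a >> 4);
    out[1] = (uint8_t)(((a & 0x0F) << 4) | (b >> 8));
    out[2] = (uint8_t)(b & 0xFF);
}
```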
The 65K and 16.7M modes, unlike the low color modes, are typically not palettized but rather direct-color modes. This means that instead of indexes into a palette, the RGB color information is encoded directly into the values stored in memory. As a result, these modes are easier to program since a palette doesn't need to be set up beforehand. For 16.7M colors, 3 bytes are required to represent a single pixel.
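As a sketch of direct color, here's the widespread "565" packing used by most 65K-color modes: 5 bits of red, 6 of green, 5 of blue in a single 16-bit value (function name is mine):

```c
#include <stdint.h>

/* Pack 8-bit-per-channel RGB into a 16-bit 5-6-5 direct-color value.
   The low bits of each channel are simply discarded. */
uint16_t pack565(uint8_t r, uint8_t g, uint8_t b) {
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```

Green gets the extra bit because the eye is most sensitive to green; no palette is involved, the stored value is the color.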
This is where the amount of video memory affects which resolutions and color depths are available. Typical VGA cards had 128KB or 256KB of RAM onboard. On standard VGA cards, 256 colors at 640x480 wasn't possible since it requires 300KB to render. The non-standard 640x400 resolution could display 256 colors on a 256KB card. 320x200x256 was the most heavily used mode on VGA for games, and it was easy to program since a frame fit in 64KB, making it natural for DOS programmers used to 64KB segment addressing.
It also shows why 800x600 was limited to 16 colors: it would require about 480KB to render 256 colors but only 240KB to render 16 colors.
Super VGA and XGA allowed for the higher resolutions and color depths. Today, modern cards typically use 24 or 32 bits per pixel, and it doesn't take much to see how much memory is required to support the high-resolution, high-color modes we are used to now. 1280x1024 at 16.7M colors requires between 4MB and 5MB of video memory, which back then often exceeded the total RAM in the computer.
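All of the memory figures above fall out of one formula, width x height x bits-per-pixel / 8:

```c
#include <stdint.h>

/* Bytes of video memory needed to hold one full frame. */
uint32_t frame_bytes(uint32_t w, uint32_t h, uint32_t bits_per_pixel) {
    return w * h * bits_per_pixel / 8;
}
```

For example, 320x200 at 8bpp is 64,000 bytes (just under one 64KB segment), 640x480 at 8bpp is 307,200 bytes (the 300KB figure), 800x600 at 4bpp is 240,000 bytes, and 1280x1024 at 24bpp is about 3.9MB, hence a 4MB card for 24-bit or more for 32-bit.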
As to your question to what the motherboard sends to the card, it's pretty straightforward:
1) The program calls INT 10h to initialize the card and place it into the proper resolution. Let's assume 320x200x256, which is mode 13h. It would also initialize the color palette.
2) Depending on the mode, a specific region of memory will be mapped into the high memory address space under DOS that serves as the start of video memory. The mode determines where this is; for mode 13h it is segment A000h.
3) The program points itself to the start of video memory and begins writing bytes. As bytes are written, the VGA hardware continually scans the memory, and the resulting pixels are displayed on the monitor.
For 16-color and monochrome modes, the program would also be responsible for the logical bit operations to set the 2 or 8 pixels, respectively, represented by each byte before writing it to memory.
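Putting steps 2 and 3 together, here's a portable C simulation of mode 13h pixel writing. The array stands in for the 64,000-byte framebuffer that DOS maps at segment A000h; on real hardware you would point at that segment instead of a local buffer:

```c
#include <stdint.h>
#include <string.h>

#define SCREEN_W 320
#define SCREEN_H 200

/* Stand-in for the 64,000-byte mode 13h framebuffer at A000:0000. */
static uint8_t vram[SCREEN_W * SCREEN_H];

/* Plot one pixel: one byte written, one palette index per pixel. */
void putpixel(int x, int y, uint8_t color) {
    vram[y * SCREEN_W + x] = color;
}

/* Clearing the screen is just filling the whole buffer. */
void clear_screen(uint8_t color) {
    memset(vram, color, sizeof vram);
}
```

This is why mode 13h was so beloved for DOS games: no planes, no nibbles, just `segment + y*320 + x` and a single byte store per pixel.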
That's really it. Forgive me if some of the information is inaccurate as I am drawing from my college days and memory when I used to do video programming. Hopefully this helps.
Matt