POST
javascript
const axios = require('axios');

// Download the input image and return it as a base64 string.
async function toB64(imgUrl) {
  const res = await axios.get(imgUrl, { responseType: 'arraybuffer' });
  return Buffer.from(res.data).toString('base64');
}

const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/hunyuan-3d-2";

(async function () {
  const data = {
    image: await toB64('https://i.ibb.co/8nbymYTS/hunyuan-image.png'),
    octree_resolution: 256,
    num_inference_steps: 30,
    guidance_scale: 5.5,
    seed: 12467,
    face_count: 40000,
    texture: false
  };

  try {
    const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
    console.log(response.data);
  } catch (error) {
    console.error('Error:', error.response ? error.response.data : error.message);
  }
})();
RESPONSE
image/jpeg
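
If you request the response as raw bytes, the returned asset can be written straight to disk. A minimal sketch assuming axios; the helper name generateAndSave and the output path are illustrative, not part of the API:

const axios = require('axios');
const fs = require('fs');

async function generateAndSave(url, data, apiKey, outPath) {
  // Request raw bytes so the binary payload is not mangled by JSON parsing.
  const response = await axios.post(url, data, {
    headers: { 'x-api-key': apiKey },
    responseType: 'arraybuffer'
  });
  fs.writeFileSync(outPath, Buffer.from(response.data));
  console.log('Saved', outPath, 'content-type:', response.headers['content-type']);
}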
HTTP Response Codes
200 - OK : Image Generated
401 - Unauthorized : User authentication failed
404 - Not Found : The requested URL does not exist
405 - Method Not Allowed : The requested HTTP method is not allowed
406 - Not Acceptable : Not enough credits
500 - Server Error : Server had some issue with processing
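
A hedged sketch of how a client might branch on these codes; the helper name postWithHandling and the single retry on 500 are assumptions, not documented behaviour:

const axios = require('axios');

async function postWithHandling(url, data, apiKey) {
  try {
    return await axios.post(url, data, { headers: { 'x-api-key': apiKey } });
  } catch (error) {
    const status = error.response ? error.response.status : null;
    if (status === 401) throw new Error('Authentication failed: check your x-api-key.');
    if (status === 406) throw new Error('Not enough credits: top up before retrying.');
    if (status === 500) {
      // Server-side processing issue: wait briefly and retry once (assumed policy).
      await new Promise((resolve) => setTimeout(resolve, 5000));
      return axios.post(url, data, { headers: { 'x-api-key': apiKey } });
    }
    throw error; // 404, 405 and anything unexpected are surfaced as-is.
  }
}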

Attributes


image : image ( default: 1 )

Input Image.


text : str ( default: 1 )

Prompt to render (optional when image is used)


octree_resolution : enum:str ( default: 256 )

Higher resolution gives better quality but slower processing.

Allowed values:


num_inference_steps : int ( default: 30 ) Affects Pricing

Number of inference steps.

min : 20,

max : 50


guidance_scale : float ( default: 5.5 )

Scale for classifier-free guidance

min : 1,

max : 15


seed : int ( default: -1 )

Seed for image generation.

min : -1,

max : 999999999999999


face_count : int ( default: 40000 )

Maximum number of faces in the mesh; only used if texture=true.

min : 5000,

max : 100000


texture : bool ( default: 1 )

Whether to apply texture to the generated mesh.


mesh : str *

GLB file; only needed if modifying an existing mesh.
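
To tie the attributes together, here is an illustrative request body; the values are examples within the documented ranges, and the placeholder strings must be replaced with real data:

// Illustrative payload only: texture is enabled, so face_count applies; mesh is
// commented out because it is only needed when modifying an existing mesh.
const data = {
  image: '<base64-encoded input image>',  // placeholder
  octree_resolution: 256,                 // default listed above
  num_inference_steps: 30,                // 20-50, affects pricing
  guidance_scale: 5.5,                    // 1-15
  seed: -1,                               // default listed above
  texture: true,
  face_count: 40000,                      // 5000-100000, only used when texture=true
  // mesh: '<GLB data per the mesh attribute above>'
};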

To keep track of your credit usage, you can inspect the response headers of each API call. The x-remaining-credits property will indicate the number of remaining credits in your account. Ensure you monitor this value to avoid any disruptions in your API usage.
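
For example, a small wrapper that logs the balance after each call; the helper name and the low-credit threshold are arbitrary examples:

const axios = require('axios');

async function postAndCheckCredits(url, data, apiKey) {
  const response = await axios.post(url, data, { headers: { 'x-api-key': apiKey } });

  // axios lower-cases header names, so the balance is read as 'x-remaining-credits'.
  const remaining = response.headers['x-remaining-credits'];
  console.log('Remaining credits:', remaining);
  if (remaining !== undefined && Number(remaining) < 100) {
    console.warn('Credits running low.'); // the threshold of 100 is an arbitrary example
  }
  return response;
}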

Hunyuan3D 2.0

Hunyuan3D 2.0 is an advanced 3D synthesis system designed for generating high-resolution, textured 3D assets. It consists of two main components: Hunyuan3D-DiT, a shape generation model, and Hunyuan3D-Paint, a texture synthesis model. The system uses a two-stage pipeline: first, a bare mesh is created, and then a texture map is synthesized for it. This approach decouples the complexities of shape and texture generation and allows texturing of both generated and handcrafted meshes.

Key Features of Hunyuan3D 2.0

Shape Generation

  • Hunyuan3D-DiT is a large-scale flow-based diffusion model.

  • It uses Hunyuan3D-ShapeVAE, an autoencoder that captures fine-grained details on meshes. ShapeVAE employs vector sets and an importance-sampling method to extract representative features and capture details such as edges and corners. It takes the 3D coordinates and normal vectors of point clouds sampled from the surface of a 3D shape as encoder inputs and trains the decoder to predict the shape's Signed Distance Function (SDF), which can then be decoded into a triangle mesh.

  • A transformer with dual- and single-stream blocks is built on the latent space of the VAE with a flow-matching objective.

  • It is trained to predict object token sequences from a user-provided image, and the predicted tokens are then decoded into a polygon mesh by the VAE decoder.

  • It uses a pre-trained image encoder (DINOv2 Giant) to extract conditional image tokens, processing images at 518 x 518 resolution. The background of the input image is removed, and the object is resized and repositioned to minimize the negative impact of the background; a client-side approximation of this step is sketched below.
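
The model's exact preprocessing code is not exposed, but the step described above can be approximated client-side. A minimal sketch using the sharp library, assuming the background has already been removed so the input has an alpha channel; the white canvas and the function name are assumptions:

const sharp = require('sharp');

// Approximate the described conditioning-image preprocessing: crop the
// (already background-removed) object and pad it onto a square 518 x 518 canvas.
async function prepareConditionImage(inputPath, outputPath) {
  await sharp(inputPath)
    .trim()                              // crop borders that match the corner pixel
    .resize(518, 518, {
      fit: 'contain',                    // keep aspect ratio, pad to a square canvas
      background: { r: 255, g: 255, b: 255, alpha: 1 }
    })
    .flatten({ background: '#ffffff' })  // composite the alpha channel onto white
    .png()
    .toFile(outputPath);
}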

Texture Synthesis

  • It uses a mesh-conditioned multi-view generation pipeline.

  • It uses a three-stage framework: pre-processing, multi-view image synthesis, and texture baking.

  • An image delighting module is used to convert the input image to an unlit state to produce light-invariant texture maps. The multi-view generation model is trained on white-light illuminated images, enabling illumination-invariant texture synthesis.

  • A geometry-aware viewpoint selection strategy is employed, selecting 8 to 12 viewpoints for texture synthesis (see the sketch after this list).

  • A double-stream image conditioning reference-net, a multi-task attention mechanism, and a strategy for geometry and view conditioning are also used in this pipeline.

  • It uses a multi-task attention mechanism with reference and multi-view attention modules to ensure consistency across generated views.

  • It conditions the model with multi-view canonical normal maps and coordinate maps. A learnable camera embedding is used to boost the viewpoint clue for the multi-view diffusion model.

  • It uses dense-view inference with a view dropout strategy to enhance 3D perception.

  • A single-image super-resolution model is used to enhance texture quality, and an inpainting approach is applied to fill any uncovered patches on the texture map.

  • It is compatible with text- and image-to-texture generation, utilizing T2I models and conditional generation modules.
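
The geometry-aware viewpoint selection itself is not exposed through the API, but the idea of surrounding a mesh with a small set of cameras can be illustrated with a purely geometric sketch; the evenly spaced azimuths, alternating elevations, and radius below are assumptions, not the paper's scoring strategy:

// Illustrative only: evenly spaced azimuths with alternating elevations.
// The actual strategy scores candidate viewpoints against the mesh geometry
// to maximise surface coverage, which this sketch does not attempt.
function sampleViewpoints(count = 10, radius = 2.5) {
  const views = [];
  for (let i = 0; i < count; i++) {
    const azimuth = (2 * Math.PI * i) / count;                  // even spacing around the object
    const elevation = (i % 2 === 0 ? 1 : -1) * (Math.PI / 9);   // alternate +/- 20 degrees
    views.push({
      x: radius * Math.cos(elevation) * Math.cos(azimuth),
      y: radius * Math.sin(elevation),
      z: radius * Math.cos(elevation) * Math.sin(azimuth)
    });
  }
  return views;
}

console.log(sampleViewpoints(10)); // 10 camera positions on a sphere around the object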

Performance and Evaluation

  • Hunyuan3D 2.0 outperforms previous state-of-the-art models in geometry detail, condition alignment, and texture quality.

  • Hunyuan3D-ShapeVAE surpasses other methods in shape reconstruction.

  • Hunyuan3D-DiT produces more accurate condition following results compared to other methods.

  • The system produces high-quality textured 3D assets.

  • User studies demonstrate the superiority of Hunyuan3D 2.0 in terms of visual quality and adherence to image conditions.