High dynamic range (HDR) imaging records natural scenes with luminance distributions closer to those of the real world, at the cost of increased storage requirements and display demands. Consequently, HDR image compression for perceptually optimal storage and display is crucial, yet it remains inadequately addressed. In this work, we take steps towards this goal. Specifically, we learn to compress an HDR image into two bitstreams for storage: one is used to generate a low dynamic range (LDR) image for display, conditioned on the maximum luminance of the scene, while the other serves as side information to aid HDR image reconstruction from the generated LDR image. To measure the perceptual quality of the displayable LDR image, we employ the normalized Laplacian pyramid distance (NLPD), a perceptual quality metric that supports the use of the input HDR image as reference. To measure the perceptual quality of the reconstructed HDR image, we employ a newly proposed HDR quality metric based on a simple inverse display model that enables high-fidelity dynamic range expansion at all luminance levels. Comprehensive qualitative and quantitative comparisons on various HDR scenes demonstrate the perceptual optimality of our learned HDR image compression system for both displayable LDR images and reconstructed HDR images at all bit rates.
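The abstract does not spell out the inverse display model. As a minimal sketch under that caveat, the snippet below assumes a common gain-gamma-offset display model, in which a normalized pixel value v in [0, 1] is mapped to luminance (L_max - L_black) * v^gamma + L_black, together with its inverse. The function names and the default peak luminance, black level, and gamma are hypothetical illustrations, not the model or parameters used in this work.

```python
import numpy as np

# Hypothetical gain-gamma-offset display model (an assumption, not the paper's model).
def display_model(v, l_max=100.0, l_black=0.1, gamma=2.2):
    """Map normalized pixel values v in [0, 1] to luminance in cd/m^2."""
    v = np.clip(np.asarray(v, dtype=float), 0.0, 1.0)
    return (l_max - l_black) * v ** gamma + l_black

# Hypothetical inverse: luminance back to normalized pixel values.
def inverse_display_model(lum, l_max=100.0, l_black=0.1, gamma=2.2):
    """Map luminance in [l_black, l_max] back to normalized pixel values in [0, 1]."""
    # Luminance outside the display's range is clipped, which is where an
    # LDR rendering loses information relative to the HDR scene.
    lum = np.clip(np.asarray(lum, dtype=float), l_black, l_max)
    return ((lum - l_black) / (l_max - l_black)) ** (1.0 / gamma)

v = np.linspace(0.0, 1.0, 5)
lum = display_model(v)
v_back = inverse_display_model(lum)
```

Within the display's luminance range the two functions are exact inverses, so the round trip recovers `v`; varying `l_max` is one simple way to model displays, or luminance levels, of different peak brightness.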