Game of Thrones Official Models - King Mag the Mighty Figurine

£9.995
FREE Shipping

RRP: £19.99

In stock

Description

Blurriness in Images: Many images extracted from the TV show displayed varying degrees of blur, which hurts the training process and may bias the model toward generating mostly blurry images. I wanted an algorithm to filter out and discard these blurry images automatically. My attempt is the images_filter_blurry.py script, in which I tried three distinct algorithms to detect face blur. Unfortunately, my tests on a sample dataset didn't show a reliable correlation between the algorithms' blur scores and the blurriness I perceived on manual inspection, and combining the algorithms didn't yield better results. While some articles point to dedicated models trained for blur detection, I wasn't able to obtain such a model for my tests.

Because the video source used an HDR format with its own color profile, the ffmpeg extraction command tone-maps the frames so that the extracted images have correct colors. The command also aims to keep only distinct frames; however, the 4K resolution might have affected how well distinct frames were detected.

The rest of the captions were appended automatically using a WebUI extension with the WD14 tagger.

Darkness - Even with my efforts to counter the dataset's dark bias by introducing random saturation, generated characters often appear slightly too dark. Using "game of thrones" in the prompt often results in darker images, whereas using it in a negative prompt tends to produce brighter images. Training on more episodes might lessen this dark bias, but this remains to be verified.
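The text doesn't say which three algorithms images_filter_blurry.py uses. As an illustration only, one widely used blur score is the variance of the Laplacian: sharp images have strong edges and therefore a high-variance Laplacian, while blurry images score low. The sketch below assumes grayscale images given as 2-D NumPy arrays; the function names and the threshold of 100 are hypothetical and would need tuning, and, as noted above, scores like this did not track perceived blur reliably in the author's tests.

```python
import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Blur score: variance of the 4-neighbour Laplacian (lower means blurrier)."""
    g = np.asarray(gray, dtype=np.float64)
    if g.ndim != 2 or min(g.shape) < 3:
        raise ValueError("expected a 2-D grayscale image of at least 3x3 pixels")
    # Discrete Laplacian on interior pixels: up + down + left + right - 4 * centre
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def is_blurry(gray: np.ndarray, threshold: float = 100.0) -> bool:
    """Hypothetical cut-off: flag images whose Laplacian variance is below threshold."""
    return laplacian_variance(gray) < threshold
```

Rather than a hard cut-off, ranking a folder of face crops by laplacian_variance and reviewing the lowest-scoring images by hand is a gentler way to use such a score.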

Multipliers: GOT subjects with a significant number of images were trained at 8 images per subject per epoch; subjects with fewer images at 4 or 2 images per subject per epoch.

Training included 9k images focused on characters' faces (50 subjects in total) and 4k images from various scenes. Additionally, 30k images were used as regularization images: medieval-themed images as well as half of the ❤️‍🔥 Divas dataset. I also added a few thousand regularization images, mainly medieval-themed and nature-only images; there are scripts in my repository that can help obtain such images. These images were captioned automatically.

I'm uncertain whether the training strategy I implemented is the best approach. My goal was to test a pre-trained TE strategy, but it remains unclear whether it is better or worse than combined TE+Unet training. Moving forward, I plan to start with a TE+Unet training phase and then freeze the TE while continuing Unet training, building on the Unet progress from the initial phase.

Mixing

Ultimately, the training was stabilized with the 💖 Babes 2.0 model.

Automation Goal - I aspire to fully automate the process of converting video into an SD model. However, challenges like blurriness and the lack of reliable face-to-name classification currently make this infeasible, and the need for manual filtering and captioning makes the process lengthy and labor-intensive. I'm optimistic that future advances will allow a more streamlined video-to-SD-model conversion. This could speed up the creation of high-quality fan fiction, visual novels, and concept art, and, given advances in image-to-video technology, even help create videos, music clips, short films, and movies.
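The multipliers above set how many images each subject contributes per epoch. One way to express that is as a per-subject repeat factor, target divided by image count, where values below 1 mean only part of a subject's images is seen in each epoch. A sketch follows; the 8/4/2 targets come from the text, but the image-count thresholds that decide which tier a subject falls into are hypothetical, as are the function names. How such a factor is passed to the trainer depends on its configuration; this only computes the numbers.

```python
from __future__ import annotations

# (minimum image count, target images per epoch). The targets 8/4/2 are from the
# text; the thresholds 200 and 50 are hypothetical placeholders.
TIERS = ((200, 8), (50, 4), (0, 2))


def target_per_epoch(count: int, tiers=TIERS) -> int:
    """Pick the per-epoch target for a subject with `count` images."""
    for min_count, target in tiers:
        if count >= min_count:
            return target
    return tiers[-1][1]


def repeat_multiplier(count: int, target: int) -> float:
    """Repeat factor so the subject contributes about `target` images per epoch."""
    if count <= 0:
        raise ValueError("subject has no images")
    return target / count


def plan_multipliers(counts: dict[str, int], tiers=TIERS) -> dict[str, float]:
    """Map each subject name to its repeat multiplier."""
    return {name: repeat_multiplier(n, target_per_epoch(n, tiers))
            for name, n in counts.items()}
```

For example, subjects with 400, 60 and 10 images get multipliers of 0.02, about 0.067 and 0.2, i.e. 8, 4 and 2 images per epoch. The other stages (40 or 30 with 8/4) would only change the targets in TIERS.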

I'm using the EveryDream2 trainer, which I run on a remote server from vast.ai. For this model, I used RTX 4090 GPUs exclusively. Although the training process has numerous adjustable settings, I'll mention only the most important ones: a Unet learning rate of 7e-7, a Text Encoder (TE) learning rate of 5e-8, and a pulsing cosine scheduler with a 0.5-2 epoch cycle. I also enabled the tag shuffling option.

Images dedicated to validation should be placed in a separate folder. They are not used for training, but they are vital for checking that the training process is genuinely learning rather than merely overfitting the dataset. I set aside 20 random images from the faces of the 10 subjects with the most images.
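A minimal sketch of the validation split described above, assuming the face images are already grouped by subject (as they are for training) and reading "20 random images" as 20 in total across the 10 largest subjects. The function name and the in-memory dictionary input are hypothetical; the chosen files would then be moved to the separate validation folder.

```python
from __future__ import annotations

import random


def pick_validation(faces_by_subject: dict[str, list[str]], top_n: int = 10,
                    k: int = 20, seed: int | None = None) -> list[str]:
    """Randomly pick k face images from the top_n subjects with the most images."""
    largest = sorted(faces_by_subject,
                     key=lambda name: len(faces_by_subject[name]),
                     reverse=True)[:top_n]
    pool = [path for name in largest for path in faces_by_subject[name]]
    # Sampling without replacement, so no image is picked twice
    return random.Random(seed).sample(pool, min(k, len(pool)))
```

Passing a fixed seed makes the split reproducible, which helps when comparing validation loss across training runs.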

The model 👑 Game of Thrones is based on the first three episodes of HBO's TV show Game of Thrones. As a fan of the show, I thought it would be interesting to reimagine it with a Stable Diffusion (SD) model. The main goal of the model is to replicate the show's characters with high fidelity. Given the large number of characters, interactions, and scenes the show presents, it was quite a challenging endeavor. The images showcased above are outputs of the model.

Overall, the model's development spanned three weeks, with GPU training on an RTX 4090 taking 3.5 days.

Dataset preparation

First, I obtained a 4K (3840 x 2160px) version of the first three episodes of the show. 4K frames allow even relatively small faces to be cropped at a resolution higher than 768x768px, which is our base training resolution. The crops don't need a 1:1 aspect ratio, as training automatically scales images down to fit the target training area.

Data preparation presented two primary challenges: dealing with blurry images and reliably classifying faces by name.

Face Classification: the training process requires all images of a specific individual to be stored in a single directory, so that the number of training images per subject can be controlled. I tried to automate face-to-name classification in the sort_images_by_faces.py script. While it had some success, the high rate of misclassifications made a manual review inevitable. Given this, I found it more efficient to categorize images manually from a single directory rather than navigate through 50 separate ones.
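To make the resolution argument concrete: a 768x768px crop covers 589,824 pixels, which is 20% of the width of a 3840px frame but 40% of a 1920px one, so 4K frames let much smaller faces clear the threshold without upscaling. The sketch below shows the general idea of scaling an image of any aspect ratio to roughly the 768x768 training area. It is not EveryDream2's actual bucketing code, and rounding down to multiples of 64 is an assumption.

```python
from __future__ import annotations

import math


def fit_to_area(width: int, height: int, side: int = 768, step: int = 64) -> tuple[int, int]:
    """Scale (width, height) to at most side*side pixels, keeping the aspect ratio.

    Each dimension is rounded down to a multiple of `step` (assumed here).
    """
    area = side * side
    # Solve new_w / new_h == width / height with new_w * new_h == area,
    # using integer square roots so results never overshoot the area.
    new_w = math.isqrt(area * width // height)
    new_h = math.isqrt(area * height // width)
    return (new_w // step) * step, (new_h // step) * step


def needs_upscale(width: int, height: int, side: int = 768) -> bool:
    """True if the crop has fewer pixels than the training area, i.e. would be enlarged."""
    return width * height < side * side
```

A full 3840 x 2160 frame maps to 1024 x 576, the same pixel budget as 768 x 768 but at 16:9, which is why non-square crops are fine.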

Multipliers: GOT subjects with a significant number of images were trained at 40 images per subject per epoch; subjects with fewer images at 8 or 4 images per subject per epoch.

Stage 2

This model is based on the ❤️‍🔥 Divas model: original training, a remixed recipe, and half of the Divas dataset used for regularization.

Extracting images

To obtain images from the video, I used ffmpeg, sampling four frames per second of video with the following command for each episode:

ffmpeg -hwaccel cuda -i "/path_to_source/video_S01E01.mkv" -vf "setpts=N/FRAME_RATE/TB,fps=4,mpdecimate=hi=8960:lo=64:frac=0.33,zscale=t=linear:npl=100,format=gbrpf32le,zscale=p=bt709,tonemap=tonemap=hable:desat=0,zscale=t=bt709:m=bt709:r=tv,format=yuv420p" -pix_fmt yuv420p -q:v 3 "/path_to_target/S01E01_extract/s01_e01_%06d.jpg"

The primary objective of the training is character training with a focus on faces, so I had to extract all faces from the initial set of 41k images. My GitHub repository includes the script crop_to_face.py, which I used to extract all the faces into a separate folder with the command:

python3 crop_to_face.py --source_folder "/path_to_source/S01E01-03_extract/" --target_folder "/path_to_target/S01E01-03_faces/"

Besides faces, I wanted the model to be familiar with outfits and scenes. To achieve this, I used a subset of the initially extracted frames without cropping them: I ran the move_random_files.py script on the 41k images from the initial extraction to move 5k random images, which served as the foundation for scenes. I filtered these images manually during the captioning stage.

Captioning

Captioning was done in a few steps with the help of my scripts captions_commands.py and captions_helper.py. The face images were already separated into folders, so the folder names were used as the first tag in their captions. When an image included another face, I used my script with a graphical component to add the additional names.
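Since the same ffmpeg command is run once per episode, a small Python wrapper can generate it for S01E01 through S01E03. This is a sketch, assuming the other episodes follow the file naming of the S01E01 example (video_S01E02.mkv, and so on); build_extract_cmd and extract_all are hypothetical names, and the filter chain is copied unchanged from the command above.

```python
from __future__ import annotations

import os
import subprocess

# Filter chain copied unchanged from the command above: sample 4 fps, drop
# near-duplicate frames (mpdecimate), then tone-map the HDR source to SDR BT.709.
VF = ("setpts=N/FRAME_RATE/TB,fps=4,mpdecimate=hi=8960:lo=64:frac=0.33,"
      "zscale=t=linear:npl=100,format=gbrpf32le,zscale=p=bt709,"
      "tonemap=tonemap=hable:desat=0,zscale=t=bt709:m=bt709:r=tv,format=yuv420p")


def build_extract_cmd(episode: int, src_dir: str, dst_root: str) -> list[str]:
    """Hypothetical helper: ffmpeg argument list for one Season 1 episode."""
    tag = f"S01E{episode:02d}"
    src = f"{src_dir}/video_{tag}.mkv"
    pattern = f"{dst_root}/{tag}_extract/s01_e{episode:02d}_%06d.jpg"
    return ["ffmpeg", "-hwaccel", "cuda", "-i", src, "-vf", VF,
            "-pix_fmt", "yuv420p", "-q:v", "3", pattern]


def extract_all(episodes, src_dir: str, dst_root: str, dry_run: bool = True) -> None:
    """Print (dry run) or execute the extraction command for each episode."""
    for ep in episodes:
        cmd = build_extract_cmd(ep, src_dir, dst_root)
        if dry_run:
            print(" ".join(cmd))  # for inspection only; not shell-quoted
            continue
        # ffmpeg does not create the output directory for an image sequence
        os.makedirs(os.path.dirname(cmd[-1]), exist_ok=True)
        subprocess.run(cmd, check=True)
```

Calling extract_all(range(1, 4), "/path_to_source", "/path_to_target") prints the three commands; dry_run=False runs them, which needs an ffmpeg build with CUDA and zscale (libzimg) support. Passing the arguments as a list avoids shell quoting issues with the filter string.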
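The move_random_files.py script itself isn't shown here; the following is a minimal stand-in with the same purpose, moving a random sample (such as 5k of the 41k extracted frames) into another folder. Its signature and the seed parameter are assumptions, not the script's actual interface.

```python
from __future__ import annotations

import os
import random
import shutil


def move_random_files(src: str, dst: str, n: int, seed: int | None = None) -> list[str]:
    """Move n randomly chosen files from src to dst and return their names."""
    # Sort first so that a fixed seed selects the same files on every run
    names = sorted(f for f in os.listdir(src) if os.path.isfile(os.path.join(src, f)))
    chosen = random.Random(seed).sample(names, min(n, len(names)))
    os.makedirs(dst, exist_ok=True)
    for name in chosen:
        shutil.move(os.path.join(src, name), os.path.join(dst, name))
    return chosen
```

For instance, move_random_files("/path_to_source/S01E01-03_extract", "/path_to_target/scenes", 5000, seed=42) would set aside a 5k scene subset; the target folder name here is hypothetical.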
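The caption layout described above (folder name first, then any extra names added by hand, then the WD14 tagger output) can be sketched as a simple merge that keeps order and drops duplicates. This is a hypothetical helper, not captions_helper.py, and the subject names in the example are illustrative.

```python
from __future__ import annotations

from typing import Iterable


def build_caption(folder_name: str,
                  extra_names: Iterable[str] = (),
                  auto_tags: Iterable[str] = ()) -> str:
    """Comma-separated caption: subject (folder) name first, then extra names, then auto tags.

    Tags are compared case-insensitively, so a tagger repeating a name is dropped.
    """
    seen = set()
    tags = []
    for tag in (folder_name, *extra_names, *auto_tags):
        cleaned = tag.strip()
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            tags.append(cleaned)
    return ", ".join(tags)
```

Keeping the subject name as the first tag matters if tag shuffling is enabled and the trainer leaves the first tag in place; check the trainer's documentation before relying on that behavior.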
Multipliers: GOT subjects with a significant number of images were trained at 30 images per subject per epoch; subjects with fewer images at 8 or 4 images per subject per epoch.

Stage 3

In this training, I wanted to test the theory that it's more effective to pre-train the TE first and then train the Unet with the frozen, pre-trained TE.



  • Fruugo ID: 258392218-563234582
  • EAN: 764486781913
  • Sold by: Fruugo

Delivery & Returns

Fruugo

Address: UK