Driving an 8-bit parallel 8080 bus using I2S

Deouss
Posts: 425
Joined: Tue Mar 20, 2018 11:36 am

Re: Driving an 8-bit parallel 8080 bus using I2S

Postby Deouss » Mon Jun 03, 2019 1:27 am

I heard on some hacker forums that Bitluni simply ripped that code off from someone else's project. I'm not sure, but the code looks a bit sloppy and is heavily mixed with Arduino. If you look at the I2S section of the Technical Reference Manual, there is a lot of functionality that most of the code out there leaves untouched. I'm not even sure the I2S peripheral is used properly in most ESP projects, because it chokes at 20 MHz and above.
Some updates and code polishing are definitely needed. Also, if you just go with inline asm to write the registers for multiple pins directly, you will probably achieve faster speeds than with I2S. Maybe Espressif will show us more examples in the future.

RetroZvoc
Posts: 9
Joined: Tue Nov 27, 2018 1:21 am

Re: Driving an 8-bit parallel 8080 bus using I2S

Postby RetroZvoc » Mon Jun 03, 2019 5:40 am

Deouss wrote: I heard on some hacker forums that Bitluni simply ripped that code off from someone else's project. I'm not sure, but the code looks a bit sloppy and is heavily mixed with Arduino. If you look at the I2S section of the Technical Reference Manual, there is a lot of functionality that most of the code out there leaves untouched. I'm not even sure the I2S peripheral is used properly in most ESP projects, because it chokes at 20 MHz and above.
Some updates and code polishing are definitely needed. Also, if you just go with inline asm to write the registers for multiple pins directly, you will probably achieve faster speeds than with I2S. Maybe Espressif will show us more examples in the future.
I've already tried the GPIO w1ts/w1tc byte permutation trick with a lookup table that some YouTuber made and it's not really fast. It goes up to 38FPS and the tearing is very significant and ugly. It also doesn't utilize DMA, but it's all just bitbanging. I don't think that assembly would help there since C/C++ most probably optimizes that code as much as possible since it isn't some digital_write bloat or things like that. I could be wrong, maybe the compiler doesn't compile the code into a direct register write, but I don't know how to get the disassembly out of an already compiled C/C++ code.

The idea is inevitably to use I2S with DMA. However, as far as I could see from my perspective, the key is that Bitluni used a feature in his I2S implementation that splits the 32-bit buffer into a pair of 16-bit buffers which increases the bandwidth effectively to 40MHz. That unfortunately caused him to have jittery pixels until he utilized the precision clock for calibration at 500MHz and above ("out of spec values" @ 580MHz) which led him to far faster speeds and results.

So, what we'd need to do is write code that does this precision clock + DMA + I2S binding, but without looking at Bitluni's code, so that we don't get "stained" by the virality of the ShareAlike license. So, could you tell me roughly what code connects this clock with the I2S bus? Also, I couldn't fully read through everything that was written here and, frankly, I don't want to overwhelm myself with all of it. I just want to start fresh and do this myself, with a summary of things to be cautious of and possibly some help from you: things like which bytes are sent in which order under which conditions. This is important for me because I don't have any electronic measurement equipment, only microcontrollers, displays, resistors, capacitors and buttons. You guys have oscilloscopes and other equipment, and you've probably already reached some of these conclusions. Most likely my findings could help some of you in your own tests, and that way we can build this thing together.

Now, I'm sure that the LCD has some maximum speed, and that by checking the datasheet and trying out different speeds I could find the fastest one. Maybe we don't need 580MHz like Bitluni used for his VGA implementation. If my calculations are correct, 320*240*2*60 = 9216000 bytes per second, which means we'd need a bit less than 10MHz while utilizing the "pair of 16-bit buffers" thing in order to drive a 320x240, 16-bit display at 60FPS. This could be a great contribution to the LittleVGL library, making it superfast and less reliant on HSPI, the waiting, and the rendering buffers.
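
To make the arithmetic above easy to re-run for other panels, here is a minimal sketch in C. The function names are made up for illustration, and it only counts pixel payload bytes: command bytes, blanking and any idle time between DMA transfers are ignored, so treat the result as a lower bound rather than a guaranteed figure.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes that must cross the bus every second to refresh the whole screen. */
uint32_t lcd_bytes_per_second(uint32_t width, uint32_t height,
                              uint32_t bits_per_pixel, uint32_t fps)
{
    assert(bits_per_pixel % 8 == 0);   /* whole bytes only in this sketch */
    return width * height * (bits_per_pixel / 8) * fps;
}

/* WR strobes per second for a parallel bus of the given width in bits. */
uint32_t lcd_strobes_per_second(uint32_t bytes_per_second,
                                uint32_t bus_width_bits)
{
    assert(bus_width_bits == 8 || bus_width_bits == 16);
    return bytes_per_second / (bus_width_bits / 8);
}
```

For 320x240 at 16 bits per pixel and 60 FPS, `lcd_bytes_per_second(320, 240, 16, 60)` gives 9216000, and on an 8-bit bus that is one WR strobe per byte, i.e. roughly 9.2 MHz.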

However, regarding my game console project ideas, I saw an NES emulator written in C++ and SDL2 under the MIT license. I could use that code to rewrite the emulator for my game console, so that it has less bloat than the GPL'd nofrendo emulator and renders faster. As for my own fantasy console, my rendering engine idea is very simple. It's a tile-based engine akin to the SNES/GBA's, with the 16-bit RGAB5515 color format, many 8x8/16x16/32x32 8-way rotatable sprites per scanline, 16-color-per-tile tilesets with 32 palette sets, 4 scrollable and matrix-skewable nametables, etc. The difference is that my engine renders the layers one by one for each scanline into a scanline buffer, along with the sprites, without checking which pixel is transparent and which isn't (which the SNES had to do, but we don't, since the ESP32 is something like 20x faster and has more memory). Then the DMA copies the scanline buffer to the screen while the next scanline is being rendered, and the process repeats until the whole screen is rendered. I've been into NES and SNES emulators for approximately a decade, and I can't resist finally wanting to make something useful out of it.
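
The scanline scheme described above can be sketched roughly as follows. This is only an illustration under assumptions: the two example layers, the `send_fn` callback and all names are hypothetical, and the "send" step here is a plain function call standing in for starting a DMA transfer.

```c
#include <assert.h>
#include <stdint.h>

#define SCREEN_WIDTH  320
#define SCREEN_HEIGHT 240

/* A layer draws its pixels for line y over whatever is already in the buffer. */
typedef void (*layer_fn)(uint16_t *line, int y);
/* Hands a finished line to the display; on hardware this would start DMA. */
typedef void (*send_fn)(const uint16_t *line, int y);

static uint16_t scanline[2][SCREEN_WIDTH];

void render_frame(const layer_fn *layers, int layer_count, send_fn send)
{
    assert(layers != 0 && send != 0);
    for (int y = 0; y < SCREEN_HEIGHT; y++) {
        /* Alternate between the two buffers. A real driver would wait here
         * until the transfer that last used this buffer has finished. */
        uint16_t *line = scanline[y & 1];
        for (int i = 0; i < layer_count; i++) {
            layers[i](line, y);   /* back to front: later layers overwrite */
        }
        send(line, y);
    }
}

/* Example layer: fills the whole line, using y as a dummy colour. */
void layer_background(uint16_t *line, int y)
{
    for (int x = 0; x < SCREEN_WIDTH; x++) {
        line[x] = (uint16_t)y;
    }
}

/* Example layer: an opaque white 8x8 block at x = 10..17, y = 20..27. */
void layer_sprite(uint16_t *line, int y)
{
    if (y < 20 || y >= 28) {
        return;
    }
    for (int x = 10; x < 18; x++) {
        line[x] = 0xFFFF;
    }
}

/* Example sink: records how many lines were sent and one pixel per line. */
static int lines_sent;
static uint16_t sampled_pixel[SCREEN_HEIGHT];

void record_send(const uint16_t *line, int y)
{
    lines_sent++;
    sampled_pixel[y] = line[10];
}
```

Only two line buffers (2 x 320 x 2 bytes) are needed instead of a full 150 KB framebuffer, which is the main appeal of rendering per scanline.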

I think that a full 60FPS 320x240 16-bit color ESP32 game console that doesn't waste too much time and memory on video is the holy grail of microcontrollers. Let's go for it!

Deouss
Posts: 425
Joined: Tue Mar 20, 2018 11:36 am

Re: Driving an 8-bit parallel 8080 bus using I2S

Postby Deouss » Mon Jun 03, 2019 11:52 am

RetroZvoc wrote:I've already tried the GPIO w1ts/w1tc byte permutation trick with a lookup table that some YouTuber made and it's not really fast. It goes up to 38FPS and the tearing is very significant and ugly. It also doesn't utilize DMA, but it's all just bitbanging. I don't think that assembly would help there since C/C++ most probably optimizes that code as much as possible since it isn't some digital_write bloat or things like that. I could be wrong, maybe the compiler doesn't compile the code into a direct register write, but I don't know how to get the disassembly out of an already compiled C/C++ code.
Interesting. Maybe you could share the code for the w1ts/w1tc combo. Anyway, the compiler will optimize the code, but it may still use slower memory-addressing instructions plus loops. If you write the pixel code properly in asm, you can pair certain operations into one cycle, considering that Xtensa has, if I'm not mistaken, a 7-stage pipeline. So it is not just bitbanging but direct register and memory addressing. You can probably squeeze all the pixels of the LCD into a very small number of cycles with repeated (unrolled) inline assembly instead of loops. It is an old Amiga/Atari technique.
I don't know where that 580 MHz came from. As far as the TRM says, I2S uses DMA, and the Espressif devs know better how all the implementations work and whether anything can be improved for better addressing and byte-order sequencing at the output.
The I2S API needs some revision and improvement. As far as I can see, the SDK is still very fresh.

Xavi92
Posts: 45
Joined: Thu Mar 28, 2019 2:26 pm

Re: Driving an 8-bit parallel 8080 bus using I2S

Postby Xavi92 » Mon Jun 03, 2019 5:06 pm

I'm having pretty much the same issues you have described. I am using an ILI9481-based (480x320 pixels) display along with I2S+DMA. I have played around with APLL_CLK with no success so far, so I am sticking with the default clock source for the time being. Even though the screen seems to get drawn rather quickly (~35Hz), there are often some glitches that I cannot explain. Even changing the clock divider value changes the resulting screen, too!

Results with I2S1.clkm_div_num = 4:
https://www.youtube.com/watch?v=LNzpnrKPnFo

Results with I2S1.clkm_div_num = 32:
https://youtu.be/Y7exI4tEGok

Is there any reason why this could be happening? How can I configure APLL_CLK in order to get more precise clock output?

Code:

#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "i2sparallel_custom.h"
#include "driver/periph_ctrl.h"
#include "driver/i2s.h"
#include "esp32/rom/lldesc.h"
#include "soc/soc.h"
#include "gfx.h"
#include "soc/rtc.h"
#include "esp_log.h"
#include "gpiolcd.h"
#include "global_defs.h"

static void i2s_setup_peripheral(void);
static void i2s_setup_clock(void);
static void i2s_isr(void *const params);

static void i2s_setup_clock(void)
{
    /* ***************************************************************
     * I2Sn clock frequency is calculated as follows:
     *
     * fi2s = fPLL / (N + (b/a))
     *
     * Where:
     *  fPLL: selected clock frequency. Two options are available:
     *          PLL_D2_CLK, rated at 160 MHz.
     *          APLL_CLK (frequency?)
     *
     *  N: CLKM_DIV_NUM
     *  b: CLKM_DIV_B
     *  a: CLKM_DIV_A
     *
     * On the other hand, BCK clock frequency is calculated as follows:
     *
     * I2SnO_BCK_out = fi2s / M
     *
     * Where:
     *
     *  M : BCK_DIV_NUM
     *
     * Note: in this case we are using LCD master transmitting mode.
     * ***************************************************************/

    //rtc_clk_apll_enable(true, 0, 0, 9, 0);

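    /* Select APLL as the I2S1 clock source. Note that the APLL itself
     * is only started by rtc_clk_apll_enable(), which is commented
     * out above. */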
    I2S1.clkm_conf.clka_en = 1;

    /* Set BCK TX clock rate. */
    I2S1.sample_rate_conf.tx_bck_div_num = 4;
    I2S1.clkm_conf.clkm_div_b = 0;

    /* On the original example, it was set to zero. Why? */
    I2S1.clkm_conf.clkm_div_a = 1;

    /* Set integer clock divider N (CLKM_DIV_NUM). */
    I2S1.clkm_conf.clkm_div_num = 4;

    /* Activate I2S1 clock. */
    I2S1.clkm_conf.clk_en = 1;

    /* Enable interrupt trigger when a packet has been sent. */
    I2S1.int_ena.out_total_eof = 1;

#if 0
    /* Enable interrupt trigger when a descriptor error is found. */
    I2S1.int_ena.out_dscr_err = 1;

    I2S1.int_ena.out_eof = 1;
    /* Why isn't this bit enabled on the example code? */
    I2S1.int_ena.out_done = 1;
#endif
}

static void i2s_setup_peripheral(void)
{
    enum
    {
        I2S_TX_CHAN_MODE_MONO = 2,
        I2S_TX_FIFO_MODE_16_BIT_DUAL = 0,
        I2S_TX_FIFO_MODE_16_BIT_SINGLE = 1
    };

    /* Enable I2S1 peripheral before modifying any register. */
    periph_module_enable(PERIPH_I2S1_MODULE);

    /* Set I2S1 in LCD mode. */
    I2S1.conf2.lcd_en = 1;

    /* Clear TX slave mode so LCD master
     * transmission mode is enabled. */
    I2S1.conf.tx_slave_mod = 0;

    I2S1.conf1.tx_stop_en = 1;
    I2S1.conf1.tx_pcm_bypass = 1;

    /* Set TX channel mode. */
    I2S1.conf_chan.tx_chan_mod = I2S_TX_CHAN_MODE_MONO;

    /* Reset I2S1 TX FIFO buffer. */
    I2S1.conf.tx_fifo_reset = 1;
    I2S1.conf.tx_fifo_reset = 0;

    /* Set TX FIFO mode. */
    I2S1.fifo_conf.tx_fifo_mod = I2S_TX_FIFO_MODE_16_BIT_SINGLE;

    /* This bit must always be set, according to TRM. */
    I2S1.fifo_conf.tx_fifo_mod_force_en = 1;

    /* According to the TRM, these bits select the LCD master
     * transmitting data frame form. SDX2 is cleared and WRX2 is
     * set here, aiming for form 2, where 1 byte is transmitted
     * each time WR is asserted. */
    I2S1.conf2.lcd_tx_sdx2_en = 0;
    I2S1.conf2.lcd_tx_wrx2_en = 1;
}

static const enum LCDPins lcd_pins[] =
{
    DB0,
    DB1,
    DB2,
    DB3,
    DB4,
    DB5,
    DB6,
    DB7
};

void i2s_setup_gpio(void)
{
    {
        uint32_t signal_idx = I2S1O_DATA_OUT0_IDX;

        foreach (pin, lcd_pins)
        {
            /* Route each LCD data pin into I2S1 output signal. */
            gpio_matrix_out(*pin, signal_idx++, false, false);
        }
    }

    /* According to TRM, I2S WS signal needs to be inverted. */
    gpio_matrix_out(WR, I2S1O_WS_OUT_IDX, true, false);
}

void i2s_reset_gpio(void)
{
    foreach (pin, lcd_pins)
    {
        /* Unroute each LCD data pin from I2S1 output signal. */
        gpio_matrix_out(*pin, 0x100, false, false);
    }

    /* Unroute WR pin from I2S1 WS signal and remove the inversion. */
    gpio_matrix_out(WR, 0x100, false, false);
}

enum
{
    OWNER_CPU,
    OWNER_DMA
};

lcd_word dma_buffers[2][DMA_MAX_SIZE / 2];

static lldesc_t dma_descriptor =
{
    .size = DMA_MAX_SIZE,
    .length = DMA_MAX_SIZE,
    .buf = NULL,
    .owner = OWNER_DMA,
    .eof = 1
};

static void i2s_setup_dma(void)
{
    /* Reset DMA AHB interface. */
    I2S1.lc_conf.ahbm_rst = 1;
    I2S1.lc_conf.ahbm_rst = 0;

    /* Reset in DMA FSM. */
    I2S1.lc_conf.in_rst = 1;
    I2S1.lc_conf.in_rst = 0;

    /* Reset out DMA FSM. */
    I2S1.lc_conf.out_rst = 1;
    I2S1.lc_conf.out_rst = 0;

    /* Enable descriptor owner bit check. */
    I2S1.lc_conf.check_owner = 1;

    /* Transmit data in burst mode. */
    I2S1.lc_conf.out_data_burst_en = 1;

    /* Transfer outlink descriptor in burst mode. */
    I2S1.lc_conf.outdscr_burst_en = 1;

    /* Enable DMA operation over I2S1. */
    I2S1.fifo_conf.dscr_en = 1;

    /* Set up DMA descriptor address. */
    I2S1.out_link.addr = ((uint32_t)(&dma_descriptor)) & I2S_OUTLINK_ADDR;
}

static void i2s_setup_fifo(void)
{
    /* Reset DMA AHB interface FIFO buffer. */
    I2S1.lc_conf.ahbm_fifo_rst = 1;
    I2S1.lc_conf.ahbm_fifo_rst = 0;

    /* Reset I2S1 TX FIFO buffer. */
    I2S1.conf.tx_fifo_reset = 1;
    I2S1.conf.tx_fifo_reset = 0;

    /* Reset I2S1 TX channel. */
    I2S1.conf.tx_reset = 1;
    I2S1.conf.tx_reset = 0;
}

static void IRAM_ATTR i2s_isr(void *const params)
{
    I2S1.conf.tx_start = 0;
    I2S1.out_link.stop = 1;

    static uint8_t buffer_selector;
    /* Retrieve semaphore handle for currently selected DMA buffer. */
    const SemaphoreHandle_t semaphore = xSemaphore[buffer_selector];

    BaseType_t xHigherPriorityTaskWoken = pdFALSE;

    if (I2S1.int_st.out_eof || I2S1.int_st.out_done || I2S1.int_st.out_total_eof)
    {
        buffer_selector ^= 1;
    }

    /* Clear interrupt flags. */
    I2S1.int_clr.val = I2S1.int_st.val;

    /* Inform gfx transmission has ended. */
    xQueueSendToBackFromISR(draw_isr_queue, UNUSED(bool), &xHigherPriorityTaskWoken);

    xSemaphoreGiveFromISR(semaphore, &xHigherPriorityTaskWoken);

    if (xHigherPriorityTaskWoken)
    {
        portYIELD_FROM_ISR();
    }
}

static void i2s_setup_isr(void)
{
    enum
    {
        ESP_INTR_FLAGS_NONE
    };

    intr_handle_t int_handle;

    /* Configure interrupt for I2S1. */
    const esp_err_t ret = esp_intr_alloc
    (
        /* Interrupt source */  ETS_I2S1_INTR_SOURCE,
        /* Interrupt flags */   ESP_INTR_FLAGS_NONE,
        /* Interrupt handler */ i2s_isr,
        /* Parameters */        NULL,
        /* Return handle */     &int_handle
    );

    if (ret == ESP_OK)
    {
        /* Mark the handler as located in IRAM (i2s_isr is IRAM_ATTR). */
        esp_intr_set_in_iram(int_handle, true);

        /* Enable interrupt. */
        esp_intr_enable(int_handle);
    }
    else
    {
        /* Could not initialize I2S1 interrupt handler. */
    }
}

void i2s_draw(const lcd_word *const buffer, const size_t length)
{
    dma_descriptor.buf = (uint8_t*)buffer;
    dma_descriptor.length = length;

    i2s_setup_dma();
    i2s_setup_fifo();

    /* Start transmission. */
    I2S1.out_link.start = 1;
    I2S1.conf.tx_start = 1;
}

void i2s_setup(void)
{
    /* Setup I2S1 peripheral and its registers. */
    i2s_setup_peripheral();

    /* Setup GPIO pins used by I2S1. */
    i2s_setup_gpio();

    /* Setup DMA channel for I2S1. */
    i2s_setup_dma();

    /* Setup I2S1 clock. */
    i2s_setup_clock();

    /* Reset I2S1 FIFO buffers. */
    i2s_setup_fifo();

    /* Setup interrupt handler. */
    i2s_setup_isr();
}
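
As a cross-check of the two divider formulas in the comment block of `i2s_setup_clock()`, here is a small sketch that computes the resulting frequencies. The function names are made up, and the 160 MHz input in the usage note assumes the PLL_D2_CLK source named in that comment; the posted code selects the APLL instead (`clka_en = 1`), in which case the input would be whatever the APLL actually outputs.

```c
#include <assert.h>
#include <stdint.h>

/* fi2s = fPLL / (N + b/a), as given in the comment in i2s_setup_clock(). */
double i2s_clock_hz(double f_pll_hz, uint32_t n, uint32_t b, uint32_t a)
{
    assert(a != 0);   /* b/a is undefined for a == 0 in this sketch */
    assert(n != 0 || b != 0);
    return f_pll_hz / ((double)n + (double)b / (double)a);
}

/* I2SnO_BCK_out = fi2s / M, with M being BCK_DIV_NUM. */
double i2s_bck_hz(double f_i2s_hz, uint32_t m)
{
    assert(m != 0);
    return f_i2s_hz / (double)m;
}
```

With a 160 MHz source, `clkm_div_num = 4` and `tx_bck_div_num = 4` give fi2s = 40 MHz and BCK = 10 MHz, while `clkm_div_num = 32` gives fi2s = 5 MHz and BCK = 1.25 MHz, so the two videos above would differ by a factor of 8 in bus clock under that assumption.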

RetroZvoc
Posts: 9
Joined: Tue Nov 27, 2018 1:21 am

Re: Driving an 8-bit parallel 8080 bus using I2S

Postby RetroZvoc » Tue Jun 04, 2019 12:00 pm

Deouss wrote:
Mon Jun 03, 2019 11:52 am
RetroZvoc wrote:I've already tried the GPIO w1ts/w1tc byte permutation trick with a lookup table that some YouTuber made and it's not really fast. It goes up to 38FPS and the tearing is very significant and ugly. It also doesn't utilize DMA, but it's all just bitbanging. I don't think that assembly would help there since C/C++ most probably optimizes that code as much as possible since it isn't some digital_write bloat or things like that. I could be wrong, maybe the compiler doesn't compile the code into a direct register write, but I don't know how to get the disassembly out of an already compiled C/C++ code.
Interesting. Maybe you could share the code for the w1ts/w1tc combo. Anyway, the compiler will optimize the code, but it may still use slower memory-addressing instructions plus loops. If you write the pixel code properly in asm, you can pair certain operations into one cycle, considering that Xtensa has, if I'm not mistaken, a 7-stage pipeline. So it is not just bitbanging but direct register and memory addressing. You can probably squeeze all the pixels of the LCD into a very small number of cycles with repeated (unrolled) inline assembly instead of loops. It is an old Amiga/Atari technique.
I don't know where that 580 MHz came from. As far as the TRM says, I2S uses DMA, and the Espressif devs know better how all the implementations work and whether anything can be improved for better addressing and byte-order sequencing at the output.
The I2S API needs some revision and improvement. As far as I can see, the SDK is still very fresh.
Well, I don't mind actually accessing the IO registers of the I2S and DMA peripherals manually since I actually know AVR assembly and since I did some 6502 homebrew NES assembly. Or if at least I could know which address has which register so that I can directly write by C++ code. So, considering that, I'd like to try and implement this myself.

Now, since you mentioned this 7-stage pipeline, it could be that my compiler didn't want to accidentally cause an atomicity error by having a register be written before the whole pipeline sequence is finished. So maybe it could be possible according to what I see from what you're saying to make things 7x faster by using asm. Now, I don't know how inline-asm works and what the opcodes are for the Xtensa in Arduino IDE C++ compiler. It's all superweird when even the toolchain seems like a bloat :D

Also, I unrolled my loop manually. I don't know why it won't go faster. Well, at least I know I could try to use this ASM thing. I hope there's a single instruction that can do this

Code:

*(gpio_thing)=permutation_array[pixel_byte];
kind of an addressing mode in a single cycle. I think there needs to be 3 such instructions to make this work.
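
For readers who have not seen the permutation trick, here is a rough host-side sketch of the idea. It is not the code from the video: the pin numbers are hypothetical, and the register writes are simulated on a plain variable. On the ESP32 the two masks would instead be stored to the GPIO W1TS and W1TC registers, followed by a WR pulse.

```c
#include <assert.h>
#include <stdint.h>

#define LCD_DATA_PINS 8

/* Hypothetical GPIO numbers for DB0..DB7; all must be below 32 here. */
static const uint8_t lcd_data_gpio[LCD_DATA_PINS] = {2, 4, 5, 12, 13, 14, 15, 16};

static uint32_t set_mask[256];    /* bits to write to W1TS for each byte */
static uint32_t clear_mask[256];  /* bits to write to W1TC for each byte */

/* Precompute, for every byte value, which GPIO bits go high and which go low. */
void lcd_build_masks(void)
{
    uint32_t all = 0;
    for (int bit = 0; bit < LCD_DATA_PINS; bit++) {
        assert(lcd_data_gpio[bit] < 32);
        all |= UINT32_C(1) << lcd_data_gpio[bit];
    }
    for (int value = 0; value < 256; value++) {
        uint32_t set = 0;
        for (int bit = 0; bit < LCD_DATA_PINS; bit++) {
            if (value & (1 << bit)) {
                set |= UINT32_C(1) << lcd_data_gpio[bit];
            }
        }
        set_mask[value] = set;
        clear_mask[value] = all & ~set;
    }
}

/* Simulates the effect of the W1TS and W1TC writes on the output latch. */
uint32_t lcd_apply_byte(uint32_t gpio_out, uint8_t value)
{
    gpio_out |= set_mask[value];      /* what a W1TS write would do */
    gpio_out &= ~clear_mask[value];   /* what a W1TC write would do */
    return gpio_out;
}
```

Even with the tables, each byte costs two table loads and two register stores plus the WR toggle, all executed by the CPU, which is why DMA-driven I2S remains attractive.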

https://www.youtube.com/watch?v=GdNBqktJJDY This is the video of the permutation thing.

https://youtu.be/G70CZLPjsXU?t=266 Here Bitluni talks about the I2S bus thing and the pixel jittering (which many of you might have seen being manifested as mixed bytes) and how he overclocked the APLL. 5:10 is when he mentions 580MHz.

rudi ;-)
Posts: 1698
Joined: Fri Nov 13, 2015 3:25 pm

Re: Driving an 8-bit parallel 8080 bus using I2S

Postby rudi ;-) » Thu Sep 26, 2019 8:52 pm

Xavi92 wrote:
Mon Jun 03, 2019 5:06 pm
How can I configure APLL_CLK in order to get more precise clock output?
Perhaps this helps:
see here
there
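
Since the linked targets are not reproduced here, the following is a hedged sketch of how the APLL output relates to the `sdm0`, `sdm1`, `sdm2` and `o_div` arguments of `rtc_clk_apll_enable()` seen in the code above. The formula is written from memory of the ESP32 Technical Reference Manual and should be verified there; the function names and the 40 MHz crystal are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed formula (check against the TRM):
 *   f_out = f_xtal * (sdm2 + sdm1/2^8 + sdm0/2^16 + 4) / (2 * (o_div + 2))
 * This helper returns the numerator, before the output divider. */
double apll_numerator_hz(double f_xtal_hz, uint32_t sdm0, uint32_t sdm1,
                         uint32_t sdm2)
{
    assert(sdm0 <= 255 && sdm1 <= 255 && sdm2 <= 63);
    return f_xtal_hz * ((double)sdm2 + (double)sdm1 / 256.0
                        + (double)sdm0 / 65536.0 + 4.0);
}

/* APLL output frequency after the output divider. */
double apll_out_hz(double f_xtal_hz, uint32_t sdm0, uint32_t sdm1,
                   uint32_t sdm2, uint32_t o_div)
{
    assert(o_div <= 31);
    return apll_numerator_hz(f_xtal_hz, sdm0, sdm1, sdm2)
           / (2.0 * ((double)o_div + 2.0));
}
```

Under these assumptions, the commented-out call `rtc_clk_apll_enable(true, 0, 0, 9, 0)` would give a numerator of 520 MHz and an output of 130 MHz. As far as I recall, the TRM asks for the numerator to stay between 350 and 500 MHz, so that setting may already be outside the recommended range.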

best wishes
rudi ;-)
-------------------------------------
love it, change it or leave it.
-------------------------------------
Greetings fly out to friends all over the world, Rudi
